Where Is Semantic Search?

booleanstrings Boolean


I was preparing a “Boolean for everyone” presentation and went over general concepts of searching in databases and search engines. This got me thinking: why is the strongest semantic search we see in Google (where it’s very hard to implement because of the volume of data and the diversity of web pages) and not in databases (where the data is structured and far smaller)? The most “advanced” interpretation databases do is convert VP to “vice president”; none that I know of would search for ‘software developer’ when we ask for ‘software engineer.’

Let’s take a look at a simple search for a professional. I will try to find Software Engineers who write in LISP. This language is pretty obscure and has its fans (some of whom I had the good luck to work with in my previous career). People who write in JavaScript and took LISP in college are not the ones I want to find.

Searching semantically would mean that the system understands the searcher’s intent. Let’s compare a search on LinkedIn and X-Ray LinkedIn on Google.

(LinkedIn) Current title=software engineer; keywords=lisp

The top result (for me) is a profile that says: “Over 5+ experience in the areas of Software Development, Design & Analysis of GIS Applications and Customization of CAD applications using C#, VB.Net , ASP.NET, VB6, Visual Lisp, VBA, Object ARX .Net & C++ and Arc Objects.”

It’s the wrong result for several reasons. “Visual Lisp” is not LISP in the sense I meant; it’s a variation of LISP used in AutoCAD to “program” geometry (the forms and shapes of objects). The person also uses a ton of other programming languages. And the job title “Software Engineer” shows as “present” only because he forgot to put an end date on his last job; he is currently a “GIS Analyst.”

The second result is a “shallow” profile that has LISP among the skills; however, the person lists a certification for “Java SE 6” and did mobile development until recently – this was likely not done in LISP. (A quick search shows she’s the only one at her company with LISP on the profile.) Wrong result.

Let’s switch to Google. X-Raying is tricky. We have less control over results since we can only use keywords; unlike LinkedIn or another database that has fields such as “job title”, Google deals with “just” pages.

(Google) site:linkedin.com/in software engineer lisp

The first profile belongs to a person who calls himself a “Code Gardener.” His profile not only has LISP, but also lists LISP “dialects” – it says: “Development in Common Lisp, Clojure, LFE (Lisp Flavoured Erlang), Scheme, and Shen.”  Very relevant!

The second result is all right; it also lists LISP and its dialects.

The third result is a profile of a big-time LISP fan and expert. He wrote in LISP at every one of his jobs and even “developed Lisp Machines.” Very relevant.

Further down, Google results also start showing synonyms in place of the entered keywords; for example, it finds “software developer” (a synonym for “software engineer”).

How come databases (LinkedIn, many resume job boards, and people aggregators, as some examples) don’t automatically offer results that reflect some query understanding? (To be fair, Monster.com has implemented some semantic search features; it has been a while since I tried that search.) “Understanding” queries, at least simple ones, should be doable, especially for systems like LinkedIn that have tons of data from which they can “learn.” They have long lists of similar job titles, related skills, etc., plus they get data on search result relevance from tracking users’ behavior. I am not suggesting that a database should auto-transform an entered query into a very long Boolean OR string; rather, it should just show me what I might like to see as results.
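To make the idea concrete, here is a rough Python sketch of what “slightly semantic” could look like. It is not how LinkedIn or any other site works; the SIMILAR_TITLES table, the sample profiles, and the function names are all made up for illustration. The searcher still types one title and one keyword, and the system quietly considers similar titles, ranking exact title matches first.

```python
# Hypothetical lookup table; a real system would learn similar titles
# from its own data and from users' behavior.
SIMILAR_TITLES = {
    "software engineer": {"software developer", "programmer"},
}


def expand_title(title):
    """Return the entered title plus any titles listed as similar to it."""
    key = title.strip().lower()
    return {key} | SIMILAR_TITLES.get(key, set())


def search(profiles, title, keyword):
    """Return matching names: exact title matches first, similar titles next."""
    exact = title.strip().lower()
    wanted = expand_title(title)
    keyword = keyword.lower()
    hits = []
    for profile in profiles:
        profile_title = profile["title"].lower()
        skills = {skill.lower() for skill in profile["skills"]}
        # The keyword must be listed as a skill on its own,
        # so "Visual Lisp" does not count as "Lisp".
        if profile_title in wanted and keyword in skills:
            rank = 0 if profile_title == exact else 1
            hits.append((rank, profile["name"]))
    return [name for rank, name in sorted(hits)]


# Made-up sample data, loosely modeled on the profiles described in this post.
PROFILES = [
    {"name": "A", "title": "Software Developer", "skills": ["Common Lisp", "Lisp"]},
    {"name": "B", "title": "Software Engineer", "skills": ["Lisp", "Scheme"]},
    {"name": "C", "title": "GIS Analyst", "skills": ["Visual Lisp", "C#"]},
]

print(search(PROFILES, "software engineer", "lisp"))  # ['B', 'A']
```

The query itself stays short; the expansion happens behind the scenes and only affects which results are shown and in what order.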

A slightly semantic interpretation of database searches is also not as hard to implement as matching profiles to job descriptions (which many claim to do and nobody does well, understandably, since it is very hard).

I guess it’s a question for the tool developers (those who read this blog post). Let’s see what they have to say.