More on OR: the Google Boolean Dilemma

booleanstrings | Boolean, Google | 5 Comments

 

Anthropology and Psychology study human behavior. Ethology is the study of animal behavior. As a Sourcer, I find myself studying software behavior more and more. (Is there a term for that?)

Studying a software application may sound odd. Isn’t writing software also called “programming”, which implies that its behavior is fully predictable?

There are two factors related to studying software behavior:

  1. We usually don’t have access to the code, and documentation covers its behavior only partially.
  2. If a user is interacting with the system, the user’s input triggers various code behaviors. Trying various inputs, we can derive, with some confidence, what a particular application does.

Sure enough, there’s code complexity, bugs, and other factors interfering with our study, but the knowledge we gain is worth the effort. It provides invaluable practical tips on using software.

This post is a result of studying Google search behavior. Here is a hypothesis on how Google search responds when we use OR statements.

It’s well known that Google has semantic search capabilities. As part of that, for every keyword in the search string, Google also searches for variations of the word (this is called stemming: manager/management) and for synonyms (developer/programmer).

However, as our study shows, for keywords in an OR expression, Google stops looking for variations or synonyms. It searches for the exact term, as if we had put the word in quotation marks (meaning “no variations”).

Take a look:

Do you see what is happening? When a term is part of an OR statement, it is used exactly, with no variations or synonyms (search on the left). With no OR used, we see variations in the results (search on the right).

When I source, I prefer to let Google bring suggestions. I almost never use OR statements, leaving Google’s semantic power at work. If you can sit down and list every possible synonym and variation you want to see, the OR statement is for you.

Conclusion

Do not use OR statements for synonyms and similar terms. Without OR, Google will bring in additional relevant results, and your job will be just to collect them.

If you have specific variations of a term in mind and want to make sure they are included, you can still search several times, using each of the terms separately.
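Running one search per term can be scripted. Here is a minimal Python sketch, assuming the same search?q= URL pattern that appears in the links in the comments below. The helper names, the google.com host, and the sample keywords are made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: the ?q= URL pattern seen in the comment links, on google.com.
BASE_URL = "https://www.google.com/search"


def queries_per_variant(variants, context):
    """Return one search string per variant, each followed by the shared keywords."""
    return [f"{variant} {context}".strip() for variant in variants]


def search_url(query):
    """URL-encode a search string into a search link."""
    return f"{BASE_URL}?{urlencode({'q': query})}"


# Hypothetical example: three variations, searched one at a time.
variants = ["developer", "programmer", "engineer"]
for query in queries_per_variant(variants, "java london"):
    print(search_url(query))
```

Each printed link is a separate search with no OR in it, so Google stays free to add its own variations for every term.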

If you feel you need to be in tight control of the terms to use, or if you have a longer list of terms that are not synonyms (for example, names of target companies), use an OR statement.
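For a longer list, typing the OR statement by hand gets tedious. Below is a small Python sketch of one way to assemble it. The function name and the company names are hypothetical, and the layout (OR group first, other keywords after) simply mirrors the query quoted in the comments below.

```python
def build_or_query(terms, extra_keywords=""):
    """Join terms with an upper-case OR, quoting any term that contains a space."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    or_group = " OR ".join(quoted)  # OR is written in capitals on purpose
    return f"{or_group} {extra_keywords}".strip()


# Made-up target companies, for illustration only.
companies = ["Acme Corp", "Globex", "Initech"]
print(build_or_query(companies, "recruiter"))
```

Multi-word names are wrapped in quotation marks so that each one is treated as a phrase, and OR is kept in upper case throughout.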

We will soon be offering a fully updated webinar Boolean Basics; if you are interested, keep an eye on our schedule >>> https://sourcingcertification.com/upcomingwebinars/

Comments 5

    1. Post
      Author

      Yes, of course. (There is also a related term, “reverse engineering”.) I was, half-jokingly, asking for a term that means studying what software does. You can say the software is a black box in this case.

  1. Irina,

    The OR has lost part of its interest since Google improved its algorithm’s handling of synonyms by using natural language processing (NLP) built on machine learning (ML). This capability can be seen in the Word2vec software, which Google made open source some 2 years ago.

    Your post makes a good point. Although I am a librarian researcher, I myself very rarely use the OR boolean operator in Google any more.

    *But* there are instances where Google overdoes the synonym thing. Or gets it slightly wrong, depending on the researcher’s needs.

    For some tough questions in specialised areas such as law, you might find some use for the OR. Rarely, I agree. But like it or not, law is for the most part a matter of words, so their choice is of the utmost importance.

    For instance, in French law, the notion of being held liable for damages is called “responsabilité civile”. This very phrase appears in most commentaries but not all of them. Some just write about “responsabilité”. There is some case law where not even “responsabilité” appears. And Google is not always that good on specialised language.

    At the same time, an experienced lawyer knows his legal domain’s vocabulary. And in the French civil law field, there aren’t many ways to talk about liability. Here are 99% of the synonyms: responsabilité (civile), responsable/s, faute, 1382 (the previous number of the Civil Code article on civil liability) and 1240 (the current number).

    So one might want to be sure the algorithm searches all those words.

    It does not make a difference in the first 30 results. But it does afterwards. And one might not want to make 5 queries instead of one and compare them. So, in some rare instances, one may want to use Google as full-text retrieval software.

    See for yourself (go directly to page 5 of the results for each query):
    – “responsabilité civile” accident automobile https://www.google.fr/search?q=%22responsabilit%C3%A9+civile%22+accident+automobile
    – “responsabilité civile” OR 1382 OR 1240 accident automobile https://www.google.fr/search?q=%22responsabilit%C3%A9+civile%22+or+1382+or+1240+accident+automobile
    This time, at least at this very moment on my France-based smartphone, a result from a legal publisher’s website (Dalloz Etudiant) appears on page 9. There is no such result in the first 100 results of the first version of the query. The same goes for a page from the BNR Assurances website.

    1. Post
      Author

      Emmanuel,
      I absolutely agree that when you are searching for highly specialized synonyms, you need to keep control over the synonyms to include. Sometimes, Google surprises us, but we can’t expect it to be knowledgeable on all narrow subjects.

      In your example, it seems the ORs in your Google search were accidentally switched to lower case?

      Thanks
      Irina
