Sourcing Revolution! Googling in Ways You Never Have

booleanstrings Boolean

I am about to describe ways to search the Internet that you have likely never used. Please be patient: I need to start with some background to explain how this works. Read on.

Searching within a database is different from searching the Internet. In a database, we access structured data, i.e., records with predefined fields, and can do a faceted search. As an example, in the LinkedIn advanced people search, we can search by name, title, company, and location. Web pages, on the other hand, have titles, URLs, and content, and not much else (or so it seems!). The content of a web page is unstructured data, and it’s much harder to search. Those of us who are familiar with advanced search syntax create Boolean Strings based on patterns in web pages to try to find specific kinds of data within the pages.
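To make the contrast concrete, here is a minimal Python sketch with made-up sample data (the records, the page text, and both function names are hypothetical). A faceted search filters on named fields, while a keyword search over unstructured text can only check whether words appear somewhere on the page.

```python
# Hypothetical records with predefined fields, as a database stores them.
records = [
    {"name": "A. Smith", "title": "Engineer", "location": "Chicago"},
    {"name": "B. Jones", "title": "Recruiter", "location": "Chicago"},
    {"name": "C. Lee", "title": "Engineer", "location": "Boston"},
]


def faceted_search(rows, **facets):
    """Return the rows whose named fields equal every requested value."""
    return [r for r in rows if all(r.get(k) == v for k, v in facets.items())]


# Hypothetical page text: the words are present, but there are no fields.
page = "B. Jones, Recruiter. I hire every engineer I can find in Chicago."


def keyword_search(text, *words):
    """Return True if every word occurs somewhere in the text (case-insensitive)."""
    lowered = text.lower()
    return all(w.lower() in lowered for w in words)


print(faceted_search(records, title="Engineer", location="Chicago"))
# → [{'name': 'A. Smith', 'title': 'Engineer', 'location': 'Chicago'}]
print(keyword_search(page, "engineer", "Chicago"))
# → True, although this page belongs to a recruiter
```

The facet query returns only the engineer in Chicago; the keyword match on the page is a false positive, which is exactly the precision problem of searching unstructured text.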

As an example, we use advanced search syntax or Custom Search Engines to Google for LinkedIn profiles. That approach relies on the URL structure of profiles and shows only profiles in the results. Then, to narrow down by job title, company, and location, we add more advanced search syntax, often with limited precision.
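For readers new to this, here is a minimal Python sketch of how such a string is assembled (the function name `xray_query` is ours, for illustration). The `site:` operator keeps only pages whose URLs match the profile pattern, assumed here to be `linkedin.com/in`; the quoted terms can match anywhere in the page text, so a title or location term is not tied to a title or location field.

```python
def xray_query(site_path, *terms):
    """Restrict results to one URL pattern and require each quoted term."""
    quoted = " ".join(f'"{t}"' for t in terms)
    return f"site:{site_path} {quoted}".strip()


print(xray_query("linkedin.com/in", "engineer", "Chicago"))
# → site:linkedin.com/in "engineer" "Chicago"
```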

Most web pages have no structure to rely on for searching. However, many sites and pages, LinkedIn public profiles included, do have some structured data that is hidden from the viewer but is “seen” and collected by Googlebot. To view the hidden structured data of a page, paste the page URL into Google’s Structured Data Testing Tool. Try various social network profiles in the tool, and you will see varying amounts of hidden structured data in public profiles on LinkedIn, Google+ (of course!), Meetup, GitHub, and many other sites. Some of this data follows the Schema.org standard; other sites use their own, more-or-less standard ways to name the data and its fields.
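One common way pages carry such data is Schema.org microdata: `itemprop` attributes on ordinary HTML elements, plus values tucked into `content` attributes that the reader never sees. As a rough illustration, here is a minimal Python sketch, using only the standard library, that pulls flat `itemprop` values out of a hypothetical profile snippet. It is a simplification and not how Googlebot or the Testing Tool work: it ignores other formats such as RDFa and JSON-LD, and it flattens nested items.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collect flat Schema.org microdata values: itemprop name -> value."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._pending = None  # itemprop name waiting for its text content

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        name = a.get("itemprop")
        if name is None or "itemscope" in a:
            # Not a property, or a nested item whose values come from its children.
            return
        if "content" in a:
            # e.g. <meta itemprop="..." content="...">: the value is hidden in the attribute.
            self.props[name] = a["content"]
        else:
            # Otherwise the value is the element's visible text.
            self._pending = name

    def handle_data(self, data):
        if self._pending and data.strip():
            self.props[self._pending] = data.strip()
            self._pending = None


# Hypothetical profile snippet marked up with Schema.org microdata.
html = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <span itemprop="jobTitle">Software Engineer</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="addressLocality">Chicago</span>
  </div>
  <meta itemprop="email" content="jane@example.com">
</div>
"""

collector = ItempropCollector()
collector.feed(html)
print(collector.props)
# → {'name': 'Jane Doe', 'jobTitle': 'Software Engineer', 'addressLocality': 'Chicago', 'email': 'jane@example.com'}
```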

Google takes advantage of the hidden structured data it collects by showing us rich previews (“snippets”) of search results. For example, when we Google for LinkedIn profiles, the previews display taglines and locations underneath the profile links; those two pieces of information come from this kind of data.

When we Google for products or movies, we may see ratings shown as colored stars in the snippets; that, too, is structured data that Google previews for the end user.

Unfortunately, we cannot search for structured data on Google.com. Advanced Google search syntax doesn’t include that capability.

Here’s a way to search for it. It’s relatively new and so far has been used mostly by webmasters, and I am announcing it loudly so that you can take advantage of it! Ready?

You CAN search for any structured data in any Google Custom Search Engine.

Here’s how it works. There is additional Boolean search syntax that only Google Custom Search Engines “understand”. The syntax is as follows:

more:p:<data-field-name>:<data-value>
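As a rough illustration of how the operator is put together, here is a minimal Python sketch that joins several field/value pairs into one query to paste into a Custom Search Engine’s search box. The function name and the field names `person-title` and `person-location` are hypothetical, and rejecting values that contain spaces is our own cautious assumption, not a documented rule.

```python
def structured_query(pairs):
    """Turn (field, value) pairs into more:p:<field>:<value> operators."""
    parts = []
    for field, value in pairs:
        # Assumption: single-word fields and values; a space would split the operator.
        if any(ch.isspace() for ch in field + value):
            raise ValueError(f"whitespace in {field!r}:{value!r}")
        parts.append(f"more:p:{field}:{value}")
    return " ".join(parts)


# Hypothetical field names; real names depend on each site's markup.
print(structured_query([("person-title", "engineer"), ("person-location", "chicago")]))
# → more:p:person-title:engineer more:p:person-location:chicago
```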

Here are some examples of using that syntax, based on the Custom Search Engine “Search Everything”, which does search “everything” on the web (feel free to bookmark the shortcut: http://bit.ly/SearchEverythingCSE):

There’s a lot more these searches can do!

Further, we can “shoot in the dark”, that is, search by structured data without restricting the search to a site. This approach works and gives some fantastic results that we have no other way to search for. When “shooting in the dark”, we can try the different names that various sites may be using for their hidden structured data fields. Examples:

  1. title=engineer location=Chicago
  2. title=engineer location=Chicago

How cool is that?
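Since there is no single naming standard, one way to “shoot in the dark” systematically is to try every combination of plausible field names, with no site restriction. Here is a minimal Python sketch of that idea; the candidate field names are pure guesses for illustration, not names any particular site is known to use.

```python
from itertools import product

# Hypothetical guesses at how different sites might name the same fields.
TITLE_FIELDS = ["person-title", "person-jobtitle", "hcard-title"]
LOCATION_FIELDS = ["person-location", "hcard-locality"]


def dark_queries(title, location):
    """Build one site-free query per combination of candidate field names."""
    return [
        f"more:p:{t}:{title} more:p:{loc}:{location}"
        for t, loc in product(TITLE_FIELDS, LOCATION_FIELDS)
    ]


for query in dark_queries("engineer", "chicago"):
    print(query)  # paste each into the CSE and compare the results
```

Combinations that return nothing simply tell us that no indexed site uses that pair of names.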

You may wonder how to pinpoint the various structured data field names, given that website creators do not always follow the same standard. I will be sharing more on that soon. Stay tuned!