How Search Works

booleanstrings Uncategorized

I have tried to describe how search works in one page. In reality, things are more complicated than this. However, I believe that understanding search mechanisms, even a little, helps us search better.

Here you go:


Internet search engines:

  • …crawl the web, going through links and memorizing the content of pages.
  • A page has little in terms of its contents’ structure: a title, a URL, words, phrases, images.
  • Search can be done in terms of keywords or phrases (KEYWORD=“engineer”).
  • Pages are ranked. One of the factors in getting a higher rank is having many links point to the page.
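To make the points above concrete, here is a toy sketch in Python. It is not how any real search engine is built: the pages, URLs, and the `search` function are made up for illustration, and "ranking" here is nothing more than counting inbound links.

```python
from collections import defaultdict

# Hypothetical "crawled" pages: URL -> (page text, links found on the page).
pages = {
    "a.example/jobs": ("senior software engineer wanted", ["b.example/team"]),
    "b.example/team": ("our engineer team builds search", []),
    "c.example/blog": ("notes on recruiting", ["b.example/team", "a.example/jobs"]),
}

# Inverted index: word -> set of URLs whose text contains that word.
index = defaultdict(set)
# Inbound link counts: a crude stand-in for link-based ranking.
inbound = defaultdict(int)

for url, (text, links) in pages.items():
    for word in text.lower().split():
        index[word].add(url)
    for target in links:
        inbound[target] += 1


def search(keyword):
    """Return URLs containing the keyword, most-linked-to pages first."""
    hits = index.get(keyword.lower(), set())
    return sorted(hits, key=lambda u: (-inbound[u], u))


print(search("engineer"))
```

Both matching pages come back, and the one with more links pointing to it is listed first. Notice that the search knows nothing about what "engineer" means on the page; it is only a word that appears there.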


Databases:

  • …are much smaller than the web.
  • Examples: Job boards, LinkedIn, your ATS.
  • Every record is structured and has predefined fields (as an example, name, title, company).
  • A database is indexed, so that search can be done by the fields in its records (TITLE=“engineer”).
  • Some databases (such as LinkedIn) expose part of the content to the surface web.
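A fielded search like TITLE=“engineer” can be sketched with Python's built-in `sqlite3` module. The table, the records, and the `search_by_title` helper are invented for this example; a real job board or ATS will have its own schema and search syntax.

```python
import sqlite3

# An in-memory database with predefined fields: name, title, company.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidates (name TEXT, title TEXT, company TEXT)")
conn.executemany(
    "INSERT INTO candidates VALUES (?, ?, ?)",
    [
        ("Ann Lee", "engineer", "Acme"),
        ("Bob Roy", "recruiter", "Engineer Staffing Inc"),
        ("Cy Dahl", "engineer", "Initech"),
    ],
)
# Index the title field so lookups by title do not scan every record.
conn.execute("CREATE INDEX idx_title ON candidates (title)")


def search_by_title(title):
    """Return the names of records whose TITLE field equals the given value."""
    rows = conn.execute(
        "SELECT name FROM candidates WHERE title = ? ORDER BY name", (title,)
    )
    return [row[0] for row in rows]


print(search_by_title("engineer"))
```

Bob Roy is not returned even though the word "Engineer" appears in his company name, because the search looks only at the title field. A plain keyword search on the web could not make that distinction.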

Boolean logic (AND, OR, NOT) lets us combine search terms and give instructions on how to search, both in databases and on the web. Search capabilities and syntax differ from place to place.
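One way to picture the three operators is as set operations on the results each term matches. The sketch below uses Python sets; the `matches` table and the record ids are made up, and no particular search engine or database is claimed to work exactly this way.

```python
# Hypothetical lookup: search term -> ids of the records that contain it.
matches = {
    "engineer": {1, 2, 3},
    "developer": {3, 4},
    "manager": {2, 5},
}


def ids(term):
    """Return the set of record ids matching a term (empty if unknown)."""
    return matches.get(term, set())


# engineer AND developer: records that match both terms.
both = ids("engineer") & ids("developer")

# engineer OR developer: records that match at least one term.
either = ids("engineer") | ids("developer")

# (engineer OR developer) NOT manager: remove records that match "manager".
no_managers = either - ids("manager")

print(both, either, no_managers)
```

AND narrows the results, OR widens them, and NOT removes matches. The syntax for writing these operators varies by site, but the logic is the same.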

Display of results:

  • Search engines at first show preliminary results and an estimated number of them.
  • If we go through the pages of results and select “show all results, omitted included”, the engine finalizes the list of results.
  • No engine will show more than 1,000 results.
  • Databases can show all results (some, like LinkedIn, charge extra to show more).

Food for thought: If we get a limited number of results from the web, parse them, and put them into our own database, we can search them again in a more powerful manner, by field instead of by keyword alone.
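Here is a minimal sketch of that idea in Python. It assumes, purely for illustration, that each web result title looks like "Name - Title - Company | LinkedIn"; real result titles vary, so the `parse` function and the sample data are hypothetical.

```python
import sqlite3

# Hypothetical result titles collected from a web search.
raw_titles = [
    "Ann Lee - Software Engineer - Acme | LinkedIn",
    "Bob Roy - Technical Recruiter - Initech | LinkedIn",
    "Cy Dahl - Data Engineer - Acme | LinkedIn",
]


def parse(title):
    """Split an assumed 'Name - Title - Company | Site' string into fields."""
    main = title.rsplit(" | ", 1)[0]  # drop the trailing site name
    name, job, company = [part.strip() for part in main.split(" - ", 2)]
    return name, job, company


# Load the parsed results into our own small database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, title TEXT, company TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)", [parse(t) for t in raw_titles]
)

# Now we can search by field: title contains "engineer" AND company is Acme.
rows = conn.execute(
    "SELECT name FROM people WHERE title LIKE ? AND company = ? ORDER BY name",
    ("%engineer%", "Acme"),
).fetchall()
names = [row[0] for row in rows]
print(names)
```

Once the results have fields, we can ask questions that the web search itself could not answer precisely, such as "engineers at one specific company".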