The Shallowest Deep Web Where Hackerranker Lives

booleanstrings | Boolean

Search engines like Google index pages on the Surface Web, i.e. (roughly speaking) pages that do not require a login.

Not all of those pages are indexed. Sites can tell Google to stay away from parts of them, using robots.txt files (which block crawling) or <meta> directives on individual pages (which block indexing). Even though you can view those pages in incognito, you won’t find them via X-Raying, i.e. with a site: search.
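To make the two mechanisms concrete, here is a minimal Python sketch that checks a page's HTML for a robots <meta> directive and checks a robots.txt file for a crawl block. It uses only the standard library. The helper names (has_noindex, crawl_allowed) and the example.com URLs in the usage are made up for illustration; this is not how Airtable or Hackerrank actually configure their pages.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        # "robots" addresses all crawlers; "googlebot" addresses Google only.
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives


def crawl_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """True if the given robots.txt text lets `agent` crawl `url`."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(agent, url)
```

For example, has_noindex('<meta name="robots" content="noindex">') returns True, and crawl_allowed("User-agent: *\nDisallow: /private/", "https://example.com/private/list") returns False. To inspect a live page, you would first download its HTML and the site's /robots.txt yourself and pass the text in.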

One example is Airtable, suggested by Alla Pavlova. There are multiple contact lists of laid-off people or conference attendees, yet site:airtable.com finds none.

Another example is Hackerrank, covered by Balazs. (In the case of Hackerrank, they have introduced those <meta> directives on every profile.)

These pages are not as “deeply” hidden as some other parts of the web, and you can sort of X-Ray for them. There are two ways to discover them.

1. Search on Social Networks. (No directives exist in HTML preventing links from being shared!)

Play some keyword games to eliminate irrelevant pages from Airtable (such as documentation).

You can continue with sites like Reddit, Discord, etc. “Closed networks” often hide their users’ names and professional backgrounds, but most of the popular content posted there is “shared information.”

Note, though, that you can search for the words surrounding a link to a page but not for the page’s content. So the search needs to include “description” or “comment” keywords that say what the link is about: contact lists, attendees, events, directories, etc. In this way, it is similar to intitle: and inanchor: searches. It also invites “natural language” searching, since the text around a link is likely someone’s comment on the document in addition to its title and URL.

2. Apply the same logic to Google search. Look for pages with links to sites like Airtable or Hackerrank – accordingly, with words describing those links.

Example: “airtable.com” -site:airtable.com tech layoffs.
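If you run this kind of search for several sites or keyword sets, a small helper can assemble the query strings. The sketch below is only an illustration: the function names are made up, and the URL format assumes Google's familiar https://www.google.com/search?q=... pattern.

```python
from urllib.parse import urlencode


def link_mention_query(domain: str, keywords: str) -> str:
    """Query for pages that mention `domain` but are not hosted on it."""
    return f'"{domain}" -site:{domain} {keywords}'


def google_search_url(query: str) -> str:
    """Turn a query string into a Google search URL (assumed URL pattern)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = link_mention_query("airtable.com", "tech layoffs")
    print(query)
    print(google_search_url(query))
```

Swapping in "hackerrank.com" with words that describe the profiles you are after produces the equivalent search for Hackerrank.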

These discoverable and viewable sites are “the Shallow Part of the Deep Web.”

For an update on Google’s search algorithm, check out the brand-new class on Thursday, November 17th: Search Is No Longer Boolean.
