Search engines like Google index pages on the Surface Web, i.e. (roughly speaking) pages that do not require a login.
Not all of those pages are indexed, though. Sites can tell Google not to crawl or index parts of them, using robots.txt files or <meta> robots directives (such as noindex) on individual pages. Even though you can view those pages in an incognito window, you won’t find them via X-Raying.
Another example is Hackerrank, covered by Balazs. (In the case of Hackerrank, they have introduced those <meta> directives on every profile.)
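To see both mechanisms in action, here is a minimal Python sketch using only the standard library. The helper names (`is_crawl_allowed`, `is_noindex`) and the sample robots.txt and HTML are made up for illustration; they are not copied from Hackerrank or any real site.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def is_noindex(html: str) -> bool:
    """True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives


def is_crawl_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """True if the given robots.txt text lets `agent` crawl `url`."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(agent, url)


# Hypothetical samples, for illustration only.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
SAMPLE_PROFILE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'

print(is_crawl_allowed(SAMPLE_ROBOTS, "https://example.com/private/list"))
print(is_noindex(SAMPLE_PROFILE))
```

A page can be perfectly viewable in the browser and still fail one of these checks, which is exactly why it will not show up in an X-Ray search.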
These pages are not as “deeply” hidden as some other parts of the web, and you can sort of X-Ray for them. There are two ways to discover these pages.
1. Search on Social Networks. (No directives exist in HTML preventing links from being shared!)
- site:linkedin.com/posts “airtable.com” – I don’t think there is a way to uncover these links using a LinkedIn post search, but Google X-Ray helps.
Add or exclude keywords to filter out irrelevant pages from Airtable (such as documentation).
You can continue with sites like Reddit, Discord, etc. “Closed networks” often hide their users’ names and professional backgrounds, but most of the popular content posted there is “shared information.”
Note, though, that you can search only for the words surrounding a link to a page, not for the page’s own content. So the search needs to include those “description” or “comment” keywords that say what the link is about: contact lists, attendees, events, directories, etc. In this way, it is similar to intitle: and inanchor: searches. It invites searching with “natural language,” since the post likely contains someone’s comments on the document in addition to its title and URL.
2. Apply the same logic to Google search. Look for pages with links to sites like Airtable or Hackerrank – accordingly, with words describing those links.
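Both approaches boil down to the same query pattern: the target domain in quotes, some descriptive keywords, and optionally a site: restriction and excluded terms. Here is a rough Python sketch of a query builder; the function name `xray_query` and its parameters are hypothetical, and the keywords are just the examples from this post.

```python
def quote(term: str) -> str:
    """Wrap a term in double quotes for exact-match search."""
    return f'"{term}"'


def xray_query(target_domain, keywords=(), site=None, exclude=()):
    """Build a Google search string for pages that mention links to target_domain.

    site     -- optional site: restriction (approach 1, e.g. a social network)
    keywords -- words describing the link; combined with OR
    exclude  -- terms to filter out (e.g. documentation pages)
    """
    parts = []
    if site:
        parts.append(f"site:{site}")
    parts.append(quote(target_domain))
    if len(keywords) == 1:
        parts.append(quote(keywords[0]))
    elif len(keywords) > 1:
        parts.append("(" + " OR ".join(quote(k) for k in keywords) + ")")
    parts.extend(f"-{quote(term)}" for term in exclude)
    return " ".join(parts)


# Approach 1: X-Ray a social network for shared links.
print(xray_query("airtable.com", site="linkedin.com/posts"))

# Approach 2: search Google at large, with words describing the link.
print(xray_query("airtable.com",
                 keywords=["attendees", "directory"],
                 exclude=["documentation"]))
```

Swap in other domains and keyword sets to try variations quickly; the resulting strings can be pasted straight into Google.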
These discoverable and viewable sites are “the Shallow Part of the Deep Web.”
For an update on the Google search algorithm, check out the brand-new class on Thursday, November 17th,