Social Networks want to be found, so they make some information, most notably profiles, public and visible to search engines. At the same time, they want members to join and sometimes to pay for search. They also worry about their members’ data privacy. Each site has to strike a balance: which pages, and how much information, to let Google find. It is not uncommon for a social site to hide information that was previously public.
So the Surface Web is not only expanding but shrinking too!
Meetup.com has recently “lost” public profiles. Github.com used to show email addresses publicly, then made them visible only to logged-in users.
One more surface data loss has just happened. A few days ago, my friend and colleague Karen Azulai messaged me that Google had stopped showing Github’s “repositories” profile tabs in results. That is unfortunate for us because that tab lists the programming languages. We could X-Ray, say, for a combination of Java, Scala, and Python by using the template site:github.com inurl:tab=repositories Java Scala Python. But now, even the open-ended site:github.com inurl:tab=repositories Germany produces only a handful of results.
The cause appears to be in the file with the crawling rules, https://github.com/robots.txt, where you can find a string
which is responsible for the now-hidden pages.
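If you want to check a site's crawling rules yourself, a small script can pull out the Allow and Disallow lines that mention a given URL fragment. The following is only a minimal sketch: the function name matching_rules is my own, and the sample text is made up for illustration and is not Github's actual robots.txt.

```python
def matching_rules(robots_text, needle):
    """Return (directive, path) pairs whose path contains `needle`."""
    rules = []
    for raw in robots_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        directive, value = line.split(":", 1)
        directive = directive.strip().lower()
        value = value.strip()
        # Only Allow/Disallow lines describe crawlable paths.
        if directive in ("allow", "disallow") and needle in value:
            rules.append((directive, value))
    return rules


# Made-up sample for illustration, NOT the contents of any real robots.txt.
sample = """\
User-agent: *
Disallow: /private/
Disallow: /*?view=archive   # hypothetical rule
Allow: /public/
"""

print(matching_rules(sample, "view="))
```

To run it against a real site, download that site's robots.txt as text and pass it in place of the sample, with the URL fragment you care about as the second argument.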
There are two ways around the challenge.
1. One is to omit the “tab” piece. Since the “top” repositories, along with their languages, also appear on the main profile, it works reasonably well to search just for
site:github.com “sign in to view email” Java Scala Python
2. The other method is a curious one. It turns out that Custom Search Engines (CSEs) still find results that Google no longer does, at least for now. Try this in a CSE:
site:github.com inurl:tab=repositories Java Scala Python
Right now, as the screenshot shows, the query returns 8 results on Google and 23 on a CSE:
It is funny that CSEs “remember” cached pages longer than Google does (as the screenshot shows). But these results are going away, so the first method is the one to rely on.
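If you run these searches for many language combinations, the two templates can be assembled with a few lines of code. This is a rough sketch, not a tool we use: the helper names xray_query and google_url are made up for illustration, and the URL simply uses Google's standard q search parameter.

```python
from urllib.parse import urlencode


def xray_query(site, keywords, phrase=None, inurl=None):
    """Build an X-Ray search string: site: operator, optional inurl: and quoted phrase, then keywords."""
    parts = [f"site:{site}"]
    if inurl:
        parts.append(f"inurl:{inurl}")
    if phrase:
        parts.append(f'"{phrase}"')
    parts.extend(keywords)
    return " ".join(parts)


def google_url(query):
    """Turn a query string into a Google search URL (URL-encoded)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


languages = ["Java", "Scala", "Python"]

# Method one: search main profiles, skipping the "tab" piece.
method_one = xray_query("github.com", languages, phrase="sign in to view email")

# Method two: the original template, to be pasted into a CSE.
method_two = xray_query("github.com", languages, inurl="tab=repositories")

print(method_one)
print(method_two)
print(google_url(method_one))
```

Swap in any other list of languages or keywords to produce the matching search strings.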
Finding new approaches when “surface” data leaves us is a useful skill for anyone who searches on the web. Check out our OSINT-themed webinar Advanced Google and LinkedIn for #OSINT Research coming up this week. It is going to be packed with useful tips.