Source Code Search Engines

booleanstrings Boolean

[Screenshot: softwarecode]

As a follow-up to a previous post, “Sourcing Developers in [software] Source Code,” I’d like to go over some alternatives to “plain” X-Raying for searching open source code.

Google used to offer an advanced search that “understood” regular expressions for https://code.google.com/, but it was shut down in 2012. At this time, the most advanced search you can do there is to search the Google-hosted code using “labels,” which shows 100 results at a time; as an example, search for python. The Chromium project source code still has the advanced search ability, but it has only so many developers involved.

The good news is that there are multiple sites that offer open source code search across hosting platforms. You can select the programming language on all of those code search engines. As we have seen in the previous post (and in the screenshot above), code authors may leave “signatures” in the code that allow us to locate them.
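To make the idea of a “signature” concrete, here is a minimal sketch in Python of pulling email addresses out of a piece of source code you already have in hand. The function name, the sample text, and the deliberately simple regular expression are all my own assumptions for illustration; none of this is how any of the search engines below work internally.

```python
import re

# A deliberately simple pattern (an assumption for this sketch); it will miss
# some valid addresses and is not a full email validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_signatures(source_text):
    """Return unique email addresses found in source code, in order of appearance."""
    seen = []
    for match in EMAIL_RE.findall(source_text):
        email = match.lower()
        if email not in seen:
            seen.append(email)
    return seen


# Hypothetical file contents, made up for the example.
sample = """
// Copyright (c) Example Corp.
// Author: Jane Doe <jane.doe@example.com>
int main() { return 0; }  // maintained by jane.doe@example.com
"""

print(find_signatures(sample))
```

The address appears twice in the sample but is reported once, which is usually what you want when building a list of people rather than a list of files.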

  • Open HUB (formerly Ohloh, formerly Koders.com) searches an impressive 20,000,000,000+ lines of code; unfortunately, it does so rather slowly and with a rather limited search syntax. You can also search its 60K+ registered members and look at the most popular contributors for each of the 40+ supported programming languages.

[Screenshot: krugle]

  • SearchCode is another code search engine that has indexed a significant amount of code. It displays the sources it searches on the front page, and you can pick and choose from them. Here is a search similar to the above: C++ code with @samsung.com email addresses. The site is run by a single (super-)Developer, Ben Boyter.

(I must admit that I am not certain about the usage of the special character @ for searching here. The results do have email addresses – and that’s what I am after. I will leave it for the reader to figure out how special characters are processed.)

Searching for contact info in source code is a rather unusual way to locate Developers. While we can’t know the location of the email owners from the code alone, cross-referencing is quick and will identify people at certain locations, for whom we’d already know the programming language and possibly the employer (depending on how we search). As an example, a similar Krugle search for google.com email addresses AND the C++ programming language reveals the LinkedIn profiles of 6 solid Software Developers in the San Francisco Bay Area, 5 of whom currently work for Google, and we can now email them directly.
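If you have saved some of the matching files locally, the “email domain AND programming language” filter can be approximated offline before you start cross-referencing. The sketch below is only an illustration under my own assumptions: the function name, the list of C++ file extensions, the example.com domain, and the sample files are all hypothetical, and addresses on subdomains are intentionally not matched.

```python
import re
from pathlib import PurePosixPath

# Assumed set of extensions that we treat as "C++" for this sketch.
CPP_EXTENSIONS = {".cc", ".cpp", ".cxx", ".h", ".hpp"}


def emails_for(files, domain, extensions=CPP_EXTENSIONS):
    """Map each address at `domain` to the sorted list of matching files it appears in.

    `files` is an iterable of (path, text) pairs.
    """
    # The lookahead rejects longer domains such as example.com.au while still
    # allowing a sentence-ending period or comma right after the address.
    pattern = re.compile(
        r"[A-Za-z0-9._%+-]+@" + re.escape(domain) + r"(?![A-Za-z0-9-]|\.[A-Za-z0-9])",
        re.IGNORECASE,
    )
    found = {}
    for path, text in files:
        if PurePosixPath(path).suffix.lower() not in extensions:
            continue  # skip files in other languages
        for email in pattern.findall(text):
            found.setdefault(email.lower(), set()).add(path)
    return {email: sorted(paths) for email, paths in sorted(found.items())}


# Hypothetical files, made up for the example.
files = [
    ("src/net/socket.cc", "// Author: alice@example.com"),
    ("src/net/socket.h", "// alice@example.com, bob@example.com.au"),
    ("tools/build.py", "# carol@example.com"),
]

for email, paths in emails_for(files, "example.com").items():
    print(email, paths)
```

The resulting list of unique addresses, each with the files it came from, is what you would then cross-reference against profiles to find locations and current employers.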

The same set of email addresses (@google.com with C++ skills) reveals a good number of Google+ profiles:

[Screenshot: plus-profiles]

I have yet to figure out how to quickly narrow these down to a specific location (for companies that have multiple offices, such as Google). When I do, I will let you know!