GitHub is Paradise But Its Syntax is Jungle

booleanstrings · Boolean

Technical Recruiters cannot afford to stay away from GitHub because of its rich data about Software Developers. It is a Tech Recruiter's Paradise. But GitHub user search is more of a “Jungle.” Its syntax is incredibly complex. The documentation is helpful, yet it also contains several errors and omissions, which I will outline.

Here is an awesome list of search operators (which GitHub calls “qualifiers”) by my friends Sofia Broberger and Suzanna Frazier:

https://bit.ly/LUSOG.

Now, let us proceed to discuss

How GitHub User Search Works

If you are just getting started, practice with the advanced search dialog – it will create the right search strings for you. But it is restrictive (same as on search engines or Indeed).

Keywords

The docs say that keywords find users by name and username. But they also find things you see on the profile under the image – bio, site, X, and company. Locations and languages will not be found.

Keywords support “normal” Boolean logic operators – AND, OR, NOT.

Partial Keywords

Partial words will be found. (I haven’t seen this explained.) Example:

googl language:C

Operators

The most useful for sourcing are the operators language: and location:.

  • language: takes standard values
  • location: can be any text.

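Because keywords do not match the location field (see “Keywords” above), a search with location: misses users who mention a place only in their bio or company. To catch them, run a second search with the place as a keyword and the location: matches excluded, like chicago -location:chicago.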
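As a minimal sketch of that two-search routine, the Python helper below produces both search strings for a place. The function name location_queries and its parameters are hypothetical, and it assumes a single-word place name.

```python
def location_queries(place, extra=""):
    """Return two search strings: one for the location field, one for keywords.

    Hypothetical helper; assumes `place` is a single word such as "chicago".
    """
    # Search 1: users whose location field contains the place.
    field_query = f"location:{place} {extra}".strip()
    # Search 2: users who mention the place elsewhere on the profile,
    # minus the users already returned by search 1.
    keyword_query = f"{place} -location:{place} {extra}".strip()
    return [field_query, keyword_query]


for query in location_queries("chicago", "language:python"):
    print(query)
```

Running the two strings separately keeps the result lists from overlapping, which makes them easier to merge afterwards.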

If you search for a language that is not on GitHub's list, GitHub will ignore it: language:lisp language:nonexistent is the same as language:lisp. (Make sure you do not get caught here.)
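One way to avoid getting caught is to check language names before building the search string. The Python sketch below does that against a small set called KNOWN_LANGUAGES, which is purely illustrative and is not GitHub's actual list; the function name is hypothetical too.

```python
# Illustrative subset only. This is NOT GitHub's real language list.
KNOWN_LANGUAGES = {"c", "java", "lisp", "python"}


def language_qualifiers(languages):
    """Split requested languages into a usable search string and rejected names."""
    kept, dropped = [], []
    for name in languages:
        if name.lower() in KNOWN_LANGUAGES:
            kept.append(name)
        else:
            # GitHub would silently ignore this one, so report it instead.
            dropped.append(name)
    query = " ".join(f"language:{name}" for name in kept)
    return query, dropped


query, dropped = language_qualifiers(["lisp", "nonexistent"])
print(query)
print("ignored by GitHub:", dropped)
```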

The operator location: does work with accented and non-Latin characters, which is important for global Recruiters. Example: location:Київ.

Special characters inside an argument for location: or language: serve as dividers and give you a Boolean AND, as in location:NYC*SF. They are ignored at the beginning and end of the argument.
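The divider behavior can be modeled in a few lines of Python. This is only a sketch of the behavior as observed, not GitHub's implementation. It assumes that "special character" means anything that is not a letter, digit, or underscore, and that terms must match whole words; both function names are hypothetical.

```python
import re


def location_terms(argument):
    """Split a location: argument into terms on runs of special characters."""
    # \W+ matches one or more non-word characters. Unicode letters such as
    # those in "Київ" count as word characters, so they are kept intact.
    # Empty strings from leading or trailing dividers are dropped.
    return [term.lower() for term in re.split(r"\W+", argument) if term]


def location_matches(argument, profile_location):
    """Model the Boolean AND: every term must appear as a whole word."""
    present = set(location_terms(profile_location))
    return all(term in present for term in location_terms(argument))


print(location_terms("*NYC*SF*"))
print(location_matches("NYC*SF", "SF / NYC"))
print(location_matches("NY", "NYC"))
```

Under these assumptions the order of the terms does not matter, and a partial argument such as NY does not match NYC.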

Partial operator arguments will not be found.

Knowing that any user can only be found by one “main” language is essential.

With the operators, the Boolean OR is the default. (Incorrect in the docs.) NOT, for a change, is written as minus.
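To make the one-main-language rule and the default OR concrete, here is a toy Python model. It is a sketch of the behavior described in this post, not of GitHub's code. The Profile class, its fields, and the sample logins are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    login: str          # hypothetical sample data
    main_language: str  # the single language a user is findable by


def language_match(profile, include, exclude=()):
    """Several language: values act as OR; a minus prefix acts as NOT."""
    language = profile.main_language.lower()
    if language in {name.lower() for name in exclude}:
        return False
    return language in {name.lower() for name in include}


users = [Profile("ada", "Python"), Profile("linus", "C"), Profile("grace", "Lisp")]

# Models the search string: language:python language:c
found = [u.login for u in users if language_match(u, ["python", "c"])]
print(found)
```

In this model, a developer who writes plenty of Python but whose main language is C will only come up under language:c.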

The other two operators search for the numbers of:

  • followers:
  • repos: (repositories).

The number format accommodates a “numrange” like 2..5, as well as the comparisons > and <. Example: followers:>1000 repos:>6 language:C. You can also write followers:=<3 or followers:<=3 (but not followers:=3).
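If you build these strings often, small helpers keep the number formats straight. The Python sketch below covers the three formats mentioned above; the helper names between, more_than, and at_most are hypothetical.

```python
def between(name, low, high):
    """Range format, for example followers:2..5."""
    return f"{name}:{low}..{high}"


def more_than(name, number):
    """Greater-than format, for example followers:>1000."""
    return f"{name}:>{number}"


def at_most(name, number):
    """Less-than-or-equal format, for example followers:<=3."""
    return f"{name}:<={number}"


parts = [more_than("followers", 1000), more_than("repos", 6), "language:C"]
print(" ".join(parts))
print(between("followers", 2, 5))
```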

A drop-down on GitHub search allows you to sort results by “best match,” followers, repositories, and join date. (Interestingly, the GitHub user search API offers the same sorting through a sort parameter that takes the values followers, repositories, and joined.)
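For those who want to try the API route, the Python sketch below builds a request URL for the user search endpoint with sorting. It only constructs the URL and does not send a request. The helper name is hypothetical, and the endpoint and its q, sort, and order parameters are given here as I understand the GitHub REST API, so check the API reference before relying on them.

```python
from urllib.parse import urlencode

# Sort values assumed to be accepted by the user search endpoint.
SORT_VALUES = {"followers", "repositories", "joined"}


def api_user_search_url(query, sort=None, order="desc"):
    """Build a URL for the user search API (hypothetical helper)."""
    params = {"q": query}
    if sort is not None:
        if sort not in SORT_VALUES:
            raise ValueError(f"unsupported sort value: {sort}")
        params["sort"] = sort
        params["order"] = order
    # urlencode escapes the colon and the > sign and turns spaces into +.
    return "https://api.github.com/search/users?" + urlencode(params)


print(api_user_search_url("language:C followers:>1000", sort="followers"))
```

Leaving sort out corresponds to the “best match” order.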

Phrases

Phrases in keywords should be in quotation marks. (Nice that they didn’t do anything unusual here 😉 ) Example: "NYC SF".

However, spaces between quoted words under location: work as a Boolean AND! Example: location:"francisco san". I haven’t seen this documented.

Parentheses

Parentheses are ignored. OR is executed first, then AND (same as on Google). I haven’t seen this documented.
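The effect of ignored parentheses is easy to underestimate, so here is a toy Python evaluator that follows the rule as described: drop the parentheses, group on AND, and treat OR inside each group as "any of these." It is a model of the observed behavior, not GitHub's parser. The function name is hypothetical, and the sketch only handles terms joined by explicit AND or OR, with no NOT.

```python
def keyword_match(query, profile_words):
    """Evaluate a keyword query with parentheses ignored and OR binding first."""
    tokens = query.replace("(", " ").replace(")", " ").split()
    groups, current = [], []
    for token in tokens:
        if token == "AND":
            # AND closes the current OR-group and starts a new one.
            groups.append(current)
            current = []
        elif token != "OR":
            current.append(token.lower())
    groups.append(current)
    # Every AND-group needs at least one of its OR-terms on the profile.
    return all(any(term in profile_words for term in group) for group in groups)


query = "(java AND python) OR go"
print(keyword_match(query, {"go"}))
print(keyword_match(query, {"java", "go"}))
```

With the parentheses honored, a profile that mentions only go would match this query. Under the rule above the query reads as java AND (python OR go), so that profile is not found.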

The Good News

The good news is that, in practice, language: and location: “cover a large territory” and are often sufficient for collecting a sizeable promising user list to investigate further. Keep the search syntax subtleties covered above in mind.

Please join us for a deep dive into GitHub sourcing –

Leveraging GitHub: Advanced Data Mining and User Profiling

on August 30th, 2023, Wednesday.

Comments 1

  1. Fascinating, thanks Irina. I agree that the basic parameters of language, followers, repos, and location will cover many profiles. I am also not sure that exploring deeper and playing with more advanced operators will be worth the effort and time spent.
