How to Search for Bios on Github

booleanstrings Boolean Leave a Comment

 

Just like we can’t search for LinkedIn headlines within LinkedIn or by X-Raying, we can’t search for GitHub Bios – either within GitHub or by X-Raying. However, we can search for LinkedIn headlines with Custom Search Engines (CSEs). It turns out that we can similarly search for GitHub Bios with CSEs!

We will be searching using the GitHub X-Ray CSE. I will start by providing sample search strings that look within Bios, then give some explanations.

Here you go. “GitHub Bio contains”:

You can change the arguments, add keywords, and combine them with other Google and Custom Search Engine operators specific to GitHub. As you may have noticed, you can use the asterisk * for ANDs and the comma , for ORs in the special operators.

You Can Stop Reading Now and Go Enjoy the Searches 🔎🔎🔎

But wait, I also want to tell you that our tool Social List uses CSE operators in the background, and you won’t need to write any operators – just enter your terms and collect results. Here is what a search looks like:

Check it out if you haven’t.

Now, if you are wondering how I came up with the horrible-looking operator more:p:metatags-og_description: (and what is behind the search algorithm in Social List), read on.

CSE – Special Advanced Syntax

Special CSE operators depend on the website and structure of its pages. More specifically, operators depend on what Schema.org, Microformats, and other objects and values are (invisibly) included in the pages’ source code.

The general CSE search operator format is this:

more:pagemap:<data-field-name>:<data-value>:<value>

where data-field-name is an object like Person, data-value is a value, such as “org” (i.e. organization, a Person’s employer), and value is a string like “IBM”. A query in this format finds pages containing the object Person with a matching “org” value.

Alternative syntax uses just p instead of pagemap:

more:p:<data-field-name>:<data-value>:<value>

Google.com doesn’t “understand” the more:… search syntax, but any Google Custom Search Engine does.

Objects and Values to Query

Objects (like Person of schema.org) and values (like employer=“IBM”) are invisibly included in web pages’ source code, in a part called “PageMap”. The big deal is that you can search within objects and their values using CSE operators. PageMap includes data following a variety of standards (Schema.org, Microformats, and others), and also a part called “Metatags”.

In our particular case, a GitHub Bio is stored in Metatags under the tag “og:description” (and is also duplicated under “twitter:description”). I found it by examining the JSON output from a CSE API call:

"metatags": [
  {
    "viewport": "width=device-width",
    "fb:app_id": "1401488693436528",
    "twitter:image:src": "https://avatars1.githubusercontent.com/u/447033?s=400&v=4",
    "twitter:site": "@github",
    "twitter:card": "summary",
    "twitter:title": "garris – Overview",
    "twitter:description": "Works at LinkedIn. Lives in Berkeley. Likes a nice hike. – garris",
    "og:image": "https://avatars1.githubusercontent.com/u/447033?s=400&v=4",
    "og:site_name": "GitHub",
    "og:type": "profile",
    "og:title": "garris – Overview",
    "og:url": "https://github.com/garris",
    "og:description": "Works at LinkedIn. Lives in Berkeley. Likes a nice hike. – garris",
    "profile:username": "garris",
    "pjax-timeout": "1000",
    "request-id": "895E:41D8:7B30:E042:5D3F23D9",
    "octolytics-host": "collector.githubapp.com",
    "octolytics-app-id": "github",
    "octolytics-event-url": "https://collector.githubapp.com/github-external/browser_event",
    "octolytics-dimension-request_id": "895E:41D8:7B30:E042:5D3F23D9",
    "octolytics-dimension-region_edge": "iad",
    "octolytics-dimension-region_render": "iad",
    "analytics-location": "/\u003cuser-name\u003e",
    "google-analytics": "UA-3769691-2",
    "dimension1": "Logged Out",
    "hostname": "github.com",
    "expected-hostname": "github.com",
    "js-proxy-site-detection-payload": "MTUyMTUyNGE4ODJhNTRkMmFkZGU3NjFlOTA5ZTllNTNmZDg1NzZmN2UwZTM1YzhlOWQ5YjAxNGEyZTBhMDk0Ynx7InJlbW90ZV9hZGRyZXNzIjoiNjYuMjQ5LjY2LjIxNiIsInJlcXVlc3RfaWQiOiI4OTVFOjQxRDg6N0IzMDpFMDQyOjVEM0YyM0Q5IiwidGltZXN0YW1wIjoxNTY0NDE5MDM1LCJob3N0IjoiZ2l0aHViLmNvbSJ9",
    "enabled-features": "MARKETPLACE_FEATURED_BLOG_POSTS,MARKETPLACE_INVOICED_BILLING,MARKETPLACE_SOCIAL_PROOF_CUSTOMERS,MARKETPLACE_TRENDING_SOCIAL_PROOF,MARKETPLACE_RECOMMENDATIONS,MARKETPLACE_PENDING_INSTALLATIONS",
    "html-safe-nonce": "1ef7c04a79f7c74d7ed950ed690d277292296f65",
    "browser-stats-url": "https://api.github.com/_private/browser/stats",
    "browser-errors-url": "https://api.github.com/_private/browser/errors",
    "theme-color": "#1e2327"
  }
]
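
To make the link between a metatag name and the operator concrete, here is a minimal Python sketch (not the code behind any tool mentioned in this post). The function name metatag_operator is made up for illustration. The colon-to-underscore conversion is inferred from the tag “og:description” showing up in the operator as og_description, and the * and , joiners are the AND and OR forms mentioned earlier. The sketch assumes single-word search terms.

```python
def metatag_operator(tag, terms, match="all"):
    """Assemble a CSE operator that searches inside one metatag.

    tag   -- metatag name as it appears in the JSON, e.g. "og:description"
    terms -- single-word search terms (an assumption of this sketch)
    match -- "all" joins terms with * (AND), "any" joins them with , (OR)
    """
    if match not in ("all", "any"):
        raise ValueError("match must be 'all' or 'any'")
    joiner = "*" if match == "all" else ","
    # Inferred from the post: "og:description" becomes "og_description".
    field = tag.replace(":", "_")
    return f"more:p:metatags-{field}:{joiner.join(terms)}"


print(metatag_operator("og:description", ["python", "berkeley"]))
print(metatag_operator("og:description", ["python", "java"], match="any"))
```

The resulting strings are meant to be pasted into a CSE search box, not into Google.com.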

One last step and you will catch up with me on the subject. I am going to tell you how I obtained the JSON sample pasted above.

Running CSE API Calls

The APIs query CSEs from software code. It’s also possible to run an API call from your browser address bar.

Using the APIs requires obtaining a KEY (long coded string) from Google, available here. Input for an API call is a KEY, a CSE ID (a value you can copy from the Control Panel), a query string, and (optional) parameters.

You can run an API query from your browser in the following fashion:

https://www.googleapis.com/customsearch/v1?key=KEY&cx=CSEID&q=a

– it will look like this:

An API call produces a JSON-formatted output page that you can browse to figure out the operator formats.

While you can examine a page’s structure with various tools (including CSEs themselves), these API JSON outputs provide “the” most accurate information for assembling CSE operators.
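
For those who prefer code to the address bar, the same steps can be sketched in Python. This is an illustration under stated assumptions: KEY and CSEID are placeholders, the helper names are hypothetical, and sample_response is a hand-trimmed stand-in for a real reply (a real reply nests each result's metatags under items, then pagemap, then metatags). The sketch makes no network call; it only builds the URL and reads the structure.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_api_url(key, cse_id, query):
    """Return the API URL for a key, a CSE ID, and a query string."""
    return API_ENDPOINT + "?" + urlencode({"key": key, "cx": cse_id, "q": query})


def metatag_values(response, tag="og:description"):
    """Collect the values of one metatag across all results in a response."""
    values = []
    for item in response.get("items", []):
        for tags in item.get("pagemap", {}).get("metatags", []):
            if tag in tags:
                values.append(tags[tag])
    return values


# Hand-trimmed stand-in for an API reply, using values from the sample above.
sample_response = json.loads("""
{
  "items": [
    {
      "link": "https://github.com/garris",
      "pagemap": {
        "metatags": [
          {
            "og:url": "https://github.com/garris",
            "og:description": "Works at LinkedIn. Lives in Berkeley. Likes a nice hike. – garris"
          }
        ]
      }
    }
  ]
}
""")

print(build_api_url("KEY", "CSEID", "a"))
print(metatag_values(sample_response))
```

To run it against a live CSE, fetch the URL with any HTTP client and pass the parsed JSON to metatag_values.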

Final Word

Querying structured info on the web is incredibly powerful. It may seem “too technical”, but that is mostly due to odd-looking strings of parameters that create that impression. (But you don’t need to “read” parameters, you just need to copy and paste.) Maybe one day, Google (or someone) will attach a friendly UI to Google CSEs’ structured web search. In the meantime, follow the links to search in Github Bios and definitely try Social List.

 

Hack: Use 500 Keywords, Not 32, on Google


Google’s limit of keywords is 32. It’s a challenge for long OR searches, especially for diversity sourcing – for example, searching for women’s first names, Latino last names, or diversity colleges. I am no fan of long ORs on Google (definitely not to list synonyms for a word), but in cases like the above, or searching for target companies or locations, I admit, long ORs would be useful, and 32 is limiting.

I have recently thought of a way to push the number of search terms much further. You can do it via the Synonyms feature of Google Custom Search Engines (CSEs).

Google and CSEs will automatically search for synonyms – it is a “built-in” feature. However, if you want to identify related words that may not quite be considered synonyms, the Synonyms mechanism in CSEs allows that. The limit is 500 terms and 10 synonyms for each term. Synonyms can be defined in a special XML file and uploaded. Keeping synonyms in a file and editing with a simple tool like Notepad++ seems more convenient than editing online.

You can define legitimate synonyms (for example, CV = “curriculum vitae” = resume) to help yourself and your end-users. But nobody will check whether the synonyms you enter are “correct”. You may want to play with the setting, defining words with different meanings as synonyms and see what happens. (Define “top sourcer” as <your name>? Just kidding.)
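
If you keep synonyms in a file, generating it from a list can save typing. Below is a rough Python sketch. The element names (Synonyms, Synonym with a term attribute, Variant) are how I recall Google's synonyms file format, so please verify them against the current CSE documentation before uploading. The function name synonyms_xml is hypothetical, and the sketch does not enforce the limits quoted above.

```python
import xml.etree.ElementTree as ET


def synonyms_xml(mapping):
    """Turn {term: [variant, ...]} into a synonyms XML string.

    Element and attribute names are assumed from memory of Google's
    documented format; check them before uploading the file.
    """
    root = ET.Element("Synonyms")
    for term, variants in mapping.items():
        synonym = ET.SubElement(root, "Synonym", term=term)
        for variant in variants:
            ET.SubElement(synonym, "Variant").text = variant
    return ET.tostring(root, encoding="unicode")


# The "legitimate synonyms" example from the text.
print(synonyms_xml({"cv": ["curriculum vitae", "resume"]}))
```

Save the output to a file and upload it in the CSE Synonyms setting.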

If you love long OR statements, you can enter up to 500 “synonyms” – some of which can be phrases – for an “artificial” keyword like mysynonyms, in the CSE Synonyms setting, and you will be able to push the limit of keywords from 32 to beyond 500!

By the way, David Galley and I have a CSE eBook in the works; stay tuned! In the meantime, check out our first two books, “300 Boolean Strings” and “Sourcing Hacks”.

Googling: Science or Art?


When Google started out, it had a database of indexed pages searchable by keywords and advanced search operators such as site:. Gradually, Google began adding semantic search features. (It has been reworking its storage, Index, accordingly, to contain “knowledge” type of data about stored pages). Here are the most significant semantic-oriented additions over the years.

  • A while ago, Google started searching for words with the same root (“auto-stemming”).
  • About five years ago, we also started seeing keyword synonyms in the results.
  • In recent years, Google has started showing featured previews and Knowledge Graph objects, in addition to search results. Pages containing structured information are rewarded by custom snippets (an example would be a page about a movie, shown along with a star rating).
  • In 2016, with the introduction of RankBrain, Google started to look at the query context.
  • A month ago, with the BERT update, Google started paying attention to short words that it previously discarded as stop words, to discover query meaning.
  • Both RankBrain and BERT are AI-based, so they will work better with time.
  • And here is something new and interesting – Google is taking personalized search a step further, having just filed a patent on building user graphs.

Science vs. Art

Google keeps its support for operators, giving us control over search results, and at the same time is greatly expanding semantic features, providing us with the most relevant results. If you use only operator-based search, put all your keywords in quotation marks, or run all searches in Verbatim mode, you are missing out on powerful semantic search capabilities. Running queries out of a saved spreadsheet or Boolean builder has never worked well, and now it will provide even more limited results than before.

Semantic search functionality is not just for the “simple-minded” user who Googles with a few keywords; it’s applicable to advanced research. The trick for best search experience (how do you like the expression? 😉) is to take advantage of both operator-controlled searches and Google’s interpretation of searches. (It’s not either-or – Google interprets all queries, including those containing operators).

As a metaphor, advanced Googling is like a combination of Science and Art, with the Art part continuing to grow. In practice, Googling needs both the left and right sides of your brain activated. Googling requires knowledge of search operators but also intuition, creativity, curiosity, resourcefulness, and natural intelligence. Everyone has these qualities; it’s a matter of putting them to work.

Looking at multiple examples of creative Googling, reproducing them, and modifying to fit your practice are the fastest ways to up your Google-Fu. You can do so as soon as next week by attending the Advanced Google Sourcing Workshop and asking follow-up support questions. Hope to “see” you there!

Only 20% of Queries Need to Be Boolean


Sourcing includes three types of search:

  1. Research – finding info on terminology, target companies, schools, job titles, locations, and industry news
  2. Search – finding professionals with promising backgrounds
  3. Cross-referencing – finding additional qualifying professional info and contact info.

“[1] Research” and “[3] Cross-referencing” rarely require complex searching. You can accomplish most of the tasks by Googling for a few keywords and using Chrome extensions.

“[2] Search” has a technical aspect, where you create complex Boolean AND-OR-NOT searches (on LinkedIn or a job board). Advanced Google operators are highly applicable as well. However, you can accomplish quite a bit without any “Boolean complexities”.

Here are some simple – non-technical, “non-Boolean” – approaches to these search tasks. (And there are many more!)

  • Have a short question? Google it. While you can’t Google a job description and expect to see anything useful, you can Google for sites where potential candidates might be present, for example:
  • Have a lead (a perfect candidate, perhaps someone who had declined an offer, or lives in the wrong place, or is already working in a similar role)? Google his or her name along with the company name or skill keywords. Also, Google the email address in quotation marks. You will find additional background and may find sites with other professionals “like this one”.
  • Search for qualifying phrases someone might have written such as “hired as managing director”. (Sometimes this is mistakenly called “Natural Language Search” – this term means asking your queries in English vs. some computer-oriented notation).

While complex Boolean search must remain part of any sourcing process at this time (please don’t believe that “AI is there” – it is not), you can do around 80% of searches without it. Join us at the Sourcing without Boolean webinar to learn all about masterful Googling without using operators and other techniques like the ones I described above.

 

Numrange, BERT, and Natural Intelligence


There are two significant recent developments in Google’s algorithm.

[1] Google’s Operator Numrange is back!

Numrange seems to be working. (Knock on wood!)

[2] BERT – Search Naturally

Google’s latest algorithm change, BERT, affects a serious 10% of all queries. Google is now paying attention to “insignificant”, short words that it had previously ignored as “stop words”. It is noticing words like “at”, “to”, “as”, “if” where they create meaning (try sf to nyc). With BERT, Google’s search is becoming even more semantic and less formal and database-like. (For those wanting a “database-like” search experience, Google keeps its Verbatim option.)

What BERT tells us is to search in a natural-language-like way, especially if we have a short question. For example, start a query with “what is”, “how many”, “top companies in”, “competitors of”, etc.

BERT (as part of other semantic-oriented changes) teaches us to be friends with working, evolving semantic search systems like Google’s. For better results on Google, search as simply as possible. It’s better to take advantage of machine-learned capabilities than to suppress search interpretation by using long OR strings or Boolean Builders. Of course, any serious practitioner will use advanced operators, but using ORs is outdated (I mean it).

There are no textbooks on writing useful Google queries. It’s someone’s “natural intelligence” that matters in developing the “search” skill.

Join us for a webinar “Sourcing with Natural Intelligence” on Tuesday, November 12th, where we’ll share the important thinking patterns and multiple concrete examples of this (most) productive Sourcing approach.

Knowledge Graph Objects in Google CSEs (True Semantic Search!)


 

As I was finishing the “Hacks” slides for my favorite conference, Sourcing Summit Europe, I stumbled across something I hadn’t seen before. Google Custom Search Engines (CSEs) got a new setting in the control panel:

We can now select Knowledge Graph Objects to restrict the search! I was intrigued; David Galley and I spent some time researching what the new option does.

We have been taking advantage of including Schema.org objects in CSEs since 2016. (See an example of the technology in this post, and we use it in Social List). The new setting is quite different.

While selecting Schema.org Objects relies on metadata placed there by web page creators, choosing a KG Object does not. Instead, it reflects Google’s “idea” of which KG Objects are relevant to the page. Therefore, the new option covers a much wider range of sites that can show up in the results. I say this is a true semantic search in a global search engine!

There is a just-announced API for selecting Knowledge Graph Objects, but there is not much explanation from Google of how it works. (It could be quite complicated in the back-end, and is undoubtedly updated continuously.) The best way to examine what happens when we use the setting is to create CSEs and see what shows up in the results. (Get your hands dirty and your mind clear 😉).

We can restrict to Schema.org or KG Objects only in Custom Search Engines, not on Google.com, and it’s time for Sourcers to start applying the technology. It may sound quite technical, but it’s not; everyone should be able to get KG Object searches going. Please come learn all about the new CSE capacity (and everything else about CSEs) at our webinar on October 15th, 2019.

Here are a couple of details on the KG selection, and four examples.

We can select up to five KG Objects, and the search will look for an OR of the five. However, we can search for an AND of two objects with the use of the CSE refinements.

Examples of CSEs that search for:

  1. Females –  example search: java developer san francisco
  2. Jobs – java developer san francisco
  3. Curriculum Vitae – java developer san francisco
  4. OSINT – maltego

Come join us at Become A Custom Search Engines Expert class coming up this week, for an in-depth look at all the options you can set in the CSEs. Seating is limited.

 

 

 

New: Google College Search


While Google has posted a blog about college search, they did not tell us how to get to that advanced college search dialog, which looks like this for me:

You can search through tons of useful info – Program, Location, Average cost, Tuition, Type, State, Acceptance rate, Size, and Campus setting.

The secret to getting to this dialog is adding this piece – &ibp=htl;splinter – to your search URL. This was my search.

The new capability doesn’t work outside of the US yet (you will see a “not supported” dialog if you are outside of the US) and shows US schools only. But I expect it will be expanded globally. If you want to use the university dialog and are located outside of the US, run one of the IP address changers such as Tunnelbear.

For other Sourcing Hacks, please check the second edition of our eBook!

Talent Pipeline Decline?


One of the few paid Sourcing tools I use is RPS (LinkedIn Recruiter, or LIR, for Agencies). It is our highest yearly expense, but we have been choosing to stay with it for a number of years now. My favorite feature of LIR has always been “Talent Pipeline” (the name doesn’t really fit), which is, in fact, the import function. As you import Excel files into LIR, the records are matched to LinkedIn profiles, and, as import finishes, you can search across the combined info. It’s a powerful Sourcing technique and has been my frequent go-to when Sourcing. The import function has also been quite reliable.

With the just-released new UI, we still have the import function, but LinkedIn has cut its features to the point where its usefulness is questionable to me. Import appears to be a much weaker function in the new version.

Here is a brief summary of the changes.

There is no longer a way to match imported fields with LinkedIn’s.

The file to import (it seems) must be CSV-formatted with exactly these columns:

  1. First name
  2. Last name
  3. E-mail address
  4. Phone number

We used to be able to have one column with first/last, giving us some flexibility. But what is worse, I don’t see a way to import any additional values (that used to go to Notes – and we could then search by them!) Import crashes if a column is added to the import file.

(Before the changes, I was thinking that import would become even more powerful if we could match imported columns with custom fields. Forget about that now!)

And what’s worst of all – at least in my experiments, LIR can import only so many records at once. I was able to import a file with 119 records, but anything over ~140 failed with an error message.

We used to be able to import up to 5K records at a time. Is this a bug? Unless this is fixed (or we find a workaround 😉 ), the Recruiter import feature, a.k.a. Talent Pipeline will have much less value for Sourcers.
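
One workaround idea is to split a long list into several small files. The Python sketch below is only that, an idea: I have not confirmed that importing several files in a row works as expected. The column names are the four listed above, the chunk size of 100 is an assumption chosen to stay under the 119 records that did import, and split_for_import is a hypothetical helper name. Extra fields are dropped, since an added column crashed the import.

```python
import csv
import io

# The exact columns the new import appears to require.
COLUMNS = ["First name", "Last name", "E-mail address", "Phone number"]


def split_for_import(records, chunk_size=100):
    """Split a list of dict records into CSV texts of at most chunk_size rows.

    Keys other than the four required columns are ignored;
    missing values are written as empty strings.
    """
    chunks = []
    for start in range(0, len(records), chunk_size):
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(records[start:start + chunk_size])
        chunks.append(buffer.getvalue())
    return chunks


# Made-up records for illustration only.
people = [
    {"First name": f"First{i}", "Last name": f"Last{i}",
     "E-mail address": f"person{i}@example.com", "Phone number": "",
     "Notes": "this extra field is dropped"}
    for i in range(250)
]
files = split_for_import(people)
print(len(files), "files to import")
```

Each returned string can be saved as its own .csv file and imported separately.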

Let’s stay optimistic and hope that we will get the full functionality back.

Don’t miss our Productivity Tools for Sourcing webinar on Wednesday, September 11th! We’ll be talking about today’s best tools for all aspects of your Sourcing life. (Most tools covered are free).

How to Restore Image Search Functions


Quite unfortunately, Google has just removed three useful options in its Image Search:

  1. Option “Face” from “Types”,
  2. Option “Photos” from “Types”, and
  3. An ability to enter an exact image size.

Here is an example of what it used to look like until just a few days ago. This Image search looks for “Faces”, size 200×200 – with the idea to find profile pictures on LinkedIn:

… and here is what we see now (with no ability to enter an exact image size either):

(Why did Google do this?)

Here is how to compensate for the losses.

1.-2. Solution:

You will get both the “Face” and “Photos” filters back by using the Google Advanced Image Search URL.

3. Solution:

You will get searching by an exact size back by using the search operator imagesize: (did you know about it?)

(Note that when you press “Enter”, the operator will disappear, but the search will be filtered).

By the way, there is another Google Image Search operator with similar behavior (disappearing after “Enter” is pressed) – and that is filetype:, followed by one of the Image filetypes – JPG, GIF, PNG, BMP, SVG, WEBP, ICO, or RAW. Interestingly, after you have used the filetype: operator, you will get an extra menu for filetypes:

All three lost features can also be accessed by altering the search URL. Let’s memorize what those URL parameters look like for image types, sizes, and colors, and keep these strings in case Google takes more options away from us in the menu.

  • Large images: &tbs=isz:l
  • Medium images: &tbs=isz:m
  • Icon sized images: &tbs=isz:i
  • Image sized exactly 200×200: &tbs=isz:ex,iszw:200,iszh:200
  • Images in full color: &tbs=ic:color
  • Images in black and white: &tbs=ic:gray
  • Images that are red: &tbs=ic:specific,isc:red (orange, yellow, green, teal, blue, purple, pink, white, gray, black, brown)
  • Image type Face: &tbs=itp:face
  • Image type Photo: &tbs=itp:photo
  • Image type Clipart: &tbs=itp:clipart
  • Image type Line drawing: &tbs=itp:lineart
  • Image type Gif: &tbs=itp:animated
  • Show image sizes in search results: &tbs=imgo:1
  • Search for filetypes: &as_filetype=png (will get you a new filetype menu as when searching by the operator)
  • X-Ray: &as_sitesearch=linkedin.com (will insert a string site:linkedin.com into search)
  • Localize to country: &cr=countryNZ (a two-letter country abbreviation goes at the end)

So here, we have learned how to restore each piece of the disappeared functionality – and also about additional rather “hidden” filters.

Don’t miss the Productivity Tools for Sourcing webinar on Wednesday, September 11th!

How to Do Executive Job Title Research


Often, especially when sourcing for executives, we need to answer questions like this:

What are possible job titles at a particular level of seniority, in a given industry (or at a company), with given functions?

Equipped with this intelligence, we can start constructing filtered people searches. Without this research upfront, we would be encountering both false positives and false negatives when searching.

While doing open-ended searches (on LinkedIn, for example) and eyeballing results is useful, at times, we want to get lists of job titles that are as full as possible. For that, we can X-Ray some sites with profiles, such as Zoominfo, and scrape search results with a tool like Instant Data Scraper. In X-Rays, we’d include keywords for job titles we are looking for, such as chief, head, director, senior vice president, etc.

For example, we can search like this:

site:zoominfo.com/p intitle:accenture "* director"

and scrape results. We will need to clean up the collected data a little, but we can get a reasonably full list of job titles, that include the word director, at Accenture. Note that if we are able to get Google to highlight the exact job titles in the results (for example, by searching for “chief * officer”), we would get a clean output of the titles in a separate column with Instant Data Scraper.

The contact-finding site contactout.com has public profiles and we can X-Ray it in the same manner (then, scrape results):

site:contactout.com intitle:"credit suisse" director "united arab emirates"

Yet another site, RocketReach, can be used for the same:

site:rocketreach.co intitle:walmart intitle:chief "chief * officer"

As an example output, here are the titles of Chief Officers at Walmart found with the above string:

Chief Administrative Officer
Chief Business Development Officer
Chief Communications Officer
Chief Compliance officer
Chief Culture Diversity&Inclusion Officer
Chief Customer Officer
Chief Data Officer
Chief Ethics & Compliance Officer
Chief Ethics Compliance Officer
Chief Ethics Officer
CHIEF EXECUTIVE OFFICER
Chief Information Officer
Chief Information Security Officer
Chief Legal Compliance Officer
Chief Marketing Officer
Chief Merchandising Officer
Chief People Officer
Chief Procurement Officer
Chief Product Officer
Chief Revenue Officer
Chief Technical Officer
Chief Technology Officer

Sure enough, we can also X-Ray LinkedIn for the same purpose. Constructing searches is straightforward since public LinkedIn profiles have both job titles and company names in the page titles.

We can get a combined job title list from each of these types of X-Ray searches, and this would inform our people searches.
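
The clean-up step can be done in a spreadsheet, but here is a small Python sketch of the same idea for “chief * officer” titles. The function names and the sample lines are made up for illustration; the query mirrors the RocketReach string above. The regular expression is one possible choice, not the only one, and it assumes titles contain only letters, digits, spaces, and “&”.

```python
import re

# Matches "chief ... officer" in any letter case, with up to 60 characters between.
TITLE_RE = re.compile(r"\bchief\b[\w&\s]{1,60}?\bofficer\b", re.IGNORECASE)


def xray_query(site, company):
    """Build an X-Ray string in the style of the RocketReach example."""
    return f'site:{site} intitle:{company} intitle:chief "chief * officer"'


def clean_titles(scraped_lines):
    """Extract titles, normalize spacing and case, and remove duplicates."""
    found = set()
    for line in scraped_lines:
        for match in TITLE_RE.findall(line):
            found.add(" ".join(match.split()).title())
    return sorted(found)


# Made-up scraped lines for illustration only.
scraped = [
    "Jane Doe - CHIEF EXECUTIVE OFFICER - Walmart",
    "John Roe, Chief Compliance officer at Walmart",
    "Chief  Executive Officer | Walmart",
]
print(xray_query("rocketreach.co", "walmart"))
print(clean_titles(scraped))
```

Lists cleaned this way from several X-Ray sources can be merged by passing all scraped lines to clean_titles at once.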

Join me for a brand-new webinar Executive Sourcing Techniques on Tuesday, September 17th to learn more!