Non-Technical Scraping in Sourcing

booleanstrings Boolean

 

Web Scraping tools are rapidly gaining popularity among Sourcers and Recruiters. They are an essential part of a productive Sourcer’s toolbox.

Here is an example of how scraping may help your work. Suppose you see a webpage like this (listing some promising prospects):

If you wanted to examine and save the records, this format is quite inconvenient. Without tools, you’d be endlessly copying and pasting “by hand.”

But take a scraper like Instant Data Scraper, and in seconds you will get an Excel file with parsed information:

Clearly, this format is easier to work with. You can search, sort, and filter columns, add info to records, upload to other systems, and use this data in combination with other data.
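For the curious, here is roughly what a tool like Instant Data Scraper does behind the scenes: it finds the repeating blocks on a page and turns each one into a spreadsheet row. The sketch below is a minimal illustration using only Python’s standard library. The markup it expects (one `<div class="prospect">` per person, with `name`, `title`, and `location` spans) is a hypothetical assumption. Every real page is structured differently, and the actual tools detect the repeating pattern automatically.

```python
import csv
import io
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Hypothetical markup assumed for this sketch: one <div class="prospect">
# per person, with <span> children whose class names the field.
FIELDS = ["name", "title", "location"]


class ProspectParser(HTMLParser):
    """Collects one row (dict) per <div class="prospect"> block."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[Dict[str, str]] = []
        self._field: Optional[str] = None  # field whose text we are reading

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class") or ""
        if tag == "div" and cls == "prospect":
            self.rows.append({f: "" for f in FIELDS})  # start a new record
        elif tag == "span" and cls in FIELDS and self.rows:
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self.rows[-1][self._field] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._field:
            row = self.rows[-1]
            row[self._field] = row[self._field].strip()  # tidy whitespace
            self._field = None


def scrape_to_csv(html: str) -> str:
    """Parse the listing and return it as CSV text (which Excel can open)."""
    parser = ProspectParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()
```

Saving the returned text to a `.csv` file gives you the same kind of sortable, filterable table the no-code tools produce.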

As an example, if you have a LinkedIn Recruiter account, you can upload the file using Talent Pipeline. Uploading will combine the scraped data with LinkedIn’s, and you can then search within the “enriched” LIR records.

While some scraping techniques require coding, for the majority of our tasks, we can get results with simple-to-use tools like Instant Data Scraper, Data Miner, and Phantombuster. Outwit Hub is a more sophisticated tool, but it can do simple things simply and “knows” how to extract contact information.

Join us for the first-ever class “Web Scraping for Recruiters” on Tuesday, July 30 at 9 am PDT, with an optional workshop on Wednesday, July 31st, to learn all about non-technical scraping for Sourcers and Recruiters, and start using this technique.

[Edited] Repeating the sold-out webinar on August 6th, 2019! Register here: https://sourcingcertification.com/webscraping/.

 

Google filetype: News

booleanstrings Boolean

Google.com has quietly improved its filetype: searches. Previously, it simply looked for page URLs ending in the filetype: argument, and it can still do so:

filetype:tonini (finds Facebook profiles in particular).

However, for standard, common file types, Google now also finds all files of a given kind (such as MS Excel) across a variety of extensions. That is, Google now searches by the “true” file format (vs. just the string in a URL after the last period). For example, consider this search:

attendees filetype:xlsx -xlsx.

You will find MS Excel files with extensions other than XLSX, such as XLS, XLTX, etc. The same is true for other standard file types such as MS Word or PDF: Google will find other file extensions. That is quite helpful for our searches. (Note that Google’s help mentions neither non-standard extensions like tonini nor searching by true file types such as Excel.)

In a hurry? You can use a shorter operator ext: on Google – it works the same way as filetype:.
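If you run these searches repeatedly (say, across several file types), you can generate the URLs instead of typing them. Here is a minimal Python sketch that assumes only the standard `https://www.google.com/search?q=` URL format. The function name is hypothetical, and the optional negative term reproduces the attendees filetype:xlsx -xlsx trick from this post.

```python
from urllib.parse import urlencode


def google_search_url(terms: str, filetype: str, exclude_ext: bool = False,
                      operator: str = "filetype") -> str:
    """Build a Google search URL restricted to one file type.

    operator may be "filetype" or its shorter alias "ext".
    exclude_ext=True also adds the bare extension as a negative term
    (e.g. "-xlsx") to surface the same format under other extensions.
    """
    if operator not in ("filetype", "ext"):
        raise ValueError("operator must be 'filetype' or 'ext'")
    query = f"{terms} {operator}:{filetype}"
    if exclude_ext:
        query += f" -{filetype}"
    # urlencode turns spaces into "+" and ":" into "%3A"
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, `google_search_url("attendees", "xlsx", exclude_ext=True)` yields a URL for attendees filetype:xlsx -xlsx.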

Check out the fully updated 4th edition of my eBook Boolean Book, which has just “shipped”. The eBook includes X-Ray templates for Sourcing – including multiple filetype: searches – as well as several new Custom Search Engines and ideas on how to search productively.

 

The Complete Guide to X-Raying LinkedIn

booleanstrings Boolean

Have you been finding that you are not getting the right results with some LinkedIn X-Ray strings that used to work? That is because the structure of public profiles has changed in several ways over the past few months. Here is what can and cannot be done as of now.

X-Ray LinkedIn for:

  • Current Job Title: possible using the operator intitle:
  • Current Company: possible using the operator intitle:
  • Headline: yes, through special operators in Custom Search Engines (CSEs) and in Social List – these are the only two ways to search by headline out there, LinkedIn Recruiter included.
  • Location: unfortunately, we have experienced a double-loss regarding locations, due to public profile HTML changes:
    • (1) You can no longer get the right results by searching for “location * * <location-name>” (take a note of it!)
    • (2) You can no longer query the location with a special Custom Search Engine operator, similar to this one.
    • All you can do now is just search for location names, which only works well if it’s a distinct LinkedIn-defined area name (such as “San Francisco Bay Area”). Or, X-Ray for a specific location such as “Oakland, California” to find people who have chosen to display their location this way.
    • (As a side note, I have noticed that even when a public profile shows a location like “Oakland, California”, the same profile shows a generic name like “San Francisco Bay Area” when I’m logged in. This means that you may get lucky and find the exact location on a public profile, even though you would not see it when logged in.)
    • You can X-Ray LinkedIn for countries by using a location setting in Google’s advanced search dialog, or use two-letter country codes under the site: operator.
  • Industry: we can no longer get the right results by searching for “industry * * <industry-name>”.
  • School: recently became available using a CSE operator in the format more:p:organization-name:<school>. Example. That’s a gain.
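If you compose these strings often, a small helper can assemble them. The sketch below is a minimal illustration with hypothetical function names. It assumes `site:linkedin.com/in` as the prefix for public profiles and `<country-code>.linkedin.com` (e.g. `uk`) for country searches. Remember that the school filter only works inside a Custom Search Engine, not on Google.com.

```python
def phrase(term: str) -> str:
    """Wrap a term in quotes so Google treats it as an exact phrase."""
    return f'"{term}"'


def linkedin_xray(title: str = "", company: str = "", location: str = "",
                  country_code: str = "") -> str:
    """Compose a Google X-Ray string for LinkedIn public profiles.

    Assumes site:linkedin.com/in as the profile prefix and
    <country_code>.linkedin.com (e.g. "uk") for country searches.
    """
    host = f"{country_code}.linkedin.com" if country_code else "linkedin.com"
    parts = [f"site:{host}/in"]
    if title:
        parts.append("intitle:" + phrase(title))    # current job title
    if company:
        parts.append("intitle:" + phrase(company))  # current company
    if location:
        # Plain keyword: no location operator currently returns the right results.
        parts.append(phrase(location))
    return " ".join(parts)


def cse_school_filter(school: str) -> str:
    """School filter in the more:p:organization-name format (CSE only)."""
    return f"more:p:organization-name:{school}"
```

Paste the result of `linkedin_xray(...)` into Google, and the result of `cse_school_filter(...)` into a suitable Custom Search Engine.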

Our Sourcing X-Ray tool Social List takes advantage of all of the above and doesn’t require you to write any operators, just the search terms. Check it out if you haven’t! (Otherwise, writing more:p operators is tiresome, at least in my experience.)

There are also ways to X-Ray LinkedIn to find:

and we can probably think of other creative X-Ray strings, depending on the search.

Check out the 90-min recording of our recent class “LinkedHacks” for other X-Ray examples and sourcing hacks for LinkedIn as well.

 

The Job Function Hack

booleanstrings Boolean

So, LinkedIn operators briefly reappeared last week and are now gone again. Oh well.

Let me show you a LinkedIn search hack I have found that does currently work, and that is – searching by the job function. To use the hack, you need to know LinkedIn codes for job functions, and they are as follows:

Code Description
1 Accounting
2 Administrative
3 Arts and Design
4 Business Development
5 Community & Social Services
6 Consulting
7 Education
8 Engineering
9 Entrepreneurship
10 Finance
11 Healthcare Services
12 Human Resources
13 Information Technology
14 Legal
15 Marketing
16 Media & Communications
17 Military & Protective Services
18 Operations
19 Product Management
20 Program & Project Management
21 Purchasing
22 Quality Assurance
23 Real Estate
24 Research
25 Sales
26 Support

To search by a job function, add &facetCurrentFunction=<value> to your search URL – that’s it.

Example: this is a search for people with keywords plant and manufacturing, whose job function is Operations.

This is a search for Vice Presidents whose job function is Engineering.

If you want to search for two or more functions at the same time, append (for example) &facetCurrentFunction=["16","17","20"] to the search URL. Example.
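If you build these URLs often, the parameter can be appended programmatically. The sketch below is a minimal Python illustration that takes codes from the job-function table above and writes them as a list of quoted codes, percent-encoded. Several parts are assumptions: the function name, the example people-search URL path, and the use of the list form for a single code as well.

```python
import json
from typing import List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_function_facet(search_url: str, codes: List[int]) -> str:
    """Append facetCurrentFunction=[...] to a LinkedIn people-search URL.

    The value is written as a list of quoted codes, e.g. ["16","17","20"],
    then percent-encoded together with the existing query parameters.
    """
    parts = urlsplit(search_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    value = json.dumps([str(c) for c in codes], separators=(",", ":"))
    query.append(("facetCurrentFunction", value))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For instance, passing a keyword search for plant manufacturing with `[18]` adds the Operations function, and `[16, 17, 20]` reproduces the multi-function example.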

The job function search hack produces the same results as LinkedIn Recruiter (except in Recruiter, you cannot combine functions). Keep in mind that it is a calculated (vs. entered by the user) field, and LinkedIn’s judgment at times may not coincide with yours.

You will find many other hacks in the second edition of our ebook “Sourcing Hacks”. 😉 For an interactive presentation, come to our webinar on July 2nd!

 

New Hacks Replace Old Hacks

booleanstrings Boolean

It’s never boring with sourcing tools and sites. Things change all the time and, at times, we lose tools.

Bellingcat, a major #OSINT publication, writes that last week was “a hell week for research”: “The past week dealt several blows to open source researchers. First, Facebook made major changes to its Graph search interface, effectively breaking previous methods to search public posts from the site. The service has been valuable in, for example, the investigation of war crimes. Next, the popular people search service Pipl shut down their free service. On top of that, Twitter announced they are removing precise location tagging from tweets.” And we have lost LinkedIn search operators too: they briefly reappeared on Monday but are now gone again.

Let me give you an update on the Facebook Graph. Lots of researchers have been trying hard to find workarounds. As of now, we have exactly one promising solution that restores much (though not all) of Facebook Graph Search. Almost as soon as Graph was gone, the developers of the OSINT tool “Social Links” found this brilliant hack: Facebook graph search workaround.

This Firefox extension, SearchBook, implements the hack. Note that to use the extension, you have to follow the installation instructions carefully. If you try it out, be prepared: the UI is far from intuitive! You need to enter your search expression in a language similar to Graph’s (e.g. 104958162837/employees/ becomes employees(104958162837)). Then run any search at all; it doesn’t matter what. As you scroll down, the search results are replaced by the ones matching the expression you entered into the tool’s search box.

Some other conversions from the Graph into SearchBook language are:

  • 104958162837/employees/past becomes past(employees(104958162837))
  • 104958162837/employees/present becomes present(employees(104958162837))
  • 104958162837/employees/106078429431815/residents/intersect becomes intersect(employees(104958162837),residents(106078429431815))
  • You can also search for employees(pages-named(str(hospital)))!
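The conversions above follow a regular pattern. A Graph path reads like a postfix expression, where each keyword applies to whatever precedes it. The sketch below is a hypothetical Python helper, not part of SearchBook. It handles only the segment types in the examples above (numeric IDs, employees, residents, past, present, intersect) and does not cover forms like pages-named(str(...)). Treat it as an illustration of the pattern rather than a reference for SearchBook’s syntax.

```python
from typing import List

# Keywords that wrap the item before them; limited to the examples above.
UNARY = {"employees", "residents", "past", "present"}


def graph_to_searchbook(path: str) -> str:
    """Convert e.g. 104958162837/employees/past into
    past(employees(104958162837)).

    IDs are pushed onto a stack, unary keywords wrap the top item,
    and intersect combines everything on the stack.
    """
    stack: List[str] = []
    for seg in (s for s in path.strip("/").split("/") if s):
        if seg.isdigit():
            stack.append(seg)
        elif seg in UNARY:
            if not stack:
                raise ValueError(f"{seg!r} has nothing to apply to")
            stack.append(f"{seg}({stack.pop()})")
        elif seg == "intersect":
            stack = [f"intersect({','.join(stack)})"]
        else:
            raise ValueError(f"unsupported segment: {seg!r}")
    if len(stack) != 1:
        raise ValueError("path did not reduce to a single expression")
    return stack[0]
```

Paste the returned expression into the SearchBook search box as described above.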

This is what it looks like:

As it is, the SearchBook UI is, I think, too inconvenient for daily use. But if you can’t find the info you are looking for elsewhere, it’s a good choice. Let’s hope a nicer UI will be developed.

With all the tool changes, it was time for us to update our book “Sourcing Hacks”. I am happy to announce that the second edition is out as of this morning! Several hacks are out, several new ones are in. Get the book both to enrich your Sourcing Toolbox and to get inspired by exploring new techniques.

On Tuesday, July 2nd, we will hold a Sourcing Hacks webinar, going through the hacks with multiple examples. One month of support is included, as always.

Bye, Facebook Graph Search

booleanstrings Boolean

On Friday, June 7th, Facebook Graph Search stopped working. “Hand-made” URLs and tools Intelligence Search, SearchIsBack, StalkScan, and others, no longer work. (I believe it’s a permanent change).

Needless to say, the drop in functionality badly affects Sourcers and, especially, OSINT people.

Search URLs that used to combine Facebook IDs with pieces like “visitors” and “likers” now look cryptic. For example, this is a search for people who like Python:

https://www.facebook.com/search?f=Abq-pIiVvwHAJmfSUMl_vchT_biNIpETyYhtdUCE4QM7kBQbp0V6cMOjtQL-wx1JksGSgJpmf-RPgvjMPwC9tsAu4oQYqjxBNKgRlv0GAOM-sw.

OSINT researchers, notably Henk van Ess, have been hard at work trying to figure out which former Graph searches can still be performed.

Dan Nemec has posted ways to decode various search URLs.
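A natural first step in that kind of decoding is to pull the f parameter out of a search URL. Its characters (letters, digits, “-” and “_”) suggest URL-safe Base64, so the sketch below assumes that encoding and returns the raw bytes. The assumption and the function name are mine; this is not Dan Nemec’s method. The decoded bytes may well stay opaque without further work.

```python
import base64
from urllib.parse import parse_qs, urlsplit


def decode_f_param(search_url: str) -> bytes:
    """Return the bytes behind the f= parameter of a Facebook search URL.

    Assumes f is unpadded URL-safe Base64; the meaning of the bytes is
    not interpreted here.
    """
    values = parse_qs(urlsplit(search_url).query).get("f")
    if not values:
        raise ValueError("URL has no f= parameter")
    token = values[0]
    token += "=" * (-len(token) % 4)  # restore the padding Base64 expects
    return base64.urlsafe_b64decode(token)
```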

I have been playing with various URLs and found that this search would find people who like Python and are living or lived in London. I constructed it combining two URLs – one, for people who live in London, and two, for people who like Python. This method can be reused to bring back some of the Graph.

However, at this point, we don’t know what we can and cannot search for, compared with the good old Graph. For example, we don’t know whether we can find Spanish-speaking female Software Engineers living in San Francisco.

It’s an interesting challenge for Sourcers!

We will share ways to work around this in the upcoming Facebook Webinar on June 18th, 2019. Seating is limited at the webinar – sign up now!

 

 

 

21 Tools and Sites I Use Every Day

booleanstrings Boolean

  1. Chrome
  2. Advanced Google Search
  3. Google Custom Search Engines, using Advanced Operators
  4. Yahoo Reverse Image Search
  5. LinkedIn.com People and Content Search
  6. LinkedIn Recruiter
  7. Social List
  8. Social List Contact Finder
  9. Connectifier Social Links
  10. Lusha
  11. Rocketreach
  12. Hunter.io
  13. Pipl.com
  14. Finding People by Email on LinkedIn
  15. Intelligence Search for Facebook [gone now]
  16. Email Extractor
  17. Instant Data Scraper
  18. Outwit Hub
  19. Github Search for Users and Files
  20. Github Hacks for Finding Emails
  21. Grammarly

What about you?

Hidden Github Resumes

booleanstrings Boolean

Github has tens of thousands of resumes that Google won’t find. Intrigued? Read on.

Unknown to many, Github is widely used not only for working on software code but also for storing documents such as resumes. Like software code, documents are stored in the code section. This search, “my resume”, would reveal some, but it’s not well targeted.

Here are some ways to locate Github-stored resumes – and why Google doesn’t index them.

Let me start with Google. A favorite format for resumes on Github is JSON. Originally created as “JavaScript Object Notation”, the format is now used for non-code documents as well. Unfortunately, Google still does not consider JSON worth indexing: resume filetype:json produces no results. When I asked Google’s Dan Russell why, he told me that JSON files are “generally created dynamically rather than stored long term. Because it’s dynamic, we can’t index them.”

Too bad. But we have ways to source for these – and other document formats – within Github.

Github has the search operators extension:, for searching for files of a given type, and filename:, for searching by document name (these are similar to Google’s filetype: and intitle:). Armed with these operators, we can run a more targeted search. Here is an example:

extension:json filename:resume “san francisco” javascript.

Another less known resume format, popular among Github users, is TEX. Example search:

filename:cv extension:tex “data scientist”.

Of course, we don’t necessarily have to search by file type. And there are plenty of non-developers’ resumes as well: the example above finds Data Scientists. Here are a couple more examples:

filename:resume “product manager”

filename:resume “vice president” engineering
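To keep a set of these searches handy, you can generate the search URLs. The sketch below is a minimal Python illustration with a hypothetical function name. It assumes GitHub’s `https://github.com/search?q=...&type=code` URL format. Code search requires being signed in, and GitHub’s newer code search may expect path: in place of filename:/extension:, so check the results before relying on them.

```python
from urllib.parse import urlencode


def github_code_search_url(*terms: str, filename: str = "", extension: str = "") -> str:
    """Build a GitHub code-search URL from filename:/extension: plus keywords.

    Pass phrases with their quotes, e.g. '"san francisco"'.
    """
    parts = []
    if filename:
        parts.append(f"filename:{filename}")
    if extension:
        parts.append(f"extension:{extension}")
    parts.extend(terms)
    query = urlencode({"q": " ".join(parts), "type": "code"})
    return f"https://github.com/search?{query}"
```

For example, `github_code_search_url('"data scientist"', filename="cv", extension="tex")` reproduces the TEX search above.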

Check out the fully updated webinar “How To Find and Attract Technical Talent” to learn these (we provide a Github search syntax tip sheet) and other IT Sourcing techniques!

 

 

Broken LinkedIn Boolean Explained and Webinar

booleanstrings Boolean

 

I have figured out how it works and will briefly explain below.

We have announced a new delivery of the LinkedHacks webinar, live on May 1st, to fully update you. (Don’t miss it; seating is limited). Come learn not only how to work with Premium and Basic accounts but also how to take advantage of a new hidden feature!

Of course, nobody would search like this in practice; the search should produce zero results if we had working Boolean search. However, these results shed some light on what is going on. Let me explain.

If you search for senior software engineer (without the NOT additions; try it), LI would include:
a) synonyms and variations for “senior” such as “sr” and “snr”, “software” such as “sw”, “engineer” such as “engineering”
b) translations into other languages, such as “Ingeniero” etc.
It would search all titles, current and past.

You need not write (senior OR sr OR snr) (etc.) – synonyms are included automatically.
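To picture what “synonyms are included automatically” amounts to, here is a toy Python sketch that expands each keyword into an OR group. The synonym table is hypothetical and seeded only with the variants mentioned in this post. LinkedIn’s actual implementation is not public, and it also adds translations and searches past titles, which this sketch does not attempt.

```python
# Hypothetical synonym table, seeded with the variants mentioned in the post.
SYNONYMS = {
    "senior": ["sr", "snr"],
    "software": ["sw"],
    "engineer": ["engineering"],
}


def expand(keywords: str) -> str:
    """Rewrite plain keywords as the Boolean string they effectively become."""
    groups = []
    for word in keywords.lower().split():
        variants = [word] + SYNONYMS.get(word, [])
        if len(variants) == 1:
            groups.append(word)  # no known synonyms: keep the word as-is
        else:
            groups.append("(" + " OR ".join(variants) + ")")
    return " ".join(groups)
```

So typing senior software engineer behaves roughly like the longer string with OR groups, which is why writing the groups yourself adds nothing.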

Knowing that the search is not Boolean, you can take advantage of the current implementation: a keyword search for a job title returns all members for whom this is a current or past title. This search is otherwise unavailable on LinkedIn.com.

Meanwhile, LinkedIn Recruiter has its own, different ways to include job title synonyms.

We have observed lots of changes in LinkedIn search in the last few weeks. Come to LinkedHacks webinar (May 1st, optional hands-on practice May 2nd) to get full explanations, new and updated hacks (some, unavailable anywhere online), and get support on everything LinkedIn Sourcing for thirty days.

Update on LinkedIn People Search

booleanstrings Boolean

What happened?

As of two days ago, search operators (that I have covered in previous posts) have stopped working on LinkedIn. This includes not only undocumented operators like headline: but also the officially documented operators firstname:, lastname:, title:, company:, and school:. (They were introduced in 2017). It’s quite unfortunate! [Edited April 22, 2019: LinkedIn has taken down that help page.]

As always, we will be looking into alternative ways for Recruiters and Sourcers to overcome LinkedIn limitations.

We may get the operators back; we’ll see.

In the meantime, if you are searching with a basic or premium account, keep in mind that:

  1. LinkedIn Keywords Boolean Search Is Compromised. For better results, use the advanced search dialog.
  2. You can X-Ray LinkedIn for current company and job title. The advantage is that you would be searching for the “true” current job and would avoid finding jobs that members have not “closed”.
  3. If you are up for writing complex (and little-known) special Custom Search Engine operators, you can use the technique described in Fascinating: Custom Headline Search. While LinkedIn search operators are not working, this is the only way to search by headline (unavailable in any LinkedIn account, including Recruiter).
  4. If you are not up for writing complex CSE search operators, use our Sourcing Tool Social List, which uses CSE technology via APIs and hides all the complexity from end users. (We have also added a “Contact Finder” to the tool. Check out Dean Da Costa’s blog and video.)

Thanks for reading! I’ll keep you updated.