Five Tools to Scrape Search Results

booleanstrings | Boolean

When you X-Ray on Google or search on LinkedIn, Facebook, or GitHub (etc.), you see results that are links with previews (called “snippets” in Google). The problem is that snippets never provide enough information to qualify a result. However carefully you phrase your search, you should always expect false positives. Clicking and reviewing every result is time-consuming. Additionally, saving “good” results is a challenge.

Here is an example. If you X-Ray LinkedIn trying to narrow to a location, false positives are unavoidable. This search – site:linkedin.com/in “san francisco” intitle:sourcer intitle:facebook – will find not only Sourcers at Facebook in the Bay Area but also those who used to work in San Francisco and now live elsewhere.
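If you run many variations of a search like this, it can help to assemble the string from its parts. Here is a minimal sketch in Python; the function names `build_xray_query` and `google_search_url` are made up for illustration, and the only operators used are the ones in the example above (`site:`, quoted phrases, `intitle:`).

```python
from urllib.parse import urlencode


def build_xray_query(site, phrases=(), title_terms=()):
    """Assemble an X-Ray search string from a site, exact phrases, and title terms."""
    parts = [f"site:{site}"]
    parts += [f'"{phrase}"' for phrase in phrases]        # exact-phrase matches
    parts += [f"intitle:{term}" for term in title_terms]  # words required in the page title
    return " ".join(parts)


def google_search_url(query):
    """Turn the search string into a Google search URL (the query is URL-encoded)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_xray_query(
    "linkedin.com/in",
    phrases=["san francisco"],
    title_terms=["sourcer", "facebook"],
)
print(query)
print(google_search_url(query))
```

The first line printed is the same search as above, written with straight quotes; the second is a link you can paste into a browser.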

Scraping the information under results’ links and exporting it to Excel can speed up individual reviews many times over. This is because, in Excel, you can sort, search, and filter columns (such as “Location”). If you have access to such functionality, you can run wide searches and then filter, catching results you would otherwise miss.
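The same filtering can also be done outside Excel. The sketch below assumes the scraped results were exported as CSV with a “Location” column; the column names and the three sample rows are invented purely for illustration, and a real export from any of the tools may be laid out differently.

```python
import csv
import io

# Hypothetical export: the headers and rows are placeholders, not real data.
SAMPLE_EXPORT = """Name,Title,Company,Location
Prospect A,Sourcer,Facebook,San Francisco Bay Area
Prospect B,Sourcer,Facebook,"Austin, Texas"
Prospect C,Senior Sourcer,Facebook,"San Francisco, California"
"""


def filter_by_location(rows, needle, column="Location"):
    """Keep only rows whose location column contains `needle` (case-insensitive)."""
    needle = needle.lower()
    return [row for row in rows if needle in (row.get(column) or "").lower()]


rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
local = filter_by_location(rows, "san francisco")

for row in local:
    print(row["Name"], "|", row["Location"])
```

To run it against a real file, replace `io.StringIO(SAMPLE_EXPORT)` with an opened CSV file. The point is the one in the paragraph above: search wide first, then narrow on the scraped column rather than on the snippet.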

Another use case for scraping under links is delivery to your client. For example, you might have a Recruiter project with identified prospects and need to put the results in a Google Doc to share with a client. (This is what I do every day.)

Here is a list of the five best non-technical tools for under-links scraping that I am aware of. None of them requires any coding.

  1. Phantombuster. Search on LinkedIn and parse the results. The output is impressive, with lots of variables scraped. However, you cannot do large volumes (hundreds of results).
  2. OutWit Hub. It is somewhat tiring to use since it slows down quickly, but pointing OutWit Hub to scrape within each result takes just one mouse click. It can scrape your connections, including email addresses, out of the box. (On a fast computer, I got 7K+ records!)
  3. Ally from include.io (Beta). Ally allows you to scrape search results (on Google, LinkedIn, Facebook, or other sites), save results in an internal list, and do a second round of scraping the links. The advantage is that you get data from the search previews combined with data from the results themselves.
    Since I started using Ally in Recruiter, both for filtering results and for sharing them with clients, my sourcing speed has gone up. (I do 80% less copying and pasting.)
  4. ScrapeStorm is a new and promising application. It is downloadable. (I hope they will do a web version.) I was able to tune ScrapeStorm to go through LinkedIn X-Ray results in just a few minutes.
  5. Social List. The underlying technology does not rely on scraping (we use Google Custom Search Engine APIs), yet you can search and export results to Excel. A big plus is that Social List does not even “touch” LinkedIn.
    [Screenshot: what a result (using “Github Agent”) may look like]

(Please note that new Social List users must provide a credit card, but you will not be charged if you cancel early.)

P.S. A word of caution: sites have protection against scrapers. Do not scrape too much at a time, especially on LinkedIn.
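For anyone who does script part of this, “not too much at a time” usually means pausing between requests. Below is a minimal sketch of that idea; the helper name `throttled`, the 5 to 15 second range, and the example URLs are my assumptions, not limits published by any site.

```python
import random
import time


def throttled(items, min_delay=5.0, max_delay=15.0, sleep=time.sleep):
    """Yield items one at a time, pausing a random interval between them.

    The delay range is an arbitrary assumption; choose values that suit the site.
    """
    for index, item in enumerate(items):
        if index > 0:  # no pause before the first item
            sleep(random.uniform(min_delay, max_delay))
        yield item


# Dry run: record the pauses instead of actually sleeping.
urls = ["https://example.com/result/1", "https://example.com/result/2"]
pauses = []
for url in throttled(urls, sleep=pauses.append):
    print("would fetch", url)
print("pauses recorded:", len(pauses))
```

Leave out the `sleep=` argument to get real pauses. Randomizing the interval simply avoids hitting a site at a fixed rhythm; it is no guarantee against being blocked.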
