Scrape Google Maps

booleanstrings Boolean Leave a Comment

If you are sourcing for professionals who provide services such as Accounting, or looking for office locations for a specific company, searching on Google Maps will provide valuable information. Scraping a Maps search results page allows you to filter and enrich the data, and you can do it quite simply, with no coding involved.

PhantomBuster is a wonderful collection of automation scripts; they recently released a Google Maps Search Export. However, my preferred tool for scraping Maps is Instant Data Scraper, because it goes through all of the results rather than a partial set. (There are also many specialized map-scraping tools, but I don’t think anything more complex adds much benefit.)

Let’s look at an example. I searched for accountant near manhattan and ran Instant Data Scraper:

Within seconds, I got this table with 309 rows, listing the type of business, whether it is a Certified Public Accountant, office addresses, websites, ratings, and numbers of ratings submitted:

Where else would you find a list of CPAs in Manhattan this fast? 😉
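Scraping itself needs no code, but if you would rather filter the exported table with a script than in Excel, here is a minimal sketch. It assumes the table was saved as CSV; the column names and the three sample rows are made up for illustration and will not match the scraper's real headers, so rename them to fit your file.

```python
import csv
import io

# Made-up sample export. The column names are assumptions, not the
# scraper's actual headers; adjust them to match your own file.
SAMPLE_EXPORT = """name,category,address,website,rating,reviews
Alpha Tax,Certified public accountant,"1 Main St, New York, NY",https://alpha.example,4.8,25
Beta Books,Bookkeeping service,"2 Side St, New York, NY",,4.9,3
Gamma CPA,Certified public accountant,"3 High St, New York, NY",https://gamma.example,4.1,40
"""


def filter_rows(rows, category_keyword, min_rating, min_reviews):
    """Keep rows whose category mentions the keyword and that meet both thresholds."""
    kept = []
    for row in rows:
        if category_keyword.lower() not in row["category"].lower():
            continue
        # Empty cells are treated as zero so they fail the thresholds.
        if float(row["rating"] or 0) < min_rating:
            continue
        if int(row["reviews"] or 0) < min_reviews:
            continue
        kept.append(row)
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
cpas = filter_rows(rows, "certified public accountant", 4.5, 10)
for row in cpas:
    print(row["name"], row["website"])
```

To use it on a real export, replace the `io.StringIO(SAMPLE_EXPORT)` part with an opened file.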

Please join me for an updated sourcing class Web Scraping For Recruiters on Tuesday, February 11th, with an optional hands-on practice the next day. Seating is limited, and the class is one of our most popular, so register now!

 

 

 

 

 

Sourcing for Devs on Dev.To

booleanstrings Boolean Leave a Comment

Dev.to is a relatively new site where developers discuss how to build software, and it is quite a Sourcer-friendly site. 😉

A profile can have a location, education, skills, links to other profiles, content keywords, and, often, public email contact:

Internal search for members is too weak to be useful. But we can X-Ray the site for profiles like so:

site:dev.to -site:dev.to/*/* “united kingdom” javascript.

Since Google recognizes hashtags in search, and the site widely uses hashtags, we can search in a more targeted way, for example:

site:dev.to -site:dev.to/*/* #javascript “gmail.com”.

With a Google search like the above, it is not hard to scrape the results and get a list of prospects along with emails. It is a matter of picking the right scraping tool.
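If you run many variations of these strings, composing them in a few lines of code can save typing. The sketch below only assembles the query text and a standard Google search URL; the function names are my own, and it does not scrape anything.

```python
from urllib.parse import urlencode


def xray_query(site, keywords, exclude_paths=("*/*",)):
    """Compose an X-Ray string: site:, exclusions for deeper paths, then keywords."""
    parts = [f"site:{site}"]
    parts += [f"-site:{site}/{path}" for path in exclude_paths]
    parts += list(keywords)
    return " ".join(parts)


def google_search_url(query):
    """URL-encode the query for Google's regular search page."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = xray_query("dev.to", ["#javascript", '"gmail.com"'])
print(query)
print(google_search_url(query))
```

Swap the keywords (for example, a location in quotes plus a skill) to reproduce the other strings in this post.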

Please note, we are going to run an updated version of our always-popular Scraping Webinar on Tuesday, February 11th. If you are curious about the topic, it is a place to be.

This Custom Search Engine – http://bit.ly/devtoprofiles (X-Ray Dev.To) – will look for Dev.to profiles. Using it eliminates the need for the operator site: and allows you to search for members’ profiles, for example:

#javascript amsterdam

There is not a whole lot of “structure” on the profiles (much less than on Github’s, for example), but we can search specifically for bios with a Custom Search Engine:

more:p:person-description:java.

I hope the resource will prove useful to some! Let me know if you have any questions.

We are running How to Find and Attract Technical Talent on Tuesday, February 18th. Join to get a complete IT Sourcing toolbox. (Some tools are cross-industrial.) Seating is limited.

 

 

 

Check Out These Nineteen Example Social Lists

booleanstrings Boolean 2 Comments

The beauty of our tool Social List is that you can get a list of professionals in split seconds. As I offered to source for colleagues in my post, I got many beautifully diverse requests. Here are some sourced lists, as examples of what Social List can do, shared on the Boolean Strings Group on Facebook.

  1. Franchise Business Consultant in restaurants based in DFW
  2. API Developer in Arizona
  3. Java Developer, Montréal
  4. Dental Hygienist
  5. Commercial construction general contractor (estimator), Los Angeles
  6. Search Engine Advertising Consultant, Berlin
  7. Enterprise Account Executive, Sydney, Australia
  8. Clinical research associate, Pharma, CRO
  9. Statistician in Austin, TX
  10. Software engineer, front end, React/JavaScript, Toronto
  11. Design engineer, solidworks, rf or microwave, Providence RI
  12. Product designer with B2B & SaaS, Mountain View
  13. Software engineer mid-level PHP JavaScript, Kyiv
  14. Executive Housekeeper, Wisconsin
  15. Data analytics professionals in Procurement industry in Dublin Ireland
  16. Mechanical Engineer
  17. System Architect, precision mechanics/machine building, Eindhoven (The Netherlands)
  18. Citrix engineer in Fribourg / Switzerland

You should try Social List if you haven’t!

Revisit Social List & Contact Finder in 2020

booleanstrings Boolean 4 Comments

Developed by “Sourcers Who Code,” Social List is worth your attention in 2020. Especially so, given how poorly LinkedIn is serving us (growing prices, reduced functionality, confusing UX, endless bugs, unresponsive and unknowledgeable customer support).

Social List is a sourcing tool that lets you instantly generate lists of target social profiles based on your requirements. The tool searches for public profiles on LinkedIn, Github, Meetup, HackerRank, ResearchGate, Google Scholar, XING, and more. You do not need any (paid or free) accounts anywhere to search. Run a filtered search – by job title, company, location, and more – and collect results precisely matching the search filters; enrich results with contact email addresses on request.

Basically, this is a structured search for the web. That is what makes the tool unique.

Designed with Sourcers and Recruiters (initially, just us) in mind, the tool can also serve Business Development and OSINT professionals.

Social List is complementary to other tools you already use for Sourcing. It is simple to use, and affordable, yet provides a productivity boost and surfaces new results, specifically, compared to LinkedIn Recruiter. It works in any location or industry. I use Social List daily in sourcing projects. We have used it in competitive intelligence searches as well (ask me how).

The technology Social List uses in the back-end is the Google Custom Search Engine APIs. Through the APIs, Social List fetches precisely matching profile pages from Google’s index. We don’t keep any databases with information; we get results from Google, format them as an Excel table, and deliver them to you (the tool is GDPR compliant).

An additional advantage in using APIs is that you won’t be “touching” LinkedIn in your searches and contact info look-ups (unless you start reviewing results, of course, which is a small volume and shouldn’t matter), so you are guaranteed to have no problems with LinkedIn.

Social List lets you export the lists of profiles it finds. Many of our users start their sourcing here, by generating lists of prospects to explore (which can be done in seconds). Note that exporting gives you an additional way to filter, through Excel, which makes reviewing results faster.

The 2019 addition to Social List was its Contact Finder, which looks for emails, phone numbers, and social links for a person based on his/her LinkedIn profile URL. (I use Contact Finder quite a bit when sourcing). Compared with others, the Contact Finder is competitive in the % of matches and pricing; it queries several databases, providing you with the best result. Note that it is not a Chrome extension and therefore LinkedIn cannot track it.

And here is a story from our development team. In the last few months, the CSE APIs stopped working for LinkedIn profiles; nobody (including Google support) knows the reasons. (Feel free to inquire about technical details; it is a mystery.) We went through a brief scare: since LinkedIn Agent is by far the most popular, we were facing closing the service down. Today I am happy to report that we have found workarounds, and LinkedIn Agent is now working fine! Whew.

Please note that we ask new users to provide a credit card at sign-up. (This is due to past issues with people misusing the Contact Finder, for which we pay.) You then get a full seven-day trial to see how you like the tool. Ask me if you need help searching (also check out help pages for users). After the trial, Social List is subscription-based, month-to-month, and affordable.

A piece of advice: to get better results, search simply.

If you would like me to run a sourcing query for you, please ping me, and I will be happy to forward the results.

 

Webinar “Sourcing Skills Assessment and Development”, Now with Recording

booleanstrings Boolean 2 Comments

You are invited to a rare free webinar on Wednesday, January 22nd “Sourcing Skills Assessment and Development”.
[Edited: the webinar is over; please find the materials at the end of the article]
Are you interested in assessing your or your team’s sourcing skills? Or are you looking for interview questions for new Recruiter hires? Or have you decided to get Certified in Sourcing? Then this webinar is for you. Everyone is invited; please share it with your colleagues!
The webinar coincides with the release of our eBook “Sourcing Answers”, outlining our sourcing skills assessment methodology and providing 120 sourcing challenges and solutions. The book is a new addition to our exam-taking offering; we have been certifying professional Sourcers globally for eight years.
In the presentation, we will share our experience designing and grading exams, assessing recruiting teams, and lessons learned. We will share our methodology, six core areas of proficiency that we test, and sample exam questions from the book.
Everyone who signs up will have a chance to win the new eBook “Sourcing Answers” (one of five copies).
Wednesday webinar outline:
  • Sourcing Function and Metrics in Recruitment
  • Sourcing Skill Assessment Methodology
  • Six Core Areas of Competence
  • Assessment and Interview Aid: new eBook “Sourcing Answers” (120 Questions and Solutions)
  • Example Questions
  • Strengthen Your Skills – Get Educated
  • Get Certified
  • Assessment, Certification, and Education for Teams
Hope to “see” you there!
P.S. Thanks to everyone who came to the presentation! We only announced two days prior – and almost 600 people signed up, 300 logged in live, and we went through many questions (and answers) from the audience.
Access the materials here:

Tool Alert: OneSearch.com

booleanstrings Boolean Leave a Comment

Hello Sourcers:

A brand-new “privacy-first” web search engine OneSearch.com from Verizon is out.

Unlike Google, Bing, or Yandex, OneSearch does not have its own index. The search results are Bing-based.

There is no documentation on search operators, but it looks like the operator site: works, and so do a few more, including the Bing-only operator, contains:, which looks for pages with links to files of specific formats. Oddly, as Balazs has pointed out, OneSearch has trouble understanding NOT. I hope we’ll collectively solve that mystery!

The “privacy” aspect can be improved, as some critics say. What is of interest to us though, is that private search engines like OneSearch and DuckDuckGo are not putting users in a “filter bubble.” We can use them to widen our searches.

Web search has a drop-down of the page age restrictions; we can set different time intervals by manipulating the search URL (for example, replace &age=1m with &age=1y). I don’t know if it’s possible to restrict to a language or locality.
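If you want to flip the time interval without retyping the URL, the replacement can be scripted. This is a minimal sketch: only the age parameter and its 1m and 1y values come from the observation above, while the example URL path is a hypothetical stand-in for whatever your browser shows.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_age(search_url, age):
    """Return the same URL with its `age` query parameter set to a new value."""
    parts = urlsplit(search_url)
    # Drop any existing age parameter, keep everything else in order.
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "age"]
    params.append(("age", age))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical results URL; copy the real one from your address bar.
url = "https://www.onesearch.com/search?q=site%3Adev.to+javascript&age=1m"
print(set_age(url, "1y"))
```

Other interval values are not documented, so treat anything beyond 1m and 1y as something to test by hand.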

OneSearch has a nice set of image search options.

I have found it slightly annoying that OneSearch doesn’t consistently show and highlight keywords in the results. It could be Bing’s quality too; I do not use Bing that often and will need to run some tests. On the positive side, it seems much faster than Bing itself.

Please share your OneSearch observations in the comments or on our Boolean Strings Facebook Group.

“Sourcing Answers” (Skills Assessment)

booleanstrings Boolean Leave a Comment

I am happy to announce a new eBook, “Sourcing Answers,” that David Galley and I have just e-published. We had initially conceived the book as an aid to prepare for our Sourcing Certification Exams. However, the book can also help you assess your and your team’s Sourcing skills (whether your goal is to get certified or not), or you can choose questions from the book to use in interviews when hiring Recruiters.

In our industry, assessment of Sourcing skills is vital due to the apparent lack of adequate performance measurements for a Sourcer. Assessments can improve Recruiting performance by identifying Sourcing skill gaps and taking action to fix them. Reviewing solutions and correct answers to concrete challenges is always educational as well. This book can serve as an aid in assessments.

By trying to solve the sample questions and comparing your solutions with the ones provided, you will improve your sourcing skills, knowledge, and confidence. The eBook offers “learning by doing” as well as “learning by example.” You will see optimal approaches to a task, the right way of thinking, and proper tool selection.

The questions in the book are just like those we give at our exams. There is no need for paid subscriptions to any sites. Each question should take you between one and six minutes to answer.

Our test-takers have reported feeling accomplished, having improved their skills and understanding of Sourcing – and having fun taking the exams! We hope you enjoy the questions as well. If you decide that you are ready to get certified, the next Exam week is January 25-31, 2020.

This is our third book, following “300 Best Boolean Strings” (now in its 4th edition) and “Sourcing Hacks” (2nd edition).

Happy Sourcing!

 

How to Search for Bios on Github

booleanstrings Boolean 1 Comment

 

Just like we can’t search for LinkedIn headlines within LinkedIn or by X-Raying, we can’t search for Github Bios – either within Github or by X-Raying. However, we can search for LinkedIn headlines with Custom Search Engines (CSEs). It turns out that we similarly can search for Github Bios with CSEs!

We will be searching using the Github X-Ray CSE. I will start by providing sample search strings that look within Bios, and then give some explanations.

Here you go. “GitHub Bio contains”:

You can change the arguments, add keywords, and combine them with other Google and Custom Search Engine operators specific to Github. As you may have noticed, you can use the asterisk * for ANDs and the comma , for ORs in the special operators.

You Can Stop Reading Now and Go Enjoy the Searches 🔎🔎🔎

But wait, I also want to tell you that our tool Social List uses CSE operators in the background, and you won’t need to write any operators – just enter your terms and collect results. Here is what a search looks like:

Check it out if you haven’t.

Now, if you are wondering how I came up with the horrible-looking operator more:p:metatags-og_description: (and what is behind the search algorithm in Social List), read on.

CSE – Special Advanced Syntax

Special CSE operators depend on the website and structure of its pages. More specifically, operators depend on what Schema.org, Microformats, and other objects and values are (invisibly) included in the pages’ source code.

The general CSE search operator format is this:

more:pagemap:<data-field-name>-<data-value>:<value>

where data-field-name is an object like Person, data-value is the name of one of its values, such as “org” (i.e., organization, a Person’s employer), and value is a string like “IBM”. An operator built this way finds pages containing the object Person with a matching “org” value.

Alternative syntax uses just p instead of pagemap:

more:p:<data-field-name>-<data-value>:<value>

Google.com doesn’t “understand” the more:… search syntax, but any Google Custom Search Engine does.
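To make the format concrete, here is a small sketch that assembles operators in the hyphenated form of the working examples in these posts (more:p:person-description:java and more:p:metatags-og_description:), joining values with * for AND and , for OR. The function name and its arguments are my own invention, and the keywords are arbitrary.

```python
def cse_operator(obj, attribute, all_of=(), any_of=()):
    """Build more:p:<object>-<attribute>:<values>, using * for AND and , for OR."""
    if all_of and any_of:
        # Mixing both in one operator is left out of this sketch.
        raise ValueError("pass either all_of or any_of, not both")
    values = "*".join(all_of) if all_of else ",".join(any_of)
    if not values:
        raise ValueError("give at least one value")
    return f"more:p:{obj}-{attribute}:{values}"


# Bio contains both words.
print(cse_operator("metatags", "og_description", all_of=["java", "berlin"]))
# Bio contains either word.
print(cse_operator("metatags", "og_description", any_of=["python", "golang"]))
# The Dev.to bio operator from the earlier post.
print(cse_operator("person", "description", any_of=["java"]))
```

Paste the printed string into the search box of the relevant Custom Search Engine, alongside any plain keywords.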

Objects and Values to Query

Objects (like Person of schema.org) and values (like employer=”IBM”) are invisibly included in web pages’ source code, in its part called “PageMap”. The big deal is – you can search within objects and their values using CSE operators. PageMap includes data following a variety of standards: Schema.org, Microformats, and others, and also a part called “Metatags”.

In our particular case, a GitHub Bio is stored in Metatags under the tag “og:description” (and is also duplicated under “twitter:description”). I found it by examining the JSON output from a CSE API call:

"metatags": [
  {
    "viewport": "width=device-width",
    "fb:app_id": "1401488693436528",
    "twitter:image:src": "https://avatars1.githubusercontent.com/u/447033?s=400&v=4",
    "twitter:site": "@github",
    "twitter:card": "summary",
    "twitter:title": "garris – Overview",
    "twitter:description": "Works at LinkedIn. Lives in Berkeley. Likes a nice hike. – garris",
    "og:image": "https://avatars1.githubusercontent.com/u/447033?s=400&v=4",
    "og:site_name": "GitHub",
    "og:type": "profile",
    "og:title": "garris – Overview",
    "og:url": "https://github.com/garris",
    "og:description": "Works at LinkedIn. Lives in Berkeley. Likes a nice hike. – garris",
    "profile:username": "garris",
    "pjax-timeout": "1000",
    "request-id": "895E:41D8:7B30:E042:5D3F23D9",
    "octolytics-host": "collector.githubapp.com",
    "octolytics-app-id": "github",
    "octolytics-event-url": "https://collector.githubapp.com/github-external/browser_event",
    "octolytics-dimension-request_id": "895E:41D8:7B30:E042:5D3F23D9",
    "octolytics-dimension-region_edge": "iad",
    "octolytics-dimension-region_render": "iad",
    "analytics-location": "/\u003cuser-name\u003e",
    "google-analytics": "UA-3769691-2",
    "dimension1": "Logged Out",
    "hostname": "github.com",
    "expected-hostname": "github.com",
    "js-proxy-site-detection-payload": "MTUyMTUyNGE4ODJhNTRkMmFkZGU3NjFlOTA5ZTllNTNmZDg1NzZmN2UwZTM1YzhlOWQ5YjAxNGEyZTBhMDk0Ynx7InJlbW90ZV9hZGRyZXNzIjoiNjYuMjQ5LjY2LjIxNiIsInJlcXVlc3RfaWQiOiI4OTVFOjQxRDg6N0IzMDpFMDQyOjVEM0YyM0Q5IiwidGltZXN0YW1wIjoxNTY0NDE5MDM1LCJob3N0IjoiZ2l0aHViLmNvbSJ9",
    "enabled-features": "MARKETPLACE_FEATURED_BLOG_POSTS,MARKETPLACE_INVOICED_BILLING,MARKETPLACE_SOCIAL_PROOF_CUSTOMERS,MARKETPLACE_TRENDING_SOCIAL_PROOF,MARKETPLACE_RECOMMENDATIONS,MARKETPLACE_PENDING_INSTALLATIONS",
    "html-safe-nonce": "1ef7c04a79f7c74d7ed950ed690d277292296f65",
    "browser-stats-url": "https://api.github.com/_private/browser/stats",
    "browser-errors-url": "https://api.github.com/_private/browser/errors",
    "theme-color": "#1e2327"
  }
]

One last step and you will catch up with me on the subject. I am going to tell you how I obtained the JSON sample pasted above.

Running CSE API Calls

The APIs query CSEs from software code. It’s also possible to run an API call from your browser address bar.

Using the APIs requires obtaining a KEY (long coded string) from Google, available here. Input for an API call is a KEY, a CSE ID (a value you can copy from the Control Panel), a query string, and (optional) parameters.

You can run an API query from your browser in the following fashion:

https://www.googleapis.com/customsearch/v1?key=KEY&cx=CSEID&q=a

– it will look like this:

An API call produces a JSON-formatted output page that you can browse to figure out the operator formats.

While you can examine a page’s structure with various tools (including CSEs themselves), these API JSON outputs provide “the” most accurate information for assembling CSE operators.
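For those who prefer a script to the address bar, here is a minimal sketch of the same call. It uses the endpoint and the key, cx, and q parameters shown above, and reads the metatags from each result's pagemap. KEY and CSEID are placeholders you must supply, the function names are mine, and the rule that turns a tag name into an operator is inferred only from the single og:description example, so other tags may be normalized differently.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def api_url(key, cse_id, query):
    """Build the API call URL from a KEY, a CSE ID, and a query string."""
    return API_ENDPOINT + "?" + urlencode({"key": key, "cx": cse_id, "q": query})


def run_query(key, cse_id, query):
    """Perform the live call; this needs a real KEY and CSE ID."""
    with urlopen(api_url(key, cse_id, query)) as reply:
        return json.load(reply)


def metatags_by_link(response):
    """Map each result link to its first metatags dictionary (empty if none)."""
    found = {}
    for item in response.get("items", []):
        tags = item.get("pagemap", {}).get("metatags", [])
        found[item.get("link")] = tags[0] if tags else {}
    return found


def operator_for(tag):
    """Assumed rule: colon becomes underscore, as in og:description above."""
    return "more:p:metatags-" + tag.replace(":", "_") + ":"


# Offline demonstration on a trimmed response shaped like the sample above.
sample = {"items": [{
    "link": "https://github.com/garris",
    "pagemap": {"metatags": [{
        "og:description": "Works at LinkedIn. Lives in Berkeley. Likes a nice hike. – garris",
    }]},
}]}
for link, tags in metatags_by_link(sample).items():
    print(link, "|", tags.get("og:description"))
print(operator_for("og:description"))
```

With real credentials, `metatags_by_link(run_query(KEY, CSEID, "a"))` would list the tags available for each result, which is the information needed to assemble an operator.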

Final Word

Querying structured info on the web is incredibly powerful. It may seem “too technical”, but that is mostly due to odd-looking strings of parameters that create that impression. (But you don’t need to “read” parameters, you just need to copy and paste.) Maybe one day, Google (or someone) will attach a friendly UI to Google CSEs’ structured web search. In the meantime, follow the links to search in Github Bios and definitely try Social List.

 

Hack: Use 500 Keywords, Not 32, on Google

booleanstrings Boolean Leave a Comment

Google’s limit of keywords is 32. It’s a challenge for long OR searches, especially for diversity sourcing – for example, searching for women’s first names, Latino last names, or diversity colleges. I am no fan of long ORs on Google (definitely not to list synonyms for a word), but in cases like the above, or searching for target companies or locations, I admit, long ORs would be useful, and 32 is limiting.

I have recently thought of a way to push the number of search terms much further. You can do it via the Synonyms feature of Google Custom Search Engines (CSEs).

Google and CSEs will automatically search for synonyms – it is a “built-in” feature. However, if you want to identify related words that may not quite be considered synonyms, the Synonyms mechanism in CSEs allows that. The limit is 500 terms and 10 synonyms for each term. Synonyms can be defined in a special XML file and uploaded. Keeping synonyms in a file and editing with a simple tool like Notepad++ seems more convenient than editing online.

You can define legitimate synonyms (for example, CV = “curriculum vitae” = resume) to help yourself and your end-users. But nobody will check whether the synonyms you enter are “correct”. You may want to play with the setting, defining words with different meanings as synonyms and see what happens. (Define “top sourcer” as <your name>? Just kidding.)

If you love long OR statements, you can enter up to 500 “synonyms” – some of which can be phrases – for an “artificial” keyword like mysynonyms, in the CSE Synonyms setting, and you will be able to push the limit of keywords from 32 to beyond 500!
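If you keep your terms in a script or spreadsheet, the synonyms file can be generated rather than typed. The sketch below is written from my recollection of the upload format (a Synonyms root with Synonym entries holding Variant children); treat those element names as an assumption and compare the output with a file exported from your own CSE before uploading. The names in the example list are arbitrary, and the sketch does not enforce the limits mentioned above.

```python
import xml.etree.ElementTree as ET


def synonyms_xml(term_to_variants):
    """Serialize {term: [variants]} into a synonyms XML string.

    The element names (Synonyms, Synonym, Variant) are assumed; verify
    them against a file exported from the CSE control panel.
    """
    root = ET.Element("Synonyms")
    for term, variants in term_to_variants.items():
        entry = ET.SubElement(root, "Synonym", term=term)
        for variant in variants:
            ET.SubElement(entry, "Variant").text = variant
    return ET.tostring(root, encoding="unicode")


# "mysynonyms" is the artificial keyword; the variants stand in for a long OR.
xml_text = synonyms_xml({"mysynonyms": ["maria", "anna", "mary ann"]})
print(xml_text)
```

Searching the CSE for mysynonyms plus your other keywords would then stand in for the long OR statement.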

By the way, David Galley and I have a CSE eBook in the works; stay tuned! In the meantime, check out our first two books, “300 Best Boolean Strings” and “Sourcing Hacks”.

Googling: Science or Art?

booleanstrings Boolean Leave a Comment

When Google started out, it had a database of indexed pages searchable by keywords and advanced search operators such as site:. Gradually, Google began adding semantic search features. (It has been reworking its storage, Index, accordingly, to contain “knowledge” type of data about stored pages). Here are the most significant semantic-oriented additions over the years.

  • A while ago, Google started searching for words with the same root (“auto-stemming”).
  • About five years ago, we also started seeing keyword synonyms in the results.
  • In recent years, Google has started showing featured previews and Knowledge Graph objects, in addition to search results. Pages containing structured information are rewarded by custom snippets (an example would be a page about a movie, shown along with a star rating).
  • In 2016, with the introduction of RankBrain, Google started to look at the query context.
  • A month ago, with the BERT update, Google started paying attention to short words that it previously discarded as stop words, to discover query meaning.
  • Both RankBrain and BERT are AI-based, so they will work better with time.
  • And here is something new and interesting – Google is taking personalized search a step further, having just filed a patent on building user graphs.

Science vs. Art

Google keeps its support for operators, giving us control over search results, and at the same time is greatly expanding semantic features, providing us with the most relevant results. If you use only operator-based search, put all your keywords in quotation marks, or run all searches in the Verbatim mode, you are missing out on powerful semantic search capabilities. Running queries out of a saved spreadsheet or Boolean builder has never worked well, and it will now provide even more limited results than before.

Semantic search functionality is not just for the “simple-minded” user who Googles with a few keywords; it’s applicable to advanced research. The trick for best search experience (how do you like the expression? 😉) is to take advantage of both operator-controlled searches and Google’s interpretation of searches. (It’s not either-or – Google interprets all queries, including those containing operators).

As a metaphor, advanced Googling is like a combination of Science and Art, with the Art part continuing to grow. In practice, Googling needs both the left and right sides of your brain activated. Googling requires knowledge of search operators but also intuition, creativity, curiosity, resourcefulness, and natural intelligence. Everyone has these qualities; it’s a matter of putting them to work.

Looking at multiple examples of creative Googling, reproducing them, and modifying them to fit your practice are the fastest ways to up your Google-Fu. You can do so as soon as next week by attending the Advanced Google Sourcing Workshop and asking follow-up support questions. Hope to “see” you there!