Guest post by Glenn Gutmacher
In a recent post, Irina described an intriguing follow-up to a discovery made by Dan Russell. In short, the initial discovery was that Google stores its web-crawled image results in an index completely separate from its index of regular webpages. What Irina realized is that the same query run on Google Images could yield very different results from those of a regular Google search.
The significance, as she explained, is that a regular LinkedIn x-ray (site: search) might return only a limited number of results, but running the same search on Google Images surfaces many additional relevant results that are missing from the regular results. Using simple web scraping and filtering in Excel, you can quickly get all the unique results out of the combined set. That combined set often exceeds 1,000 results, which was the maximum Google used to return in the old days; today it rarely displays even a third of that!
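For readers who would rather script the deduplication than filter in Excel, here is a minimal Python sketch of the same idea. It assumes you have already collected the result URLs into two lists; the function names and the normalization rules (ignore the "www." prefix, query strings, trailing slashes and letter case) are my own illustrative assumptions, not something the search engines document.

```python
from urllib.parse import urlsplit


def normalize_profile_url(url):
    """Reduce a URL to a comparison key: host plus path.

    Assumption: two URLs point at the same profile if they differ only
    by a "www." prefix, a query string, a trailing slash, or letter case.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/").lower()
    return host + path


def unique_results(*result_lists):
    """Merge several lists of URLs, keeping the first copy of each profile."""
    seen = set()
    unique = []
    for results in result_lists:
        for url in results:
            key = normalize_profile_url(url)
            if key not in seen:
                seen.add(key)
                unique.append(url)
    return unique


# Hypothetical sample data, not real search results.
web_results = [
    "https://www.linkedin.com/in/jane-doe",
    "https://www.linkedin.com/in/john-roe/",
]
image_results = [
    "https://linkedin.com/in/jane-doe/?trk=x",
    "https://www.linkedin.com/in/ann-poe",
]
combined = unique_results(web_results, image_results)
```

With the sample data, the second copy of the jane-doe profile is dropped and three URLs remain.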
SO WHAT’S NEW TO LEARN?
Why was I invited to write this guest post? Because Irina and I discussed whether this phenomenon might be the case on other search engines, and I agreed to help prove it on the other biggie known for good LinkedIn search results: Bing.com.
I used Bing’s default Safe Search: Moderate filter in all cases. For Bing’s default (“All”) search, I obtained 997 results for her same query of site:www.linkedin.com/in “registered nurse” dallas tx
Note that the total is significantly more than the approximately 350 that Irina reported for this query on Google (I got 313 results when I googled it). This alone might motivate us to use Bing for more LinkedIn searching!
However, it gets even more exciting when you add in Bing Images search. To be as apples-to-apples as possible, I initially used Bing Images’ filters to try to match the same criteria/settings that Irina used in her Google test: People → All (this gets faces as well as photos) and image size of 200 x 200.
This yielded 485 results. However, I found that searching images with the same query but without any filters yielded a few more (507 in total, or 22 additional). Every one of them was still an individual LinkedIn profile, given the specificity of the site:linkedin.com/in portion of the query.
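If you want to run the same query against both the regular and the image search, the two search URLs can be built from a single query string. The sketch below is only an illustration: the two Bing endpoint URLs are my assumption based on what appears in the browser address bar, and the filter settings described above are not encoded here.

```python
from urllib.parse import urlencode

# Assumed endpoints; they are not stated anywhere in this post.
BING_WEB = "https://www.bing.com/search"
BING_IMAGES = "https://www.bing.com/images/search"


def build_search_urls(query):
    """Return the web and image search URLs for one query string."""
    params = urlencode({"q": query})  # percent-encodes the query text
    return {
        "web": BING_WEB + "?" + params,
        "images": BING_IMAGES + "?" + params,
    }


QUERY = 'site:www.linkedin.com/in "registered nurse" dallas tx'
urls = build_search_urls(QUERY)
print(urls["web"])
print(urls["images"])
```

Keeping the query in one place ensures both searches use exactly the same text, which is what makes the comparison apples-to-apples.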
THE BIG REVEAL
Now for the amazing conclusion: of the 507 Bing image search results and 997 regular Bing search results, the overlap of URLs was only 10. Yes, ten! So that provides 1,504 results in total, or 1,494 unique profiles once the 10 duplicates are removed, and the astonishingly low overlap of under 1% is remarkably similar to what Irina found in her Google test.
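The arithmetic behind those figures is easy to check. The short Python sketch below uses only the three counts reported above; the function name and the choice to express the overlap as a share of the unique total are my own.

```python
def overlap_summary(web_count, image_count, shared):
    """Summarize two result sets that have `shared` URLs in common."""
    combined = web_count + image_count   # every row, duplicates included
    unique = combined - shared           # each shared URL counted once
    return {
        "combined": combined,
        "unique": unique,
        "overlap_pct": round(100 * shared / unique, 2),
    }


# Counts from the Bing test: 997 regular results, 507 image results, 10 shared.
summary = overlap_summary(997, 507, 10)
print(summary)
```

For these counts the summary shows 1,504 combined results, 1,494 unique profiles, and an overlap of roughly two-thirds of one percent.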
I should note there were a handful of false positives in both the regular and image search results, where the profile belonged neither to a registered nurse nor to someone in Dallas. However, these weren’t really errors: if you looked in the “People also viewed” right-hand column of such a profile, there was inevitably one Registered Nurse and someone else with a Dallas TX location, so the page still matched the query terms. In any case, this doesn’t take away from the fact that the LinkedIn results generated from the image search were almost all *completely different* from the regular results.
So it appears that both Google and Bing are harvesting and processing images in a completely different way than other content, and it would behoove all sourcers to run each query on both the regular search and the image search if the goal is exhaustive, unique results.
My thanks to Irina for inviting me to write this, and I hope we can do it again sometime on another sourcing topic!
Editor’s Note: If you are not familiar with the common methods to download and deduplicate results like this, you can learn to use scraping tools in our class on December 8, 2020.
About the Author: Glenn Gutmacher was one of the early online sourcing trainers (back when it was called “internet recruiting”) and started one of the first job/resume boards for a New England newspaper chain in the ’90s. Since the new millennium, he has been a full-time sourcer or sourcing manager in multi-year stints at several multibillion revenue companies including Getronics, Microsoft, Avanade and currently State Street Corporation.