A “Dream Software” Design Proposal

booleanstrings · Boolean · 4 Comments

Disclaimer. This post is somewhat technical and doesn’t contain specific sourcing tips. It is relevant to my SourceCon Presentation, where I go over a specific kind of sourcing tool. Those tools are apparently gaining attention among recruiters. Over the next few weeks I am going to post detailed reviews of the tools (listed at the bottom of this post for your reference) here on the blog.

Here’s a “Dream Software” Design Proposal. The most challenging piece of the dream software, and also the key one, is connecting the parts of a distributed profile. End-users of the dream software don’t appreciate the challenge! For a human, the connection between two online profiles can be clear; for a computer it is not as easy, since all of the informal clues need to be formally coded.

If the dream software vendor is reasonably careful and glues profiles together only when it’s very clear that they belong to the same person, the end-user will complain about duplicates. If the vendor boldly makes guesses, profiles of different individuals will be incorrectly combined into one record, which is, in fact, even worse. (My coworker David Galley tests the software by searching for his own name; if you are one of the vendors, I recommend trying that search in your tool.)

In the proposed design we solve the “matching profiles from different sites” challenge upfront by working only with unique identifiers: an email address (work or private), a phone number, a combination of a person’s name and a company name that fits only one person, or an image. If a company uses a consistent email format, then for not-very-common first and last names we can reliably construct the work email address, which can be verified (using logic like this) in the process of creating a record. A person’s photo is an excellent identifier, since it is often the same across different social profiles.
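To make the email-construction step concrete, here is a minimal sketch in Python. The format names (`first.last`, `flast`, and so on) and the `build_work_email` helper are made up for illustration; discovering a company's real format and verifying that the address exists are separate steps that are not shown.

```python
import re
import unicodedata

# Hypothetical format templates. A real company's format has to be
# discovered separately (for example from a few known addresses).
FORMATS = {
    "first.last": "{first}.{last}",
    "flast": "{f}{last}",
    "firstl": "{first}{l}",
    "first": "{first}",
}


def _normalize(name: str) -> str:
    """Lowercase, strip accents, and drop anything that is not a-z."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z]", "", ascii_only.lower())


def build_work_email(first: str, last: str, domain: str, fmt: str) -> str:
    """Build a candidate work email; it still has to be verified."""
    first_n, last_n = _normalize(first), _normalize(last)
    if not first_n or not last_n:
        raise ValueError("both names must contain at least one letter")
    local = FORMATS[fmt].format(
        first=first_n, last=last_n, f=first_n[0], l=last_n[0]
    )
    return f"{local}@{domain.lower()}"


candidate = build_work_email("David", "Galley", "example.com", "first.last")
```

The result is only a candidate ID. Under the proposal it would enter the database only after verification, and only for names rare enough that the address points to one person.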

We start building the database with those identifiers. There’s a variety of ways to collect them from the open web. As an example, we could start with recent resumes posted online and take the email addresses in them as the IDs. That alone would yield a very large number of IDs. (Remember our sourcing challenge asking “How many resumes are there on the Internet?”) There are also sites that list attendees, members, etc. – as we teach each other in people sourcing discussions and classes. There are lists of professionals with contact info in Excel and PDF files. If that’s not enough, there are obscured email addresses across email list archives and the like.
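As a rough illustration of collecting email IDs from resume or list-archive text, the sketch below pulls out plain addresses and also undoes the bracketed style of obscuring, such as `name [at] example (dot) com`. The patterns and the `extract_emails` name are assumptions for this example. Plain-word forms ("name at example dot com") are deliberately left out, because replacing every "at" in running text would produce false matches.

```python
import re

# A pragmatic pattern, not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# Bracketed obfuscations only: [at], (at), {at} and [dot], (dot), {dot}.
AT_RE = re.compile(r"\s*[\[\(\{]\s*at\s*[\]\)\}]\s*", re.IGNORECASE)
DOT_RE = re.compile(r"\s*[\[\(\{]\s*dot\s*[\]\)\}]\s*", re.IGNORECASE)


def extract_emails(text: str) -> list[str]:
    """Return lowercased, de-duplicated email addresses in order of appearance."""
    deobfuscated = DOT_RE.sub(".", AT_RE.sub("@", text))
    found = []
    for match in EMAIL_RE.findall(deobfuscated):
        email = match.lower()
        if email not in found:
            found.append(email)
    return found


sample = "Reach me: Jane.Doe@Example.com. Or j.doe [at] mail (dot) example (dot) org, thanks"
ids = extract_emails(sample)
```

Each address found this way becomes a starting ID for a record; nothing is merged or guessed at this stage.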

From the unique IDs we go to various social networks and blogs to pick up additional information by cross-referencing. We know that an email address identifies the member on all major networks, including LinkedIn, Twitter, Facebook, Google+, and more. If we can be friends with Rapportive/LinkedIn, or just with LinkedIn, we get a head start on cross-referencing. In fact, having an agreement with LinkedIn is especially important; in the worst case, if this is not accomplished, a public LinkedIn profile can be picked up dynamically.

For any social profile that lists other profiles – this often happens on Google+, but not only there – we add those profiles to the person’s record as well. Mind you, we are still confident that these are the same person’s social profiles.
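One way to picture the "glue only on unique identifiers" rule is the sketch below, which joins profiles into one record only when they share an exact identifier (an email address or a profile URL that one profile lists for another). It uses a small union-find structure. The input shape (`source` and `ids` keys) and the `merge_by_identifiers` name are assumptions for illustration, and lowercasing every identifier is a simplification that may be wrong for case-sensitive URLs.

```python
from collections import defaultdict


def merge_by_identifiers(profiles):
    """Group profiles that share at least one exact identifier.

    Names are never compared, so two people with the same name
    stay in separate records unless an identifier links them.
    """
    parent = list(range(len(profiles)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen = {}  # normalized identifier -> index of first profile using it
    for index, profile in enumerate(profiles):
        for identifier in profile["ids"]:
            key = identifier.strip().lower()  # simplification, see lead-in
            if key in first_seen:
                union(index, first_seen[key])
            else:
                first_seen[key] = index

    groups = defaultdict(list)
    for index in range(len(profiles)):
        groups[find(index)].append(index)

    return [
        {
            "sources": sorted(profiles[i]["source"] for i in members),
            "ids": sorted(
                {x.strip().lower() for i in members for x in profiles[i]["ids"]}
            ),
        }
        for _, members in sorted(groups.items())
    ]


profiles = [
    {"source": "resume", "ids": ["dg@example.com"]},
    {"source": "linkedin", "ids": ["DG@example.com", "https://twitter.example/dg"]},
    {"source": "twitter", "ids": ["https://twitter.example/dg"]},
    {"source": "github", "ids": ["other@example.com"]},
]
records = merge_by_identifiers(profiles)
```

In this made-up data the resume, LinkedIn, and Twitter entries end up in one record because identifiers chain them together, while the GitHub entry stays separate. That is the duplicate-rather-than-wrong-merge trade-off described earlier.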

We don’t do much else. Rather, we carefully parse the info obtained by cross-referencing, collect it into our database, and provide reasonable faceted search for the end-user. Parsing can be specifically implemented for a few dozen social networks and forums (whose HTML formats we’ll need to watch for changes). For online resumes we can rely on a resume parsing tool.
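For readers unfamiliar with the term, faceted search means filtering records by field values and showing, for each field, how many of the remaining records carry each value. Below is a toy in-memory sketch; the field names (`skills`, `location`) and the `faceted_search` function are invented for the example, and a real product would more likely sit on top of a search engine.

```python
from collections import Counter


def faceted_search(records, filters, facet_fields):
    """Return the matching records and per-field value counts among them.

    Each facet field in a record is expected to hold a list of strings.
    """
    hits = [
        record
        for record in records
        if all(value in record.get(field, ()) for field, value in filters.items())
    ]
    counts = {field: Counter() for field in facet_fields}
    for record in hits:
        for field in facet_fields:
            counts[field].update(record.get(field, ()))
    return hits, counts


people = [
    {"name": "A", "skills": ["python", "sql"], "location": ["Boston"]},
    {"name": "B", "skills": ["python"], "location": ["Austin"]},
    {"name": "C", "skills": ["java"], "location": ["Boston"]},
]
hits, counts = faceted_search(people, {"skills": "python"}, ["skills", "location"])
```

Here the filter keeps records A and B, and the counts tell the end-user how the results split by location and by other skills before they narrow the search further.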

If there are other proven ways to cross-reference (rather than guess) more social profile data from the data already collected in the records, we’d implement those as well.

While every People Sourcer and all the Dream Software tools do cross-referencing, we’ll need to be extremely careful about privacy issues and explore how to best address them.

If anyone is up for funding the proposal, just give me a ring, will you?

Thanks! I will be reviewing the existing Dream Software tools in the upcoming blog posts. I will also be sharing an additional design idea for the existing tools that comes directly from my experience with LinkedIn’s Talent Pipeline.

In the meantime, please take a look at some of the tools (repeating, just in case: I am not affiliated with any):

  1. TalentBin
  2. Dice Open Web
  3. TheSocialCV
  4. Entelo
  5. RemarkableHire
  6. Gild

Please stay in touch about your experiences!


Comments 4

  1. Great post, Irina! We’re happy you’re drawing attention to the challenges of matching social profiles. While other aspects of our platforms get more coverage in our marketing, this is a challenging and very important feature that, if done poorly, leads to some really confusing and frustrating results for our customers. At RemarkableHire, we tend to err on the side of conservatism, as we believe that showing two profiles for the same individual is slightly preferable to merging two different people into the same profile. Humans can identify and accommodate the former, but the latter can result in frustrated sourcers, annoyed candidates, or worse.

  2. I’m on the go so will post more later, but suffice it to say, Irina, you’ve hit the nail on the head in terms of one of the biggest challenges in building the “Dream Software”. We’ve been very focused on “data quality” (our internal term for this challenge) from day one and, based on our spot checks and those of other customers who’ve evaluated Entelo vs. other tools, we feel strongly that we have the best tool in the market. Our goal is not to max out the absolute number of results returned for a given search (incidentally, poor matching can often lead to more search results, which looks impressive until you dig into the quality of those results). Rather, it’s about making sure that profiles are matched with utmost accuracy and that relevant people are returned for a given search.

    We should discuss a longer analysis of this as I’d be happy to share more of my thoughts. In the meantime, I did run a search for “David Galley” just now and we return accurate LinkedIn, Github, Google+, Twitter, StackExchange, Facebook, personal website and email address for him. Not sure where else he’s hanging out online but this was a pretty successful spot check. 🙂

    Looking forward to continuing the conversation!

  3. Excellent post. The problem with unique identifiers is that they are considered sensitive information by the sites where you are collecting the information. Sites like LinkedIn cannot openly provide that information unless the individual OKs it. For instance, LinkedIn’s own API is very easy to use and sign up for, but the user still has to make their email address available to you, as seen here: http://developer.linkedin.com/documents/profile-fields#email

    Unless you can convince users to give their permission to use their unique identifiers by authenticating, any software that you create will be nothing more than an aggregator, with an aggregator’s inherent inaccuracies.

    1. Post

      Thank you, Jason! It’s true in many ways and is a (or “the”) point to be addressed. However, Rapportive is owned by LinkedIn and does exactly *that* without asking permission, which provides some food for thought…

      Also what I should have pointed out in the design description is that the IDs should not be easily visible. They can be used on the back end but should not be available for mass-exporting.
