Are You an Advanced Sourcer or Researcher? Can You Answer These 12 Questions?

booleanstrings Boolean 29 Comments

Hello Sourcers and Internet Researchers:

Here are some questions based on just one collection of professional data with an interesting implementation: documents uploaded to LinkedIn by its members.

How good are you at understanding what data can be found and how?

Please post your answers as comments – and please provide your reasoning for the answer. Responses to only some of the questions are welcome. I anticipate a nice discussion here!

[Updated!]  We have launched a contest based on these questions!

Deadline: Monday October 24th.

Here you go:

If a LinkedIn member, say, Joe D. wants to upload a resume (or another document) to his profile from his PC, LinkedIn will ask him whether he would like to store the document on Slideshare or not. (If you haven’t, try it, you’ll see.)

Suppose Joe said “no” to posting the resume on Slideshare and uploaded it to the profile. When we view Joe’s profile, while logged-in, we now see Joe’s document’s preview as an image (or series of images if there is more than one page).

  1. Joe said “no” to storing on Slideshare, so where (to which site) did it go?
  2. Is a preview available on Joe’s public profile? That means, is it available when you are not logged-in?
  3. Can the preview image(s) be viewed in an incognito window, i.e. without logging into LinkedIn?
  4. [Difficult] Can we download the original resume or document (say, PDF or Word)? From which site? Please note, the question is about finding the original doc, not trying to recreate it by using “print to PDF” or character recognition. 10 points
  5. (Easy) What does LinkedIn call the part of the profile that stores those uploaded documents?
    • (5a, harder) Is there a URL pointing to that part of the profile (for logged-in members)?
  6. Can we find Joe’s profile by searching on LinkedIn “people search” …
    • (6a) for the resume keywords, if they are not included elsewhere on his LinkedIn profile?
    • (6b) for keywords in the title and description, that Joe adds when he uploads the document to the profile?
  7. Is there a URL that you can share with a colleague so they can see the resume when it has more than one page (for example, three pages), not on LinkedIn but on the site that stores the resume?
  8. Will Google find the uploaded original document if there are no other copies of it online? The options are “yes,” “no,” and “sometimes, when…”
  9. (This is a tough one!) Will Google find the uploaded original document’s image preview in the Image search? The options are “yes,” “no,” and “sometimes, when…”
  10. Can Yandex find Joe’s original resume?
  11. Can Bing find it?
  12. On which cloud is the document stored (Amazon, etc.)?
  13. BONUS Q: If a LinkedIn member opts-out of Slideshare when uploading a document, the document is still posted on Slideshare (as we now know) under some “user” account. With what email domain has that “user” registered with Slideshare? (you might want to find that user’s profile URL for starters). 10 points

To help you, here’s a (randomly picked) example of a profile with an uploaded resume.

What say you? 🙂

I’ll reveal the answers in a future post.

-Irina

P.S. Just launched a formal contest around these questions. I think these are great questions for anyone who searches and wants to understand the Surface, Deep, and Dark Web.

Comments 29

  1. Hi Irina,

    Answers for the above post,

    1) Even if the user says “no” to storing on Slideshare, the document will still go to Slideshare, but it will not be updated on Slideshare profiles.

    2) No ( Preview will be available only for logged in users ).

    3) Yes preview image(s) can be viewed in an incognito window.

    4) Yes, we can download resumes as pdf or doc from slideshare.net.

    5) Add Media
    5a) Yes (Link under Add media for URL pointing)

    6) 6a) No (It will not consider)
    6b) No (It will not consider)

    7) Yes (We can give embed_code of slideshare link)

    8) No

    9) Sometimes (when its uploaded or saved in slideshare)

    10) No

    11) No

      1. 13) A Slideshare account is created automatically based on the email address registered with LinkedIn, if the member does not already have an account on Slideshare.

  2. My answers are mostly wrong.

    1. Still goes to slideshare http://www.slideshare.net/secret/
    2. Yes if we are connections
    3. No
    4. I like this option
    https://www.slideshare.net/slideshow/embed_code/key/Gmo5sXbAxhFzXR
    Either print to PDF or save to Google Drive and convert

    5a. CreativeWork
    5b. https://media.licdn.com/media-proxy/ext

    6a. No
    6b. No

    7. https://image-private.slidesharecdn.com
    https://www.slideshare.net/slideshow/embed_code/key/

    8. Sometimes when it is indexed. Finding TOP Presentations here linkedin.com/topic/

    9. Sometimes when they are indexing slideshare and an https://image-private.slidesharecdn.com has not expired

    10 & 11. No clue ran out of time

    1. 7- After viewing a resume, scroll to the end and right click on “view again” to copy the url to track back to a resume.
      https://www.slideshare.net/slideshow/embed_code/key/NyYmwYV23Y1W6T#

      Then, run a view-source command, entering this into your browser:
      view-source:https://www.slideshare.net/slideshow/embed_code/key/NyYmwYV23Y1W6T#

      Using the view-source command, I can find the exact url for the file listed above, https://www.slideshare.net/linkedincontent_4a5/mcl2016or-pdf but if the media is private, it will not be shown.

      To find the url for the private image, scroll through the source information until you find a url that begins with https://image-private.slidesharecdn.com like below –
      https://image-private.slidesharecdn.com/89d604e1-cf8c-4e54-b63f-2915c43ced71-160113053635/95/slide-1-638.jpg?hdnea=exp=1476394099~acl=/89d604e1-cf8c-4e54-b63f-2915c43ced71-160113053635/95/slide-1-638.jpg*~hmac=b7db717ebe8458602d3f25e0b754b644049704e3111a527c7f03233b1f322161&cb=1452663401

      To check for multiple pages, look for slide-2, slide-3, and so forth; each page will have a unique url. Thus, the second page of this resume is url – https://image-private.slidesharecdn.com/89d604e1-cf8c-4e54-b63f-2915c43ced71-160113053635/85/slide-2-320.jpg?hdnea=exp=1476394099~acl=/89d604e1-cf8c-4e54-b63f-2915c43ced71-160113053635/85/slide-2-320.jpg*~hmac=3ec5512cad61f6d688862bbcac110d9f47813df2f37c2b52151fb0f11f91d4dd&cb=1452663401
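
      The signed query string on these private image URLs can be picked apart with a few lines of Python. This is only a sketch: `parse_signed_url` is a made-up helper name, and reading `exp` as a Unix timestamp in seconds is an assumption based on how the value looks, not on any Slideshare documentation.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# Example URL copied from the comment above.
EXAMPLE = (
    "https://image-private.slidesharecdn.com/"
    "89d604e1-cf8c-4e54-b63f-2915c43ced71-160113053635/95/slide-1-638.jpg"
    "?hdnea=exp=1476394099~acl=/89d604e1-cf8c-4e54-b63f-2915c43ced71-160113053635"
    "/95/slide-1-638.jpg*~hmac=b7db717ebe8458602d3f25e0b754b644049704e3111a527c7f03233b1f322161"
    "&cb=1452663401"
)


def parse_signed_url(url):
    """Split the hdnea token into its exp/acl/hmac fields (hypothetical helper)."""
    query = parse_qs(urlsplit(url).query)
    token = query["hdnea"][0]
    fields = {}
    for item in token.split("~"):
        key, _, value = item.partition("=")
        fields[key] = value
    # Assumption: exp is a Unix timestamp in seconds.
    fields["expires"] = datetime.fromtimestamp(int(fields["exp"]), tz=timezone.utc)
    return fields


info = parse_signed_url(EXAMPLE)
print(info["expires"].isoformat())  # when the link is expected to stop working
print(info["acl"])                  # the path the signature is tied to
```

      If the assumption about `exp` holds, it would explain why a copied preview link works for a while and then stops.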

    2. #4 – when we open preview on Linkedin we can open image in new tab by right clicking.

      https://image-private.slidesharecdn.com/89d604e1-cf8c-4e54-b63f-2915c43ced71-160113053635/95/slide-1-638.jpg?hdnea=exp=1476630176~acl=/89d604e1-cf8c-4e54-b63f-2915c43ced71-160113053635/95/slide-1-638.jpg*~hmac=fc578b1e69c694c31722d663a4c8d6642e32462145074dba514bee3729f5c6b9&cb=1452663401

      this is the image with expiration and hash. We can replace the domain as shown and remove everything after ? to get the permanent address

      https://image.slidesharecdn.com/89d604e1-cf8c-4e54-b63f-2915c43ced71-160113053635/95/slide-1-638.jpg

      Alternatively you can get the embed code from the source with the proper key
      https://www.slideshare.net/slideshow/embed_code/key/NyYmwYV23Y1W6T

      In Dev Tools > Sources > image-private.slidesharecdn.com > 95, you have the small and larger versions of the 2 pages where they can easily be saved as jpg (1024 x 768)

      I don’t think the original file exists once it is uploaded, only the rendering of it. As Katie has shown, you can get the link to the PDF, but only if it is not marked as private content.

      #8 – The image title is saved as metadata, so the images may be indexed with this data. Their OCR data appears to index other words in the text as well.

      #12 – Amazon S3 buckets should not be indexed and should not be searchable.
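
      The domain swap described under #4 in this reply (replace the private host and drop everything after the `?`) can be sketched in Python as follows. The function name `to_public_image_url` is hypothetical, and whether the resulting address actually resolves is the commenter's observation, not something this snippet checks.

```python
from urllib.parse import urlsplit, urlunsplit


def to_public_image_url(private_url):
    """Swap image-private. for image. and drop the signed query string."""
    parts = urlsplit(private_url)
    host = parts.netloc.replace("image-private.", "image.", 1)
    # Empty query and fragment: everything after "?" is discarded.
    return urlunsplit((parts.scheme, host, parts.path, "", ""))


private = (
    "https://image-private.slidesharecdn.com/"
    "89d604e1-cf8c-4e54-b63f-2915c43ced71-160113053635/95/slide-1-638.jpg"
    "?hdnea=exp=1476630176~acl=/89d604e1-cf8c-4e54-b63f-2915c43ced71-160113053635"
    "/95/slide-1-638.jpg*~hmac=fc578b1e69c694c31722d663a4c8d6642e32462145074dba514bee3729f5c6b9"
    "&cb=1452663401"
)
public = to_public_image_url(private)
print(public)
```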

  3. Brain break! Here’s what I have so far…

    1 – It is still stored within slideshare, either image-private.slidesharecdn.com or cdn.slidesharecdn.com
    2 – yes, if you are logged in to Linkedin
    3 – no
    4 – https://www.slideshare.net/search/slideshow
    5 – media
    5a – https://media.licdn.com/mpr/mpr/filename.filetype
    example of the above: https://media.licdn.com/mpr/mpr/shrink_100_100/AAEAAQAAAAAAAAIRAAAAJGM1NjNjYWJlLTgwN2EtNDk5OS1hNzNlLTk5NTM5NTU4MDA2Mg.jpg

  4. #13 the profile for Matthew or (in aggregate, profiles on Slideshare with media)? Matthew’s email domain is gmail.

  5. 8 – sometimes, when the document is uploaded and made public
    9 – yes, if google can find the document, then the image is also available and a preview image can be viewed.
    10 – yandex does not appear to index this content
    11 – yes, bing indexes this content

  6. 1. Item will be stored on Slideshare
    2. Yes, if connected
    3. No
    4. Can be downloaded directly from Slideshare
    5. Professional Gallery
    6. a no
    6. b yes
    7. Yes, on Slideshare
    8. Yes,
    9. No
    10. no
    11. no
    12 IDK

  7. 1. It’s stored on the “content delivery network” of Slideshare, e.g. (private preview) image-private.slidesharecdn.com or (public preview) image.slidesharecdn.com or (files) cdn.slidesharecdn.com, similarly others *.slidesharecdn.*
    2. Nope. You have to be logged into LinkedIn to get the preview links. However, if you already have the link to the preview, then that is a publicly viewable link, e.g. https://goo.gl/6kU3f8

    3. Yes, if you already have a link to the preview, e.g. https://goo.gl/8tx8Ly

    4. You can only download original documents that were published publicly; private files are print-to-PDF only. Find the full Slideshare link in the source code of the preview (Inspect): <iframe src="https://goo.gl/FhngMO" …

    Sometimes this works for private files (kinda buggy): https://goo.gl/3cvKMi

    5. LinkedIn Engineers tagged it “treasury”. The URL sub-domain calls it “portofolio”. LinkedIn Help calls it the Summary, Education, and Experience sections on your profile (https://goo.gl/jhfuz6)

    5a. Absolutely click on share on any piece of portofolio and you will see something like this: https://goo.gl/9tJXmr

    Similarly, you can download people’s profiles into a PDF like this: https://goo.gl/6wIf52

    Or view a person’s profile using their UID: https://goo.gl/ms3TyA

    6a. No
    6b. No

    7. Yes. Simply change a number that you think most likely is a page number which usually start with 1 and is at the end:

    https://goo.gl/3bNnS6
    https://goo.gl/7AJgkT

    Or just simply:

    https://goo.gl/Mcx2lJ
    https://goo.gl/gRXTzR

    8. Sometimes, when indexed and the file has been shared publicly and you are searching, e.g. https://goo.gl/pLqycL (first one, visit page)

    9. Sometimes, when indexed and the file has been shared publicly and you are searching, e.g. https://goo.gl/pLqycL (first one, visit image)

    10. No luck, I don’t know how to use it. 😛

    11. Sometimes, when indexed and the file has been shared publicly and you are searching, e.g. https://goo.gl/VgKAsf

    12. It used to be AWS (https://goo.gl/AtN42C), but now it’s a combination of their Expresso platform (https://goo.gl/lTV28V) running from leased Data Centers (https://goo.gl/kFeKZd)
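
    The page-number trick from answer 7 in this comment can be sketched in Python. The shortened links do not show their targets, so this assumes the URL follows the `slide-N-` naming seen in the image addresses quoted earlier in the thread; `page_url` is a hypothetical helper name.

```python
import re


def page_url(url, page):
    """Return url with its slide number replaced by page (hypothetical helper)."""
    new_url, count = re.subn(r"slide-\d+-", f"slide-{page}-", url)
    if count == 0:
        raise ValueError("no slide-N- segment found in URL")
    return new_url


# Public image address quoted earlier in the thread.
first_page = (
    "https://image.slidesharecdn.com/"
    "89d604e1-cf8c-4e54-b63f-2915c43ced71-160113053635/95/slide-1-638.jpg"
)
# Build the addresses for a three-page resume.
pages = [page_url(first_page, n) for n in (1, 2, 3)]
for address in pages:
    print(address)
```

    This only makes sense for unsigned addresses. On a signed private URL the path is covered by the `hmac` value, so editing the page number there would most likely invalidate the link.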

  8. Post
    Author

    Hey Sourcers:
    Monday October 24, 2016 is the deadline.
    Here is the rundown on responses so far. Questions #1-3: we have the answers. Nobody has answered question #4 correctly!!! Raising it to 10 points. We did get the answers for #5, #6, and #7 (you can still respond with the correct answers and get points; those who submitted first get an extra point). For #9, people said related things, but it needs a better explanation, so it is still open.
    I have added a bonus question, also 10 points.

    Best of luck!

  9. Updated #4 WIP:

    Current progress: signature doesn’t match @ https://goo.gl/pgSVwI

    (To reproduce this and get a new AWS file request with a non-expired request, download any file from Slideshare and get the URL while monitoring Network. Then, try to put the APIAccessKey into an API call or just brute force the signature #LOL)

    API Key: AKIAJ6D6SEMXSASXHDAQ

    Inspect the Sources and look for embedded content key folder and check its meta in the source code you will find:

    slideshow_url: https://www.slideshare.net/linkedincontent_4a5/mcl2016or-pdf

    More info here: https://goo.gl/23JR1o

    slideshow_id=56988771
    user id=linkedincontent_4a5
    filename=mcl2016or-pdf

    Good Read: https://goo.gl/KfnixI

    Try this method? https://goo.gl/QveIWX

    Updated #12: I was wrong. All documents are still stored on AWS.

    13. My guess is LI registers the user with the primary LI email (similar to SSO) unless the user already has an account on Slideshare with a different email. If not registered then user by default is: http://www.slideshare.net/secret/
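
    The `user id` and `filename` values listed earlier in this comment are simply the two path segments of the `slideshow_url`. A minimal Python sketch, with `split_slideshow_url` as a hypothetical helper name and the two-segment layout assumed from that single example:

```python
from urllib.parse import urlsplit


def split_slideshow_url(url):
    """Return (user, filename) from a /user/filename slideshow URL."""
    path = urlsplit(url).path.strip("/")
    user, _, filename = path.partition("/")
    return user, filename


user, filename = split_slideshow_url(
    "https://www.slideshare.net/linkedincontent_4a5/mcl2016or-pdf"
)
print(user, filename)
```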

  10. #Update 9: Yes, if the same file has also been uploaded publicly to Slideshare through Slideshare itself, without changing the filename. The generated preview image will obviously be the same. Googlebot will also index some of the preview images when people X-ray into LinkedIn from Google while logged into LI and then click to see the image. Sometimes Googlebot just happens to crawl and index the image.

    #Update 13: domain=linkedin.com
