Hello Sourcers and Internet Researchers:
Here are some questions based on just one collection of professional data that has an interesting implementation: documents uploaded to LinkedIn by its members.
How good are you at understanding what data can be found and how?
Please post your answers as comments – and please provide your reasoning for the answer. Responses to only some of the questions are welcome. I anticipate a nice discussion here!
[Updated!] We have launched a contest based on these questions!
Deadline: Monday October 24th.
Here you go:
If a LinkedIn member, say, Joe D. wants to upload a resume (or another document) to his profile from his PC, LinkedIn will ask him whether he would like to store the document on Slideshare or not. (If you haven’t, try it, you’ll see.)
Suppose Joe said “no” to posting the resume on Slideshare and uploaded it to the profile. When we view Joe’s profile while logged in, we now see a preview of Joe’s document as an image (or a series of images if there is more than one page).
1. Joe said “no” to storing on Slideshare, so where (to which site) did it go?
2. Is a preview available on Joe’s public profile? That is, is it available when you are not logged in?
3. Can the preview image(s) be viewed in an incognito window, i.e. without logging into LinkedIn?
4. [Difficult] Can we download the original resume or document (say, PDF or Word)? From which site? Please note, the question is about finding the original doc, not trying to recreate it by using “print to PDF” or character recognition. 10 points
5. (Easy) What does LinkedIn call the part of the profile that stores those uploaded documents?
   - (5a, harder) Is there a URL pointing to that part of the profile (for logged-in members)?
6. Can we find Joe’s profile by searching on LinkedIn “people search” …
   - (6a) for the resume keywords, if they are not included elsewhere on his LinkedIn profile?
   - (6b) for keywords in the title and description that Joe adds when he uploads the document to the profile?
7. Is there a URL that you can share with a colleague so they can see the resume if it has more than one page (for example, three pages), not on LinkedIn but on the site that stores the resume?
8. Will Google find the uploaded original document if there are no other copies of it online? The options are “yes,” “no,” and “sometimes, when…”
9. (This is a tough one!) Will Google find the uploaded original document’s image preview in the Image search? The options are “yes,” “no,” and “sometimes, when…”
10. Can Yandex find Joe’s original resume?
11. Can Bing find it?
12. On which cloud is the document stored (Amazon, etc.)?
13. BONUS Q: If a LinkedIn member opts out of Slideshare when uploading a document, the document is still posted on Slideshare (as we now know) under some “user” account. With what email domain has that “user” registered with Slideshare? (You might want to find that user’s profile URL for starters.) 10 points
To help you, here’s a (randomly picked) example of a profile with an uploaded resume.
What say you? 🙂
I’ll reveal the answers in a future post.
P.S. Just launched a formal contest around these questions. I think these are great questions for anyone who searches and wants to understand the Surface, Deep, and Dark Web.
Answers for the above post,
1) Even if the user says “no” to storing on Slideshare, it will still go to Slideshare, but it will not be updated on Slideshare profiles.
2) No ( Preview will be available only for logged in users ).
3) Yes preview image(s) can be viewed in an incognito window.
4) Yes, we can download resumes as pdf or doc from slideshare.net.
5) Add Media
5a) Yes (Link under Add media for URL pointing)
6) 6a) No (It will not consider)
6b) No (It will not consider)
7) Yes (We can give embed_code of slideshare link)
9) Sometimes (when it's uploaded or saved on Slideshare)
12) Documents are stored in Amazon S3 (AWS)
13) A Slideshare account is created automatically based on the LinkedIn registered email address, if the member does not already have an account on Slideshare.
1. Docs are still saved on a slideshare server https://cdn.slidesharecdn.com// etc
4. yes Slideshare
6. 6a. no 6b. Seems not to (but not sure as I uploaded something not a long time ago so maybe it wasn’t indexed yet?)
7. Yes and it’s a shortened url lnkd.in
9. Seems to not…
My answers are mostly wrong.
1. Still goes to slideshare http://www.slideshare.net/secret/
2. Yes if we are connections
4. I like this option
Either print to PDF or save to Google Drive and convert
8. Sometimes when it is indexed. Finding TOP Presentations here linkedin.com/topic/
9. Sometimes when they are indexing slideshare and an https://image-private.slidesharecdn.com has not expired
10 & 11. No clue, ran out of time
7- After viewing a resume, scroll to the end and right click on “view again” to copy the url to track back to a resume.
Then run a view-source command by entering view-source: followed by that URL into your browser's address bar.
Using the view-source command, I can find the exact url for the file listed above, https://www.slideshare.net/linkedincontent_4a5/mcl2016or-pdf but if the media is private, it will not be shown.
To find the URL for the private image, scroll through the source until you find a URL that begins with https://image-private.slidesharecdn.com, like the one below.
To check for multiple pages, look for slide-1, slide-2, and so forth; each page has a unique URL. Thus, the second page of this resume is https://image-private.slidesharecdn.com/89d604e1-cf8c-4e54-b63f-2915c43ced71-160113053635/85/slide-2-320.jpg?hdnea=exp=1476394099~acl=/89d604e1-cf8c-4e54-b63f-2915c43ced71-160113053635/85/slide-2-320.jpg*~hmac=3ec5512cad61f6d688862bbcac110d9f47813df2f37c2b52151fb0f11f91d4dd&cb=1452663401
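For anyone who wants to see what is packed into that long private URL, here is a minimal Python sketch that splits it apart. It assumes the `hdnea` parameter is a `~`-separated list of `exp`, `acl`, and `hmac` fields and that `exp` is a Unix timestamp; both are inferred from the URL above, not from any Slideshare documentation. The function name `parse_signed_preview_url` is made up for illustration.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs

# The second-page URL quoted in the comment above.
EXAMPLE_URL = (
    "https://image-private.slidesharecdn.com/"
    "89d604e1-cf8c-4e54-b63f-2915c43ced71-160113053635/85/slide-2-320.jpg"
    "?hdnea=exp=1476394099~acl=/89d604e1-cf8c-4e54-b63f-2915c43ced71-160113053635"
    "/85/slide-2-320.jpg*~hmac=3ec5512cad61f6d688862bbcac110d9f47813df2f37c2b52151fb0f11f91d4dd"
    "&cb=1452663401"
)


def parse_signed_preview_url(url):
    """Split a signed preview URL into host, path, expiry, ACL, and HMAC.

    Assumption: hdnea looks like "exp=<unix time>~acl=<path>~hmac=<hex>".
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # Each "~"-separated field is key=value; split on the first "=" only.
    token = dict(field.split("=", 1) for field in query["hdnea"][0].split("~"))
    expires = datetime.fromtimestamp(int(token["exp"]), tz=timezone.utc)
    return {
        "host": parts.netloc,
        "path": parts.path,
        "expires": expires,
        "acl": token["acl"],
        "hmac": token["hmac"],
    }


if __name__ == "__main__":
    info = parse_signed_preview_url(EXAMPLE_URL)
    print(info["host"], info["expires"].isoformat())
```

Read this way, the `exp` value in the example decodes to 13 October 2016 (UTC), which fits the observation elsewhere in this thread that these private image links expire.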
#4 – When we open the preview on LinkedIn, we can open the image in a new tab by right-clicking.
This is the image URL with an expiration and a hash. We can replace the domain as shown and remove everything after ? to get the permanent address
Alternatively you can get the embed code from the source with the proper key
In Dev Tools > Sources > image-private.slidesharecdn.com > 95, you have the small and larger versions of the 2 pages where they can easily be saved as jpg (1024 x 768)
I don’t think the original file exists once it is uploaded, only the rendering of it. As Katie has shown, you can get the link to the PDF, but only if it is not marked as private content.
#8 – The image title is saved as metadata, so the images may be indexed with this data. Their OCR data appears to index other words in the text as well.
#12 – Amazon S3 buckets should not be indexed and should not be searchable.
Brain break! Here’s what I have so far…
1 – It is still stored within slideshare, either image-private.slidesharecdn.com or cdn.slidesharecdn.com
2 – yes, if you are logged in to Linkedin
3 – no
4 – https://www.slideshare.net/search/slideshow
5 – media
5a – https://media.licdn.com/mpr/mpr/filename.filetype
example of the above: https://media.licdn.com/mpr/mpr/shrink_100_100/AAEAAQAAAAAAAAIRAAAAJGM1NjNjYWJlLTgwN2EtNDk5OS1hNzNlLTk5NTM5NTU4MDA2Mg.jpg
6a – no
6b – no
1. It still goes to Slideshare which is stored privately in https://image-private.slidesharecdn.com/
2. No, preview only available for logged in users.
3. Yes, copying the image address.
12. Amazon Web Services
#13 The profile for Matthew, or (in aggregate) profiles on Slideshare with media? Matthew’s email domain is gmail.
Scratch that. The domain is LI.i18n.register
#12 Amazon Web Services
revised #4 – https://www.slideshare.net/linkedincontent_4a5/mcl2016or-pdf
5 – document
5a – http://www.slideshare.net/clrmepurple/documents
4, another expression of same url format, but this content is public – http://www.slideshare.net/clrmepurple/resume-overview
8 – sometimes, when the document is uploaded and made public
9 – yes, if Google can find the document, then the image is also available and a preview image can be viewed.
10 – Yandex does not appear to index this content
11 – yes, Bing indexes this content
1. Item will be stored on Slideshare
2. Yes, if connected
4. Can be downloaded directly from Slideshare
5. Professional Gallery
6. a no
6. b yes
7. Yes, on Slideshare
1. It’s stored on the “content delivery network” of Slideshare, e.g. (private preview) image-private.slidesharecdn.com or (public preview) image.slidesharecdn.com or (files) cdn.slidesharecdn.com, and similarly others under *.slidesharecdn.*
2. Nope. You have to be logged into LinkedIn to get the preview links. However, if you already have the link to the preview, then that is a publicly viewable link, e.g. https://goo.gl/6kU3f8
3. Yes, if you already have a link to the preview, e.g. https://goo.gl/8tx8Ly
4. You can only download original documents that were published publicly; for private files, the option is print to PDF. Find the full Slideshare link in the source code of the preview (Inspect): <iframe src="https://goo.gl/FhngMO" …
Sometimes this works for private files (kinda buggy): https://goo.gl/3cvKMi
5. LinkedIn Engineers tagged it “treasury”. The URL sub-domain calls it “portofolio”. LinkedIn Help calls it the Summary, Education, and Experience sections on your profile (https://goo.gl/jhfuz6)
5a. Absolutely. Click Share on any piece of the portfolio and you will see something like this: https://goo.gl/9tJXmr
Similarly, you can download people’s profiles as a PDF like this: https://goo.gl/6wIf52
Or view a person’s profile using their UID: https://goo.gl/ms3TyA
7. Yes. Simply change the number that is most likely the page number; it usually starts at 1 and sits near the end of the URL:
Or just simply:
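As a rough sketch of the page-number swap described above (not the commenter's own example), the Python below rewrites the `slide-N-` part of a preview image URL. It assumes the page number always appears in a `slide-<number>-<width>.jpg` file name, as in the preview URLs quoted elsewhere in this thread. It also drops the query string, because in the signed private URLs the `acl` field names one specific slide, so a signature copied from one page would presumably not validate for another. The helper name `page_url` is hypothetical.

```python
import re


def page_url(url, page):
    """Return the preview URL for another page of the same document.

    Assumption: the file name looks like slide-<number>-<width>.jpg.
    Any query string (expiry, signature) is removed.
    """
    base = url.split("?", 1)[0]
    new_url, count = re.subn(r"slide-\d+-", f"slide-{page}-", base)
    if count == 0:
        raise ValueError("no slide-<number>- segment found in URL")
    return new_url


if __name__ == "__main__":
    second_page = (
        "https://image-private.slidesharecdn.com/"
        "89d604e1-cf8c-4e54-b63f-2915c43ced71-160113053635/85/slide-2-320.jpg"
    )
    # Build candidate URLs for pages 1 through 3 of a three-page resume.
    for n in range(1, 4):
        print(page_url(second_page, n))
```

Whether the resulting unsigned URLs actually load depends on the host and on whether the document is public, so treat the output as candidates to try, not guaranteed links.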
8. Sometimes, when indexed and the file has been shared publicly and you are searching, e.g. https://goo.gl/pLqycL (first one, visit page)
9. Sometimes, when indexed and the file has been shared publicly and you are searching, e.g. https://goo.gl/pLqycL (first one, visit image)
10. No luck, I don’t know how to use it. 😛
11. Sometimes, when indexed and the file has been shared publicly and you are searching, e.g. https://goo.gl/VgKAsf
12. It used to be AWS (https://goo.gl/AtN42C), but now it’s a combination of their Espresso platform (https://goo.gl/lTV28V) running from leased data centers (https://goo.gl/kFeKZd)
Revised link to #4 take the source of iframe :<iframe src="https://goo.gl/JO5NRt" …
Monday October 24, 2016 is the deadline.
Here is the rundown on responses so far. Questions #1-3: we have the answers. Nobody has answered question #4 correctly! Raising it to 10 points. We did get the answers for #5, #6, and #7. (You can still respond with the correct answers and get points; those who submitted first get an extra point.) For #9, people said related things, but it needs a better explanation, so it is still open.
I have added a bonus question, also 10 points.
Best of luck!
Updated #4 WIP:
Current progress: signature doesn’t match @ https://goo.gl/pgSVwI
(To reproduce this and get a new AWS file request that has not expired, download any file from Slideshare and capture the URL while monitoring Network. Then, try to put the APIAccessKey into an API call or just brute force the signature #LOL)
API Key: AKIAJ6D6SEMXSASXHDAQ
Inspect the Sources, look for the embedded content key folder, and check its meta in the source code. You will find:
More info here: https://goo.gl/23JR1o
Good Read: https://goo.gl/KfnixI
Try this method? https://goo.gl/QveIWX
Updated #12: I was wrong. All documents are still stored on AWS.
13. My guess is LI registers the user with the primary LI email (similar to SSO) unless the user already has an account on Slideshare with a different email. If not registered, then the user by default is: http://www.slideshare.net/secret/
13. Update: the user is usually linkedincontent_*, and it seems to be registered with the slidesharecdn domain.
Just take another look? 🙂
What is the answer to #4?
Can you explain the answer?
What is the user profile URL?
#4 is evernote and the exact image name is slide-1-638.jpg
email domain would follow the structure of [email protected]
#Update 9: Yes, if the file has then also been uploaded publicly to Slideshare through Slideshare, without changing the filename. The generated preview image will obviously be the same. Googlebot will index some of the preview images when people X-ray into LinkedIn from Google while logged into LI and then click to see the image. Sometimes Googlebot happens to crawl and index the image.
#Update 13: domain=linkedin.com