My story from crawling OneDrive for a week
I’m a fan of OneDrive, Microsoft’s answer to cloud storage for consumers. Although it has been rebranded a few times, the data stored there hasn’t changed.
Last week I spent a few hours every night doing research and discovered some interesting data sitting exposed to the public.
My plan of attack
For this experiment I decided not to use a custom crawler and instead took advantage of Google’s advanced search options.
You simply give Google the site you want to search its index for, plus your keywords. In my case I used skydrive.live.com for the site and “Vegas” for my in-URL search.
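To make the query concrete, here is a minimal sketch of how such a search could be assembled. It assumes the search used Google's site: and inurl: operators, which is my reading of the description above rather than the exact query; the function name build_search_url is made up for illustration.

```python
from urllib.parse import urlencode


def build_search_url(site: str, keyword: str) -> str:
    """Build a Google search URL limited to one site and a keyword in the URL.

    Assumption: the search combines the `site:` and `inurl:` operators.
    """
    query = f"site:{site} inurl:{keyword}"
    # urlencode escapes the colons and turns the space into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(build_search_url("skydrive.live.com", "Vegas"))
```

Pasting the printed URL into a browser is the same as typing site:skydrive.live.com inurl:Vegas into the search box.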
The results are tons of pictures, articles, zip files, and documents with the keyword Vegas in them. OK, that’s kind of cool, but you can start to see where this is going.
What if I could get personal and private information using this method? Well, it turns out it’s pretty easy. This data sits in public waiting for people to take advantage of it, and trust me, they do.
After a few experiments and tons of data, I was able to find personal data such as SSNs, 401(k) details, names, dates of birth, and various other data that could be used maliciously. I will not be posting this information here.
My intent is not to expose users’ personal information but to make you aware of how a simple mistake can cost you in the long run.
Review your online presence, especially your cloud storage solutions, and ensure you have the correct permissions. Don’t share with the public unless the content is harmless.
OneDrive should implement a no-crawl policy (e.g. via robots.txt) for its site(s); that would make this data less discoverable.
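As a rough illustration of what a no-crawl policy looks like, the sketch below feeds a hypothetical robots.txt that disallows everything into Python's standard urllib.robotparser and checks how a well-behaved crawler would interpret it. The rules and the example URL are assumptions made for illustration, not OneDrive's actual configuration. Keep in mind that robots.txt is only advisory: it asks compliant crawlers to stay away and does not protect the files themselves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, agent: str = "Googlebot") -> bool:
    """Return True if a compliant crawler with this user agent may fetch the URL."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical shared-file URL, used only to exercise the rules above.
    print(is_crawlable("https://storage.example.com/public/vegas-trip.zip"))
```

With these rules in place, a crawler that honors robots.txt would skip every path on the site, so publicly shared files would be much harder to find through a search engine.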
I hope you enjoyed this.