Saturday, 22 December 2012

How to download the historical versions of pages from Google Cache

You can get older versions of web pages from the Google cache. I am not sure how old these copies are; if any of you knows, or cares to look it up, please let me know too :P

The query to use to get an older version of the web page www.mypage.com from Google Cache is:
http://webcache.googleusercontent.com/search?q=cache:www.mypage.com

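To download the page with the Linux wget command, you need to pass a user agent, because Google doesn't allow wget's default one to fetch data. Here is how you can automate it (AGENT is a placeholder for a browser-like user-agent string; the URL is quoted so the shell does not try to interpret the "?", and --level is left out because it only has an effect on recursive downloads):

wget --output-document=out.htm --user-agent="AGENT" "http://webcache.googleusercontent.com/search?q=cache:www.mypage.com"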
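
If you want to fetch several pages this way, the command can be wrapped in a couple of small shell functions. This is only a rough sketch: the function names cache_url and fetch_cached and the AGENT value are made up for illustration, and any browser-like user-agent string may work just as well.

```shell
#!/bin/sh
# Hypothetical helpers; the names and the AGENT value are assumptions, not from any tool.

# A browser-like user-agent string (placeholder value, swap in your own).
AGENT="Mozilla/5.0 (X11; Linux x86_64)"

# Print the Google Cache query URL for the page given as $1.
cache_url() {
    printf 'http://webcache.googleusercontent.com/search?q=cache:%s\n' "$1"
}

# Download the cached copy of page $1 into the file $2.
fetch_cached() {
    wget --output-document="$2" --user-agent="$AGENT" "$(cache_url "$1")"
}
```

Calling fetch_cached www.mypage.com out.htm would then run the same wget command as above, with the URL quoted for you.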

Interesting, isn't it? :)
