Sites that let you dig up the Past

by Mahesh Kukreja

in Webmaster Tutorials


Every online search engine eventually faces the inevitable question: can it top Google? Instead of answering that, here are a few sites that will help you unearth data from the past.

These online search utilities, instead of crawling through current web pages, sift through archived data.

1. Cached Search

The quickest way to dig up dated content on the web is to use the cached results offered by search engines like Google, Bing and Yahoo.

To go straight to a cached page on Google, enter cache: followed immediately by the URL, with no space in between, for example cache:www.examplesite.com/examplepage. You’ll be taken directly to the cached copy of that URL. For Bing and Yahoo, enter the URL as the query and select Cached beside the page you want to view.
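As a rough illustration of the Google cache query described above, the sketch below builds the query string and a matching search URL. The function names are hypothetical, the https://www.google.com/search?q=... address form is an assumption about how the query would be submitted, and there is no guarantee that a search engine still honours the cache: operator.

```python
from urllib.parse import urlencode


def google_cache_query(page_url: str) -> str:
    """Return the search-box query: 'cache:' followed directly by the URL."""
    page_url = page_url.strip()
    # Drop the scheme so the query matches the article's example form.
    for prefix in ("https://", "http://"):
        if page_url.startswith(prefix):
            page_url = page_url[len(prefix):]
            break
    # No space between the operator and the URL.
    return "cache:" + page_url


def google_cache_search_url(page_url: str) -> str:
    """Return a search URL carrying the cache query (assumed URL form)."""
    return "https://www.google.com/search?" + urlencode(
        {"q": google_cache_query(page_url)}
    )


print(google_cache_query("http://www.examplesite.com/examplepage"))
print(google_cache_search_url("http://www.examplesite.com/examplepage"))
```

Nothing here contacts the network; the script only prints strings that can be pasted into a search box or a browser's address bar.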

If these don’t bring up the results you were looking for, try switching to specialized search engines like Exalead and ScrubTheWeb. They work just like regular search engines, but store cached content for much longer (up to seven months).

2. Web Archives

For older content, try Archive.org (the Internet Archive). It carries snapshots of websites, newsprint archives and old research papers, going back as far as 1996.

The main aim of this resource, according to its founder Brewster Kahle, is to “help people make sense of the world and give accountability to what’s been published before”.

To do this, the archive regularly runs a crawler program called Heritrix, which collects data from about 4 billion sites in each crawl. The collected pages are then saved in the “Wayback Machine”, where they can be browsed by date.
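Snapshots in the Wayback Machine can be reached through date-stamped addresses. The sketch below assumes the commonly used web.archive.org/web/<timestamp>/<url> address form and the archive's availability lookup at archive.org/wayback/available. The function names and the example date are hypothetical, and the code only builds the addresses without fetching anything.

```python
from datetime import date
from urllib.parse import urlencode

WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_API = "https://archive.org/wayback/available"


def wayback_timestamp(day: date) -> str:
    """Format a date as YYYYMMDD, the timestamp style used in snapshot URLs."""
    return day.strftime("%Y%m%d")


def wayback_snapshot_url(page_url: str, day: date) -> str:
    """Address of the snapshot closest to the given day (assumed URL form)."""
    return f"{WAYBACK_PREFIX}/{wayback_timestamp(day)}/{page_url}"


def wayback_availability_url(page_url: str, day: date) -> str:
    """Address that asks whether a snapshot exists near the given day."""
    query = urlencode({"url": page_url, "timestamp": wayback_timestamp(day)})
    return f"{AVAILABILITY_API}?{query}"


print(wayback_snapshot_url("www.examplesite.com/examplepage", date(2011, 3, 7)))
print(wayback_availability_url("www.examplesite.com/examplepage", date(2011, 3, 7)))
```

Opening the first address in a browser should land on the capture nearest to that date, if one exists. The second returns a small JSON description of the closest snapshot.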

Looking for music from the seventies, or classics perhaps? The archive is a treasure trove of over 100,000 individual shows by thousands of bands, including the Beatles and Pink Floyd, all freely downloadable. The site also hosts old-time radio shows and a large number of songs recorded from 78 rpm records.

The Internet Archive also has a section for space buffs who want to look at images from NASA’s early missions, such as the Apollo Program, at nasaimages.org.

3. Book Search

Two resources, Open Library and Project Gutenberg, catalogue books and even provide access to electronic copies, all free of cost. The free material includes the plays and poems of Rabindranath Tagore, The Golden Threshold by Sarojini Naidu, War and Peace by Leo Tolstoy, and the works of William Shakespeare, including classics such as Hamlet and Romeo and Juliet.

Most of the e-books available on these sites are in the public domain because their copyright has expired; some newer titles have been released by their authors under Creative Commons licenses.

Open Library has around 20 million records, and Project Gutenberg has over 33,000 free e-books that can be downloaded to a PC, an e-book reader like the Kindle, or Apple iDevices.
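For readers who would rather query the catalogue than browse it, here is a minimal sketch that builds a lookup address for Open Library's JSON search endpoint. It assumes the endpoint at openlibrary.org/search.json accepts title, author and limit parameters. The function name is hypothetical, and the code only constructs the address without sending a request.

```python
from typing import Optional
from urllib.parse import urlencode

OPEN_LIBRARY_SEARCH = "https://openlibrary.org/search.json"


def open_library_search_url(
    title: str, author: Optional[str] = None, limit: int = 5
) -> str:
    """Build a catalogue search address for a title and optional author."""
    params = [("title", title)]
    if author:
        params.append(("author", author))
    # Keep the result list short; 5 is an arbitrary default for illustration.
    params.append(("limit", str(limit)))
    return f"{OPEN_LIBRARY_SEARCH}?{urlencode(params)}"


print(open_library_search_url("War and Peace", "Leo Tolstoy"))
print(open_library_search_url("Hamlet"))
```

The printed addresses can be opened in a browser to see the matching catalogue records as JSON.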


Kirsty March 7, 2011 at 7:36 am

I know a lot of people with business sites who take cached pages into consideration. I know one business owner who checks up on competitors and pulls details from the cached copies of their sites.
