June 11, 2007

The Wayback Machine


The Wayback Machine lets you visit archived versions of web sites. Would you like to see how LIAS looked 10 years ago? No problem! Just go to the Wayback Machine and type in the web address: www.lias.psu.edu

How Was the Wayback Machine Created?
In 1996, Brewster Kahle, an Internet pioneer known for creating the WAIS publishing system and co-founding Alexa Internet, started a non-profit organization called the Internet Archive (IA). Kahle’s stated goal is “Universal Access to all Knowledge.” The Wayback Machine is the name of IA’s archive of web page snapshots. In addition to web pages, IA maintains extensive collections of digital media (moving images, audio, text, etc.). This welcome statement from IA’s web site is very much in keeping with Kahle’s notion of “universal access”:

The Internet Archive is building a digital library of Internet sites and other cultural artifacts in digital form. Like a paper library, we provide free access to researchers, historians, scholars, and the general public.

How Big Is It?
The archive holds about 85 billion web pages, from 1996 to the present. The Wayback Machine currently contains more than two petabytes of data! To give a sense of scale, IA reports, “This eclipses the amount of text contained in the world's largest libraries, including the Library of Congress.”

Is Everything on the Wayback Machine?
Some things are not included. Owners of pages can choose not to have their pages added. Also, sites that are database driven, sites that generate dynamic web pages, and sites that use robots exclusion rules to block capturing cannot be archived. Only publicly accessible web pages are included; pages that require a password and pages on secure sites are not included.

Obviously, Wayback Is Special, But How Special?
Research indicates that the average life of a web page, in terms of accuracy or relevancy, is less than 100 days. In addition, much information is now published on the Web instead of on paper. The Web is a real, and often fleeting, inventory of information. That’s why Kahle saw the need to back up the Internet, at a time when there were only 50 million or so URLs.

Archive-It
IA also offers a parallel subscription-based service called Archive-It.
According to the Archive-It website, subscribers can create distinct, full-text searchable Web archives called "collections", containing only the content they are interested in harvesting, at whatever frequency suits their needs. Subscribers can also catalog and manage the collections they create with Archive-It. None of these features is available in the general archive accessed via the Wayback Machine.
