Thursday, May 22, 2008

The Internet! Is that thing still around?


I borrowed the title for this post from Homer Simpson in an old episode of "The Simpsons" (Groening & Brooks, 2006), but there is a kernel of truth here that I think deserves a little more scrutiny. How many times have you gone back to a bookmark or shortcut only to find that the page you were looking for was gone? Information that exists only on the Internet is ephemeral, sometimes to an astonishing degree. How can this information be preserved, and how much of it should be preserved?

According to some writers, we should preserve anything that someone else is not already preserving, because no one can say now what will prove valuable later (Feldman, 1997). Obviously this approach cannot work for every organization that preserves information, because preservation is not free: storage, access methods, copyright fees, and personnel all cost money, and the last time I checked, not many people had an endless supply of cash on hand for preserving information.

What, then, is to be saved, and what should be lost to the mists of time, so to speak? As an individual, my first inclination is to say that nothing should be lost, but in reality this is a much bigger issue than first meets the eye. Because of the monetary issues mentioned above, many organizations have established guidelines for what they will save and what they will leave for others to preserve. A good example is the Internet Archive, whose F.A.Q. states the following:

"we collect only publicly accessible Web pages. We do not archive pages that require a password to access, pages tagged for 'robot exclusion' by their owners, pages that are only accessible when a person types into and sends a form, or pages on secure servers. If a site owner properly requests removal of a Web site through http://www.archive.org/about/exclude.php, we will exclude that site from the Wayback Machine."

Because no single organization saves everything on the Internet, material such as Internet forums or personal online journals may never be recorded, and some other party must preserve it if it is to be preserved at all. While some groups that create this kind of content do preserve it (for example, the alternative process listserv archives), not all do, and that is an inherent preservation problem with the Internet (Beagrie, 2005).

To better understand this issue, the British Library conducted the Digital Lives research project, which identified a number of issues that must be dealt with before a workable and sustainable solution for preserving this sort of information can emerge. One of the more significant problems the study found was the lack of a common format for this information, which complicates efforts to preserve it. The study also found no clear consensus about what was worth preserving and what was not (Pennock, 2006).

Until these issues are resolved, and they may never be fully resolved, preserving information that originated in electronic form will remain a complicated matter for everyone involved in the work of preservation.

Suggested Further Reading:

Report of the Task Force on Preserving Digital Information (OCLC)

How to Preserve Authentic Electronic Records


Works Cited:

Beagrie, N. (2005). "Plenty of Room at the Bottom? Personal Digital Libraries and Collections." D-Lib Magazine, 11(6).

Feldman, S. E. (1997). "It was here a minute ago!" Archiving the Net. Searcher, 5, 52-64.

Groening, M. & Brooks, J. L. (Producers). (2006). The Simpsons [Television series]. U.S.A.: Fox.

Pennock, M. (2006). Digital Preservation Coalition Forum on Web Archiving. Ariadne, (48), 1-.
