Contact: Patricia Donovan, 716-645-2626 or [email protected]
or Christopher Densmore, 716-645-2916 or [email protected]

PROBLEMS WITH ARCHIVING ELECTRONIC MATERIALS MAY MAKE 20TH CENTURY "WORST-DOCUMENTED" PERIOD IN HISTORY

BUFFALO, N.Y. -- The electronic age has transformed nearly all fields of human endeavor and one of the conundrums in its wake -- how to preserve historic records -- turns out to be enormous and ironic.

Christopher Densmore, archivist at the University at Buffalo, speaks for an international network of archivists when he says that because of the explosion in information technologies, the late 20th century will be one of the worst-documented periods in history.

The problem, he says, is that the preservation of information produced and stored in digital form is far more difficult, time-consuming and expensive than it is to save documents on paper and microfiche. Continuing improvements in electronic media of all kinds have provoked legal, organizational and financial nightmares for archivists, librarians, museums and other information administrators.

"New technologies allow us to produce, alter and dispose of records and documents with unusual efficiency and facility," Densmore acknowledges.

However, he and his colleagues have serious concerns about the stability, longevity and historical significance of computer-generated and electronically filed documents, whose preservation involves considerable expense and many complications.

They warn that however jacked-in you are or however many killer software applications come down the pike, if you want to ensure that your work product and process will be available to future scholars, it's a good idea to save it in hard copy.

It's impossible, of course, for paper copies to capture the nature of many complex, ephemeral, colorful, often-animated and scored electronic documents with their hypertext references and links to a daunting phalanx of Web sites around the world. This indicates the enormous difficulty of archiving such records for historical, legal and other purposes. Still, a prodigious effort has been launched by archivists to avoid future problems.

In defining the problem, experts say that as we increasingly digitize research, literature, journals, financial and tax records, legal documents, family photos and even those email love letters, we should be aware of two facts in particular.

"One," Densmore says, "is that digital information is extremely fragile. Little is known about the stability even of old technologies like magnetic tape, which lasts only about 10 years. Much less is known about the generations of disks (floppy or hard) and CDs that have evolved so far." It is known, however, that magnetic impulses deteriorate and that various coatings and physical materials used in these products degenerate at different rates under different conditions.

"Second," he adds, "little is known about how to retrieve information from the hundreds of different kinds of obsolete hardware and software that produced and now store millions of significant documents. They were, no doubt, stored this way with the assumption that they would be readily and permanently available for use, which turns out not to be the case. It is safe to assume that today's documents will be equally difficult to retrieve using tomorrow's hardware and software."

Citing a 1996 report on online electronic documents and distributed databases produced by the SUNY Office of Archives and Records Management, Densmore notes that electronic information systems are not inherently designed to serve as record-keeping systems in the archival sense of that term.

"No one knows how stable electronic files are or how long they'll last," he says. "Right now, the average book published on acid-free paper by university presses and stored in a library is expected to remain useable for 500 years. That's the archival standard for paper documents, photographs and microfilm. So material stored today in those formats will be available to our progeny in the year 2498.

"Contrast this with the stuff of floppies, which, with great care, might last until 2028 and, without care, only 10 years, or until 2008. Optical disks might survive intact until 2058," he says. "No one really knows for sure."

As Jeff Rothenberg noted in Scientific American four years ago, today's CDs may last for 30 years and tomorrow's DVDs (the next generation of CDs) may last for 50. Even if such materials are stable for hundreds of years, however, they won't be readable unless the hardware and the operating system that produced them are available.

Given the variety of hardware and software programs that have been heralded and then discarded over the years, Rothenberg is describing a colossal retrieval headache.

Densmore agrees.

"Already, information produced by now-defunct software on retired hardware can't be read by today's computers," he says. "And it is very difficult to find an old computer that can read it for several reasons. Not only may the document be readable only by a specific generation of old computer, but by vintage software that may not be available any more.

"Of course, today's computers and programs will eventually be defunct themselves, soon enough, raising questions about the viability of documents being produced as I speak. On top of that, although most disks have their interface system on the disk itself, computer-operating systems also degenerate, so although they may look like they work, they may be useless."

Densmore acknowledges that technological changes in records production have been an issue among archivists since long before the dawn of the new electronic media.

"Archivists like to have the authentic original document in their collection for evidentiary reasons," Densmore points out.

He explains that a file kept by a particular office or official is the "official file" and, archivally speaking, ideally contains the original documents produced by that office or individual. If you lose control of that original, official material -- because copying has put a copy (not the original) in the file, or because copies went to everyone -- then everybody has the file or parts of the file, and no one may have the original documents.

"Years ago, mimeographing and photocopying, for instance, raised problems because they produced multiple copies of records, which later were found in the hands of many people. Now it's possible that all the copies are identical to the original," Densmore says. "It's also the case that the original or its "copy" could be altered -- perhaps officially -- and copied again, making it very difficult to identify and authenticate the original document."

Today, he says, printers produce virtually identical copies, with none of the degeneration manifested by mimeos and Xerox copies. So it is also almost impossible to identify the original document at all. They all have the same characteristics.

"Also, because documents are frequently mailed electronically," he says, "the archivist may not know who received copies and was, therefore, in on a decision. The original item also may have been edited electronically, making it very difficult to document the process by which the decisions were made to change a curriculum, build a science building, promote a professor."

New issues that confront archivists are perplexing and difficult to resolve. One of them is the question of whether libraries should maintain a collection of equipment and operating systems that can read old electronic materials. This would be a formidable task.

Densmore cites another new problem as well. Today, photographs taken with digital cameras are stored on Zip drives or other media. They take up a great deal of storage space, so the data are often discarded to make room for new digital photos, and the primary document is gone even as it is being used. He warns that published versions of such photos are not nearly as reproducible as traditional negatives and that, even if saved, the disks may not be readable in the not-so-distant future.

Even more difficult to deal with is the electronic manipulation of photographs that are then used to "document" an activity or person in a more attractive form.

"The result may be more appealing visually," Densmore admits, "but if you go around straightening or whitening teeth, moving trees around, changing hair color or adding characters to a scenario, altering bodies to conform to current standards of attractiveness, then you are no longer documenting fact, but producing an aesthetic, but inaccurate, document that may not reflect reality at all. That's fine for advertising, I suppose, but historians trying to keep a record need to know how accurate these things are."

Finally, in reviewing the scope of the changes that confront archivists in this regard, he says that the biggest problem may not be technology, planning or the availability of trained personnel, but the cost of these undertakings.

"It is an expensive and enormously complex task to maintain old hardware and software so library users in 2030 can read what a professor typed into his Mac Classic six years ago or into several incarnations of Dell PCs from 1994," he admits.

"It is also expensive to regularly migrate vast bodies of software to new generations of technology and impossible to maintain the depth of reference of the hypertext originals.

"So we'll have to carefully assess these costs and compare them to the costs of traditional methodologies as we set about to digitize the entire contents of a library, for instance, or accept archival materials in digital form, which will require expensive upkeep."

Archiving Tips:

If your records are in digital format, you need to have the wherewithal to preserve and maintain them. Solutions recommended by a variety of state, federal and professional-association records-management administrators that can be applied to both institutional and personal electronic documents include:

Keep it simple

• Files in a standard format are more likely to be readable in the distant future. When you store formatted word processing files, accompany them with simple text versions that will better stand the test of time. Image files stored as simple bitmaps without compression are much more likely to survive.

• Store data with the particular version of the software that created it.

• Keep two copies of digital data carefully in two separate places. Fire, flood, mildew and assorted insects can destroy digital materials as well as paper.

• Keep saved software and hardware in a cool, dry place.

• Use high-quality media. Avoid brands you never heard of or that are particularly cheap.

• Inspect and refresh data even on optical media (see data migration strategy, below).

• Develop an archival plan before you upgrade hardware or software. The latest version of either may not work well with the material created in earlier versions.

• Keep your archives where you can get at them, not in a place that may be hard to remember and find years from now.
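
For readers who keep their records on a computer, the "two copies" and "inspect" tips above can be sketched in a few lines of Python. This is only an illustration, not a method recommended by the sources cited here: the function names and the use of a SHA-256 checksum to detect damage are assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_two_copies(source: Path, first_dir: Path, second_dir: Path) -> str:
    """Copy source into two separate directories and confirm both copies match.

    Hypothetical helper: returns the checksum so it can be written down
    and used for later inspections.
    """
    expected = sha256_of(source)
    for target_dir in (first_dir, second_dir):
        target_dir.mkdir(parents=True, exist_ok=True)
        copy_path = target_dir / source.name
        shutil.copy2(source, copy_path)
        if sha256_of(copy_path) != expected:
            raise OSError(f"copy at {copy_path} does not match the original")
    return expected


def inspect_copy(copy_path: Path, expected: str) -> bool:
    """Re-check a stored copy later; False means it is missing or damaged."""
    return copy_path.exists() and sha256_of(copy_path) == expected
```

In practice the two directories would sit on separate media kept in separate places, and a copy that fails inspection would be refreshed from the surviving one.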

Know how long digital data storage media will last compared to other storage media

• Paper storage is bulky, but advantageous in that documents on paper can be read easily, last a very long time and degrade gracefully. Longevity is significantly increased if the paper is acid-free and stored carefully at lower temperatures and humidities.

• The life expectancy of data storage media depends on several factors: the quality with which the media was manufactured, the number of times it's been used over its lifetime, the care with which it is handled, storage temperature and humidity, the cleanliness of the storage environment and the quality of the recorder used to write the media.

• Testing by Imation/3M and Kodak, for instance, indicates that their CD-ROM media will last intact for 100 years. Optical disk media can last for several decades. Magnetic tape will last for about 10 years and digital magnetic tape 30 years. If you want to be certain your documents will be here 500 years hence without concerns about hardware and software, preserve them in both paper and digital form.

• Life expectancy is enhanced if media is stored in a clean, dry storage case, not left sitting around on desktops and, in the case of diskettes, not flexed or twisted. Do not touch the media exposed in 5 1/4" diskette windows; do not write on diskettes with a hard-tipped pen or pencil; keep diskettes away from magnetic fields; and, if a diskette is used frequently, back it up with another copy in case the original disk wears out.
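
The life-expectancy figures quoted in this release can be turned into a small lookup, sketched below in Python. The numbers are the estimates given in the text, not measurements. The base year of 1998 is an assumption inferred from the statement that 500-year paper will last until 2498, and the function names are hypothetical.

```python
BASE_YEAR = 1998  # assumption: inferred from "500 years" -> "the year 2498"

# Estimated life expectancies in years, as quoted in the text above.
LIFESPAN_YEARS = {
    "acid-free paper / microfilm": 500,
    "CD-ROM (vendor testing)": 100,
    "optical disk": 60,
    "digital magnetic tape": 30,
    "floppy disk, careful storage": 30,
    "floppy disk, no care": 10,
    "magnetic tape": 10,
}


def readable_until(medium: str, written_in: int = BASE_YEAR) -> int:
    """Return the last year the medium is estimated to remain intact."""
    return written_in + LIFESPAN_YEARS[medium]


def media_outlasting(target_year: int, written_in: int = BASE_YEAR) -> list:
    """List the media estimated to survive at least until target_year."""
    return [
        medium
        for medium in LIFESPAN_YEARS
        if readable_until(medium, written_in) >= target_year
    ]


if __name__ == "__main__":
    for medium in LIFESPAN_YEARS:
        print(f"{medium}: intact until about {readable_until(medium)}")
```

These estimates cover only the physical medium. As Densmore and Rothenberg point out, a disk that survives is still unreadable without the hardware and software that can interpret it.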

Develop a migration strategy to move records from one generation of technology to another

• Ensure the preservation of imaged records on existing media by paying careful attention to environmental storage. You don't want to migrate damaged records.

• Maintain the functionality of existing hardware and software through upgrades of equipment and source code.

• Make plans to migrate optical imaging systems, images, indexes, data files and related information through successive generations of technology. Remember that these will include hardware and programs not yet a twinkle in the eye of the granddaughter of today's computer-engineering student.
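
One way to keep such a migration plan honest is to list each holding with the year it was written and the expected life of its medium, then flag whatever is due to be moved. The Python sketch below assumes a simple rule of thumb, migrating once half the expected life has passed. That rule, the Holding record and the function names are all hypothetical and do not come from the guidelines cited below.

```python
from dataclasses import dataclass


@dataclass
class Holding:
    """One archived item and the medium it currently lives on."""
    title: str
    medium: str
    year_written: int
    expected_life_years: int


def migrate_by(holding: Holding, safety_fraction: float = 0.5) -> int:
    """Year by which the holding should be copied to newer media.

    Assumption: migrate after a fixed fraction (default one half)
    of the medium's expected life has elapsed.
    """
    return holding.year_written + int(holding.expected_life_years * safety_fraction)


def migration_queue(holdings: list, current_year: int) -> list:
    """Return the holdings already due for migration, most overdue first."""
    due = [h for h in holdings if migrate_by(h) <= current_year]
    return sorted(due, key=migrate_by)
```

A list like this would be reviewed whenever hardware or software is upgraded, so that records are moved while the old equipment can still read them.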

These suggestions were culled from sources that include the National Technology Alliance Web site and "Guidelines for Ensuring the Long-Term Accessibility and Usability of Records Stored as Digital Images," which is publication No. 22 in the Government Records Technical Information Series. To obtain copies of the publication, call the Government Records Services office of the State Archives and Records Administration at 518-474-6926 and ask for publication No. 22.

###