SIL Electronic Working Papers 2006-003

Ensuring that digital data last: the priority of archival form over working form and presentation form

Author  Simons, Gary F.

One of the great ironies of writing technology is that as technologies for writing become more advanced, the products of writing become less durable. The most enduring written records in history are those that were carved into stone by the ancients. By contrast, digital word processing is our most advanced writing technology to date, but it is also the most ephemeral. Hardware and software technologies are changing so rapidly that a typical storage medium or file format is obsolete within 5 to 10 years. Unless linguists take special measures to counter this, their digital records of endangered languages are in danger of dying out before the languages themselves.

A linguist must do two things in order to ensure that digital data endure: (1) the materials must be put into an enduring file format, and (2) the materials must be deposited with an archive that will make a practice of migrating them to new storage media as needed. The paper addresses the first of these issues. Most projects tend to focus on the working form of data (that is, the form in which the materials are stored as they are worked on from day to day) and the presentation form (the form in which the materials will be presented to the public). But these forms are closely tied to particular pieces of software and thus tend to become obsolete when the software does. The paper thus argues for the priority of the archival form (a form that is self-documenting and software-independent) as the object of language documentation. Many file formats for textual data are discussed and illustrated, leading to the conclusion that descriptive XML markup represents best current practice for the archival form.
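
To make the distinction between archival form and presentation form concrete, here is a minimal sketch in Python. It is not taken from the paper: the element names (entry, headword, partOfSpeech, gloss) and the sample word are invented placeholders, and a real project would follow its own schema. The sketch stores one lexical entry as descriptive XML, where each tag says what the content is rather than how it should look, and then derives a display string from that stored record.

```python
import xml.etree.ElementTree as ET


def build_entry(headword, part_of_speech, gloss):
    """Build one lexical entry using descriptive (hypothetical) element names."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "headword").text = headword
    ET.SubElement(entry, "partOfSpeech").text = part_of_speech
    # The lang attribute records the language of the gloss.
    ET.SubElement(entry, "gloss", {"lang": "en"}).text = gloss
    return entry


def to_archival_xml(entry):
    """Serialize the entry as plain XML text, readable without special software."""
    return ET.tostring(entry, encoding="unicode")


def to_presentation(entry):
    """Derive one possible presentation form from the descriptive markup."""
    return "{} ({}): {}".format(
        entry.findtext("headword"),
        entry.findtext("partOfSpeech"),
        entry.findtext("gloss"),
    )


# Invented placeholder data, not from any real language corpus.
entry = build_entry("wanga", "n", "house")
archival = to_archival_xml(entry)

# Any XML parser can read the archival text back; the display is regenerated from it.
restored = ET.fromstring(archival)
print(archival)
print(to_presentation(restored))
```

In this sketch the presentation string is disposable, since it can be regenerated from the archival XML at any time, whereas the reverse is not true: a formatted display line does not reliably record which part is the headword and which is the gloss.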

Published  2006
Subject  Computer programs
Keywords  computing; XML; archiving; data preservation; descriptive markup