Adding Metadata

Few repositories are currently equipped to handle the full range of preservation metadata. Creators of systems that do accommodate it must provide answers to the following questions:

Elements: What metadata will be created, and which attributes of the objects will be important in the future?
OAIS provides only a conceptual framework for preservation metadata. Several groups and institutions around the world have built on that framework to identify and specify metadata elements. Each took a slightly different approach and developed different metadata element specifications. Here are the element sets proposed or used by some of those bodies:

Creation: Who will do the actual work of creating the metadata?
The most efficient and accurate process starts with a common metadata framework, used by both the Producer and the Archive. The actor who is closest to the information to be used as metadata adds that information to the framework. For example, the creator of a digital object knows best the technical information about the creation of the object, so accuracy is best served if the Producer adds Pre-Ingest technical metadata to the framework. Context Information and Fixity Information are also known at creation time. The Archive, on the other hand, holds instances of metadata that can be shared by many objects (format standards, for example). The process is most efficient if the Archive adds pointers or links from the many objects to the common metadata. Then the Producer doesn't have to create redundant instances of metadata shared by all the objects.

Manual or Automatic?
Both the Producer and the Archive may have to produce some metadata by hand, but the goal—again, for efficiency’s sake—should be automatic production of metadata. The METAe software is an example of semi-automatic production of preservation metadata. For a description of the project and information about its Metadata Engine, see RLG DigiNews, v.6, no. 3.
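
As a rough illustration of automatic production, the sketch below gathers a few technical properties of a file with no manual entry. It is not the METAe software; the function name extract_technical_metadata and the field names are hypothetical, chosen only for this example, and a real system would map them onto whichever element set it has adopted.

```python
import hashlib
import mimetypes
import os
from datetime import datetime, timezone


def extract_technical_metadata(path):
    """Collect basic technical metadata for one file (hypothetical field names)."""
    stat = os.stat(path)
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    # guess_type works from the file extension only; it does not inspect content.
    mime_type, _ = mimetypes.guess_type(path)
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": stat.st_size,
        "mime_type": mime_type or "application/octet-stream",
        "modified_utc": modified.isoformat(),
        "sha256": digest,
    }
```

Because the MIME type here is guessed from the file name, a production workflow would probably want a format identification tool that examines the file's content instead.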

Interface: How does the metadata get from the creator to the repository?
The short answer is, “negotiation.” The long answer is that the Producer and the Archive must make a Submission Agreement that spells out the means of transmission, the verification process, and the process by which the Archive can request re-transmission. In other words, the two can agree that the files will be delivered by any means that is convenient to both. Once the files arrive at the repository, they should be verified; the Archive can verify the files against checksums sent by the Producer. If something goes wrong, the Archive will want to be able to let the Producer know and be able to get good copies of the files.
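
The verification step might look something like the following sketch. It assumes the Producer sends a manifest that maps each file name to a SHA-256 checksum; the manifest shape and the names sha256_of and verify_transfer are assumptions made for this example, not part of any Submission Agreement standard.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 checksum, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(directory, manifest):
    """Return the names of files that are missing or whose checksum differs.

    `manifest` is assumed to be a dict of {file name: expected SHA-256 hex}.
    """
    failed = []
    for name, expected in manifest.items():
        path = Path(directory) / name
        if not path.is_file() or sha256_of(path) != expected.lower():
            failed.append(name)
    return failed
```

An empty list means every file arrived intact; a non-empty list is what the Archive would send back to the Producer when requesting re-transmission.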

How is metadata requested and retrieved by a user?
Web browsers and HTML forms are the most common interface, although dedicated software is a possibility. A repository can also create a process to notify certain users of the creation or modification of objects or metadata, either on a set schedule or on an event-driven basis.


Storage: How will the metadata be stored?
Because current digital repository systems don’t provide for the complete range of preservation metadata, archiving institutions must create their own mechanisms for storing the metadata. Three types of digital storage are possible: discrete files, a database management system, and metadata embedded in the objects themselves. The discrete-file method is the simplest: metadata is stored in text files (often using XML tagging) and associated with the digital objects by persistent IDs. [See the METS box below for an example of an XML framework that organizes digital objects in a tagged text file.] Database management requires a higher level of technical commitment but has the advantage of being able to store a relational model of complex objects. The third type of digital storage, embedding metadata directly in the object, is possible with some file formats. The TIFF format, for example, has the space and functionality to allow metadata to be stored in the file header.
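
The discrete-file method can be sketched in a few lines. The example below writes metadata to a small XML file that carries the object's persistent ID, then reads it back. The element and attribute names (preservationMetadata, objectID, element, name) are invented for this illustration and do not come from any published schema.

```python
import xml.etree.ElementTree as ET


def write_sidecar(object_id, metadata, path):
    """Write metadata to an XML file tied to the object by its persistent ID."""
    # Tag and attribute names are hypothetical, not from a standard schema.
    root = ET.Element("preservationMetadata", {"objectID": object_id})
    for name, value in metadata.items():
        ET.SubElement(root, "element", {"name": name}).text = str(value)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def read_sidecar(path):
    """Return (persistent ID, metadata dict) from a file made by write_sidecar."""
    root = ET.parse(path).getroot()
    values = {e.get("name"): e.text for e in root.findall("element")}
    return root.get("objectID"), values
```

All values come back as strings, which is one of the trade-offs of plain tagged text compared with a database that enforces data types.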

Update: When something about the object changes, how is its metadata modified?
At a minimum, the change should be recorded as Provenance Information. If the objects are moved to a new location, various pointers will have to be updated. Messages to producers, users, and the repository administration might be sent out. A desirable principle is consistency: how the changes are made and documented should not differ within a cohesive set of objects.
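
One way to keep changes consistent is to route every modification through a single function that also records the event. The sketch below assumes each object's metadata is held as a Python dictionary with a "location" pointer and a "provenance" list; that record shape and the names record_change and relocate are hypothetical.

```python
from datetime import datetime, timezone


def record_change(record, event_type, detail, agent, when=None):
    """Return a copy of `record` with one more provenance event appended."""
    event = {
        "event_type": event_type,
        "detail": detail,
        "agent": agent,
        "timestamp": (when or datetime.now(timezone.utc)).isoformat(),
    }
    updated = dict(record)
    updated["provenance"] = list(record.get("provenance", [])) + [event]
    return updated


def relocate(record, new_location, agent, when=None):
    """Update the location pointer and document the move as provenance."""
    detail = f"{record.get('location')} -> {new_location}"
    moved = record_change(record, "relocation", detail, agent, when)
    moved["location"] = new_location
    return moved
```

The original record is left untouched, so the caller decides when the updated version replaces it in storage.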

 

METS (The Metadata Encoding and Transmission Standard)

The METS schema is a flexible XML framework designed for storing administrative, structural, and descriptive metadata about digital objects. In addition to encapsulating the metadata itself (or pointers to metadata stored outside the METS object), the framework provides elements for describing the relationships among the metadata and among the pieces of the complex objects. What’s more, it provides tags for describing and attaching executable behavior appropriate for content in the METS object. In short, it is an XML-based container for all types of metadata, for the relationships among them and the objects they are about, and for the behaviors associated with the objects. METS’s comprehensiveness and the flexibility designed into its structure make it an excellent choice for a framework or container for the objects and metadata in a preservation system.

METS is not a tool, however. An instance of METS is an XML document. To be able to work with METS as a container for Ingest, you need a text editor, an XML editor, or, ideally, a forms-based user interface built and customized to your collections and to your working environment. Batch processing will require some customized programming to integrate your metadata into the METS structure. Using it as the container for an Archival Information Package will also require programming work.
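
To give a sense of the customized programming involved, here is a minimal sketch that builds a skeletal METS document containing only a file section and a structural map. The function name build_mets and the input tuple layout are assumptions for this example. The output is not validated against the METS schema, and a real profile would normally add a header along with descriptive and administrative metadata sections.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(namespace, tag):
    """Build a namespace-qualified name in ElementTree's {uri}tag form."""
    return f"{{{namespace}}}{tag}"


def build_mets(object_id, files):
    """Return a skeletal METS document as a string.

    `files` is assumed to be a list of (file_id, href, mime_type, sha256) tuples.
    """
    root = ET.Element(q(METS_NS, "mets"), {"OBJID": object_id})
    file_sec = ET.SubElement(root, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "master"})
    struct_map = ET.SubElement(root, q(METS_NS, "structMap"), {"TYPE": "physical"})
    div = ET.SubElement(struct_map, q(METS_NS, "div"), {"TYPE": "object"})
    for file_id, href, mime_type, sha256 in files:
        file_el = ET.SubElement(file_grp, q(METS_NS, "file"), {
            "ID": file_id,
            "MIMETYPE": mime_type,
            "CHECKSUM": sha256,
            "CHECKSUMTYPE": "SHA-256",
        })
        # FLocat points at the content file rather than embedding it.
        ET.SubElement(file_el, q(METS_NS, "FLocat"),
                      {"LOCTYPE": "URL", q(XLINK_NS, "href"): href})
        # The structural map ties each file back to the object by its ID.
        ET.SubElement(div, q(METS_NS, "fptr"), {"FILEID": file_id})
    return ET.tostring(root, encoding="unicode")
```

A batch job could call build_mets once per object, feeding it the checksums and MIME types gathered at Ingest.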

 

Exercise

  1. Has your organization adopted a metadata standard that supports digital preservation?
  2. Check out developing standards for metadata containers, such as METS, MPEG-21, FOXML, and XFDU.