By Don Sawyer
A new standard is emerging that can assist digital archives, their managers, and their customers in managing and ensuring the long-term preservation of, and access to, digital information. It is titled "Reference Model for an Open Archival Information System (OAIS)" (http://ssdoo.gsfc.nasa.gov/nost/isoas/ref_model.html), but is often referred to simply as the OAIS Reference Model. It has just completed balloting as an ISO Draft International Standard (DIS), and is expected to become a full standard later this year. Its purpose is to provide a framework of concepts and terminology that facilitates long-term preservation of, and access to, digital information. Its acclaim and growing usage were highlighted in the September 2000 NSSDC Newsletter (http://nssdc.gsfc.nasa.gov/nssdc_news/sept00/archive_ref_model.html).
Key OAIS Concepts
The two primary modeling concepts in the OAIS are the information models and functional models.
Two key information models are the Information Object and the Archival Information Package (AIP). The Information Object is defined as a data object (usually digital bits) together with its representation information object, which gives the format and meaning of the data object's bits. It is shown schematically in Figure 1. Note that the two objects constituting an Information Object need not be physically contiguous; pointers or links from one to the other are often used. Note also that the data object's bits may represent numerical values of physical observations, or they may represent text about observations, their sources, their state, etc.
This highlights the importance of the format and meaning descriptions needed to turn bits into information. This is modeled in more detail in the OAIS.
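As a minimal sketch of the Information Object idea, the Python below pairs a data object with representation information and shows that the same bits yield different values under different descriptions. The class names, the use of a Python struct format string to stand in for representation information, and the sample values are all illustrative assumptions; none of them comes from the OAIS Reference Model.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class RepresentationInformation:
    # Assumption: a Python struct format string stands in for the format description.
    struct_format: str
    # Free-text statement of what each decoded value means.
    meaning: str


@dataclass(frozen=True)
class InformationObject:
    data_object: bytes                         # the raw bits
    representation: RepresentationInformation  # what the bits mean

    def interpret(self) -> list:
        """Decode the bits using the associated representation information."""
        fmt = self.representation.struct_format
        size = struct.calcsize(fmt)
        return [
            struct.unpack_from(fmt, self.data_object, offset)[0]
            for offset in range(0, len(self.data_object), size)
        ]


# Hypothetical sample: three 16-bit big-endian values.
bits = struct.pack(">3h", 10, -2, 300)

signed = InformationObject(bits, RepresentationInformation(">h", "signed counts"))
unsigned = InformationObject(bits, RepresentationInformation(">H", "unsigned counts"))

print(signed.interpret())    # [10, -2, 300]
print(unsigned.interpret())  # [10, 65534, 300]
```

The two objects hold identical bits, yet the second value decodes differently. That is the sense in which bits only become information once the format and meaning descriptions are available.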
The Archival Information Package (AIP) is a concept for identifying a collection of information that can be preserved over the long term. It is shown schematically in Figure 2.
The primary information object the OAIS is tasked to preserve is the Content Information. Also within the AIP is an Information Object called the Preservation Description Information (PDI). The PDI contains additional information about the Content Information and is needed to keep the Content Information meaningful over the indefinite long term. It includes several categories of information addressing how the Content Information is referenced, how it is shielded from unintended alteration, how it relates to other information, and what its chain of custody has been.
Information related to the AIP is shown as Package Descriptions and Packaging Information. Package descriptions are the information used by finding aids to assist users in determining the AIPs of interest. Packaging information is used to bind the Content Information and Preservation Description Information into a recognizable entity. These are described more completely in the OAIS Reference Model.
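The parts of an AIP described above can be sketched as a small data structure. This is only an illustration: the PDI field names follow the category names the OAIS uses (Reference, Fixity, Context, Provenance), while the Python class layout, the `matches` helper, and every sample value are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PreservationDescriptionInformation:
    reference: dict    # how the Content Information is referenced
    fixity: dict       # how it is shielded from unintended alteration
    context: dict      # how it relates to other information
    provenance: list   # chain of custody, oldest event first


@dataclass
class ArchivalInformationPackage:
    content_information: bytes
    pdi: PreservationDescriptionInformation
    # Packaging Information binds content and PDI into one recognizable entity.
    packaging_information: dict
    # Package Description is what finding aids search.
    package_description: dict = field(default_factory=dict)


def matches(aip, **criteria):
    """Toy finding aid: compare criteria against the Package Description only."""
    return all(aip.package_description.get(k) == v for k, v in criteria.items())


# Hypothetical sample package.
aip = ArchivalInformationPackage(
    content_information=b"\x00\x01\x02\x03",
    pdi=PreservationDescriptionInformation(
        reference={"archive_id": "EXAMPLE-0001"},
        fixity={"byte_count": 4},
        context={"collection": "EXAMPLE-COLLECTION"},
        provenance=["received from producer"],
    ),
    packaging_information={"container": "single file"},
    package_description={"instrument": "example-sensor"},
)

print(matches(aip, instrument="example-sensor"))
```

Keeping the Package Description separate from the PDI reflects the text: a finding aid can locate a package without opening its content.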
The primary functional model, shown in Figure 3, breaks the OAIS archive into five functional entities and related interfaces. Only major information flows are shown. The lines connecting entities identify communication paths over which information flows in both directions. The lines to Administration are dashed only to reduce diagram clutter.
The role played by each of the entities in Figure 3 is described briefly as follows:
Ingest: This entity provides the services and functions to accept Submission Information Packages (SIPs) from Producers (or from internal elements under Administration control) and prepare the contents for storage and management within the archive. Ingest functions include receiving SIPs, performing quality assurance on SIPs, generating an Archival Information Package (AIP) which complies with the archive's data formatting and documentation standards, extracting Descriptive Information from the AIPs for inclusion in the archive database, and coordinating updates to Archival Storage and Data Management.
Archival Storage: This entity provides the services and functions for the storage, maintenance and retrieval of AIPs. Archival Storage functions include receiving AIPs from Ingest and adding them to permanent storage, managing the storage hierarchy, refreshing the media on which archive holdings are stored, performing routine and special error checking, providing disaster recovery capabilities, and providing AIPs to Access to fulfill orders.
Data Management: This entity provides the services and functions for populating, maintaining, and accessing both Descriptive Information that identifies and documents archive holdings and administrative data used to manage the archive. Data Management functions include administering the archive database functions (maintaining schema and view definitions, and referential integrity), performing database updates (loading new descriptive information or archive administrative data), performing queries on the data management data to generate result sets, and producing reports from these result sets.
Administration: This entity manages the day-to-day operation of the archive system. Administration functions include soliciting and negotiating submission agreements with Producers, auditing submissions to ensure that they meet archive standards, and maintaining configuration management of system hardware and software. It also provides system engineering functions to monitor and improve archive operation, and to inventory, report on, and migrate/update the contents of the archive. It is also responsible for developing and maintaining archive data standards and policies, providing customer support, and activating stored requests.
Access: This entity supports Consumers in determining the existence, description, location and availability of information stored in the OAIS and allowing Consumers to request and receive information products. Access functions include communicating with Consumers to receive requests, applying controls to limit access to specially protected information, coordinating the execution of requests to successful completion, generating responses (Dissemination Information Packages, result sets, reports) and delivering the responses to Consumers.
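The flow among these entities can be sketched with a toy in-memory archive. This is a deliberately simplified illustration, not an OAIS implementation: the class, method names, identifier scheme, and use of CRC-32 are assumptions, and Administration is left out entirely.

```python
import zlib


class ToyArchive:
    """In-memory stand-in for Archival Storage and Data Management."""

    def __init__(self):
        self.archival_storage = {}   # AIP identifier -> AIP
        self.data_management = {}    # AIP identifier -> Descriptive Information

    def ingest(self, sip):
        """Ingest: accept a SIP, check it, build an AIP, update both stores."""
        if not sip.get("data"):
            raise ValueError("SIP failed quality assurance: no data")
        aip_id = f"AIP-{len(self.archival_storage) + 1:04d}"
        aip = {
            "content": sip["data"],
            "pdi": {"checksum": zlib.crc32(sip["data"]), "producer": sip["producer"]},
        }
        self.archival_storage[aip_id] = aip
        self.data_management[aip_id] = {"title": sip["title"]}
        return aip_id

    def query(self, text):
        """Access: ask Data Management which holdings match; return a result set."""
        return [
            aip_id
            for aip_id, description in self.data_management.items()
            if text.lower() in description["title"].lower()
        ]

    def order(self, aip_id):
        """Access: fetch the AIP from Archival Storage and return a DIP."""
        aip = self.archival_storage[aip_id]
        return {"id": aip_id, "content": aip["content"]}


archive = ToyArchive()
new_id = archive.ingest(
    {"producer": "example-mission", "title": "Example sensor data", "data": b"\x01\x02"}
)
result_set = archive.query("sensor")
dip = archive.order(result_set[0])
print(new_id, result_set, dip["content"])
```

A Consumer here never touches Archival Storage directly: a query goes to Data Management first, and only an order pulls the AIP and repackages it as a Dissemination Information Package.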
In addition to the two primary modeling areas, the OAIS also addresses migration issues. It defines four types of migration, listed here in order of increasing risk of information loss: refreshment, replication, repackaging, and transformation. Refreshment replaces an underlying media volume, replication reproduces a full AIP on new media, repackaging revises the packaging of an AIP's Content and Preservation Description Information, and transformation revises the Content and/or Preservation Description Information. A transformation of an AIP, while attempting to preserve the full information content, results in a new version of the AIP. A new edition of an AIP results when the information content of an AIP is improved. A derived AIP results when the new AIP is a subset of, or an aggregation of, information in one or more other AIPs.
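The four migration types and their risk ordering can be captured in a short sketch. The enum and helper names below are hypothetical; the ordering and the rule that only a transformation yields a new AIP version are taken from the paragraph above.

```python
from enum import IntEnum


class Migration(IntEnum):
    # Higher value means greater risk of information loss.
    REFRESHMENT = 1      # replace an underlying media volume
    REPLICATION = 2      # reproduce a full AIP on new media
    REPACKAGING = 3      # revise the packaging of Content and PDI
    TRANSFORMATION = 4   # revise the Content and/or PDI themselves


def creates_new_version(migration):
    """Per the text, only a transformation results in a new AIP version."""
    return migration is Migration.TRANSFORMATION


def riskier(a, b):
    """Return whichever migration carries the greater risk of information loss."""
    return max(a, b)


print([m.name for m in sorted(Migration)])
print(creates_new_version(Migration.REPACKAGING))
print(riskier(Migration.REFRESHMENT, Migration.REPACKAGING).name)
```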
Archival Information Package (AIP)
NSSDC has adopted the Archival Information Package concept using a media-independent canonical form to facilitate future refreshment and replication migrations across media types, and to improve the association of needed representation information and preservation description information.
The major packaging form uses the ISO 12175 (SFDU packaging) standard to form a single file container (the AIP) containing SFDU packaging labels, an attribute object, and the primary data stream of interest, which is typically sensor data (see Figure 4). Each of these objects is assigned an internationally unique object-type identifier (called the Authority and Description Identifier, or ADID) that is carried as a part of the packaging, and which points to a description of the format and meaning of the object. This provides a convenient mechanism to logically hook representation information to these objects and to support automated checking and gathering of this information. Access to the representation information is an automated service supported by Control Authorities (e.g., NSSDC) that are preserving the representation information.
The primary data stream, or data object, of interest as shown by the SDO in Figure 4, is held as a sequence of bytes within the AIP. It is to be interpreted using its associated representation information.
The attribute object contains additional information about the primary data object. This information can largely be categorized as Preservation Description Information. It gives checksums and byte counts to ensure the integrity of the primary data object. It gives historical information on how the primary data object looked when originally received, such as the original file name, source operating system, and original record delimiters. It also gives some processing information related to moving the primary data object into the canonical form of the AIP. In addition, it gives the unique identifier, called the Archival Storage Identifier, which is assigned by the NSSDC archive so that the unique content can be tracked and retrieved. An identifier of the primary collection to which the data object belongs is also given. These last two are types of reference information. There are also attributes giving start and stop times for the primary data object, and these may be regarded as context information. They can be, and generally are, also extracted to become part of the descriptive information used by finding aids. The attribute object is implemented using the ISO 14961 (PVL specification) standard. A future version may use XML.
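The integrity role of the attribute object can be illustrated with a short sketch. The attribute names and the choice of CRC-32 are assumptions made for the example; the article does not say which checksum algorithm or keywords NSSDC uses, and the attributes are held in a plain dictionary here rather than in PVL.

```python
import zlib


def build_attributes(data, original_file_name):
    """Collect a few attributes for a primary data object (names are hypothetical)."""
    return {
        "BYTE_COUNT": len(data),
        "CHECKSUM_CRC32": zlib.crc32(data),   # assumed algorithm, for illustration only
        "ORIGINAL_FILE_NAME": original_file_name,
    }


def verify_integrity(data, attributes):
    """Return True when the data still matches the recorded byte count and checksum."""
    return (
        len(data) == attributes["BYTE_COUNT"]
        and zlib.crc32(data) == attributes["CHECKSUM_CRC32"]
    )


data = b"example sensor bytes"
attributes = build_attributes(data, "EXAMPLE.DAT")

print(verify_integrity(data, attributes))            # True
print(verify_integrity(data + b"\x00", attributes))  # False
```

Checking the byte count first catches truncation cheaply, while the checksum catches alterations that leave the length unchanged.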
Figure 4 illustrates the concepts of the foregoing paragraphs. In the context of Figures 1 and 2, the AIP contains two information objects. One is the content information object formed from the sensor data object (SDO; referred to as "the data file" in most articles of this newsletter) as linked to its representation information which is pointed to by the ADID in the SDO's SFDU label. The other is the Preservation Description Information object formed from the attribute object (AO; usually referred to as "the attribute file" in other articles) as linked to its representation information pointed to by the ADID in the AO's SFDU label. These two information objects are joined into one AIP by the additional packaging information consisting of the leading SFDU label with its ADID.
In addition to the SFDU packaging, the NSSDC also employs another form of packaging. This is a two-file construct, or split package form, where the primary data object is extracted into a file with a name and extension approved by NSSDC, and where the associated attribute object has the same name as the data object, but with an extension of .att. This packaging form is used to facilitate direct user access to files placed on disk for pickup using ftp or similar tools. In this form of packaging, the association to the representation information that is stored in the Control Authority is still available for the primary data object because its ADID is also stored in the attribute object. Users accessing files from NSSDC's disk-based dissemination environment can also use readmes (or equivalent) to find format and other information needed to understand the data.
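The naming convention of the split package form lends itself to a small sketch that pairs data files with their attribute files in a directory listing. The file names are hypothetical, and the code assumes that .att replaces the data file's own extension rather than being appended to it; the article does not settle that detail.

```python
from pathlib import Path


def attribute_file_for(data_file):
    """Return the attribute file path paired with a data file.

    Assumption: the .att extension replaces the data file's own extension.
    """
    return Path(data_file).with_suffix(".att")


def pair_split_packages(directory_listing):
    """Group a flat listing into (data file, attribute file) name pairs."""
    names = {Path(name) for name in directory_listing}
    pairs = []
    for name in sorted(names):
        if name.suffix == ".att":
            continue  # attribute files are matched from their data file
        att = attribute_file_for(name)
        if att in names:
            pairs.append((name.name, att.name))
    return pairs


# Hypothetical listing of a dissemination directory.
listing = ["orbit_001.dat", "orbit_001.att", "orbit_002.dat", "readme.txt"]
print(pair_split_packages(listing))  # [('orbit_001.dat', 'orbit_001.att')]
```

A data file with no matching attribute file, such as orbit_002.dat above, is simply not reported as a complete package.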
NSSDC and the OAIS Functions
In OAIS terminology, the SFDU-packaged AIPs relate to NSSDC's Archival Storage function, while the two-file construct relates to NSSDC's Access function.
The NSSDC ingest function includes acquisition scientists working with data submitters to establish submission agreements, and it includes quality checking of submissions to ensure correct and usable information is obtained. Adequate metadata needs to be acquired to ensure AIPs can be formed, and to the extent AIPs are received directly (as for the IMAGE mission), the entire process is greatly simplified.
The NSSDC data management function provides various databases that may be used by finding aids, such as web pages, through the access function, to find data of interest. In addition, the simplest finding aid is the use of ftp to directly search the anonymous ftp site containing many AIPs that have been split into two files: the canonical data file and its associated attribute file.
The NSSDC administrative function, involving data operations, system engineering, etc., proceeds under the guidance of NSSDC management and is largely executed by NSSDC's onsite contractor.