Archival Workshop on Ingest, Identification, and Certification Standards (AWIICS)

(A part of the ISO Archiving Workshop Series)
 
 
  

     

Draft Report


DATE: October 13-15, 1999

HOST: The National Archives and Records Administration
Archives II
8601 Adelphi Road
College Park, MD 20740-6001

 


 

Executive Summary

The explosive growth of digital information and the need to successfully archive these data are receiving a great deal of attention from many organizations. Most of this effort has been directed toward accessing and retrieving archived data. However, successful and efficient retrieval of data from an archive requires that the data were successfully and efficiently ingested and identified by the archive. It also requires that the archive follow appropriate policies and procedures to ensure the information is understandable and usable into the indefinite future. Appropriate standards can assist in meeting these objectives.

These issues, and others, have been addressed at the conceptual level in a reference model developed by a US ISO archiving group under ISO TC20/SC13 (Aircraft and space vehicles/Space data and information transfer systems) and the Consultative Committee for Space Data Systems (CCSDS). This model, called the Reference Model for an Open Archival Information System (OAIS), was undergoing formal ISO and CCSDS review at the time of AWIICS. An electronic version of the OAIS Reference Model can be found at:

http://ssdoo.gsfc.nasa.gov/nost/isoas/ref_model.html

Thus it was viewed as an opportune time to see what additional standardization efforts were of interest and to plan for their initiation.

The National Archives and Records Administration (NARA), the National Aeronautics and Space Administration (NASA), and ISO TC20/SC13 hosted the Archival Workshop on Ingest, Identification, and Certification Standards (AWIICS) on 13 and 14 October 1999 at the NARA Archives II facilities in College Park, MD.

Based on input from the Digital Archive Directions (DADs) Workshop and further market interest surveys, the Workshop organizers suggested three primary topic areas for workshop focus and possible standardization efforts:

  • Ingest methodology and standardization: A methodology for the interaction between an archive and its producers could greatly facilitate the preparation and ingest of digital material, simultaneously increasing quality and reducing costs. This might cover preparation of the data and supporting material, the archives' ingest of the data and supporting material, and the interactions between the providers and the archives about the data preparation and submission.
  • Identification of data: Permanent, unique identifiers for digital objects retrievable from archives could significantly improve all aspects of archive ingest and the use of digital information by end users. There are several ongoing efforts that need to be investigated in this context.
  • Certification of archives: A method by which an archive's customers could gain confidence in the authenticity, quality, and usefulness of digitally archived materials would help assure management that an archive was fulfilling its role of long-term preservation. This might take the form of a framework that could be tailored for specific domains or archives as appropriate.

Participation

The participants attending the workshop represented a wide variety of national and international organizations including government agencies, contractors, archives, academic institutions, non-profit organizations, and vendors.

Organization

The workshop consisted of a review of papers submitted on needs or approaches to standards in each of the identified areas. The primary focus was on identifying work plans for specific standards, and on gauging the level of likely participation in developing each standard.

Conclusions

A diverse community of science data centers, libraries, electronic records and traditional archives sees the benefit of, and is willing to participate in, developing standards in the following areas:

Certification is a framework in which the Ingest Methodology and other archive standards can be related.

Relative to the OAIS Reference Model:

  • Certification is at the Management to OAIS Administration interface
  • Ingest Methodology is at the Producer to OAIS Ingest interface
  • Identification requirements are to be derived from consideration of the Consumer to Access interface

When any project that will produce information needing preservation is conceived, it should be mandatory to involve a representative from the archive likely to be the long-term steward.

The proposed standardization efforts are nearly free of specific underlying technologies so that they are not subject to the rapid pace of technology change.

Recommendations

Pursue the execution of the standardization efforts identified.

An international consortium is needed to coordinate digital preservation issues across a wide variety of organizations, including science data centers, libraries, electronic records management, traditional libraries, and commercial organizations acting as both users and vendors of solutions.

Next Steps

  • Put up web form soliciting participation in one or more of the standardization efforts
  • Generate a reflector with those who attended, and then look to expand it as appropriate
  • Take proposed work packages to ISO TC 20/SC 13 meeting in Madrid (Nov. 8-10)
    • Start a work package in at least one of the proposed areas (probably ingest)
    • Identify a person/agency to head the effort(s)
  • Identify a person or organization to take the lead in each proposed effort
    • Can NARA be persuaded to take the lead in Certification?
    • Can RLG (or another digital library group) take the lead in Identification/access?
    • Can scientific archives take lead in Ingest?
  • Investigate setting up an international digital archive consortium
    • Draft a Working Paper on the need for the consortium; due 1 November (introductory text from DADs with AWIICS material)
    • Draft plan to go to funding agencies and other organizations; due 15 January 2000
    • Initiate discussions with various organizations
      • RLG (how did they get formed?)
      • Cornell (Oya Rieger), regarding their roles/Digital Lib/Fedora
      • ISO secretariat and ANSI regarding establishment of a 'Digital Preservation' Committee
      • NISO / Pat Harris (Don Sawyer)
      • Open GIS consortium (Lou Reich)
      • Federal Funders Group (Bruce Ambacher)
      • NSF regarding support
      • CNRI/Cliff Lynch (Lou Reich)
      • NIH/NLM
      • Federal CIO council /NASA CIO (Don Sawyer)
      • NIMA / Pat Williams (Bruce Ambacher)
      • NIST
      • StorageTek
      • Petroleum industry
      • Medical imaging industry

Introduction

Purpose

The explosive growth of digital information and the need to successfully archive these data are receiving a great deal of attention from many organizations. Most of this effort has been directed toward accessing and retrieving archived data. However, successful and efficient retrieval of data from an archive requires that the data were successfully and efficiently ingested and identified by the archive. It also requires that the archive follow appropriate policies and procedures to ensure the information is understandable and usable into the indefinite future. Appropriate standards can assist in meeting these objectives.

These issues, and others, have been addressed at the conceptual level in a reference model developed by a US ISO archiving group under ISO TC20/SC13 (Aircraft and space vehicles/Space data and information transfer systems) and the Consultative Committee for Space Data Systems (CCSDS). This model, called the Reference Model for an Open Archival Information System (OAIS), was undergoing formal ISO and CCSDS review at the time of AWIICS. An electronic version of the OAIS Reference Model can be found at:

http://ssdoo.gsfc.nasa.gov/nost/isoas/ref_model.html

Thus it was viewed as an opportune time to see what additional standardization efforts were of interest and to plan for their initiation.

The National Archives and Records Administration (NARA), the National Aeronautics and Space Administration (NASA), and ISO TC20/SC13 hosted the Archival Workshop on Ingest, Identification, and Certification Standards (AWIICS) on 13 and 14 October 1999 at the NARA Archives II facilities in College Park, MD.

Based on input from the Digital Archive Directions (DADs) Workshop and further market interest surveys, the Workshop organizers suggested three primary topic areas for workshop focus and possible standardization efforts:

  • Ingest methodology and standardization: A methodology for the interaction between an archive and its producers could greatly facilitate the preparation and ingest of digital material, simultaneously increasing quality and reducing costs. This might cover preparation of the data and supporting material, the archives' ingest of the data and supporting material, and the interactions between the providers and the archives about the data preparation and submission.
  • Identification of data: Permanent, unique identifiers for digital objects retrievable from archives could significantly improve all aspects of archive ingest and the use of digital information by end users. There are several ongoing efforts that need to be investigated in this context.
  • Certification of archives: A method by which an archive's customers could gain confidence in the authenticity, quality, and usefulness of digitally archived materials would help assure management that an archive was fulfilling its role of long-term preservation. This might take the form of a framework that could be tailored for specific domains or archives as appropriate.

Participation

The participants attending the workshop represented a wide variety of national and international organizations including government agencies, contractors, archives, academic institutions, non-profit organizations, and vendors.

Organization

The workshop consisted of a review of papers submitted on needs or approaches to standards in each of the identified areas. The primary focus was on identifying work plans for specific standards, and on gauging the level of likely participation in developing each standard. The agenda gives a close approximation to the actual sequence of events.

Opening Plenary

Don Sawyer provided an introduction and background for this workshop. He explained the context for this work within ISO/TC20/SC13 and CCSDS. He identified its major activities to date as the development of the Open Archival Information System (OAIS) Reference Model, a Digital Archive Directions (DADs) workshop in June 1998, and now this AWIICS workshop.

He then reviewed the Open Archival Information System (OAIS) Reference Model, noting that it is a model as opposed to an implementation and, as such, can be used as a framework for the DADs and AWIICS activities. He itemized the functions an OAIS needs to provide and stated that descriptions of actual archives are included in annexes to the OAIS document. He then traced the operations of an OAIS from preparation of a Submission Information Package [SIP] through retention of the Archival Information Package [AIP] to the generation and distribution of a Dissemination Information Package [DIP].
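
The package flow traced above can be sketched in code. This is an illustration only: the class names, fields, and the `ingest` and `disseminate` functions are hypothetical and are not defined by the OAIS Reference Model, which is a conceptual model rather than an implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission Information Package, as delivered by a Producer."""
    producer: str
    content: bytes
    description: str


@dataclass
class AIP:
    """Archival Information Package, as retained by the archive."""
    aip_id: str
    content: bytes
    description: str
    provenance: list = field(default_factory=list)


@dataclass
class DIP:
    """Dissemination Information Package, as delivered to a Consumer."""
    source_aip_id: str
    content: bytes
    description: str


def ingest(sip: SIP, aip_id: str) -> AIP:
    """Turn a submission into an archival package (hypothetical step)."""
    return AIP(aip_id, sip.content, sip.description,
               provenance=[f"received from {sip.producer}"])


def disseminate(aip: AIP) -> DIP:
    """Derive a package for a Consumer from a retained AIP (hypothetical step)."""
    return DIP(aip.aip_id, aip.content, aip.description)
```

A real archive would carry far richer descriptive and preservation information in each package; the sketch only shows that each package type is a distinct object and that the AIP, not the SIP, is what the archive retains.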

He commented that the Reference Model is being well received by many organizations and is being used as the basis for some of their activities.

He pointed out that, at present, the OAIS Reference Model document is out for international review, comment, and acceptance by both the CCSDS and ISO/TC20/SC13. He indicated that the document was available on the Web at: http://wwwclassic.ccsds.org/RP9905/RP9905.html [Webmaster's Note: now at ../../wwwclassic/RP9905/RP9905.html] and asked that any review comments this group might care to make be sent to him at: donald.sawyer@gsfc.nasa.gov.

He reported that the next major effort had been the Digital Archive Directions (DADs) workshop held in the summer of 1998. Here the Reference Model had been exposed to a significant number of actual archive users, both nationally and internationally. He then itemized some of the major recommendations which came out of the DADs workshop. These included establishing an international consortium to coordinate this work, developing a data ingest methodology, and developing an archival accreditation method.

The opening plenary then continued with overviews by the three theme leaders and a paper which addressed items of general interest covering all three themes.

IDENTIFICATION

Lou Reich (Identification Session Leader) gave a brief overview of the identification issues. He noted that the types of identification include an object's attributes, its location, and its name. He also identified the types of identifiers in the OAIS Reference Model as Content Identifiers, Archival Information Package Identifiers, and Storage Identifiers. Next, he listed some organizations currently working on identification standards and named some of their projects, including the OMG CORBAmed document titled Person Identification Service (PIDS).

In response to questions, he noted that:

  • The minimum size of an archive package is seen as dependent on individual archives.
  • Long term is defined as "long enough to have technology become an issue," and he added that in recent years this period has become as short as five years.
  • Mike Martin mentioned "Standard Universal Product Codes (UPCs)" which are being developed for online marketing where products have to be uniquely identified across many suppliers.

    CERTIFICATION

    Bruce Ambacher (Certification Session Leader) gave a brief overview of the Certification issues.

    He defined Certification as making sure that what goes into an archive and what comes out are the same thing. In earlier days, the objects to be preserved were hard objects, and preserving their integrity was not a problem. However, Certification has evolved to include the qualifications for persons or institutions to operate an archive. More recently, the process and the data itself may be certified.

    He then noted four approaches to Certification: Individual, Archival Program, Process and Data. He then described in more detail these approaches to Certification and considerations for each.

  • Individual - This relates to the qualifications of the individuals working in an archives. In traditional archival settings individuals can become Certified Archivists through a combination of education, work experience, and a Competencies examination administered by the Academy of Certified Archivists (ACA). A parallel program for certification of Records Managers is based on a Competencies examination conducted by the Association of Records Managers and Administrators (ARMA).
  • Archival Program - This encompasses a traditional approach combining self-evaluation using standardized checklists and evaluation criteria as well as site inspections typical of program accreditation. Two extant models for this are the Society of American Archivists Evaluation of Archival Institutions and the Museum Assessment Program. The areas assessed include legal authority, governing authority, financial resources, staff, facilities, collection development, collection preservation, access, and outreach.
  • Processes - This covers aspects of an archival program which can be subjected to either quantitative or qualitative guidelines to guarantee that procedures used adhere to all internal and external procedures and requirements. External standards that can be associated with evaluating these processes include ISO 9000, DoD 5015.2 Standard, and draft ISO standards on Records Management and Legal Considerations in ISO Standard PDTR 15801.
  • Data - This is concerned with data persistence or reliability over time. It encompasses both internal and external quality control through processes such as ISO 9000 and Procedures manuals. It also includes documenting the processes used when migrating data, creating and maintaining metadata, and verifying the contents of new copies.

    Collectively the Certification process(es) ensure a high degree of confidence that the information an archives disseminates is the same as the information it ingested and preserved, with full documentation for all necessary modifications.
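
One common way to verify the contents of a new copy is to compare a cryptographic digest recorded at ingest with a digest of the copy. The sketch below is an assumption for illustration: the report does not name any particular technique, and both the choice of SHA-256 and the function names `fixity_digest` and `verify_copy` are hypothetical.

```python
import hashlib


def fixity_digest(data: bytes) -> str:
    """Return a hex digest recorded when the data is ingested.

    SHA-256 is an illustrative choice, not one named in the report.
    """
    return hashlib.sha256(data).hexdigest()


def verify_copy(recorded_digest: str, new_copy: bytes) -> bool:
    """Check that a new copy is bit-for-bit identical to what was ingested."""
    return fixity_digest(new_copy) == recorded_digest
```

A check of this kind only detects unintended bit-level change. A migration that deliberately changes the format will alter the digest, which is why the modification itself has to be documented.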

    The planning committee anticipated that the Certification track would focus on those standards and applications which would ensure that information is preserved over time.

    He concluded that a good procedures manual is needed to serve both as a mechanism to identify all these items and as a compliance check list.

    INGEST

    Don Sawyer (Ingest Session Leader) began his presentation by briefly outlining the functions assigned to Ingest within the OAIS Reference Model.

    He put the Ingest function into context alongside the other functions in the model. The Model also breaks Ingest down into a number of sub-functions.

    He felt we should address the interactions between the archive and the producer, such as:

  • What processes has the archive established?
  • What formats and other standards has the archive standardized upon to ensure fidelity in the archive process?

    He provided a brief overview of several submitted papers relevant to the ingest session to show the attendees essentially what the session might be addressing.

    Mike Martin's paper - The Archive Ingest Process

    Don Sawyer stated that this paper addresses key functions associated with the Producer to Archive interface and looks at these primarily from a producer point of view. It offered six steps, expanded in detail based on experience with the Planetary Data System archives, as constituting the Producer view of preparing for, and interacting with, an OAIS archive.

    He stated rather emphatically that we CANNOT continue to simply accept what is sent to the archive. We need to levy some responsibilities on the producers themselves to facilitate the archival process.

    David Holdsworth's paper - Ingest Standards (and others) in the OAIS model

    Don Sawyer briefly reviewed this paper. It addressed the sufficiency of representation information to represent the data.

    It also postulated the data preservation activity as having several steps:

  • Separation of the representation information from the data itself,
  • Separation of data from the media on which it arrived
  • Mapping the data to a bit stream
  • Preservation of the bit stream and retaining all the significant properties of the data which is to be preserved.
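
Two of these steps, keeping the representation information separate from the data and mapping the data to a bit stream, can be illustrated with a small sketch. It is not taken from the paper: the class names, fields, and the `preserve` and `recover` functions are hypothetical, and a text encoding stands in for what would normally be much richer representation information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepresentationInfo:
    """What is needed to interpret the bits (deliberately minimal here)."""
    format_name: str
    encoding: str


@dataclass(frozen=True)
class PreservedObject:
    """A bit stream stored alongside, but separate from, its representation info."""
    bit_stream: bytes
    representation: RepresentationInfo


def preserve(text: str, rep: RepresentationInfo) -> PreservedObject:
    """Map the data to a bit stream and keep the representation info with it."""
    return PreservedObject(text.encode(rep.encoding), rep)


def recover(obj: PreservedObject) -> str:
    """Interpret the bit stream using the stored representation info."""
    return obj.bit_stream.decode(obj.representation.encoding)
```

The bit stream alone is not enough: decoding the same bytes with a different encoding yields different text, which is why the representation information has to be preserved along with the bits.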

    Reagan Moore's paper - Persistent Archives for Data Collections

    Don Sawyer briefly reviewed those aspects of this paper that particularly related to the ingest session. The full paper was presented in plenary by Reagan Moore because it was applicable to all the sessions.

    The focus was on the data and information models needed to manage and federate the collections and to migrate them forward in time. To have persistent archives he felt it was necessary to:

  • Use information models for describing the data
  • Distinguish context needed for the data set, for collections, and for access
  • Support interoperability across heterogeneous hardware and software systems
  • This latter point implies the separation of the access mechanisms from the collections

    He felt ingest methodology and standards could be based on emerging digital library standards such as XML and DTDs. Proprietary formats need to be converted into open standards.

    Parmesh Dwivedi's and William Callicott's paper - Archive Issues with the Evolution of Data and Information

    Don Sawyer stated that this was a general interest paper addressing the history of archives, including the digital explosion that is forcing a radical change in order to handle digital information. It noted many issues and questioned whether we will ultimately be successful or will drown in a sea of bits and information. A companion presentation to this paper is Media Issues.

    PLENARY PAPER PRESENTATION

    Reagan Moore gave a presentation on his paper Persistent Archives for Data Collections. He noted three key items for archival strategy:

  • Is the model sufficient to handle the data to be archived?
  • What type of material is to be preserved?
  • One has to deal with constant change, so there has to be interoperability across implementations.

    He supported the construction of a national data grid through the integration of local data caches, distributed data collections and distributed archives. His conclusions were:

  • A hierarchical information structure is needed
  • Ingestion of data sets needs to include representation of the data
  • Certification of persistent archives needs to include validation of the ability to recreate the data as a collection on new technology

    Closing Plenary

    This plenary was held to hear the conclusions and reports formed by the separate sessions. Volunteers, up to 2 or 3 from each working session, were identified to return the following morning to draft key elements of the workshop report. The Executive Summary conclusions, recommendations, and next steps were generated during that morning session.

    Identification Session Report

    Lou Reich gave the presentation on highlights from the Identification working group.

    Although this group did not identify a specific standard to be developed, it did identify the need to generate access scenarios and requirements that would lead to clarification of identification needs in an archival setting. This proposed effort was documented as an Identification Workpackage using the provided template.

    Much of the discussion in the Identification session centered on the definition of various terminologies. It was agreed that an access focus would help drive out identification requirements.

    During this plenary report, one participant felt there should be globally unique names and he offered to write a paper on naming rules to arrive at such a unique identifier.

    Ingest Session Report

    Don Sawyer gave the presentation on highlights from the Ingest working group.

    This group agreed that an Ingest Methodology standard would be very useful and 11 of 18 expressed an interest in supporting the effort. The group documented this proposed effort as an Ingest Workpackage using the provided template. The initial input document for this effort was identified to be Mike Martin's workshop paper.

    The papers from David Holdsworth and Reagan Moore were also discussed as to their implications for the formats submitted to archives. The importance of understanding the representation information was acknowledged, although there was not sufficient time to explore the issues in more depth. Many of the archives are currently in the position of having to accept whatever format is provided. It is anticipated that the development of an ingest methodology standard would lead to the ability to be more proactive with Producers regarding acceptable data models for their submissions.

    In response to an Ingest session question about how an implementation of the OAIS concept of an Archival Information Unit might look, Don Sawyer's presentation also provided a view of such a unit that the National Space Science Data Center is currently adopting.

    Certification Session Report

    Bruce Ambacher gave the presentation on highlights from the Certification working group.

    This group agreed that a standard Certification Checklist for Archives should be developed and 7 of 20 expressed an interest in supporting the effort. Following the workshop the session chair and his backup (Ben Kobler) documented this proposed effort as a Certification Workpackage using the provided template. The initial input document for this effort was identified to be the closing plenary report.

    The group concluded that the checklist should address qualitative issues, quantitative approaches, and metrics. It could be used in peer review, by management seeking to allocate resources effectively, and by an external body in evaluating archives.


    URL: http://ssdoo.gsfc.nasa.gov/nost/isoas/awiics/report.html

    Author: Archival Workshop Program Committee (archive_standards@nssdc.gsfc.nasa.gov) +1.301.286.3575
    Curator: John Garrett (John.Garrett@gsfc.nasa.gov) +1.301.286.3575
    Responsible Official: Code 633.2 / Don Sawyer (Donald.Sawyer@gsfc.nasa.gov) +1.301.286.2748
    Last Revised: 1999-12-02, Don Sawyer (2004-04-20, John Garrett)