Digital Archive Directions (DADs) Workshop

(A part of the ISO Archiving Workshop Series)


Position Paper

DATE: June 22-26, 1998

HOST: The National Archives and Records Administration
Archives II
8601 Adelphi Road
College Park, MD 20740-6001



1. Identification of Proposed Topic [Required]

1.1 Title

Needed: A Standard for Enabling Automated Searching for Information within a Digital Archive

1.2 Contributor(s)

Teledyne Brown Engineering
Tilton Duane Price, 300 Sparkman Dr., MS 200, Huntsville, AL 35807-7007, fax: 256/725-2469, telephone: 256/726-1591
Sue Winsett, 300 Sparkman Dr., MS 200, Huntsville, AL 35807-7007, fax: 256/725-2469, telephone: 256/726-1116
John B. Rainey, POC, 300 Sparkman Dr., MS 200, Huntsville, AL 35807-7007, fax: 256/725-2469, telephone: 256/726-1132

1.3 Description of Proposed Project

This standard would provide a means to characterize an archive and to present that characterization to the world. It would also provide a mechanism that lets computer search agents automatically find the archives that satisfy a set of retrieval parameters.

The standard would be implemented as a three-layer interface between the world and the data/product archive. The top layer, the interface to computer search engines, would consist of controlled vocabularies or hierarchical classification schemes that link the principal subjects of the archive to the world. The middle layer would contain a standardized list of the principal subjects or components of the archive, linked both to the URLs for the data sets or products of the digital archive and to the vocabularies/classification schemes, so that the data sets/products are ultimately linked to the world. The last layer would directly touch the data sets/products of the digital archive.
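The way the three layers chain together can be illustrated with a minimal Python sketch. It is not part of the proposed standard: the vocabulary terms, component identifiers, URLs, library number, and the resolve() helper are all hypothetical and serve only to show a term in the top layer being followed through the component list to the objects in the archive.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    """Middle-layer entry: one principal subject of the archive."""
    component_id: str   # unique within the local archive
    name: str
    description: str
    # Last layer: references to the archived objects (URLs or library numbers).
    locators: List[str] = field(default_factory=list)


# Top layer (hypothetical terms): vocabulary term -> component identifiers.
VOCABULARY: Dict[str, List[str]] = {
    "atmosphere/temperature": ["C001"],
    "documents/mission-reports": ["C003"],
}

# Middle layer (hypothetical entries), keyed by component identifier.
COMPONENTS: Dict[str, Component] = {
    "C001": Component(
        "C001", "Temperature profiles", "Vertical temperature soundings",
        ["http://archive.example/data/temp_1997.hdf"],
    ),
    "C003": Component(
        "C003", "Mission reports", "Scanned paper reports",
        ["http://archive.example/docs/report_01.pdf", "LIB-1998-0042"],
    ),
}


def resolve(term: str) -> List[str]:
    """Follow a vocabulary term through the component list to its locators."""
    locators: List[str] = []
    for component_id in VOCABULARY.get(term, []):
        locators.extend(COMPONENTS[component_id].locators)
    return locators


print(resolve("documents/mission-reports"))
```

An unknown term simply resolves to an empty list, which is how a search agent would learn that this archive holds nothing under that subject.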

The top layer--the controlled vocabulary or classification scheme--would present the types of components in the archive. This scheme would show a generalized view of the archive to the rest of the world. A hierarchical classification scheme shows some relationships of the components within an archive. The standard would address the format and structure of the classification scheme so that the scheme could be automatically displayed in a generic user interface. An automated search agent would be able to parse the classification scheme and determine if the archive represented by the scheme is what the user wants. Because a picture is worth a thousand words, the classification scheme could be iconized into a picture or image and formatted as GIF or JPEG. The icons or objects in the image would be linked to the rows in the classification scheme.
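As an illustration of how a search agent might parse such a scheme, the sketch below assumes the classification scheme is published as XML. The element and attribute names (scheme, class, id, label) are invented for this example and are not a proposed format. The same traversal serves both a generic indented display and a simple check of whether the archive covers the subjects a user wants.

```python
import xml.etree.ElementTree as ET

# Hypothetical classification scheme; the markup is an assumption of this sketch.
SCHEME_XML = """
<scheme archive="Example Archive">
  <class id="1" label="Earth science">
    <class id="1.1" label="Atmosphere">
      <class id="1.1.1" label="Temperature"/>
    </class>
    <class id="1.2" label="Oceans"/>
  </class>
  <class id="2" label="Mission documents"/>
</scheme>
"""


def walk(element, depth=0):
    """Yield (depth, id, label) for every class, in document order."""
    for child in element.findall("class"):
        yield depth, child.get("id"), child.get("label")
        yield from walk(child, depth + 1)


def render(scheme_xml):
    """Produce an indented text view suitable for a generic user interface."""
    root = ET.fromstring(scheme_xml.strip())
    return "\n".join(
        "  " * depth + f"{class_id} {label}"
        for depth, class_id, label in walk(root)
    )


def matching_classes(scheme_xml, wanted_terms):
    """Return the ids of classes whose label equals one of the wanted terms."""
    root = ET.fromstring(scheme_xml.strip())
    wanted = {term.lower() for term in wanted_terms}
    return [
        class_id
        for _, class_id, label in walk(root)
        if label.lower() in wanted
    ]


print(render(SCHEME_XML))
print(matching_classes(SCHEME_XML, ["temperature", "oceans"]))
```

Exact label matching is the simplest possible test; a real agent would more likely compare against a shared controlled vocabulary.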

The principal subjects or components of the archive would be defined and stored digitally in a list. Each component in the list would have a unique identification parameter within the local archive, a name, and a short description. The standard would address the structure of this list to enable items (rows) to be selected and displayed (presented to the user) in an automated manner. This would allow for the generation of generic user-interactive interfaces that work with automated search agents. This list would form a middle layer between the archive products and the interface to the world.
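A minimal sketch of such a component list follows, assuming it is held in a relational table and using an in-memory SQLite database as a stand-in for the archive's RDBMS. The table name, column names, and sample rows are hypothetical. It shows rows being selected by keyword and formatted for display without any archive-specific code.

```python
import sqlite3


def build_component_list():
    """Create a hypothetical component list in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE component ("
        " component_id TEXT PRIMARY KEY,"   # unique within the local archive
        " name TEXT NOT NULL,"
        " description TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO component VALUES (?, ?, ?)",
        [
            ("C001", "Temperature profiles", "Vertical temperature soundings"),
            ("C002", "Pressure records", "Surface pressure time series"),
            ("C003", "Mission reports", "Scanned paper reports"),
        ],
    )
    return conn


def select_components(conn, keyword):
    """Select rows whose name or description contains the keyword."""
    pattern = f"%{keyword}%"
    cursor = conn.execute(
        "SELECT component_id, name, description FROM component"
        " WHERE name LIKE ? OR description LIKE ?"
        " ORDER BY component_id",
        (pattern, pattern),
    )
    return cursor.fetchall()


def format_rows(rows):
    """Render selected rows as display lines for a generic interface."""
    return [f"{cid}: {name} - {desc}" for cid, name, desc in rows]


conn = build_component_list()
for line in format_rows(select_components(conn, "report")):
    print(line)
```

Because every archive would expose the same three fields, the same selection and display code could serve any archive that follows the standard.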

The last layer would touch the archive and directly reference the digital objects within the archive. A reference could be a URL to a digital file or a library number for a physical item such as a paper document. At this point, tools within the archive would display the digital objects. For example, visualization tools could display data, and PDF and TIFF viewers could display digitized documents. The archive would also require a standardized format, with translators to convert from the standard format to the original format of the data.
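The handling of references in this layer might look like the following sketch. The viewer names, the mapping from file extension to viewer, and the treatment of any non-URL reference as a library number are assumptions made for illustration only.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical mapping from file extension to the archive tool that displays it.
VIEWERS = {
    ".pdf": "pdf-viewer",
    ".tif": "tiff-viewer",
    ".tiff": "tiff-viewer",
    ".gif": "image-viewer",
    ".jpg": "image-viewer",
    ".jpeg": "image-viewer",
    ".hdf": "visualization-tool",
}


def describe_reference(reference):
    """Classify a reference and name the tool that would handle it.

    Returns ("digital", viewer) for a URL, or ("physical", "library-request")
    for anything else, which this sketch assumes is a library number.
    """
    parsed = urlparse(reference)
    if parsed.scheme in ("http", "https", "ftp"):
        extension = posixpath.splitext(parsed.path)[1].lower()
        # Unknown formats fall back to a plain download.
        return ("digital", VIEWERS.get(extension, "download"))
    return ("physical", "library-request")


print(describe_reference("http://archive.example/docs/report_01.pdf"))
print(describe_reference("LIB-1998-0042"))
```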

The enabling technologies for this effort are: library science, with its classification schemes; document management, with text searches and concept searches; Web technology, with search agents and web crawlers; the W3C's Resource Description Framework (RDF) and Extensible Markup Language (XML); Java, with JDBC, JDBC/ODBC drivers, and applets; common formats such as GIF, JPEG, and HDF; and, last but not least, RDBMS, with SQL, data dictionaries, entity relationships, and CGI.

1.4 Justification

A standard is needed that automates the search process for locating the digital archives with the best prospect of providing the information or data a user is seeking. Such a standard would extend the capabilities of Web technology by narrowing searches and delivering the requested data more quickly.

1.5 Definitions of Concepts and Special Terms


1.6 Expected Relationship with OAIS Reference Model

This standard would relate directly to the Access and Dissemination area of the Digital Archive reference model. It also applies to the metadata of the archive; because the required metadata must be defined and populated, the standard also relates to the Ingest and Data Management areas of the reference model.



Authors: Tilton Duane Price, +1 256/726-1591
Sue Winsett, +1 256/726-1116
John B. Rainey, +1 256/726-1132
Last Revised: May 22, 1998, Sue Winsett (June 16, 1998, John Garrett)