Science Archives in the 21st Century
Rule-based preservation systems automate both the execution of preservation management policies and the assessment of the trustworthiness of preservation repositories.
As the size of scientific collections grows, managing assertions about the authenticity and integrity of the collections becomes onerous. A scalable preservation system must therefore automate three tasks: the application of management policies, the validation of assertions about the properties of the preservation environment, and the application of recovery mechanisms when faults are detected. The integrated Rule-Oriented Data System (iRODS), under development with funding from NARA and NSF, supports the characterization of management policies as rules that control the execution of remote micro-services.
The iRODS data grid implements the mechanisms needed to make assertions about the authenticity and integrity of records (scientific data). The desired properties of a preservation environment are expressed as assertions, which are implemented as management policies that control the execution of preservation capabilities (ingest, access, archival storage). iRODS maps preservation capabilities to sets of micro-services, where each micro-service is a set of operations performed at a remote storage location (archival storage). Management policies are mapped to sets of rules, where each rule controls the execution of a set of micro-services, or of a set of rules and micro-services. Assertions are mapped to sets of persistent state information that are generated when the rules are applied. Examples of preservation capabilities are provided in the NARA Electronic Records Archives (ERA) capabilities list. Examples of assertions about preservation environments are provided in the RLG/NARA assessment criteria for trusted digital repositories. Both the ERA capabilities list and the RLG/NARA assessment criteria have been mapped to iRODS rules and micro-services.
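The mapping described above (policies to rules, rules to micro-services, assertions to persistent state) can be illustrated with a minimal Python sketch. This is not the iRODS rule language or its API: the names Rule, compute_checksum, record_provenance, ingest_policy, and assertion_holds are hypothetical, and the micro-services run locally rather than at a remote storage location.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

# Hypothetical model: a micro-service takes a record and returns the
# state information that its execution generated.
MicroService = Callable[[dict], Dict[str, object]]


@dataclass
class Rule:
    """A rule controls a sequence of micro-services and/or other rules."""
    name: str
    steps: List[Union["Rule", MicroService]] = field(default_factory=list)

    def apply(self, record: dict, state: Dict[str, object]) -> None:
        for step in self.steps:
            if isinstance(step, Rule):
                step.apply(record, state)      # nested rule
            else:
                state.update(step(record))     # micro-service output is persisted


def compute_checksum(record: dict) -> Dict[str, object]:
    # Integrity information generated at ingest.
    return {"checksum": hashlib.sha256(record["content"]).hexdigest()}


def record_provenance(record: dict) -> Dict[str, object]:
    # Authenticity information generated at ingest.
    return {"source": record["source"]}


def assertion_holds(state: Dict[str, object], required: List[str]) -> bool:
    # An assertion is checked against the persistent state, not the data itself.
    return all(key in state for key in required)


# A management policy expressed as a rule that contains a rule and a micro-service.
ingest_policy = Rule("ingest", [Rule("integrity", [compute_checksum]), record_provenance])

record = {"name": "survey.dat", "content": b"example bytes", "source": "instrument-7"}
state: Dict[str, object] = {}
ingest_policy.apply(record, state)
```

In this sketch, an assertion such as "every ingested record has a checksum and a known source" reduces to assertion_holds(state, ["checksum", "source"]), which is evaluated against the state generated by the rules.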
Data grids implement the concept of infrastructure independence: the ability to manage the properties of the preservation environment independently of the choice of archival storage system or access mechanism. Infrastructure independence enables the migration of the preservation environment onto new technology while maintaining the integrity and authenticity of the scientific data collections. iRODS implements as micro-services the actions executed by the OAIS archival functional entities, including extraction of metadata, replication of files, validation of checksums, migration of files, management of descriptive information, creation of AIPs, parsing of SIPs, and extraction of DIPs. iRODS also supports the creation of the rules needed to express the management policies that dictate the risk mitigation strategies against data loss, the authenticity requirements for provenance metadata, and the access controls and data transformations. iRODS is available as open-source software at http://irods.sdsc.edu.
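As an illustration of how checksum validation and replication can combine into a risk mitigation strategy against data loss, the following Python sketch validates each replica against a registered checksum and repairs corrupt replicas from a valid one. It is a sketch under stated assumptions, not iRODS code: replicas are modeled as in-memory byte strings keyed by a hypothetical site name, and the helper names sha256_hex and validate_and_repair are invented for this example.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def validate_and_repair(replicas: Dict[str, bytes], expected: str) -> List[str]:
    """Validate every replica against the registered checksum and overwrite
    corrupt replicas with a valid copy. Returns the names of repaired replicas."""
    valid = [name for name, data in replicas.items() if sha256_hex(data) == expected]
    if not valid:
        # With no intact copy left, the fault cannot be recovered automatically.
        raise RuntimeError("no valid replica remains; recovery is not possible")
    source = replicas[valid[0]]
    repaired = []
    for name, data in list(replicas.items()):
        if sha256_hex(data) != expected:
            replicas[name] = source  # re-replicate from a valid copy
            repaired.append(name)
    return repaired


# Hypothetical example: three replicas, one of which has been corrupted.
original = b"observation data"
expected = sha256_hex(original)  # checksum registered at ingest
replicas = {
    "site_a": original,
    "site_b": b"observation dat\x00",  # simulated corruption
    "site_c": original,
}
repaired = validate_and_repair(replicas, expected)
```

Running such a check periodically is one way to automate fault detection and recovery; how many replicas to keep and how often to validate them are policy choices.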