ISO Archiving Standards - Second US Workshop - Minutes
MITRE
Greenbelt, Maryland
December 19-20, 1995
(NOTE: We invite all participants to critique these minutes and
to offer updates on any significant points they feel are missing or
inadequately reflected.)
The second US Workshop on Data Archive Standards was held at MITRE
in Greenbelt MD on Dec. 19-20. Don distributed a meeting agenda,
Item 1, which was accepted and followed during the meeting. The
list of attendees is available at:
http://bolero.gsfc.nasa.gov/nost/isoas/us02/participants.html
[Webmaster's note: Now at:
http://ssdoo.gsfc.nasa.gov/nost/isoas/us02/participants.html]
Action Items
Materials Distributed
Future Meetings
Discussion Items
Action Items
- Identify and list functions in each box of Ref Model - Lou
- Develop a WG response to the Bearman paper and indicate the benefits
which have been found in it for our work - Don
- Relative to the Reference Model paper (Item 9):
- Develop Reference Model Paper
- Develop Section 1 - Scope - Lou
- Develop Section 2 - Key Definitions - Don
- Develop Section 3 - High-Level View - ( Scenario and Internal Model) - Lou
- Develop Section 4 - Data Modelling view - Don and Lou
- Develop Section 5 - Detailed Services and Interface View
- Develop Section 6 - Operational Scenarios - (Scenario and Internal Model)
- Data Ingest - Mike
- Data Migration (non-media) - Steve
- Data Migration (media) - Joel
- Client Retrieval (single box) - Paul
- Client Retrieval (two boxes) - Steve
- Develop Section 7
- Classes of Archives - TBD
- Scaling (Large and small archives) - Don
- Key parameters - Don
- Develop Section 8 - Issues - Lou
NOTES:
- Contributions from anyone in any area are invited
- All Action Responses are due to Lou by Jan 15, 1996
- Responses to be in Word and PowerPoint/MacDraw
- Put new material on WWW as/when appropriate - John/Elise
- FAX POSIX management model to Lou - Ron
- Generate reflector list for "WG core group" - John
- Send query about attending meetings and plenary sessions at locations
remote to Washington - Don
- Query Randy Davis about hosting meeting in Boulder - Don
- Develop response to Archiving Digital Information Paper - Don
MATERIALS DISTRIBUTED/REFERENCED
(Item/Author/Distributed By)
- Draft Agenda / Sawyer/Sawyer
- NASA Report on First ISO/CCSDS Archiving Standards Workshop / Sawyer/Sawyer
- A Strategy for Long-Term Preservation of Space Mission Data / C. Huc/Sawyer
- Archiving At CNES / C. Huc/Sawyer
- Status of Reference Model Activities / Reich/Reich
- Metadata Requirements for Evidence / Bearman/Sawyer
- Notes on Bearman paper on "Metadata Requirements for Evidence" / Sawyer/Sawyer
- Preserving Digital Information (Stanford) / RLG/Sawyer
- Reference Model for Digital Archive Standards / Reich/Reich
- CEOS survey on Data Archives Operations / Lauritsen/Sawyer
- Life Sciences Data Archive Mapped to Ref Model / Turner/Blaise
Future Meetings
- March 19-21, 1996
- Third US Data Archiving meeting
- two-day WG meeting plus one-day Plenary
- talk to Randy re next meeting in Boulder
- send query as to who will attend where
(Boulder/Washington/Plenary session)
- July 10-11/12, 1996
- Fourth US Data Archiving Meeting
- Sept 11-12/13, 1996
- Fifth US Data Archiving Meeting
Discussions
Results of ISO/CCSDS Workshop and Relations to the US Inputs
- Don Sawyer reported on this workshop held at BNSC/RAL in November
(Item 2). He felt attendance had been good for a first meeting
and showed the attendance list, which represented a wide variety
of international activities. He showed the number of papers given;
a copy of Claude Huc's presentation and paper on Archiving at CNES
are Items 3 and 4. He noted strong enthusiasm among the participants
for the subject and great interest in the Reference Model. He listed
the major accomplishments, which included:
- terminology: general agreement on terms although the term
Archiving itself is still an open issue
- Reference Model
- Scope: includes both analog and digital information, but from a
services standpoint only digital information will be addressed by
this group
- Don reported the open issues as:
- Extent of the archiving activity, which seemed to refer to
existence, availability and access security (how much does
this resemble a library?)
- How much metadata is needed?
- How are bulk archive transfers supported by the Reference
Model?
- He showed an International Schedule which included meetings and
due dates for papers
- He noted two action items:
- One was to develop scenarios for several types of organizations
that satisfy the archive definition. A number of facilities and
activities were listed, along with the types of their activities.
- The second was for Lou to update the existing model and
distribute it internationally for comments by early 1996.
- Working Modes: The draft materials will be clearly marked as
drafts and placed on the Web together with their expiration dates.
Review and Discussion of Related External Efforts
Don stated that he had found two papers sufficiently applicable to
merit the group developing positions on them, as well as some
notice back to the authors.
- The Bearman Paper on Metadata Requirements for Evidence (Item 6)
Don had provided some notes on the Bearman paper (Item 7) and
others had also read it. There was extensive discussion including
the following comments:
- Joel asked about inviolate records: how do you know a record has
not been faked? The whole system has to be secured.
- Mike felt there is a rigor implied here that we traditionally do
not observe. He felt there were some important things implied here
that we should address and make others aware of.
- Lou had read the paper and tried to annotate it with: applicable,
not applicable and "don't understand." He felt more than 50% of
the metadata types were things with which we should be concerned.
Lou felt the definitions and names would be useful for source
preservation, and some of these need to be included in our model.
These are good metadata for preservation as well as for archiving
and accessibility. He felt there were good ideas in the paper.
- In response to a question, Don stated that it had been sent to
him as a possible help to our efforts
- The paper was seen as emphasizing legal aspects of digital data
preservation
- Structural metadata is an important consideration when moving
from one system to another
- Don felt the Business Transaction discussion was useful from the
standpoint of data ingest
- It was agreed that the paper is very focused on transactions
within a single business and doesn't appear to address transactions
across businesses
- What is the conclusion for this discussion? Don has promised to
provide a response to this paper. We should see what we can take
from this architecture for our model.
- It was pointed out that metadata is introduced as data
progresses through the system, with each step in the process adding
more metadata.
- If our goal is the preservation of data, what is our
responsibility re metadata?
- Mike raised the issue of copyright laws and changes in existing
policies, and the same issues with regard to software
- Don will develop a response to this paper and indicate some of the
benefits which have been found in it for our work
- Preserving Digital Information (Item 8)
- It reads more as an overview paper
- RLG's focus is to share reference information among libraries
- They support efforts to describe data collections and have had to
move into electronic storage and movement of data. They are
attempting to bring archiving standardization into the electronic age
- There was discussion re charging for this type of service
- Lou felt we should contact these people and tell them about our
effort. He is concerned about two activities going in different
directions.
- ARPA has had a project over the last two years with six different
groups doing digital libraries. Perhaps we should also inform them
of our activities. They are building software, and they are
running into the same general issues as we are.
- The RLG report addresses migration at length but only at a high
level
- What is the fundamental difference between data archiving and
libraries? It is a matter of focus. Also, a library tends to keep
multiple copies of information, and libraries do not have the
responsibility for guaranteeing long-term storage accuracy.
- Need to establish a timeline for the various technologies used in
data archiving. One proposed answer was 'about 18 months'.
- It was noted that some archiving facilities are required to tag
every data element with production data information. Situations
exist where flags are used to highlight when data needs to be
refreshed such as when software is to be modified
- In the future, archives are going to have to become more selective
as to what data they will store
- The purpose of the Reference Model is to provide a framework. To
what extent should this work address policies? Data archives will
be more expensive than libraries.
- To what extent does an archive want to store non-permanent data?
This requires a policy statement from outside this group.
- From Reference Model perspective, it would be of enormous value if
one organization could simply pick up those resources of another
archive which may be going "Out of Business"
Status of Reference Model Activities
- Lou Reich led the discussion on this subject (Item 5). He
reported that he has received some comments since the last meeting,
including some suggestions of relevant material to look at, and
invited further comments from the attendees. The current issue is
to extend the model into greater detail. One organization's
approach uses context diagrams, which Lou feels are a good next
step; he noted that this approach has identified some issues in the
basic model. He is concerned that driving the model too far will
make it begin to resemble the engineering of a system. He preferred
to leave implementation decisions to the individual organizations
and to standardize the messages which are to be exchanged. For
example, some type of acknowledgement message ("We got the data!")
could be standardized while the process of arriving at this
acknowledgement would remain non-standard.
- He posed the question as to whether transporting archives across
different media could be accomplished while still maintaining the
original data - and is this even possible as one crosses media?
He mentioned the Bearman paper in this context.
- What are the minimum data elements needed for data archiving and
preservation; these should be common at some level.
- Lou described the updates which he has made to the model and noted
that these are shown in the Table of Contents
- High-Level Architecture: Some of the comments received relate to
common services (e.g. security, naming services). There were
questions as to how these should appear in the model
- He had added and explained a Policy Management box. Again he
asked for ideas and criticism. It was noted that there are a lot
of things in these areas open to questions
- Lou stated that to him Access and Dissemination are different.
Access is going through the catalog, deciding what you want and
getting ready to place the order; Access includes security aspects.
Dissemination takes things out of storage and prepares/sends the
message. No access service is available during Dissemination except
perhaps the status of the order. Dissemination is initiated by the
archive: when the archive takes a product out and sends it to a
user, this process is under archive/dissemination control.
- Lou stated that one purpose of this meeting is to identify the
functions which are contained in each box of the high-level
architecture
- The multi-boxes shown for each function are logically "one" but
may be physically distributed
- Lou presented five context diagrams, one for each node in the
Reference Model. He reviewed "Ingest" in some detail to give the
attendees some idea of what is intended for each box. He noted
that a later part of this meeting is to address each diagram in
greater detail and solicit comments from the attendees.
- Goals for this workshop:
Lou foresaw Object Modelling Technique (OMT) being used to go one
more level down while still trying to avoid implying too much about
implementation.
He listed as objectives:
- an agreed model
- a table of contents to define the scope of the activity
- more editorial help in this large effort
- established procedures and schedules that can be given wider
distribution for the numerous people who are interested in this
activity but unable to participate directly.
- The question was asked, "What will the reference model be used for
next?" He stated as some of its purposes: to identify APIs or
standards that already exist, to start specifying APIs as
necessary, to provide interface protocol definitions or match
existing activities, and to show where standards are applicable,
where standards exist and what standards are needed (for example,
in the standardization of needed metadata).
- Don noted that at the last US Workshop, Mike Martin had asked what
we expected the Reference Model to be in two to three years.
Responses were given and Don felt these should not be forgotten.
- The model would also serve as a framework for identifying APIs
and core Interface Definition Language specifications for major service calls
- Some attendees felt the current level of Model detail was about
right. Others did not want the model to show standard interfaces,
since that would make it too complicated and detailed.
- One can never win the battle of standardizing on policy and the
group should never attempt to address this.
- Mike stated that he was looking for a good reference guide to
assist people in designing data archival systems. He felt this
would serve as ammunition when dealing with new projects,
inasmuch as it could serve as a checklist for reviewing data
management plans within a project.
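Lou's suggestion that only the exchanged messages be standardized, while each organization keeps its own internal process, can be illustrated with a minimal sketch. Everything here is hypothetical: the `IngestAck` fields, the status values, and the two archive classes are illustrations only and were not agreed by the group.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Protocol


@dataclass(frozen=True)
class IngestAck:
    """Hypothetical standardized acknowledgement ("We got the data!")."""
    submission_id: str
    status: str        # assumed values: "RECEIVED" or "REJECTED"
    received_at: str   # ISO 8601 timestamp


class Archive(Protocol):
    """Only the message returned at the boundary is fixed, not how it is produced."""
    def ingest(self, submission_id: str, payload: bytes) -> IngestAck: ...


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


class TapeArchive:
    """One organization's internal process: append to a simulated tape log."""
    def __init__(self) -> None:
        self.tape: list[bytes] = []

    def ingest(self, submission_id: str, payload: bytes) -> IngestAck:
        self.tape.append(payload)
        return IngestAck(submission_id, "RECEIVED", _now())


class DiskArchive:
    """Another organization's internal process: a keyed store that rejects empty data."""
    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}

    def ingest(self, submission_id: str, payload: bytes) -> IngestAck:
        if not payload:
            return IngestAck(submission_id, "REJECTED", _now())
        self.files[submission_id] = payload
        return IngestAck(submission_id, "RECEIVED", _now())


def submit(archive: Archive, submission_id: str, payload: bytes) -> IngestAck:
    """Producer-side code depends only on the standardized message."""
    return archive.ingest(submission_id, payload)


if __name__ == "__main__":
    for archive in (TapeArchive(), DiskArchive()):
        ack = submit(archive, "sub-001", b"telemetry frame")
        print(type(archive).__name__, ack.status)
```

The producer sees the same `IngestAck` from either archive, which is the point of standardizing the message rather than the process behind it.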
Reference Model for Digital Archiving Standards (Item 9)
Open Discussion
Overall Model
This item was a major subject for discussion during this workshop
and received a great deal of attention. Following significant
discussion, action items for updating the document for a
subsequent version were assigned as shown in the ACTION ITEMS
section (above). Comments made include:
- The two boxes, Access and Dissemination, can still be served by
a single client and should be combined. Are these different
services or are we making the distinction as to: is it a product,
is it billable, is the difference in access time?
- There were many comments as to why to keep them separate or to
combine them. Paul felt this is yesterday's approach and people
won't buy it.
- Several alternate models were drawn on the whiteboard:
- Ron showed the IEEE POSIX Reference Model - which Lou later
modified to represent a more current view
- Ron developed another perspective (see Figure 1
[PowerPoint 3.0 (6359 bytes)])
- Policy and security, which are more like third-dimension
considerations
- Is an archive also a client and/or a producer?
- What about archive-to-archive transfer in case one archive closes?
- What is the minimum request package that a data archive will accept?
- There are many issues in this model including the activity of
migration
- Lou called his "boxes" entities since he did not know exactly what
they are and wanted to avoid preconceived definitions of any name
for them
- Don asked if there is a hierarchy of boxes, thereby implying a
management chain. If they are to be viewed as all equal, there
needs to be a management box to make them all come together. It
was stated that they were not hierarchical, and a management box
was agreed upon.
- Archive management would include a lot of functions, including
policy management
- There is the consideration of operations metadata as opposed to
metadata about the data itself
- Mike stated that in the future people will increasingly want to
interface with the data itself. Joel felt this was the subject of
a different consideration, but the model should be developed so
that this is not precluded
- We need to present a useful view (the "most people" view) as well
as one for designers
- General communities are beginning to talk about different kinds
of metadata
- We need to get an agreed-to model plus a lot of explanatory
rationale for the model to help sell this to the outside community
and the international community. We also need scenarios, a context
diagram for each box, and to drive the model down to the next lower
level.
The morning session concluded with a generally accepted - for the
time being - reference model as shown in Figure 2
[PowerPoint 3.0 (8153 bytes)]. Authors are to
write their various scenarios, etc., against this model.
Discussion of Context Diagrams
There was extended discussion on each of the five context diagrams:
- Ron would like to see the archive-to-archive transfer activity
addressed in greater detail. This would allow a bulk transfer to
an existing facility, or allow one archive to serve as a backup to
a prime location
- One concern from a previous session was to look at interfaces with
the outside community and integration with other databases
- Should data compression and error protection be included in the
Storage entity?
- It is the producer's responsibility to assure that data is correct
or otherwise validated and that any caveats are included in the
submission
- The archive must have some methodology to check the accuracy of
data to be stored if it is to bear any responsibility for the
data it is storing
- There need to be guidelines on the data compression/decompression
techniques used.
- Should data compression be a Common Service versus being done in
the Storage entity?
- There will be several categories of metadata - these will be
treated by the reference model
- Need to explain how data archives differ from libraries and from
data systems; may need to have another environment view of what
the above three things are
- What about security services? Should both data and metadata be
secured? This should be implied in Common Services
- Product order requests should go straight to Dissemination
- An efficient re-ingest system is needed to go across archives and
to accommodate data changes and losses
- One issue is where is the view of the data model schema maintained?
- Need to define migration
- Need to build an Assumptions list
Lou will integrate these comments into a second set of context
diagrams.
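Several of the ingest comments above (the producer vouches for correctness and states caveats, the archive needs some way to check accuracy, and compression might live inside the Storage entity) can be sketched together. This is a minimal illustration, assuming a producer-supplied SHA-256 digest and zlib compression inside a hypothetical `Storage` class; neither technique nor any of the names (`Submission`, `ingest`, `IngestError`) was chosen by the workshop.

```python
import hashlib
import zlib
from dataclasses import dataclass, field


@dataclass
class Submission:
    """What a producer hands to Ingest (hypothetical structure)."""
    object_id: str
    payload: bytes
    sha256: str                                        # digest computed by the producer
    caveats: list[str] = field(default_factory=list)  # carried along; not processed here


class IngestError(Exception):
    pass


class Storage:
    """Hypothetical Storage entity that compresses internally (one option raised above)."""
    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, object_id: str, payload: bytes) -> None:
        self._blobs[object_id] = zlib.compress(payload)

    def get(self, object_id: str) -> bytes:
        return zlib.decompress(self._blobs[object_id])


def ingest(sub: Submission, storage: Storage) -> str:
    """Accept a submission only if the archive can verify what the producer claims."""
    actual = hashlib.sha256(sub.payload).hexdigest()
    if actual != sub.sha256:
        raise IngestError(f"{sub.object_id}: digest mismatch, submission rejected")
    storage.put(sub.object_id, sub.payload)
    # Re-read after storage to confirm compression round-trips without loss.
    if hashlib.sha256(storage.get(sub.object_id)).hexdigest() != actual:
        raise IngestError(f"{sub.object_id}: stored copy does not match")
    return actual


if __name__ == "__main__":
    data = b"level-0 instrument data"          # hypothetical payload
    store = Storage()
    good = Submission("obj-1", data, hashlib.sha256(data).hexdigest(),
                      ["calibration provisional"])  # hypothetical caveat
    print(ingest(good, store))
    bad = Submission("obj-2", data, "0" * 64)
    try:
        ingest(bad, store)
    except IngestError as err:
        print(err)
```

Keeping compression inside `Storage` means Ingest never sees compressed bytes; moving it to a Common Service, the alternative raised above, would change only where `zlib.compress` is called.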
Table of Contents
Following the discussions of the Report's technical contents, the
Table of Contents was addressed. Comments include:
- There will now be an environment model to go ahead of the Reference
Model - add section and definition
- Should key definitions go in an annex?
- Need to show a simple model of what a data product looks like
coming into Ingest with certain classes of metadata - and indicate
how/where additional metadata gets added with a map of how this
created metadata can be traced to original metadata
- Still have not talked about query and browse services
- How much does the metadata model have to distinguish between
searchable and non-searchable metadata?
- Reverse sections 4 and 5
- Need Executive summary - but that can come later
- Rename section 3 as Alternative Views or Model Views
- Talk about model components in the high level view
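The Table of Contents comment asking for a simple model of a data product entering Ingest with classes of metadata, plus a map tracing created metadata back to original metadata, could be sketched as follows. The `MetadataItem`/`DataProduct` classes, field names, the "descriptive" class, and the sample values are hypothetical; "structural" and "operations" metadata and the searchable/non-searchable distinction come from the discussion above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataItem:
    key: str
    value: str
    md_class: str                       # e.g. "structural", "operations", "descriptive"
    added_by: str                       # processing step that created it
    searchable: bool = False
    derived_from: tuple[str, ...] = ()  # keys of the items this one was created from


@dataclass
class DataProduct:
    product_id: str
    metadata: dict[str, MetadataItem] = field(default_factory=dict)

    def add(self, item: MetadataItem) -> None:
        # Items may only derive from metadata already present and keys are never
        # overwritten, so the lineage map cannot contain cycles.
        if item.key in self.metadata:
            raise KeyError(f"{item.key} already present")
        missing = [k for k in item.derived_from if k not in self.metadata]
        if missing:
            raise KeyError(f"{item.key} derives from unknown metadata {missing}")
        self.metadata[item.key] = item

    def trace(self, key: str) -> list[str]:
        """Return the original metadata keys that `key` ultimately derives from."""
        item = self.metadata[key]
        if not item.derived_from:
            return [key]
        origins: list[str] = []
        for parent in item.derived_from:
            for origin in self.trace(parent):
                if origin not in origins:
                    origins.append(origin)
        return origins


if __name__ == "__main__":
    product = DataProduct("prod-42")  # hypothetical product and values
    product.add(MetadataItem("format", "FITS", "structural", "producer"))
    product.add(MetadataItem("instrument", "magnetometer", "descriptive", "producer",
                             searchable=True))
    product.add(MetadataItem("ingest_date", "1995-12-20", "operations", "ingest"))
    product.add(MetadataItem("catalog_entry", "magnetometer/FITS", "descriptive",
                             "access", True, ("format", "instrument")))
    print(product.trace("catalog_entry"))  # ['format', 'instrument']
```

Each step that adds metadata records its own name in `added_by`, which reflects the earlier point that every step in the process adds more metadata.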
The Workshop was adjourned at 1700 hours, Wednesday, Dec 20, 1995
URL: http://ssdoo.gsfc.nasa.gov/nost/isoas/us02/minutes.html
A service of NOST at NSSDC.
Editor: Robert Stephens (robert.stephens@nasamail.sprint.com) +1.301.949-0965
Curator: John Garrett (garrett@ncf.gsfc.nasa.gov) +1.301.441.4169
Responsible Official: Code 633.2 / Don Sawyer (sawyer@ncf.gsfc.nasa.gov) +1.301.286.2748
Last Revised: January 31, 1996, Don Sawyer (January 30, 1997, John Garrett)