Computational Modeling Systems: Computational Support for Scientific Modeling Activities

Terence R. Smith
Dept of Computer Science
UCSB
Santa Barbara, CA 93106
805-893-2966
smithtr@cs.ucsb.edu

Divy Agrawal
Dept of Computer Science
UCSB
Santa Barbara, CA 93106
805-893-4385
agrawal@cs.ucsb.edu

Tom Dunne
Dept of Geological Sciences
University of Washington
Seattle, WA.
206-543-7195
dunne@bigdirt.geology.washington.edu

Omer Egecioglu
Dept of Computer Science
UCSB
Santa Barbara, CA 93106
805-893-3529
omer@cs.ucsb.edu

Amr El Abbadi
Dept of Computer Science
UCSB
Santa Barbara, CA 93106
805-893-4239
amr@cs.ucsb.edu

Oscar Ibarra
Dept of Computer Science
UCSB
Santa Barbara, CA 93106
805-893-4171
ibarra@cs.ucsb.edu

Jianwen Su
Dept of Computer Science
UCSB
Santa Barbara, CA 93106
805-893-3698
su@cs.ucsb.edu

Yuan-Fang Wang
Dept of Computer Science
UCSB
Santa Barbara, CA 93106
805-893-3866
yfwang@cs.ucsb.edu

Many areas of scientific and engineering research require comprehensive and
integrated computational support for the development, evaluation, and
application of symbolic models of phenomena. Activities requiring support
range from the acquisition and manipulation of raw data to the construction
and evaluation of complex sets of mathematical equations, and include:

     1) creation and execution of models at a conceptual level
        appropriate to scientists;
     2) interoperability between external heterogeneous tools
        and the ability to reuse tool-specific code;
     3) access to geographically distributed data, using
        appropriate data abstractions and heterogeneous data
        access mechanisms;
     4) handling of large data objects on a demand-driven basis
        and the ability to filter the data prior to access.

A computational modeling system (CMS) provides scientific investigators with
such support. It comprises a modeling environment, which contains a knowledge
base of symbolic representations of phenomena, and a resource access
environment in the form of a "seamlessly" integrated collection of
computational modules. The modeling environment of a CMS is based on a
characterization of scientific modeling activities that is focussed on the
manner in which scientific concepts are represented, manipulated,
and evaluated in the scientific modeling process. The representation
of a concept is formalized in terms of "Representational Structures"
or "R-structures". An R-structure is a triple {D, T, I} in which

     1) D is the Representational Domain or R-domain;
     2) T is a set of "transformations" that may be applied in D;
     3) I is a finite set of "instances" of D.
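
As a minimal sketch of the triple above, and not the representation used in
Amazonia, an R-structure can be encoded as a record holding a domain test (D),
a table of named transformations (T), and a list of instances (I). The class
name RStructure, its fields, and the "Discharge" example with its two
transformations are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Sequence


@dataclass
class RStructure:
    """Hypothetical encoding of an R-structure {D, T, I}."""

    name: str
    # D: the R-domain, modeled here as a membership test (an assumption).
    in_domain: Callable[[Any], bool]
    # T: named transformations that may be applied to elements of D.
    transformations: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)
    # I: the finite collection of instances of D recorded so far.
    instances: List[Any] = field(default_factory=list)

    def add_instance(self, value: Any) -> None:
        """Record a new instance, rejecting values outside the R-domain."""
        if not self.in_domain(value):
            raise ValueError(f"{value!r} is not in the domain of {self.name}")
        self.instances.append(value)

    def apply(self, value: Any, names: Sequence[str]) -> Any:
        """Apply a sequence of named transformations, in order, to one value."""
        for name in names:
            value = self.transformations[name](value)
        return value


# Illustrative R-structure: river discharge as a non-negative float.
discharge = RStructure(
    name="Discharge",
    in_domain=lambda q: isinstance(q, float) and q >= 0.0,
    transformations={
        "scale_by_two": lambda q: q * 2.0,
        "add_baseflow": lambda q: q + 10.0,
    },
)
discharge.add_instance(150.0)
result = discharge.apply(discharge.instances[0], ["scale_by_two", "add_baseflow"])
print(result)
```

Modeling D as a predicate keeps the sketch short; a fuller treatment would
need a richer description of the domain than a single membership test.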

R-structures provide a significant generalization of existing constructs in
semantic and object-oriented data models. The process of scientific
modeling may be viewed as one in which

     1) extensible collections of R-structures are constructed,
        evaluated and applied in modeling both the phenomena
        in specific application domains and the phenomena of
        the modeling process itself;
     2) instances of the domain elements of R-structures are created
        and sequences of transformations are applied to the instances.

Mechanisms to support inheritance and distinct but equivalent representations
of the same concept have also been developed.

In collaboration with a team of EOS scientists from the University of
Washington, we have built a CMS, Amazonia, which supports modeling in
large-scale earth science research and, in particular, data-intensive and
numerically intensive modeling applications. Amazonia has a model-oriented
perspective and is currently employed to solve modeling problems relating to
the flows of water, sediment, and solutes in the Amazon drainage basin. In
principle, however, there do not appear to be restrictions on the domains of
science to which such a CMS may be tailored. For any application, sets of
R-structures may be created and manipulated using a simple high-level
computational modeling language, CML. Amazonia also handles the data
abstractions needed by scientists. Its tool and data access systems provide
transparent interoperability between local and remote tools and services.
Amazonia includes support for scalability and extensibility, and represents
a possible solution to the integration of legacy systems.

We have implemented R-structures using the OODBMS O2. To make the CMS
independent of O2, we have built an interface layer that specifies the
functionality Amazonia needs from the DBMS using ODMG-mandated features only.
The CMS engine has been designed to handle the dynamic definition of arbitrary
class structures and updates of objects. Transformations in Amazonia can be
written in CML or can be external code, such as FORTRAN or C executables, or
packaged tools such as MATLAB, Mathematica, and KHORUS. Techniques have been
developed for the asynchronous handling of multiple tools using
pseudo-terminals.
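
The idea of an interface layer that isolates the engine from a particular DBMS
can be sketched as an abstract interface with interchangeable backends. The
sketch below is an assumption-laden illustration: the names ObjectStore and
InMemoryStore and their four methods are hypothetical, and they are neither
the ODMG API nor the interface used in Amazonia. An in-memory dictionary
stands in for the database, and classes are defined at run time to mirror the
dynamic definition of class structures described above.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class ObjectStore(ABC):
    """Hypothetical minimal functionality an engine might require of a DBMS."""

    @abstractmethod
    def define_class(self, name: str, attributes: List[str]) -> None: ...

    @abstractmethod
    def create(self, class_name: str, **values: Any) -> int: ...

    @abstractmethod
    def update(self, oid: int, **values: Any) -> None: ...

    @abstractmethod
    def get(self, oid: int) -> Dict[str, Any]: ...


class InMemoryStore(ObjectStore):
    """Stand-in backend; a DBMS-backed class would offer the same methods."""

    def __init__(self) -> None:
        self._schemas: Dict[str, List[str]] = {}
        self._objects: Dict[int, Dict[str, Any]] = {}
        self._next_oid = 1

    def define_class(self, name: str, attributes: List[str]) -> None:
        # Class structures are registered at run time, not fixed in advance.
        self._schemas[name] = list(attributes)

    def _check(self, class_name: str, values: Dict[str, Any]) -> None:
        unknown = set(values) - set(self._schemas[class_name])
        if unknown:
            raise AttributeError(f"{class_name} has no attributes {sorted(unknown)}")

    def create(self, class_name: str, **values: Any) -> int:
        self._check(class_name, values)
        oid = self._next_oid
        self._next_oid += 1
        self._objects[oid] = {"_class": class_name, **values}
        return oid

    def update(self, oid: int, **values: Any) -> None:
        obj = self._objects[oid]
        self._check(obj["_class"], values)
        obj.update(values)

    def get(self, oid: int) -> Dict[str, Any]:
        # Return a copy so callers cannot bypass update().
        return dict(self._objects[oid])


# The engine is written against ObjectStore only, never a concrete backend.
store: ObjectStore = InMemoryStore()
store.define_class("Reach", ["name", "length_km"])
oid = store.create("Reach", name="upper", length_km=120.0)
store.update(oid, length_km=125.5)
reach = store.get(oid)
print(reach)
```

Because the calling code depends only on the abstract interface, replacing
the backend would not require changes to the engine.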

A new open, layered, peer-to-peer protocol, the CMS-Resource Access Protocol
(CMS-RAP), based on the Hypertext Transfer Protocol (HTTP), has been
developed to transfer code and data to remote sites and to compile and
execute remote tools. We have extended the protocol to interface with tools
on parallel machines as computation resources. In this way, for the sake of
computational efficiency, programs can be executed on parallel machines and
their outputs used subsequently in the CMS. Data can be read from remote
sites on a demand-driven basis rather than storing a local copy of the data
at each client site.
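
Demand-driven reading with filtering before transfer can be illustrated with
a lazy iterator. This is a sketch under stated assumptions and does not
reproduce CMS-RAP: the RemoteSite class is a hypothetical in-memory stand-in
for a remote data site, and read_on_demand is an illustrative helper. A chunk
is requested only when the consumer reaches it, and the filter runs at the
site so that only matching values are returned.

```python
from typing import Callable, Dict, Iterator, List, Sequence


class RemoteSite:
    """Hypothetical stand-in for a remote data site; counts chunk requests."""

    def __init__(self, chunks: Dict[int, List[float]]) -> None:
        self._chunks = chunks
        self.requests = 0

    def fetch(self, chunk_id: int, keep: Callable[[float], bool]) -> List[float]:
        """Filter at the site so only matching values are transferred."""
        self.requests += 1
        return [v for v in self._chunks[chunk_id] if keep(v)]


def read_on_demand(
    site: RemoteSite, chunk_ids: Sequence[int], keep: Callable[[float], bool]
) -> Iterator[float]:
    """Yield values lazily; each chunk is fetched only when it is needed."""
    for chunk_id in chunk_ids:
        yield from site.fetch(chunk_id, keep)


site = RemoteSite({0: [1.0, 5.0, 9.0], 1: [2.0, 8.0], 2: [7.0, 3.0]})
stream = read_on_demand(site, [0, 1, 2], keep=lambda v: v > 4.0)

# Consume only three values; the third chunk is never requested.
first_three = [next(stream) for _ in range(3)]
print(first_three, site.requests)
```

No local copy of the full data set is built; the client holds only the
filtered values it has actually consumed.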

A simple GUI based on a visual representation of CML provides access to the
modeling environment of the CMS. It supports the easy construction and
manipulation of scientific modeling concepts in general, and of the concept
of a "model" in particular.