Daniel O. Nelson, Robert J. Krumm, Sally L. Denhart, Sheena K. Beaverson
Illinois State Geological Survey

Illinois Natural Resources
Geospatial Data Clearinghouse
Project Overview

This paper was presented at the 1997 ESRI User Conference under the title:

Arc/Info Solutions to Metadata Problems:
Building a Solid NSDI Clearinghouse Node on a Shifting Metadata Landscape



Abstract
Introduction
Project Background
The Value of Metadata
Adopting a Standard Metadata Format
Staff Training
Tools Used to Produce Metadata
Document.aml: A Review and an Alternative
Summary
Acknowledgments
References
Author Information


Abstract

The 1996-97 Illinois Clearinghouse Node project of the National Spatial Data Infrastructure (NSDI) is a multi-agency effort, led by the Illinois State Geological Survey (ISGS), to make metadata and digital geospatial data about Illinois natural resources available on the Internet. The primary objectives are to generate metadata compliant with the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM), develop a clearinghouse node to support search and retrieval of the metadata, and offer the results as a model for other organizations in Illinois. This is a project status report, emphasizing the value of metadata and metadata generation methods applicable to UNIX Arc/Info users.

A related AML program (Fgdcmeta.aml) was developed from Document.aml to extract Arc/Info coverage metadata and pass it to Xtme for subsequent metadata collection. A metadata collection system using Fgdcmeta.aml and the FGDC tools Xtme and mp to produce ASCII metadata text is recommended.


Introduction

The Illinois State Geological Survey (ISGS), in cooperation with other offices and divisions of the Illinois Department of Natural Resources (DNR), has implemented a National Geospatial Data Clearinghouse Node dedicated to serving digital geospatial information about Illinois natural resources. The node contains metadata and GIS data for geological, hydrological, natural resource, historical, administrative and infrastructural issues.
The NSDI encompasses policies, standards and procedures for organizations to cooperatively produce and share geospatial data. The FGDC has assumed leadership in the evolution of the NSDI in cooperation with state and local governments, academia, and the private sector (FGDC, 1997).

The DNR units participating in this project are:

Each has contributed metadata and GIS data to be served and accessed on the Illinois node, making available such data as the Illinois Public Land Survey System (county, township and range, and section lines), bedrock and Quaternary geology maps, wetlands and streams, landfill inventory, fish and wildlife areas, land cover, political boundaries, municipal boundaries, roads and railroads, and much more. Metadata for approximately 100 GIS data sets are currently available, with at least 200 data sets yet to be added.

To support future intra- and inter-agency clearinghouse efforts in Illinois, a standard minimum set of metadata elements conforming to the FGDC Content Standard for Digital Geospatial Metadata (CSDGM) was developed by the participating agencies. It is hoped this effort will serve as a prototype for other agencies in Illinois and will stimulate the development of a fully integrated system of clearinghouse nodes connecting Illinois state agencies with users of digital geospatial data nationwide.


Project Background

The Scientific Surveys and other offices and divisions of the Illinois DNR have an established history of publishing and distributing information to the public, other government agencies, academia, and industry. The ISGS has used GIS technology since 1973, and since then has witnessed an increasing demand for digital geospatial data from all of these sectors. The project participants have worked together several times to meet this demand: by direct distribution of data to end-users, in cooperative projects with other organizations, through significant contributions of digital data to two multi-agency CD-ROM compilations, and by taking prominent roles on the Illinois Geographic Information System (IGIS) Committee. We have participated in numerous national, state, county, and local GIS projects to serve the environmental, geological, socioeconomic, and civil planning needs of Illinois and the midwest United States. As a result, the project partners have several hundred individual Arc/Info data sets available for analytical use.

The participants' experiences with building in-house GIS databases and sharing digital data with others have demonstrated the importance of thorough documentation; it is an essential part of any geospatial data set. However, most of our GIS data have been documented by the many individual creators, using a variety of styles, methods and computer platforms. Because most of our efforts are project driven, there has been little formal maintenance and update of our metadata holdings. To address this situation, we have, over the last three years, implemented several small pilot efforts focused on metadata generation and distribution. These were dedicated predominantly to examination of various metadata collection tools, namely the United States Geological Survey (USGS), Arc/Info, and Bureau of Land Management (BLM) versions of Document.aml, the United States Army Construction Engineering Research Laboratory (USACERL) Corpsmet tool for PC, the National Biological Service (NBS) WordPerfect Template for FGDC Metadata, the NBS MetaMaker tool, and the USGS mp and Xtme tools. In addition, the ISGS generated a Metadata Table of Contents in ASCII format for use with a PERL-based search tool, listing an absolute minimum number of eight descriptive metadata elements for 400+ data sets, and all participants worked together to generate an ASCII-based template for a subset of the CSDGM for use with the 1996 Illinois DNR geospatial data CD-ROM. These efforts, and the needs that engendered them, led to the development of the current project.


The Value of Metadata

From an institutional point of view, this project has provided a number of benefits related to a better organized and documented GIS database. The project participants were provided with an opportunity to examine the overall organizational quality of their GIS data holdings. The ISGS GIS database, for example, was found to be a combination of documented and organized coverages along with undocumented coverages that were not easy to locate or understand. Working with the database has generally not been problematic because the people responsible for generating the information are still on staff and available for questions. However, staff turnover is inevitable, and we recognize the potential for loss of valuable, undocumented data histories.

The overall makeup of the in-house GIS user base in Illinois state agencies, and in many other organizations, is changing from a core of dedicated Arc/Info specialists to a much larger group that includes many ArcView users. Along with this evolution comes a database management responsibility to provide and maintain a GIS database that is relatively easy to access, understand and use. Although many of the ArcView users will likely be satisfied to work with a standard set of project files, we expect many others will want to learn what additional data are available on our network of servers. It is especially important to provide searchable metadata to these users so they can locate and use the available data to the greatest extent possible.

Comprehensive metadata allow for better management, control, and protection of the data investment, by providing information on identification (name, description, purpose, version, location); quality (accuracy, completeness, currentness); lineage (sources, processing steps, previous versions); and contact personnel (Who do I call?). With this information, the GIS manager can, for example, improve data catalogs, assign revolving update and review dates, and retain a record of processing and revision histories. Institutional control is enhanced by specifying the proper (and improper) uses of the data, by applying access, distribution and security policies, and by supplying all users with a uniform product. These controls may also serve to protect an organization by limiting liability for misuse of a GIS data product.

Further, metadata can be used to leverage data resources and generate supplementary benefits. For example, redundant data collection and preparation can be avoided, saving resources. Existing data can be combined to create new products, expanding the GIS resource base. The data catalog can be used as a GIS Portfolio to promote data resources and negotiate data exchanges. Older data can be made useful through donation to schools, libraries and communities, thereby generating goodwill.

Finally, on-line, over-the-net, real-time GIS using ArcView will be a reality in the future. Ultimately, data users will locate and access various GIS data layers on potentially several different Internet servers, and without ever actually downloading or possessing the data, immediately combine them within a single GIS application (perhaps ArcView) and perform a spatial analysis. It is imperative that data made available for this type of activity are thoroughly described so that the compatibility of data can be assessed, and so proper and improper use is made plain. The metadata efforts expended on this NSDI project will help prepare the state agencies of Illinois as they move toward participating in this sort of "live" GIS on the Internet.


The Illinois Metadata Experience

One of the greatest concerns being addressed by this project is the need for a flexible, adaptable metadata system. We wish to provide as much useful metadata as possible, while avoiding commitments of time, effort and training to metadata tools and formats that could become obsolete quickly. Four major questions that are defined by this concern are:
  1. What specific metadata format should be adopted?
  2. How much and what types of staff training should be provided?
  3. Which metadata tools should be used?
  4. How can the problems with Document.aml be solved?

Adopting a Standard Metadata Format

The FGDC metadata standard has been undergoing revision over the last year. Personal communication with members of the FGDC metadata committee indicates that the standard will probably not change drastically. Rather, some metadata elements will be redesignated as "core", "recommended if applicable" or "optional" (or something similar), and a standard method of adding "user-defined" metadata elements will be instituted. This will give users of the standard more freedom in the way they choose to apply it, while maintaining relative uniformity. The implication is that any relatively comprehensive metadata based on the existing version of the metadata standard will likely also comply with the revised standard. Nonetheless, project participants decided that to formally adopt a specific metadata format based on a soon-to-be-replaced standard was ill-advised. However, they informally agreed to continue to produce FGDC compliant metadata using the set of elements previously identified for use with the Illinois DNR GIS Data CD-ROM (Illinois DNR, 1996). This metadata element set consists of CSDGM sections 1 (Identification Information) and 7 (Metadata Reference) and substantial parts of other sections, as applicable. Although the difference is subtle, proceeding in this manner has placed the participants in a position to better assess and recommend a formal metadata format for Illinois data after the revisions to the CSDGM are complete.
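
As a rough illustration of how a minimum element set of this kind could be checked, the following Python sketch scans an indented ASCII metadata record for the two top-level sections named above. The labels Identification_Information and Metadata_Reference_Information follow the CSDGM element naming as it commonly appears in indented text metadata; the function name, the layout assumption (top-level sections start in column 0 and end with a colon) and the sample record are hypothetical. It is not a substitute for a full parser such as mp.

```python
# Hypothetical minimum set: CSDGM sections 1 and 7.
REQUIRED_SECTIONS = (
    "Identification_Information",
    "Metadata_Reference_Information",
)


def missing_sections(text, required=REQUIRED_SECTIONS):
    """Return the required top-level section names not found in the text."""
    found = set()
    for line in text.splitlines():
        # Assumption: a top-level section starts in column 0 and ends with ":".
        if line and not line[0].isspace() and line.rstrip().endswith(":"):
            found.add(line.rstrip()[:-1])
    return [name for name in required if name not in found]


# Made-up sample record, for illustration only.
sample = """Identification_Information:
  Citation:
    Citation_Information:
      Title: Example data set
Metadata_Reference_Information:
  Metadata_Date: 19970101
"""

print(missing_sections(sample))  # []
```

A record that lacks either section would be reported by name, which is enough to flag it for further editing before it is served on a node.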

Staff Training

This is a pilot project, hence training of staff not directly involved with the project has been kept to a minimum. There are two reasons for this. First, the tools and techniques needed by the metadata developers are not necessarily those needed by the data developers or the data users. It is more prudent to establish a prototype clearinghouse node based on other successful clearinghouse node efforts, assess the results, and refine the product. Then the response of data developers and users can be evaluated to determine the type and scope of training required. Second, as previously mentioned, the FGDC metadata standard is being revised. Project participants are already familiar with the standard, and training of others in the standard is not an immediate necessity. The project participants decided that it would be inefficient to provide training in the current version and, shortly thereafter, have to retrain for the revised version.

Tools Used to Produce Metadata

Initially, because most of the participants in this project are dedicated Arc/Info users, Document.aml was the tool of choice for collecting metadata. It is sufficient (albeit occasionally unstable) for producing on-line metadata. However, when the transition was made to producing FGDC compliant metadata, new problems with Document.aml were discovered. Some were related to program bugs, and others were related to the post-processing of Document.aml output. (The problems of Document.aml and one solution to them are discussed in the next section.) This situation prompted a search for metadata tools that were less software-specific and more basic and adaptable. The report by Mitre Corporation (1996) provides an excellent review of metadata tools.

The most flexible format for metadata is ASCII text. It is the most platform- and software-independent vehicle available. Although primitive in many ways, it can be imported into virtually any more-sophisticated text manipulation or word processing software, and if necessary, manipulated directly with system level commands. The FGDC recognized this and has developed excellent ASCII-based metadata compilation tools for UNIX and other platforms, to support the development of the NSDI Clearinghouse network.

A metadata collection system developed for the FGDC by Peter Schweitzer (1997) of the USGS is the most straightforward. This system comprises three tools for UNIX:

  1. Xtme (Xt Metadata Editor),
  2. mp (Metadata Parser), and
  3. cns (Chew 'n Spit).

Xtme is an X Windows application that provides a list of metadata elements in outline form from which the user can pick and choose to build a properly formatted metadata document. Help is provided for every metadata element, describing what information is appropriate for the field, and if applicable, a list of standard values. Mp is a text parser that checks metadata files (i.e., the output from Xtme) for proper format, order and element values, issuing warnings or errors when problems are identified. It is used iteratively to prepare metadata documents for indexing by the NSDI Clearinghouse Node server software. It is also used to generate output files in text, HTML and SGML formats. Cns can be used to "clean up" existing metadata files that may have been generated by hand or by some other tool. It will, for example, remove leading section numbers from metadata elements if they are present so that the file can be run through mp.
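
The kind of cleanup described for cns can be pictured with a short Python sketch. This is not how cns itself is implemented; it only illustrates removing leading section numbers from element lines in an ASCII metadata file. The regular expression, the function name and the sample text are assumptions made for the example.

```python
import re

# Assumption: a section number looks like "1", "1.1" or "2.5.1.3" (optionally
# with a trailing dot) and is followed by an element name that ends in ":".
SECTION_NUMBER = re.compile(
    r"^(\s*)\d+(?:\.\d+)*\.?\s+(?=[A-Za-z][A-Za-z_ ]*:)"
)


def strip_section_numbers(text):
    """Remove leading section numbers from element lines, keeping indentation."""
    cleaned = []
    for line in text.splitlines():
        # Lines that merely begin with a number (e.g. a year in free text)
        # are left alone because no "Element_Name:" follows the number.
        cleaned.append(SECTION_NUMBER.sub(r"\1", line, count=1))
    return "\n".join(cleaned)


numbered = (
    "1 Identification_Information:\n"
    "  1.1 Citation:\n"
    "    Title: 1997 survey\n"
    "  1997 revision notes"
)
print(strip_section_numbers(numbered))
```

The lookahead is a deliberately conservative choice: it avoids damaging element values that happen to start with digits, at the cost of missing unusual numbering styles.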

The advantage of the cns/Xtme/mp method is that once the metadata files are generated, they are entirely independent of the tools that generated them. The ASCII metadata files are not dependent on any software (other than the operating system) to maintain their viability and accessibility. Even in the unlikely event that the FGDC would cease all development and support of these tools, an existing ASCII-based metadata database will not be detrimentally affected.

Document.aml: A Review and an Alternative

Document.aml (ESRI 1995) is a metadata generation tool for Arc/Info that was created by staff of the Water Resources Division of the USGS and subsequently included with recent releases of Arc/Info. For the purposes of this section, it is assumed that the reader has some familiarity with the Arc/Info document atool.

Over the last three years, the ISGS has used Document.aml as its primary metadata collection tool. In this time there have been five or six generations of the tool: USGS ver. 2.0.2; Arc/Info vers. 7.0.2, 7.0.3, and 7.0.4; Blmdoc from the BLM; and the most recent USGS release. (ESRI version 7.1.1 has subsequently been released but is not included in this review.) The concept of Document.aml is excellent and the program has proven useful for creating on-line documentation for individual Arc/Info data sets. However, the mechanics of the program have shown several problems, especially in generating FGDC formatted metadata. These problems can be attributed to three primary causes:

  1. Document.aml was written prior to the advent of the FGDC metadata standard, and the two formats are dissimilar,
  2. there have been several different versions of Document.aml in a short time, and not all are compatible with each other, and
  3. Document.aml forces Arc/Info to become a data entry interface, something it was not built to do.

Some of the specific problems encountered are:

In addition, we understand that ESRI is developing a new metadata tool, which casts some doubt on the future development and support of Document.aml. Because of these problems, the ISGS has chosen not to continue using Document.aml for the collection of FGDC metadata. Document.aml does have, however, some impressive functions including:

Not wanting to forgo the convenience of these functions, the ISGS is using Document.aml as a template to develop a related AML called Fgdcmeta.aml. This tool retains the excellent automatic data gathering routines of Document.aml, but discards entirely any manual data entry within Arc/Info. All data entry is done in Xtme or another editor. The new AML consists of line commands only; there are no menus. The main function is described in four steps:
  1. The user issues the fgdcmeta command in Arc/Info.
  2. Data that can be automatically gathered (DESCRIBE data, etc.) are collected.
  3. The data are written to a user defined FGDC CSDGM template. This template can be created by the user using Xtme or any other ASCII text editor.
  4. The "skeleton file" (template with describe data) is ported out to Xtme (or other text editor) for subsequent editing.
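
The four steps above can be pictured with a small Python sketch. It is not the AML code of Fgdcmeta.aml; it only mimics step 3, in which automatically gathered values are written into a user-defined template to produce the "skeleton file". The template layout (one empty "Element_Name:" per line), the function name and the coordinate values are assumptions made for the example; in the real tool the values would come from Arc/Info DESCRIBE.

```python
def fill_skeleton(template, describe_values):
    """Fill empty elements of an indented metadata template with known values."""
    out = []
    for line in template.splitlines():
        stripped = line.strip()
        # Assumption: an empty element is "Element_Name:" with nothing after
        # the colon. Only elements with a gathered value are filled in.
        if stripped.endswith(":") and stripped[:-1] in describe_values:
            line = f"{line.rstrip()} {describe_values[stripped[:-1]]}"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical template fragment, as might be prepared in a text editor.
template = """Spatial_Domain:
  Bounding_Coordinates:
    West_Bounding_Coordinate:
    East_Bounding_Coordinate:
    North_Bounding_Coordinate:
    South_Bounding_Coordinate:
"""

# Made-up values standing in for what DESCRIBE would report for a data set.
describe_values = {
    "West_Bounding_Coordinate": -91.5,
    "East_Bounding_Coordinate": -87.5,
    "North_Bounding_Coordinate": 42.5,
    "South_Bounding_Coordinate": 37.0,
}

skeleton = fill_skeleton(template, describe_values)
print(skeleton)
```

Elements without a gathered value are left empty, so the resulting file can still be opened in an editor for the remaining manual entry (step 4).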

The approach used in Fgdcmeta.aml has the following advantages over Document.aml:

There are disadvantages to using Fgdcmeta.aml. It is not as robust as Document.aml. In its current form, it only supports a one-time query of DESCRIBE data from coverages, grids, and tins for the purposes of generating FGDC compliant metadata outside of Arc/Info. It does not write the metadata to an INFO file or support update of an existing metadata file (although those options are being explored).

The present usage for Fgdcmeta.aml is:

fgdcmeta <geo_dataset> (view | create)
The create option is as described above. The view option (default) displays an existing metadata file. Currently, this option is very system dependent, requiring all metadata files to be stored in a single system directory. (These are in fact the same metadata files that are served on the NSDI node.) The AML code must be edited to indicate the proper directory.

It is anticipated that when development is complete, the usage will be as follows:

fgdcmeta template <template_file>
fgdcmeta <geo_dataset> (view | create | update)
fgdcmeta <geo_dataset> putinfo <metadata_file>
The template option will allow the Arc/Info administrator to define the institutional FGDC metadata template to be used by all users. It is intended that the template will be created with Xtme and checked for integrity with mp, although this will not be an absolute requirement. The update option will update an existing metadata file with current DESCRIBE values from the related Arc/Info data set. The putinfo option will write the metadata file as an INFO file of the appropriate data set. Note, however, that Document.aml currently has problems writing text files to INFO files because of the 80-character-per-line limit. If these limitations cannot be overcome, and the integrity of the text cannot be absolutely guaranteed, then the update and putinfo options may be abandoned. It is more important to protect the integrity of the metadata file than to have a copy attached to the data set in INFO file format.
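
The integrity concern can be made concrete with a short Python sketch. It does not model INFO files or any Arc/Info behavior; it only shows one way to wrap free text to an 80-character line limit at word boundaries, flag tokens that cannot fit, and confirm that the wrapped lines still contain the original words in order. The function names and the limit constant are hypothetical.

```python
import textwrap

LINE_LIMIT = 80  # line length limit mentioned in the text


def wrap_for_limit(text, limit=LINE_LIMIT):
    """Wrap text at word boundaries; return (lines, overlong_lines)."""
    lines = textwrap.wrap(
        text,
        width=limit,
        break_long_words=False,   # never split a token such as a URL
        break_on_hyphens=False,   # keep hyphenated terms intact
    )
    # A single token longer than the limit ends up on its own overlong line.
    overlong = [ln for ln in lines if len(ln) > limit]
    return lines, overlong


def round_trips(text, lines):
    """True if the wrapped lines contain the same words in the same order."""
    return " ".join(lines).split() == text.split()


paragraph = "metadata " * 40
lines, overlong = wrap_for_limit(paragraph)
print(len(lines), len(overlong), round_trips(paragraph, lines))
```

Any entry in the overlong list marks text that would have to be truncated or split to fit the limit, which is the situation in which integrity could no longer be guaranteed.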

When complete, Fgdcmeta.aml will be available on the Illinois NSDI Clearinghouse Node and possibly at the FGDC NSDI node. Check the Illinois Clearinghouse Node for updates.


Summary

For dedicated Arc/Info users, the past year was an uncertain time in terms of generating FGDC compliant metadata and establishing an NSDI Clearinghouse node on the Internet. The FGDC Content Standard for Digital Geospatial Metadata was in a state of flux and the viability of Document.aml was in question. The danger was not in the task itself, but in the commitment to the specific metadata policies, tools and designs required to complete the task. At issue were strategies to maximize metadata productivity in the near term, while avoiding work practices with a recognized potential of obsolescence in the longer term. The value of metadata is undeniable, but only if it efficiently supports the GIS enterprise.

This particular metadata generation effort is young, and the relatively new concept of metadata is dynamic. Thus, the guiding principle of this ongoing project has been to maintain maximum flexibility in the metadata product so it is free to evolve with the changing concepts and procedures applied to the compilation of metadata. This philosophy has led to the adoption of FGDC methods and tools for metadata generation. Xtme and mp are recommended for use in generating FGDC compliant metadata. The FGDC is involved at the national and international levels in the development and application of metadata standards. It can be assumed that where the FGDC leads, metadata will follow. However, the tools and methods of the FGDC should produce an ASCII text-based metadata product whatever the mode of generation. Such a product protects the producer's metadata investment. If the FGDC method is generally accepted, then those who use it will be at an advantage. If the FGDC method does not become the generally accepted method, then metadata prepared in ASCII format with FGDC metadata tools are still in the most universal format, and can be easily re-cast into the prevailing format.

The use of Document.aml to produce FGDC compliant metadata for an NSDI Clearinghouse node is not recommended. It has software and design flaws that make it less efficient to use than other tools. We have developed a simpler and faster AML program called Fgdcmeta.aml from Document.aml. The program provides a more direct path to FGDC metadata generation by writing Arc/Info DESCRIBE data to a pre-formatted file which is subsequently edited in Xtme. Fgdcmeta.aml is recommended for first-time generation of metadata for Arc/Info coverages, grids and tins. It is, however, still in development, and a function for successive automatic update of DESCRIBE values is not yet complete.


Acknowledgments

Funding for this project was provided by the FGDC 1996 Competitive Cooperative Agreements Program (CCAP) administered by the USGS. The authors wish to thank the following DNR representatives for their participation on this project:

Although they may not know it, Doug Nebert and Peter Schweitzer of the USGS provided invaluable assistance in metadata generation, clearinghouse node operations, and the development of Fgdcmeta.aml.


References

Environmental Systems Research Institute, Inc. (1995) Document.aml (several versions through ESRI version 7.0.4), ESRI, Redlands, California, original programming by D. Nebert and M. Negri (USGS), and M. Hoel (ESRI).

Federal Geographic Data Committee Web Site, http://fgdc.er.usgs.gov/fgdc.html.

Federal Geographic Data Committee (1995) Content Standards for Digital Geospatial Metadata Workbook (March 24), Federal Geographic Data Committee, Washington, D.C.

Illinois Department of Natural Resources (1996) Illinois Geographic Information System CD-ROM of Digital Datasets of Illinois, Illinois Department of Natural Resources, Springfield, Illinois, 2 vols.

Mitre Corporation (1996) Metadata Tool Evaluation.

Schweitzer, Peter (1997, most recent update) Chew 'n Spit (cns), Metadata Parser (mp) and Xt Metadata Editor (Xtme) Metadata Tools, United States Geological Survey, Reston, Virginia, http://GeoChange.er.USGS.gov/pub/tools/metadata.


Author Information

Daniel O. Nelson
Associate Staff Geologist
Illinois State Geological Survey
615 East Peabody Drive
Champaign, Illinois 61821
USA
Telephone: 217-244-2513
Fax: 217-333-2830
Email: nelson@muck.isgs.uiuc.edu

