1. Introduction

1.1 Purpose of Guidelines

These Guidelines represent the first stage in the development of metadata services for the discovery of data resources that have a geographic component. They have been developed as part of the National Geospatial Data Framework (NGDF). Their aim is to provide a consistent and simple method of documenting any data resources that are referenced in some way to the earth’s surface whether by coordinates or geographic identifiers (addresses, administrative area, component of postcode). The consistent recording of metadata or information about data resources and their presentation in catalogues accessible to the user community via the Internet considerably facilitates the discovery of such data resources.

It is estimated that 80% of information has some relationship to the earth’s surface or, to put it another way, is geospatially referenced. These Guidelines have been designed to be usable with any data resources that have some sort of geospatial reference, whether these are maps, imagery, text or numeric data.

There are a number of metadata standards in existence or under development. None of the existing standards was found to meet the requirements of being simple and applicable to the full range of data resources that are geospatially referenced. In developing these Guidelines, metadata standards which are not specifically for geospatial data, such as the "Dublin Core" [3], have also been examined.

These Guidelines have been produced for use in the United Kingdom. However, with the evolution of global information infrastructures, developments outside the UK cannot be ignored. Therefore the Guidelines must be regarded as interim pending the development of standards at the international level such as those being developed by Technical Committee 211 of the International Organisation for Standardisation (ISO).

The Guidelines will be reviewed and enhanced in the light of comments received from interested parties and developments in international standards.

1.2 About this document

Working Group 2 of the NGDF Task Force has prepared this document. The Working Group is responsible for the facilitation of NGDF compliant metadata services. It is made up of representatives from major vendors, national mapping agencies, government departments and members of the academic community.

The document is in seven Sections and has seven Appendices. The remainder of the introduction provides a brief background to the NGDF and describes what metadata and the proposed NGDF metadata services mean. Section 2 defines the scope and application of these Guidelines. Section 3 is an overview of the discovery metadata and Section 4 is the Guidelines themselves. A transfer format is defined in Section 5 and a communications protocol in Section 6. Conformance requirements are given in Section 7.

Worked examples are provided in Appendix A. Appendix B provides a UML diagram illustrating the structure of the discovery metadata. Appendix C presents a mapping between the NGDF Discovery Metadata and the Dublin Core. Appendix D provides a mapping to XML and Appendix E gives a worked example in XML. Appendix F suggests a list of abbreviated field names. Appendix G provides more detail about the content of the NGDF website.

1.3 The National Geospatial Data Framework

More than ten years ago, the Report of the Committee of Enquiry into the Handling of Geographic Information [6], chaired by Lord Chorley, established that the use of different geospatial data for the same area was crucial but difficult because of:

In the UK today, over 40 government departments and other organisations produce geospatial data for their own needs, spending approximately £400 million annually on data collection and assembly.

Much of this data is not available for reuse or is collected in ways that make it difficult to use with other datasets. Most datasets remain poorly documented. There are still inconsistencies in the ways different computer systems treat sophisticated geospatial data.

The National Geospatial Data Framework is designed to reduce these problems and ultimately to:

1.4 Metadata

1.4.1 The Requirement for Metadata

If geospatial datasets are to become more widely available and their true value realised by the community, then methods must be found to enable the public to discover their existence and determine if a particular dataset is appropriate for their needs.

This means that information about the data must be made available. Metadata is the term used to define this "data about data". The metadata itself has to be easily accessible and provided in such a way that the public can obtain information that will allow them to compare the suitability of data from different sources. It is therefore necessary for there to be standard ways of presenting metadata that will allow these comparisons to be made.

With the exponential growth in the use of the Internet and its adoption as a service within organisations (Intranets), a means is now available to use World Wide Web based search tools to locate sources of geospatial information. This increases the importance of providing standard methods for describing data that can be analysed by computer based systems.

Metadata content should vary according to purpose, for example:

Discovery metadata provides sufficient information to enable an inquirer to ascertain the existence of data that is fit for purpose and to reference some point of contact for more information. If, after discovery, more detail is needed about individual datasets, then more comprehensive and more specific metadata is required. If the data is transferred as a single dataset, then quite specific and detailed metadata is needed, possibly down to the feature, object or record level.

Thus, not only can metadata content vary according to purpose; it can also vary according to scope of the data being defined. Discovery metadata usually, but not exclusively, relates to collections of data resources or dataset series that have similar characteristics but relate to different geographic extents or times. A map series is the commonest example but it can equally be applied to statistical surveys. More detailed metadata may be applied to a collection or series but may apply to an individual dataset (e.g. one map tile). Transfer metadata applies exclusively to that transfer.

For successful discovery of data resources which are geospatially referenced the following are also essential:

The Guidelines set out in this document relate to discovery metadata only. The use of a thesaurus is recommended.

The rest of this section describes how NGDF metadata services might look and then goes on to explain discovery metadata, detailed metadata and thesauri in more detail.

1.4.2 NGDF Metadata Services

These Guidelines have been developed on the assumption that metadata services will be implemented as Internet/Intranet applications and that a gateway infrastructure will be supported to provide a national focus and to encourage compliance with the Guidelines. A diagrammatic illustration of the way the services are to be structured is given in Figure 1 below.

Figure 1 - NGDF Gateway Metadata Infrastructure

The NGDF Gateway supports routing facilities to other service providers and Data Producers, and a Metadata repository for those data providers unwilling or unable to support their own service. Service providers are free to support enquiries from sources other than the gateway and may not even be NGDF compliant. The Central Gateway will itself provide support and encouragement to adhere to the standards and provide a central focus, where minor data providers may choose to register their metadata. Helpdesk facilities and impartial advisory services may also be developed.

It is the intention of NGDF that all data flow through this system should be achieved using XML as a transfer file format and Z39.50 as the basis of a communications protocol. The application of XML and Z39.50 by NGDF is described in Sections 5 and 6 respectively.
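As a rough illustration of what using XML as a transfer file format involves, the following Python sketch writes a metadata record out as XML and reads it back. It is a minimal sketch only: the element names (metadata, title, abstract, contact) and the sample values are assumptions made for this example and are not the NGDF tag names, which are defined by the mapping to XML in Appendix D.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record: dict) -> str:
    """Serialise a flat metadata record as one XML element per field."""
    root = ET.Element("metadata")  # hypothetical root element name
    for name, value in record.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_record(text: str) -> dict:
    """Parse the XML produced by record_to_xml back into a dictionary."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


# Illustrative record; field names and values are invented for this sketch.
example = {
    "title": "Example map series",
    "abstract": "Illustrative record only",
    "contact": "data.supplier@example.org",
}
xml_text = record_to_xml(example)
round_trip = xml_to_record(xml_text)
```

Each field is bracketed by a start tag and an end tag, so a receiving service can recover the record without knowing how the sender stores it internally.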

1.4.3 Discovery Metadata

Discovery Metadata is the minimum amount of information that needs to be provided to convey to the inquirer the nature and content of the data resource. The requisite information falls into the following broad categories:

Computer databases accessible on the Internet that contain metadata entries will be the first point of call for people searching for specific datasets. Indeed for some datasets the Discovery Metadata may be all that is needed for the inquirer to locate and obtain the data needed.

The provision of a Discovery Metadata component also recognises the considerable amount of work that some data producers may be required to undertake in order to prepare a detailed metadata definition. It is intended that there will always be a direct mapping between Discovery Metadata and Detailed Metadata. This will ensure that no discrepancies exist between the two definitions and that the Discovery Metadata can be automatically extracted from the Detailed Metadata thus saving time and effort.
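The idea of extracting Discovery Metadata automatically from Detailed Metadata can be sketched as a simple projection of one record onto a fixed subset of its fields. The Python below is a minimal sketch under stated assumptions: the field names in DISCOVERY_FIELDS and the sample record are hypothetical and are not the elements listed in Section 3.3.

```python
# Hypothetical subset of element names shared by both levels of metadata.
DISCOVERY_FIELDS = ("title", "abstract", "extent", "contact")


def extract_discovery(detailed: dict) -> dict:
    """Copy the discovery-level elements out of a detailed record, unchanged."""
    return {name: detailed[name] for name in DISCOVERY_FIELDS if name in detailed}


# Invented detailed record containing more elements than discovery needs.
detailed_record = {
    "title": "Example map series",
    "abstract": "Illustrative record only",
    "extent": "Great Britain",
    "contact": "data.supplier@example.org",
    "positional_accuracy": "not stated",
    "feature_catalogue": "not stated",
}
discovery_record = extract_discovery(detailed_record)
```

Because the values are copied rather than re-entered, the two definitions cannot disagree, which is the purpose of the direct mapping.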

A concise listing of Discovery Metadata is given in Section 3.3 and Guidelines for its preparation are given in Section 4.

The Dublin Core [3] is a similar initiative in the definition of discovery metadata. Further information and a mapping between these Discovery Metadata Guidelines and the Dublin Core is given in Appendix C.

1.4.4 Detailed Metadata

Detailed metadata standards that provide for an exhaustive definition of all aspects of various types of geospatial data are currently under preparation by a number of bodies. The principal ones within an international context are:

The United States Federal Geographic Data Committee (FGDC) has developed a standard known as the Content Standard for Digital Geospatial Metadata [4] that has strong parallels with the draft ISO standard.

There is a further initiative driven by the Open GIS Consortium (OGC) [7]. The OGC consists of a collection of organisations that are striving to define standards for the interoperability of Geographic Information Systems.

It is hoped that these standards will converge; indeed, the ISO, FGDC and OGC standards already have a great deal in common. However, it will be some time before these initiatives bear fruit, and so the Guidelines are provided as an interim solution pending the final release of the above standards.

The definitions provided here are closely allied to the draft ISO standard. It is anticipated by the Working Group that these Guidelines will remain unaltered until the publication of the ISO standard in 2000, at which time they will be modified to match the ISO standard or possibly become a profile of it.

1.4.5 Thesauri

When searching for information, the inquirer may not find any references based on the words used to describe the information sought. This problem can be overcome by use of a thesaurus.

In the context of metadata and other electronic documents, a thesaurus is a tool for the organisation and retrieval of information in electronic materials. It allows data to be indexed and retrieved in a consistent manner. It permits the display of hierarchies of concepts and ideas, leading the user, whether as indexer or information seeker, to define his or her search in terms that are most likely to lead to the retrieval of relevant information.

For example, it will allow improved information retrieval by providing successful searching on synonyms - if the user enters the term "farming" the thesaurus will find the term "agriculture". Hierarchies of meaning can be shown - the term "Great Britain" may retrieve data indexed with that term but could also expand the search to retrieve data on England, Wales and Scotland which have been indexed under those three terms. The term "meals on wheels", although in a hierarchy of terms related to food, will also be linked to concepts relating to personal social services and to the different categories of recipients and a user can elect to follow and retrieve these related terms.
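The synonym and hierarchy behaviour described above can be sketched in a few lines of Python. This is a minimal sketch, not a description of HASSET or of any NGDF service: the three lookup tables hold only the terms used in the examples above, and the function names and the sample index are assumptions made for illustration.

```python
# Non-preferred term -> preferred term (synonym control).
SYNONYMS = {"farming": "agriculture"}

# Broader term -> narrower terms (hierarchy).
NARROWER = {"great britain": ["england", "wales", "scotland"]}

# Term -> related terms the user may elect to follow.
RELATED = {"meals on wheels": ["personal social services"]}


def expand_query(term: str, include_narrower: bool = True) -> list:
    """Map a term to its preferred form and optionally add narrower terms."""
    preferred = SYNONYMS.get(term.lower(), term.lower())
    terms = [preferred]
    if include_narrower:
        terms.extend(NARROWER.get(preferred, []))
    return terms


def related_terms(term: str) -> list:
    """Return related terms offered to the user; they are not searched by default."""
    return list(RELATED.get(term.lower(), []))


def search(index: dict, term: str) -> set:
    """Collect every dataset indexed under any of the expanded terms."""
    hits = set()
    for expanded in expand_query(term):
        hits.update(index.get(expanded, ()))
    return hits


# Hypothetical keyword index: indexing term -> dataset identifiers.
sample_index = {
    "agriculture": {"dataset-1"},
    "england": {"dataset-2"},
    "scotland": {"dataset-3"},
}
```

With this index, a search on "farming" is resolved to "agriculture", and a search on "Great Britain" also returns data indexed under England, Wales or Scotland.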

Consistent searching for metadata will be achieved if all those who prepare metadata use the same thesaurus. However, different industries will have requirements for specific thesauri, and there is no current implementation which will provide a simple search algorithm for the casual or general enquirer. NGDF, cognisant that organisations will want to decide on their own specific keyword assignments, recommends that data providers utilise their own thesaurus or keyword selections in addition to using at least one of the predefined NGDF keywords. NGDF also recommends that data providers investigate the use of the thesaurus known as HASSET (Humanities and Social Science Electronic Thesaurus), which is maintained by the Data Archive at the University of Essex. The URL for this service is: dawww.essex.ac.uk/services/nhasset.html.

1.5 Comments on these Guidelines

Users of the Guidelines are encouraged to send any comments concerning their scope and applicability to the NGDF Working Group 2 at ngdfwg2@ordsvy.gov.uk, or through the NGDF web site (www.ngdf.org.uk).

1.6 Glossary of Terms & Abbreviations

ASCII

American Standard Code for Information Interchange

CD-ROM

Compact Disc – Read Only Memory

CGM

Computer Graphics Metafile

Data collection

Data captured to a common specification or to a common standard

Dataset

Identifiable collection of data

Dataset series

Data captured to a common specification or to a common standard

Discovery Metadata

A subset of a dataset’s metadata as defined by these Guidelines

DXF

Drawing Exchange Format

Geospatial data

Data defined by reference to a position on the Earth’s surface

GIF

Graphics Interchange Format

JPEG

Joint Photographic Experts Group Format

Metadata

Concise descriptions of a data resource

Metadata element

An individual characteristic of a data resource

NGDF

National Geospatial Data Framework

NTF

BS7567 National Transfer Format

Tag

XML notation to signify the beginning or end of an element data entry. The syntax for a tag is <tagname> for a start tag and </tagname> for an end tag.

TIFF

Tagged Image File Format

URL

Uniform Resource Locator

W3C

The World Wide Web Consortium

www

World Wide Web

XML

Extensible Markup Language, a subset of SGML (Standard Generalized Markup Language)

Z39.50

Protocol to enable querying of multiple remote heterogeneous databases.

1.7 References

[1] ISO/TC 211 Working Draft 15046-15 Geographic Information – Metadata (URL: www.statkart.no/isotc211/).

[2] CEN/TC 287 Draft European Standard prEN 287009 Geographic Information – Data Description – Metadata; AFNOR, Paris.

[3] Dublin Core, Online Computer Library Center (OCLC) (URL: purl.org/metadata/dublin_core). (Note: there is an underscore "_" between dublin and core.)

[4] Content Standard for Digital Geospatial Metadata, Federal Geographic Data Committee, c/o U.S. Geological Survey, 590 National Center, Reston, Virginia 20192 (URL: www.fgdc.gov).

[5] ISO/TC 211 Working Draft 15046-5 Geographic Information – Conformance and Testing (URL: www.statkart.no/isotc211/).

[6] Handling Geographic Information, Report of the Committee of Enquiry chaired by Lord Chorley, 1987, published by Her Majesty’s Stationery Office, London. ISBN 0 11 752015 2.

[7] The Open GIS Consortium can be contacted at www.opengis.org.

[8] BS 7666 Spatial datasets for geographical referencing, Part 3 Specification for addresses, available from the British Standards Institution, Linford Wood, Milton Keynes, MK14 6LE.

