The NGDF Gateway is fundamentally linked to the availability of organisations'
metadata repositories. NGDF recognise that the Gateway will not be the key
impetus for agencies to collect metadata but instead will assist in educating
the industry on the importance of and issues related to metadata.
Metadata is one of those terms that is conveniently ignored or avoided. However,
there is increasing recognition of the benefits of, and the requirement for,
metadata as the use of digital data continues to grow. Whereas
cartographers rigorously provided metadata within a paper map's legend, the
evolution of computers and GIS has seen a decline in this practice. As
organisations start to realise this legacy, they have slowly started to look at
issues of data management and metadata. A number of issues are
then encountered in deliberating on an approach to metadata documentation:
Metadata is the term used to describe the summary information or characteristics of a set of data. In the area of geospatial information, or information with a geographic component, this normally means the What, Who, Where, When and How of the data. The only major difference from the many other metadata sets being collected for libraries, academia, professions and elsewhere is therefore the emphasis on the spatial component, or the "where" element.
Having this summary of the data is no different from the summary information that exists for many items in everyday life. A mail order catalogue provides metadata that summarises the basic information about electrical goods, whereas labels on food products provide statements on ingredients, nutritional value and manufacturer. It is this metadata that the consumer or data user looks at to determine whether the product is fit for the purpose for which they wish to use it. Geospatial data is no different.
These different levels might therefore be described as:
Metadata is derived from the Greek root meta ("change") and data, and so refers to the changes that take place in data (and therefore the need to document these changes, creating a summary of data about data).
The Report of the Committee of Enquiry into the Handling of Geographic Information (the Chorley Report) established that the use of geospatial data was crucial but was difficult because of
Now, over ten years later, we hear the phrase "information is power". With increasing amounts of data being created and stored (but sometimes not organised), there is a real need to document the data for future use, so that it is as accessible as possible to as wide a "public" as possible. There are significant benefits in doing so:
Discovery Metadata is the minimum amount of information that needs to be provided to convey to the inquirer the nature and content of the data resource. It falls into broad categories that answer the what, why, when, who, where and how questions about geospatial data.
What - title and description of the dataset.
Why - abstract detailing reasons for the data collection.
When - when the dataset was created and the update cycles if any.
Who - originator and data supplier.
Where - the geographical extent based on lat / long, co-ordinates, geographical names or administrative areas.
How - how to obtain more information or order the datasets, formats, media access, constraints.
The broad categories are few in number, to reduce the effort required to collect the information whilst still conveying to the inquirer the nature and content of the data resource. However, some categories require additional elements for full definition, and the total number of elements is therefore 42. For example, supplier name covers eight elements - name, address, contact, telephone number, etc.
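As an illustration, the six categories above could be held as a simple record structure. The following is a minimal sketch in Python, not the NGDF element set: every class and field name is hypothetical, only four of the eight supplier elements are shown, and representing the extent as a bounding box is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Supplier:
    """Who: a subset of the supplier elements (names are illustrative)."""
    name: str
    address: str = ""
    contact: str = ""
    telephone: str = ""


@dataclass
class Extent:
    """Where: a bounding box in decimal degrees plus optional place names."""
    west: float
    south: float
    east: float
    north: float
    place_names: List[str] = field(default_factory=list)


@dataclass
class DiscoveryRecord:
    title: str                           # What
    description: str                     # What
    abstract: str                        # Why
    created: str                         # When (date held as text, an assumption)
    originator: str                      # Who
    supplier: Supplier                   # Who
    extent: Extent                       # Where
    access: str                          # How (ordering, formats, constraints)
    update_cycle: Optional[str] = None   # When (only if the data is updated)


def missing_elements(record: DiscoveryRecord) -> List[str]:
    """Return the names of the top-level text elements that were left empty."""
    required = ["title", "description", "abstract",
                "created", "originator", "access"]
    return [name for name in required if not getattr(record, name).strip()]
```

A check such as `missing_elements` gives a data provider a quick way to see which categories still need to be filled in before a record is submitted.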
How the NGDF Metadata Service Will Work
The service envisaged will be structured as in Figure 1 with a Gateway supporting links to other service providers and data producers but also a metadata repository for those providers unwilling or unable to support their own services.
The user will make enquiries of the system based upon location and a set of key words. A search mechanism will then identify the required data sources, supported by look-up tables of key words and geographical names from which the user can select to make searches more effective. The look-up tables will provide a list of administrative names from which to choose locations, and the key words will be used for selecting themes or categories of metadata.
Having identified the datasets of interest from an evaluation of the discovery metadata, the user will be able to link, via the web, to sites set up by data providers, data brokers and service providers for more information or to buy the data.
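The enquiry process described above can be sketched as a filter over discovery records, with the look-up tables acting as controlled vocabularies. This is a hedged illustration only: the table contents, record fields, URLs and the `search` function are all invented, and the Gateway's real search mechanism is not specified here.

```python
from typing import Dict, List, Set

# Hypothetical look-up tables from which the user selects search terms.
KEYWORDS: Set[str] = {"boundaries", "geology", "transport"}
ADMIN_AREAS: Set[str] = {"Cornwall", "Devon", "Kent"}

# Hypothetical discovery records; every value here is invented.
RECORDS: List[Dict] = [
    {"title": "Parish boundaries (example)",
     "keywords": {"boundaries"},
     "areas": {"Cornwall", "Devon"},
     "url": "https://example.org/parish"},
    {"title": "Road network (example)",
     "keywords": {"transport"},
     "areas": {"Kent"},
     "url": "https://example.org/roads"},
    {"title": "Bedrock geology (example)",
     "keywords": {"geology"},
     "areas": {"Devon"},
     "url": "https://example.org/geology"},
]


def search(area: str, keywords: Set[str]) -> List[Dict]:
    """Return records covering `area` that match at least one key word."""
    # Terms must come from the look-up tables, which keeps searches consistent.
    if area not in ADMIN_AREAS:
        raise ValueError(f"unknown administrative area: {area}")
    unknown = keywords - KEYWORDS
    if unknown:
        raise ValueError(f"unknown key words: {sorted(unknown)}")
    return [r for r in RECORDS
            if area in r["areas"] and keywords & r["keywords"]]


# Each hit carries a link to the provider's own site for more detail.
for hit in search("Devon", {"boundaries", "geology"}):
    print(hit["title"], "->", hit["url"])
```

Restricting terms to the look-up tables means a misspelt place name is rejected at once rather than silently returning no results.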
Considerable debate across the world centres on metadata and those characteristics that should be chosen to best describe the dataset. There are discussion groups, seminars and conferences and quantities of paper generated in the debate about the subject. International standards are being set by a number of organisations all designed to ensure that a degree of consistency exists. Examples of different international standards are:
Unfortunately, there is no single standard among all these standards, although they have a great deal in common and it is hoped that they will converge.
At present, the International Organization for Standardization (ISO) is proceeding with the development of various standards for the Geomatics profession. Unfortunately, the standards are not due for adoption until some time in 2001. The US has developed guidelines for geospatial data, which have gained acceptance in the US and in support of which a number of metadata tools have been developed.
NGDF have also developed guidelines to promote the core metadata elements that are required if users are to promote their data through the NGDF Gateway Project. The NGDF guidelines have been developed in line with the emerging ISO standard and will help to form a UK profile of the ISO standard when it is ratified.
Why the standards are necessary
Standardisation and consistency are necessary to ensure that data users can compare the suitability of data from different sources. This means, for example, that when comparing metadata about property or hazardous waste there is an indication of the dates to which the information refers, or that when comparing metadata about different map sources the relevant scales are shown. Without a standard, meaningful comparisons cannot be made.
The standards stated above have different ideas about what characteristics should be included. The FGDC standard, for example, includes 334 elements, but this goes into the detail of the information. To derive all these elements, the data provider must spend considerable time and resources collecting the information, and for the data user this detail might be greater than is required for an initial investigation. In many situations, therefore, different levels of metadata need to be defined, with the ability to "drill down" into increasing levels of detail. Metadata should therefore vary according to purpose.
Ideally, metadata specifications should be linked to a standard. The advantage of standards is that they have been developed through a consultative process (with other "experts") and provide a basis from which to develop individual profiles or sub-standards. It is also highly likely that, if a standard is well accepted, various commercial and operational tools will exist to support it. NGDF are encouraging organisations to use, as a minimum, the NGDF Guidelines to improve the knowledge, awareness and accessibility of UK geospatial data resources.
At the top level, NGDF is promoting and setting up a service based upon Discovery of data resources. This Discovery metadata provides sufficient information to enable an inquirer to ascertain that data fit for purpose exists and to reference some point of contact for more information. If, after discovery, more detail is needed about individual datasets, then more comprehensive and more specific metadata is required. Organisations may wish to develop metadata at different but complementary levels: discovery metadata for external use, and more detailed metadata for in-house use. To avoid duplication of effort, those elements common to both levels are flagged.
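One way to picture the flagging of common elements is a small element register in which each element records the levels it belongs to. The sketch below is an assumption about how an organisation might do this, not an NGDF specification; the element names and the `Element`, `common_elements` and `discovery_view` names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Element(NamedTuple):
    name: str
    discovery: bool   # published externally as discovery metadata
    detailed: bool    # held in-house as detailed metadata


# Hypothetical register; the element names are for illustration only.
ELEMENTS: List[Element] = [
    Element("title", True, True),
    Element("abstract", True, True),
    Element("supplier_name", True, False),
    Element("lineage", False, True),
    Element("positional_accuracy", False, True),
]


def common_elements(elements: List[Element]) -> List[str]:
    """Names of elements flagged for both levels, to be captured only once."""
    return [e.name for e in elements if e.discovery and e.detailed]


def discovery_view(record: Dict[str, str],
                   elements: List[Element]) -> Dict[str, str]:
    """Drop the in-house-only elements from a full record."""
    wanted = {e.name for e in elements if e.discovery}
    return {key: value for key, value in record.items() if key in wanted}
```

With a register of this kind the detailed in-house record is the single source, and the external discovery record is derived from it rather than typed in a second time.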
The level of metadata detail that will be documented depends on the type of data held and the methods by which it is accessed and used. Different types of data (e.g. vector, raster, textual, imagery, thematic, boundary, polygon, attribute, point, etc.) will require different levels and forms of metadata to be collected. However, there is still a high degree of compatibility between most of the metadata elements required.
Similarly, organisations will look at different ways to manage their data. Some organisations manage information as a dataset, as tiles of datasets or as a series of datasets, while others manage the information down to the feature level. Again, there is still a high level of compatibility between the levels of metadata required, particularly as the data is cascaded from the feature level to the dataset or data series level.
Geospatial metadata is also documented for three complementary purposes:
a) Data Inventory - to enable organisations to know and publicise what data
holdings they have.
b) Data Transfer - as documentation to be provided with the data to ensure that
others use the data correctly and wisely.
c) Data Management - to enable an organisation to effectively store, reuse,
maintain and archive their data holdings.
Each of these purposes, while complementary, requires a different level of information. Organisations should therefore look at their overall needs and requirements before developing their metadata systems. The important point is for agencies to establish their business requirements first, the content specifications second, and the technology and implementation methods third.
What form should the metadata take
The form for maintaining metadata will depend on a number of factors
If the metadata holdings are fairly modest, it is common to store the metadata in discrete documents using whatever tool is already in use (e.g. a word processor, spreadsheet or simple database). Commonly, organisations have built up folders of single documents that may be in either paper or digital format. Many organisations will start to investigate the use of more complex systems as they realise the benefit of the metadata, gain greater data holdings and start to provide broader access to the data.
Indeed, many organisations will start with a basic audit of their data holdings, which will alert them to the wealth of data that they possess and to where it is being used, replicated or improved across the organisation. As the data holdings become larger and access to the data becomes distributed, organisations will look at more advanced methods for maintaining metadata about their holdings. These advanced tools may consist of commercial or self-developed forms-based systems, which may also form part of the operational GI systems so that aspects of the metadata can be extracted automatically from the data itself.