5. NGDF Metadata Transfer Format
5.1 Introduction
The NGDF Metadata transfer format defines how the 42 metadata elements in the Draft Guidelines for NGDF Discovery Metadata are to be encoded. The transfer format uses the syntax and features of Extensible Markup Language (XML) Version 1.0 as defined by the World Wide Web Consortium (W3C). XML is a restricted form of SGML, the Standard Generalized Markup Language. SGML is an ISO standard (ISO 8879) for marking up documents that has been widely used in government and industry. XML documents are conforming SGML documents.
5.2 The Purpose of a Transfer Format
The purpose of the transfer format is to convey NGDF-compliant metadata between data suppliers and service providers. It may also be used for the onward transmission of metadata between any other parties and could, in certain circumstances, be used to transfer metadata between service providers and end users, although the normal protocol for that exchange will be Z39.50. The transfer format is intended mainly for the transmission of whole metadata sets, or substantial portions of them.
5.3 Additional Resources
The transfer format is based on a Document Type Definition (DTD) that is available via the World Wide Web. As the example in section 5.8 shows, users must reference this DTD in any discovery metadata descriptions that are coded using the transfer format. Details of how this should be done are given in Appendix D.
5.4 XML Document Type Definition
This section defines the XML Document Type Definition (usually abbreviated to DTD) for the NGDF Metadata transfer format. A Document Type Definition is a formal explanation of how the components that make up an XML document are defined and structured. It is expressed in a machine-readable form as a series of declarative statements using a simple syntax.
The DTD defines how the elements in an NGDF Metadata document fit together, which ones are compulsory and which ones are not, how many times different elements can occur and what data can be entered against each metadata element. A more comprehensive explanation of DTD formatting is given below. The DTD is read and used by indexing programs to check that all the required elements are present and correctly ordered. It can also be used by display programs like web browsers to present the XML information to the user.
5.5 Defining Metadata Elements in the DTD
Each of the 42 NGDF metadata elements is formally defined as an ELEMENT in the DTD. An XML element has a name and a formal specification of the type of information that it can hold. For example, the following line shows the XML DTD element definition for the Title element of NGDF Metadata.
<!ELEMENT title (#PCDATA)>
This statement informs us that the name of the element is "title", and that the value it can take is any set of free text. The term "PCDATA" stands for "Parsed Character Data".
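As a minimal sketch, the declaration and a matching element can be placed together in one small, self-contained XML document. Making title the root element is a simplification for illustration only (in the full DTD it is a child of NGDFDiscoveryMetadata), and the title text is an assumed value, not taken from a real metadata record.

```xml
<?xml version="1.0"?>
<!DOCTYPE title [
  <!-- Declaration as given above: title holds free text -->
  <!ELEMENT title (#PCDATA)>
]>
<!-- Illustrative value only -->
<title>Land-Line digital topographic mapping</title>
```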
In some cases, the Guidelines specify that the values an element can take are restricted to a controlled set. For example, the System of Spatial Referencing by Coordinates (Element 17) can take only three possible values: "British National Grid", "Irish National Grid" and "Latitude and Longitude".
In the DTD, this restriction is encoded using an attribute list. An attribute list declares the attributes that an XML element can carry and, for each attribute, the prescribed set of values it can take. The following example shows the prescribed set of values for NGDF Metadata Element 12, Access Constraints.
<!ATTLIST access
accrestrict (financial|legal|other|notknown|none) #REQUIRED>
This statement tells us that the element called "access" carries an attribute called "accrestrict", which is assigned one of five possible values. The element's information is held in this attribute rather than in text between a start tag and an end tag.
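A sketch of how this looks in practice, assuming the lowercase value names used in the full DTD listing in section 5.11. The access element is made the root here only to keep the example self-contained, and the chosen value is illustrative.

```xml
<?xml version="1.0"?>
<!DOCTYPE access [
  <!-- access carries no text; its value lives in the accrestrict attribute -->
  <!ELEMENT access EMPTY>
  <!ATTLIST access
    accrestrict (financial|legal|other|notknown|none) #REQUIRED>
]>
<!-- A value outside the five listed above would fail validation -->
<access accrestrict="legal" />
```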
5.6 Specifying the XML Version
All NGDF Metadata XML documents must include a statement defining which version of XML has been used to encode the document. Presently, there is only one version of XML, release 1.0. The statement specifying this version is encoded thus:
<?xml version="1.0"?>
5.7 Referencing the NGDF Metadata DTDs
XML documents that comply with this transfer format must use the customised tag names, attribute definitions and syntax that are defined in the DTD. Users will therefore need to include a reference to the DTD in each of their Discovery Metadata XML documents. This can be achieved in two ways:
By including a pointer in the Discovery Metadata XML document to the URL for the online version of the DTD. This is available on the NGDF web server at <www.ngdf.org.uk/ngdfdm.dtd>. An XML capable browser will then reference this document and use it to read the Discovery Metadata.
By downloading the DTD and storing it on a local server. The local copy of the DTD can then be used to read the discovery metadata.
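The two options might be written as follows. The first DOCTYPE line repeats the URL given above; the file name in the second is an assumption, standing for wherever the downloaded copy is stored. Depending on the parser, the online form may need a full URL including the scheme, since a system identifier without one is normally resolved relative to the document's own location. The root element is left as an empty skeleton, so this sketch is well-formed but not a valid metadata record.

```xml
<?xml version="1.0"?>
<!-- Option 1: point at the online DTD on the NGDF web server -->
<!DOCTYPE NGDFDiscoveryMetadata SYSTEM "www.ngdf.org.uk/ngdfdm.dtd">
<!-- Option 2: point at a local copy instead. A document has only one
     DOCTYPE, so this line would replace the one above (path is assumed):
     <!DOCTYPE NGDFDiscoveryMetadata SYSTEM "ngdfdm.dtd">
-->
<NGDFDiscoveryMetadata>
  <!-- metadata elements go here, in the order the DTD requires -->
</NGDFDiscoveryMetadata>
```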
5.8 An Example of XML Encoding
Figure 3, below, gives a simple illustration of how an XML document is constructed. It shows how the Document Type Definition reference and XML version are encoded, and the basic syntax that NGDF Discovery Metadata XML documents use.
<?xml version="1.0"?>
<!DOCTYPE NGDFDiscoveryMetadata SYSTEM "www.ngdf.org.uk/ngdfdm.dtd">
<NGDFDiscoveryMetadata>
<abstract>
large scale digital topographic mapping for site location, planning
and land and asset management. Land-Line contains 30 feature
codes and six text categories. Land-Line.Plus has additional
landscape features represented by a further 27 feature codes.
Standard map tile coverage 500 x 500 m in urban areas, 1 km x
1 km in rural areas and 5km x 5km in moorland areas.
</abstract>
<syscoord system ="britishnationalgrid" />
</NGDFDiscoveryMetadata>
Figure 3
All NGDF Metadata documents must begin with two lines specifying the XML version and the source DTD used for encoding. In Figure 3, line one specifies the XML version, while line two names the Document Type Definition that the document uses and gives its location in a form that both software and human readers can interpret.
The metadata itself consists of a list of elements, which may be composed of other elements. An NGDF Discovery Metadata XML document is defined as including a range of 42 possible elements, which occur in a specific order. Some of these elements are compulsory. Some can occur more than once. Each element is represented in the XML document by a start tag, the data contained in the element, and the end tag. The format for this is as follows:
<element_name> (The start tag)
Information contained in the element
</element_name> (The end tag)
Where an element is composed of other elements (like the Discovery Metadata as a whole), the format for encoding is as follows :
<element_name> (Start Tag)
  <sub_element_name> (Start Tag)
    Sub element data
  </sub_element_name> (End Tag)
  <sub_element_name> (Start Tag)
    Sub element data
  </sub_element_name> (End Tag)
</element_name> (End Tag)
Figure 3 shows how this structure is applied. The figure is abbreviated: in a valid NGDF Metadata document, all of the compulsory elements and sub elements would have to be present for the document to be acceptable.
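For comparison, a fuller sketch containing every element that the DTD in section 5.11 marks as compulsory, in the required order, might look like this. All values are invented placeholders, including the date format used for update, which this section does not specify. The DOCTYPE line repeats the one used in the figure.

```xml
<?xml version="1.0"?>
<!DOCTYPE NGDFDiscoveryMetadata SYSTEM "www.ngdf.org.uk/ngdfdm.dtd">
<NGDFDiscoveryMetadata>
  <!-- Compulsory simple elements hold free text -->
  <title>Example dataset title</title>
  <abstract>Short description of the example dataset.</abstract>
  <keywords>topography, mapping</keywords>
  <!-- geogext is a compound element built from sub elements -->
  <geogext>
    <geogref>
      <adminext>Example County</adminext>
    </geogref>
  </geogext>
  <spatrefsys>Illustrative free-text value</spatrefsys>
  <!-- supplier requires cname, postaddr and postcode, in that order -->
  <supplier>
    <cname>Example Data Supplier</cname>
    <postaddr>1 Example Street, Exampletown</postaddr>
    <postcode>EX1 1EX</postcode>
  </supplier>
  <update>October 1999</update>
</NGDFDiscoveryMetadata>
```

Optional elements such as alttitle or usecon would slot in at the positions the root content model gives them; placing any element out of order would make the document invalid.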
5.9 Specifying Controlled Attributes
In some cases, the Guidelines state that the possible entries relating to an element are restricted to a particular set of responses. The Document Type Definition uses an Attribute List to define what these are. In the NGDF Metadata Guidelines, the System of Spatial Referencing by Coordinates (Element 17) can take only three possible values : "British National Grid", "Irish National Grid" and "Latitude and Longitude".
Note that an element whose value is carried in an attribute does not need separate start and end tags, because the element itself is declared as EMPTY. In an XML document, the value of the attribute is given as follows :
<element_name attribute_name="attribute_value" />
Here, the start and end tags are replaced by the syntax <element_name />. The /> symbol effectively replaces the end tag.
Figure 3 shows how a value of "British National Grid" for Element 17 would be encoded, as the attribute value britishnationalgrid, according to the Discovery Metadata DTD.
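A self-contained sketch of the same pattern, using the declarations for coordref, syscoord and boundrec copied from the DTD in section 5.11. Here the value "Latitude and Longitude" is encoded; the bounding coordinates are invented for illustration, and their units and format are assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE coordref [
  <!ELEMENT coordref (syscoord+ , boundrec+)>
  <!ELEMENT syscoord EMPTY>
  <!ATTLIST syscoord
    system (britishnationalgrid|irishgrid|latitudeandlongitude) #REQUIRED>
  <!ELEMENT boundrec (westext? , eastext? , northext? , southext?)>
  <!ELEMENT westext (#PCDATA)>
  <!ELEMENT eastext (#PCDATA)>
  <!ELEMENT northext (#PCDATA)>
  <!ELEMENT southext (#PCDATA)>
]>
<coordref>
  <!-- Empty-element form: the attribute carries the value -->
  <syscoord system="latitudeandlongitude" />
  <!-- Bounding values are illustrative only -->
  <boundrec>
    <westext>-6.0</westext>
    <eastext>2.0</eastext>
    <northext>59.0</northext>
    <southext>50.0</southext>
  </boundrec>
</coordref>
```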
5.10 Entity Declarations
The DTD also defines named entities for symbols that might occur in an NGDF Discovery Metadata document, such as the copyright and pound sterling signs. These are encoded as ENTITY declarations, which occur at the beginning of the DTD before the element and attribute declarations.
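As a hedged illustration, an entity is used in a document by writing its name between an ampersand and a semicolon. The sketch below borrows the entity names cpy and lbs from the DTD; the values are written as numeric character references so that the example does not depend on file encoding, and the usecon text is invented.

```xml
<?xml version="1.0"?>
<!DOCTYPE usecon [
  <!ELEMENT usecon (#PCDATA)>
  <!ENTITY cpy "&#169;"> <!-- copyright symbol -->
  <!ENTITY lbs "&#163;"> <!-- pound sterling sign -->
]>
<!-- &cpy; and &lbs; are replaced by the symbols when the document is parsed -->
<usecon>&cpy; Example Mapping Agency. Licence fee from &lbs;50 per tile.</usecon>
```

When the entities are declared only in the external DTD, a parser has to read that DTD before it can resolve such references.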
5.11 DTD for NGDF Discovery Metadata
This part of the document presents a formal XML 1.0 Document Type Definition for NGDF Discovery Metadata. This DTD forms the core of the transfer standard.
<!DOCTYPE NGDFDiscoveryMetadata [
<!-- NGDF Discovery Metadata DTD 1.1 19991003
This is the Document Type Definition for discovery metadata conforming to the UK National Geospatial Data Framework Discovery Metadata Guidelines. This DTD corresponds to Version 1.1 (March 1999) and 1.2 (October 1999) of the Standard. The DTD is for use with XML 1.0 compliant viewers and word processors.
Three special characters in the DTD are defined as follows :
+ - This element must be present and can occur one or more times.
? - This element can be absent, and can occur only once if present.
* - This element can be absent, or can occur one or more times.
Where these special characters are absent, the element must be specified and can occur only once.
The values of metadata elements that are not compound are here declared as PCDATA following the example of the US Federal Geographic Data Committee so that parsers will recognise and support entities representing special characters.
Reference : NGDF Working Group 2 - NGDF Discovery Metadata Guidelines, March 1999.
DTD Ref : ngdfdm.dtd
DTD Document URL : www.ngdf.org.uk/ngdfdm.dtd
Authors : Martin Ralphs and Paul Miller, for NGDF Working Group 2 -->
<!ENTITY lt "&#38;#60;">
<!ENTITY gt "&#62;">
<!ENTITY amp "&#38;#38;">
<!ENTITY apos "&#39;">
<!ENTITY quot "&#34;">
<!ENTITY trd "®"> <!-- registered trademark symbol -->
<!ENTITY lbs "£"> <!-- pound sterling sign -->
<!ENTITY cpy "©"> <!-- copyright symbol -->
<!ENTITY deg "°"> <!-- degree symbol -->
<!ENTITY plm "±"> <!-- plus or minus symbol -->
<!ENTITY s2 "²"> <!-- superscript two -->
<!ENTITY at "@" > <!-- commercial at symbol, used in email -->
<!ELEMENT NGDFDiscoveryMetadata (title , alttitle* , originat* , abstract , capture? , present* , access* , usecon? , keywords , geogext+ , spatrefsys+ , spatdet? , media* , dataform* , addinfo* , associat* , supplier+ , update , sample*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT alttitle (#PCDATA)>
<!ELEMENT originat (#PCDATA)>
<!ELEMENT abstract (#PCDATA)>
<!ELEMENT capture (startstat , stdate , endstat , enddate , frequpdt )>
<!ELEMENT present EMPTY>
<!ATTLIST present
presentation (image|graphic|map|numeric|text) #REQUIRED>
<!ELEMENT access EMPTY>
<!ATTLIST access
accrestrict (financial|legal|other|notknown|none) #REQUIRED>
<!ELEMENT usecon (#PCDATA)>
<!ELEMENT keywords (#PCDATA)>
<!ELEMENT geogext (coordref* , geogref* )>
<!ELEMENT spatrefsys (#PCDATA)>
<!ELEMENT spatdet (#PCDATA)>
<!ELEMENT media EMPTY>
<!ATTLIST media
mediatype (paper|magnetic|optical|online|other) #REQUIRED >
<!ELEMENT dataform (#PCDATA)>
<!ELEMENT addinfo (#PCDATA)>
<!ELEMENT associat (#PCDATA)>
<!ELEMENT supplier (cname , postaddr , postcode , telno? , faxno? , email? , webaddr? )>
<!ELEMENT update (#PCDATA)>
<!ELEMENT sample (#PCDATA)>
<!ELEMENT coordref (syscoord+ , boundrec+)>
<!ELEMENT geogref (natext* , adminext* , pcodeext*)>
<!ELEMENT startstat EMPTY>
<!ATTLIST startstat
ststat (known|notknown|notapplicable) #REQUIRED>
<!ELEMENT stdate (#PCDATA)>
<!ELEMENT endstat EMPTY>
<!ATTLIST endstat
edstat (known|notknown|notapplicable|ongoing) #REQUIRED>
<!ELEMENT enddate (#PCDATA)>
<!ELEMENT frequpdt EMPTY>
<!ATTLIST frequpdt
frequency nnially|triennally|quinquennially|decennially|continuous|irregular|never|other) #REQUIRED>
<!ELEMENT syscoord EMPTY>
<!ATTLIST syscoord
system (britishnationalgrid|irishgrid|latitudeandlongitude) #REQUIRED>
<!ELEMENT boundrec (westext? , eastext? , northext? , southext?)>
<!ELEMENT westext (#PCDATA)>
<!ELEMENT eastext (#PCDATA)>
<!ELEMENT northext (#PCDATA)>
<!ELEMENT southext (#PCDATA)>
<!ELEMENT natext EMPTY>
<!ATTLIST natext
natextent northernireland|scotland|wales|isleofman|channelislands|offshore|unitedkingdom) #REQUIRED>
<!ELEMENT adminext (#PCDATA)>
<!ELEMENT pcodeext (#PCDATA)>
<!ELEMENT cname (#PCDATA)>
<!ELEMENT postaddr (#PCDATA)>
<!ELEMENT postcode (#PCDATA)>
<!ELEMENT telno (#PCDATA)>
<!ELEMENT faxno (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT webaddr (#PCDATA)>
]>
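To illustrate the occurrence markers described in the DTD's opening comment, the sketch below isolates the supplier declaration, which combines unmarked (exactly once) and ? (optional) sub elements. The supplier is made the root only to keep the example self-contained, and all contact details are invented.

```xml
<?xml version="1.0"?>
<!DOCTYPE supplier [
  <!-- No marker: must occur exactly once.  ? : optional, at most once. -->
  <!ELEMENT supplier (cname , postaddr , postcode , telno? , faxno? , email? , webaddr? )>
  <!ELEMENT cname (#PCDATA)>
  <!ELEMENT postaddr (#PCDATA)>
  <!ELEMENT postcode (#PCDATA)>
  <!ELEMENT telno (#PCDATA)>
  <!ELEMENT faxno (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ELEMENT webaddr (#PCDATA)>
]>
<supplier>
  <cname>Example Data Supplier</cname>
  <postaddr>1 Example Street, Exampletown</postaddr>
  <postcode>EX1 1EX</postcode>
  <!-- telno and faxno omitted: both are marked ? in the content model -->
  <email>data@example.org</email>
</supplier>
```

Omitting cname, or giving two email elements, would make this fragment invalid, whereas elements marked + or * in other declarations may be repeated.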