4. Preparing Discovery Metadata
This Section gives a full exposition on the application of the Guidelines. For each metadata element an example of its application is given together with additional comments and/or explanation where necessary. The examples are drawn from various datasets to illustrate each distinct metadata element and therefore do not form a coherent whole. Worked examples of complete metadata sets are included in Appendix A.
It is strongly recommended that an entry be provided for all discovery metadata elements. If a particular piece of information is not known or not available, saying so is more valuable than leaving the entry empty. Leaving the entry empty could mean that the data provider had just neglected to provide relevant information.
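As an illustration only, the recommendation above could be supported by a simple check that reports blank entries before a record is submitted. The sketch below is not part of the Guidelines; the function name `find_empty_entries` and the dictionary layout of the record are assumptions made for the example.

```python
# Hypothetical helper: the name and the record layout are illustrative only.
def find_empty_entries(record):
    """Return the element names whose entry is missing or blank."""
    return [name for name, entry in record.items()
            if entry is None or not str(entry).strip()]

record = {
    "Title": "Lexicon of Named Rock Unit Definitions",
    "Originator": "Not Known",  # an explicit statement, preferred to a blank
    "Abstract": "",             # blank, so it is reported
}
print(find_empty_entries(record))
```

An entry such as "Not Known" passes the check because it tells the user that the information was considered and is unavailable.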
4.1 Title
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
Title |
1 |
Name by which dataset is known |
M |
1 |
Text |
Free text |
4.1.1 Example
|
Element Name |
Identifier |
Entry |
|
Title |
1 |
Lexicon of Named Rock Unit Definitions |
|
|
|
|
4.1.2 Comments
This element is used to provide the title of the dataset, that is, the title by which the dataset is known in the public domain. The title should be short, typically not more than 50 characters and should not include terms or jargon that would render it incomprehensible by the general user.
Where there is a possibility of multiple titles, the name by which the dataset is most commonly known should be given and the others provided as Alternative Titles.
Where no title exists that is intelligible to external users, the data producer is encouraged to create a title by which it will be known externally from this point on. Titles should encapsulate the subject, temporal and spatial coverage of the dataset, e.g. Voter Participation in Liverpool Local Elections, 1994. Where a title has an acronym in common use, or where a discipline specific name exists, it should appear as an Alternative Title.
4.2 Alternative Title
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
Alternative Title |
2 |
Short name, other name, acronym or alternative language title |
O |
N |
Text |
Free text or "Not Applicable" |
4.2.1 Example
|
Element Name |
Identifier |
Entry |
|
Alternative Title |
2 |
NRUD |
|
Alternative Title |
2 |
Stratigraphical Lexicon |
|
|
|
|
4.2.2 Comments
This may be used for any additional title(s) under which the dataset may be known, including other language translations. The Alternative Title should be short, typically not more than 50 characters. Where a dataset title has an acronym in common use, or where a discipline specific name exists, it should appear under this element. If there is no alternative title then enter "Not Applicable". More than one instance of this element is allowed. If multiple instances are desired then each instance should have a separate entry as shown in the example.
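The rule that each instance has its own entry, subject to the Maximum Occurrence of the element, might be checked along the following lines. This is a minimal sketch under stated assumptions: the tuple layout (element name, identifier, entry) and the names `MAX_OCCURRENCE` and `over_limit` are hypothetical, and only the limits for Title and Alternative Title are included.

```python
from collections import Counter

# Maximum occurrence per identifier for Title (1) and Alternative Title (2).
# None stands for "N" (no fixed limit). Identifiers not listed here are
# treated as unlimited, which is an assumption of this sketch.
MAX_OCCURRENCE = {1: 1, 2: None}

def over_limit(entries):
    """Return the identifiers that occur more often than their maximum allows."""
    counts = Counter(identifier for _, identifier, _ in entries)
    return sorted(i for i, n in counts.items()
                  if MAX_OCCURRENCE.get(i) is not None and n > MAX_OCCURRENCE[i])

entries = [
    ("Title", 1, "Lexicon of Named Rock Unit Definitions"),
    ("Alternative Title", 2, "NRUD"),
    ("Alternative Title", 2, "Stratigraphical Lexicon"),
]
print(over_limit(entries))
```

Two Alternative Title entries are accepted, whereas a second Title entry would be reported.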
4.3 Originator
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
Originator |
3 |
Person or organisation having primary responsibility for the intellectual content of the data |
O |
N |
Text |
Free text or "Not Known" |
4.3.1 Example
|
Element Name |
Identifier |
Entry |
|
Originator |
3 |
Heath, A., Nuffield College (Oxford) |
|
Originator |
3 |
Jowell, R., Social and Community Planning Research |
|
Originator |
3 |
Curtice, J.K., University of Strathclyde |
|
Originator |
3 |
Brand, J.A., University of Strathclyde |
|
Originator |
3 |
Mitchell, J.C., University of Strathclyde |
4.3.2 Comments
This element should contain information about the person or body with prime responsibility for the intellectual content of the data. This is not necessarily the person or organisation named as contact for the data, or indeed, the person or organisation responsible for publishing the data. The originator(s) are not necessarily those who own the intellectual property rights in the dataset.
More than one instance of this element is allowed. If multiple instances are desired then each instance should have a separate entry as shown in the example.
For many government datasets it will be the originating department that has responsibility. The general rule is that where a work is produced as a result of the policy of a corporate body and at the expense of that corporate body, the corporate body has prime responsibility, not the individual who actually produced it.
4.4 Abstract
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
Abstract |
4 |
Brief narrative summary of the dataset |
M |
1 |
Text |
Free text |
|
4.4.1 Example
|
Element Name |
Identifier |
Entry |
|
Abstract |
4 |
Large scale digital topographic mapping, detailed framework mapping for site location, planning and land and asset management. |
|
|
|
|
4.4.2 Comments
This element is used to provide a brief description of the dataset. It should include the content of the dataset and the purpose and/or applications for which the dataset was produced. Whilst there is no maximum length for this description, it should be concise: ideally under 100 words and not normally longer than 300 words.
4.5 Data Capture Period
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
Data Capture Period |
5 |
The period during which data capture took place |
O |
1 |
||
4.5.1 Example
|
Element Name |
Identifier |
Entry |
|
|
|
|
|
Data Capture Period |
5 |
No data entry required here |
|
|
|
|
4.5.2 Comments
Element 5 acts as a heading to the following five elements (elements 6 to 10) and so no metadata entry is required against this element. It is used to show that, where information on the data capture period is available, the following five elements are required.
If the start and/or end dates of capture are not known, not applicable or if data capture is ongoing then this fact cannot be conveyed by using a date data type. This information can be indicated by elements 6 and/or 8.
If the dataset is an amalgamation of a number of datasets with different start and end dates of capture then the overall first and last dates should be given.
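For an amalgamated dataset, the overall dates can be derived as the earliest start and the latest end of the component datasets. The sketch below is illustrative only: the function name `overall_capture_period` and the component dates are invented for the example, and it assumes that every date is a complete CCYYMMDD string with no unknown parts.

```python
def overall_capture_period(periods):
    """Return (earliest start, latest end) over a list of (start, end) pairs.

    Complete CCYYMMDD strings sort chronologically when compared as text,
    so no date parsing is needed here.
    """
    starts = [start for start, _ in periods]
    ends = [end for _, end in periods]
    return min(starts), max(ends)

# Hypothetical component datasets, invented for illustration.
components = [
    ("19850323", "19901231"),
    ("19790101", "19880615"),
    ("19920401", "19960128"),
]
print(overall_capture_period(components))
```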
4.6 Status of Start Date of Capture
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
Status of Start Date of Capture |
6 |
Status of knowledge on Start Date of Capture |
O |
1 |
Enumerated List |
Known, Not Known, Not Applicable |
4.6.1 Example
|
Element Name |
Identifier |
Entry |
|
|
|
|
|
Status of Start Date of Capture |
6 |
Known |
|
|
|
|
4.6.2 Comments
It is possible that for some datasets the start date of capture is not known in which case this fact should be indicated here.
In the case of a dataset that has been continuously updated for a number of years and does not contain an historic record, as in the case of the Ordnance Survey Land-Line® dataset, then the term "Not Applicable" may be used.
If the start date is known then the word "Known" should be entered here and the actual date provided via element 7.
4.7 Start Date of Capture
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
Start Date of Capture |
7 |
Date on which data was first collected |
C |
1 |
Date |
Date |
4.7.1 Example
|
Element Name |
Identifier |
Entry |
|
|
|
|
|
Start Date of Capture |
7 |
19850323 |
|
|
|
|
|
4.7.2 Comments
This element is used to specify the date on which data in the set was first collected and is conditional on the entry to element 6 being "Known". This date refers to the actual data capture process and does not relate to the date when the data was first made available to the general public. The date given should be in the format, CCYYMMDD. Where any part of the date is not known it should be replaced with asterisks. For example if data capture started in March 1985, as above, but the day is not known then the date entry should be 198503**.
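A date entry in this format could be checked mechanically. The following sketch is one possible interpretation, not a prescribed method: it assumes that the year (CCYY), month (MM) and day (DD) are each either fully numeric or fully replaced by asterisks, and the name `is_valid_capture_date` is hypothetical.

```python
import re
from datetime import date

# Each part is either all digits or all asterisks (an assumption of this sketch).
_DATE = re.compile(r"([0-9]{4}|\*{4})([0-9]{2}|\*{2})([0-9]{2}|\*{2})")

def is_valid_capture_date(text):
    """Check a CCYYMMDD entry in which unknown parts are shown as asterisks."""
    match = _DATE.fullmatch(text)
    if match is None:
        return False
    year, month, day = match.groups()
    if month != "**" and not 1 <= int(month) <= 12:
        return False
    if day != "**" and not 1 <= int(day) <= 31:
        return False
    if "*" not in text:
        # A fully known date must also exist in the calendar.
        try:
            date(int(year), int(month), int(day))
        except ValueError:
            return False
    return True

print(is_valid_capture_date("19850323"), is_valid_capture_date("198503**"))
```

Under these assumptions both entries used in this section, 19850323 and 198503**, are accepted, whereas an entry such as 1985-03-23 is rejected.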
4.8 Status of End Date of Capture
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
Status of End Date of Capture |
8 |
Status of knowledge on End Date of Capture |
O |
1 |
Enumerated List |
Known, Not Known, Not Applicable, Ongoing |
4.8.1 Example
|
Element Name |
Identifier |
Entry |
|
|
|
|
|
Status of End Date of Capture |
8 |
Ongoing |
|
|
|
|
4.8.2 Comments
It is possible that for some datasets the end date of capture is not known in which case this fact should be indicated here.
In the case that the dataset is still being updated then "Ongoing" should be entered and the periodicity of update provided via element 10.
If the end date is known then the word "Known" should be entered here and the actual date provided via element 9.
4.9 End Date of Capture
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
End Date of Capture |
9 |
Date on which data was last collected |
C |
1 |
Date |
Date |
4.9.1 Example
|
Element Name |
Identifier |
Entry |
|
End Date of Capture |
9 |
19960128 |
|
|
|
|
|
4.9.2 Comments
If data capture has ended then this element is used to provide the date on which data in the set was last collected and is conditional on the entry to element 8 being "Known". The date given should be in the format, CCYYMMDD. Where any part of the date is not known it should be replaced with asterisks. For example where data capture ended in January 1996, but the day is not known then the date entry should be 199601**.
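The conditional relationship between the status elements (6 and 8) and the date elements (7 and 9) can be expressed as a simple rule. The sketch below is illustrative: the function name `check_capture_dates` and the dictionary layout are hypothetical, and the second rule (a date should not be supplied unless the status is "Known") is an interpretation of the conditional obligation rather than a statement taken from the Guidelines.

```python
def check_capture_dates(record):
    """Return a list of problems found in elements 6 to 9 of a record."""
    problems = []
    pairs = [
        ("Status of Start Date of Capture", "Start Date of Capture"),
        ("Status of End Date of Capture", "End Date of Capture"),
    ]
    for status_name, date_name in pairs:
        status = record.get(status_name)
        has_date = bool(record.get(date_name))
        if status == "Known" and not has_date:
            problems.append(f"{date_name} is required when {status_name} is 'Known'")
        elif status != "Known" and has_date:
            # Interpretation: a date is only meaningful for a "Known" status.
            problems.append(f"{date_name} should be empty unless {status_name} is 'Known'")
    return problems

record = {
    "Status of Start Date of Capture": "Known",
    "Start Date of Capture": "19850323",
    "Status of End Date of Capture": "Ongoing",
}
print(check_capture_dates(record))
```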
4.10 Frequency of Update
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
Frequency of Update |
10 |
Revision regime of dataset |
O |
1 |
Enumerated List |
Hourly, Daily, Weekly, Fortnightly, Monthly, Quarterly, Biannually, Annually, Biennially, Triennially, Quinquennially, Decennially, Continuous, Irregular, Never, Not Known, Other |
4.10.1 Example
|
Element Name |
Identifier |
Entry |
|
Frequency of Update |
10 |
Quarterly |
|
|
|
|
4.10.2 Comments
If data capture still continues, then the period of update should be given here. Where a dataset is updated continuously and the user would obtain the data in whatever state it happened to be on the date of purchase, enter "Continuous". If the data continues to be updated but there is no regular period of update, enter "Irregular".
4.11 Presentation Type
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
Presentation Type |
11 |
Form in which the data resource is available |
O |
N |
Enumerated List |
Image, Graphic, Map, Numeric, Text, Other |
4.11.1 Example
|
Element Name |
Identifier |
Entry |
|
Presentation Type |
11 |
Numeric |
|
|
|
|
4.11.2 Comments
This element can be used multiple times to specify the forms in which the data is available. The allowable types are:
|
Image |
photographic or satellite image |
|
Graphic |
drawing |
|
Map |
data presented in map form |
|
Numeric |
set of numbers or statistical data |
|
Text |
free text |
More than one instance of this element is allowed. If multiple instances are desired then each instance should have a separate entry as shown in example 4.2.1.
4.12 Access Constraint
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
Access Constraint |
12 |
Restrictions and legal prerequisites for accessing the dataset |
O |
N |
Enumerated List |
Financial, Legal, Other, Not Known, None |
4.12.1 Example
|
Element Name |
Identifier |
Entry |
|
Access Constraint |
12 |
None |
|
|
|
|
4.12.2 Comments
Use this element to define any restrictions placed on access to the data. As well as financial and legal constraints relating to the data this might be used to specify categories of people who are/are not allowed access to the data. A financial constraint would imply that the data had to be purchased. A legal constraint implies that some security or contractual constraint applies to the dataset.
More than one instance of this element is allowed. If multiple instances are desired then each instance should have a separate entry as shown in example 4.2.1.
4.13 Use Constraints
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
Use Constraints |
13 |
Restrictions and legal constraints on using the data |
O |
1 |
Text |
Free text or "Not Known" |
4.13.1 Example
|
Element Name |
Identifier |
Entry |
|
Use Constraints |
13 |
© Natural Environment Research Council |
|
|
|
|
4.13.2 Comments
When copyright or other legal restrictions protect the data this element is used to define the terms of the constraint including re-use fees, reproduction restrictions and acknowledgements. The actual fees payable would not be provided here, just the fact that a fee would be payable. If the use constraints are unknown then enter "Not Known".
4.14 Keywords
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
Keywords |
14 |
Common-use words or phrases summarising the subject of the dataset |
M |
1 |
Text |
NGDF list, Free text |
4.14.1 Example
|
Element Name |
Identifier |
Entry |
|
Keywords |
14 |
PHOTOGRAPHY AND IMAGERY Satellite panchromatic, 10m resolution |
|
|
|
|
4.14.2 Comments
The entry for this element should take the form of a comma-separated list of words or phrases that describe the subject of the dataset. Comma-separation will enable computer systems to analyse the keyword entry when performing metadata searches.
NGDF has developed a prescribed list of keywords to assist data providers and to ensure that searchers find relevant records accurately and consistently. As a minimum, one NGDF keyword should be selected; data providers may supplement the metadata with other local keywords. The list provides a theme keyword (upper case) as a mandatory requirement with the second order word (lower case) as optional. The NGDF pick-list could also be enhanced or expanded by NGDF as demand dictates, especially with the introduction of new themes of metadata.
Keywords should be taken from an identifiable thesaurus of keywords. A recommended on-line thesaurus is the Humanities and Social Science Electronic Thesaurus (HASSET), which is maintained by the Data Archive at the University of Essex. The URL for this service is dawww.essex.ac.uk/services/nhasset.html.
|
NGDF Keyword |
NGDF Modifier (optional) |
|
AGRICULTURE |
Crops, Horticulture, Irrigation, Livestock |
|
ATMOSPHERE |
Air Quality, Greenhouse, Ozone, Pressure |
|
BOUNDARIES |
Administrative, Biophysical, Cultural |
|
CLIMATE AND WEATHER |
Climate change, Drought, Extreme weather events, Meteorology, Radiation, Rainfall, Temperature |
|
DEMOGRAPHY |
|
|
DISEASE |
|
|
ECOLOGY |
Community, Ecosystem, Habitat, Landscape |
|
ENERGY |
Coal, Electricity, Petroleum, Renewable, Use |
|
FAUNA |
Exotic, Insects, Invertebrates, Native, Vertebrates |
|
FISHERIES |
Aquaculture, Freshwater, Marine, Recreational |
|
FLORA |
Exotic, Native |
|
FORESTS |
Agriforestry, Natural, Plantation |
|
GEOSCIENCES |
Geochemistry, Geology, Geomorphology, Geophysics, Hydrogeology |
|
HAZARDS |
Cyclones, Drought, Earthquake, Fire, Flood, Landslip, Manmade, Pests, Severe local storms, Tsunamis |
|
HEALTH |
|
|
HUMAN ENVIRONMENT |
Economics, Housing, Livability, Planning, Structures and Facilities, Urban Design |
|
INDUSTRY |
Manufacturing, Mining, Other, Primary, Service |
|
LAND |
Cadastre, Cover, Geodesy, Geography, Ownership, Topography, Use, Valuation |
|
MARINE |
Biology, Coasts, Estuaries, Geology and Geophysics, Human Impacts, Meteorology, Reefs |
|
MINERALS |
|
|
MOLECULAR BIOLOGY |
Genetics |
|
OCEANOGRAPHY |
Chemical, Physical |
|
PHOTOGRAPHY AND IMAGERY |
Aerial, Remote Sensing, Satellite |
|
POLLUTION |
Air, Noise, Soil, Water |
|
SOIL |
Biology, Chemistry, Erosion, Physics |
|
TRANSPORTATION |
Air, Land, Marine |
|
UTILITIES |
|
|
VEGETATION |
Floristic, Structural |
|
WASTE |
Greenhouse gas, Heat, Liquid, Sewage, Solid, Toxic |
|
WATER |
Groundwater, Hydrochemistry, Hydrology, Lakes, Quality, Rivers, Salinity, Supply, Surface, Wetlands |
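Because the keyword entry is comma-separated, it can be split and compared against the NGDF themes in the table above. The following is a minimal sketch, not an NGDF tool: the names `NGDF_THEMES`, `split_keywords` and `ngdf_themes_used` are hypothetical, only four of the themes are listed for brevity, and it assumes that a theme keyword appears at the start of an item, optionally followed by a modifier.

```python
# A few themes copied from the NGDF table; a real check would carry the full list.
NGDF_THEMES = ("PHOTOGRAPHY AND IMAGERY", "GEOSCIENCES", "LAND", "WATER")

def split_keywords(entry):
    """Split a comma-separated keyword entry into trimmed, non-empty items."""
    return [part.strip() for part in entry.split(",") if part.strip()]

def ngdf_themes_used(entry):
    """Return the NGDF themes that begin any item of the keyword entry."""
    themes = []
    for keyword in split_keywords(entry):
        for theme in NGDF_THEMES:
            # Require an exact match or a following space so that "LAND"
            # does not match a longer word that merely starts with LAND.
            if keyword == theme or keyword.startswith(theme + " "):
                themes.append(theme)
    return themes

entry = "PHOTOGRAPHY AND IMAGERY Satellite panchromatic, 10m resolution"
print(split_keywords(entry))
print(ngdf_themes_used(entry))
```

An empty result from `ngdf_themes_used` would indicate that the minimum of one NGDF keyword has not been met.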
4.15 Geographic Extent
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
Geographic extent |
15 |
Geographic area or areas for which data is available |
M |
1 |
||
4.15.1 Example
|
Element Name |
Identifier |
Entry |
|
Geographic extent |
15 |
No entry required here |
|
|
|
|
4.15.2 Comments
Element 15 acts as a heading to the following two groups of elements (elements 16 to 22 and elements 23 to 26) and so no metadata entry is required against this element. It is used to show that it is mandatory to provide information that defines the geographic extent covered by the dataset. The extent can be defined either by spatial referencing by coordinates, e.g. "British National Grid", or by geographic identifiers, e.g. "England", "Berkshire" or "SO16". If the spatial referencing is by coordinates then the extent is defined by the area or areas that bound the data.
Therefore, whilst it is mandatory to define a geographic extent, the type of information provided is conditional on whether spatial referencing by coordinates or geographic identifiers is used. Coordinate data is defined by elements 16 to 22 and geographic identifiers by elements 23 to 26.
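The choice between the two methods can be determined from the identifiers of the elements that have been completed. The sketch below is illustrative only; the names `COORDINATE_ELEMENTS`, `IDENTIFIER_ELEMENTS` and `extent_methods` are hypothetical.

```python
COORDINATE_ELEMENTS = set(range(16, 23))  # elements 16 to 22
IDENTIFIER_ELEMENTS = set(range(23, 27))  # elements 23 to 26

def extent_methods(present_identifiers):
    """Return the extent methods used, given the identifiers that have entries."""
    present = set(present_identifiers)
    methods = []
    if present & COORDINATE_ELEMENTS:
        methods.append("coordinates")
    if present & IDENTIFIER_ELEMENTS:
        methods.append("geographic identifiers")
    return methods

print(extent_methods([1, 4, 14, 17, 19, 20, 21, 22]))
```

An empty list would mean that neither method has been used, so the mandatory geographic extent is missing from the record.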
If the geographic extent is to be defined by coordinates then this can be achieved by providing National Grid or latitude and longitude values. If the extent covers both Great Britain and Northern Ireland then two sets of National Grid coordinates will be required. That is one set referring to the British National Grid and one referring to the Irish Grid.
Where a dataset contains data relating to a number of geographically localised areas, for example several archaeological sites within a county, then at the Discovery Metadata level, the surrounding enclave i.e. the county should be used to define the extent either using coordinates or geographic identifiers (e.g. administrative areas). Details of the locations of each of the areas would be provided at the detailed metadata level. Where data relates to a number of areas that are widely separated e.g. Cornwall, Yorkshire and Kent then this can be indicated by reference to a number of enclaves or administrative areas. These would require a multiple entry.
It should be noted that the information provided in elements 16 to 22 concerns the definition of the extent of the dataset and does not relate to any coordinate system used for information within the dataset. Information concerning the contents of the dataset should be provided in element 27.
4.16 Spatial Referencing by Coordinates
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
Spatial Referencing by Coordinates |
16 |
Geographic extent referenced by use of coordinates |
C |
1 |
||
4.16.1 Example
|
Element Name |
Identifier |
Entry |
|
Spatial Referencing by Coordinates |
16 |
No entry required here |
|
|
|
|
4.16.2 Comments
This element acts as a sub-heading for the following 5 elements that can be used to define the geographic extent of the dataset by means of coordinates.
4.17 System of Spatial Referencing by Coordinates
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
System of Spatial Referencing by Coordinates |
17 |
Name or description of system or systems of spatial referencing by coordinates used for the geographic extent |
M |
N |
Enumerated List |
British National Grid, Irish Grid, Latitude and Longitude |
4.17.1 Example
|
Element Name |
Identifier |
Entry |
|
System of Spatial Referencing by Coordinates |
17 |
British National Grid |
|
|
|
|
4.17.2 Comments
Where coordinates are used to indicate the geographic extent of the data then the name of the system of spatial referencing should be selected from the following list: British National Grid, Irish Grid, Latitude and Longitude.
More than one instance of this element is allowed. So if the supplier wishes to define the extent by both British National Grid values and by Latitude and Longitude values then elements 16 to 22 should be repeated in two groups.
4.18 Bounding Rectangle
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
Bounding Rectangle |
18 |
Rectangle defining an area of occurrence of the dataset |
M |
N |
||
4.18.1 Example
|
Element Name |
Identifier |
Entry |
|
Bounding Rectangle |
18 |
No entry required here |
|
|
|
|
|
4.18.2 Comments
This element acts as a sub-heading for the following 4 elements that can be used to define a rectangle that bounds the dataset.
If the data supplier wishes to give both Grid values and Latitude and Longitude values then elements 19 to 22 should be repeated for each method of extent definition.
4.19 West Bounding Coordinate
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
West Bounding Coordinate |
19 |
Westernmost coordinate of the limit of the area |
M |
1 |
Real |
Grid value / longitude |
4.19.1 Example
|
Element Name |
Identifier |
Entry |
|
West Bounding Coordinate |
19 |
0 |
|
|
|
|
|
4.19.2 Comments
This coordinate defines the western edge of the bounding rectangle within which the area or one of the areas of the dataset lies. The coordinate should be given either in metric units to the nearest kilometre or as a longitude to the nearest hundredth of a degree (0.01). Longitude values to the West should be expressed as negative numbers and those to the East as positive.
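The precision rules above might be applied as follows. This is a sketch under stated assumptions: the source coordinates are taken to be grid values in metres or longitudes in decimal degrees, halves are rounded away from zero (the Guidelines do not specify a rounding rule), and the function names `grid_metres_to_km` and `round_degrees` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def grid_metres_to_km(metres):
    """Convert a grid coordinate in metres to the nearest whole kilometre."""
    km = Decimal(str(metres)) / 1000
    return int(km.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

def round_degrees(degrees):
    """Round a longitude or latitude to the nearest hundredth of a degree."""
    rounded = Decimal(str(degrees)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return float(rounded)

# Illustrative values only: an easting in metres and a longitude west of Greenwich.
print(grid_metres_to_km(699500))
print(round_degrees(-1.405))
```

The `decimal` module is used here because the built-in `round` function rounds halves to the nearest even digit and is affected by binary floating-point representation.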
4.20 East Bounding Coordinate
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
East Bounding Coordinate |
20 |
Easternmost coordinate of the limit of the area |
M |
1 |
Real |
Grid value / longitude |
4.20.1 Example
|
Element Name |
Identifier |
Entry |
|
East Bounding Coordinate |
20 |
700 |
|
|
|
|
4.20.2 Comments
This coordinate defines the eastern edge of the bounding rectangle within which the area or one of the areas of the dataset lies. The coordinate should be given either in metric units to the nearest kilometre or as a longitude to the nearest hundredth of a degree. Longitude values to the West should be expressed as negative numbers and those to the East as positive.
4.21 North Bounding Coordinate
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
North Bounding Coordinate |
21 |
Northernmost coordinate of the limit of the area |
M |
1 |
Real |
Grid value / latitude |
4.21.1 Example
|
Element Name |
Identifier |
Entry |
|
North Bounding Coordinate |
21 |
1300 |
|
|
|
|
4.21.2 Comments
This coordinate defines the northern edge of the bounding rectangle within which the area or one of the areas of the dataset lies. The coordinate should be given either in metric units to the nearest kilometre or as a latitude to the nearest hundredth of a degree (0.01). Latitude values to the North should be expressed as positive numbers and those to the South as negative.
4.22 South Bounding Coordinate
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
South Bounding Coordinate |
22 |
Southernmost coordinate of the limit of the area |
M |
1 |
Real |
Grid value / latitude |
4.22.1 Example
|
Element Name |
Identifier |
Entry |
|
South Bounding Coordinate |
22 |
0 |
|
|
|
|
4.22.2 Comments
This coordinate defines the southern edge of the bounding rectangle within which the area or one of the areas of the dataset lies. The coordinate should be given either in metric units to the nearest kilometre or as a latitude to the nearest hundredth of a degree (0.01).
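Once all four bounding coordinates (elements 19 to 22) have been entered, the rectangle can be checked for consistency. The sketch below is illustrative: the function name `check_bounding_rectangle` is hypothetical, and it assumes that the west value never exceeds the east value, which holds for extents within the United Kingdom but not for rectangles that cross the 180 degree meridian.

```python
def check_bounding_rectangle(west, east, north, south,
                             system="British National Grid"):
    """Return a list of problems found in the four bounding coordinates."""
    problems = []
    if west > east:
        problems.append("West Bounding Coordinate exceeds East Bounding Coordinate")
    if south > north:
        problems.append("South Bounding Coordinate exceeds North Bounding Coordinate")
    if system == "Latitude and Longitude":
        # West and South are negative, East and North positive.
        if not (-180 <= west <= 180 and -180 <= east <= 180):
            problems.append("Longitude outside the range -180 to 180")
        if not (-90 <= south <= 90 and -90 <= north <= 90):
            problems.append("Latitude outside the range -90 to 90")
    return problems

# The grid values used in examples 4.19.1 to 4.22.1, in kilometres.
print(check_bounding_rectangle(west=0, east=700, north=1300, south=0))
```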
4.23 Spatial Referencing by Geographic Identifiers
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
Spatial Referencing by Geographic Identifiers |
23 |
Geographic extent referenced by use of geographic identifiers |
C |
1 |
||
4.23.1 Example
|
Element Name |
Identifier |
Entry |
|
Spatial Referencing by Geographic Identifiers |
23 |
No entry required here |
|
|
|
|
4.23.2 Comments
This element acts as a sub-heading for the following 3 elements that can be used to define the geographic extent of the dataset by means of geographic identifiers.
4.24 National Extent
|
Element Name |
Identifier |
Element Definition |
Obligation |
Maximum Occurrence |
Data Type |
Domain |
|
|
National Extent |
24 |
Geographic extent or occurrence of dataset by parts of the United Kingdom |
C |
| |||