The University of Illinois Open Archives Initiative Metadata Harvesting Project

Normalization of the Date and temporal aspect of the Coverage element:

Dublin Core definitions of date and coverage elements:

From the Dublin Core Metadata Initiative (DCMI) website:

Name: Date
Definition:
A date associated with an event in the life cycle of the resource.
Comment:
Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601 and follows the YYYY-MM-DD format.

Name: Coverage
Definition:
The extent or scope of the content of the resource.
Comment:
Coverage will typically include spatial location (a place name or geographic coordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity). Recommended best practice is to select a value from a controlled vocabulary (for example, the Thesaurus of Geographic Names [TGN]) and that, where appropriate, named places or time periods be used in preference to numeric identifiers such as sets of coordinates or date ranges.

The two main places in which temporal information has been mapped are the date and coverage elements. The coverage element can hold a range of information including spatial, jurisdiction, and temporal information. It is the use of the coverage element for temporal information that will be discussed here. The DCMI distinguishes between the date and the temporal aspect of the coverage elements as follows:
The date element is associated with the life cycle of the resource itself (when it was created, when it was digitized, when it was published, etc.). The coverage element is associated with the 'temporal period' of the content of the resource (the era that a photograph is from -- World War II, for example). The recommended best practice for the date element is to use the YYYY-MM-DD format. The recommended best practice for the temporal aspect of the coverage element is to use a controlled vocabulary with named time periods rather than date ranges.
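
As a small illustration of the date recommendation, the following Python sketch checks whether a value is a real calendar date written as YYYY-MM-DD. The function name is hypothetical and is not part of the Illinois project's code; the check covers only the full YYYY-MM-DD form named above.

```python
import re
from datetime import datetime

# Exactly four digits, two digits, two digits, separated by hyphens.
_YMD = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_yyyy_mm_dd(value: str) -> bool:
    """Return True if value is a valid calendar date in YYYY-MM-DD form.

    Hypothetical helper for illustration only.
    """
    text = value.strip()
    if not _YMD.fullmatch(text):
        return False
    try:
        # strptime rejects impossible dates such as 2002-02-30.
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True
```

A value such as 2000-07-04 passes this check, while "June 18, 2002" or a range such as 1834-1900 does not.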

Analysis of the use of the date and coverage elements in the Illinois project:

The analysis of the Illinois data providers showed wide variability in which elements were used (in some cases coverage and date were used interchangeably), which values were recorded (date created, date published, date digitized), and which formats the values took. Table 1 lists the data providers, the elements each used, the values used, and the format of the values. Table 2 is an aggregate view of the use of the date and coverage fields. The date field was used much more often than the coverage field for any sort of temporal information. Over 95% of the data providers used the date field to indicate some sort of temporal value. 16% used the coverage field to indicate a temporal value. 16% used both the coverage and date elements. 5% of the data providers used neither field.

The values used can be broken down into the following categories:

Date of creation (the date a photo was taken or an event took place)
Date of publication or copyright
Date of collection (when added to the library, museum, or archive collection)
Date of digitization (date photo was scanned)
Date of metadata creation
Temporal period covered by item

Any of these can appear in either the date or coverage element or both. Chart 1 (below) shows the values used in the date and coverage elements by all data providers.

87% of the data providers provided information about when an item was created, 8% when the item was digitized, 3% when the item was collected, 3% when the metadata was created, and 21% provided information about the temporal period or coverage of the item.

The format of the values ranges from very specific dates (2000-07-04, June 18, 2002) to date ranges (1834-1900) to general terms designating temporal periods (roman or medieval). Other variations included how specific dates were represented (numbers only or alphanumerical representation).
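
The format variations described above can be told apart mechanically. The following Python sketch sorts a value into one of four groups. The function name and the group labels are assumptions made for illustration, and the patterns cover only the example formats mentioned here, not everything found in the harvested metadata.

```python
import re

_MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
_NUMERIC_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")            # 2000-07-04
_ALPHA_DATE = re.compile(rf"(?:{_MONTHS})\s+\d{{1,2}},\s*\d{{4}}")  # June 18, 2002
_DATE_RANGE = re.compile(r"\d{1,4}\s*-\s*\d{1,4}")          # 1834-1900


def classify_temporal_value(value: str) -> str:
    """Return a rough format label for a date or coverage value.

    The labels are hypothetical, chosen only for this example.
    """
    text = value.strip()
    if _NUMERIC_DATE.fullmatch(text):
        return "numeric date"
    if _ALPHA_DATE.fullmatch(text):
        return "alphanumeric date"
    if _DATE_RANGE.fullmatch(text):
        return "date range"
    # Anything else is treated as a named period such as "medieval".
    return "period term"
```

A classification of this kind is one way to survey a collection before deciding how its values should be normalized.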

Several data providers included different types of temporal information in the date element. For instance, the Celebration of Women Writers included both the date an item was digitized and the publication or copyright date of an item. CIMI and the Colorado Digitization Project, two aggregators of metadata (CIMI includes metadata from approximately 480 different institutions and the Colorado Digitization Project includes metadata from 17 different institutions), each had three different types of temporal values in the date element.

Normalization of temporal information in the date and coverage elements:

Considerations:

Given the variation of the elements and formats used by data providers and the perception that the ability to limit by date is of importance to users, the decision was made to attempt to normalize the temporal information in the date and coverage fields. There were several issues to consider:

Decisions:

The Illinois project made the following decisions about the normalization process for temporal information in the date and coverage fields.

Process:

Normalization terms were listed for use by the research programmer (see Appendix 1). Using the analysis of the values found in the date and coverage fields, the programmer developed a normalization script for each collection. Each script is applied after the metadata has been harvested and before indexing. As new collections are added, their content is analyzed and a normalization script is added for each.

Appendix 1: Terms used for normalizing the date element

2000-

1975-1999

1950-1974

1925-1949

1900-1924

1875-1899

1850-1874

1825-1849

1800-1824

1700-1799

1600-1699

1500-1599

1400-1499

1300-1399

1200-1299

1100-1199

1000-1099

500-999

0-499

B.C.

B.C.E.

21st century

21st c.

twenty first century

20th century

20th c.

twentieth century

19th century

19th c.

nineteenth century

18th century

18th c.

eighteenth century

17th century

17th c.

seventeenth century

16th century

16th c.

sixteenth century

15th century

15th c.

fifteenth century

14th century

14th c.

fourteenth century

13th century

13th c.

thirteenth century

12th century

12th c.

twelfth century

11th century

11th c.

eleventh century

10th century

10th c.

tenth century

early 20th century = 1900-1939

mid 20th century = 1940-1960

late 20th century = 1961-1999

(and so on)
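
The century terms above can be translated into year ranges mechanically. The following Python sketch is one way to do so. The function name is hypothetical. The early, mid, and late offsets come from the 20th century entries above, and applying the same offsets to other centuries is an assumption based on "(and so on)". Treating a century as running from 00 to 99 (1900-1999 for the 20th century) is likewise inferred from those entries.

```python
import re
from typing import Optional, Tuple

# Spelled-out ordinals as they appear in the term list.
ORDINAL_WORDS = {
    "tenth": 10, "eleventh": 11, "twelfth": 12, "thirteenth": 13,
    "fourteenth": 14, "fifteenth": 15, "sixteenth": 16, "seventeenth": 17,
    "eighteenth": 18, "nineteenth": 19, "twentieth": 20, "twenty first": 21,
}

# Offsets within a century, taken from the 20th century entries.
PART_OFFSETS = {"early": (0, 39), "mid": (40, 60), "late": (61, 99)}

_TERM = re.compile(
    r"(?:(early|mid|late)\s+)?"                 # optional part of century
    r"(\d{1,2}(?:st|nd|rd|th)|[a-z ]+?)"        # "20th" or "twentieth"
    r"\s+(?:century|c\.)"                       # "century" or "c."
)


def century_term_to_years(term: str) -> Optional[Tuple[int, int]]:
    """Return (first_year, last_year) for a century term, or None."""
    match = _TERM.fullmatch(term.strip().lower())
    if match is None:
        return None
    part, ordinal = match.group(1), match.group(2)
    if ordinal[0].isdigit():
        century = int(re.match(r"\d+", ordinal).group())
    else:
        century = ORDINAL_WORDS.get(ordinal)
        if century is None:
            return None
    base = (century - 1) * 100
    first, last = PART_OFFSETS.get(part, (0, 99))  # whole century by default
    return base + first, base + last
```

With these assumptions, "early 20th century" maps to 1900-1939 and "late 20th century" to 1961-1999, matching the entries above, and the three spellings of each century ("19th century", "19th c.", "nineteenth century") map to the same range.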

Posted 7-22-02
Sarah Shreeves