
The University of Illinois Open Archives Initiative Metadata Harvesting Project

Summary of OAI Protocol for Metadata Harvesting

 

This document provides an overview of important OAI-PMH definitions and concepts.  For the complete, official, and up-to-date OAI-PMH specification, see http://www.openarchives.org/OAI/openarchivesprotocol.html

 

I. Basic Definitions:

Repository

An OAI-Compliant Repository is a network-accessible server to which OAI Requests, embedded in HTTP, can be submitted.  OAI-Compliant Repositories may be registered with a central OAI Registration Authority.

Request

An OAI Request may be expressed using either the HTTP GET or POST methods.  All OAI Requests of a given repository are submitted to the single Base-URL for that repository and consist of a list of arguments in the form of key=value pairs.  One key will always be an OAI Verb.  The other keys will vary by OAI verb and the specific nature of the request.  All keys and most values are case-sensitive.
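As a rough illustration of the key=value form described above, the following Python sketch assembles an HTTP GET request URL.  The base URL, the record identifier, and the helper name build_request_url are hypothetical and chosen only for illustration; they are not taken from any real repository.

```python
from urllib.parse import urlencode


def build_request_url(base_url, verb, **arguments):
    """Return a GET URL for an OAI request (illustrative helper, not an official API)."""
    # The verb key is always present; the remaining keys depend on the verb.
    pairs = [("verb", verb)] + sorted(arguments.items())
    # urlencode escapes reserved characters such as ":" in the values.
    return base_url + "?" + urlencode(pairs)


# Hypothetical base URL and identifier, used only for illustration.
url = build_request_url(
    "http://example.org/oai",
    "GetRecord",
    identifier="oai:example:1234",
    metadataPrefix="oai_dc",
)
print(url)
```

Because keys and most values are case-sensitive, the sketch passes them through exactly as given rather than normalizing case.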

Response

An OAI Response is the XML-encoded byte stream, embedded in HTTP, which is returned by a repository in response to an OAI Request.  The HTTP status line and HTTP headers accompanying an OAI response may be used by an OAI repository to indicate exception conditions.  OAI responses must be valid XML (other than exception condition responses which may be only well-formed XML).

Record

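An OAI Record is a <record> node of an OAI Response as returned by a repository to satisfy an OAI Request for metadata describing an item or items in that repository.  Each OAI Record consists of 2 required nodes, <header> and <metadata>, and 1 optional node, <about>.  (As noted under ListRecords below, the <metadata> node is omitted for deleted records and for records not available in the requested format.)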

     Identifier

An OAI Record Identifier is a persistent, repository-unique key used to extract and identify a specific OAI Record held by a repository.  If the repository is registered, all OAI Record Identifiers for that repository will be unique across the entire registered OAI namespace.  However, the same metadata content may be associated with multiple OAI Record Identifiers (e.g., if the same metadata content is held by multiple repositories).  To be valid, an OAI Record must include its OAI Record Identifier in its <header> node.

     Datestamp

An OAI Record Datestamp gives the date of creation, deletion, or last modification of the <metadata> node contained in that OAI Record.  It is a date only; no clock time is included.  To be valid, an OAI Record must include an OAI Record Datestamp in its <header> node.
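The record structure described above can be sketched in Python as follows.  This is a minimal illustration under stated assumptions: the sample record is hand-written, omits the XML namespaces that real responses declare, and assumes a YYYY-MM-DD datestamp; the helper name read_header and the identifier value are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hand-written, simplified sample (no namespaces); not output from a real repository.
SAMPLE_RECORD = """
<record>
  <header>
    <identifier>oai:example:1234</identifier>
    <datestamp>2001-09-17</datestamp>
  </header>
  <metadata>
    <title>An example title</title>
  </metadata>
</record>
"""


def read_header(record_xml):
    """Return (identifier, datestamp, has_metadata) for one <record> node."""
    record = ET.fromstring(record_xml)
    header = record.find("header")
    if header is None:
        raise ValueError("record has no <header> node")
    identifier = header.findtext("identifier")
    datestamp = header.findtext("datestamp")
    # Both values must be present in the header for the record to be valid.
    if not identifier or not datestamp:
        raise ValueError("header must contain an identifier and a datestamp")
    # The datestamp is a date only (assumed YYYY-MM-DD here); no clock time.
    return identifier, date.fromisoformat(datestamp), record.find("metadata") is not None


identifier, datestamp, has_metadata = read_header(SAMPLE_RECORD)
print(identifier, datestamp.isoformat(), has_metadata)
```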

Set

An OAI Set is an optional construct for grouping items in a repository for the purpose of selective harvesting of records.  OAI Sets may be hierarchical (if so, members of child sets are also retrieved as part of the parent set).  “set” is an optional key for some OAI Verbs.

Metadata Prefix

“metadataPrefix” is a required key for certain OAI Verbs.  It is used to specify the XML schema of the OAI Response <metadata> node(s) returned from a repository to satisfy an OAI Request.  Currently all fully compliant OAI repositories must support the “oai_dc” Metadata Prefix for all non-deleted OAI Record Identifiers contained in the repository.

Flow Control

Repository resource use may be managed in 2 ways.  A repository may chunk a long response to an OAI Request.  When using this method, a repository includes a <resumptionToken> node as part of its OAI Response.  To retrieve the next chunk of an OAI Response, a harvest service will include this resumptionToken value as part of its next OAI Request.  Repositories also may return an HTTP status of 503 (Service Unavailable) as a way to manage flow control.  When returning a status of 503, the repository must also return a “Retry-After” HTTP response header.  OAI-compliant harvest services must respect this header value.
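Both flow-control mechanisms can be sketched as a single harvesting loop.  The Python below is a minimal sketch under stated assumptions, not a definitive harvester: the fetch function is supplied by the caller and is assumed to return a tuple of (HTTP status, headers, body text), the Retry-After value is assumed to be a number of seconds (HTTP also allows a date, which this sketch does not handle), and the function names are hypothetical.

```python
import time
import xml.etree.ElementTree as ET


def find_resumption_token(root):
    """Return the text of the first <resumptionToken> node, or "" if none."""
    for node in root.iter():
        # Compare local names so the sketch works with or without namespaces.
        if node.tag.rsplit("}", 1)[-1] == "resumptionToken":
            return (node.text or "").strip()
    return ""


def harvest(fetch, verb, arguments, sleep=time.sleep):
    """Collect every response chunk for one logical request.

    fetch(params) is a caller-supplied function (an assumption of this
    sketch) that returns (http_status, headers, body_text).
    """
    params = dict(arguments, verb=verb)
    chunks = []
    while True:
        status, headers, body = fetch(params)
        if status == 503:
            # Wait as long as the repository asks, then repeat the same request.
            sleep(int(headers["Retry-After"]))
            continue
        if status != 200:
            raise RuntimeError(f"unexpected HTTP status {status}")
        root = ET.fromstring(body)
        chunks.append(root)
        token = find_resumption_token(root)
        if not token:
            return chunks  # no token: this was the last chunk
        # The next request carries only the verb and the resumptionToken.
        params = {"verb": verb, "resumptionToken": token}
```

A real fetch function would wrap an HTTP client; it is left to the caller here so that the loop can be exercised without network access.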


 

II. Verbs Used in OAI Requests:

 

Identify

This verb is used to retrieve information about a Repository.  No added arguments are allowed for this verb.  An Identify Response includes the base-URL of the repository, the OAI protocol version supported, the repository name, and the email address of the repository administrator.  Additional human-readable and community-specific descriptive information about the repository also may be provided.

ListMetadataFormats

This verb is used to retrieve the Metadata Formats available from a Repository or for a particular Record.  Not all records in a repository need be available in all formats.  The only allowed optional argument is identifier (used to find available metadata formats for a particular record).  A ListMetadataFormats Response includes metadata prefix, namespace (optional), and XSD (for validation) for each metadata format available.  If the identifier specified in the request is not available, the request does not generate an error response (rather the response simply contains no metadata formats).
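A harvest service might read a ListMetadataFormats Response roughly as sketched below.  The element names (metadataFormat, metadataPrefix, schema, metadataNamespace) are assumptions of this sketch, the sample response is hand-written without XML namespaces, and the schema URL and the helper name available_formats are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample; the schema URL is a placeholder.
SAMPLE_FORMATS = """
<ListMetadataFormats>
  <metadataFormat>
    <metadataPrefix>oai_dc</metadataPrefix>
    <schema>http://example.org/schemas/oai_dc.xsd</schema>
  </metadataFormat>
</ListMetadataFormats>
"""


def available_formats(response_xml):
    """Map each metadata prefix to its schema and (optional) namespace."""
    root = ET.fromstring(response_xml)
    formats = {}
    for fmt in root.iter("metadataFormat"):
        prefix = fmt.findtext("metadataPrefix")
        formats[prefix] = {
            "schema": fmt.findtext("schema"),
            # The namespace is optional, so this is None when it is omitted.
            "namespace": fmt.findtext("metadataNamespace"),
        }
    return formats


formats = available_formats(SAMPLE_FORMATS)
print(sorted(formats))
```

An identifier that is not available yields a response with no formats, which this helper reports as an empty dictionary rather than as an error.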

ListSets

This verb is used to retrieve the Set structure of a Repository.  The only allowed optional argument is resumptionToken.  A ListSets Response includes setSpec (the string used as the value of the optional “set” key argument allowed with verbs ListIdentifiers and ListRecords) and setName (a human-readable string useful for display purposes).  The required syntax for construction of the setSpec reveals the hierarchical relationship to parent sets (if any).  If a repository has no set structure, a valid, non-error response is returned containing no information about any sets.
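To illustrate how a setSpec can reveal its parent sets, the Python sketch below assumes the colon-separated path form (for example, institution:library:maps); consult the protocol document for the authoritative syntax.  The set names and the helper name ancestor_sets are hypothetical.

```python
def ancestor_sets(set_spec, separator=":"):
    """Return the setSpec of every parent set, outermost first."""
    parts = set_spec.split(separator)
    # Each proper prefix of the path names one ancestor set.
    return [separator.join(parts[:i]) for i in range(1, len(parts))]


# Hypothetical set names, used only for illustration.
parents = ancestor_sets("institution:library:maps")
print(parents)
```

A top-level set has no separator in its setSpec, so the helper returns an empty list for it.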

ListIdentifiers

This verb is used to retrieve the identifiers of records that can be harvested from a Repository.  Allowed optional arguments are until, from, and set.  until and from are used to limit retrieval by date, while set limits retrieval by set.  resumptionToken is also an allowed optional argument, but may not be used in combination with any other argument.  Any identifiers that match the limit criteria are returned.  Deleted identifiers that match limits are returned with their XML status attribute set to “deleted”.  Return order of identifiers is arbitrary and entirely up to the repository (may vary request to request).  An empty list is a valid response.
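The argument rules for ListIdentifiers can be expressed as a small validation step before a request is sent.  The following Python is a sketch only; the function name, parameter names, and sample values are hypothetical.

```python
def list_identifiers_arguments(from_=None, until=None, set_spec=None,
                               resumption_token=None):
    """Return the key=value pairs for a ListIdentifiers request."""
    limits = {"from": from_, "until": until, "set": set_spec}
    # Keep only the optional limits the caller actually supplied.
    limits = {key: value for key, value in limits.items() if value is not None}
    if resumption_token is not None:
        # resumptionToken may not be combined with from, until, or set.
        if limits:
            raise ValueError(
                "resumptionToken cannot be combined with from, until, or set")
        return {"verb": "ListIdentifiers", "resumptionToken": resumption_token}
    return {"verb": "ListIdentifiers", **limits}


# Hypothetical date limits, used only for illustration.
args = list_identifiers_arguments(from_="2001-01-01", until="2001-09-17")
print(args)
```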

GetRecord

This verb is used to retrieve an individual Record from a Repository.  Required arguments are identifier and metadataPrefix.  There are no optional arguments.  A GetRecord Response will return a record containing a metadata node in the requested format, if available.  If identifier is not available, no <record> node is included in the response.  If identifier is valid but not available in requested format, no <metadata> node is included in the <record> node returned.

ListRecords

This verb is used to harvest multiple Records from a Repository.  metadataPrefix is a required argument (except when resumptionToken is used).  until, from, set, and resumptionToken are optional arguments as with ListIdentifiers.  All records in the repository that match the limits specified are retrieved.  A <record> node for a deleted record contains no <metadata> node and includes an XML status attribute set to “deleted”.   A <record> node for a record not available in requested metadata format contains no <metadata> node.  Return order of records is arbitrary and entirely up to the repository (may vary request to request).  An empty list is a valid response.
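The three kinds of <record> node described above can be told apart as in the Python sketch below.  It assumes a hand-written sample response without XML namespaces and, following the description above, a status attribute on the <record> node itself; the identifiers and the helper name classify_records are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample; not output from a real repository.
SAMPLE_LIST = """
<ListRecords>
  <record>
    <header><identifier>oai:example:1</identifier><datestamp>2001-09-01</datestamp></header>
    <metadata><title>First item</title></metadata>
  </record>
  <record status="deleted">
    <header><identifier>oai:example:2</identifier><datestamp>2001-09-02</datestamp></header>
  </record>
  <record>
    <header><identifier>oai:example:3</identifier><datestamp>2001-09-03</datestamp></header>
  </record>
</ListRecords>
"""


def classify_records(response_xml):
    """Map each record identifier to 'available', 'deleted', or 'format unavailable'."""
    root = ET.fromstring(response_xml)
    result = {}
    for record in root.iter("record"):
        identifier = record.findtext("header/identifier")
        if record.get("status") == "deleted":
            result[identifier] = "deleted"
        elif record.find("metadata") is None:
            # Not deleted, but no <metadata> node in the requested format.
            result[identifier] = "format unavailable"
        else:
            result[identifier] = "available"
    return result


states = classify_records(SAMPLE_LIST)
print(states)
```

The result is keyed by identifier rather than by position because the return order of records is up to the repository.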

Document

This verb is not part of the OAI-PMH, but is often implemented to facilitate testing of the repository using an XML-aware Web browser.  There are no optional arguments.

 

All verbs are case-sensitive.

 

Timothy W. Cole, University of Illinois at UC
17 September 2001

 

University of Illinois at Urbana-Champaign
Comments to: Tom Habing
Updated on: 9-16-01 TWC