OAI Identifiers

Protocol Definition

The OAI-PMH 2.0 specification requires that each item in the repository have a unique identifier <oai-identifier>.

The <oai-identifier> is used to harvest a metadata record disseminated from that item with the GetRecord request, and is also used in the responses to the ListIdentifiers and ListRecords requests. Note that the OAI identifier is specific to the item from which records are disseminated. For example, an item may have metadata available for harvest in both oai_dc and marc21. Both of these records will have the same unique OAI identifier, but will differ in their metadataPrefix and may differ in their datestamp.
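The relationship between one identifier and several metadata formats can be sketched as follows. This is a minimal illustration, not a harvester: the helper name `get_record_url` and the item identifier `oai:oai.some.edu:item-42` are hypothetical, and the baseURL is the example one used further down this page.

```python
from urllib.parse import urlencode


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build a GetRecord request URL for one item in one metadata format."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Hypothetical values, for illustration only.
BASE_URL = "http://oai.some.edu/path/one/oai.asp"
ITEM_ID = "oai:oai.some.edu:item-42"

# The same OAI identifier is requested twice; only metadataPrefix differs.
for prefix in ("oai_dc", "marc21"):
    print(get_record_url(BASE_URL, ITEM_ID, prefix))
```

Both requests carry the identical `identifier` argument, which is what ties the two records back to the same item.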

OAI identifiers must conform to the generic syntax for Uniform Resource Identifiers (URIs). The Open Archives Initiative maintains a "Specification and XML Schema for the OAI Identifier Format", which is described in the Implementation Guidelines for use by Data Providers.

Note that an OAI identifier does NOT refer to the identifier of a resource (for example, what might be contained within the <dc:identifier> element in a metadata record).

See also the OAI documentation for a definition of OAI identifiers.

Best Practices for OAI Identifiers

The <oai-identifier> for a specific repository item should not change over time for the same object. If the <oai-identifier> for a specific item must be changed, in no case should it be re-used for a different item. If an item is deleted, its OAI identifier should also be marked deleted. See the DeletedRecords page for a full discussion of deleted records.

The <oai-identifier> for any item should not exceed 128 characters, so that it can be handled efficiently by all kinds of databases and file systems. Although the protocol does not specify a maximum length, the length of the <oai-identifier> might affect processing of the record by the service provider.

Unless already using an established URI scheme, OAI data repositories should conform to the Specification and XML Schema for the OAI Identifier Format. Use of this specification will mean that the repository identifiers are globally unique within the oai namespace. The advantage for repositories of adopting this naming convention is that record identifiers can be resolved by future resolution services (or by current services such as OCLC's Extensible Repository Resource Locators (ERRoLs) for OAI Identifiers). See parts 3 and 4 in particular.
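A repository could check its identifiers against these recommendations before exposing them. The sketch below is one possible approach, not part of the protocol: the function name `check_oai_identifier` is hypothetical, and the regular expression is modeled on the pattern published in the oai-identifier XML Schema, so it should be compared with the schema itself before being relied on. The 128-character limit is the best-practice figure given above, not a protocol rule.

```python
import re

# Assumed shape: oai:<repositoryIdentifier>:<localIdentifier>.
# Modeled on the oai-identifier schema pattern; verify against the schema.
OAI_ID_PATTERN = re.compile(
    r"oai:"
    r"(?P<repo>[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+)"
    r":"
    r"(?P<local>[a-zA-Z0-9\-_.!~*'();/?:@&=+$,%]+)"
)

MAX_LENGTH = 128  # best-practice limit from this page


def check_oai_identifier(identifier: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if OAI_ID_PATTERN.fullmatch(identifier) is None:
        problems.append(
            "does not match oai:<repositoryIdentifier>:<localIdentifier>"
        )
    if len(identifier) > MAX_LENGTH:
        problems.append(
            f"longer than {MAX_LENGTH} characters ({len(identifier)})"
        )
    return problems


# Hypothetical identifiers, for illustration only.
print(check_oai_identifier("oai:oai.some.edu:item-42"))
print(check_oai_identifier("item-42"))
```

Returning a list of problems rather than a boolean lets a repository log every issue with an identifier in one pass.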

If OAI data repositories implement the OAI Identifier Format discussed above, they should expose their compliance with the <oai-identifier> format by including a <description> container in their Identify response. An example of this container is:

    <oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier" 

See Identify Response Example 1 for the full Identify response.
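For repositories that generate their Identify response programmatically, the <oai-identifier> element could be assembled as in the following sketch. The element names `scheme`, `repositoryIdentifier`, `delimiter` and `sampleIdentifier` follow the published OAI Identifier Format schema; the function name and the sample values are hypothetical, and the result still has to be wrapped in the <description> container of the Identify response by the surrounding code.

```python
import xml.etree.ElementTree as ET

OAI_ID_NS = "http://www.openarchives.org/OAI/2.0/oai-identifier"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Serialize the oai-identifier namespace as the default namespace.
# Note that register_namespace changes module-wide state.
ET.register_namespace("", OAI_ID_NS)
ET.register_namespace("xsi", XSI_NS)


def build_oai_identifier(repository_identifier: str,
                         sample_identifier: str) -> ET.Element:
    """Build the <oai-identifier> element that goes inside <description>."""
    root = ET.Element(f"{{{OAI_ID_NS}}}oai-identifier")
    root.set(
        f"{{{XSI_NS}}}schemaLocation",
        f"{OAI_ID_NS} {OAI_ID_NS}.xsd",
    )
    fields = (
        ("scheme", "oai"),
        ("repositoryIdentifier", repository_identifier),
        ("delimiter", ":"),
        ("sampleIdentifier", sample_identifier),
    )
    for name, value in fields:
        child = ET.SubElement(root, f"{{{OAI_ID_NS}}}{name}")
        child.text = value
    return root


# Hypothetical values; a real repository should use one of its own identifiers.
element = build_oai_identifier("oai.some.edu", "oai:oai.some.edu:item-42")
print(ET.tostring(element, encoding="unicode"))
```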

Regarding the <oai-identifier> sub-elements:

The <repositoryIdentifier> must be an internet domain name (not a literal numeric IP address) that is registered to the organization that controls the OAI repository. Best practice is to use the domain name where the OAI service itself resides. For example, if the baseURL of the OAI data repository is http://oai.some.edu/path/one/oai.asp, then the repository identifier should be "oai.some.edu". If multiple OAI providers are served from the same domain, it is acceptable to create new domain names specifically for use as identifiers. For example, the baseURLs for two providers might be http://oai.some.edu/path/one/oai.asp and http://oai.some.edu/path/two/oai.asp, so their repository identifiers could be "one.oai.some.edu" and "two.oai.some.edu".

It is possible for the baseURL of a repository to change over time. However, the repository identifier should never change once it is established, so later in the life of an OAI repository, the domain of the baseURL may differ from the <repositoryIdentifier>.

OAI identifiers, and thus repository identifiers, are case-sensitive (even though internet domain names are not). Therefore, the best practice is to always use all lower-case for repository identifiers.
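When a repository is first set up, a candidate repository identifier can be derived from the baseURL along the lines of the sketch below. The function name is hypothetical, and the result is only a starting point: once an identifier has been published it should be stored and reused rather than recomputed, since the baseURL may change later.

```python
import ipaddress
from urllib.parse import urlsplit


def repository_identifier_from_base_url(base_url: str) -> str:
    """Suggest a lower-case repository identifier from a baseURL."""
    host = urlsplit(base_url).hostname
    if not host:
        raise ValueError(f"no host name found in {base_url!r}")
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP address, so treat it as a domain name.
        return host.lower()
    raise ValueError(f"{host!r} is a literal IP address, not a domain name")


print(repository_identifier_from_base_url("http://OAI.Some.EDU/path/one/oai.asp"))
```

The check only rejects literal IP addresses; whether the domain is actually registered to the organization cannot be verified this way.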

For the <sampleIdentifier>, it is best practice to use a real identifier from the repository as the value.