From oaibp

Revision as of 14:26, 28 June 2007 by Khage (Talk | contribs)


Note: Moved CDWA lite to list of formats with a current schema. Jenn Riley 8/29/06
Note: Clarification of stepped crosswalking added as per reviewer comment. Jenn Riley 10/8/05.
Note: Summary of best practices added. Jenn Riley 10/11/05
Note: Combined use of multiple metadata formats with possible metadata formats per reviewer suggestion. Sarah Shreeves 12/7/05.


Use of Multiple Metadata Formats

Summary of Best Practices

  • Use of metadata formats in addition to simple Dublin Core is both allowed and encouraged.
  • To supplement simple Dublin Core, choose metadata formats that are expressed as an XML schema and are common in the communities to which your resources are of interest.
  • Metadata formats used must be listed in response to a ListMetadataFormats request.
  • Indicate metadata formats used for records within a given set in the Set Description.

Myth: The OAI protocol only allows exposure of simple Dublin Core (DC) records.

Myth: The OAI protocol exposes a single metadata record for each item.

Reality: The OAI protocol is designed to support records in multiple metadata formats for each item in a repository. An item can be exposed as a MODS record, a MARCXML record, or a Qualified Dublin Core record, in addition to the required simple Dublin Core record.

Use of Multiple Metadata Formats

OAI-PMH is designed to support records in multiple metadata formats for each item. An item can be exposed in as many metadata formats as desired as long as those metadata formats have an XML schema available for validation.

The OAI protocol does require a simple (or unqualified) Dublin Core record, based on the Dublin Core Metadata Element Set 1.1, to be available for every item. For this purpose the Open Archives Initiative makes available an XML schema for simple Dublin Core, and has reserved the metadata prefix oai_dc for this schema. However, in addition to this required oai_dc record, records in other metadata formats can be provided for any or all of the items a repository includes.
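
From a harvester's point of view, the same item is requested in different formats by changing only the metadataPrefix argument of a GetRecord request. The following is a minimal sketch of that idea. The base URL and item identifier are hypothetical, and mods is used as a prefix only for illustration, since a repository chooses its own prefixes for formats other than oai_dc.

```python
from urllib.parse import urlencode

# Hypothetical repository base URL and item identifier (assumptions for illustration).
BASE_URL = "http://repository.example.org/oai"
ITEM_ID = "oai:repository.example.org:item-1"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL for one item in one metadata format."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# The required simple Dublin Core record, plus one additional (assumed) format.
for prefix in ("oai_dc", "mods"):
    print(get_record_url(BASE_URL, ITEM_ID, prefix))
```

Only the metadataPrefix value differs between the two requests; the item identifier stays the same.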

It is a best practice that, in addition to simple Dublin Core, repositories expose the richest possible metadata formats available for all items in the repository. Why include additional metadata formats? Simple Dublin Core cannot express some of the complexities many OAI repositories wish to communicate about their resources (and which service providers wish to know!). In addition, simple Dublin Core does not include a way to convey the controlled vocabularies and encoding schemes in use. Metadata formats that are more semantically complex can support a variety of uses, including ones not anticipated by the OAI repository. By supplying additional metadata formats which have the semantic richness to more clearly express meaning, data providers can help service providers make better use of their metadata.

The choice of additional metadata formats should be based on the robustness of description desired for the resources in question, the metadata schemas commonly used in the community in which the resources will primarily be used, and, if applicable, the needs of a service provider by which a repository specifically wishes to be harvested. Any number of additional metadata schemas may be used in order to reach desired audiences. However, metadata formats used with OAI must have an XML schema available for validation. See below for a selected list of possible metadata formats.

All metadata formats available for harvest must be included in the response to a ListMetadataFormats request.
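
As a rough illustration of what a harvester sees, the sketch below parses an abbreviated ListMetadataFormats response and lists each metadataPrefix with its schema location. The sample response is hand-written for this example (the responseDate and request elements are omitted), and the schema location shown for mods is a placeholder, not the real location of the MODS schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response elements.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Abbreviated, hand-written sample response (an assumption for illustration).
# A real one is returned by <base URL>?verb=ListMetadataFormats.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>mods</metadataPrefix>
      <schema>http://repository.example.org/schemas/mods-placeholder.xsd</schema>
      <metadataNamespace>http://www.loc.gov/mods/v3</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def list_metadata_formats(response_xml):
    """Return {metadataPrefix: schema URL} for every format in the response."""
    root = ET.fromstring(response_xml)
    formats = {}
    for fmt in root.iterfind(".//oai:metadataFormat", OAI_NS):
        prefix = fmt.findtext("oai:metadataPrefix", namespaces=OAI_NS)
        schema = fmt.findtext("oai:schema", namespaces=OAI_NS)
        formats[prefix] = schema
    return formats


for prefix, schema in list_metadata_formats(SAMPLE_RESPONSE).items():
    print(prefix, schema)
```

A format that a repository can disseminate but that is missing from this response is effectively invisible to harvesters.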

If multiple metadata formats are available and sets are implemented, it is best practice to include in the set description(s) the metadata formats that are available for the items in a particular set. This is because the protocol does not require that all items be available in all metadata formats (besides simple Dublin Core), and the protocol offers no way to request the metadataPrefixes in use for a specific set. For example, one set may include items available in both oai_dc and mods, while a second set may include items available only in oai_dc. See Best Practices for OAI Sets for further information.

If some, but not all, items in a set are available in one or more additional metadata formats, it is recommended that additional sets, corresponding to the additional format(s), be built.
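
The two recommendations about sets can be sketched as a small inventory check: for each set, find the formats shared by all of its items (suitable for the set description) and the formats held by only some items (candidates for an additional, format-specific set). The item inventory, set names, description wording, and the colon-separated naming of the suggested sets are all assumptions made for this example.

```python
# Hypothetical inventory: each item records the set it belongs to and the
# metadataPrefixes in which it is available (oai_dc is always required).
ITEMS = {
    "item-1": {"set": "photographs", "formats": {"oai_dc", "mods"}},
    "item-2": {"set": "photographs", "formats": {"oai_dc", "mods"}},
    "item-3": {"set": "maps", "formats": {"oai_dc", "mods"}},
    "item-4": {"set": "maps", "formats": {"oai_dc"}},
}


def formats_by_set(items):
    """Return {setSpec: (formats shared by all items, formats held by only some)}."""
    grouped = {}
    for record in items.values():
        grouped.setdefault(record["set"], []).append(record["formats"])
    summary = {}
    for set_spec, format_sets in grouped.items():
        in_all = set.intersection(*format_sets)
        in_any = set.union(*format_sets)
        summary[set_spec] = (in_all, in_any - in_all)
    return summary


def describe_set(set_spec, in_all):
    """Draft a sentence for the set description naming the available formats."""
    return f"Records in set '{set_spec}' are available in: {', '.join(sorted(in_all))}"


def suggested_format_sets(summary):
    """Propose one additional set per format that covers only part of a set."""
    return sorted(
        f"{set_spec}:{prefix}"
        for set_spec, (_, partial) in summary.items()
        for prefix in partial
    )


summary = formats_by_set(ITEMS)
for set_spec, (in_all, _) in summary.items():
    print(describe_set(set_spec, in_all))
print(suggested_format_sets(summary))
```

With this inventory, every item in "photographs" has both formats, while "maps" has mods for only one item, so the sketch proposes a separate "maps:mods" set.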

As discussed in the Crosswalking Logic section, repositories with metadata formats other than simple Dublin Core may benefit from first crosswalking their native metadata to Qualified Dublin Core, and then "dumbing down" the result to simple Dublin Core. This strategy allows the repository to make a series of small changes rather than one significant change, and may ensure the most compliant and lossless result.
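
The second step of this stepped crosswalk, dumbing down Qualified Dublin Core to simple Dublin Core, can be sketched as follows. This is an illustration under stated assumptions, not a complete crosswalk: the record is represented as a hypothetical list of (term, value, encoding scheme) tuples, the sample values are invented, and only a handful of element refinements are mapped.

```python
# The fifteen elements of simple Dublin Core.
SIMPLE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# A partial map from Qualified DC refinements to the element each one refines.
REFINES = {
    "alternative": "title",
    "abstract": "description",
    "tableOfContents": "description",
    "created": "date",
    "issued": "date",
    "modified": "date",
    "extent": "format",
    "medium": "format",
    "spatial": "coverage",
    "temporal": "coverage",
    "isPartOf": "relation",
}


def dumb_down(qualified_record):
    """Reduce (term, value, scheme) tuples to simple DC (element, value) pairs.

    Refinements collapse to the element they refine, encoding schemes are
    discarded, and terms with no simple DC equivalent are dropped.
    """
    simple = []
    for term, value, _scheme in qualified_record:
        element = REFINES.get(term, term)
        if element in SIMPLE_ELEMENTS:
            simple.append((element, value))
    return simple


# Hypothetical Qualified DC record, invented for this example.
record = [
    ("title", "Map of the city", None),
    ("alternative", "City street map", None),
    ("created", "1905", "W3CDTF"),
    ("spatial", "Example City", None),
    ("audience", "Researchers", None),
]
print(dumb_down(record))
```

The output keeps the values but loses the distinction between the title and its alternative, the encoding scheme of the date, and the audience statement entirely, which is why the richer format is worth exposing alongside oai_dc.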

See also XML Namespaces and Schemas for discussion of some technical issues.

Potential Metadata Formats for Use with OAI

The use of multiple metadata formats (at least one in addition to the required simple Dublin Core) is strongly encouraged. The list below is necessarily incomplete. To see the range of metadata formats in use, see the Distinct Metadata Schemas report from the OAI Registry at the University of Illinois at Urbana-Champaign.

MODS: Metadata Object Description Schema

MODS may be a good option for an additional metadata schema to expose via OAI for data providers who:

  1. locally engage in descriptive practices heavily influenced by resource description standards in libraries, and,
  2. have as a primary audience for resources described via OAI records a community well-versed in library descriptive practices, yet also want robust records in a format accessible to service providers outside the core library community.

The MODS v.3.0 XML Schema is available at

Qualified Dublin Core

Qualified Dublin Core may be a good option for an additional metadata schema to expose via OAI for data providers who:

  1. have a need for more granularity of description than is available in simple Dublin Core but not a fundamentally different approach to resource description, and,
  2. use controlled vocabularies that they wish to specify within their metadata records, and,
  3. have resources of interest to many different knowledge communities with disparate descriptive metadata practices.

There is no single canonical Qualified Dublin Core XML schema. However, an XML schema for Qualified DC can be created by importing the necessary namespaces and schemas.


MARCXML

MARCXML may be a good option for an additional metadata schema to expose via OAI for data providers who:

  1. locally describe resources in MARC according to AACR2r, and,
  2. have as a primary audience for resources described via OAI records the core library community.

The MARCXML XML Schema is available at


CDWA Lite

CDWA Lite may be a good option for an additional metadata schema to expose via OAI for data providers who:

  1. wish to describe works of art and material culture, and,
  2. intend their metadata to be used by specialist audiences in the art domain.

The CDWA Lite XML Schema version 1.1 is available at

ETD-MS: Electronic Theses and Dissertations Metadata Standard

ETD-MS may be a good option for an additional metadata schema to expose via OAI for data providers who:

  1. are primarily exposing metadata about electronic theses and dissertations, and,
  2. wish to contribute to aggregations of electronic theses and dissertations such as the Networked Digital Library of Theses and Dissertations.

The ETD-MS XML Schema is available at

Metadata Formats with XML Schemas in Development

VRA Core

Visual Resources Association (VRA) Core is a "single element set that can be applied as many times as necessary to create records to describe works of visual culture as well as the images that document them." The current version, VRA Core 3, is available as an XML DTD. The Visual Resources Association Data Standards Committee is currently developing a new version, VRA Core 4, which will be available as an XML Schema. The tentative release date for VRA Core 4 is early 2006.

See for more information about VRA Core, version 3.

Encoded Archival Description (EAD)

EAD is an option for providing expressive metadata for linking parts of archival collections to a collection-level description. The benefit in using EAD is that all parts or items in a collection are provided en masse, making sure that individually digitized items do not lose their context in regard to their relationship to the collection as a whole.

There is no official EAD XML Schema, although leaders within the EAD community are discussing its creation. Several EAD implementers have converted instances of the EAD 1.0 and EAD 2002 DTDs into XML Schema, however. See for an example from Princeton University.

See for more information about the EAD standard.
