Best Practices for OAI Data Provider Implementations and Shareable Metadata

A joint initiative of the Digital Library Federation and the National Science Digital Library.

Introduction

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) has been widely adopted since its inception in 2001; as of July 2005 there are over 700 active data providers from a wide variety of domains and institution types (see the University of Illinois data provider registry for a current count). The protocol has demonstrated its usefulness as a tool to move and aggregate metadata from diverse institutions. The National Science Digital Library (NSDL), OAIster, American, and the IMLS Digital Content Gateway are examples of metadata aggregations harvested via the OAI protocol. Building on this work, the first phase of the Digital Library Federation's Aquifer Initiative will include an OAI-based repository for metadata and collection-level descriptions harvested from DLF members. However, as the protocol has become more widely adopted, several broad areas of concern have surfaced, mainly through the documentation of service providers, that would benefit from the establishment of best practices.

Metadata harvesting involves two parties: the data provider and the service provider. The data provider is any institution, organization, or individual who exposes metadata (usually describing resources of some kind) via the OAI protocol. The service provider uses OAI-PMH to harvest the data provider's metadata. The service provider generally aggregates metadata from many different data providers and creates a larger database of like resources. The intention in creating a larger database is to give users a way to search one database, rather than many, to discover resources distributed across the Internet. In addition, service providers may build services beyond standard search and retrieval, e.g. aggregations designed to support curriculum development or the building of personal virtual collections.
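
As a rough illustration of what a harvest looks like on the wire, the sketch below builds OAI-PMH request URLs in Python. The base URL `https://example.org/oai`, the token value `abc123`, and the helper name `build_request` are hypothetical; the six verb names, the `metadataPrefix` and `from` arguments, and the rule that `resumptionToken` is an exclusive argument follow the OAI-PMH 2.0 specification. It only constructs URLs and does not fetch anything.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = frozenset({
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
})


def build_request(base_url: str, verb: str,
                  args: Optional[Mapping[str, str]] = None) -> str:
    """Build an OAI-PMH HTTP GET URL for `verb` with optional arguments.

    resumptionToken is an exclusive argument in OAI-PMH 2.0: a request that
    continues an incomplete list carries only the verb and the token.
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    args = dict(args or {})
    if "resumptionToken" in args and len(args) > 1:
        raise ValueError("resumptionToken must be the only argument besides verb")
    # 'verb' comes first, then the caller's arguments in insertion order.
    query = urlencode({"verb": verb, **args})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    base = "https://example.org/oai"  # hypothetical data provider base URL
    # First request of a selective harvest: Dublin Core records changed since a date.
    print(build_request(base, "ListRecords",
                        {"metadataPrefix": "oai_dc", "from": "2005-07-01"}))
    # -> https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2005-07-01
    # Follow-up request after the provider returned a resumptionToken.
    print(build_request(base, "ListRecords", {"resumptionToken": "abc123"}))
    # -> https://example.org/oai?verb=ListRecords&resumptionToken=abc123
```

Keeping URL construction separate from the HTTP fetch makes the continuation rule easy to test; a full harvester would keep issuing follow-up requests until the provider stops returning a non-empty resumptionToken.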

The OAI protocol is quite flexible in that there are relatively few required pieces for implementation: valid responses to the OAI verbs, support for oai_dc (unqualified Dublin Core), a unique and persistent OAI identifier for each item, and a datestamp. The OAI Guidelines for Implementation have a limited technical scope, are intended for a general audience of implementers, and do not describe the consequences of not implementing some of the optional features of the protocol. As a result, many optional OAI features that are quite helpful to service providers, such as sets and the use of descriptive containers, have been underutilized. In addition, the need for best practices for the metadata provided through OAI is most evident in the work service providers have had to do to normalize and manipulate their aggregations so that they reach a basic threshold of usefulness for end users. High-quality 'shareable' metadata will be crucial to the next step in useful metadata aggregations.
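
To make the required pieces concrete, here is a minimal sketch, assuming records were requested with `metadataPrefix=oai_dc`, that checks one harvested `<record>` for an OAI identifier, a datestamp in one of the two OAI-PMH granularities, and oai_dc metadata. The sample record, the `check_record` helper, and the identifier `oai:example.org:item-42` are hypothetical; the namespace URIs are those published for OAI-PMH 2.0 and Dublin Core. It cannot judge whether an identifier is unique or persistent, which only shows across a whole repository over time.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URIs published for OAI-PMH 2.0 and its oai_dc container.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
}

# OAI-PMH datestamps are UTC, at day (YYYY-MM-DD) or seconds
# (YYYY-MM-DDThh:mm:ssZ) granularity.
DATESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}Z)?")


def check_record(record_xml: str) -> list[str]:
    """List problems with the required parts of one harvested <record>.

    Assumes the record was requested with metadataPrefix=oai_dc.
    """
    record = ET.fromstring(record_xml)
    header = record.find("oai:header", NS)
    if header is None:
        return ["missing <header>"]
    problems = []
    identifier = header.findtext("oai:identifier", default="", namespaces=NS).strip()
    if not identifier:
        problems.append("missing OAI identifier")
    datestamp = header.findtext("oai:datestamp", default="", namespaces=NS).strip()
    if not DATESTAMP.fullmatch(datestamp):
        problems.append(f"bad datestamp: {datestamp!r}")
    # A record marked deleted carries only a header; any other record
    # should contain an oai_dc:dc element inside <metadata>.
    if header.get("status") != "deleted":
        if record.find("oai:metadata/oai_dc:dc", NS) is None:
            problems.append("no oai_dc metadata")
    return problems


# Hypothetical record, shaped like one entry in a ListRecords response.
SAMPLE = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:item-42</identifier>
    <datestamp>2005-07-01</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Sample title</dc:title>
    </oai_dc:dc>
  </metadata>
</record>"""

if __name__ == "__main__":
    print(check_record(SAMPLE))  # -> []
    print(check_record(SAMPLE.replace("2005-07-01", "July 1, 2005")))
    # -> ["bad datestamp: 'July 1, 2005'"]
```

A check like this only covers the protocol's minimum; whether the oai_dc content itself is useful once aggregated is the subject of the shareable metadata practices.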

In the summer of 2004, a group of DLF- and NSDL-affiliated Open Archives Initiative (OAI) data and service providers, along with other interested individuals, gathered to discuss OAI data and service provider implementations and concerns stemming from the harvesting of metadata from diverse collections. This document arose out of that discussion and the group's ongoing work, and it proposes best practices for OAI data provider implementations and for the creation of 'shareable' metadata.

These best practices are designed to give data and service providers the information they need to create resources that are consistent across repositories, thereby helping users find the rich variety of resources available, and to aid in the implementation of OAI data provider services.

Scope

The best practices in this document are meant for both OAI data and service providers and cover topics ranging from practices to be encouraged among data provider groups to the optimization of content description.

Fundamental to this effort is a strategy of sharing (whether of metadata, standards, tools, or workflows), which will bring interoperability within reach.

The following sections have been included in this document.

  • General Needs

These are general guidelines, practices, or knowledge areas that are necessary before the OAI protocol can be implemented and used successfully. Both data and service providers should be conversant with the issues presented here. In many ways they represent the minimum proficiency necessary to be a 'good' OAI data or service provider.

  • Best Practices for OAI Data Providers

This section deals with best practices for OAI data providers in terms of how the OAI protocol is implemented. Also included in this section are guidelines for some of the optional pieces of the OAI protocol including sets, branding, rights, and use of the about containers. Please note that metadata guidelines (both technical and content based) are included in their own section below.

  • Best Practices for Shareable Metadata

This section presents best practices for shareable metadata, both in terms of technical issues (such as XML encoding) and metadata format, semantics, and content.

Target Audience and Developing Principles

The target audience for this document is data and service providers who seek to make providing and harvesting data an efficient and streamlined process. We will endeavor, to the best of our abilities, to keep the document succinct and free of jargon, so that both those new to providing data and content and those who seek to aggregate metadata from distributed online collections will find it clear and usable.

The following principles guide the development of this document:

  • Do not duplicate what has already been developed and is openly available. For this reason, many of these best practices and guidelines link to information outside of this document. We have added additional information and context for these as needed.
  • Give context for best practices. Let the reader understand what the ramifications are for following (or not) the best practice.
  • Standards, standards, standards.
  • Give examples - both 'best' and 'worst' - whenever possible.
  • Use clear and concise language. Avoid jargon.
  • Be prescriptive whenever possible.
  • Link within the document as much as possible.

Future Plans

The group responsible for this document is aware that there are other areas in which best practices and other community-building tools could be developed. These include best practices for service provider implementations, communication between data and service providers, and a place to share tools and strategies for extending the OAI protocol. We hope that these pieces will be developed in the future.

Acknowledgements

This document is the result of work by a large group of individuals with a range of experience with the OAI protocol, including service providers, data providers, and metadata librarians. We would like to thank and acknowledge all of those who have given their time and expertise to help develop these best practices.

The individuals who have contributed to the writing of this document are:

  • Caroline Arms (Library of Congress)
  • Tim Cole (UIUC)
  • Naomi Dushay (NSDL / Cornell)
  • Muriel Foulonneau (UIUC)
  • Tom Habing (UIUC)
  • Kat Hagedorn (Univ. of Michigan) - Co-editor of Best Practices
  • Arwen Hutt (UC San Diego)
  • Diane Hillmann (NSDL / Cornell)
  • Ann Lally (Univ. of Washington)
  • Bill Landis (CDL)
  • Clay Redding (Princeton)
  • Jenn Riley (Indiana) - Editor of Shareable Metadata
  • Sarah Shreeves (UIUC) - Co-editor of Best Practices
  • Jewel Ward (USC)
  • Simeon Warner (Cornell)
  • Jeff Young (OCLC)

The original planning group for this effort was:

  • Naomi Dushay (NSDL / Cornell)
  • Kat Hagedorn (Univ. of Michigan)
  • Martin Halbert (Emory University)
  • Diane Hillmann (NSDL / Cornell)
  • David Seaman (DLF)
  • Sarah Shreeves (UIUC)
  • Roy Tennant (CDL)

In July 2004, a meeting of interested parties was hosted at the California Digital Library. Present at this meeting were:

  • Caroline Arms (Library of Congress)
  • Naomi Dushay (NSDL / Cornell)
  • Muriel Foulonneau (UIUC)
  • Kat Hagedorn (Univ. of Michigan)
  • Martin Halbert (Emory University)
  • Ann Lally (Univ. of Washington)
  • Bill Moen (Univ. of North Texas)
  • Clay Redding (Princeton Univ.)
  • Jenn Riley (Indiana Univ.)
  • Sarah Shreeves (UIUC)
  • Robert Tansley (HP - DSpace)
  • Roy Tennant (CDL)
  • Simeon Warner (Cornell)
  • Jeff Young (OCLC)