DocumentingSource

From oaibp

Revision as of 14:28, 28 June 2007 by Khage

Providing Supplemental Documentation to OAI Service Providers

Note to Reviewers: This page was updated on Sept 7 2005.
Note: Summary of best practices added. Jenn Riley 10/11/05

Summary of Best Practices

  • Provide documentation on choices made when providing metadata for exposure via OAI.

Why documentation?

There are two parties to an OAI-mediated transaction: the data provider and the service provider. They exchange metadata through a protocol that also lets a fair amount of information about the metadata itself be exchanged. However, the OAI world is becoming increasingly diverse, and to make the best use of harvested metadata, a service provider may need to know more than the protocol requires or encourages the data provider to expose. The data provider should therefore:

  • make careful use of the opportunities within the OAI protocol (such as <about> containers) for documenting practices
  • make additional information available for any service providers who wish to know more about the metadata and its origins, changes, and capabilities

Regardless of the metadata format(s) a data provider exposes via OAI, it is always a good idea to document the decisions and standards behind the exposed metadata. Such documentation helps service providers aggregate your metadata with metadata from other sources, because they will better understand how to interpret and normalize it. It is especially important when a data provider exposes only a low-granularity format such as oai_dc, since it helps OAI service providers make sense of the choices the data provider has made.

In their chapter "The Continuum of Metadata Quality: Defining, Expressing, Exploiting," published in Metadata in Practice [1], Thomas Bruce and Diane Hillmann assert that optimal metadata provision should include:

  • An expression of metadata intentions based on an explicit, documented application profile, endorsed by a specialized community, and registered in conformance to a general metadata standard (they assert that an XML schema does not express intention).
  • A source of trusted data with a known history of regularly updating metadata, including controlled vocabularies. This includes explicit conformance with current standards and schemas.
  • Full provenance information, including nested information, as original metadata is harvested, augmented, and re-exposed. This may not record changes at the element level, but should reference practice documentation that describes augmentation and upgrade routines of particular aggregators.

Note that this last point is particularly relevant to aggregators who are re-exposing metadata harvested from elsewhere.
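
For aggregators, the OAI-PMH <about> container can carry this provenance in machine-readable form, using the provenance schema from the OAI-PMH implementation guidelines, in which each harvest hop is an <originDescription> nested inside the one for the later hop. The Python sketch below builds such a container with the standard library's xml.etree.ElementTree. The Origin class, the helper names, the "prov" prefix, and all URLs, identifiers and dates are hypothetical, and real output should be validated against the provenance schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Namespaces: the OAI-PMH provenance container and the oai_dc format.
PROV_NS = "http://www.openarchives.org/OAI/2.0/provenance"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("prov", PROV_NS)  # prefix choice is arbitrary


def q(tag: str) -> str:
    """Qualify a tag name with the provenance namespace."""
    return f"{{{PROV_NS}}}{tag}"


@dataclass
class Origin:
    """One harvest hop (all values used below are hypothetical)."""
    harvest_date: str        # when the record was harvested from this source (UTC)
    altered: bool            # was the metadata changed after that harvest?
    base_url: str            # baseURL of the repository harvested from
    identifier: str          # the record's OAI identifier in that repository
    datestamp: str           # the record's datestamp in that repository
    metadata_namespace: str  # namespace of the metadata format harvested
    earlier: Optional["Origin"] = None  # where that repository got it, if it is an aggregator


def origin_element(origin: Origin) -> ET.Element:
    """Build <originDescription>, nesting earlier hops inside later ones."""
    el = ET.Element(q("originDescription"), {
        "harvestDate": origin.harvest_date,
        "altered": "true" if origin.altered else "false",
    })
    for tag, value in [("baseURL", origin.base_url),
                       ("identifier", origin.identifier),
                       ("datestamp", origin.datestamp),
                       ("metadataNamespace", origin.metadata_namespace)]:
        ET.SubElement(el, q(tag)).text = value
    if origin.earlier is not None:
        el.append(origin_element(origin.earlier))
    return el


def provenance(origin: Origin) -> ET.Element:
    """Wrap the chain in <provenance>, ready to place inside a record's <about>."""
    root = ET.Element(q("provenance"))
    root.append(origin_element(origin))
    return root


# Repository A exposed the record; aggregator B harvested and enhanced it
# (altered); a third service harvested B unchanged and re-exposes it.
hop_a = Origin("2007-05-01T10:00:00Z", True, "http://repo-a.example.org/oai",
               "oai:repo-a.example.org:123", "2007-04-30", OAI_DC_NS)
hop_b = Origin("2007-06-01T12:00:00Z", False, "http://aggregator-b.example.org/oai",
               "oai:aggregator-b.example.org:a-123", "2007-05-02", OAI_DC_NS,
               earlier=hop_a)
print(ET.tostring(provenance(hop_b), encoding="unicode"))
```

Because each hop records whether the metadata was altered after harvest, a service provider reading the nested chain can tell which aggregator's practice documentation to consult for a given change.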

What to document?

  • Data source and creation decisions and history
    • Is the data crosswalked from another data source? Is that data source available in its native form? If so, where?
      • Example: A table that illustrates the mapping of the native metadata's elements to the metadata format that is exposed for harvest.
    • Is the data created by humans or machines? Describe the creation methodology or refer to a description.
  • Use of controlled vocabularies and content standards:
    • What vocabularies are used, for what element, and under what circumstances (especially important if only oai_dc is used)?
      • For example, oai_dc, unlike MODS, MARC, or Qualified Dublin Core, has no mechanism for recording which authority or controlled vocabulary a value comes from. When exposing oai_dc, document that your subjects are assigned from LCSH and your genre and form terms from AAT (if indeed these are the thesauri you use).

Example in simple Dublin Core, where information about the thesauri used is unavailable to OAI service providers unless supplied in supplemental documentation:

    <dc:title>13th National Army Cantonment, Camp Dodge, Iowa</dc:title>
    <dc:subject>Railroads.</dc:subject>
    <dc:type>still image</dc:type>
    <dc:type>Panoramic photographs.</dc:type>

See DC LOC Photo 1 for the complete record from which this example was taken.

Example in MODS, where information about the thesauri you use travels with each metadata element, making it unnecessary to supply it in supplemental documentation:

    <mods:titleInfo>
      <mods:title>13th National Army Cantonment, Camp Dodge, Iowa</mods:title>
    </mods:titleInfo>
    <mods:typeOfResource>still image</mods:typeOfResource>
    <mods:genre authority="gmgpc">Panoramic photographs.</mods:genre>
    <mods:subject authority="lctgm">
      <mods:topic>Railroads</mods:topic>
    </mods:subject>

See MODS LOC Photo 1 for the complete record from which this example was taken.
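
The difference matters on the harvesting side. In the Python sketch below, a service provider encodes what a data provider's supplemental documentation might say and uses it to recover, for the simple Dublin Core record, the authority information that the MODS record carries inline. The documented rules (dc:subject values come from LCTGM; dc:type values other than a few MODS-style resource types are TGM II genre terms, coded gmgpc) and the function names are assumptions for illustration, not part of any standard.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

DC_NS = "http://purl.org/dc/elements/1.1/"

# The simple Dublin Core fragment shown above, wrapped in an arbitrary
# element that declares the dc namespace so it parses on its own.
RECORD = f"""<record xmlns:dc="{DC_NS}">
  <dc:title>13th National Army Cantonment, Camp Dodge, Iowa</dc:title>
  <dc:subject>Railroads.</dc:subject>
  <dc:type>still image</dc:type>
  <dc:type>Panoramic photographs.</dc:type>
</record>"""

# Assumed content of the provider's documentation: these dc:type values are
# broad resource types (a few MODS typeOfResource values), not genre terms.
RESOURCE_TYPES = {"text", "cartographic", "still image", "moving image"}


def documented_authority(element: str, value: str) -> Optional[str]:
    """Return the vocabulary code the documentation assigns, or None."""
    if element == "subject":
        return "lctgm"   # documented: subjects come from LCTGM
    if element == "type" and value not in RESOURCE_TYPES:
        return "gmgpc"   # documented: other types are TGM II genre terms
    return None


def annotate(record_xml: str) -> List[Tuple[str, str, Optional[str]]]:
    """Pair each Dublin Core value with its documented authority."""
    result = []
    for el in ET.fromstring(record_xml):
        element = el.tag.split("}", 1)[1]   # drop the "{namespace}" part
        value = (el.text or "").strip()
        result.append((element, value, documented_authority(element, value)))
    return result


for row in annotate(RECORD):
    print(row)
```

Without the documentation, none of the third column could be filled in from the oai_dc record alone.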

    • Is the whole vocabulary available to be used with this data or is only a subset approved for use?
    • Are some terms drawn from local lists rather than from a published vocabulary (and possibly not formally documented)? Is any documentation available for local vocabularies, and if so, where?
    • Which descriptive content standards are used locally (such as Anglo-American Cataloguing Rules (AACR2), Describing Archives: A Content Standard (DACS), or Cataloging Cultural Objects (CCO))? Knowing this helps OAI service providers understand the context of your metadata, and may help them normalize and transform it so that it works well for end users of their web-based portals or services.
  • Names practice:
    • Order of names (direct order or surname, forename)
    • Fullness of names (are initials used routinely instead of full names?)
    • Any additions to names (courtesy or academic titles, affiliations)
    • Authority source for names, if any (are name variants available?)
  • Dates
    • Date practices (again, particularly important for oai_dc)
    • If dates are not encoded in a standard format, what rules can be used to parse them?
  • Identifiers
    • For unencoded or local identifiers, describe the source of the identifier and any parsing or validation rules
  • Quality control measures
    • Are controlled vocabulary values updated when the source vocabulary changes?
    • Are validation routines run regularly on encoded data?
  • Updating practices and schedules
    • How often are new or changed records added to the database?
    • How many new or updated records are added per week or month?
  • Set decisions and specifications (see Sets documentation)
    • Are sets added/removed on a regular basis? Are all records included in sets? Is there overlap?
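
The crosswalk table mentioned under data source decisions can be kept as data rather than only as prose, so that one table drives both the transformation and the published documentation and the two cannot drift apart. The Python sketch below assumes a hypothetical local photograph database; its field names and notes are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical crosswalk for a local photograph database. Each row is
# (local field, simple Dublin Core element, note for service providers).
CROSSWALK: List[Tuple[str, str, str]] = [
    ("caption", "title", "Transcribed from the item"),
    ("photographer", "creator", "Inverted order: surname, forename"),
    ("subject_terms", "subject", "LCTGM terms"),
    ("genre_terms", "type", "TGM II genre terms"),
    ("date_taken", "date", "YYYY or YYYY-MM-DD"),
    ("negative_no", "identifier", "Local negative number"),
]


def render_table(rows: List[Tuple[str, str, str]]) -> str:
    """Format the crosswalk as a fixed-width text table for a documentation page."""
    header = ("Local field", "Dublin Core", "Notes")
    all_rows = [header, *rows]
    widths = [max(len(row[i]) for row in all_rows) for i in range(3)]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in all_rows
    )


def apply_crosswalk(local_record: Dict[str, str]) -> List[Tuple[str, str]]:
    """Map a local record to (Dublin Core element, value) pairs.

    Fields with no row in the crosswalk are dropped, which is itself a
    decision worth stating in the documentation.
    """
    mapping = {local: dc for local, dc, _ in CROSSWALK}
    return [(mapping[field], value)
            for field, value in local_record.items() if field in mapping]


print(render_table(CROSSWALK))
```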

Supplemental documentation is also a good way to pass special information to OAI service providers harvesting your metadata and to spare them guesswork. For example, many resources whose metadata records are exposed may have no meaningful formal title, such as a set of satellite images or a set of images of fossils. In cases like these, end users may find a subject, geographic location, or resource type more meaningful than a title supplied by a cataloger or by the service provider in the absence of a formal title. Nonetheless, many OAI service providers need a title-equivalent field for automated citation generation or for search results lists. OAI data providers, who are often much closer to the content experts for the objects being exposed, can help by documenting a preferred title-equivalent metadata field for sets whose records lack titles (that is, records with no <title> element in oai_dc or its equivalent in another metadata format).
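
On the service provider side, a documented title-equivalent can be applied with a simple fallback rule. The Python sketch below assumes records have already been parsed into dictionaries mapping Dublin Core element names to lists of values; the set specs and their preferred fields are hypothetical stand-ins for what a data provider's documentation would state.

```python
from typing import Dict, List, Optional

Record = Dict[str, List[str]]  # Dublin Core element name -> values

# Hypothetical documentation: for each set whose records lack titles, the
# elements the data provider recommends as title-equivalents, in order.
TITLE_EQUIVALENTS: Dict[str, List[str]] = {
    "satellite_images": ["coverage", "subject"],
    "fossil_images": ["subject", "type"],
}


def display_title(set_spec: str, record: Record) -> Optional[str]:
    """Return the formal title if present, else the documented title-equivalent."""
    if record.get("title"):
        return record["title"][0]
    for element in TITLE_EQUIVALENTS.get(set_spec, []):
        if record.get(element):
            return record[element][0]
    return None  # undocumented set and no title: the harvester must decide


print(display_title("satellite_images",
                    {"coverage": ["Lake Michigan"], "subject": ["Ice"]}))  # prints Lake Michigan
```

A formal title always wins; the documented fields are consulted only when it is missing.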

How and where to document

The best place to make supplemental documentation about the sources used to create and map your OAI-harvestable metadata available is the set description, as described in Best Practices for Sets. It is useful to link from there to a publicly available web page containing the documentation. Such a page is often useful to the data provider as well!
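
As one possible arrangement, a set entry in a ListSets response could carry a short summary and the documentation link inside its <setDescription>, using an oai_dc container. The Python sketch below builds such an entry with xml.etree.ElementTree. The set spec, name, summary text, URL and namespace prefixes are hypothetical, placing the link in a second dc:description is an assumption rather than a protocol requirement, and a production response would also declare schema locations.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
for prefix, uri in [("oai", OAI_NS), ("oai_dc", OAI_DC_NS), ("dc", DC_NS)]:
    ET.register_namespace(prefix, uri)  # prefixes are arbitrary; the URIs matter


def set_entry(spec: str, name: str, summary: str, doc_url: str) -> ET.Element:
    """Build a ListSets <set> whose <setDescription> points to documentation."""
    entry = ET.Element(f"{{{OAI_NS}}}set")
    ET.SubElement(entry, f"{{{OAI_NS}}}setSpec").text = spec
    ET.SubElement(entry, f"{{{OAI_NS}}}setName").text = name
    description = ET.SubElement(entry, f"{{{OAI_NS}}}setDescription")
    dc = ET.SubElement(description, f"{{{OAI_DC_NS}}}dc")
    ET.SubElement(dc, f"{{{DC_NS}}}description").text = summary
    # Assumed convention: a second description carrying the documentation URL.
    ET.SubElement(dc, f"{{{DC_NS}}}description").text = (
        f"Metadata creation and mapping practices are documented at {doc_url}")
    return entry


entry = set_entry("photos", "Photographs (hypothetical set)",
                  "Hypothetical summary of the set's scope.",
                  "http://www.example.org/oai/photos-metadata.html")
print(ET.tostring(entry, encoding="unicode"))
```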

References:

[1] Thomas R. Bruce (Legal Information Institute, Cornell Law School) and Diane I. Hillmann (National Science Digital Library), "The Continuum of Metadata Quality: Defining, Expressing, Exploiting," in Metadata in Practice, ALA Editions, 2004.
