DigitalTactileResource



Note: Summary of best practices added. Jenn Riley 10/11/05

Describing Versions and Reproductions

Summary of Best Practices

  • Adhere to the one-to-one principle when practical.
  • When it is necessary to provide access to multiple versions of a resource, carefully select a strategy from options used by other data providers in the OAI community.

Many resources described by metadata records shared via OAI exist in multiple versions. An image may exist as a film negative, a digitized TIFF master image, and three sizes of JPEG derivative images. A text may exist as a TEI file, an unedited ASCII text file, and a PDF document of digitized page images. Metadata is also used (particularly by museums) to describe physical objects, as well as reproductions of those objects in various formats.

Most metadata standards emphasize the description of only one version of an object in a metadata record. The Dublin Core (DC) Usage Guide <http://dublincore.org/documents/usageguide/index.shtml> defines this concept as the 'One-to-One Principle':

"The One-to-One Principle. In general Dublin Core metadata describes one manifestation or version of a resource, rather than assuming that manifestations stand in for one another. For instance, a jpeg image of the Mona Lisa has much in common with the original painting, but it is not the same as the painting. As such the digital image should be described as itself, most likely with the creator of the digital image as Creator or Contributor, rather than the painter of the original Mona Lisa. The relationship between the metadata for the original and the reproduction is part of the metadata description, and assists the user in determining whether he or she needs to go to the Louvre for the original, or whether his/her need can be met by a reproduction."

The One-to-One Principle was designed to forestall the common problem of metadata that describes more than one version of a resource. Because DC is dependent on syntax for structure, and many (if not most) of the syntaxes used for DC have no mechanism for relating statements when more than one description is embedded in a single record, interoperability is severely compromised. This is a particular problem when metadata from many sources, with different practices regarding versions, is aggregated to support discovery over a broad range of materials (see pp. 32-33 of the book "Metadata in Practice" for a particularly chilling discussion of the possibilities). It is important to note that metadata formats other than Dublin Core have similar difficulties maintaining connections between descriptive elements within records describing multiple versions of resources.
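The loss of structure described above can be illustrated with a minimal Python sketch. It assumes a simplified record model (a dictionary mapping each element name to a list of values) and a hypothetical imaging lab as creator of the reproduction; neither comes from any particular data provider.

```python
from itertools import product

# Two separate descriptions, each following the One-to-One Principle.
# "Example Imaging Lab" is a made-up creator for the digital image.
painting = {"creator": ["Leonardo da Vinci"], "format": ["oil on panel"]}
jpeg = {"creator": ["Example Imaging Lab"], "format": ["image/jpeg"]}


def merge_flat(*records):
    """Combine descriptions into one flat record, as a syntax without
    any grouping mechanism would force a provider to do."""
    merged = {}
    for record in records:
        for element, values in record.items():
            merged.setdefault(element, []).extend(values)
    return merged


combined = merge_flat(painting, jpeg)

# The flat record holds two creators and two formats but no statement of
# which creator belongs with which format, so a consumer has to treat
# every combination as possible.
candidate_pairings = list(product(combined["creator"], combined["format"]))
```

An aggregator receiving only `combined` cannot rule out the pairing of the painter with the JPEG file, which is the kind of ambiguity the principle is meant to prevent.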

Complete adherence to the One-to-One Principle may sometimes be impractical. The difficulties of re-combining versions for user displays, and the current primitive state of linking mechanisms between related records, should be considered when planning for the creation of metadata records. Some common compromises between reality and the One-to-One Principle are described below (listed in no particular order), with some of the advantages and disadvantages of each. In all cases, the advice that one should not try to overcome software deficiencies by manipulating data should prevail.

1. The "entry page" approach

In this approach, the metadata description is relatively generalized, with little technical detail beyond, perhaps, a list of the digital formats in which the item is available, given in repeating elements. The identifier for the record leads to an HTML page with links to the available versions, along with any instructions, caveats, etc.
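As a rough sketch of what such a record might look like, the Python below builds a simple oai_dc record with repeated format elements and an identifier that points at an entry page. The title, the example.org URL, and the helper name `entry_page_record` are hypothetical; only the two namespace URIs are the standard ones for oai_dc and Dublin Core.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def entry_page_record(title, entry_url, formats):
    """Build a generalized record whose identifier is the entry page."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    ET.SubElement(root, f"{{{DC}}}title").text = title
    # The identifier leads to the HTML entry page, not to any one file.
    ET.SubElement(root, f"{{{DC}}}identifier").text = entry_url
    # One repeating format element per available version.
    for fmt in formats:
        ET.SubElement(root, f"{{{DC}}}format").text = fmt
    return root


record = entry_page_record(
    "An example article",              # hypothetical title
    "http://example.org/abs/0001",     # hypothetical entry page URL
    ["application/pdf", "application/postscript"],
)
xml_text = ET.tostring(record, encoding="unicode")
```

Because the links to the individual files live only on the entry page, nothing in the record itself has to change when a new derivative is added, apart from an optional extra format element.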

Advantages

  • Fairly clean, easily updated without necessarily needing to update metadata frequently
  • Metadata record creation can emphasize provision of information on topic, contributors, etc. without worry about reconciling versions of these records

Disadvantages

  • Aggregators cannot easily get to content through entry pages when content is used to support additional indexing
  • When format information is present only on entry pages, downstream services cannot easily support user filtering by format

Examples

  • arXiv.org

2. The "clustering" approach

In this approach, similar metadata records (varying only by digital version) are clustered together. This approach is sometimes chosen by providers who have a number of similar versions, for example image files that vary only by format or by non-significant differences in size. Different records might be provided for physical and digital versions, or for an original and its reproduction(s), but not for versions of a similar genre.
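One way a provider might decide which records belong to the same cluster is sketched below in Python. It assumes a simplified record model (a dictionary mapping element names to lists of values) and treats two records as cluster-mates when they agree on everything except identifier and format; that rule, the sample records, and the function names are all illustrative assumptions, and a real provider would choose its own criteria.

```python
from collections import defaultdict


def cluster_key(record, ignore=("identifier", "format")):
    """Everything in the record except the elements allowed to vary."""
    return tuple(
        sorted((name, tuple(values))
               for name, values in record.items()
               if name not in ignore)
    )


def cluster(records):
    """Group record identifiers whose descriptions differ only by format."""
    clusters = defaultdict(list)
    for record in records:
        clusters[cluster_key(record)].append(record["identifier"][0])
    return list(clusters.values())


# Hypothetical records: two digital derivatives and one physical original.
records = [
    {"identifier": ["img-1-small"], "title": ["Harbor view"],
     "type": ["StillImage"], "format": ["image/jpeg"]},
    {"identifier": ["img-1-large"], "title": ["Harbor view"],
     "type": ["StillImage"], "format": ["image/tiff"]},
    {"identifier": ["neg-1"], "title": ["Harbor view"],
     "type": ["PhysicalObject"], "format": ["film negative"]},
]
clusters = cluster(records)
```

Here the two digital images fall into one cluster while the physical negative stays separate, because its type differs. A partner using a different `ignore` list would draw the clusters differently.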

Advantages

  • Seems to provide a practical compromise
  • May be relatively easy to explain to users
  • May support compact user display

Disadvantages

  • Decisions made locally about what constitutes a "cluster" may not match those made by others in a partnership or domain
  • Records may "misbehave" when used in a system with different assumptions about relationships

No example available.

3. The "vocabulary" approach

This approach relies on specific vocabularies and careful practices to link metadata records together. The vocabularies can be used in a wide range of elements, from identifier to citation, with the only requirement that they not represent invalid values for that element. Best practice for this approach requires that the vocabulary be documented and available to other users.

Advantages

  • Easy to maintain
  • Allows more than one strategy for "clustering" search results for users
  • Allows simple "more like this" capabilities using standard thesaural mechanisms

Disadvantages

  • Requires close monitoring of data records to manage
  • Requires development and/or management of vocabularies in addition to metadata

Examples

The K-MODDL project

The K-MODDL project uses several vocabularies to manage relationships between a wide variety of media and versions, all relating to a collection of physical models demonstrating mechanical principles. One vocabulary uses a nineteenth-century classification of the models themselves as the basis for a vocabulary that, used as a subject, ties together all materials relating to a particular model. The vocabulary is structured so it can also be used to relate models that demonstrate the same principles. In addition to this vocabulary, a specially developed type vocabulary allows all materials of the same format to be gathered together. The project also uses the Art and Architecture Thesaurus (AAT) to further describe the physical aspects of some of the materials in the project.
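The gathering behavior described above can be sketched in a few lines of Python. The record model (a dictionary of element names to value lists), the subject terms such as "model:A01", and the identifiers are invented for illustration and are not K-MODDL's actual vocabulary or data.

```python
from collections import defaultdict


def gather(records, element):
    """Index record identifiers by each vocabulary term used in `element`."""
    index = defaultdict(list)
    for record in records:
        for term in record.get(element, []):
            index[term].append(record["identifier"][0])
    return dict(index)


# Hypothetical records for materials about two physical models.
records = [
    {"identifier": ["photo-1"], "subject": ["model:A01"], "type": ["StillImage"]},
    {"identifier": ["video-1"], "subject": ["model:A01"], "type": ["MovingImage"]},
    {"identifier": ["photo-2"], "subject": ["model:B02"], "type": ["StillImage"]},
]

by_model = gather(records, "subject")   # everything about one model
by_type = gather(records, "type")       # everything of one kind
```

The same records support two different groupings without any explicit links between them, which is why the vocabulary itself needs to be documented and applied consistently.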

4. The "linking" approach

This approach uses the Relation element or other linking strategies to connect related records for versions and reproductions. The links may be URIs or other identifying strings, or they may be citations. Linking approaches may vary; some may require reciprocal links, others may be one way only.
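Where reciprocal links are expected, keeping them in step is part of the maintenance cost. The Python sketch below shows one possible consistency check, assuming a simplified record model (a dictionary of element names to value lists) in which relation values are the identifiers of other records; the function name and the urn:example identifiers are hypothetical.

```python
def missing_reciprocal_links(records):
    """Return (record, target) pairs where `target` links to `record`
    but `record` does not link back."""
    relations = {
        record["identifier"][0]: set(record.get("relation", []))
        for record in records
    }
    missing = []
    for source, targets in relations.items():
        for target in targets:
            # Only links between records in this set can be checked.
            if target in relations and source not in relations[target]:
                missing.append((target, source))
    return missing


# Hypothetical records: a and b link to each other; c links to a only.
records = [
    {"identifier": ["urn:example:a"], "relation": ["urn:example:b"]},
    {"identifier": ["urn:example:b"], "relation": ["urn:example:a"]},
    {"identifier": ["urn:example:c"], "relation": ["urn:example:a"]},
]
gaps = missing_reciprocal_links(records)
```

In this sample, `gaps` reports that record a lacks a link back to record c. A provider that uses one-way links by design would simply not run such a check.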

Advantages

  • Unambiguous when using URIs or standard numbers
  • Generally does not require explanation to be interpreted by others

Disadvantages

  • Expensive and sometimes difficult to maintain
  • Does not scale well to complex relationships

No example available.

5. The "intellectual object" approach

This approach provides one descriptive metadata record for multiple versions of one "intellectual object". For example, an institution may decide to digitize a 4x5 photo negative that is accompanied by an 8x10 print. When the negative is digitized for online access, multiple digital versions will be created in addition to the two original physical objects, but only one metadata record will be created for the entire "intellectual object".
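A minimal Python sketch of such a record follows. It assumes each format statement carries a qualifier saying which kind of version it describes; the "physical" and "digital" labels, the title, and the helper names are illustrative assumptions and not the qualifiers used by any particular archive.

```python
# One record for the whole intellectual object. Each format statement is
# tagged with a hypothetical qualifier so versions can be told apart.
intellectual_object = {
    "title": ["Example photograph"],
    "format": [
        {"qualifier": "physical", "value": "4x5 negative"},
        {"qualifier": "physical", "value": "8x10 print"},
        {"qualifier": "digital", "value": "image/tiff"},
        {"qualifier": "digital", "value": "image/jpeg"},
    ],
}


def values_for(record, element, qualifier):
    """Values of `element` that carry the given qualifier."""
    return [s["value"] for s in record.get(element, [])
            if s["qualifier"] == qualifier]


def without_qualifiers(record, element):
    """What remains when a consumer ignores or cannot carry qualifiers."""
    return [s["value"] for s in record.get(element, [])]


physical = values_for(intellectual_object, "format", "physical")
flattened = without_qualifiers(intellectual_object, "format")
```

With the qualifiers in place the physical and digital versions can be separated; once they are dropped, `flattened` is a single undifferentiated list of four formats.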

Advantages

  • Reduces maintenance, because only one record has to be created and maintained for each intellectual object
  • Reduces dependence on applications to make sense of each set of records

Disadvantages

  • Relies heavily on qualifiers
  • The original object being catalogued is not necessarily the one all users want information about

Examples

USC Digital Archive

The University of Southern California Digital Archive creates one metadata record for an intellectual object that consists of the following items:

  • One thumbnail
  • One 256-pixel .jpeg
  • One 512-pixel .jpeg
  • One 1024-pixel .jpeg
  • One hi-res zoom in/zoom out file
  • One hi-res .tiff
  • The 8x10 photo print
  • One 4x5 negative

The files below are example mappings from the physical to the digital object.

USC DC Digital vs Physical

USC DC Format Demo
