CrosswalkingLogic

Note: Clarification of stepped crosswalking and splitting fields cases added as per reviewer comment. Jenn Riley 10/8/05.
Note: Teased apart data to include and exclude per reviewer comment. Sarah Shreeves 10/12/05.
Note: Summary of best practices added. Jenn Riley 10/12/05

Crosswalking Logic

Summary of Best Practices

  • Map metadata from more robust formats to simpler ones.
  • Plan both for mapping values between fields and for transforming the data values themselves to meet the expectations of the target metadata format.
  • Repeat elements when your target metadata schema allows it.
  • Include titles and appropriate context in your mapped metadata.
  • Exclude indications of unknown or inapplicable data, and artifacts of descriptive practices not applicable to the target metadata format.
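  • Consider stepped crosswalking through an intervening format (for example, Qualified DC) to limit loss when mapping a very rich format to a simple one.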

General Procedure

Ideally, crosswalking from one metadata schema to another should be done in a manner that limits loss of data or specificity. In some cases crosswalking will be done to convert records from a local schema to a community standard; in others, to convert from a richer to a simpler metadata format. Most metadata providers who are not using Simple Dublin Core as their primary schema will need to do some crosswalking to provide the required Simple DC (oai_dc) records for OAI-PMH. Always crosswalk from a richer metadata standard to a simpler one; mapping from a simple schema to a richer one will not usually yield any extra usable data. After determining the metadata formats you intend to expose via OAI, you will need to decide whether to use standard crosswalks to accomplish your task or to develop your own crosswalk(s).

If you choose to create your own crosswalk(s), you will need to develop the logical rules for transforming your existing metadata records into those you want to make available via OAI. This involves defining the steps, as specifically as possible, necessary to change your metadata into the version you would like to expose. These logical rules may operate at several levels, and require distinct decisions.

Data mapping:

  • Mapping the complete contents of one field to another, as is. This case occurs when a metadata element in your local implementation matches exactly the semantics of a target element, and your existing metadata in this field is formatted exactly in the same way as the target element metadata should be. The transformation rule in this case simply involves copying the value in your local field to the field in the target schema.
  • Splitting data in one field into two or more fields. Your target metadata schema may have separate fields for data that your local schema stores in a single field; e.g., publisher name and place, or first and last names. Developing mapping logic in this case would require identifying the rules for deciding what part of an existing field goes to each new field in the target schema.
  • Splitting multiple values in a single local field into multiple iterations of a single field in the target schema. This case occurs when your local implementation allows for multiple values of the same type to be placed together in a single field, and your target metadata schema recommends repeating fields rather than 'packing' multiple values of the same type into a single field. The transformation rule in this case requires specifying the characters within the field in your source format that indicate the end of one value and the beginning of the next.
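The two splitting cases above can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the semicolon delimiter and the "Place: Name" publisher layout are assumed local conventions, and real data will usually need more careful rules.

```python
def split_packed_values(value, delimiter=";"):
    """Split a packed field into separate values, dropping empty pieces.

    The delimiter is an assumption about the local data; specify whatever
    character marks the end of one value and the start of the next.
    """
    return [part.strip() for part in value.split(delimiter) if part.strip()]


def split_publisher(value):
    """Split an assumed 'Place: Name' layout into (place, name)."""
    place, sep, name = value.partition(":")
    if not sep:
        # No colon found: treat the whole value as the publisher name.
        return None, value.strip()
    return place.strip(), name.strip()


# Each resulting value would become its own repeated element in the target record.
subjects = split_packed_values("Railroads; Bridges ; Russia;")
place, name = split_publisher("Bloomington: Indiana University Press")
```

Returning a list from the first function makes it straightforward to emit one target element per value, which is the repeating-field practice the target schema recommends.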

Data value transformation:

  • Translating anomalous local practices into a more generally useful value. For example, many sites using CONTENTdm software store date ranges as a comma-delimited set of individual years; e.g., the date range 1890-1895 would be expressed as 1890, 1891, 1892, 1893, 1894, 1895. Such software-specific work-arounds should be translated back to the original date range value for OAI records.
  • Transforming data values. The syntax of data in your local fields may not match the expected syntax of a given metadata schema or the descriptive practices of the community you are trying to reach with your OAI records; e.g., your metadata uses direct order for names while inverted names are expected in your OAI records, or you use a date format with written-out month names and numeric dates are expected in your OAI records. In this case, the fields affected must be identified and information provided about the format of the source data so that it may be interpreted correctly when transformed into another format.
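The value transformations described above might look like the following Python sketch. The function names are hypothetical. The sketch assumes that the last word of a direct-order name is the family name, that source dates look like "October 8, 2005" with English month names, and that the numeric target is YYYY-MM-DD. None of these assumptions comes from a particular schema, so adjust them to your own data.

```python
import re
from datetime import datetime


def collapse_year_list(value):
    """Turn '1890, 1891, ..., 1895' back into '1890-1895'.

    Values that are not a list of consecutive four-digit years are
    returned unchanged rather than guessed at.
    """
    parts = [p.strip() for p in value.split(",")]
    if len(parts) < 2 or not all(re.fullmatch(r"\d{4}", p) for p in parts):
        return value
    years = [int(p) for p in parts]
    if years != list(range(years[0], years[0] + len(years))):
        return value
    return f"{years[0]}-{years[-1]}"


def invert_name(value):
    """'Jane Q. Doe' -> 'Doe, Jane Q.' (assumes the last word is the family name)."""
    if "," in value:
        return value  # already inverted
    given, _, family = value.strip().rpartition(" ")
    return f"{family}, {given}" if given else family


def to_numeric_date(value, source_format="%B %d, %Y"):
    """Convert a written-out date to YYYY-MM-DD; source_format describes the local syntax."""
    return datetime.strptime(value, source_format).strftime("%Y-%m-%d")
```

Passing the source format as a parameter reflects the point made above: the format of the source data has to be stated explicitly before it can be interpreted correctly.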

Data to Include

  • Titles. While Dublin Core does not require any element, OAI Service Providers commonly use the DC title element as the core of a brief results display. Think carefully before exposing DC records without titles. If you determine title is not an appropriate element for the items being described, you should communicate the reasons for this in a Set Description, along with information about the fields in your records most useful for a brief results display. See Providing Supplemental Information to Service Providers and Titles in OAI Records for further discussion.
  • Repeating elements. Whenever using a metadata schema that allows for repeating elements, always repeat the element for each value rather than "packing" multiple values into a single element. It is easier for a service provider to merge repeated elements for display than to split a packed element in order to process its individual values.
  • Preserving context. Data providers should ensure that each record, when standing alone, makes sense outside its local context. For example, in a collection of Russian images, each record should contain a reference to Russia. Any context useful for information discovery or for information display should be automatically added to each individual record during mapping.
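Adding collection-level context during mapping could be sketched as below. The representation of a record as a dictionary mapping element names to lists of values, the function name, and the choice of elements to carry the context are all assumptions made for illustration.

```python
def add_collection_context(record, context):
    """Return a copy of record with collection-level values added.

    Both arguments map an element name to a list of values, so repeated
    elements are kept as separate values rather than packed together.
    """
    enriched = {element: list(values) for element, values in record.items()}
    for element, values in context.items():
        existing = enriched.setdefault(element, [])
        for value in values:
            if value not in existing:  # avoid duplicating a value already present
                existing.append(value)
    return enriched


# Sample data: a record from a collection of Russian images.
record = {"title": ["Market square"], "subject": ["Markets"]}
context = {"coverage": ["Russia"], "subject": ["Russia"]}
enriched = add_collection_context(record, context)
```

The original record is left untouched, so the local metadata keeps its local form while the exposed record carries the added context.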

Data to Exclude

  • Values indicating that information is unknown or the element is not applicable. A record may have elements whose values are essentially empty, or that indicate that the relevant information is unknown or not applicable; for example, <dc:date>--</dc:date>, <dc:subject>XXX</dc:subject>, <dc:date>ND</dc:date>, <dc:date>undated</dc:date>, or <dc:creator>unknown</dc:creator>. In some cases the value, or lack thereof, might itself be considered information (the fact that the object has no date), but generally this is of little interest for the purpose of information retrieval. While this information may be useful in a local environment, it creates problems in a shared environment. OAI records should not include these values. Similarly, if there is no data for an element in the schema being exposed, the OAI record should omit that element rather than present it as empty; for example, <dc:date></dc:date> or <dc:date />.
  • Junk values. When mapping values from one metadata format to another, do not include values representing artifacts of description from the original records, such as <dc:creator>et al.</dc:creator> or <dc:creator>and</dc:creator>.
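A filter for the exclusions above might be sketched as follows. The value lists contain only the examples given in this section and are illustrative rather than complete. The record layout (element name mapped to a list of values) and the function name are assumptions.

```python
# Illustrative lists built from the examples in this section; extend them
# to match the placeholders actually used in your local records.
PLACEHOLDER_VALUES = {"", "--", "xxx", "nd", "undated", "unknown"}
JUNK_VALUES = {"et al.", "and"}


def clean_record(record):
    """Drop placeholder and junk values, and omit elements left with no values."""
    excluded = PLACEHOLDER_VALUES | JUNK_VALUES
    cleaned = {}
    for element, values in record.items():
        kept = [v.strip() for v in values if v.strip().lower() not in excluded]
        if kept:  # an element with nothing left is omitted, not emitted empty
            cleaned[element] = kept
    return cleaned


record = {
    "title": ["Bridge"],
    "date": ["ND"],
    "creator": ["Smith, John", "et al."],
    "subject": [""],
}
cleaned = clean_record(record)
```

Matching whole values, case-insensitively, keeps the filter from removing legitimate values that merely contain one of these strings.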

Stepped Crosswalking

In some cases, a direct crosswalk from a very rich format to a simple one loses a great deal of information. One way to prevent unnecessary loss and improve the accuracy of mapping is to crosswalk the richer format into an intervening format as part of the process. For example, Qualified DC can serve as an intervening format between richer metadata and Simple DC (oai_dc). This stepped approach can be used simply as a conceptual tool for improving the quality of mapping, or as a stage in actual data transformation. By making transformations in a series of small steps rather than one large step, repositories may find it easier to match the semantics of elements in the source format to those of Dublin Core.

Performing data transformation in stages has the added benefit of creating records in the intervening metadata format (in our example, Qualified DC) that can then be exposed via OAI, in addition to the required Simple DC and any other metadata formats. As mentioned in the Multiple Metadata Formats section, data providers are encouraged to provide metadata in multiple formats.
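A stepped crosswalk used as an actual transformation stage could be sketched as two small mapping tables applied in sequence. The local field names and sample values are hypothetical. The second table uses the Qualified DC refinements alternative, created, and spatial, which refine title, date, and coverage respectively. The helper function is an illustration, not a standard tool.

```python
# Step 1: hypothetical local field names mapped to Qualified DC terms.
LOCAL_TO_QDC = {
    "main_title": "title",
    "other_title": "alternative",
    "date_made": "created",
    "place_depicted": "spatial",
}

# Step 2: Qualified DC refinements mapped to the Simple DC elements they refine.
QDC_TO_SIMPLE = {
    "alternative": "title",
    "created": "date",
    "spatial": "coverage",
}


def map_fields(record, mapping, keep_unmapped=False):
    """Rename fields per mapping, merging values that land on the same target.

    Fields absent from the mapping are dropped unless keep_unmapped is True.
    """
    result = {}
    for field, values in record.items():
        target = mapping.get(field, field if keep_unmapped else None)
        if target is not None:
            result.setdefault(target, []).extend(values)
    return result


local = {
    "main_title": ["Market square"],
    "other_title": ["Rynok"],
    "date_made": ["1905"],
    "place_depicted": ["Russia"],
    "internal_id": ["x42"],  # local-only field, dropped in step 1
}
qdc = map_fields(local, LOCAL_TO_QDC)
simple = map_fields(qdc, QDC_TO_SIMPLE, keep_unmapped=True)
```

The intermediate qdc record can be exposed as its own metadata format, while simple provides the required oai_dc record.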
