Working with Map Files

From DLXS Documentation

Revision as of 16:20, 14 August 2007 by Cboulay

Overview

This document describes what we call maps or map files. A map file contains mappings, each of which relates one term or name for an item to another term or name for the same item. For example, a term used by an HTML form to refer to a searchable region (e.g., "entire text"; see LABEL below) can be mapped to an XPAT searchable region (e.g., TEXT; see NATIVEREGIONNAME below).

Currently, map files are SGML, and each collection map file conforms to a simple DTD. (Other ways of mapping terms are possible and have been considered for implementation, such as a database in which one column's data is mapped to another's.) The middleware reads the map into a TerminologyMapper object at run time, after which the CGI program can ask the object for the mappings for terms at any time. Each mapped item and its various terms are contained within a <MAPPING> element.
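
The read-then-look-up pattern described above can be sketched in Python. This is only an illustration: the MapLookup class, its translate method, and the regular-expression parsing are hypothetical and are not the TerminologyMapper API, which this document does not show. The sample mappings are the ones used as examples elsewhere on this page.

```python
import re

# Hypothetical stand-in for a map reader; not the actual TerminologyMapper.
MAPPING_RE = re.compile(r"<mapping>(.*?)</mapping>", re.IGNORECASE | re.DOTALL)
TERM_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.IGNORECASE | re.DOTALL)


class MapLookup:
    def __init__(self, map_text):
        # One dict per <mapping> element, keyed by lowercased term name.
        self.mappings = [
            {name.lower(): value.strip() for name, value in TERM_RE.findall(body)}
            for body in MAPPING_RE.findall(map_text)
        ]

    def translate(self, from_term, value, to_term):
        """Return the to_term value of the first mapping whose from_term equals value."""
        for mapping in self.mappings:
            if mapping.get(from_term) == value:
                return mapping.get(to_term)
        return None


SAMPLE = """
<mapping>
<label>and</label>
<synthetic>AND</synthetic>
<native>^</native>
</mapping>
<mapping>
<label>full text</label>
<synthetic>MAIN_SEARCHABLE</synthetic>
<native>region TEXT</native>
<nativeregionname>TEXT</nativeregionname>
</mapping>
"""

mapper = MapLookup(SAMPLE)
print(mapper.translate("label", "full text", "nativeregionname"))
print(mapper.translate("synthetic", "AND", "native"))
```

The lookup compares values exactly, including case, so a request for "Full Text" would return nothing here.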

Semantic Contexts

There are two semantic contexts for MAPPINGs currently implemented.

  1. Mapping a set of terms to one another
  2. Mapping and ordering the terms for an HTML form selection's option elements

Mapping a set of terms to one another

Collection map files exist to identify the regions and operators used by the middleware and XPAT in four ways, each way represented by one of four terms:

  1. LABEL: by the term that is used in the collection database and interface
  2. SYNTHETIC: by the variable name that is used in the CGI program
  3. NATIVE: by the language that is used by the search engine
  4. NATIVEREGIONNAME: by the element name that is indexed

Mapping terms for XPAT operators

The first part of the map (by convention rather than by DTD enforcement) contains the mappings for the boolean and proximity operators. In versions of DLXS prior to Release 10, mappings for operators tended to appear twice, with labels in all lower case and with mixed case, to cover likely interface option scenarios. Only one mapping per operator is now permitted; older map files must be updated to eliminate unused "duplicate" operator mappings. Here is an example of an operator mapping:

  <mapping>
  <label>and</label>
  <synthetic>AND</synthetic>
  <native>^</native>
  </mapping>

(^ is the symbol used in the XPAT query language to indicate an intersection.)
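
When updating an older map file, it can help to list the mappings that share an operator. The following Python sketch assumes that duplicates can be recognized by a shared SYNTHETIC value, which this document does not state; the function name and the regular-expression parsing are hypothetical, and the sample imitates an older map with both a lower-case and a mixed-case label.

```python
import re
from collections import defaultdict

MAPPING_RE = re.compile(r"<mapping>(.*?)</mapping>", re.IGNORECASE | re.DOTALL)
TERM_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.IGNORECASE | re.DOTALL)


def find_duplicate_synthetics(map_text):
    """Group LABELs by SYNTHETIC and return only groups with more than one label."""
    labels_by_synthetic = defaultdict(list)
    for body in MAPPING_RE.findall(map_text):
        terms = {name.lower(): value.strip() for name, value in TERM_RE.findall(body)}
        if "synthetic" in terms and "label" in terms:
            labels_by_synthetic[terms["synthetic"]].append(terms["label"])
    return {syn: labels for syn, labels in labels_by_synthetic.items() if len(labels) > 1}


# Illustrative older-style map: the "and" operator appears twice.
OLD_STYLE = """
<mapping><label>and</label><synthetic>AND</synthetic><native>^</native></mapping>
<mapping><label>And</label><synthetic>AND</synthetic><native>^</native></mapping>
<mapping><label>full text</label><synthetic>MAIN_SEARCHABLE</synthetic><native>region TEXT</native></mapping>
"""

print(find_duplicate_synthetics(OLD_STYLE))
```

Each reported group shows the labels competing for one operator; which label to keep depends on what the collection's interface actually sends.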

Mapping terms for regions

The second part of the map file contains region mappings, which identify the SGML elements, encoded or fabricated, that are used by the middleware and in the HTML, either as labels in pulldown menus or as rgn variables in links to text from results lists. These are the labels stored in the collection manager (collmgr) fields termsearch, regionsearch, and bibsearch. The mapping labels and the collmgr entries must match exactly in spelling, number, and case; if they do not, the middleware will fail. For any collection, there will be at a minimum entries with SYNTHETIC mappings for MAIN_SEARCHABLE, IDNO, BIBL, and NODE (used by the CGI program); with LABEL mappings for full text, works, and citation (used as labels in the HTML search pages); and with NATIVEREGIONNAME mappings for DIV1 (used to build a link to divisions from results lists). There should also be mappings for all the divisions in a given collection. Here is an example of a region mapping:

  <mapping>
  <label>full text</label>
  <synthetic>MAIN_SEARCHABLE</synthetic>
  <native>region TEXT</native>
  <nativeregionname>TEXT</nativeregionname>
  </mapping>

Note: In BibClass, SYNTHETIC and NATIVEREGIONNAME are not used, but SUMMARYLABEL is. See Mounting a Bib Class Collection.
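
Because region labels in the map must match the collmgr entries exactly, a quick comparison can catch mismatches before the middleware does. The Python sketch below is hypothetical: it assumes the labels from the termsearch, regionsearch, and bibsearch fields and the LABEL values from the map have already been collected into lists, since how those fields are stored is not covered here.

```python
def missing_labels(collmgr_labels, map_labels):
    """Return collmgr labels with no exact, case-sensitive match among the map's LABELs."""
    known = set(map_labels)
    return [label for label in collmgr_labels if label not in known]


# Made-up values for illustration; "Works" differs from "works" only in case.
collmgr_labels = ["full text", "Works", "citation"]
map_labels = ["full text", "works", "citation"]

print(missing_labels(collmgr_labels, map_labels))
```

The comparison is deliberately strict: no trimming or case folding is applied, so any label it reports needs to be corrected in the map or in collmgr.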

Mapping and ordering the terms for an HTML form selection's option elements

This section of the map file is not needed in all collections, but may be needed for a specific collection if its markup supports specialized restrictions such as date of publication, genre, period, or gender. In general, the maps support label values, native values, and the order in which the restrictions should be presented in pulldown menus. The existence of these maps is indicated in the metadata database. Here are the genre mappings for the Chadwyck-Healey Yeats collection, which divides works into four categories:

  <mapping>
  <genrelabel>Prose Fiction</genrelabel>
  <genreorder>1</genreorder>
  <genrenative>FICT</genrenative>
  </mapping>
  <mapping>
  <genrelabel>Prose Non-fiction</genrelabel>
  <genreorder>2</genreorder>
  <genrenative>NONFICT</genrenative>
  </mapping>
  <mapping>
  <genrelabel>Drama</genrelabel>
  <genreorder>3</genreorder>
  <genrenative>PLAY</genrenative>
  </mapping>
  <mapping>
  <genrelabel>Poetry</genrelabel>
  <genreorder>4</genreorder>
  <genrenative>POEM</genrenative>
  </mapping>
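
To show how the order values could drive a pulldown menu, here is a small Python sketch that sorts the genre mappings above by genreorder and emits one option element per label. The function name and the exact HTML are assumptions for illustration, not what the DLXS middleware generates. The options carry the label as their text, on the assumption that the form submits labels, which are then translated to native values through the map.

```python
from html import escape

# The four genre mappings from the example, deliberately listed out of order.
genre_mappings = [
    {"genrelabel": "Poetry", "genreorder": "4", "genrenative": "POEM"},
    {"genrelabel": "Prose Fiction", "genreorder": "1", "genrenative": "FICT"},
    {"genrelabel": "Drama", "genreorder": "3", "genrenative": "PLAY"},
    {"genrelabel": "Prose Non-fiction", "genreorder": "2", "genrenative": "NONFICT"},
]


def build_options(mappings):
    """Sort mappings numerically by genreorder and render each label as an option."""
    ordered = sorted(mappings, key=lambda m: int(m["genreorder"]))
    return ["<option>{}</option>".format(escape(m["genrelabel"])) for m in ordered]


for line in build_options(genre_mappings):
    print(line)
```

The order values are converted to integers before sorting so that a menu with ten or more entries would not sort "10" ahead of "2".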

Under the basic middleware architecture, collection maps are stored in $DLXSROOT/misc/c/class/maps/ and are named collid.map (for example, moa.map or ampo20.map for the Making of America and 20th Century American Poetry collections, respectively).
