Mounting a Finding Aids Collection

From DLXS Documentation


Revision as of 10:34, 14 August 2007



This topic describes how to mount a Findaid Class collection.

Workshop materials are located here.

Contents

Overview

Examples

Overview of Data Preparation and Indexing Steps

Data Preparation

  1. Validate the files individually against the EAD 2002 DTD:
    make validateeach
  2. Concatenate the files into one larger XML file:
    make prepdocs
  3. Validate the concatenated file against the dlxsead2002 DTD:
    make validate
  4. "Normalize" the concatenated file:
    make norm
  5. Validate the normalized concatenated file against the dlxsead2002 DTD:
    make validate

The end result of these steps is a file containing the concatenated EADs wrapped in a <COLL> element. This file validates against the dlxsead2002 DTD and is ready for indexing:

<COLL>
<ead><eadheader><eadid>1</eadid>...</eadheader>... content</ead>
<ead><eadheader><eadid>2</eadid>...</eadheader>... content</ead>
<ead><eadheader><eadid>3</eadid>...</eadheader>... content</ead>
</COLL>
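
As a rough illustration of what the concatenation step produces, the Python sketch below strips the byte order mark, XML declaration, and DOCTYPE declaration from each EAD and wraps the results in <COLL>. It is not the logic of preparedocs.pl; the function names and regular expressions are assumptions made for illustration only.

```python
import re

# Matches an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
XML_DECL = re.compile(r"<\?xml\b.*?\?>", re.DOTALL)
# Matches a DOCTYPE declaration, with or without an internal subset in [...],
# including one that spans several lines.
DOCTYPE = re.compile(r"<!DOCTYPE\b[^\[>]*(\[.*?\])?\s*>", re.DOTALL)


def strip_prolog(text):
    """Return one EAD with its BOM, XML declaration, and DOCTYPE removed."""
    text = text.lstrip("\ufeff")
    text = XML_DECL.sub("", text, count=1)
    text = DOCTYPE.sub("", text, count=1)
    return text.strip()


def concatenate(eads):
    """Wrap the stripped EADs (an iterable of strings) in a <COLL> element."""
    body = "\n".join(strip_prolog(ead) for ead in eads)
    return f"<COLL>\n{body}\n</COLL>\n"
```

Anything left in front of an <ead> start tag after this kind of stripping ends up as character data directly inside <COLL>.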

WARNING! If there are extra characters in the files, or some other problem in the part of the program that strips out the XML declaration and the DOCTYPE declaration, the file will end up looking like this:


<COLL>
baddata<ead><eadheader><eadid>1</eadid>...</eadheader>... content</ead>
baddata<ead><eadheader><eadid>2</eadid>...</eadheader>... content</ead>
baddata<ead><eadheader><eadid>3</eadid>...</eadheader>... content</ead>
</COLL>

In this case you will get "character data not allowed" or similar errors during the make validate step. You can troubleshoot by looking at the concatenated file and/or checking your original EADs.
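
One way to locate such stray data is to look for non-whitespace text sitting directly inside <COLL>. The following Python sketch does this with the standard library's ElementTree parser. The function name is hypothetical, and the sketch assumes the concatenated file is well-formed and contains no undeclared entities, which would make the parser fail before it reached the check.

```python
import xml.etree.ElementTree as ET


def find_stray_text(concatenated_xml):
    """Return (position, text) pairs for character data directly inside <COLL>.

    Position 0 means before the first <ead>; position n means after the
    n-th </ead>.
    """
    root = ET.fromstring(concatenated_xml)
    problems = []
    # Text between <COLL> and the first <ead>
    if root.text and root.text.strip():
        problems.append((0, root.text.strip()))
    # Text following each </ead> (the "tail" in ElementTree terms)
    for index, child in enumerate(root, start=1):
        if child.tail and child.tail.strip():
            problems.append((index, child.tail.strip()))
    return problems
```

An empty list means no character data was found between the <ead> elements.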

Indexing

  1. make singledd indexes words for texts that have been concatenated into one large file for a collection. This is the recommended process.
  2. make xml indexes the XML structure by reading the DTD. It validates as it indexes.
  3. make post builds and indexes fabricated regions based on the XPAT queries stored in the workshopfa.extra.srch file.
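
The data preparation and indexing targets are meant to be run in the order listed. A minimal Python sketch of a wrapper that runs them in sequence and stops at the first failure might look like the following. The wrapper is not part of DLXS: the function name and the cwd parameter (the directory holding the collection's Makefile) are assumptions, and only the make target names come from the steps above.

```python
import subprocess

# Targets in the order given above; "validate" runs before and after "norm".
DATA_PREP_TARGETS = ["validateeach", "prepdocs", "validate", "norm", "validate"]
INDEXING_TARGETS = ["singledd", "xml", "post"]


def run_targets(targets, runner=subprocess.run, cwd=None):
    """Run each make target in order and stop at the first failure.

    Returns the list of targets that completed successfully. The runner
    argument exists so the function can be exercised without calling make.
    """
    completed = []
    for target in targets:
        result = runner(["make", target], cwd=cwd)
        if result.returncode != 0:
            break
        completed.append(target)
    return completed
```

Stopping at the first failing target keeps a validation error from being buried under the output of later steps.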

Working with the EAD

EAD 2002 DTD Overview

These instructions assume that you have already encoded your finding aids in the XML-based EAD 2002 DTD. If your finding aids are encoded in the older EAD 1.0 standard or in the SGML version of EAD 2002, you will need to convert them to the XML version of EAD 2002. Converting from SGML to XML can raise a number of character set issues. These are largely the same issues described for Text Class; see Data Conversion: Unicode, XML, and Normalization (../conversion/index.html).

Resources for converting from EAD 1.0 to EAD2002 and/or from SGML EAD to XML EAD are available from:

Other good sources of information about EAD encoding practices and practical issues involved with EADs are:

The EAD standard was designed as a "loose" standard in order to accommodate the large variety in local practices for paper finding aids and make it easy for archives to convert from paper to electronic form. As a result, conformance with the EAD standard still allows a great deal of variety in encoding practices.

The DLXS software is primarily designed as a system for mounting University of Michigan collections. In the case of finding aids, the software has been designed to accommodate the encoding practices of the Bentley Historical Library. The more similar your data and setup are to the Bentley's, the easier it will be to integrate your finding aids collection with DLXS. If your practices differ significantly from the Bentley's, you will probably need to preprocess your files and/or modify various files in DLXS. We have found that most of the issues member institutions face when implementing Findaid Class come from differences in encoding practices. We will cover various issues that commonly arise.

More information on the Bentley's encoding practices and workflow:


Types of changes to accommodate differing encoding practices and/or interface changes

  • Custom preprocessing
  • Add dummy EAD to data
  • Modify prep scripts (Makefile, preparedocs.pl, validateeach.csh)
  • Modify *inp files (DOCTYPE declarations and entities)
  • Modify fabricated regions (*.extra.srch)
  • Modify CollMgr entries
  • Modify findaidclass.cfg (change table of contents sections)
  • Subclass FindaidClass.pm
  • Modify XSL
  • Modify XML templates
  • Modify CSS

Practical EAD Encoding Issues

There are a number of encoding issues that may affect the data preparation, indexing, searching, and rendering of your finding aids. Some of them are:

  • Character encoding issues (fc_char.html)
  • Attribute ids must be unique within the entire collection (fc_ids)
  • If you use attribute ids and corresponding targets within your EADs, preparedocs.pl may need to be modified.
  • <eadid> should be less than about 20 characters in length
  • UTF-8 Byte Order Marks (BOM) should be removed from EADs prior to concatenation
  • XML processing instructions should be removed from EADs prior to concatenation
  • Multiline DOCTYPE declarations are currently not properly handled by the data prep scripts
  • If your DOCTYPE declaration contains entities, you need to modify the appropriate *inp files accordingly
  • Out-of-the-box <dao> handling may need to be modified for your needs
  • If your <unittitle> element precedes your <origination> element in the top level <did>, you will have to modify the maintitle fabricated region query in xxx.extra.srch
  • If you have encoded <unitdate>s as siblings of <unittitle>s, you may have to modify the appropriate XSL templates
  • If you do not use a <frontmatter> element, you will have to make modifications to various files to provide an appropriate "Title Page" region based on the <eadheader>
  • If your encoding practices for <bioghist> differ from the Bentley's, you may need to make changes in findaidclass.cfg or create a subclass of FindaidClass and override FindaidClass::GetBioghistTocHead, and/or change the appropriate XSL files.
  • If you want <relatedmaterial> and <separatedmaterial> to show up in the table of contents (TOC) on the left-hand side of the Finding Aids, you may have to modify findaidclass.cfg and make other modifications to the code. This also applies if there are other sections of the finding aid not listed in the out-of-the-box findaidclass.cfg %gSectHeadsHash.
  • If you want the middleware to use the <head> element for labeling sections instead of the default hard-coded values in findaidclass.cfg, you may need to make changes to the XSL and possibly modify other files.
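
Several of the issues above can be spotted before concatenation with a simple pre-flight check. The Python sketch below is one possible approach, not a DLXS tool. The function name, the regular expressions, and the cutoff of 20 characters (the text says only "about 20") are assumptions. It expects each file's text to have been decoded as UTF-8 without BOM stripping, and it does not replace validation against the DTD.

```python
import re

BOM = "\ufeff"
MAX_EADID_LENGTH = 20  # assumed cutoff for "about 20 characters"
EADID = re.compile(r"<eadid\b[^>]*>(.*?)</eadid>", re.DOTALL)
ID_ATTR = re.compile(r"""\sid\s*=\s*["']([^"']+)["']""")
# Any processing instruction other than the XML declaration itself
PROC_INSTR = re.compile(r"<\?(?!xml[\s?]).*?\?>", re.DOTALL)
DOCTYPE = re.compile(r"<!DOCTYPE\b[^\[>]*(\[.*?\])?\s*>", re.DOTALL)


def check_collection(eads):
    """eads maps a file name to its text; returns a list of warning strings."""
    warnings = []
    seen_ids = {}  # id value -> file name where it was first seen
    for name, text in eads.items():
        if text.startswith(BOM):
            warnings.append(f"{name}: starts with a UTF-8 byte order mark")
        if PROC_INSTR.search(text):
            warnings.append(f"{name}: contains a processing instruction")
        doctype = DOCTYPE.search(text)
        if doctype and "\n" in doctype.group(0):
            warnings.append(f"{name}: DOCTYPE declaration spans several lines")
        if doctype and doctype.group(1) and "<!ENTITY" in doctype.group(1):
            warnings.append(f"{name}: DOCTYPE declares entities (check *inp files)")
        eadid = EADID.search(text)
        if eadid and len(eadid.group(1).strip()) >= MAX_EADID_LENGTH:
            length = len(eadid.group(1).strip())
            warnings.append(f"{name}: <eadid> is {length} characters long")
        for value in ID_ATTR.findall(text):
            if value in seen_ids:
                warnings.append(f"{name}: id '{value}' already used in {seen_ids[value]}")
            else:
                seen_ids[value] = name
    return warnings
```

The id check is collection-wide on purpose: two files that are each valid on their own can still collide once they are concatenated into a single document.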

Findaid Class Behaviors Overview

Preparing Data and Directories

Character Issues

Encoding Issues

Validating and Normalizing Your Data

Building the Index

Working with Fabricated Regions

Modifying Findaid Class Files

Mounting the Collection Online

Troubleshooting

Linking from Finding Aids Using ID Resolver

Workshop materials

Working with the User Interface

Findaid Class Graphics Files

Findaid Class Processing Instructions
