Building the Index

From DLXS Documentation


Revision as of 13:08, 14 September 2007

After you have followed all the steps to set up your directories and prepare your files, as described in Validating and Normalizing Your Data, indexing the collection is fairly straightforward. To create an index for use with the Findaid Class interface, you will need to index the words in the collection, then index the XML (the structural metadata, if you will), and then finally "fabricate" regions based on a combination of elements (for example, defining what the "main entry" is, without adding a <MAINENTRY> tag around the appropriate <AUTHOR> or <TITLE> element).

The main work in the indexing step is making sure that the fabricated regions in the workshopfa.extra.srch file match the characteristics of your collection.

Note: If the final "make validate" step in Data Preparation Step 5: Validating the normalized file against the dlxsead2002 DTD produced errors, you will need to fix them before running the indexing steps. Attempting to index an invalid document will lead to indexing problems and/or corrupt indexes.

The Makefile in the $DLXSROOT/bin/c/collection directory contains the commands necessary to build the index, and can be executed easily.

A fabricated region is a structure defined from a combination of existing elements. For example, it can define who the "main author" of a finding aid is, without adding a <mainauthor> tag around the appropriate <author> in the eadheader element.

Change to that directory before running any of the make commands:

cd $DLXSROOT/bin/c/collection
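
The c/collection part of this path appears to be a placeholder: the workshop examples later on this page use $DLXSROOT/bin/w/workshopfa, that is, the first letter of the collection ID followed by the ID itself. A minimal sketch of deriving the directory from a collection ID, assuming that convention holds for your installation (the function name coll_bin_dir is made up for illustration and is not part of DLXS):

```shell
# Hypothetical helper, not part of DLXS: print the bin directory for a
# collection, assuming the layout $DLXSROOT/bin/<first letter>/<collection ID>.
coll_bin_dir() {
    collid=$1
    first=$(printf '%s' "$collid" | cut -c1)   # first character of the ID
    printf '%s/bin/%s/%s\n' "$DLXSROOT" "$first" "$collid"
}
```

With DLXSROOT set, cd "$(coll_bin_dir workshopfa)" would then take you to the workshop collection's directory.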

The following commands can be used to make the index:


make singledd indexes words for texts that have been concatenated into one large file for a collection.

make xml indexes the XML structure by reading the DTD. It validates as it indexes.

make post builds and indexes fabricated regions based on the XPAT queries stored in the workshopfa.extra.srch file. Because every collection is different, the *extra.srch file will probably need to be adapted for your collection. If it tries to build fabricated regions from elements that do not occur in your finding aids, make post will report errors like:

Error found: <Error>syntax error before: ")</Error>
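
Since each target builds on the output of the one before it, the three commands can be wrapped so that a failure stops the sequence. The following is only a sketch: the function name run_index_steps and the MAKE_CMD override are illustrative and not part of DLXS, and the target order simply follows the description above.

```shell
# Run the three indexing targets in order and stop at the first failure.
# MAKE_CMD is an illustrative override (e.g. MAKE_CMD=echo for a dry run);
# it defaults to make.
run_index_steps() {
    dir=$1
    for target in singledd xml post; do
        echo "== make $target =="
        if ! ( cd "$dir" && "${MAKE_CMD:-make}" "$target" ); then
            echo "make $target failed; fix it before running later steps" >&2
            return 1
        fi
    done
}

# Example call for a real installation (left commented out here):
# run_index_steps "$DLXSROOT/bin/w/workshopfa"
```

Stopping early matters most for make post, whose errors usually point back to the *extra.srch file and not to the earlier steps.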

Step by Step Instructions for Indexing

Step 1: Indexing the text

 cd $DLXSROOT/bin/w/workshopfa
 make singledd

The Makefile runs the following commands:

 cp /l1/workshop/test02/dlxs/prep/w/workshopfa/workshopfa.blank.dd \
 	/l1/workshop/test02/dlxs/idx/w/workshopfa/workshopfa.dd
 /l/local/xpat/bin/xpatbld -m 256m -D /l1/workshop/test02/dlxs/idx/w/workshopfa/workshopfa.dd
 cp /l1/workshop/test02/dlxs/idx/w/workshopfa/workshopfa.dd \
 	/l1/workshop/test02/dlxs/prep/w/workshopfa/workshopfa.presgml.dd
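
Going by the commands above, Step 1 should leave a data dictionary in the idx directory and a snapshot of it, workshopfa.presgml.dd, in the prep directory. A quick sanity check along the following lines can catch a silently failed build before you move on. The function name is illustrative, and the check only confirms that the files exist and are non-empty, not that the index is correct.

```shell
# Illustrative helper: succeed only if every file named on the command line
# exists and is non-empty; report each missing or empty file on stderr.
check_nonempty() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return $status
}

# Example for the workshop collection (paths taken from the commands above):
# check_nonempty \
#     /l1/workshop/test02/dlxs/idx/w/workshopfa/workshopfa.dd \
#     /l1/workshop/test02/dlxs/prep/w/workshopfa/workshopfa.presgml.dd
```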

Step 2: Indexing the XML

 make xml

The Makefile runs the following commands:

 cp /l1/workshop/test02/dlxs/prep/w/workshopfa/workshopfa.presgml.dd \
 	/l1/workshop/test02/dlxs/idx/w/workshopfa/workshopfa.dd
 /l/local/xpat/bin/xmlrgn -D /l1/workshop/test02/dlxs/idx/w/workshopfa/workshopfa.dd \
 	/l1/workshop/test02/dlxs/misc/sgml/xml.dcl \
 	/l1/workshop/test02/dlxs/prep/w/workshopfa/workshopfa.inp \
 	/l1/workshop/test02/dlxs/obj/w/workshopfa/workshopfa.xml
 
 cp /l1/workshop/test02/dlxs/idx/w/workshopfa/workshopfa.dd \
 	/l1/workshop/test02/dlxs/idx/w/workshopfa/workshopfa.prepost.dd
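
The last cp above keeps a copy of the data dictionary as it stood after XML indexing (workshopfa.prepost.dd), presumably so that later processing can start from a known state. Immediately after Step 2, and before anything else modifies workshopfa.dd, the two files should be byte-identical. A small sketch for confirming that, with an illustrative function name that is not part of DLXS:

```shell
# Illustrative helper: report whether a snapshot copy still matches the live file.
# cmp -s prints nothing and exits 0 only when the two files are byte-identical.
snapshot_matches() {
    live=$1
    snapshot=$2
    if cmp -s "$live" "$snapshot"; then
        echo "snapshot is current: $snapshot"
    else
        echo "snapshot differs from $live (or a file is missing)" >&2
        return 1
    fi
}

# Example for the workshop collection (paths taken from the commands above):
# snapshot_matches \
#     /l1/workshop/test02/dlxs/idx/w/workshopfa/workshopfa.dd \
#     /l1/workshop/test02/dlxs/idx/w/workshopfa/workshopfa.prepost.dd
```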
