Finding Aids Workshop Quick Reference

From DLXS Documentation


Revision as of 15:12, 11 July 2008


Prepare Directories and Copy Files

Set Up Directories and Files for Data Preparation

For more details see: Step by step instructions for setting up Directories for Data Preparation

To check your $DLXSROOT, type the following command at the command prompt:

echo $DLXSROOT

Then create the workshopfa directories and copy the samplefa files into them:

mkdir -p $DLXSROOT/prep/w/workshopfa/data
cd $DLXSROOT/prep/w/workshopfa
cp $DLXSROOT/prep/s/samplefa/data/*.xml $DLXSROOT/prep/w/workshopfa/data/.
cp $DLXSROOT/prep/s/samplefa/samplefa.ead2002.dcl   $DLXSROOT/prep/w/workshopfa/workshopfa.ead2002.dcl
cp $DLXSROOT/prep/s/samplefa/samplefa.concat.ead.dcl $DLXSROOT/prep/w/workshopfa/workshopfa.concat.ead.dcl
mkdir -p $DLXSROOT/obj/w/workshopfa
mkdir -p $DLXSROOT/bin/w/workshopfa
cp $DLXSROOT/bin/s/samplefa/preparedocs.pl $DLXSROOT/bin/w/workshopfa/preparedocs.pl
cp $DLXSROOT/bin/s/samplefa/Makefile $DLXSROOT/bin/w/workshopfa/Makefile
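
After running the commands above, it can help to confirm that everything landed where the later steps expect it. The following is a minimal sketch, not part of DLXS: the function name missing_prep_files is hypothetical, and the list of paths is taken only from the mkdir and cp commands on this page.

```shell
# Hypothetical helper: print every expected path that is missing under ROOT.
# Usage: missing_prep_files ROOT   (ROOT would normally be $DLXSROOT)
missing_prep_files() {
    root=$1
    for p in \
        prep/w/workshopfa/data \
        prep/w/workshopfa/workshopfa.ead2002.dcl \
        prep/w/workshopfa/workshopfa.concat.ead.dcl \
        obj/w/workshopfa \
        bin/w/workshopfa/preparedocs.pl \
        bin/w/workshopfa/Makefile
    do
        # -e is true for files and directories alike
        [ -e "$root/$p" ] || printf '%s\n' "$root/$p"
    done
}
```

Calling missing_prep_files $DLXSROOT prints nothing when every directory and file is in place; any path it does print still needs to be created or copied.
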
DLXS_TIP:

Make sure you have changed your copy of the Makefile to reflect /w/workshopfa instead of /s/samplefa, and that $DLXSROOT is set correctly in the Makefile. You will want to change lines 1-3 accordingly:

   1  DLXSROOT = /l1
   2  NAMEPREFIX = samplefa
   3  FIRSTLETTERSUBDIR = s
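
If you prefer not to edit the Makefile by hand, the three assignments shown above can be rewritten with sed. This is only a sketch under stated assumptions: retarget_makefile is a hypothetical helper name, and it assumes each variable is assigned at the start of its own line, as in the excerpt. It touches only those three lines, so any other reference to samplefa in your copy of the Makefile would still need to be checked by hand.

```shell
# Hypothetical helper: point a copied samplefa Makefile at a new collection.
# Usage: retarget_makefile MAKEFILE DLXSROOT NAMEPREFIX FIRSTLETTERSUBDIR
# Assumes the values contain no "|" or "&" characters (sed would misread them).
retarget_makefile() {
    makefile=$1 root=$2 prefix=$3 letter=$4
    tmp="$makefile.tmp.$$"
    # Rewrite only the three assignment lines; all other lines pass through unchanged.
    sed -e "s|^DLXSROOT *=.*|DLXSROOT = $root|" \
        -e "s|^NAMEPREFIX *=.*|NAMEPREFIX = $prefix|" \
        -e "s|^FIRSTLETTERSUBDIR *=.*|FIRSTLETTERSUBDIR = $letter|" \
        "$makefile" > "$tmp" && mv "$tmp" "$makefile"
}
```

For this workshop the call would be retarget_makefile $DLXSROOT/bin/w/workshopfa/Makefile /l1 workshopfa w, with /l1 replaced by your own $DLXSROOT value.
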

Set Up Directories and Files for XPAT Indexing

For more details see: Set Up Directories and Files for XPAT Indexing


mkdir -p $DLXSROOT/idx/w/workshopfa
cp $DLXSROOT/prep/s/samplefa/samplefa.blank.dd  $DLXSROOT/prep/w/workshopfa/workshopfa.blank.dd
cp $DLXSROOT/prep/s/samplefa/samplefa.extra.srch $DLXSROOT/prep/w/workshopfa/workshopfa.extra.srch


Both of these files need to be edited to reflect the new collection name and the paths to your particular directories.

cd $DLXSROOT/prep/w/workshopfa

Edit the files to change every occurrence of samplefa to workshopfa and every occurrence of s/samplefa to w/workshopfa.

After editing the files, you can check that you changed all the "samplefa" strings with the following command. It lists every file that still contains the string, so no output means nothing was missed:

grep -l "samplefa" $DLXSROOT/prep/w/workshopfa/*
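
The renaming itself can also be scripted. The sketch below is an illustration, not the documented DLXS procedure: rename_collection is a hypothetical helper, and it assumes that directory references appear in the form /s/samplefa (with a leading slash). The path form is replaced first so that the first-letter subdirectory changes from s to w before the bare collection name is replaced.

```shell
# Hypothetical helper: rename the collection inside copied configuration files.
# Usage: rename_collection FILE...
rename_collection() {
    for f in "$@"; do
        tmp="$f.tmp.$$"
        # 1) /s/samplefa -> /w/workshopfa (directory references)
        # 2) any remaining samplefa -> workshopfa (file and collection names)
        sed -e 's|/s/samplefa|/w/workshopfa|g' \
            -e 's|samplefa|workshopfa|g' \
            "$f" > "$tmp" && mv "$tmp" "$f"
    done
}
```

A typical call would be rename_collection workshopfa.blank.dd workshopfa.extra.srch from inside $DLXSROOT/prep/w/workshopfa, followed by the grep check above.
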

Data Preparation

Validating and Normalizing Your Data

Step 1: Validating the files individually against the EAD 2002 DTD

cd $DLXSROOT/bin/w/workshopfa
make validateeach


Check the error files by running the following command:

 ls -l $DLXSROOT/prep/w/workshopfa/data/*err

If there are any *err files, you can look at them with the following command:

 less  $DLXSROOT/prep/w/workshopfa/data/*err
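
With many EADs, it may be quicker to list only the error files that actually contain messages. The sketch below assumes that validation can leave behind empty *err files for documents that validate cleanly; that is an assumption, not something this page states. The helper name list_nonempty_errs is hypothetical.

```shell
# Hypothetical helper: list only the *err files that contain at least one byte.
# Usage: list_nonempty_errs DIRECTORY
list_nonempty_errs() {
    for f in "$1"/*err; do
        # -s is true only for files that exist and are larger than zero bytes,
        # so it also skips the unexpanded pattern when no *err file exists.
        [ -s "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

For example, list_nonempty_errs $DLXSROOT/prep/w/workshopfa/data prints one path per file that is worth opening with less.
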

Step 2: Concatenating the files into one larger XML file (and running some preprocessing commands)

cd $DLXSROOT/bin/w/workshopfa
make prepdocs


Step 3: Validating the concatenated file against the dlxsead2002 DTD

make validate

Check for errors by running the following command:

 ls -l $DLXSROOT/prep/w/workshopfa/workshopfa.errors

If there is a workshopfa.errors file, run the following command to look at the errors reported:

 less $DLXSROOT/prep/w/workshopfa/workshopfa.errors

Step 4: Normalizing the concatenated file

make norm

Check for normalization errors:

less $DLXSROOT/prep/w/workshopfa/workshopfa.osgmlnorm.errors

Step 5: Validating the normalized file against the dlxsead2002 DTD

make validate2

Check the resulting error file:

less $DLXSROOT/prep/w/workshopfa/workshopfa.errors2
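
Steps 3 to 5 follow the same pattern: run a make target, then inspect an error file. A small wrapper can make that pattern explicit. This is a sketch only: run_step is a hypothetical helper, and it treats any non-empty error file as a reason to stop, which may be stricter than you want if a step writes warnings you have decided to accept.

```shell
# Hypothetical helper: run one make target, then stop if its error file has content.
# Usage: run_step TARGET ERROR_FILE   (run from the directory holding the Makefile)
run_step() {
    target=$1 errfile=$2
    make "$target" || { echo "make $target failed" >&2; return 1; }
    # A missing or empty error file is treated as "nothing reported".
    if [ -s "$errfile" ]; then
        echo "make $target reported errors, see $errfile" >&2
        return 1
    fi
    echo "make $target: no errors reported"
}
```

From $DLXSROOT/bin/w/workshopfa, the three steps could then be chained so that a later step runs only if the earlier one was clean:

 run_step validate $DLXSROOT/prep/w/workshopfa/workshopfa.errors &&
 run_step norm $DLXSROOT/prep/w/workshopfa/workshopfa.osgmlnorm.errors &&
 run_step validate2 $DLXSROOT/prep/w/workshopfa/workshopfa.errors2
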

Indexing

Step by Step Instructions for Indexing

Step 1: Indexing the text

Index all the words in the file of concatenated EADs with the following command:


cd $DLXSROOT/bin/w/workshopfa
make singledd

Step 2: Indexing the XML

Index all the elements and attributes listed in the EAD DTD that occur in the file of concatenated EADs by running the following command:

make xml

After running this step, if you wish, you can see the indexed regions by issuing the following commands:

xpatu $DLXSROOT/idx/w/workshopfa/workshopfa.dd
>> {ddinfo regionnames}
>> quit

You can also test the XPAT queries in your workshopfa.extra.srch file. See Testing Fabricated Regions.

Step 3: Configuring fabricated regions

  • Once you have run "make xml", but before you run "make post", start up xpatu running against the newly created indexes:
 xpatu $DLXSROOT/idx/w/workshopfa/workshopfa.dd

then run the command

 >> {ddinfo regionnames}

This will give you a list of all the indexed XML elements and attributes.

Step 4: Indexing fabricated regions

Index the fabricated regions specified in your workshopfa.extra.srch that occur in the file of concatenated EADs with the following command:


make post
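
Once workshopfa.extra.srch is settled and you only need to rebuild the indexes, the three make targets can be run in sequence. The sketch below is an assumption-laden convenience, not a DLXS tool: run_indexing is a hypothetical helper, and it skips the manual inspection that Step 3 asks for between "make xml" and "make post", so it suits re-indexing rather than a first run.

```shell
# Hypothetical helper: run the indexing targets in order, stopping at the first failure.
# Usage: run_indexing BIN_DIR   (BIN_DIR is the directory holding the Makefile)
run_indexing() {
    (
        # The subshell keeps the cd from changing the caller's working directory.
        cd "$1" || exit 1
        for target in singledd xml post; do
            echo "== make $target"
            make "$target" || exit 1
        done
    )
}
```

For this workshop the call would be run_indexing $DLXSROOT/bin/w/workshopfa. The function returns a non-zero status as soon as one target fails, and the targets after it are not run.
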