Mounting a Finding Aids Collection
From DLXS Documentation
Revision as of 15:21, 14 August 2007
This topic describes how to mount a Findaid Class collection.
Workshop materials are located here.
Overview
Examples
Overview of Data Preparation and Indexing Steps
Data Preparation
1. Validating the files individually against the EAD 2002 DTD:
   make validateeach
2. Concatenating the files into one larger XML file:
   make prepdocs
3. Validating the concatenated file against the dlxsead2002 DTD:
   make validate
4. "Normalizing" the concatenated file:
   make norm
5. Validating the normalized concatenated file against the dlxsead2002 DTD:
   make validate
The end result of these steps is a single file containing the concatenated EADs wrapped in a <COLL> element. This file validates against the dlxsead2002 DTD and is ready for indexing:
<COLL>
<ead><eadheader><eadid>1</eadid>...</eadheader>... content</ead>
<ead><eadheader><eadid>2</eadid>...</eadheader>... content</ead>
<ead><eadheader><eadid>3</eadid>...</eadheader>... content</ead>
</COLL>
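To make the concatenation step concrete, here is a minimal Python sketch of the idea: strip the XML declaration and DOCTYPE declaration from each EAD, then wrap the results in a <COLL> element. It is an illustration only, not the code that make prepdocs actually runs; the function names and the regular expressions are assumptions, and the DOCTYPE pattern handles only simple declarations.

```python
import re

# Matches an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
XML_DECL = re.compile(r"<\?xml[^>]*\?>")
# Matches a DOCTYPE declaration, with or without an internal subset in [...]
DOCTYPE = re.compile(r"<!DOCTYPE[^\[>]*(\[.*?\])?\s*>", re.DOTALL)


def strip_prolog(text):
    """Remove the XML declaration and DOCTYPE, leaving only the <ead> element.

    Hypothetical helper for illustration; not part of DLXS.
    """
    text = text.lstrip("\ufeff")  # drop a leading byte order mark if present
    text = XML_DECL.sub("", text, count=1)
    text = DOCTYPE.sub("", text, count=1)
    return text.strip()


def concatenate(eads):
    """Wrap already-read EAD documents (strings) in a single <COLL> element."""
    body = "\n".join(strip_prolog(e) for e in eads)
    return "<COLL>\n" + body + "\n</COLL>"
```

Anything that the stripping step fails to remove (a byte order mark, a stray character before the XML declaration, an unusual DOCTYPE) stays in the output as character data directly inside <COLL>.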
WARNING! If there are extra characters, or some other problem in the part of the program that strips out the XML declaration and the DOCTYPE declaration, the file will end up like this:
<COLL>
baddata<ead><eadheader><eadid>1</eadid>...</eadheader>... content</ead>
baddata<ead><eadheader><eadid>2</eadid>...</eadheader>... content</ead>
baddata<ead><eadheader><eadid>3</eadid>...</eadheader>... content</ead>
</COLL>
In this case you will get "character data not allowed" or similar errors during the make validate step. You can troubleshoot by looking at the concatenated file and/or checking your original EADs.
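One way to look at the concatenated file is to check for text that sits directly inside <COLL>, between the <ead> elements. The following Python sketch does this with the standard library; the function name is hypothetical, and it assumes the concatenated file is at least well-formed XML and small enough to parse in memory.

```python
import xml.etree.ElementTree as ET


def find_stray_text(concatenated_xml):
    """Return (position, text) pairs for character data found directly inside <COLL>.

    Position 0 means text before the first <ead>; position n means text
    after the n-th <ead> element.
    """
    root = ET.fromstring(concatenated_xml)
    problems = []
    # Text between the <COLL> start tag and the first child element
    if root.text and root.text.strip():
        problems.append((0, root.text.strip()))
    # Text following each child element's end tag
    for n, child in enumerate(root, start=1):
        if child.tail and child.tail.strip():
            problems.append((n, child.tail.strip()))
    return problems
```

An empty list means no stray character data was found at the top level. A non-empty list tells you which original EAD to inspect: text at position n sits between the n-th and (n+1)-th <ead>, so it most likely came from the start of the (n+1)-th source file.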
Indexing
1. make singledd indexes words for texts that have been concatenated into one large file for a collection. This is the recommended process.
2. make xml indexes the XML structure by reading the DTD. It validates as it indexes.
3. make post builds and indexes fabricated regions based on the XPAT queries stored in the workshopfa.extra.srch file.
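The data preparation and indexing targets are meant to be run in the order listed, and a failure in one step usually makes the later steps pointless. As a rough sketch, a small Python wrapper could run the targets in sequence and stop at the first failure. It assumes that make is on the PATH and that the collection's Makefile is in the current directory; the runner parameter is a hypothetical hook added here so the logic can be exercised without running make.

```python
import subprocess

# Target order taken from the steps listed above; "validate" runs twice,
# once before and once after normalization.
DATA_PREP = ["validateeach", "prepdocs", "validate", "norm", "validate"]
INDEXING = ["singledd", "xml", "post"]


def run_targets(targets, runner=None):
    """Run each make target in order, stopping at the first failure.

    Returns the list of targets that completed successfully.
    `runner` takes a command list and returns an exit status.
    """
    if runner is None:
        runner = lambda cmd: subprocess.run(cmd).returncode
    done = []
    for target in targets:
        if runner(["make", target]) != 0:
            break  # later steps depend on this one, so stop here
        done.append(target)
    return done
```

Calling run_targets(DATA_PREP + INDEXING) runs the whole sequence; comparing the returned list with the input shows which step failed.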