Mounting a Finding Aids Collection

From DLXS Documentation

(Difference between revisions)
(Overview of Data Preparation and Indexing Steps)
Current revision (21:58, 1 July 2008)
(Examples of Findaid Class Implementations and Practices)
 
(312 intermediate revisions not shown.)
Line 5: Line 5:
Workshop materials are located at http://www.dlxs.org/training/workshop200707/findaidclass/fcoutline.html
- 
- 
-
<p>
 
-
'''WARNING!! This page is under construction. Please use the existing documentation at http://www.dlxs.org/docs/13/index.html
 
-
until we take down this warning!
 
-
'''
 
-
</p>
 
-
 
-
----
 
- 
==Overview==
-
===Examples===
+
The Finding Aids Class is in many ways similar in behavior to Text Class. Access minimally includes full text searching across collections or within a particular collection of Finding Aids, viewing Finding Aids in a variety of display formats, and creation of personal collections ("bookbag") of Finding Aids.
-
===Overview of Data Preparation and Indexing Steps===
+
-
'''Data Preparation'''
+
To mount a Finding Aids Collection, you will need to complete the following steps:
-
# [[#DataPrepStep1|validating the files individually]] against the EAD ''2002'' DTD<br />'''make validateeach'''<br />
+
# [[Preparing_Data and Directories|Prepare your data and set up a directory structure]]
-
# [#DataPrepStep2 concatenating the files into one larger XML file]<br />'''make prepdocs'''<br />
+
# [[Finding_Aids_Data_Preparation#Validating_and_Normalizing_Your_Data| Validate and normalize your data]]
-
# [#DataPrepStep3 validating the concatenated file] against the ''dlxsead2002'' DTD:<br />'''make validate'''<br />
+
# [[Building the Index |Build the Index]]
-
# [#DataPrepStep4 "normalizing" the concatenated file.]<br />'''make norm'''<br />
+
# [[Mounting the Collection Online|Mount the collection online]]
-
# [#DataPrepStep5 validating the normalized concatenated file against the ''dlxsead2002'' DTD]<br />'''make validate'''<br />
+
-
The end result of these steps is a file containing the concatenated EADs wrapped in a &lt;COLL&gt; element which validates against the dlxsead2002 and is ready for indexing:
+
===[[Findaid Class Behaviors Overview]]===
-
&lt;COLL&gt;<br />&lt;ead&gt;&lt;eadheader&gt;&lt;eadid&gt;1&lt;/eadid&gt;...&lt;/eadheader&gt;... content&lt;/ead&gt;<br />&lt;ead&gt;&lt;eadheader&gt;&lt;eadid&gt;2&lt;/eadid&gt;...&lt;/eadheader&gt;... content&lt;/ead&gt;<br />&lt;ead&gt;&lt;eadheader&gt;&lt;eadid&gt;3&lt;/eadid&gt;...&lt;/eadheader&gt;... content&lt;/ead&gt;<br />&lt;/COLL&gt;
+
This section describes the basic Findaid Class behaviors.
-
+
-
'''WARNING!''' If there are extra characters, or some other problem with the part of the program that strips out the XML declaration and the DOCTYPE declaration, the file will end up like:
+
===Examples of Findaid Class Implementations and Practices===
-
+
This section contains links to public implementations of DLXS Findaid Class as well as documentation on workflow and implementation issues. If you are a member of DLXS and have a collection or resource you would like to add, or wish to add more information about your collection, please edit this section.
-
&lt;COLL&gt;<br />baddata&lt;ead&gt;&lt;eadheader&gt;&lt;eadid&gt;1&lt;/eadid&gt;...&lt;/eadheader&gt;... content&lt;/ead&gt;<br />baddata&lt;ead&gt;&lt;eadheader&gt;&lt;eadid&gt;2&lt;/eadid&gt;...&lt;/eadheader&gt;... content&lt;/ead&gt;<br />baddata&lt;ead&gt;&lt;eadheader&gt;&lt;eadid&gt;3&lt;/eadid&gt;...&lt;/eadheader&gt;... content&lt;/ead&gt;<br />&lt;/COLL&gt;
+
-
In this case you will get "character data not allowed" or similar errors during the make validate step. You can troubleshoot by looking at the concatenated file and/or checking your original EADs.
+
;[http://bentley.umich.edu/EAD/index.php University of Michigan, Bentley Historical Library Finding Aids]
 +
: Out-of-the-box DLXS 13 implementation.
 +
;[http://bentley.umich.edu/EAD/eadproject.php Overview of Bentley's workflow process for Finding Aids ]
 +
:See also the links in [[#Practical_EAD_Encoding_Issues | Practical EAD Encoding Issues]] for background on the Bentley EAD workflow and encoding practices
-
'''Indexing'''
+
;[http://dlc.lib.utk.edu/f/fa/ University of Tennessee Special Collections Libraries]
 +
: DLXS Findaid Class version ?
-
# '''make singledd''' indexes words for texts that have been concatenated into one large file for a collection. This is the recommended process.
+
;[http://digital.library.pitt.edu/ead/ University of Pittsburgh, Historic Pittsburgh Finding Aids]
-
# '''make xml''' indexes the XML structure by reading the DTD. Validates as it indexes.
+
:DLXS Findaid Class version ?
-
# '''make post''' builds and indexes fabricated regions based on the XPAT queries stored in the workshopfa.extra.srch file.
+
;[http://digital.library.pitt.edu/ead/aboutead.html Background on Pittsburgh Finding Aids workflow]
 +
:
-
==Working with the EAD==
+
;[http://digicoll.library.wisc.edu/wiarchives University of Wisconsin, Archival Resources in Wisconsin: Descriptive Finding Aids]
-
=== EAD 2002 DTD Overview ===
+
:DLXS Findaid Class version ?
-
These instructions assume that you have already encoded your finding aids files in the XML-based [http://www.loc.gov/ead/ EAD 2002 DTD]. If you have finding aids encoded using the older EAD 1.0 standard or are using the SGML version of EAD2002, you will need to convert your files to the XML version of EAD2002. When converting from SGML to XML, a number of character set issues may arise. These are largely the same issues described for Text Class; see [../conversion/index.html Data Conversion: Unicode, XML, and Normalization].
+
;[http://discover.lib.umn.edu/findaid University of Minnesota Libraries, Online Finding Aids]
 +
:DLXS Findaid Class version ?
-
Resources for converting from EAD 1.0 to EAD2002 and/or from SGML EAD to XML EAD are available from:
+
;[https://wiki.lib.umn.edu/Staff/FindingAidsInEAD/ EAD Implementation at the University of Minnesota]
 +
:
-
* The Society of American Archivists EAD Tools page: http://www.archivists.org/saagroups/ead/tools.html
+
;[http://archives.getty.edu:8082/cgi/f/findaid/findaid-idx?cc=utf8a;c=utf8a;tpl=browse.tpl Getty Research Institute Special Collections Finding Aids]
-
* Library of Congress EAD conversion tools: http://lcweb2.loc.gov/music/eadmusic/eadconv12/ead2002_r.html
+
: Heavily customized DLXS11a. [http://library.pub.getty.edu:8100/DLXS06.html Background on Getty customization and user interface changes to DLXS]
-
Other good sources of information about EAD encoding practices and practical issues involved with EADs are:
+
;[http://archives.getty.edu:8082/cgi/f/findaid/findaid-idx?cc=iastaff;c=iastaff;tpl=browse.tpl J. Paul Getty Trust Institutional Archives Finding Aids]
 +
:Heavily customized DLXS11a.
-
* Library of Congress EAD page http://www.loc.gov/ead/ (This is the home of the EAD standard.)
+
==[[Working with the EAD]]==
-
* EAD2002 tag library http://www.loc.gov/ead/tglib/index.html
+
-
* The Society of American Archivists EAD Help page: http://www.archivists.org/saagroups/ead/
+
-
* Various EAD Best Practice Guidelines listed on the Society of American Archivists EAD essentials page: [http://www.archivists.org/saagroups/ead/ http://www.archivists.org/saagroups/ead/essentials.html] (the links to BPGs are at the bottom of the page)
+
-
* The EAD listserv http://listserv.loc.gov/listarch/ead.html
+
-
The EAD standard was designed as a "loose" standard in order to accommodate the large variety in local practices for paper finding aids and make it easy for archives to convert from paper to electronic form. As a result, conformance with the EAD standard still allows a great deal of variety in encoding practices.
+
==[[Preparing Data and Directories]]==
-
The DLXS software is primarily designed as a system for mounting University of Michigan collections. In the case of finding aids, the software has been designed to accommodate the encoding practices of the Bentley Historical Library. The more similar your data and setup are to the Bentley’s, the easier it will be to integrate your finding aids collection with DLXS. If your practices differ significantly from the Bentley’s, you will probably need to do some preprocessing of your files and/or modifications to various files in DLXS. We have found that most of the issues in implementing Findaid Class for member institutions come from differences in encoding practices. We will cover various issues that commonly arise.
+
==[[Finding Aids Data Preparation]]==
-
More information on the Bentley's encoding practices and workflow:
+
==[[Building the Index]]==
-
* Overview of Bentley's workflow process for Finding Aids http://bentley.umich.edu/EAD/eadproj.htm
+
==[[Working with Fabricated Regions in Findaid Class]]==
-
* Description of Bentley Finding Aids and their presentation on the web http://bentley.umich.edu/EAD/findaids.htm
+
-
* Bentley MS Word EAD templates and macros http://bentley.umich.edu/EAD/bhlfiles.htm
+
-
* Description of EAD tags used in Bentley EADs http://bentley.umich.edu/EAD/bhltags.htm
+
-
----
+
==[[Customizing Findaid Class]]==
-
=== Types of changes to accommodate differing encoding practices and/or interface changes ===
+
-
* Custom preprocessing
+
==[[Mounting the Collection Online]]==
-
* Add dummy EAD to data
+
-
* Modify prep scripts (Makefile, preparedocs.pl, validateeach.csh)
+
-
* Modify *inp files (DOCTYPE declarations and entities)
+
-
* Modify fabricated regions (*.extra.srch)
+
-
* Modify CollMgr entries
+
-
* Modify findaidclass.cfg (change table of contents sections)
+
-
* Subclass FindaidClass.pm
+
-
* Modify XSL
+
-
* Modify XML templates
+
-
* Modify CSS
+
-
=== Practical EAD Encoding Issues ===
+
==[[Troubleshooting Finding Aids]]==
-
There are a number of encoding issues that may affect the data preparation, indexing, searching, and rendering of your finding aids. Some of them are:
+
==[[Linking from Finding Aids Using ID Resolver]]==
-
* [fc_char.html Character Encoding issues]
+
==[http://www.dlxs.org/training/workshop200707/findaidclass/fcoutline.html Workshop Materials]==
-
* [fc_ids Attribute ids must be unique within the entire collection ]
+
-
* If you use attribute ids and corresponding targets within your EADs preparedocs.pl may need to be modified.
+
-
* &lt;eadid&gt; should be less than about 20 characters in length
+
-
* UTF-8 Byte Order Marks (BOM) should be removed from EADs prior to concatenation
+
-
* XML processing instructions should be removed from EADs prior to concatenation
+
-
* Multiline DOCTYPE declarations are currently not properly handled by the data prep scripts
+
-
* If your DOCTYPE declaration contains entities, you need to modify the appropriate *inp files accordingly
+
-
* Out-of-the-box &lt;dao&gt; handling may need to be modified for your needs
+
-
* If your &lt;unittitle&gt; element precedes your &lt;origination&gt; element in <span class="unixcommand">the top level &lt;did&gt;, you will have to modify the maintitle fabricated region query in xxx.extra.srch </span>
+
-
* If you have encoded &lt;unitdate&gt;s as siblings of &lt;unittitle&gt;s, you may have to modify the appropriate XSL templates
+
-
* If you do not use a &lt;frontmatter&gt; element, you will have to make modifications to various files to provide an appropriate "Title Page" region based on the &lt;eadheader&gt;
+
-
* If your encoding practices for &lt;bioghist&gt; differ from the Bentley's, you may need to make changes in findaidclass.cfg or create a subclass of FindaidClass and override FindaidClass::GetBioghistTocHead, and/or change the appropriate XSL files.
+
-
* If you want &lt;relatedmaterial&gt;,&lt;separatedmaterial&gt; to show up in the table of contents (TOC) on the left hand side of the Finding Aids, you may have to modify findaidclass.cfg and make other modifications to the code. This also applies if there are other sections of the finding aid not listed in the out-of-the-box findaidclass.cfg %gSectHeadsHash.
+
-
* If you want the middleware to use the &lt;head&gt; element for labeling sections instead of the default hard-coded values in findaidclass.cfg, you may need to make changes to the XSL and possibly modify other files.
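One of the checks in the list above, the length of &lt;eadid&gt;, is easy to script before running the data preparation steps. The following is a minimal sketch, not part of DLXS: the function name long_eadids is hypothetical, the default limit of 20 comes from the "about 20 characters" guideline above, and it assumes each &lt;eadid&gt; element sits on a single line.

```shell
#!/bin/sh
# Print every <eadid> value in DIR/*.xml that is longer than MAX characters.
# Hypothetical helper; assumes <eadid ...>value</eadid> is on one line.
# Usage: long_eadids DIR [MAX]
long_eadids() {
    max=${2:-20}
    sed -n 's/.*<eadid[^>]*>\([^<]*\)<\/eadid>.*/\1/p' "$1"/*.xml |
        awk -v max="$max" 'length($0) > max'
}
```

For example, long_eadids $DLXSROOT/prep/w/workshopfa/data prints nothing when every eadid is within the limit.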
+
-
==[[Findaid Class Behaviors Overview]]==
+
==Working with the User Interface==
-
==Preparing Data and Directories==
+
-
===Character Issues===
+
-
===Encoding Issues===
+
-
==Validating and Normalizing Your Data==
+
-
=== Data Preparation ===
+
-
For today, we are going to be working with some texts that are already in Findaid Class. We will be building them into a collection we are going to call '''workshopfa'''.
+
General user interface customizations, such as changing rendering style (CSS) or making changes to the XSL are covered in [[Customizing the User Interface]]. Specific user-interface issues related to Findaid Class are discussed in the following sections:
 +
* [[Customizing Findaid Class]]
 +
** [[Customizing Findaid Class#Working_with_the_table_of_contents |Working with the table of contents]]
 +
* [[Working with Fabricated Regions in Findaid Class]]
 +
* [[Troubleshooting Finding Aids#Common_Problems_and_Solutions |Common Problems and Solutions]]
 +
** [[Troubleshooting Finding Aids#Title_of_Finding_Aid_does_not_show_up |Title of Finding Aid does not show up]]
-
This documentation will make use of the concept of the <span class="unixcommand">$[../overview/dirstructure.html DLXSROOT]</span>, which is the place at which your DLXS directory structure starts. We generally use <span class="unixcommand">/l1/</span>, but for the workshop, we each have our own <span class="unixcommand">$DLXSROOT</span> in the form of <span class="unixcommand">/l1/workshop/userX/dlxs/</span>. To check your <span class="unixcommand">$DLXSROOT</span>, type the following commands at the command prompt:
+
===[[Findaid Class Graphics Files]]===
 +
Are there findaid class specific graphics files? The existing html docs actually point to a ../t/text/ directory and it appears that the graphics are generic and not at all specific to findaid class.
-
<blockquote>
 
- 
-
cd $DLXSROOT<br />pwd
 
- 
-
</blockquote>
 
- 
-
The <span class="unixcommand">prep</span> directory under <span class="unixcommand">$DLXSROOT</span> is the space for you to take your encoded finding aids and "package them up" for use with the DLXS middleware. Create your basic directory <span class="unixcommand">$DLXSROOT/prep/w/workshopfa</span> and its <span class="unixcommand">data</span> subdirectory with the following command:
 
- 
-
<blockquote>
 
- 
-
mkdir -p $DLXSROOT/prep/w/workshopfa/data
 
- 
-
</blockquote>
 
- 
-
Move into the <span class="unixcommand">prep</span> directory with the following command:
 
- 
-
<blockquote>
 
- 
-
cd $DLXSROOT/prep/w/workshopfa
 
- 
-
</blockquote>
 
- 
-
This will be your staging area for all the things you will be doing to your texts, and ultimately to your collection. At present, all it contains is the <span class="unixcommand">data</span> subdirectory you created a moment ago. We will be populating it further over the course of the next two days. Unlike the contents of other collection-specific directories, everything in <span class="unixcommand">prep</span> should be ultimately expendable in the production environment.
 
- 
-
Copy the necessary files into your <span class="unixcommand">data</span> directory with the following commands:
 
- 
-
<blockquote>
 
- 
-
cp $DLXSROOT/prep/s/samplefa/data/*.xml $DLXSROOT/prep/w/workshopfa/data/.
 
- 
-
</blockquote>
 
- 
-
We'll also need a few files to get us started working. They will need to be copied over as well, and also have paths adapted and collection identifiers changed. Follow these commands:
 
- 
-
<blockquote>
 
- 
-
 
-
cp $DLXSROOT/prep/s/samplefa/validateeach.csh $DLXSROOT/prep/w/workshopfa/.
 
-
cp $DLXSROOT/prep/s/samplefa/samplefa.xml.inp $DLXSROOT/prep/w/workshopfa/workshopfa.xml.inp
 
-
cp $DLXSROOT/prep/s/samplefa/samplefa.text.inp $DLXSROOT/prep/w/workshopfa/workshopfa.text.inp
 
-
mkdir -p $DLXSROOT/obj/w/workshopfa
 
-
mkdir -p $DLXSROOT/bin/w/workshopfa
 
-
cp $DLXSROOT/bin/s/samplefa/preparedocs.pl $DLXSROOT/bin/w/workshopfa/.
 
-
cp $DLXSROOT/bin/s/samplefa/Makefile $DLXSROOT/bin/w/workshopfa/Makefile
 
- 
-
</blockquote>
 
- 
-
Now you'll need to edit these files to ensure that the paths match your <span class="unixcommand">$DLXSROOT</span> and that the collection name is ''workshopfa'' instead of ''samplefa''.
 
- 
-
''STOP!! Make sure you edit the files before going to the next steps!! ''
 
- 
-
Make sure you change these files:
 
- 
-
* $DLXSROOT/prep/w/workshopfa/validateeach.csh
 
-
* $DLXSROOT/bin/w/workshopfa/Makefile (see below for details)
 
- 
-
You can run this command to check to see if you forgot to change samplefa to workshopfa:
 
- 
-
grep "samplefa" $DLXSROOT/bin/w/workshopfa/* $DLXSROOT/prep/w/workshopfa/* |grep -v "#"
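The renaming can also be scripted rather than done by hand. This is a minimal sketch under the assumption that a plain string replacement of samplefa with workshopfa is all a file needs; the function name rename_collection is hypothetical, and the FIRSTLETTERSUBDIR value in the Makefile still has to be changed separately.

```shell
#!/bin/sh
# Replace every occurrence of OLD with NEW inside FILE.
# Hypothetical helper, not part of DLXS. The result is written back with
# cat so that the file keeps its permissions (for example the executable
# bit on validateeach.csh).
# Usage: rename_collection OLD NEW FILE
rename_collection() {
    old=$1
    new=$2
    file=$3
    tmp="$file.tmp.$$"
    sed "s/$old/$new/g" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example: rename_collection samplefa workshopfa $DLXSROOT/prep/w/workshopfa/validateeach.csh. Rerun the grep check above afterwards to confirm nothing was missed.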
 
- 
-
With the ready-to-go ead2002 encoded finding aids files in the <span class="unixcommand">data</span> directory, we are ready to begin the preparation process. This will include:
 
- 
-
# [#DataPrepStep1 validating the files individually] against the EAD ''2002'' DTD
 
-
# [#DataPrepStep2 concatenating the files into one larger XML file]
 
-
# [#DataPrepStep3 validating the concatenated file] against the ''dlxsead2002'' DTD
 
-
# [#DataPrepStep4 "normalizing" the concatenated file.]
 
-
# [#DataPrepStep5 validating the normalized concatenated file against the ''dlxsead2002'' DTD]
 
- 
-
These steps are generally handled via the <span class="unixcommand">Makefile</span> in <span class="unixcommand">$DLXSROOT/bin/s/samplefa</span> which we have copied to $DLXSROOT/bin/w/workshopfa. To see the Makefile and how it is used, [makefile.html click here].
 
- 
-
Make sure you changed your copy of the Makefile to reflect
 
- 
-
/w/workshopfa instead of /s/samplefa. You will want to change lines 2 and 3 accordingly
 
- 
-
 
-
1
 
-
2 NAMEPREFIX = samplefa
 
-
3 FIRSTLETTERSUBDIR = s
 
- 
-
Tip: Be sure not to add any space after the workshopfa or w. The Makefile ignores space immediately before and after the equals sign but treats all other space as part of the string. If you accidentally put a space after the FIRSTLETTERSUBDIR = s, you will get an error like "[validateeach] Error 127". If you look closely at the first line of what the Makefile reported to standard output (see below), you will see that instead of running the command:
 
- 
-
/l1/workshop/tburtonw/dlxs/prep/w/workshopfa/validateeach.csh
 
- 
-
which just calls the validateeach c-shell script
 
- 
-
it tried to run a directory name: "/l1/workshop/tburtonw/dlxs/prep/w" with the argument "/workshopfa/validateeach.csh" which does not make sense
 
- 
-
% make validateeach
 
-
/l1/workshop/tburtonw/dlxs/prep/w /workshopfa/validateeach.csh
 
-
make: execvp: /l1/workshop/tburtonw/dlxs/prep/w: Permission denied
 
-
make: [validateeach] Error 127 (ignored)
 
- 
-
Further note on editing the Makefile: If you modify or write your own Make targets, you need to make sure that a real "tab" starts each command line rather than spaces. The easiest way to check for these kinds of errors is to use "cat -vet Makefile" to show all spaces, tabs and newlines.
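Both Makefile pitfalls described above (trailing space after a variable value, and spaces where a tab is required) can be searched for mechanically. The sketch below is an illustration only: the function name check_makefile is hypothetical, and the patterns are assumptions about what a typical Makefile looks like, so a reported line is a candidate to inspect rather than a definite error.

```shell
#!/bin/sh
# Report Makefile lines that commonly cause the errors described above.
# Hypothetical helper, not part of DLXS.
# Usage: check_makefile MAKEFILE
check_makefile() {
    mk=$1
    tab=$(printf '\t')
    # Variable assignments whose value ends in a space or tab.
    grep -n "^[A-Za-z_][A-Za-z0-9_]*[ $tab]*=.*[ $tab]\$" "$mk" |
        sed 's/^/trailing whitespace: /'
    # Lines that start with a space; these may be command lines that
    # should start with a real tab.
    grep -n "^ " "$mk" | sed 's/^/leading spaces: /'
    return 0
}
```

Each reported line is prefixed with its line number in the Makefile.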
 
- 
-
If you are doing this at your home institution you will also want to make sure you change $DLXSROOT, and the locations of the various binaries to match your installation. We will not need to do this for the workshop.
 
- 
-
''These changes do not apply for the workshop''
 
-
* Change $DLXSROOT /l1/dev/userxx to your $DLXSROOT on every line that uses it
 
-
* Change XPATBINDIR = /l/local/bin/ to the location of the <span class="unixcommand">xpat</span> binary in your installation
 
-
* Change the location of the <span class="unixcommand">osx</span> binary from
 
-
OSX = /l/local/bin/osx
 
-
to the location in your installation
 
-
* Change the location of the <span class="unixcommand">osgmlnorm</span> binary from
 
-
OSGMLNORM = /l/local/bin/osgmlnorm
 
-
to the location in your installation
 
- 
-
Tip: osx and osgmlnorm are installed as part of the OpenSP package. If you are using Linux, make sure that the OpenSP package for your version of Linux is installed and make sure the paths above are changed to match your installation. If you are using Solaris you will have to install (and possibly compile) OpenSP. You may also need to make sure the $LD_LIBRARY_PATH environment variable is set so that the OpenSP programs can find the required libraries. For troubleshooting such problems the unix '''ldd''' utility is invaluable. [../troubleshooting/tools.html Information on OpenSP]
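Before editing the Makefile paths, it can help to see where the OpenSP programs actually live on your system. A minimal sketch, assuming the programs are on your PATH; the function name check_tools is hypothetical.

```shell
#!/bin/sh
# Print where each named program is found on PATH, or flag it as missing.
# Returns non-zero if any program is missing.
# Hypothetical helper, not part of DLXS.
# Usage: check_tools PROGRAM...
check_tools() {
    missing=0
    for tool in "$@"; do
        if path=$(command -v "$tool"); then
            echo "$tool: $path"
        else
            echo "$tool: NOT FOUND"
            missing=1
        fi
    done
    return $missing
}
```

For example, check_tools onsgmls osgmlnorm osx lists the three programs used by the Makefile. Running ldd on a path it reports then shows which shared libraries that program needs and whether they can be resolved.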
 
- 
-
----
 
- 
-
'''Step 1: Validating the files individually against the EAD 2002 DTD'''
 
- 
-
<blockquote>
 
- 
-
cd $DLXSROOT/bin/w/workshopfa
 
-
make validateeach
 
-
 
-
 
-
The Makefile runs the following command:
 
-
% /l1/workshop/userXX/dlxs/prep/w/workshopfa/validateeach.csh
 
- 
-
</blockquote>
 
- 
-
What's happening: The Makefile is running the c-shell script [validateeach.html validateeach.csh] in the prep directory. The script creates a temporary file without the public DOCTYPE declaration and runs <span class="unixcommand">onsgmls</span> on each of the resulting XML files in the <span class="unixcommand">data</span> subdirectory to make sure they conform with the EAD 2002 DTD. If validation errors occur, error files will be in the <span class="unixcommand">data</span> subdirectory with the same name as the finding aids file but with an extension of <span class="unixcommand">.err</span>. If there are validation errors, fix the problems in the source XML files and re-run.
 
- 
-
Check the error files by running the following commands
 
- 
-
<blockquote>
 
- 
-
ls -l $DLXSROOT/prep/w/workshopfa/data/*err
 
- 
-
If there are any *err files, you can look at them with the following command: </blockquote><blockquote>
 
- 
-
less $DLXSROOT/prep/w/workshopfa/data/*err
 
- 
-
</blockquote>
 
- 
-
There are not likely to be any errors with the '''workshopfa''' data, but tell the instructor if there are.
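With many finding aids, it is easier to list only the error files that actually contain messages. This is a small sketch, assuming error files end in .err as described above; the function name report_err_files is hypothetical, and whether validateeach.csh leaves empty .err files behind is not stated here, so the check simply skips any that are empty.

```shell
#!/bin/sh
# List the non-empty *.err files under DIR, one per line, sorted by name.
# Hypothetical helper, not part of DLXS.
# Usage: report_err_files DIR
report_err_files() {
    find "$1" -name '*.err' -type f -size +0 | sort
}
```

For example, report_err_files $DLXSROOT/prep/w/workshopfa/data prints nothing when every file validated cleanly.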
 
- 
-
----
 
- 
-
'''Step 2: Concatenating the files into one larger XML file (and running some preprocessing commands) '''
 
- 
-
<blockquote>
 
- 
-
 
-
cd $DLXSROOT/bin/w/workshopfa
 
-
make prepdocs
 
- 
-
</blockquote><blockquote>
 
- 
-
The Makefile runs the following command:
 
-
$DLXSROOT/bin/w/workshopfa/preparedocs.pl $DLXSROOT/prep/w/workshopfa/data $DLXSROOT/obj/w/workshopfa/workshopfa.xml $DLXSROOT/prep/w/workshopfa/logfile.txt
 
- 
-
This runs the preparedocs.pl script on all the files in the specified data directory and writes the output to the workshopfa.xml file in the appropriate /obj subdirectory. It also outputs a logfile to the /prep directory:</blockquote>
 
- 
-
The Perl script does two sets of things:
 
- 
-
# Concatenates all the files
 
-
# Runs a number of preprocessing steps on all the files
 
- 
-
'''Concatenating the files '''
 
- 
-
The script finds all XML files in the <span class="unixcommand">data</span> subdirectory, and then strips off any XML declaration and DOCTYPE declaration from each file before concatenating them together. It also wraps the concatenated EADs in a &lt;COLL&gt; tag. The end result looks like:
 
- 
-
 
-
&lt;COLL&gt;<br />&lt;ead&gt;&lt;eadheader&gt;&lt;eadid&gt;1&lt;/eadid&gt;...&lt;/eadheader&gt;... content&lt;/ead&gt;<br />&lt;ead&gt;&lt;eadheader&gt;&lt;eadid&gt;2&lt;/eadid&gt;...&lt;/eadheader&gt;... content&lt;/ead&gt;<br />&lt;ead&gt;&lt;eadheader&gt;&lt;eadid&gt;3&lt;/eadid&gt;...&lt;/eadheader&gt;... content&lt;/ead&gt;<br />&lt;/COLL&gt;
 
-
 
- 
-
'''WARNING!''' If there are extra characters, or some other problem with the part of the program that strips out the XML declaration and the DOCTYPE declaration, the file will end up like:
 
- 
-
 
-
&lt;COLL&gt;<br />baddata&lt;ead&gt;&lt;eadheader&gt;&lt;eadid&gt;1&lt;/eadid&gt;...&lt;/eadheader&gt;... content&lt;/ead&gt;<br />baddata&lt;ead&gt;&lt;eadheader&gt;&lt;eadid&gt;2&lt;/eadid&gt;...&lt;/eadheader&gt;... content&lt;/ead&gt;<br />baddata&lt;ead&gt;&lt;eadheader&gt;&lt;eadid&gt;3&lt;/eadid&gt;...&lt;/eadheader&gt;... content&lt;/ead&gt;<br />&lt;/COLL&gt;
 
- 
-
This will cause the document to be invalid since the dlxsead2002.dtd does not allow anything between the closing tag of one &lt;/ead&gt; and the opening tag of the next one &lt;ead&gt;
 
- 
-
Some of the possible causes of such a problem are:
 
- 
-
* UTF-8 Byte Order Marks at the beginning of the file
 
-
* DOCTYPE declaration on more than one line
 
-
* XML processing instructions
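The three causes listed above can be checked for in each source file before running make prepdocs. The following is a rough sketch rather than anything shipped with DLXS: all four function names are hypothetical, and the DOCTYPE and processing-instruction checks are simple line-based heuristics, not an XML parser.

```shell
#!/bin/sh
# Hypothetical helpers, not part of DLXS. Each takes one file name.

# Succeeds if the file starts with the UTF-8 byte order mark (EF BB BF).
has_bom() {
    [ "$(od -An -tx1 -N3 "$1" | tr -d ' \n')" = "efbbbf" ]
}

# Writes the file to standard output without a leading byte order mark.
strip_bom() {
    if has_bom "$1"; then tail -c +4 "$1"; else cat "$1"; fi
}

# Succeeds if a line containing <!DOCTYPE has no closing > on that line,
# which suggests the declaration continues on a following line.
has_multiline_doctype() {
    grep '<!DOCTYPE' "$1" | grep -v '<!DOCTYPE[^>]*>' | grep -q .
}

# Succeeds if the file contains a <?...?> construct other than the
# XML declaration.
has_processing_instruction() {
    sed 's/<?xml [^?]*?>//' "$1" | grep -q '<?'
}
```

A loop over the data directory, calling each check on every *.xml file and printing the file name when a check succeeds, gives a list of files to fix or preprocess.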
 
- 
-
'''Preprocessing steps'''
 
- 
-
The perl program also does some preprocessing on all the files. These steps are customized to the needs of the Bentley. You should look at the perl code and modify it so it is appropriate for your encoding practices.
 
- 
-
The preprocessing steps are:
 
- 
-
* finds all id attributes and prepends a number to them
 
-
* adds a prefix string "dao-bhl" to all DAO links (You probably will want to change this)
 
-
* removes empty <span class="unixcommand">persname</span>, <span class="unixcommand">corpname</span>, and <span class="unixcommand">famname</span> elements
 
- 
-
The output of the combined concatenation and preprocessing steps will be the one collection named xml file which is deposited into the obj subdirectory.
 
- 
-
If your collections need to be transformed in any way, or if you do not want the transformations to take place (the DAO changes, for example), edit preparedocs.pl file to effect the changes. Some changes you may want to make include:
 
- 
-
* Changing the algorithm used to make id attributes unique. For example, if your encoding practices use id attributes and targets, the out-of-the-box algorithm will remove the relationship between the attributes and targets. One possible modification might be to modify the algorithm to prepend the eadid or filename to all id and target attributes.
 
-
* Modifying the program to read a list of files or list of eadids so that the files are concatenated in a particular order. The default sort order for search results is occurrence order, which translates to the order in which the EADs are concatenated. If you write a script which looks at the EADs for some element that you want to sort by and then outputs a list of filenames sorted by that order, you could then pass that file to a modified preparedocs.pl so it would concatenate the files in the order listed.
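As an illustration of the second modification, the script that produces the ordered list of filenames could look like the sketch below. It sorts by &lt;eadid&gt; purely as an example of "some element that you want to sort by". The function name list_by_eadid is hypothetical, it assumes the &lt;eadid&gt; element is on one line and that file names contain no tabs, and the out-of-the-box preparedocs.pl does not read such a list, so that side still has to be written.

```shell
#!/bin/sh
# Print the *.xml files in DIR, one per line, ordered by their <eadid> value.
# Hypothetical helper, not part of DLXS.
# Usage: list_by_eadid DIR
list_by_eadid() {
    for f in "$1"/*.xml; do
        # First <eadid> value found in the file (empty if none).
        id=$(sed -n 's/.*<eadid[^>]*>\([^<]*\)<\/eadid>.*/\1/p' "$f" | head -n 1)
        printf '%s\t%s\n' "$id" "$f"
    done | LC_ALL=C sort | cut -f 2
}
```

The output could be redirected to a file and handed to a modified preparedocs.pl.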
 
- 
-
----
 
- 
-
'''Step 3: Validating the concatenated file against the dlxsead2002 DTD'''
 
- 
-
<blockquote>
 
- 
-
make validate
 
- 
-
The Makefile runs the following command:
 
-
onsgmls -wxml -s -f $DLXSROOT/prep/w/workshopfa/workshopfa.errors $DLXSROOT/misc/sgml/xml.dcl $DLXSROOT/prep/w/workshopfa/workshopfa.xml.inp $DLXSROOT/obj/w/workshopfa/workshopfa.xml
 
- 
-
</blockquote>
 
- 
-
This runs the onsgmls command against the concatenated file using the ''dlxsead2002'' DTD, and writes any errors to the workshopfa.errors file in the appropriate subdirectory in $DLXSROOT/prep/c/collection. [validate.html More details]
 
- 
-
Note that we are running this using <span class="unixcommand">'''workshopfa.xml.inp'''</span> not <span class="unixcommand">'''workshopfa.text.inp'''</span>. The '''workshopfa.xml.inp '''file points to '''$DLXSROOT/misc/sgml/dlxsead2002.ead''' which is the ''dlxsead2002'' DTD. The ''dlxsead2002'' DTD is exactly the same as the ''EAD2002'' DTD, but adds a wrapping element, <span class="unixcommand">&lt;COLL&gt;</span>, to be able to combine more than one <span class="unixcommand">ead</span> element, more than one finding aid, into one file. The larger file will be indexed with XPAT tomorrow. It is, of course, a good idea to validate the file now before going further.
 
- 
-
Check for errors by looking for the file <span class="unixcommand">'''$DLXSROOT/prep/w/workshopfa/workshopfa.errors'''</span> which will be present and contain messages about what caused the file to be considered invalid if there are errors.
 
- 
-
If you see errors at this point (assuming there were no errors during the validateeach step), the likely cause is a problem with the preparedocs.pl processing. Some common causes of problems are:
 
- 
-
* The DOCTYPE declaration did not get completely removed. (The current scripts don't always remove multiline DOCTYPE declarations)
 
-
* There was a UTF-8 Byte Order Mark at the beginning of one or more of the concatenated files
 
- 
-
Run the following command
 
- 
-
<blockquote>
 
- 
-
ls -l $DLXSROOT/prep/w/workshopfa/workshopfa.errors
 
- 
-
</blockquote>
 
- 
-
If there is a workshopfa.errors file then run the following command to look at the errors reported
 
- 
-
<blockquote>
 
- 
-
less $DLXSROOT/prep/w/workshopfa/workshopfa.errors
 
- 
-
$ less $DLXSROOT/prep/w/workshopfa/workshopfa.errors<br /> onsgmls:/l1/dev/tburtonw/misc/sgml/xml.dcl:1:W: SGML declaration was not implied<br />
 
- 
-
The above error can be ignored, but if you see any other errors '''STOP!''' You need to determine the cause of the problem, fix it, and rerun the steps until there are no errors from make validate. If you continue with the next steps in the process with an invalid xml document, the errors will compound and it will be very difficult to trace the cause of the problem.
 
- 
-
Note: To avoid seeing this error add the "-w no-explicit-sgml-decl" flag to the Makefile on line 83. Change line 83 of the Makefile
 
- 
-
from
 
- 
-
<blockquote>
 
- 
-
onsgmls -wxml -s -f $(PREPDIR)$(NAMEPREFIX).errors $(XMLDECL) $(XMLDOCTYPE) $(XMLFILE)
 
- 
-
</blockquote> to <blockquote>
 
- 
-
onsgmls -wxml -w no-explicit-sgml-decl -s -f $(PREPDIR)$(NAMEPREFIX).errors $(XMLDECL) $(XMLDOCTYPE) $(XMLFILE)
 
- 
-
</blockquote>
 
- 
-
''This will be fixed in the next release of DLXS Findaid Class. ''
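If you prefer not to edit the Makefile, the ignorable warning described above can instead be filtered out when reading workshopfa.errors. A minimal sketch; the function name real_errors is hypothetical, and it matches the warning by the message text shown above.

```shell
#!/bin/sh
# Print every line of an onsgmls error file except the ignorable
# "SGML declaration was not implied" warning. Succeeds only if at least
# one other message remains. Hypothetical helper, not part of DLXS.
# Usage: real_errors ERRORFILE
real_errors() {
    [ -s "$1" ] || return 1
    grep -v 'SGML declaration was not implied' "$1"
}
```

For example, if real_errors $DLXSROOT/prep/w/workshopfa/workshopfa.errors prints anything, stop and fix the problem before continuing.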
 
- 
-
'''Step 4: Normalizing the concatenated file'''
 
- 
-
<blockquote>
 
- 
-
make norm
 
- 
-
The Makefile runs a series of copy statements and two main commands:
 
- 
-
 
-
1.) /l/local/bin/osgmlnorm -f $DLXSROOT/prep/s/samplefa/samplefa.errors $DLXSROOT/misc/sgml/xml.dcl $DLXSROOT/prep/s/samplefa/samplefa.xml.inp $DLXSROOT/obj/s/samplefa/samplefa.xml.prenorm &gt; /l1/dev/tburtonw/obj/s/samplefa/samplefa.xml.postnorm
 
- 
-
2.) /l/local/bin/osx -bUTF-8 -xlower -xempty -xno-nl-in-tag -f /l1/dev/tburtonw/prep/s/samplefa/samplefa.errors /l1/dev/tburtonw/misc/sgml/xml.dcl /l1/dev/tburtonw/prep/s/samplefa/samplefa.xml.inp /l1/dev/tburtonw/obj/s/samplefa/samplefa.xml.postnorm &gt; /l1/dev/tburtonw/obj/s/samplefa/samplefa.xml.postnorm.osx
 
- 
-
</blockquote>
 
- 
-
These commands ensure that your collection data is normalized. What this means is that any attributes are put in the order in which they were defined in the DTD. Even though your collection data is XML and attribute order should be irrelevant (according to the XML specification), due to a bug in one of the supporting libraries used by xmlrgn (part of the indexing software), attributes must appear in the order that they are defined in the DTD. If you have "out-of-order" attributes and don't run make norm, you will get ''"invalid endpoints"'' errors during the make post step.
 
- 
-
Step 1, which normalizes the document, writes its errors to <span class="unixcommand">$DLXSROOT/prep/s/samplefa/samplefa.errors</span>. Be sure to check this file.
 
- 
-
Step 2, which runs osx to convert the normalized document back into XML produces lots of error messages which are written to standard output. These are caused because we are using an XML DTD (the EAD 2002 DTD) and osx is using it to validate against the SGML document created by the osgmlnorm step. These are the only errors which may generally be ignored. However, if the next recommended step, which is to run "make validate" again reveals an invalid document, you may want to rerun osx and look at the errors for clues. (Only do this if you are sure that the problem is not being caused by XML processing instructions in the documents as explained below)
 
- 
-
'''Step 5: Validating the normalized file against the dlxsead2002 DTD'''
 
- 
-
<blockquote>
 
- 
-
make validate
 
- 
-
</blockquote>
 
- 
-
We run this step again to make sure that the normalization process did not produce an invalid document. This is necessary because under some circumstances the "make norm" step can result in invalid XML. One known cause is the presence of XML processing instructions, for example '''"&lt;?Pub Caret1?&gt;"'''. Although XML processing instructions are supposed to be ignored by any XML application that does not understand them, osgmlnorm and osx are SGML tools, and they end up munging the output XML. The recommended workaround is either to add a preprocessing step that removes any XML processing instructions from your EADs before you run "make prepdocs", or to add code to preparedocs.pl that strips out XML processing instructions before concatenating the EADs.
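
A preprocessing step of this kind could look like the following sketch. It is written in Python rather than as a change to preparedocs.pl, and it is an illustration under stated assumptions, not DLXS code: the names <code>strip_processing_instructions</code> and <code>strip_file</code> are hypothetical, and the regular expression assumes that no processing-instruction-like text appears inside comments or CDATA sections. The XML declaration itself is kept.

```python
import re

# Matches any processing instruction except the XML declaration.
# The (?!xml[\s?]) lookahead keeps '<?xml version="1.0"?>' but still removes
# instructions such as '<?Pub Caret1?>' or '<?xml-stylesheet ...?>'.
# Assumption: no PI-like text occurs inside comments or CDATA sections.
PI_PATTERN = re.compile(r"<\?(?!xml[\s?]).*?\?>", re.DOTALL)


def strip_processing_instructions(text: str) -> str:
    """Return text with processing instructions removed (declaration kept)."""
    return PI_PATTERN.sub("", text)


def strip_file(src_path: str, dest_path: str, encoding: str = "utf-8") -> int:
    """Write a cleaned copy of src_path to dest_path; return the number removed."""
    # newline="" leaves the original line endings untouched.
    with open(src_path, encoding=encoding, newline="") as src:
        text = src.read()
    cleaned, count = PI_PATTERN.subn("", text)
    with open(dest_path, "w", encoding=encoding, newline="") as dest:
        dest.write(cleaned)
    return count
```

Running <code>strip_file</code> over each EAD before "make prepdocs" and keeping the originals lets you compare the two versions if "make validate" still fails afterwards.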
 
- 
-
==Building the Index==
 
-
==Working with Fabricated Regions==
 
-
==Modifying Findaid Class Files==
 
-
==Mounting the Collection Online==
 
-
==Troubleshooting==
 
-
==Linking from Finding Aids Using ID Resolver==
 
-
==Workshop materials==
 
-
==Working with the User Interface==
 
-
===[[Findaid Class Graphics Files]]===
 
===[[Findaid Class Processing Instructions]]===
-
 
+
These are some current processing instructions for Finding Aids Class, but the DLXS group will not maintain this section.
-
 
+
-
 
+
[[#top|Top]]

Current revision



This topic describes how to mount a Findaid Class collection.

Workshop materials are located at http://www.dlxs.org/training/workshop200707/findaidclass/fcoutline.html


Overview

The Finding Aids Class is in many ways similar in behavior to Text Class. Access minimally includes full text searching across collections or within a particular collection of Finding Aids, viewing Finding Aids in a variety of display formats, and creation of personal collections ("bookbag") of Finding Aids.

To mount a Finding Aids Collection, you will need to complete the following steps:

  1. Prepare your data and set up a directory structure
  2. Validate and normalize your data
  3. Build the Index
  4. Mount the collection online

Findaid Class Behaviors Overview

This section describes the basic Findaid Class behaviors.

Examples of Findaid Class Implementations and Practices

This section contains links to public implementations of DLXS Findaid Class as well as documentation on workflow and implementation issues. If you are a member of DLXS and have a collection or resource you would like to add, or wish to add more information about your collection, please edit this section.

  * University of Michigan, Bentley Historical Library Finding Aids
    * Out-of-the-box DLXS 13 implementation.
    * Overview of Bentley's workflow process for Finding Aids
    * See also the links in Practical EAD Encoding Issues for background on the Bentley EAD workflow and encoding practices
  * University of Tennessee Special Collections Libraries
    * DLXS Findaid Class version ?
  * University of Pittsburgh, Historic Pittsburgh Finding Aids
    * DLXS Findaid Class version ?
    * Background on Pittsburgh Finding Aids workflow
  * University of Wisconsin, Archival Resources in Wisconsin: Descriptive Finding Aids
    * DLXS Findaid Class version ?
  * University of Minnesota Libraries, Online Finding Aids
    * DLXS Findaid Class version ?
    * EAD Implementation at the University of Minnesota
  * Getty Research Institute Special Collections Finding Aids
    * Heavily customized DLXS11a. Background on Getty customization and user interface changes to DLXS
  * J. Paul Getty Trust Institutional Archives Finding Aids
    * Heavily customized DLXS11a.

Working with the EAD

Preparing Data and Directories

Finding Aids Data Preparation

Building the Index

Working with Fabricated Regions in Findaid Class

Customizing Findaid Class

Mounting the Collection Online

Troubleshooting Finding Aids

Linking from Finding Aids Using ID Resolver

Workshop Materials

Working with the User Interface

General user interface customizations, such as changing the rendering style (CSS) or modifying the XSL, are covered in Customizing the User Interface. User interface issues specific to Findaid Class are discussed in the following sections:

Findaid Class Graphics Files

Are there Findaid Class-specific graphics files? The existing HTML docs actually point to a ../t/text/ directory, and it appears that the graphics are generic and not at all specific to Findaid Class.

Findaid Class Processing Instructions

These are some current processing instructions for Finding Aids Class, but the DLXS group will not maintain this section.
