Working with Text Class Data and Directories

From DLXS Documentation

Revision as of 13:57, 28 September 2007 by Cboulay (Talk | contribs)

Setting up directories

You will need to identify directories for your source files, your converted and concatenated Text Class XML file, your index file (approximately 75% of the size of your SGML source), your "region" files, other information such as data dictionaries, and the files you use to prepare your data. We recommend the following structure:

  • Store specialized scripts for your collection and its Makefile in $DLXSROOT/bin/c/collid/, where $DLXSROOT is the "tree" where you install all DLXS components, c is the first letter of the name of the collection you are indexing, and collid is the collection ID of the collection you are indexing. For example, if your collection ID is "moa" and your DLXSROOT is "/l1", you will place the Makefile in /l1/bin/m/moa/, e.g., /l1/bin/m/moa/Makefile. See directory conventions for more information.
  • Store your source texts and any DTDs, doctype, and files for preparing your data in $DLXSROOT/prep/c/collid/. Unlike the contents of other directories, everything in prep should be ultimately expendable in the production environment.
  • Store the finalized, concatenated Text Class XML file for your text collection in $DLXSROOT/obj/c/collid/, e.g., /l1/obj/m/moa/moa.xml.
  • Store index, region, data dictionary, and init files in $DLXSROOT/idx/c/collid/, e.g., /l1/idx/m/moa/moa.idx. See the XPAT documentation for more on these types of files.
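
The layout above can be sketched in a few lines of Python. This is only an illustration of the naming convention, not part of DLXS; the function name collection_dirs is hypothetical, and the "/l1" and "moa" values are taken from the example in the text.

```python
from pathlib import PurePosixPath


def collection_dirs(dlxsroot: str, collid: str) -> dict:
    """Return the recommended bin, prep, obj, and idx directories for a collection.

    Hypothetical helper: it only builds path names and does not touch the disk.
    """
    first = collid[0]  # "c" in the text: first letter of the collection ID
    root = PurePosixPath(dlxsroot)
    return {
        area: root / area / first / collid
        for area in ("bin", "prep", "obj", "idx")
    }


for area, path in collection_dirs("/l1", "moa").items():
    print(f"{area}: {path}")
# bin: /l1/bin/m/moa
# prep: /l1/prep/m/moa
# obj: /l1/obj/m/moa
# idx: /l1/idx/m/moa
```

Keeping all four paths in one place makes it harder for a script to put, say, the index file under prep by mistake.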

The files located in $DLXSROOT/bin/s/sampletc_utf8 and $DLXSROOT/prep/s/sampletc_utf8 should be copied into your collection directories and used to index your collection. The following files may need to be edited so that the #! line points to the location of perl on your system:

  • $DLXSROOT/bin/t/text/isolat128bit.pl
  • $DLXSROOT/bin/t/text/output.dd.frag.pl
  • $DLXSROOT/bin/t/text/inc.extra.dd.pl
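
If you would rather script this edit than make it by hand, something like the following Python sketch could work. It is not a DLXS tool: set_shebang and fix_scripts are hypothetical names, and the sketch assumes each script's first line names perl directly (a line that goes through "env" is left alone, since it already finds perl on the PATH). Any flags after the interpreter, such as -w, are kept.

```python
from pathlib import Path

# The three scripts named in the list above, relative to $DLXSROOT.
PERL_SCRIPTS = (
    "bin/t/text/isolat128bit.pl",
    "bin/t/text/output.dd.frag.pl",
    "bin/t/text/inc.extra.dd.pl",
)


def set_shebang(script_text: str, perl_path: str) -> str:
    """Return script_text with the interpreter on its #! line set to perl_path."""
    first, sep, rest = script_text.partition("\n")
    if not first.startswith("#!"):
        return script_text  # no #! line, so nothing to change
    tokens = first[2:].split()
    if tokens and tokens[0].endswith("/env"):
        return script_text  # env already looks perl up on the PATH
    flags = tokens[1:]  # keep options such as -w
    new_first = " ".join(["#!" + perl_path, *flags])
    return new_first + sep + rest


def fix_scripts(dlxsroot: str, perl_path: str) -> None:
    """Rewrite the #! line of each listed script that exists under dlxsroot."""
    for rel in PERL_SCRIPTS:
        script = Path(dlxsroot) / rel
        if script.is_file():
            script.write_text(set_shebang(script.read_text(), perl_path))
```

For example, fix_scripts("/l1", "/usr/bin/perl") would update the three scripts under /l1, where /usr/bin/perl stands in for whatever "which perl" reports on your system.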

The following files will need to be edited to reflect your collection names and paths:

  • $DLXSROOT/bin/s/sampletc_utf8/Makefile
  • $DLXSROOT/prep/s/sampletc_utf8/sampletc_utf8.blank.dd
  • $DLXSROOT/prep/s/sampletc_utf8/sampletc_utf8.extra.srch
  • $DLXSROOT/prep/s/sampletc_utf8/sampletc_utf8.inp
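
The copy step described earlier could be scripted along these lines. This is a sketch under assumptions rather than a DLXS utility: copy_sample_files is a hypothetical name, and it requires Python 3.8 or later for the dirs_exist_ok argument. The copied files keep their sampletc_utf8 names; the sketch does not rename them or make the edits listed above, which you still need to do by hand.

```python
import shutil
from pathlib import Path

SAMPLE = "sampletc_utf8"  # sample collection named in the text


def copy_sample_files(dlxsroot, collid, sample=SAMPLE):
    """Copy the sample bin and prep directories into the collection's own.

    Returns the destination directories. Files already present in a
    destination with the same name are overwritten.
    """
    root = Path(dlxsroot)
    copied = []
    for area in ("bin", "prep"):
        src = root / area / sample[0] / sample
        dst = root / area / collid[0] / collid
        # Creates dst (and its parents) if needed, then copies the tree.
        shutil.copytree(src, dst, dirs_exist_ok=True)
        copied.append(dst)
    return copied
```

Calling copy_sample_files("/l1", "moa") would populate /l1/bin/m/moa and /l1/prep/m/moa from the sample collection, after which the Makefile and the .blank.dd, .extra.srch, and .inp files can be edited in place.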
