Creating Text Class Wordwheels

From DLXS Documentation

Revision as of 10:09, 30 September 2007

General Information

The word wheel tools extract each word from the texts in a collection and write it to a small SGML file, together with a count of the number of times the word appears in the collection. The SGML file is then normalized and indexed so that the Text Class middleware can use it.
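
The counting step can be illustrated with a minimal Python sketch. It is not the DLXS implementation (that work is done by makeWordWheelFiles.pl, described below), and it rests on several assumptions: tags are stripped with a simple regular expression, words are lowercased runs of letters, digits, and apostrophes, character entity references are not expanded, and the element names WW, WWENTRY, WORD, and COUNT are hypothetical placeholders rather than the format the DLXS tools actually write.

```python
import re
from collections import Counter
from xml.sax.saxutils import escape

# Hypothetical tokenizer: runs of word characters and apostrophes.
# The real DLXS tools may split and normalize words differently.
WORD_RE = re.compile(r"[\w']+")

def strip_tags(sgml_text):
    """Replace SGML tags with spaces so only text content is counted.
    Entity references such as &eacute; are NOT handled in this sketch."""
    return re.sub(r"<[^>]*>", " ", sgml_text)

def count_words(sgml_texts):
    """Count how often each (lowercased) word appears across all texts."""
    counts = Counter()
    for text in sgml_texts:
        counts.update(w.lower() for w in WORD_RE.findall(strip_tags(text)))
    return counts

def to_wordwheel_sgml(counts):
    """Emit one entry per word in alphabetical order (hypothetical markup)."""
    lines = ["<WW>"]
    for word in sorted(counts):
        lines.append(
            f"<WWENTRY><WORD>{escape(word)}</WORD>"
            f"<COUNT>{counts[word]}</COUNT></WWENTRY>"
        )
    lines.append("</WW>")
    return "\n".join(lines)

if __name__ == "__main__":
    docs = ["<P>The cat sat.</P>", "<P>The dog sat too.</P>"]
    print(to_wordwheel_sgml(count_words(docs)))
```

Counting across the whole collection, rather than per text, matches the description above of a count per word for the collection as a whole.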

To make word wheels available for your collection, you must both build the word wheel and fill in the appropriate fields in the collection manager. The wwdd field gives the location of the word wheel index's data dictionary (usually /idx/c/collid/WW/collid.ww.dd). The wwrealms field identifies the fields available in the word wheel (e.g., full text, author, title), and the wwrealmseng field indicates how each of them should appear in the interface (e.g., as "Full Text", "all the words", or some other variation on "full text").
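
The three collection manager fields can be pictured as one small record. The Python sketch below is a hypothetical model, not the way the collection manager stores them: it assumes the c segment of the usual wwdd path is the first letter of the collection id (the common DLXS directory layout), uses made-up realm names, and checks that every realm in wwrealms has exactly one label in wwrealmseng.

```python
def wordwheel_settings(collid, realms, labels, letter=None):
    """Assemble hypothetical word wheel settings for one collection.

    realms: fields the word wheel covers (names here are made up)
    labels: interface wording for each realm, in the same order
    letter: the "c" path segment; assumed to default to collid's first letter
    """
    if len(realms) != len(labels):
        raise ValueError("each wwrealms entry needs exactly one wwrealmseng label")
    c = letter or collid[0]
    return {
        # Usual wwdd value from the text, with the assumed "c" segment filled in.
        "wwdd": f"/idx/{c}/{collid}/WW/{collid}.ww.dd",
        "wwrealms": list(realms),
        "wwrealmseng": list(labels),
    }

settings = wordwheel_settings(
    "mycoll", ["full", "author", "title"], ["Full Text", "Author", "Title"]
)
```

Keeping realms and labels as parallel lists makes a missing or extra label an immediate error instead of a mislabeled menu entry.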

Building the Word Wheel

The DLXS release includes files in the directory $DLXSROOT/bin/WW to help you build the word wheel. The SGML file this process creates is stored in the directory /l1/prep/c/collid/WW, which must exist before you begin running the scripts. The normalized SGML produced by the complete word wheel build is stored with the collection SGML in /l1/obj/c/collid, and the indexes and data dictionary are stored in /l1/idx/c/collid/WW/.
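
The three locations can be derived from the collection id, as in the Python sketch below. It assumes the /l1 prefix is the installation root (the same directories appear under $DLXSROOT in the steps that follow) and that the c segment is the first letter of the collection id; both are parameters so they can be adjusted. The function names are hypothetical. Only the prep directory is created, since the text says it must exist before the scripts run.

```python
from pathlib import Path

def wordwheel_dirs(dlxsroot, collid, letter=None):
    """Return the prep, obj, and idx directories used by the word wheel build.

    Assumption: the "c" segment is the first letter of the collection id
    (the usual DLXS layout); pass `letter` to override it.
    """
    c = letter or collid[0]
    root = Path(dlxsroot)
    return {
        "prep": root / "prep" / c / collid / "WW",  # unnormalized word wheel SGML
        "obj": root / "obj" / c / collid,           # normalized SGML, beside the collection SGML
        "idx": root / "idx" / c / collid / "WW",    # indexes and data dictionary
    }

def ensure_prep_dir(dlxsroot, collid, letter=None):
    """Create the prep WW directory, which must exist before the scripts run."""
    prep = wordwheel_dirs(dlxsroot, collid, letter)["prep"]
    prep.mkdir(parents=True, exist_ok=True)
    return prep
```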

  1. Copy $DLXSROOT/bin/WW/sample.ww.blank.dd to $DLXSROOT/idx/c/collid/WW/collid.ww.blank.dd and edit it to reflect the name of your collection.
  2. Copy $DLXSROOT/bin/WW/sample.ww.inp to $DLXSROOT/idx/c/collid/WW/collid.ww.inp and edit it to add, or point to, any character entity declarations necessary for this collection.
  3. Copy $DLXSROOT/bin/WW/Makefile to $DLXSROOT/idx/c/collid/WW/Makefile and edit it.
  4. Copy $DLXSROOT/bin/WW/makeWordWheelFiles.sample.cfg to $DLXSROOT/idx/c/collid/WW/makeWordWheelFiles.cfg and edit it to point to the proper directories.
  5. cd to $DLXSROOT/idx/c/collid/WW and run:

     % $DLXSROOT/bin/WW/makeWordWheelFiles.pl makeWordWheelFiles.cfg

     This will create collid.ww.unnorm.sgm in $DLXSROOT/prep/c/collid/WW.
  6. The Makefile then normalizes collid.ww.unnorm.sgm (writing the result to $DLXSROOT/obj/c/collid) and indexes it, creating an XPAT-indexed word wheel file for your collection.
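
Steps 1 through 5 can be scripted, as in the Python sketch below, although the edits in steps 1 to 4 still have to be made by hand. The sketch assumes the Makefile and makeWordWheelFiles.cfg belong in $DLXSROOT/idx/c/collid/WW, the directory step 5 runs from, and that c is the first letter of the collection id; the function names are hypothetical. Step 6 is not covered because the Makefile targets are not described here.

```python
import shutil
import subprocess
from pathlib import Path

def template_copies(dlxsroot, collid, letter=None):
    """List (template, destination) pairs for steps 1-4.

    Assumes all copies go to the idx WW directory that step 5 runs from;
    the "c" segment defaults to the first letter of collid.
    """
    root = Path(dlxsroot)
    src = root / "bin" / "WW"
    dst = root / "idx" / (letter or collid[0]) / collid / "WW"
    return [
        (src / "sample.ww.blank.dd", dst / f"{collid}.ww.blank.dd"),        # step 1
        (src / "sample.ww.inp", dst / f"{collid}.ww.inp"),                  # step 2
        (src / "Makefile", dst / "Makefile"),                               # step 3
        (src / "makeWordWheelFiles.sample.cfg", dst / "makeWordWheelFiles.cfg"),  # step 4
    ]

def copy_templates(dlxsroot, collid, letter=None):
    """Copy the templates into place; each copy still needs hand editing."""
    for template, dest in template_copies(dlxsroot, collid, letter):
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(template, dest)

def run_make_word_wheel_files(dlxsroot, collid, letter=None):
    """Step 5: run makeWordWheelFiles.pl from the collection's idx WW directory."""
    workdir = template_copies(dlxsroot, collid, letter)[0][1].parent
    script = Path(dlxsroot) / "bin" / "WW" / "makeWordWheelFiles.pl"
    subprocess.run([str(script), "makeWordWheelFiles.cfg"], cwd=workdir, check=True)
```

Separating the list of copies from the copying makes it easy to print the plan and check the paths before anything is written.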

Note 1: The input to makeWordWheelFiles.pl, as specified in the .cfg file, can consist of one or more .sgm files: a single file for a collection indexed as one file, or several files for a collection indexed through multi-file system (MFS) indexing.

Note 2: The configuration (.cfg) file can specify an array of .dd files for collections that have multiple indexes.

Currently these two mechanisms are mutually exclusive: either a single collection has multiple .dd files, or a collection of multiple .sgm files has a single index.
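
The restriction in the notes above can be expressed as a small validation check. This Python sketch assumes the .cfg inputs have already been read into two lists (the real .cfg syntax is not shown here) and that at least one file of each kind is required; the function name and return labels are hypothetical.

```python
def check_ww_inputs(sgm_files, dd_files):
    """Classify a word wheel input combination, rejecting the unsupported one.

    sgm_files: SGML inputs listed in the .cfg (hypothetical representation)
    dd_files:  data dictionaries listed in the .cfg (hypothetical representation)
    """
    # Assumption: the build needs at least one input of each kind.
    if not sgm_files or not dd_files:
        raise ValueError("at least one .sgm file and one .dd file are required")
    # Multiple .sgm files (MFS) and multiple .dd files cannot be combined.
    if len(sgm_files) > 1 and len(dd_files) > 1:
        raise ValueError("multiple .sgm files and multiple .dd files are mutually exclusive")
    if len(sgm_files) > 1:
        return "multi-file"   # several .sgm files, one index
    if len(dd_files) > 1:
        return "multi-index"  # one collection, several .dd files
    return "single"
```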
