Creating Text Class Wordwheels

From DLXS Documentation

Current revision (09:04, 7 June 2011)

General Information

The word wheel tools extract each word in a text and build it into a small SGML file, along with a count of the number of times the word appears in a given collection. The SGML file is then normalized and indexed, ready to be used by the Text Class middleware.
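To make the "word plus count" idea concrete, here is a minimal shell sketch. It is not how the DLXS word wheel tools work internally, and it produces tab-separated text rather than SGML; the function name count_words is hypothetical and used only for illustration.

```shell
# Illustrative only: list each distinct word in a text stream with its count.
# This does NOT reproduce makeWordWheelFiles.pl; it only shows the idea of
# pairing every word with the number of times it occurs.
count_words() {
  tr -cs '[:alpha:]' '\n' |          # turn every run of non-letters into a newline
    tr '[:upper:]' '[:lower:]' |     # fold case so "The" and "the" are one word
    sed '/^$/d' |                    # drop any empty line left at the start
    sort |                           # group identical words together
    uniq -c |                        # prefix each distinct word with its count
    awk '{ print $2 "\t" $1 }'       # emit "word<TAB>count"
}

printf 'The whale, the sea, and the ship.\n' | count_words
```

For the sample sentence this prints five lines: and 1, sea 1, ship 1, the 3, whale 1.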

To make word wheels available for your collection, you must both build the word wheel and fill in the appropriate fields in the collection manager: the wwdd field, which gives the location of the index (usually /idx/c/collid/WW/collid.ww.dd), and the wwrealms and wwrealmseng fields, which identify the fields available (e.g., full text, author, title) and indicate how they should appear in the interface (e.g., as "Full Text", "all the words", or some other variation on "full text").

Building the Word Wheel

In the DLXS release, there are files in the directory $DLXSROOT/bin/WW to help you build the word wheel. The SGML file that this process creates is stored in the directory /l1/prep/c/collid/WW, which must exist before you begin running the scripts. The normalized SGML that results from running all the word wheel creation steps is stored with the collection SGML in /l1/obj/c/collid, and the indexes and data dictionary are stored in /l1/idx/c/collid/WW/.
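Because the prep directory must exist before the scripts run, it can help to create the working directories up front. The following is a minimal sketch, assuming the directory layout described above; the function name make_ww_dirs and its three arguments (root directory, one-letter subdirectory, collection id) are hypothetical.

```shell
# Hypothetical helper: create the word wheel working directories.
# Arguments: root directory, one-letter subdirectory, collection id.
make_ww_dirs() {
  root=$1
  sub=$2
  collid=$3
  # prep/.../WW receives the unnormalized SGML; idx/.../WW holds the
  # indexes and data dictionary. mkdir -p is a no-op if they already exist.
  mkdir -p "$root/prep/$sub/$collid/WW" "$root/idx/$sub/$collid/WW"
}
```

For the paths used on this page, the call would be make_ww_dirs /l1 c collid.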

  1. Copy $DLXSROOT/bin/WW/sample.ww.blank.dd to $DLXSROOT/idx/c/collid/WW/collid.ww.blank.dd and edit it to reflect the name of your collection.
  2. Copy $DLXSROOT/bin/WW/Makefile to $DLXSROOT/idx/c/collid/idx/WW/Makefile and edit.
  3. cd to $DLXSROOT/idx/c/collid/WW and run:
    % $DLXSROOT/bin/WW/makeWordWheelFiles.pl makeWordWheelFiles.cfg
    This creates collid.ww.unnorm.sgm in $DLXSROOT/prep/c/collid/WW.
  4. collid.ww.unnorm.sgm is then normalized (in $DLXSROOT/obj/c/collid) and indexed by the Makefile, creating an XPAT-indexed word wheel file for your collection.
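Before running step 3, a quick check that the expected pieces are in place can save a failed run. The sketch below is not part of the DLXS release: the function name check_ww_prereqs is hypothetical, and it only tests for the prep directory, the collid.ww.blank.dd file from step 1, and the makeWordWheelFiles.cfg file that step 3 reads from the current directory.

```shell
# Hypothetical helper: report which prerequisites for step 3 are missing.
# Arguments: root directory, one-letter subdirectory, collection id.
# Returns 0 when everything is present, 1 otherwise.
check_ww_prereqs() {
  root=$1
  sub=$2
  collid=$3
  missing=0
  # The prep directory must exist before the scripts are run.
  if [ ! -d "$root/prep/$sub/$collid/WW" ]; then
    echo "missing directory: $root/prep/$sub/$collid/WW"
    missing=1
  fi
  # Files expected in the index directory where step 3 is run.
  for f in "$collid.ww.blank.dd" makeWordWheelFiles.cfg; do
    if [ ! -f "$root/idx/$sub/$collid/WW/$f" ]; then
      echo "missing file: $root/idx/$sub/$collid/WW/$f"
      missing=1
    fi
  done
  return "$missing"
}
```

Called as check_ww_prereqs "$DLXSROOT" c collid, it prints one line per missing item and returns a non-zero status if anything is absent.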

Note 1: Input to makeWordWheelFiles.pl as specified in the .cfg can consist of one or more .sgm files, e.g., collections indexed with a single file or collections indexed through multi-file system indexing (MFS).

Note 2: The configuration (.cfg) file can specify an array of dd files for collections that have multiple indexes. Currently, these two mechanisms are mutually exclusive: either a single collection has multiple .dd files, or a collection of multiple .sgm files has a single index.
