Working with XPAT

From DLXS Documentation

Current revision (16:49, 4 March 2009)

<ul>
  <li>[[Full XPAT Manual]]</li>
  <li>[[XPAT command, syntax, and concept guide]]</li>
  <li>[[XPAT FAQ]]</li>
</ul>

The XPAT search engine is a licensed software program, made available through the University of Michigan's Digital Library eXtension Service. It is based on the Open Text Corporation's pat50 source code, but has been enhanced by the Digital Library Production Service for DLXS.

XPAT is ideally suited to digital library applications, especially those involving large amounts of text. The indexes XPAT builds are based on a structure called a Patricia tree, which indexes substrings of the entire text. These substrings are known as semi-infinite strings, or sistrings. (For more information, see Information Retrieval: Data Structures and Algorithms, W. B. Frakes and R. Baeza-Yates, eds., 1992.) Each semi-infinite string starts at some offset in the text file and extends to the end of the file, so the strings always overlap one another. This property helps provide excellent string and phrase searching, which we believe is critical for digital library systems. XPAT can also index SGML and XML elements, attributes, and tags, which lets it support complex searches that reach into "regions" of text (including nested regions) defined by the markup, further aiding full-text retrieval.
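
The idea of semi-infinite strings and regions can be illustrated with a small Python sketch. It is not XPAT's implementation: it uses a sorted list of offsets in place of a Patricia tree, it assumes sistrings begin only at word starts, and it handles only non-nested regions with a regular expression. All function names are hypothetical and chosen for illustration.

```python
import bisect
import re


def build_sistring_index(text):
    """Return word-start offsets, sorted by the sistring that begins there."""
    starts = [m.start() for m in re.finditer(r"\w+", text)]
    # Each sistring runs from its offset to the end of the text.
    return sorted(starts, key=lambda i: text[i:])


def find_phrase(text, index, phrase):
    """Return the offsets (ascending) of sistrings that begin with `phrase`."""
    # Truncating sorted sistrings to len(phrase) keeps them in sorted order,
    # so the matches form one contiguous block found by binary search.
    # (Building this list per query is for clarity only, not efficiency.)
    prefixes = [text[i:i + len(phrase)] for i in index]
    lo = bisect.bisect_left(prefixes, phrase)
    hi = bisect.bisect_right(prefixes, phrase)
    return sorted(index[lo:hi])


def find_regions(text, tag):
    """Return (start, end) spans of non-nested <tag>...</tag> elements."""
    name = re.escape(tag)
    pattern = re.compile(r"<%s\b[^>]*>.*?</%s>" % (name, name), re.DOTALL)
    return [m.span() for m in pattern.finditer(text)]


def matches_within(offsets, regions):
    """Keep only the offsets that fall inside one of the given regions."""
    return [o for o in offsets if any(s <= o < e for s, e in regions)]


text = "<doc><title>the cat</title><p>the cat sat</p></doc>"
index = build_sistring_index(text)
hits = find_phrase(text, index, "the cat")
print(hits)                                               # [12, 30]
print(matches_within(hits, find_regions(text, "title")))  # [12]
```

Because every sistring extends to the end of the text, a phrase of any length is simply a prefix of some sistring, which is why one sorted index can answer both word and phrase queries.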

The University of Michigan Digital Library Production Service remains committed to fixing bugs and adding features to the XPAT search engine. DLXS Release 11 includes a major addition: support for Unicode indexing.
