Working with XPAT

From DLXS Documentation

(Difference between revisions)
Current revision (21:18, 13 August 2007)
 
Line 5:
<p>XPAT is well suited to digital library applications, especially those involving large amounts of text. The indexes XPAT builds are based on a structure called a Patricia tree, which indexes substrings of the entire text. These substrings are known as semi-infinite strings, or sistrings. (For more information, see <i>Information Retrieval: Data Structures and Algorithms</i>, W. B. Frakes and R. S. Baeza-Yates, 1992.) Each semi-infinite string starts at some offset in the text file and extends to the end of the file, so the strings always overlap one another. This structure supports efficient string and phrase searching, which we believe is critical for digital library systems. XPAT can also index SGML and XML elements, attributes, and tags, which lets it support complex searches that reach into "regions" of text (including nested regions) defined by the markup, further aiding full-text retrieval.</p>
<ul>
- <li>[[XPAT Manual]] (currently incomplete online; DLXS customers are provided a printed volume of documentation to PAT 5.0, the precursor to XPAT).</li>
+ <li>XPAT Manual (currently incomplete online; DLXS customers are provided a printed volume of documentation to PAT 5.0, the precursor to XPAT).</li>
+ The XPAT manual is still located in the HTML version of the DLXS documentation. [http://dev-linux.umdl.umich.edu/d/dlxs/docs/13/xpat/manual.html You can find it here.]
<li>[http://dev-linux.umdl.umich.edu/d/dlxs/docs/13/xpat/commands.html The XPAT command, syntax, and concept guide is provided online.]</li>

Current revision


The XPAT search engine is a licensed software program, made available through the University of Michigan's Digital Library eXtension Service. It is based on the Open Text Corporation's pat50 source code, but has been enhanced by the Digital Library Production Service for DLXS.

XPAT is well suited to digital library applications, especially those involving large amounts of text. The indexes XPAT builds are based on a structure called a Patricia tree, which indexes substrings of the entire text. These substrings are known as semi-infinite strings, or sistrings. (For more information, see Information Retrieval: Data Structures and Algorithms, W. B. Frakes and R. S. Baeza-Yates, 1992.) Each semi-infinite string starts at some offset in the text file and extends to the end of the file, so the strings always overlap one another. This structure supports efficient string and phrase searching, which we believe is critical for digital library systems. XPAT can also index SGML and XML elements, attributes, and tags, which lets it support complex searches that reach into "regions" of text (including nested regions) defined by the markup, further aiding full-text retrieval.
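
As a rough illustration of the ideas in the paragraph above, the following Python sketch indexes the semi-infinite strings that begin at each word start, then answers a phrase query and restricts the hits to a markup region. It is not XPAT's implementation or query syntax: a sorted list of offsets stands in for the Patricia tree, regions are located with a simple regular expression that does not handle nesting, and the names build_sistring_index, find_phrase, find_regions, and within are hypothetical, chosen only for this example.

```python
import bisect
import re


def build_sistring_index(text):
    """Return word-start offsets, ordered by the semi-infinite string
    (the text from that offset to the end) that begins at each one.
    A sorted list is an assumption standing in for a Patricia tree."""
    starts = [m.start() for m in re.finditer(r"\w+", text)]
    return sorted(starts, key=lambda i: text[i:])


def find_phrase(text, index, phrase):
    """Return sorted offsets whose semi-infinite string begins with `phrase`."""
    n = len(phrase)
    # Truncating sorted strings to n characters keeps them in sorted order,
    # so a binary search over the prefixes is valid. Building this list on
    # every query is wasteful; it only keeps the sketch short.
    prefixes = [text[i:i + n] for i in index]
    lo = bisect.bisect_left(prefixes, phrase)
    hi = bisect.bisect_right(prefixes, phrase)
    return sorted(index[lo:hi])


def find_regions(text, tag):
    """Return (start, end) offsets of each <tag>...</tag> region.
    Simplification: same-named nested regions are not handled."""
    name = re.escape(tag)
    pattern = re.compile(r"<%s\b[^>]*>.*?</%s>" % (name, name), re.S)
    return [m.span() for m in pattern.finditer(text)]


def within(hits, regions):
    """Keep only the hits that fall inside at least one region."""
    return [h for h in hits if any(s <= h < e for s, e in regions)]


sample = ("<doc><title>digital library systems</title>"
          "<p>a digital library needs phrase search</p></doc>")
sample_index = build_sistring_index(sample)
all_hits = find_phrase(sample, sample_index, "digital library")
title_hits = within(all_hits, find_regions(sample, "title"))
print(all_hits, title_hits)
```

Because every indexed string runs to the end of the text, a phrase of any length is just a prefix of some entry, which is why one sorted structure can serve both single-word and multi-word queries.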

The University of Michigan Digital Library Production Service is committed to continuing to fix bugs and add features to the XPAT search engine. DLXS Release 11 includes a major addition to support Unicode indexing.
