Working with XPAT
From DLXS Documentation
The XPAT search engine is a licensed software program, made available through the University of Michigan's Digital Library eXtension Service. It is based on the Open Text Corporation's pat50 source code, but has been enhanced by the Digital Library Production Service for DLXS.
XPAT is ideally suited to digital library applications, especially those involving large amounts of text. The indexes XPAT builds are based on a structure called a Patricia tree, which is built over substrings of the entire text. These substrings are known as semi-infinite strings, or sistrings. (For more information, see Information Retrieval: Data Structures and Algorithms, W. B. Frakes and R. S. Baeza-Yates, 1992.) Each sistring starts at some offset in the text file and extends to the end of the file, so sistrings always overlap one another. This property helps provide excellent string and phrase searching, which we believe is critical for digital library systems.

Another important feature of XPAT is its ability to index SGML and XML elements, attributes, and tags. This lets it support complex searches that reach into "regions" of text (and even nested regions) defined by the markup, further aiding full-text retrieval.
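The sistring idea can be illustrated with a small sketch. It is not XPAT's implementation. Instead of a Patricia tree, it sorts the start offsets of every sistring, producing a sorted array of sistrings, often called a suffix array or PAT array. This array shares the key property that all sistrings beginning with a given string sit in one contiguous block, so any string or phrase is found with two binary searches. The sketch also approximates region search by pairing the start and end tags of one element name. The function names (`build_sistring_index`, `find`, `regions`, `within`) are hypothetical and not part of XPAT's command language, and the region code assumes well-formed markup with no self-closing tags.

```python
import re


def build_sistring_index(text):
    """Return sistring start offsets sorted by the sistring text[i:].

    Hypothetical helper: a naive suffix array (PAT array), not XPAT's Patricia tree.
    Sorting whole suffixes is O(n^2 log n), which is fine only for toy inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find(text, index, query):
    """Return the text offsets where query occurs, using two binary searches.

    All sistrings that start with query form one contiguous block of the index,
    because truncating sorted strings to len(query) characters keeps them sorted.
    """
    m = len(query)
    lo, hi = 0, len(index)
    while lo < hi:  # lower bound: first sistring whose m-char prefix >= query
        mid = (lo + hi) // 2
        if text[index[mid]:index[mid] + m] < query:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(index)
    while lo < hi:  # upper bound: first sistring whose m-char prefix > query
        mid = (lo + hi) // 2
        if text[index[mid]:index[mid] + m] <= query:
            lo = mid + 1
        else:
            hi = mid
    return sorted(index[start:lo])


def regions(text, tag):
    """Return (start, end) spans of each <tag>...</tag> element, nesting included.

    Hypothetical helper; assumes well-formed markup and no self-closing tags.
    """
    spans, stack = [], []
    for m in re.finditer(rf"<(/?){re.escape(tag)}\b[^>]*>", text):
        if m.group(1):  # closing tag: pair it with the innermost open tag
            if stack:
                spans.append((stack.pop(), m.end()))
        else:  # opening tag: remember where the region starts
            stack.append(m.start())
    return sorted(spans)


def within(hits, spans):
    """Keep only the hit offsets that fall inside at least one region."""
    return [h for h in hits if any(s <= h < e for s, e in spans)]


if __name__ == "__main__":
    doc = "<doc><head>whale song</head><body>the whale sang a whale song</body></doc>"
    idx = build_sistring_index(doc)
    hits = find(doc, idx, "whale song")  # a phrase is just a longer prefix
    print(hits)  # [11, 51]
    print(len(hits), len(within(hits, regions(doc, "body"))))  # 2 1
```

Because every sistring runs to the end of the text, a multi-word phrase needs no special handling: it is simply a longer prefix to look up. The region step here filters hits after the fact; this is a simplification for illustration and says nothing about how XPAT itself stores or evaluates regions.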
- XPAT Manual (currently incomplete online; DLXS customers are provided a printed volume of documentation for PAT 5.0, the precursor to XPAT). The XPAT manual remains available in the HTML version of the DLXS documentation: http://dev-linux.umdl.umich.edu/d/dlxs/docs/13/xpat/manual.html
- The XPAT command, syntax, and concept guide is provided online: http://dev-linux.umdl.umich.edu/d/dlxs/docs/13/xpat/commands.html
- XPAT FAQ
The University of Michigan Digital Library Production Service is committed to continuing to fix bugs and add features to the XPAT search engine. DLXS Release 11 includes major new support for Unicode indexing.