Working with XPAT

From DLXS Documentation

The XPAT search engine is a licensed software program, made available through the University of Michigan's Digital Library eXtension Service. It is based on the Open Text Corporation's pat50 source code, but has been enhanced by the Digital Library Production Service for DLXS.

XPAT is well suited to digital library applications, especially those involving large amounts of text. The indexes XPAT builds are based on a structure called a Patricia tree, which is built over substrings of the entire text. These substrings are known as semi-infinite strings, or sistrings. (For more information on these, see Information Retrieval: Data Structures and Algorithms, W. B. Frakes and R. S. Baeza-Yates, 1992.) Each semi-infinite string starts at some offset in the text file and stretches to the end of the file, so the strings always overlap one another. This feature helps to provide excellent string and phrase searching, which we believe is critical for digital library systems.

Another important feature of XPAT is its ability to index SGML and XML elements, attributes, and tags. This lets it support complex searches that reach into "regions" of text (and even nested regions) defined by the markup elements, further aiding full-text retrieval.
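The two ideas above can be illustrated with a small Python sketch. It is not XPAT's implementation: it keeps a sorted list of sistring offsets (in the style of a suffix array) instead of a Patricia tree, it assumes index points at the start of each word, and it assumes well-formed markup with no self-closing tags. All function names (sistring_offsets, find_phrase, element_regions, hits_within) are hypothetical and chosen only for illustration.

```python
import re

# Matches an opening or closing tag; assumes well-formed markup
# with no self-closing tags (an assumption of this sketch).
TAG = re.compile(r"<(/?)(\w+)[^>]*>")


def sistring_offsets(text):
    """Return word-initial offsets, sorted by the sistring starting there."""
    starts = [i for i in range(len(text))
              if text[i].isalnum() and (i == 0 or not text[i - 1].isalnum())]
    # Each sistring runs from its offset to the end of the text.
    return sorted(starts, key=lambda i: text[i:])


def find_phrase(text, offsets, phrase):
    """Return the offsets whose sistring begins with phrase, in text order."""
    lo, hi = 0, len(offsets)
    # Binary search for the first sistring whose prefix is >= phrase.
    while lo < hi:
        mid = (lo + hi) // 2
        start = offsets[mid]
        if text[start:start + len(phrase)] < phrase:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    # Matching sistrings are adjacent in sorted order.
    while lo < len(offsets) and text.startswith(phrase, offsets[lo]):
        hits.append(offsets[lo])
        lo += 1
    return sorted(hits)


def element_regions(text):
    """Return (name, start, end) offsets for each element, nested ones included."""
    open_tags, regions = [], []
    for m in TAG.finditer(text):
        closing, name = m.group(1), m.group(2)
        if not closing:
            open_tags.append((name, m.start()))
        elif open_tags and open_tags[-1][0] == name:
            _, start = open_tags.pop()
            regions.append((name, start, m.end()))
    return regions


def hits_within(hits, regions, name):
    """Keep only the hits that fall inside a region with the given element name."""
    return [h for h in hits
            if any(n == name and start <= h < end for n, start, end in regions)]
```

Because every sistring extends to the end of the text, a phrase of any length can be found with the same prefix search as a single word. Restricting the hits to those inside, for example, title regions is then a matter of comparing offsets, which is the kind of region-aware search described above.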

The University of Michigan Digital Library Production Service is committed to continuing to fix bugs and add features to the XPAT search engine. DLXS Release 11 includes a major addition to support Unicode indexing.

