Guide to Searching XPAT

From DLXS Documentation

Revision as of 15:35, 4 March 2009 by Pfarber (Talk | contribs)

Introducing DLXS XPAT

DLXS XPAT is a tool for searching text. You type a prefix, word, or phrase; DLXS XPAT tells you how many times it appears in the text. You can then print all or some fraction of the matches, each match surrounded by some context. You can find prefixes, words or phrases that appear near other specified text, or find those that appear most often. Searches can also be restricted to certain parts of the text. Whoever installs DLXS XPAT to work with a particular document defines what characters signal the beginning of a prefix, word, or phrase; what characters (if any) should be interpreted by DLXS XPAT as other characters (upper case characters to lower, for example); and whether DLXS XPAT should ignore certain characters completely (common words, for example).

How to Use This Manual

This manual was written to help you get started using DLXS XPAT and to serve as a reference guide once you become more familiar with the program. The chart on the facing page outlines where you can go in the manual to get the type of information you need. For example, if you are a new user of DLXS XPAT and are in a hurry to get started, you may want to turn to the section "Trying Out Commands".

This manual was not written to help you install or set DLXS XPAT up on your system. For installation information, refer to the DLXS XPAT Installation Guide.

Some particulars of DLXS XPAT will vary from system to system, such as whether upper case characters are translated to lower case, or what DLXS XPAT considers a word. The section "How DLXS XPAT Searches" outlines some of the possibilities.

The main texts used for examples are the first edition of the Oxford English Dictionary and Sir Arthur Conan Doyle's Hound of the Baskervilles. Although your understanding and use of DLXS XPAT will be strongly influenced by the text you are searching, these two texts will show the characteristics of searching a straightforward text structure like that of Hound of the Baskervilles as compared with the highly complex text structure found in The Oxford English Dictionary.

This manual is divided into two-page spreads where the right page presents the textual or explanatory information and the left page presents the visual or exemplary information. DLXS XPAT commands that appear on the right page (in bold) are more fully developed in a real example on the left. Technical or unfamiliar uses of words appear in italics the first time they are used and are fully defined in the glossary.

What is DLXS XPAT?

Prefix
In the linguistic sense, a prefix is an element placed before a word or stem to form another word. In this manual, the term is used to describe any character(s) (letters, numbers, or special characters) that start a word regardless of whether they form a grammatical prefix.
Word
In the linguistic sense, a word is a meaningful element surrounded by a space on either side. In this manual, the term also includes elements that are not necessarily surrounded by a space on either side (a word that appears flush beside a tag, as in <p>The, for example). What DLXS XPAT considers a word is defined at the time of index building.
Phrase
In the linguistic sense, a phrase is a group of words that plays a particular role in a sentence. In this manual, phrase refers to any sequence of characters containing more than one word. See Word.
Tags
Tags are often used to label the start and end of an element in the text for ease of retrieval. For example, the tag <Q> identifies the start of a quotation and the tag </Q> identifies its end. Tags represent descriptive markup when they describe what the text is rather than how it should be formatted.


<paper><title>________________________________________</title>

<author>_____________________________________________</author>
<abstract>____________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
___________________________________________________</abstract>
<body>________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
______________________________________________________________
_______________________________________________________</body>
</paper>

DLXS XPAT is a tool for searching text. You can use DLXS XPAT to search for prefixes, words, or phrases. DLXS XPAT can search for a lengthy phrase as quickly as for a single word. DLXS XPAT can also search for text that appears near other specified text and for text that appears frequently.

Using DLXS XPAT, you can search through the entire text or only a portion. For example, a search through a collection of technical papers can be restricted to only the abstracts. Restricting searches to selected components of text is easiest when the text contains tags that mark the start and end of each component (<abstract> and </abstract>, for example). A collection of technical papers that has its titles, authors, abstracts, and content marked by different tags can be easily segmented into portions for searching (see facing page).

DLXS XPAT can be used to search a variety of texts, everything from dictionaries, encyclopedias, or literary works to business texts such as insurance policies or contracts. The following examples show how DLXS XPAT can be used to satisfy a variety of search requests.

  • A student searches an encyclopedia for all entries that contain the word revolution near the word French or France.
  • A statistician searches a collection of insurance files to find the most popular types of policies.
  • A Shakespearean scholar finds the most frequently cited work by Shakespeare in a collection of literary journals.
  • A computer scientist searches through a collection of technical papers, retrieving the titles of the papers written by a certain author.
  • A linguist searches a dictionary for the number of English words that have Old Norman French in their etymology and are still in current use.
  • A lexicographer creates a concordance of a text consisting of all words that begin with the letter j.
  • A literary scholar searches for long repeated phrases of text in a work, aiming to discover the author's favourite expressions.

Starting and Leaving DLXS XPAT

Prompt
A prompt is a symbol that appears on the screen when the computer is ready to accept commands. Prompts vary from system to system. The prompt within DLXS XPAT looks like >>.


% xpat research
  Digital Library eXtension Service, XPAT, Release 5.2
  COPYRIGHT (c) 2000 The Regents of the University of Michigan
  All Rights Reserved

>>

Leaving DLXS XPAT


>> stop
used 0.18 cpu seconds
%

Let's assume someone has installed the text file you want to search and the DLXS XPAT program on your system according to the procedures outlined in the DLXS XPAT Installation Guide.

To start DLXS XPAT, type xpat, followed by the name of the file that contains the text you want to search. For example, if research is the name of your text file, the command is:

xpat research

The >> prompt appears once you have successfully started DLXS XPAT. This means DLXS XPAT is ready for your first search instruction. After each instruction to DLXS XPAT, press the Enter or Return key.

To interrupt DLXS XPAT in the middle of a search or the printing of results, press the key or sequence of keys that normally performs an interrupt function on your keyboard (commonly the Delete key or a Control c key sequence).

When you are ready to leave DLXS XPAT, type stop, quit, or done after the prompt:

>> stop

A message displays how much computer time you used in the DLXS XPAT session.

Your First Search

Match point
The match point is the first letter of the matched text. If you search for car, for example, the match point is c. The match point normally appears as the 15th character printed as the result of a pr command (not including the initial ..). It serves as a reference point for the match. For example, the position of the match point (the number that appears at the beginning of a line of context) can be used to retrieve the match.
Pattern
A pattern is the text that you ask DLXS XPAT to search for. DLXS XPAT tries to find the same sequence of characters in the text. If it does, a match results.
Set number
A set number identifies a specific set of results. Each time you do a search, DLXS XPAT assigns the results a set number. You can use this number to later reference this set. The history command gives you a list of all your previous sets and their numbers.
Character
A character includes anything that can be entered on a keyboard. This includes letters, numbers, punctuation marks, spaces and special characters such as percent signs, hash marks, etc.


>> tall
  1: 9 matches

Printing


>> pr
    95641, .., and saw the tall, austere figure of Holmes standing motionless..
   170296, ..limpse of the tall, black-bearded figure, his shoulders rounded,..
   106780, ..-looking man, tall, handsome, with a square black beard and pale..
   104943, .. Hall!" <p> A tall man had stepped from the shadow of the porch ..
   128721, .., elegant and tall. She had a proud, finely cut face, so regular..
   185866, ..was that of a tall, thin man. He stood with his legs a little se..
     9050, ..He was a very tall, thin man, with a long nose like a beak, whic..
   186167, ..he was a much taller man. With a cry of surprise I pointed him o..
   191405, ..igure was far taller than that of Stapleton, far thinner than th..

The first time you use DLXS XPAT, the screen conventions will be unfamiliar. The facing page labels the important information on the screen.

The >> prompt means DLXS XPAT is ready to accept a command. Typing a prefix, word, phrase, number or other text after the prompt and pressing the Enter or Return key starts the search, for example:

>> tall

What you type is often referred to as a search pattern or pattern for short.

After you enter a pattern, DLXS XPAT displays a line like the following:

1: 9 matches

The number 1 is called the set number. It names a set of results with a number so you can use it in further searches. Following the set number is the result of the search. The number 9 is the number of times the pattern tall appears in the example text.

The pr command (short for print) shows one line of context around each occurrence of the search pattern, for example:


95641, .., and saw the tall, austere figure of Holmes standing motionless..

The number in front stands for the position of the first character of the match (referred to as the match point). In the example, the letter t in tall is the 95,641st character in the text.

For each match, DLXS XPAT prints two periods followed by 64 characters (14 to the left of the match point, the match point itself, and 49 to the right) followed by two more periods. Note that spaces and punctuation as well as letters, numbers, and other symbols count as characters.
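The arithmetic of a context line can be tried out with a short Python sketch. This is only an illustration of the layout described above, not DLXS XPAT's own code; the function name context_line, the use of 0-based offsets, and the width of the position column are assumptions made for the example.

```python
def context_line(text: str, match_point: int, length: int = 64, left: int = 14) -> str:
    """Format one line of context in the style of the pr examples.

    match_point is a 0-based offset into text (an assumption of this sketch).
    """
    # Start 14 characters before the match point, or at the start of the text
    # if the match is too close to the beginning.
    start = max(0, match_point - left)
    window = text[start:start + length]
    # Flatten line breaks so that each match stays on a single line.
    window = window.replace("\n", " ")
    return f"{match_point:>9}, ..{window}.."


sentence = ("He turned, and saw the tall, austere figure of Holmes "
            "standing motionless against the wall of the room.")
print(context_line(sentence, sentence.find("tall")))
```

When the match is at least 14 characters into the text, the match point is the 15th character after the opening periods, as the definition of match point says.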

Trying Out Commands

>> "hound "
  1: 70 matches

>> "my dear Watson "
  2: 10 matches

>> "murder"
  3: 13 matches

>> pr
   317137, ..ch in proving murder against our man. There seemed to be no alte..
   272128, ..ase as one of murder, and the evidence may implicate not only yo..
   242073, ..d, deliberate murder. Do not ask me for particulars. My nets are..
   318579, .. accessory to murder. She was ready to warn Sir Henry so far as ..
   242021, ..red -- "It is murder, Watson -- refined, cold-blooded, deliberat..
   300400, ..the charge of murder which hung over her in connection with the ..
   247073, ..hew have been murdered -- the one frightened to death by the ver..
   305433, ..t to the real murderer. <p> "Having conceived the idea, he proce..
   101334, .. Notting Hill murderer." <p> I remembered the case well, for it ..
   309111, ..inst the real murderer. His only accomplice was one who could ne..
   239840, ..ng face and a murderous heart. <p> "It is he, then, who is our e..
   283319, .. two men, the murderous host and the unconscious guest, still ch..
   276256, .. the Anderson murders in North Carolina, but this case possesses..


>> pr 2
     6043, ..nto thin air, my dear Watson, and there emerges a young fellow u..
    69249, ..number!" <p> "My dear Watson, clumsy as I have been, you surely ..
    66383, ..or the world, my dear Watson. I am perfectly satisfied with your..
     4106, .. <p> "No, no, my dear Watson, not all -- by no means all. I woul..
   232005, ..vely evening, my dear Watson," said a well-known voice. "I reall..
     3733, .."I am afraid, my dear Watson, that most of your conclusions were..
    55102, ..<p> "And yet, my dear Watson, there is so very close a connectio..
   321800, ..lty. And now, my dear Watson, we have had some weeks of severe w..
   320210, ..tly. And now, my dear Watson, without referring to my notes, I c..
   256028, ..t once?" <p> "My dear Watson, you were born to be a man of actio..

To practise some of the basics of DLXS XPAT and become familiar with the search terminology, try searches similar to those that follow. If you run into difficulties, consult the error charts in Appendix A.

Type a word to find the number of times it appears in the text. Make sure you put it in quotation marks and leave a space after it:

>> "hound "

Similarly, to find the number of times a phrase appears in the text, put the phrase in quotation marks and leave a space after the final word:

>> "my dear Watson "

To look up words that start a certain way, type the pattern with or without quotation marks. The following finds the number of times the word murder and words starting with murder appear:

>> "murder"

To print a line of context for every occurrence of the previous pattern, type the following:

>> pr

To print results for a set other than the previous, give the set number:

>> pr 2

This prints the results of the second search.


>> "upon the moor "
  4: 42 matches

>> pr sample.5
   182067, ..it were loose upon the moor." <p> We stumbled slowly along in th..
    30696, ..him to go out upon the moor at night. Incredible as it may appea..
   299535, ..ny have done, upon the moor! I said it in London, Watson, and I ..
    20294, ..s we call it, upon the moor, some slinking away and some, with s..
   131547, .., as is usual upon the moor, were stunted and nipped, and the ef..

Displaying More Context


>> "the villain "
  5: 2 matches

>> pr.100
   273511, ..s lied to me, the villain, in very conceivable way. Not one word of
 truth has he ever told me. And w..
   248319, ..ne false move the villain may escape us yet." <p> "What can we do?"
 <p> "There will be plenty for us..

>> pr.200 [273511]
   273511, ..s lied to me, the villain, in very conceivable way. Not one word of
 truth has he ever told me. And why -- why? I imagined that all was for my own s
ake. But now I see that I was never anything but a to..

>> pr.200 [273411]
   273411, ..id, "this man had offered me marriage on condition that I could get
 a divorce from my husband. He has lied to me, the villain, in very conceivable
way. Not one word of truth has he ever told me. And w..


>> pr.200 shift.-100 [273511]
   273411, ..id, "this man had offered me marriage on condition that I could get
 a divorce from my husband. He has lied to me, the villain, in very conceivable
way. Not one word of truth has he ever told me. And w..

Printing one line of context for every occurrence often provides you with too much or too little information: too much if you only want to look at a few results and too little if you require more than one line of context to make sense of the results.

The sample command selects and prints 10 matches from the total set of results. DLXS XPAT chooses 10 matches that are evenly spaced throughout the text. You can print a different number of matches by appending a period and a number to the command:

>> pr sample.5
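The guide does not spell out how sample picks its matches beyond saying they are evenly spaced. As a rough illustration only, the Python sketch below takes entries at regular intervals from a position-ordered list of matches. The function name pick_sample and the spacing rule are assumptions, not DLXS XPAT's actual method.

```python
def pick_sample(matches: list[int], n: int = 10) -> list[int]:
    """Return up to n match positions spread evenly across the ordered matches."""
    ordered = sorted(matches)
    if n <= 0:
        return []
    if n >= len(ordered):
        # Fewer matches than requested: return them all.
        return ordered
    step = len(ordered) / n
    # Take the entry at each multiple of the step, rounding down.
    return [ordered[int(i * step)] for i in range(n)]


# 42 hypothetical match positions, sampled down to 5.
positions = list(range(0, 420, 10))
print(pick_sample(positions, 5))
```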

The next command prints 100 characters of context around every occurrence: the standard 14 characters to the left of the match point, the match point itself, and 85 characters to the right:

>> pr.100

The following command prints 200 characters of context around the first match only:

>> pr.200 [273511]

Notice that the amount of context to the left of the match point stays constant (14 characters no matter how many characters are printed). To see more to the left, decrease the character position. For example, to see the same amount of context as above but to centre it roughly around the match point, subtract 100 (half of 200) from the character position:

>> pr.200 [273411]

Another way to get the same result is to use the shift command to do the subtraction for you:

>> pr.200 shift.-100 [273511]
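The equivalence of the last two commands is plain arithmetic on the character position. The Python sketch below models it under stated assumptions: the function name pr_window is hypothetical, positions are treated as 0-based offsets, and the text is a stand-in string rather than the novel.

```python
def pr_window(text: str, position: int, length: int = 64,
              shift: int = 0, left: int = 14) -> tuple[int, str]:
    """Return the reported position and the context printed for it."""
    # The shift is added to the position first; a negative shift moves left.
    origin = position + shift
    # The printed context always starts 14 characters before the origin.
    start = max(0, origin - left)
    return origin, text[start:start + length]


# Stand-in text: the alphabet repeated, long enough for the example.
text = "".join(chr(ord("a") + i % 26) for i in range(1000))

# Printing 200 characters at 411 is the same as printing at 511 shifted by -100.
print(pr_window(text, 411, 200) == pr_window(text, 511, 200, shift=-100))
```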


>> hound fby black
  6: 2 matches


>> pr
   146450, ..Or a spectral hound, black silent, and monstrous? Was there a hu..
   286109, ..of the fog. A hound it was, and enormous coal-black hound, but n..

>> hound near black
  7: 7 matches

>> pr
   286169, ..ut not such a hound as mortal eyes have ever seen. Fire burst fr..
   146450, ..Or a spectral hound, black silent, and monstrous? Was there a hu..
   286147, ..us coal-black hound, but not such a hound as mortal eyes have ev..
   286109, ..of the fog. A hound it was, and enormous coal-black hound, but n..
    19588, ..nd him such a hound of hell as God forbid should ever be at my h..
    21294, ..rger than any hound that ever mortal eye has rested upon. And ev..
    21267, ..shaped like a hound, yet larger than any hound that ever mortal ..

Frequency Searching


>> signif ""
  8: 3318 matches, text=the

>> signif.-5 "hound "
  9: 8 matches, text=hound of
  10: 7 matches, text=hound of the
  11: 6 matches, text=hound and
  12: 5 matches, text=hound was
  13: 4 matches, text=hound but

>> signif.3 ""
  14: 42 matches, text=upon the moor

You can retrieve prefixes, words, or phrases that appear close to other prefixes, words, or phrases.

>> hound fby black

Here black must appear within 80 characters after hound.

The near command is similar to fby but works in both directions; black can show up on either side of hound.

>> hound near black

You may have to print more context for some matches to see both words.
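The difference between fby and near can be sketched in a few lines of Python. This is a simplified model, not the program's implementation: it assumes that the 80 characters are measured between the two match points and that the result is the set of positions of the first pattern, which is consistent with the example output but is not stated explicitly in this guide.

```python
import re


def find_all(text: str, pattern: str) -> list[int]:
    """Start offsets of every occurrence of pattern (plain substring search)."""
    return [m.start() for m in re.finditer(re.escape(pattern), text)]


def fby(text: str, first: str, second: str, distance: int = 80) -> list[int]:
    """Positions of first that are followed by second within distance characters."""
    seconds = find_all(text, second)
    return [p for p in find_all(text, first)
            if any(0 < q - p <= distance for q in seconds)]


def near(text: str, first: str, second: str, distance: int = 80) -> list[int]:
    """Positions of first that have second within distance characters on either side."""
    seconds = find_all(text, second)
    return [p for p in find_all(text, first)
            if any(0 < abs(q - p) <= distance for q in seconds)]


text = "a hound, black and silent. " + "x" * 200 + "black cat then a hound"
print(fby(text, "hound", "black"))   # only the first hound is followed by black
print(near(text, "hound", "black"))  # both hounds have black close by
```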

You can find the words and phrases that appear most often in the text. The following finds the most frequent word in the text (the characters "" stand for the whole text):

>> signif ""

This next command finds the five most frequent words or phrases that follow the word hound:

>> signif.-5 "hound "

Finally, this last command finds the most frequent phrase of a specific length, for example, three words long:

>> signif.3 ""
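To get a feel for what frequency searching reports, the following Python sketch counts fixed-length phrases and the words that follow a given word. It is a loose approximation only: it splits on white space, ignores punctuation and character mapping, and does not reproduce the mixed-length phrases that signif.-5 returns above. The function names are assumptions made for the example.

```python
from collections import Counter


def most_frequent_phrase(text: str, words: int = 1, top: int = 1) -> list[tuple[str, int]]:
    """Most common phrases that are exactly `words` words long."""
    tokens = text.lower().split()
    phrases = (" ".join(tokens[i:i + words])
               for i in range(len(tokens) - words + 1))
    return Counter(phrases).most_common(top)


def most_frequent_after(text: str, word: str, top: int = 5) -> list[tuple[str, int]]:
    """Most common two-word phrases that begin with `word`."""
    tokens = text.lower().split()
    pairs = (f"{a} {b}" for a, b in zip(tokens, tokens[1:]) if a == word)
    return Counter(pairs).most_common(top)


sample_text = "upon the moor and upon the moor again upon the hill"
print(most_frequent_phrase(sample_text, words=3))
```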


>> "murderer " within region chap
  15: 3 matches

>> pr
   101334, .. Notting Hill murderer." <p> I remembered the case well, for it ..
   305433, ..t to the real murderer. <p> "Having conceived the idea, he proce..
   309111, ..inst the real murderer. His only accomplice was one who could ne..

Finding Chapters That Contain a Word


>> region chap including "murderer "
  16: 2 matches

>> pr
    92085, ..more." </chap><chap><no>6</no><ctitle> BASKERVILLE HALL</ctitle>..
   299857, .. moor. </chap><chap><no>15</no><ctitle> A RETROSPECTION</ctitle>..


You can find prefixes, words, or phrases that appear within specific components of the text. You can either retrieve the prefixes, words, or phrases themselves or retrieve the start of the specific component that contains them. For example:

>> "murderer " within region chap

This retrieves all the occurrences of murderer that appear within text marked as part of the chapter (as opposed to text in other areas such as titles).

Alternatively, you can find the starts of chapters that contain the word murderer.

>> region chap including "murderer "
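The two forms return different things: within returns the positions of the pattern, while including returns the starts of the regions. A minimal Python sketch of that distinction follows. It assumes simple, non-nested tags located with a regular expression; how DLXS XPAT itself identifies regions is not covered here, and the function names are hypothetical.

```python
import re


def find_all(text: str, pattern: str) -> list[int]:
    """Start offsets of every occurrence of pattern."""
    return [m.start() for m in re.finditer(re.escape(pattern), text)]


def regions(text: str, tag: str) -> list[tuple[int, int]]:
    """(start, end) offsets of each <tag>...</tag> span, starting at the opening tag."""
    name = re.escape(tag)
    spans = re.finditer(rf"<{name}>.*?</{name}>", text, re.DOTALL)
    return [(m.start(), m.end()) for m in spans]


def within(text: str, pattern: str, tag: str) -> list[int]:
    """Positions of pattern that fall inside a region marked by tag."""
    spans = regions(text, tag)
    return [p for p in find_all(text, pattern)
            if any(start <= p < end for start, end in spans)]


def including(text: str, tag: str, pattern: str) -> list[int]:
    """Start positions of the regions that contain pattern."""
    hits = find_all(text, pattern)
    return [start for start, end in regions(text, tag)
            if any(start <= p < end for p in hits)]


doc = ("<title>the murderer </title>"
       "<chap>one</chap><chap>the murderer here</chap>")
print(within(doc, "murderer ", "chap"))     # position of the word itself
print(including(doc, "chap", "murderer "))  # position of the enclosing <chap>
```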

How DLXS XPAT Searches

DLXS XPAT takes what you type and looks for an exact replica in the text. You can match any prefixes, words, or phrases that appear in the text by typing each as it appears. DLXS XPAT then takes your pattern of characters and tries to find the same pattern in the text; DLXS XPAT has no knowledge of the meaning of the text and no built-in information about synonyms to words you type.

It is important to think of DLXS XPAT strictly as a pattern matcher so you can interpret your results accurately. For example, if you are using DLXS XPAT to search an encyclopedia for references to insects, typing insect is a logical starting point. You will find all occurrences of the pattern insect; that is, all words that start with the characters i, n, s, e, c, t.

However, there are many categories and classes of insects (bug, fly, vermin) not to mention specific types of insects (ladybug, louse, earwig) or even variant names of a specific insect (ladybug, ladybird or lady beetle). Furthermore, the term insect might be misspelled in the text. None of the above occurrences will be retrieved by searching for insect. You must anticipate what words might be used in the text to describe the type of information you are after. It is the person at the keyboard, not DLXS XPAT, who investigates and provides the appropriate search terms.

Index
The index determines what text can be matched. When texts are indexed by character, suffixes can be searched; when indexed by the start of a word, prefixes, words and phrases can be searched. An index must be built before the text can be searched with DLXS XPAT.
Suffix
In the linguistic sense, a suffix is an element placed after a word or stem to form another word. In this manual, the term is used to describe any character(s) (letters, numbers or special characters) that are not the first character of a word, regardless of whether they form a grammatical suffix.


>> pr light
     3532, ..indled and a slight flush sprang into his thin cheeks.  For an i..
    25398, ..it was like a light on a dark night. Everything which had been d..
     1759, ..ust where the light strikes it. No, thank you, I had some supper..
    23559, ..the circle of light thrown by the lamp, and as he did so be stop..
     4107, ..I should be delighted." </p> <p> "Could you go as far as Aldersh..
     1470, .."I shall be delighted if you will stay." </p> <p> "Thank you. I'..
    19465, .. and the room lighted. We know, also that he ran across the lawn..

Indexing by Word Beginnings


>> pr light
    25398, ..it was like a light on a dark night. Everything which had been d..
     1759, ..ust where the light strikes it. No, thank you, I had some supper..
    23559, ..the circle of light thrown by the lamp, and as he did so be stop..
    19465, .. and the room lighted. We know, also that he ran across the lawn..

>> pr bottle
    59547, ..te pen or ink-bottle is seldom allowed to be in such a state, an..
    59513, ..le ink in the bottle. Now, a private pen or ink-bottle is seldom..
   228879, ..d a half-full bottle of spirits standing in the corner. In the m..
   292576, .., your brandy-bottle! Put her in the chair! She has fainted from..
   220729, ..ers and their bottles. Both cases decided, Dr. Watson, and both ..

>> pr "-bottle"
    59546, ..ate pen or ink-bottle is seldom allowed to be in such a state, a..
   292575, ..e, your brandy-bottle! Put her in the chair! She has fainted fro..

>> pr sample.2 "<chap>"
   299857, .. moor. </chap><chap><no>15</no><ctitle> A RETROSPECTION</ctitle>..
    73037, ..otel." </chap><chap><no>5</no><ctitle> THREE BROKEN THREADS</cti..


>> pr "light "
    25398, ..it was like a light on a dark night. Everything which had been d..
     1759, ..ust where the light strikes it. No, thank you, I had some supper..
    23559, ..the circle of light thrown by the lamp, and as he did so be stop..

DLXS XPAT knows nothing about grammatical constructs like words or phrases. DLXS XPAT can be set up so that when you type a pattern like light, the program looks for this pattern anywhere in the text, for example, within delighted, lighted or as the word light. Every character in the text can be matched, or more technically, every character is indexed. See the facing page for an example.

It is not always desirable that every character be indexed. The example texts used in this manual are indexed at the beginning of each word, tag, or hyphenated stem, for example: light, <chap>, and -bottle or bottle in brandy-bottle. In other words, you can search for any pattern that starts with a - or <, or appears in the text following a - or space. It is normally more useful to index by word beginnings (unless you are interested in searching suffixes). You specify what type of index you wish to build as part of the index building procedure for a particular text. See the DLXS XPAT Installation Guide for more details.

Notice that DLXS XPAT knows nothing about the ends of words. DLXS XPAT simply finds whatever begins with what you type. The pattern light results in anything that starts with the characters light, which accounts for the match to lighted on the facing page. Similarly, the pattern "light " results in anything that starts with the characters light followed by a space (DLXS XPAT does not understand this as a word, just as a sequence of characters, one of which happens to be a space).
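The effect of the two kinds of index can be modelled with a sorted list of index points and a prefix lookup. The Python sketch below is a teaching model under stated assumptions, not a description of the program's internal index: the index-point rule follows the one given above (a - or <, or any character after a - or a space), and the function names are made up for the example.

```python
import bisect


def index_points(text: str, every_character: bool = False) -> list[int]:
    """Offsets at which a match is allowed to begin."""
    if every_character:
        return list(range(len(text)))
    points = []
    for i, ch in enumerate(text):
        previous = text[i - 1] if i > 0 else " "
        starts_word = not ch.isspace() and (previous.isspace() or previous == "-")
        if ch in "-<" or starts_word:
            points.append(i)
    return points


def build_index(text: str, points: list[int]) -> list[int]:
    """Order the index points by the text that begins at each one."""
    return sorted(points, key=lambda p: text[p:])


def search(text: str, index: list[int], pattern: str) -> list[int]:
    """Index points whose text begins with pattern, in order of position."""
    # Building the key list on every call is wasteful; it keeps the sketch short.
    keys = [text[p:p + len(pattern)] for p in index]
    low = bisect.bisect_left(keys, pattern)
    high = bisect.bisect_right(keys, pattern)
    return sorted(index[low:high])


text = "a slight flush, a light on, delighted, the room lighted, brandy-bottle"
by_character = build_index(text, index_points(text, every_character=True))
by_word = build_index(text, index_points(text))

print(len(search(text, by_character, "light")))  # slight, light, delighted, lighted
print(len(search(text, by_word, "light")))       # light and lighted only
print(len(search(text, by_word, "light ")))      # light followed by a space
```

Because the lookup only compares the start of each indexed position with the pattern, nothing in it knows where a word ends, which is why lighted is returned for light.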

Character mapping
Characters can be substituted for or mapped to other characters. For example, a period can be mapped to a space so that a search for a word followed by a space also gets occurrences of the word followed by a period. Similarly, upper case characters can be mapped to lower case. Character mappings are defined at the time of index building.
Stop words
Stop words are words that DLXS XPAT treats as if they do not exist in the text. These are often common words like the or a. Stop words are defined at the time of index building.


O' my luve's like a red, red rose That's newly sprung in June.

o my luve s like a red red rose that s newly sprung in june

Consider a search for the pattern "tall ". Although your aim is to search for the word tall, DLXS XPAT is searching for an exact match to the characters t, a, l, l followed by a space. If tall appears at the end of a sentence, it may be followed by a period, question mark or some punctuation other than a space. Similarly, if it begins a sentence it may have an initial capital.

Fortunately, DLXS XPAT can be set up at index building time to interpret specific characters in the text as if they were other characters. For example, DLXS XPAT can be told to interpret a period or question mark as a space or an upper case character as its lower case equivalent. This is called character mapping. The facing page shows punctuation marks mapped to spaces, and upper case characters to lower case characters. Notice that when character mapping results in two spaces in a row (a comma followed by a space, for example), DLXS XPAT condenses these to a single space. All characters remain as they are in the text; the mapping applies only when DLXS XPAT searches.

If you search for a word (by leaving a space after it) and get back the word with a character other than a space after it, it means that DLXS XPAT treats this character as a space.

At index building time, you can also tell DLXS XPAT to ignore certain words completely. It is often useful to ignore common words such as "and" or "the"; these are sometimes referred to as stop words. With those two words defined as stop words, a search for "the" would not result in a match, and a search for "wind rain" would match the phrase wind and rain.

Instructions for specifying these features are part of the installation procedure defined in the DLXS XPAT Installation Guide.
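The mapping shown for the Burns lines can be reproduced with a small Python sketch. It is an illustration only: mapping every ASCII punctuation mark to a space and the particular stop word list are assumptions for this example, since the real mappings and stop words are whatever was chosen when the index was built.

```python
import re
import string

# Assumed mapping for this example: every ASCII punctuation mark becomes a space.
PUNCTUATION_TO_SPACE = str.maketrans({c: " " for c in string.punctuation})


def map_characters(text: str) -> str:
    """Lower-case the text, map punctuation to spaces, condense runs of spaces."""
    mapped = text.lower().translate(PUNCTUATION_TO_SPACE)
    return re.sub(r"\s+", " ", mapped)


def drop_stop_words(mapped: str, stop_words=frozenset({"a", "and", "the"})) -> str:
    """Remove stop words from already mapped text, keeping one trailing space."""
    kept = [word for word in mapped.split() if word not in stop_words]
    return " ".join(kept) + " "


line = "O' my luve's like a red, red rose That's newly sprung in June."
print(map_characters(line))

searchable = drop_stop_words(map_characters("Wind and rain."))
print("wind rain " in searchable)
```

Only the searchable view changes; the original line is left untouched, which mirrors the point that all characters remain as they are in the text.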

Basic Searching

The most basic search in DLXS XPAT is simply to type some text after the prompt. DLXS XPAT finds all matches to that text. For example, a search for start could result in start, starts, startle or any text that begins with start. When you want only complete words, add a space to the end of your search pattern and enclose it in quotation marks. DLXS XPAT responds to a search by displaying the number of times the pattern appears in the text. You can then print some or all of the matches, in alphabetical order or in order of position in the text, with some context displayed around each match. You may also save your results in a file rather than display them on the screen. To help you keep track of your progress, DLXS XPAT keeps a summary of the searches you have made and their results.

Searching for Text

>> skirt
  1: 2 matches

>> pr
   174120, ..n a shawl and skirt might have been comic were it not for the in..
    99613, ..e bridge, and skirted a noisy stream, which gushed swiftly down,..

Searching for a Word


>> "autobiography "
  2: 1 match


>> pr
   240303, ..true piece of autobiography upon the occasion when he first met ..

Searching for a Phrase


>> "old autocrat "
  3: 1 match

>> pr
   223873, ..re out of the old autocrat. His eyes looked malignantly at me, a..

Searching for a Number Prefix


>> pr "18"
    40348, ..died there in 1876 of yellow fever. Henry is the last of the Bas..
     7098, ..ism" (Lancet, 1882), "Do We Progress?" (Journal of Psychology, M..
     6801, ..es, M.R.C.S., 1882, Grimpen, Dartmoor, Devon. House-surgeon, fro..
     6853, ..surgeon, from 1882 to 1884, at Charing Cross Hospital. Winner of..
     7154, ..ology, March, 1883). Medical Officer for the parishes of Grimpen..
     6861, .. from 1882 to 1884, at Charing Cross Hospital. Winner of the Jac..
      658, ..ith the date "1884." It was just such a stick as the old-fashion..
   207588, ..ive up to the 18th of October, a time when these strange events ..

You start a search by typing a pattern of text. You can type any arrangement of characters including prefixes, words, phrases, and numbers.

>> skirt
>> "autobiography "
>> "old autocrat "
>> "18"

If the pattern contains spaces (as in the second and third examples), numbers, non-letter characters like <, or names of commands, you must put the pattern in quotation marks. If it contains none of the above, DLXS XPAT accepts it with or without quotation marks.
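The quoting rule can be written down as a small check. The Python sketch below is a reading of the rule as stated here, not the program's own parser; in particular, the set of command names is an assumption limited to the commands that appear in this guide.

```python
# Command names seen in this guide; the real reserved list may differ (assumption).
COMMAND_WORDS = {
    "pr", "sample", "shift", "signif", "fby", "near", "within",
    "region", "including", "history", "stop", "quit", "done",
}


def needs_quotes(pattern: str) -> bool:
    """True if the pattern should be typed inside quotation marks."""
    # Anything other than letters (spaces, digits, <, - and so on) needs quotes.
    if not pattern.isalpha():
        return True
    # A bare word that is also a command name needs quotes as well.
    return pattern.lower() in COMMAND_WORDS


for candidate in ["skirt", "autobiography ", "old autocrat ", "18", "<chap>", "near"]:
    print(repr(candidate), needs_quotes(candidate))
```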

After a search, DLXS XPAT tells you how many times the pattern appears; it does not automatically display the results.

To display the results, use the pr command:

>> pr

You can combine a pr command and search pattern on the same line, as the last example on the facing page shows. However, when you do, no set number results. The results are not saved and you cannot access the results later without typing the pattern again.

If there is more than a screenful of matches, the results will scroll off the screen. Your computer has key sequences that allow you to stop and start scrolling (Control s and Control q on some systems). Check your computer manual.

To stop the display of the matches completely, use the key or sequence of keys that normally performs an interrupt function on your keyboard (commonly the Delete key or a Control c key sequence). The DLXS XPAT prompt redisplays on your screen, awaiting your next search.

Displaying More Context

Modifier
A modifier qualifies a command. The modifier is attached to the end of a command and is identified by a period followed by numbers or letters. In the commands pr.200 and pr.region, for example, .200 and .region are the modifiers. The first prints 200 characters of context; the second prints a text component that surrounds each match.


>> pr.100 "elementary "
    56513, ..e of the most elementary branches of knowledge to the special exper
t in crime, though I confess that..
     3393, ..sting, though elementary," said he, as he returned to his favourite
 corner of the settee. "There are..

Printing More to Right of One Match


>> pr.145 [3393]
     3393, ..sting, though elementary," said he, as he returned to his favourite
 corner of the settee. "There are certainly one or two indications upon the st..

Defining a New Print Length


>> {PrintLength 145}
>> pr
    56513, ..e of the most elementary branches of knowledge to the special exper
t in crime, though I confess that once when I was very young I confused the Le..
     3393, ..sting, though elementary," said he, as he returned to his favourite
 corner of the settee. "There are certainly one or two indications upon the st..

Printing More on Both Sides


>> pr.200 [3293]
     3293, ..rette, and, carrying the cane to the window, he looked over it agai
n with a convex lens. <p> "Interesting, though elementary," said he, as he retur
ned to his favourite corner of the settee. "There are..


>> pr.200 shift.-100 [3393]
     3293, ..rette, and, carrying the cane to the window, he looked over it agai
n with a convex lens. <p> "Interesting, though elementary," said he, as he retur
ned to his favourite corner of the settee. "There are..

DLXS XPAT normally displays 64 characters of context for a match: 14 before the match point, the match-point character itself, and 49 after. This may not be enough. To increase the number of characters displayed to the right of the match, follow the command with a period and a number that serves as a modifier. For example, to display 100 characters of context (14 before the match point, the match-point character, and 85 after):

>> pr.100 "elementary "

You can also print more around one match instead of the whole set:

>> pr.145 [3393]

Alternatively, you can change the number of characters normally printed (from 64 to 145, for example):

>> {PrintLength 145}

From now on, DLXS XPAT prints 145 characters unless you modify pr or change this value.

To increase the context on both sides, increase the number of characters printed to the right and decrease the position number. For example, subtract 100 from the position number and print forward 200.

>> pr.200 [3293]

A second way is to use the shift command to decrease the position number (move the match point back). Then use the pr command to print forward from that point.

>> pr.200 shift.-100 [3393]

Simply leave out the position number if you want to see more around the entire set of results:

>> pr.200 shift.-100
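The window arithmetic described in this section can be sketched in Python. This is a minimal illustration of how the numbers relate, not DLXS XPAT's own code: the function name context and its parameters are assumptions made for this sketch, and the 14-character lead is taken from the description of the normal display.

```python
def context(text: str, pos: int, length: int = 64,
            shift: int = 0, before: int = 14) -> str:
    """Return `length` characters of context around a match point.

    `pos` is the match point, optionally moved by `shift` characters.
    `before` characters come before the match point; the remaining
    characters start at the match point and run to the right.
    """
    point = pos + shift
    start = max(0, point - before)
    end = min(len(text), point - before + length)
    return text[start:end]
```

Under these assumptions, context(text, 3393, 200, shift=-100) and context(text, 3293, 200) return the same characters, which mirrors the two equivalent commands shown above.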


>> terror
  4: 6 matches

>> pr
   285897, ..ave a yell of terror and threw himself face downwards upon the g..
    38370, ..is a reign of terror in the district, and that it is a hardy man..
   231296, ..eness and the terror of that interview which every instant was b..
    21892, ..own hath less terror than that which is but hinted at and guesse..
   250402, ..a paroxysm of terror that he would risk recapture by screaming w..
   308381, ..t disease and terror. The hound had kept upon the grassy border ..

>> shift.10
  5: 6 matches

>> pr
    21902, ..ess terror than that which is but hinted at and guessed. Nor can..
    38380, .. of terror in the district, and that it is a hardy man who will ..
   231306, ..the terror of that interview which every instant was bringing ne..
   250412, .. of terror that he would risk recapture by screaming wildly for ..
   285907, .. of terror and threw himself face downwards upon the ground. I s..
   308391, ..and terror. The hound had kept upon the grassy border while the ..

The shift command changes the match point. At the same time, shift reorders the original matches in the order in which they appear in the text, rather than in alphabetical order. This means that what was listed first in the original set will not necessarily be listed first in the shifted set.

When you search for a pattern, DLXS XPAT regards the first letter of the pattern as the match point. When you print the results of a search (see facing page), the match points line up at the 15th character. A shift command creates new match points from characters to the right or left of the original match points; for example:

>> shift.10

A shift of 10 makes a new match point 10 characters after the original one. (A shift of -10 makes a new match point 10 characters before the original one.) When you print the results, the new match points appear at the 15th character of each line.

A shift of a positive number allows you to see more context to the right of the original match and a shift of a negative number allows you to see more context to the left.

DLXS XPAT shifts the previous set of matches. If you print and shift on the same line, DLXS XPAT does not record the set (no set number results). Consequently if you shift again, you will shift from the original set, not the previously-displayed results. However, if you shift and print separately, DLXS XPAT does record a set and you will shift from the new results, not the original set.
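The effect of shift on a set of match points can be sketched as follows. This is an illustration only; the function name shift_set is hypothetical, and the sketch ignores what happens at the very start or end of the text.

```python
def shift_set(match_points, offset):
    """Move every match point by `offset` characters.

    The new set is returned in order of position in the text
    (first to last), as described for the shift command.
    """
    return sorted(point + offset for point in match_points)


# Match points for "terror", in the alphabetical order printed earlier.
terror = [285897, 38370, 231296, 21892, 250402, 308381]
shifted = shift_set(terror, 10)
```

Here shifted holds the same six position numbers, in the same order, as the printed results of the shift.10 example.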

Selecting a Sample of Results

>> the
  6: 4149 matches

>> sample
  7: 10 matches


>> pr
   173114, ..d steadily in the centre of the black square framed by the windo..
   206880, ..urred pane at the driving clouds and at the tossing outline of t..
   294196, ..ow can he see the guiding wands tonight? We planted them togethe..
   116169, ..lipped out of the leading article of The Times. Was that his wor..
   100430, ..lying spur of the moor, lay in front of us. On the summit, hard ..
   290739, ..who met us in the passage. There was no light save in the dining..
   227280, ..barren scene, the sense of loneliness, and the mystery and urgen..
   214523, .. <p> "That is the truth." <p> Again and again I cross-questioned..
   273214, .. school. Read them, and see if you can doubt the identity of the..
   176700, ..l see that if there is blame in the matter it does not lie with ..

Changing the Size of the Sample


>> pr sample.5
   206880, ..urred pane at the driving clouds and at the tossing outline of t..
   116169, ..lipped out of the leading article of The Times. Was that his wor..
   290739, ..who met us in the passage. There was no light save in the dining..
   227280, ..barren scene, the sense of loneliness, and the mystery and urgen..
   273214, .. school. Read them, and see if you can doubt the identity of the..

Defining the Normal Sample Size


>> {SampleSize 4}
>> pr sample
   294196, ..ow can he see the guiding wands tonight? We planted them togethe..
   100430, ..lying spur of the moor, lay in front of us. On the summit, hard ..
   227280, ..barren scene, the sense of loneliness, and the mystery and urgen..
   273214, .. school. Read them, and see if you can doubt the identity of the..

In a lengthy text, you may only need to look at a fraction of the results. You can look at 10 matches, for example:

>> the
>> sample
>> pr

DLXS XPAT provides a sample by taking matches that are evenly spread throughout the text. For example, DLXS XPAT chooses 10 matches from a set of 100 matches by taking 1 from the first 10 matches, 1 from the next 10, and so on.

To see different matches, you must change the size of the sample taken; otherwise you get the same matches.

You can change the size of the sample by following the command with a number that serves as a modifier. For example, to get 5 matches and print the results:

>> pr sample.5

Without a modifier, DLXS XPAT retrieves 10 matches. You can change the number retrieved for a single command by using a modifier as above. You can also change the number DLXS XPAT normally prints without a modifier. To change the number normally provided to 4, type the following:

>> {SampleSize 4}

From this point until you leave DLXS XPAT, whenever you sample without a modifier, DLXS XPAT retrieves this new number of matches. However, if you give a modifier, DLXS XPAT gets the number you specify, regardless of the {SampleSize} setting.
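One way to picture evenly spread sampling is the Python sketch below. It is not DLXS XPAT's algorithm: the function name sample_set is hypothetical, and the choice of the first match in each slice is an assumption, since this guide does not say which match is taken from each group.

```python
def sample_set(match_points, size=10):
    """Pick `size` matches spread evenly through the text.

    The matches are put in text order and cut into `size` consecutive
    slices. Assumption: the first match of each slice is taken.
    """
    ordered = sorted(match_points)
    count = len(ordered)
    if size <= 0:
        return []
    if count <= size:
        return ordered
    return [ordered[(i * count) // size] for i in range(size)]
```

Because nothing in the sketch is random, sampling the same set twice with the same size gives the same matches, which agrees with the note above that you must change the sample size to see different matches.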

Using Previous Match Sets

>> pr 2
   240303, ..true piece of autobiography upon the occasion when he first met ..

Accessing the Last Set


>> pr shift.10 %
   100440, .. of the moor, lay in front of us. On the summit, hard and clear ..
   116179, .. of the leading article of The Times. Was that his work, or was ..
   173124, .. in the centre of the black square framed by the window. <p> "Th..
   176710, .. if there is blame in the matter it does not lie with my husband..
   206890, .. at the driving clouds and at the tossing outline of the wind-sw..
   214533, .. is the truth." <p> Again and again I cross-questioned her, but ..
   227290, ..ne, the sense of loneliness, and the mystery and urgency of my t..
   273224, ..ead them, and see if you can doubt the identity of these people...
   290749, .. in the passage. There was no light save in the dining-room, but..
   294206, ..see the guiding wands tonight? We planted them together, he and ..

Displaying the History of a Session


>> history
  1:       2,  skirt
  2:       1,  "autobiography "
  3:       1,  "old autocrat "
  4:       6,  terror
  5:       6,  shift.10
  6:    4149,  the
  7:      10,  sample

Previous sets can be accessed by referring to the set's number. For example, to reprint the results of set 2:

>> pr 2

Your last set can be accessed by the set number or a % sign, for example:

>> pr shift.10 %

This shifts the match point from the last set.

To find the number of a previous set, use the history command:

>> history

DLXS XPAT displays the set number, followed by the number of matches, followed by the command or pattern that produced the match set. Up to 300 sets can be kept. After this limit, you have to leave DLXS XPAT and start it up again.

Notice that the results of pr commands are not saved. If you expect to reuse intermediate results, you should search in steps, for example:

>> the
>> sample
>> pr
>> shift.10
>> pr

The first, second and fourth commands produce sets that can be reaccessed.

Commands that change normal settings such as {PrintLength} are also not recorded.

Searching for a Range of Text

Concordance
A concordance is an alphabetical list of words occurring in a text. You create one in DLXS XPAT by searching for every word start that begins with a letter: "a".."z". Most search results are partial concordances since the pr command normally prints results in alphabetical order.


>> pr sample.5 "be".."bu"
   183436, ..h a bristling beard, and hung with matted hair, it might well ha..
   209275, ..o was sitting before a Remington typewriter, sprang up with a pl..
   113584, ..d she weep so bitterly? Already round this pale-faced, handsome,..
   122823, ..e queer hills breaking out of it. Do you observe anything remark..
    11659, ..s was silent, but his little darting glances showed me the inter..

Searching for a Range of Numbers


>> pr "1880".."1890"
     6801, ..es, M.R.C.S., 1882, Grimpen, Dartmoor, Devon. House-surgeon, fro..
     6853, ..surgeon, from 1882 to 1884, at Charing Cross Hospital. Winner of..
     7154, ..ology, March, 1883). Medical Officer for the parishes of Grimpen..
     6861, .. from 1882 to 1884, at Charing Cross Hospital. Winner of the Jac..
      658, ..ith the date "1884." It was just such a stick as the old-fashion..

Retrieving the Full Range of Text


>> pr ""
    56188, ..ry curve, the -- " <p> "But this is my special hobby, and the di..
   194789, ..ll, Barrymore -- " <p> "God bless you, sir, and thank you from m..
    57282, ..it with paste -- " <p> "Gum," said Holmes. <p> "With gum on to t..
    45136, ..inly, but how -- ?" <p> He laughed at my bewildered expression. ..
   261671, ..have no doubt -- " <p> He stopped suddenly and stared fixedly up..
   238874, ..tach his wife -- " <p> "His wife?" <p> "I am giving you some inf..
    89849, ..eet and along --" <p> "I know," said Holmes. <p> "Until we got t..
   250665, ..s are correct --" <p> "I presume nothing." <p> "Well, then, why ..
    76905, ..such a trifle --" <p> "I think it's well worth troubling about."..
    12255, ..ert in Europe -- " <p> "Indeed, sir! May I inquire who has the h..
    12683, ..inadvertently -- " <p> "Just a little," said Holmes. "I think, D..
    10370, ..ames Mortimer --" <p> "Mister, sir, Mister -- a humble M.R.C.S."..
   173287, ..sure you, sir --" <p> "Move your light across the window, Watson..
^C*** Interrupted ***

You can search for a range of patterns such as a range of letters or numbers. An example of a range is every word that starts with a through the alphabet to every word that starts with z. This alphabetical arrangement of words or phrases in a text is known as a concordance. To display a full concordance of text, type the following:

>> pr "a".."z"

When the list is lengthy, use the key that normally interrupts to stop the display (commonly the Delete key or a Control c key sequence). A sample of a smaller range of letters, be through bu, appears on the facing page.

An example of a search for a range of numbers is all text that starts with 1880 up to and including all text that starts with 1890. Range searching on numbers can be useful for searching numeric constructs like dates.

>> pr "1880".."1890"

In this example, all the numbers happen to represent dates. However, numbers such as 18801 or 18823 could also result from this search since these start with characters in the range 1880..1890.

Occasionally, you may want to retrieve every match point in the text. Double quotation marks ("") represent the set of all match points in the text, for example:

>> pr ""

Since the example text is indexed by word start, this prints a sorted list of all words: those starting with a to z (the concordance example), numbers, and special characters. In the example on the facing page, the display was stopped after the hyphen, the first character sorted.
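The way a range matches on prefixes can be sketched in Python. This is an illustration under stated assumptions, not DLXS XPAT's implementation: the names in_range and range_search are hypothetical, and characters are compared by their plain code values, whereas the real sort order depends on how DLXS XPAT was set up for the text.

```python
def in_range(word: str, low: str, high: str) -> bool:
    """True if `word` starts with text between `low` and `high`.

    Only as many leading characters as each bound contains are
    compared, so anything beginning with `high` is still included.
    """
    return word[:len(low)] >= low and word[:len(high)] <= high


def range_search(word_starts, low, high):
    """Return the matching word starts in sorted order."""
    return sorted(w for w in word_starts if in_range(w, low, high))
```

With these definitions, a search from "1880" to "1890" keeps 1882 and 1890 but also 18823 and 18905, while 1876, 1891 and 18th fall outside the range.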

Saving Your Results in a File

Default value
A default value is the original or normal value. For example, when requested to print a sample of results, DLXS XPAT prints 10 unless you specify otherwise.


>> pr.200 handsome
   166897, ..d make a more handsome apology than he has done." <p> "Did he give
any explanation of his conduct?" <p> "His sister is everything in his life, he s
ays. That is natural enough, and I am glad that he sh..
   113625, ..s pale-faced, handsome, black-bearded man there was gathering an at
mosphere of mystery and of gloom. It was he who had been the first to discover t
he body of Sir Charles, and we had only his word for ..
     1812, ..inally a very handsome one, has been so knocked about that I can ha
rdly imagine a town practitioner carrying it. The thick iron ferrule is worn dow
n, so it is evident that he has done a great amount o..
   106786, ..ng man, tall, handsome, with a square black beard and pale distingu
ished features. <p> "Would you wish dinner to be served at once, sir?" <p> "Is i
t ready?" <p> "In a very few minutes, sir. You will f..
   210086, ..nce of a very handsome woman, and that she was asking me the reason
s for my visit. I had not quite understood until that instant how delicate my mi
ssion was. <p> "I have the pleasure," said I, "of kno..


>> save.200 handsome
Saving in pat.results

Saving in Another File


>> {SaveFile "references"}
>> save "smoke "
Saving in references

Rather than displaying the results of a search on the screen, you can save them in a file. The default filename is pat.results. This means that unless you specify otherwise (see below), results will be appended to the file of that name. The results appear in exactly the same format in the file as they would on the screen. The save command takes all the same modifiers that pr does, for example:

>> save.200 handsome

In the above example, DLXS XPAT stores 200 characters for each match. The next time you save, the results will be added to the end of the pat.results file; any results already in this file will stay intact.

If you prefer, you can have the results placed in a file of your choice. The following command designates the file references as the place to put subsequent results:

>> {SaveFile "references"}

The results of your next save command are added to the file references in your current directory. This is in effect until you change the name again or leave the program.

Sorting of Matches: Alphabetical or by Position

>> {PrintMode 2}
>> pr 6
    21892, ..own hath less terror than that which is but hinted at and guesse..
    38370, ..is a reign of terror in the district, and that it is a hardy man..
   231296, ..eness and the terror of that interview which every instant was b..
   250402, ..a paroxysm of terror that he would risk recapture by screaming w..
   285897, ..ave a yell of terror and threw himself face downwards upon the g..
   308381, ..t disease and terror. The hound had kept upon the grassy border ..

Printing in Alphabetical Order


>> {PrintMode 1}
>> pr 6
   285897, ..ave a yell of terror and threw himself face downwards upon the g..
    38370, ..is a reign of terror in the district, and that it is a hardy man..
   231296, ..eness and the terror of that interview which every instant was b..
    21892, ..own hath less terror than that which is but hinted at and guesse..
   250402, ..a paroxysm of terror that he would risk recapture by screaming w..
   308381, ..t disease and terror. The hound had kept upon the grassy border ..

DLXS XPAT normally sorts matches alphabetically, except after shift commands when the sorting is by position in the text (first to last). This default mode of printing is known as {PrintMode 0}. You can change the mode of printing so DLXS XPAT prints in order of position for the duration of the session:

>> {PrintMode 2}

Generating a concordance doesn't make much sense in this mode since the results will not be alphabetical.

DLXS XPAT will sort by position until you leave the program or change the mode of printing.

You can also change the mode of printing so that DLXS XPAT prints in alphabetical order for the duration of the session:

>> {PrintMode 1}

The results now print in alphabetical order (including results of shift commands).
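The three print modes can be compared with a short Python sketch. It is illustrative only: the function name order_matches and the shifted flag are assumptions made for this example, and alphabetical order is approximated by comparing the text that follows each match point using plain character codes.

```python
def order_matches(text, match_points, mode=0, shifted=False):
    """Return match points in the order they would be printed.

    mode 1: alphabetical by the text starting at each match point.
    mode 2: by position in the text.
    mode 0: alphabetical, except for the results of a shift.
    """
    by_position = sorted(match_points)
    alphabetical = sorted(match_points, key=lambda p: text[p:])
    if mode == 1:
        return alphabetical
    if mode == 2:
        return by_position
    if mode == 0:
        return by_position if shifted else alphabetical
    raise ValueError("mode must be 0, 1 or 2")


text = "the cat sat on the mat and the dog"
the_points = [0, 15, 27]  # positions where "the" starts
```

For this small text, mode 1 gives the order 0, 27, 15 (the cat, the dog, the mat), while mode 2 gives 0, 15, 27.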

Refining Searches

Searching for some patterns may yield a large set of results. You may wish to narrow down your set of results by making your search more specific. Proximity searches allow you to find only those matches to a pattern that are within or outside a certain distance of another pattern. You can also search for the most frequent occurrences of certain patterns or long patterns that appear more than once in the text.

Searching Based on Proximity

Proximity
Proximity refers to the position of one thing in relation to another. DLXS XPAT allows you to define at what distance a prefix, word or phrase is proximate to another. You define this distance as a number of characters.
Proximity range
Proximity range refers to the number of characters used to determine whether two or more patterns are proximate to one another. For example, the normal proximity range is 80; that is, the first letter of the second pattern must appear 80 characters or fewer from the first letter of the first pattern.


>> "war " fby "peace "
  1: 71 matches

>> pr sample.3
234870887, ..ive of making war and peace.  For it is held by all the writers ..
253247094, ..5 <T>Does not war create or reenliven numerous branches of indus..
312123630, ..in a state of war or in a state of peace. </T></Q><Q><D>1890</D>..

>> pr.100 [253247094]
253247094, ..5 <T>Does not war create or reenliven numerous branches of industr
y as well as peace?</T></Q></QP></..

Finding Text on Either Side


>> "war " near "peace "
  2: 132 matches

>> pr sample.3
 72443035, ..h in peace or war. </T></Q><Q><D>1875</D> <A>W. S. Hayward</A> <..
254046989, ..ery Nerves of War and the Regales of Peace. </T></Q><Q><D>1773</..
322675872, ..idized by the War Office in peace time while in their owners' ha..


Finding Text Not Proximate to Other Text


>> "war " not fby "peace "
  3: 5992 matches

>> "war " not near "peace "
  4: 5931 matches

Proximity is the closeness of one piece of text to another. DLXS XPAT has four proximity commands (near, fby, not near and not fby). An example of each follows:

>> "war " fby "peace "
>> "war " near "peace "
>> "war " not fby "peace "
>> "war " not near "peace "

The first example matches on occurrences of war that are followed within 80 characters by peace. The second example matches on occurrences that are followed or preceded by peace. The third and fourth examples find occurrences of war that are not followed by or not near peace.

The number of characters used to determine proximity is referred to as the proximity range. Normally the proximity range is 80 characters measured from the first letter of the first pattern to the first letter of the second pattern. For near and fby, a match results if the two patterns are within this distance of each other. For not near and not fby, a match results if the two patterns are not within this distance of each other.

When you print the results of these searches, the first letter of the first pattern (w) lines up in the 15th column since DLXS XPAT considers it the match point. The second pattern (peace) may not appear in the display at all if your line length is short. Consequently, you may have to display a few lines of text so that you can see both references. (See the first example on the facing page.)


>> "war " fby.150 "peace "
  5: 89 matches

>> "war " near.150 "peace "
  6: 161 matches


>> "war " not fby.150 "peace "
  7: 5974 matches

>> "war " not near.150 "peace "
  8: 5902 matches

Changing the Default Proximity Range


>> {Proximity 100}
>> "war " fby "peace "
  9: 75 matches

You can change the proximity range by adding a modifier to any of the proximity commands. Varying the modifier can sometimes significantly change the results. To increase the range to 150, for example, use a modifier as follows:

>> "war " fby.150 "peace "

Notice from the facing page that this change increases the number of matches (89 as compared to 71 without the modifier).

Using a modifier changes the proximity range for a single command. You can also change what DLXS XPAT considers the default or normal proximity range (80 characters). To change this value to 100, the command is as follows:

>> {Proximity 100}

From now on, any proximity commands you type without a modifier will use 100 as the proximity range.
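The four proximity commands can be sketched in Python as operations on two lists of match points. This is a simplified model, not DLXS XPAT's code: the function names and the rng parameter are hypothetical, and the distance is measured between match points (the first letters of the two patterns) with the range itself counted as within range.

```python
DEFAULT_PROXIMITY = 80  # the normal proximity range described above


def fby(first, second, rng=DEFAULT_PROXIMITY):
    """Points in `first` followed within `rng` characters by a point in `second`."""
    return [p for p in first if any(0 < q - p <= rng for q in second)]


def near(first, second, rng=DEFAULT_PROXIMITY):
    """Points in `first` with a point in `second` within `rng` on either side."""
    return [p for p in first if any(0 < abs(q - p) <= rng for q in second)]


def not_fby(first, second, rng=DEFAULT_PROXIMITY):
    """Points in `first` that fby would not return."""
    return [p for p in first if not any(0 < q - p <= rng for q in second)]


def not_near(first, second, rng=DEFAULT_PROXIMITY):
    """Points in `first` that near would not return."""
    return [p for p in first if not any(0 < abs(q - p) <= rng for q in second)]
```

In this model each negated command returns exactly the matches its positive form leaves out. The counts in the examples above behave the same way: 71 and 5992, like 132 and 5931, both add up to 6063 occurrences of war.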

Searching for Text that Occurs Frequently

2 Finding the Most Frequent Four-Word Phrase

3 Finding the Most Frequent Word or Phrase Within a Set

Every time you search for a pattern, DLXS XPAT tells you how many times it appears in the text. If you're interested in the frequency of specific patterns, this is the easiest way to get at that information. However, if you're interested in the frequency of text in general, DLXS XPAT provides a command called signif which displays a list of the most frequent words or phrases in the text.

There are three main ways to use this command:

  1. Search for the most frequent words or phrases in order of most to least frequent (-3 stands for the three most frequent). Note the - before the number.
>> signif.-3 ""

From the passage on the facing page, the three most frequent words or phrases are the, of, and a.

  2. Search for the most frequent phrases of a specified length in words (4 stands for a length of four words). There is no - in this form of the command.
>> signif.4 ""

The most frequent four-word phrase in the passage is the date for a.

  3. Search for the most frequent word or phrase. Then, extend the word or phrase further by searching for frequent words and phrases within the first set of results.
>> signif ""
>> signif

The most frequent word or phrase in the passage is the. The most frequent word or phrase following these occurrences is date for a.

The following pages discuss each of the three applications in detail.


>> signif.-4 ""
  1: 3318 matches, text=the
  2: 1619 matches, text=and
  3: 1585 matches, text=of
  4: 1498 matches, text=i

Starting With a Prefix to a Word


>> signif.-4 the
  5: 3318 matches, text=the
  6: 310 matches, text=there
  7: 151 matches, text=then
  8: 132 matches, text=the moor

Starting with a Full Word


>> signif.-4 "the "
  9: 132 matches, text=the moor
  10: 51 matches, text=the man
  11: 43 matches, text=the baronet
  12: 39 matches, text=the same
  1. The first application of signif allows you to search for the most frequent words and phrases.

For example, you can search for the four most frequent words or phrases within a text:

>> signif.-4 ""

Other words or phrases in the text may appear just as frequently as the last one listed. In the example above, if three phrases were tied for fourth position, only one would be listed. You can also search for the four most frequent words and phrases that begin as specified, for example:

>> signif.-4 the
>> signif.-4 "the "

The first command uses the prefix the as the starting point while the second uses the word the. Notice the difference between the results of these two (see facing page). DLXS XPAT selects the word the as the most frequent word or phrase from those beginning with the prefix the. DLXS XPAT selects the phrase the moor as the most frequent phrase beginning with the word the. The first results would be identical to the second if the prefix the only appeared as the word the (never started another word like then or there).


>> signif.3 ""
  13: 42 matches, text=upon the moor

>> pr sample.2
   218388, ..e is anywhere upon the moor?" "I know it because I have seen wit..
   151473, ..g or somebody upon the moor. The night was very dark, so that I ..

Starting With a Prefix to a Word


>> signif.3 the
  14: 17 matches, text=there is no

>> pr sample.2
    63324, ..r.Holmes, and there is no man upon earth who can prevent me from..
    39052, ..les's will." "There is no other claimant, I presume?" "None. The..

Starting With a Full Word


>> signif.3 "the "
  15: 14 matches, text=the death of

>> pr sample.2
    24341, ..onnected with the death of Sir Charles cannot be said to have be..
   289364, ..morning after the death of the hound the fog had lifted and we w..
  2. The second application of signif allows you to search for the most frequent phrase of a specified length (in words).

For example, you can search for the most frequent phrase of three words in the text:

>> signif.3 ""

In the example above, if three phrases tied for top position, only one would be listed. You can also search for frequent three-word phrases that begin as specified, for example:

>> signif.3 the
>> signif.3 "the "

The first command uses the prefix the as the starting point while the second uses the word the. Notice the difference between the results of these two (see facing page). The most frequent three-word phrase beginning with the prefix the is there is no but the most frequent three-word phrase beginning with the word the is the death of. The first results would be identical to the second if the prefix the only appeared as the word the (never started another word like then or there). DLXS XPAT may return more words than you specify if the phrase continues to repeat after that number of words. For example, if the most frequent three-word phrase is the death of but all occurrences of this phrase are followed by the words Sir Charles, DLXS XPAT will return five words: the death of Sir Charles.
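The difference between starting from a prefix and starting from a full word can be sketched in Python by counting phrases of a fixed number of words. This is a rough model only, not DLXS XPAT's algorithm: the function name most_frequent_phrase is hypothetical, words are split on white space, and the sketch does not extend a phrase when it keeps repeating beyond the requested length.

```python
from collections import Counter


def most_frequent_phrase(text, n_words, start=""):
    """Return (phrase, count) for the most frequent phrase of `n_words`
    words that begins with `start`, or None if there is no such phrase.

    A trailing space in `start` (as in "the ") requires a full word.
    """
    words = text.lower().split()
    counts = Counter()
    for i in range(len(words) - n_words + 1):
        phrase = " ".join(words[i:i + n_words])
        # Add a space so that a full-word start can match the last word.
        if (phrase + " ").startswith(start):
            counts[phrase] += 1
    if not counts:
        return None
    return counts.most_common(1)[0]


passage = ("upon the moor and there is no man upon the moor and there "
           "is no dog upon the moor there is no cat upon the moor")
```

For this made-up passage, the most frequent three-word phrase overall is upon the moor; starting from the prefix the it is there is no, and starting from the word the it is the moor and.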


>> signif ""
  16: 3318 matches, text=the

>> signif
  17: 132 matches, text=the moor

>> signif
  18: 13 matches, text=the moor and

Starting With a Prefix to a Word


>> signif the
  19: 3318 matches, text=the


>> signif
  20: 132 matches, text=the moor

>> signif
  21: 13 matches, text=the moor and

Starting With a Full Word


>> signif "the "
  22: 132 matches, text=the moor

>> signif
  23: 13 matches, text=the moor and

>> signif
  24: 3 matches, text=the moor and i
  3. The third application of signif allows you to search for the most frequent word or phrase and extend the word or phrase further by searching within this new set.

Suppose you search for the most frequent word or phrase in the text and the result is a set of matches to the word the. If you want to find the most frequent phrases in this set (the set of all phrases beginning the), you can apply signif again. You can continue doing so until you run out of repeated sequences of text beginning the. For example, you can search for the most frequent word or phrase in the text and then extend the phrase further:

>> signif ""
>> signif

If there is a tie for the most frequent word or phrase, only one is listed. You can also search for frequent words or phrases that begin as specified, for example:

>> signif the
>> signif "the "

In each case, you can extend the phrase further by later signif commands with no modifiers. signif with no modifier is exactly the same as signif.-1. Notice the difference between the results of the last two commands (see facing page). The most frequent word or phrase beginning with the prefix the is the word the but the most frequent phrase beginning with the word the is the moor. The first results would be identical to the second if the prefix the only appeared as the word the (never started another word like then or there).

Finding Long Repetitions of Text

>> lrep ""
                The longest repetition is 88 chars long
  25: 2 matches

>> pr
   193010, ..and it said: 'Please, please, as you are a gentleman, burn this ..
   208848, ..ter. It ran, 'Please, please, as you are a gentleman, burn this ..

Finding the Longest Repetition Starting as Specified


>> lrep "one "
                The longest repetition is 201 chars long
  1: 2 matches

>> pr
324837974, ..ur bones; the one articulated to the tympanic pedicle is called ..
151819756, ..ur bones; the one articulated to the tympanic pedicle is called ..

Giving a Minimum Length


>> lrep.50 "one "
                The longest repetition is 201 chars long
  2: 8 matches

>> pr
324837974, ..ur bones; the one articulated to the tympanic pedicle is called ..
151819756, ..ur bones; the one articulated to the tympanic pedicle is called ..
211986737, ..ay the end of one Brick about half way over the end of another, ..
355769421, ..ay the end of one Brick about half way over the end of another, ..
134563508, ..ostylism)..In one individual the flowers all have a long style a..
 75218845, ..ostylism)..In one individual the flowers all have a long style a..
231415578, ..le oestrus at one or more particular times of the year (bitch), ..
190678247, ..le oestrus at one or more particular times of the year (bitch), ..

The lrep command, like signif, helps you find words or phrases that are repeated somewhere else in the text. Instead of finding frequent repetitions of text, lrep finds the longest repetitions in the text.

Long repetitions of text may show that one part of the text was copied from another or perhaps reveal phrases or cliches in the language.

The following command will find the longest repeated phrase in the text:

>> lrep ""

DLXS XPAT returns with a statement telling you the length of the repeated phrase and the number of matches of that length. Refer to the facing page for an example.

You can also find the longest repeated phrase that starts as specified; for example, with the word one:

>> lrep "one "

As with signif, if you do not specify some starting text, lrep works on the previous match set.

If you only want repeated text that is at least a certain number of characters long, you can add a modifier to the lrep command. For example, to search for repeated phrases of 50 or more characters, type:

>> lrep.50 "one "
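To make the idea behind lrep concrete, here is a minimal Python sketch. It is not how DLXS XPAT is implemented. It assumes every character position is a possible starting point (a real index normally starts only at word beginnings) and it allows overlapping repeats. The names `longest_repeats`, `prefix` and `min_len` are invented for this illustration; `prefix` plays the role of the starting text and `min_len` the role of the number after lrep.

```python
def common_prefix_len(text, a, b):
    """Length of the longest common prefix of text[a:] and text[b:]."""
    n = 0
    while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
        n += 1
    return n


def longest_repeats(text, prefix="", min_len=1):
    """Return (length, sorted start positions) of the longest repeated text.

    Only positions whose text begins with `prefix` are considered, and
    repeats shorter than `min_len` characters are ignored.
    """
    # Candidate start points, ordered by the text that follows them.
    starts = sorted(
        (i for i in range(len(text)) if text.startswith(prefix, i)),
        key=lambda i: text[i:],
    )
    best_len, best = 0, set()
    # After sorting, the longest shared prefix is always between neighbours.
    for a, b in zip(starts, starts[1:]):
        n = common_prefix_len(text, a, b)
        if n < max(min_len, 1):
            continue
        if n > best_len:
            best_len, best = n, {a, b}
        elif n == best_len:
            best.update((a, b))
    return best_len, sorted(best)
```

Calling `longest_repeats(text, prefix="one ", min_len=50)` corresponds loosely to lrep.50 "one ": it reports the length of the longest repetition and the positions where it occurs, or a length of 0 when nothing is long enough.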

Searching Components of Text

It is often useful to restrict your search to selected components of the text. You may want to restrict your search to logical components of a text, such as titles, abstracts or chapters, for example. Or you may want to restrict your search to a continuous chunk of text, the first three chapters of a book, for example. Besides searching for and displaying a pattern within the restricted area, you can search for and display the start of the text component that contains a pattern: the start of the abstract or chapter that contains a word, for example. You can also display the complete component of text that contains the pattern: the complete abstract, for example. You can define the components of the text within which you want to search either at the time of index building or during a DLXS XPAT session.

Restricting Your Search Area

<title>
Red, Red Rose
</title>
<content>
O, my luve's like a red, red rose
That's newly sprung in June.
O, my luve is like the melodie
That's sweetly played in tune.
</content>

<author>
Robert Burns
</author>

DLXS XPAT searches for a pattern anywhere in the text. There may be times, however, when you want to restrict your search to certain logical components of the text. In a typical novel, these components might be the chapter titles, the chapters, or paragraphs. In a highly-structured text like a dictionary, there may be many components: the main words, pronunciations, etymologies, definitions and quotations, to name a few possibilities. Or you may want to restrict your search to one chunk of text such as an individual chapter or all the dictionary entries for the letter a.

You can restrict your search any way you like as long as you can define to DLXS XPAT where the restricted area starts and where it ends. This is easiest if text components like chapters or paragraphs are marked in the text, by tags for example. The poem on the facing page contains tags that mark the start and end of the title, contents, and author components of a poem. Assume that this poem is one in a collection of poems marked in the same manner. By giving instructions to DLXS XPAT to search only within text that falls between <title> and </title> tags, for example, you can restrict your search to all the poem titles in the collection.

You don't have to use tags to define the area to search; you can use any text including the actual content. Suppose you wanted to restrict your search to an individual poem. The short poem on the facing page serves as a simplified example. You can define the content of this poem to be all text that lies between the phrase O my luve's and the word tune. Notice that a phrase shorter than O my luve's could not be used since it would repeat in the text; your start and end points must be unique in the entire text (not just the selected poem) or you will define more than one part of text.

Heavily used text components are normally defined at index building time. These have the advantage of being available in any session. You can also define components during an individual session; however, these are lost upon leaving the session.

Searching Pre-Defined Components of Text

Document
A document is a component or portion of the text. Examples of documents are paragraphs, headings, or chapters. Documents do not have to be structural; they can consist of any block of text. The first few pages of a chapter or the last half of a book could constitute a document.


>> "lrep" within region p
  1: 4 matches

>> pr
    79956, ..eywd>The <TT>lrep</TT> command, like <TT>signif</TT>, helps ..
    80132, .. of text <TT>lrep</TT> finds the longest repetition in the tex..
    80923, ..ing text,<TT>lrep</TT> works on the previous match set.</p> <p..
    81109, ..dify the <TT>lrep</TT> command. For example, to search for rep..

Finding Paragraphs That Contain a Pattern


>> region p including "lrep"
  2: 3 matches

>> pr
    79919, ..    <p>The <TT>lrep</TT> co..
    80850, ..</PRE> <p>Like <TT>signif</TT>, if you do not specify s..
    80971, ..atch set.</p> <p>If you only want re..

Printing the Complete Paragraph


>> pr.region.p
    79919, ..<p>The <TT>lrep</TT> command, like <TT>
signif</TT>, helps you find words or phrases that are repeated somewhere else i
n the text.  Instead of finding frequent repetitions of text <TT>lrep</TT> fin
ds the longest repetition in the text.</p>..
    80850, ..<p>Like <TT>signif</TT>, if you do not specify some starting text
, <TT>lrep</TT> works on the previous match set.</p>..
    80971, ..<p>If you only want repeated text that
is longer than a certain number of characters, you can modify the <TT>lrep</cmd
> command. For example, to search for repeated phrases of 50 or more characters,
 type:</p>..

The person who installed DLXS XPAT on your system can tell you whether any components of text have been pre-defined and, if so, what they are called. Searching within components of the text can be noticeably slower than searching the complete text.

The electronic version of this manual has pre-defined components of text that include headings, paragraphs, and examples. All are enclosed within tags. Every paragraph, for example, is enclosed within <p> and </p> tags. Assume you want to search for all occurrences of the pattern lrep within paragraphs (called p):

>> "lrep" within region p

This command finds all matches to the pattern lrep that appear within paragraphs. The keyword region stands for a component, or part, of the text.

The quotation marks around lrep are necessary since you are searching for a pattern that DLXS XPAT would normally treat as a command.

In addition to searching for a pattern within components of text, you can search for the start of the components that contain the pattern, for example:

>> region p including "lrep"

Here, the match point is on the opening <p> tag rather than on the pattern.

The pr command can be modified to print from the match point to the end of a defined text component. For example, the following command prints the entire paragraph that contains the match:

>> pr.region.p

Substituting save for pr puts the results in a file rather than displaying them on the screen.
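The difference between within, including and pr.region can be sketched in Python. This is only an illustration under simple assumptions: components are held as (start, end) offset pairs with an exclusive end, match points are plain offsets, and the sample text and the helper names `within`, `including` and `print_region` are invented here.

```python
import re

TEXT = (
    "<p>lrep finds repeats.</p>"
    "<p>signif counts.</p>"
    "<p>Use lrep again, lrep twice.</p>"
)

# Each paragraph component is a (start, end) pair; end is exclusive.
P_REGIONS = [m.span() for m in re.finditer(r"<p>.*?</p>", TEXT, re.S)]
# Match points for the pattern are the offsets where it begins.
LREP_POINTS = [m.start() for m in re.finditer("lrep", TEXT)]


def within(points, regions):
    """Match points that fall inside some component (pattern within region)."""
    return [p for p in points if any(s <= p < e for s, e in regions)]


def including(regions, points):
    """Start points of components holding a match (region including pattern)."""
    return [s for s, e in regions if any(s <= p < e for p in points)]


def print_region(text, regions, starts):
    """Full text of the components whose start point is in `starts`."""
    return [text[s:e] for s, e in regions if s in starts]
```

Here `within` keeps one result per occurrence of the pattern, while `including` keeps one result per paragraph, positioned on the opening tag. That is why the two searches report different counts.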


>> region including "{PrintMode}"
  3: 1 match

Restricting the Number Found


>> region including "save"
  4: 3 matches

The most-used component of text can be set up as the default component. This means you do not have to specify the text component you wish to search or print, for example:

>> region including "{PrintMode}"
>> pr.region

In this manual, the most-used components are the sections or modules (text from the beginning of one main heading to the beginning of the next).

The first example on the facing page searches for modules that include references to {PrintMode}. This allows you to see the heading of the module and to start reading at the beginning of the module rather than somewhere in the middle.

You may want to restrict your search to those components that contain more than a certain number of references to a word. For example, if you wanted to search this manual for information on saving search results, you would expect the module with the most information to mention the word save more than once. To get only those modules that contain 2 or more references to save, the command is as follows:

>> region including.2 "save"

You can see on the facing page that although three modules contain the word save, only one module contains more than one reference to the word.
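The effect of the number after including can be sketched as a simple count per component. The following Python is an illustration only; the function name `including_at_least` and the offsets are made up, and components are assumed to be (start, end) pairs with an exclusive end.

```python
def including_at_least(regions, points, minimum):
    """Start points of components that hold at least `minimum` match points."""
    selected = []
    for start, end in regions:
        count = sum(1 for p in points if start <= p < end)
        if count >= minimum:
            selected.append(start)
    return selected


# Made-up spans for three modules and made-up match points for "save".
modules = [(0, 100), (100, 200), (200, 300)]
save_points = [10, 120, 150, 260]

all_modules = including_at_least(modules, save_points, 1)   # every module that mentions save
rich_modules = including_at_least(modules, save_points, 2)  # only modules with 2 or more
```

With this sample data all three modules qualify at a minimum of 1, but only the second qualifies at a minimum of 2, mirroring the situation described above.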

Defining Your Own Components

>> chapter = region "<chap>".."</chap>"
  chapter = 14 matches

>> pr
       54, ..LLES</btitle> <chap><no>1</no><ctitle> MR. SHERLOCK HOLMES</ctit..
    12902, ..ance." </chap><chap><no>2</no><ctitle> THE CURSE OF THE BASKERVI..
    34180, ..ound!" </chap><chap><no>3</no><ctitle> THE PROBLEM</ctitle> <p> ..
    50432, ..ning." </chap><chap><no>4</no><ctitle> SIR HENRY BASKERVILLE</ct..
    73037, ..otel." </chap><chap><no>5</no><ctitle> THREE BROKEN THREADS</cti..
    92085, ..more." </chap><chap><no>6</no><ctitle> BASKERVILLE HALL</ctitle>..
   111298, .. wall. </chap><chap><no>7</no><ctitle> THE STAPLETONS OF MERRIPI..
   138270, .. Hall. </chap><chap><no>8</no><ctitle> FIRST REPORT OF DR. WATSO..
   153241, ..etter> </chap><chap><no>9</no><ctitle> SECOND REPORT OF DR. WATS..
   188107, ..etter> </chap><chap><no>10</no><ctitle> EXTRACT FROM THE DIARY O..
   207432, ..stery. </chap><chap><no>11</no><ctitle> THE MAN ON THE TOR</ctit..
   232122, ..n in." </chap><chap><no>12</no><ctitle> DEATH ON THE MOOR</ctitl..
   255131, ..s end. </chap><chap><no>13</no><ctitle> FIXING THE NETS</ctitle>..
   277326, ..isit." </chap><chap><no>14</no><ctitle> THE HOUND OF THE BASKERV..

Searching Chapter Components


>> "mrs. lyons" within *chapter
  2: 8 matches

>> pr sample.3
   212590, ..ontinued. <p> Mrs. Lyons flushed with anger again. <p> "Really, ..
   214623, ..t point. <p> "Mrs. Lyons," said I, as I rose from this long and ..
   209503, ..ssion left by Mrs. Lyons was one of extreme beauty. Her eyes and..


>> *chapter including "mrs. lyons"
  3: 2 matches

>> pr
   207432, ..stery. </chap><chap><no>11</no><ctitle> THE MAN ON THE TOR</ctit..
   255131, ..s end. </chap><chap><no>13</no><ctitle> FIXING THE NETS</ctitle>..

You can define your own components if your text has no pre-defined ones or if those that exist do not suit your purpose. However, these are only available for the duration of your DLXS XPAT session.

You define a text component by specifying its starting and ending text. For example, suppose you wanted to define chapter components for every chapter in a book. Let's assume that each chapter has the following tags:


<chap><no>13</no><ctitle>FIXING THE NETS</ctitle>...</chap>

The <chap> tag marks the beginning of the chapter and the </chap> tag marks its end. To create this text component, the command is as follows:

>> chapter = region "<chap>".."</chap>"

The name to the left of the equals sign is the name you'll use later to search this component. The region command identifies this command sequence as a document definition rather than a range search. The first pattern is the starting text of the component and the second pattern is the ending text of the component.

Searching your own components differs slightly from searching pre-defined components. Instead of putting region in front of the component name, the component name is preceded by an asterisk. For example, to search for occurrences of a pattern within chapters, the command is:

>> "mrs. lyons" within *chapter

To search for the beginning of chapters that contain references to a pattern, the command is:

>> *chapter including "mrs. lyons"

A pr command after this example search shows that chapters 11 and 13 contain the reference.


>> *ctitle = region "<ctitle>".."</ctitle>"
  ctitle = 15 matches

>> *ctitle including holmes
  5: 1 match

>> pr.region.*ctitle
       70, ..<ctitle> MR. SHERLOCK HOLMES<..

Printing a Poem Defined by Content


>> poem = region "O, my luve's".."played in tune."
  poem = 1 match

>> pr.region.*poem
       38, ..O, my luve's like a red, red rose That's newly sprung in June. O, my
luve is like the melodie That's sweetly p..

Shifting the End Point


>> poem = region "O, my luve's"..(shift.14"played in tune.")
  poem = 1 match

>> pr.region.*poem
       38, ..O, my luve's like a red, red rose That's newly sprung in June. O, my
 luve is like the melodie That's sweetly played in tune...

Printing your own components differs slightly from printing pre-defined components. Assume you have defined components called ctitle which contain chapter titles.

To print out the titles that contain the word holmes, the commands would be as follows:

>> *ctitle including holmes
>> pr.region.*ctitle

When printing a component you've defined, you need to include an asterisk (*) before its name.

Substituting save for pr puts the results in a file rather than displaying them on the screen.

Notice that the titles end at the first letter of the tag rather than at the end of the closing tag. DLXS XPAT creates components by starting at the first character of the starting text (the < in <ctitle>) and going to the first character of the ending text (the < in </ctitle>). This means that anything after the first character in the ending text is not included in the definition of the component.

This can be a problem when you use the content of the text to define a component. Suppose you have defined a poem by specifying its first and last phrase as follows:

>> poem = region "O, my luve's".."played in tune."

You cannot search for the phrase in tune within *poem because the component ends at the p in played. To get around the problem, use the shift command when defining the component to move the match point to the end of the phrase:

>> poem = region "O, my luve's"..(shift.14 "played in tune.")

This moves the match point 14 characters to the right, to the period after tune.
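The end-point behaviour can be checked with a short Python sketch. It assumes, as described above, that a component runs from the first character of the starting text to the first character of the ending text, and that a shift moves that end point to the right. The helper names `define_region` and `region_text` are invented for this example and are not DLXS XPAT commands.

```python
POEM = (
    "O, my luve's like a red, red rose That's newly sprung in June. "
    "O, my luve is like the melodie That's sweetly played in tune."
)


def define_region(text, start_text, end_text, shift=0):
    """Return (first, last) offsets, both inclusive, for each component."""
    regions = []
    first = text.find(start_text)
    while first != -1:
        end_at = text.find(end_text, first + len(start_text))
        if end_at == -1:
            break
        # The component stops on the first character of the ending text,
        # moved `shift` characters to the right when a shift is given.
        last = min(end_at + shift, len(text) - 1)
        regions.append((first, last))
        first = text.find(start_text, last + 1)
    return regions


def region_text(text, region):
    """Text covered by one (first, last) component."""
    first, last = region
    return text[first:last + 1]


plain = region_text(POEM, define_region(POEM, "O, my luve's", "played in tune.")[0])
shifted = region_text(
    POEM, define_region(POEM, "O, my luve's", "played in tune.", shift=14)[0]
)
```

Without the shift the component stops at the p of played, so the phrase in tune lies outside it. Counting from that p, the period after tune is 14 characters to the right, which is why a shift of 14 brings the whole closing phrase inside the component.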

Searching a Hierarchy of Text Components

Some documents, such as dictionaries or encyclopedias, are highly structured. The Oxford English Dictionary (OED) is a good example of a highly structured text. The entry for broom-man from the electronic version of this text appears on the facing page.

The OED has many tags to label the components of the text. A few levels of text components have been marked in the excerpt. The beginning and ending entry tags <E> and </E> surround the whole entry. Within the entry are many other tags. For example, the tags <ET> and </ET> surround the etymology section of the entry, the part that describes the origin of the word. The <Q> and </Q> tags surround each supporting quotation. Within the quotation tags, the <D>, <A>, <W> and <T> series of tags surround the date, author, work, and text components of the quotation, respectively.

Components can exist on the same level or be nested within one another.


>> region Q including science
  1: 2967 matches

>> region E including %
  2: 2503 matches

>> region HL within %
  3: 2564 matches

>> pr.region.HL sample
 42619114, ..<HT>cipher</HT>..
 79862157, ..<HT>dogmatism</HT>..
116078846, ..<HT>geo-</HT>..
148869689, ..<HT>install</HT>..
189204259, ..<HT>monogenetic</HT>..
221894660, ..<HT>phlebotomy</HT>..
253789940, ..<HT>refractory</HT>..
298490165, ..<HT>snobo.grapher</HT>..
326252481, ..<HT>supermu.ndal</HT>..
361110614, ..<HT>undo.tted</HT>..

Combining Commands


>> region HL within (region E including (region Q including science))
  4: 2564 matches

In a highly structured text like the OED, there will be times when you'll need to search through a hierarchy of components for a pattern. An example is a search for a component that contains a sub-component that contains a pattern. For example, consider the following problem:

You would like to find the names of the dictionary entries that contain quotations with the pattern science. You can break this search into three steps.

  1. Find quotation components that contain science.
>> region Q including science
  2. Find the entry components that contain these quotation components.
>> region E including %
  3. Find the entry name (headword lemma) within these entries.
>> region HL within %

% stands for the previous results.

You can then use pr.region.HL to print only the headword lemmas.

An alternative way to do this search is to type it all on one line, using parentheses to show DLXS XPAT what steps it should do first:

>> region HL within (region E including (region Q including science))

DLXS XPAT searches within parentheses first. Using this set of results, DLXS XPAT does the final search, producing the final set of results. With this alternative, you don't save any intermediate results.
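The three-step search through the hierarchy can be mimicked in Python on a tiny sample. This is a sketch under stated assumptions: the three sample entries are invented, components are found with regular expressions rather than a prebuilt index, and intermediate results are kept as (start, end) spans for simplicity. The helper names are hypothetical.

```python
import re

SAMPLE = (
    "<E><HL>alchemy</HL><Q><T>a science of metals</T></Q></E>"
    "<E><HL>broom</HL><Q><T>fetch the broom</T></Q></E>"
    "<E><HL>physics</HL><Q><T>natural science</T></Q><Q><T>of matter</T></Q></E>"
)


def region(tag, text):
    """(start, end) spans of every <tag>...</tag> component; end is exclusive."""
    return [m.span() for m in re.finditer(rf"<{tag}>.*?</{tag}>", text, re.S)]


def points(pattern, text):
    """Offsets where the literal pattern begins."""
    return [m.start() for m in re.finditer(re.escape(pattern), text)]


def including(regions, pts):
    """Components that contain at least one of the given points."""
    return [(s, e) for s, e in regions if any(s <= p < e for p in pts)]


def within(regions, outer):
    """Components that lie entirely inside one of the outer components."""
    return [(s, e) for s, e in regions if any(o_s <= s and e <= o_e for o_s, o_e in outer)]


# Step 1: quotations that contain "science".
quotes = including(region("Q", SAMPLE), points("science", SAMPLE))
# Step 2: entries that contain those quotations (using their start points).
entries = including(region("E", SAMPLE), [s for s, _ in quotes])
# Step 3: headword lemmas inside those entries, with the tags stripped.
headwords = [SAMPLE[s + 4:e - 5] for s, e in within(region("HL", SAMPLE), entries)]
```

Nesting the three calls in one expression, instead of storing `quotes` and `entries`, corresponds to the one-line form with parentheses: the innermost search runs first and no intermediate results are kept.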

Manipulating Sets of Results

When doing more complex searches, you may deal with many sets of results. To keep track of them, you can give them descriptive names and refer to them by name rather than by set number. You can also do basic set operations. You can combine sets (union), find the difference between sets (difference) and find where sets share results (intersection).

Naming Set Results

>> bldog = "black " near "dog "
  bldog = 40 matches

>> "black " near "cat "
  2: 26 matches

>> blcat = 2
  blcat = 26 matches

>> blackcat = *blcat
  blackcat = 26 matches

Using a Name in a Search


>> pr.200 sample.1 *bldog
 23379319, ..his luck..The black dog was on his back, as people say, in terrifyi
ng nursery metaphor.</T></Q></QP></S6></S4></E><E><HG><HT>black drop..

Previous set results are frequently referred to by their set numbers; however, you can also refer to them by name. You name a set of results by preceding the search with the name and an equals sign, for example:

>> bldog = "black " near "dog "

The results are assigned to the name bldog. You can also name a previous set of results, for example:

>> blcat = 2

The above command assigns the name blcat to the results of set 2. You can use this name instead of the number 2 when referring to this set.

When you refer to a set by name you must precede it with an asterisk; otherwise, DLXS XPAT will search for the name as a pattern rather than as a set of results. The only time you can specify the name without an asterisk is when you assign a value to it (when the name appears before the equals sign), for example:

>> blackcat=*blcat

With the above command, there are now three ways to refer to our example set: by set number (2), by the name blcat, or by the name blackcat.

A second example of referring to a named set is as follows:

>> pr.200 sample.1 *bldog

The above command prints 200 characters around a single match in the set identified by *bldog.

Names can contain any letter in the alphabet or any number; however, they cannot start with a number. You cannot use special characters such as . or ? in your name unless you enclose the whole name within quotation marks: "name?", for example.

Combining and Comparing Sets of Results

Union
The union of sets refers to the merging of two sets into one. Any duplicates are eliminated from the final set.
Difference
Set difference refers to the difference between two sets. Matches that appear in the first set that also appear in the second are eliminated from the final set. In other words, matches from the first set that do not appear in the second set form the new set.
Intersection
The intersection of sets refers to the comparison of two sets to find common matches. These common matches form a new set.

Difference Between Sets

Intersection of Sets

You can further refine your search by combining or comparing sets. You can compare sets to find the different or common elements. These operations are the same as the mathematical set operations: union, difference and intersection.

The pictures on this and the facing page may help you visualize these set operations. The circles beside represent sets of results. The shaded portion represents the result of the operation.

The first example combines the contents of two sets into one set. The matches to the word clue are combined with the matches to the word evidence producing one set of results.

The second example finds the difference between the first and second sets. The difference can be defined as any matches in the first set that are not also in the second. Looking on the facing page, a set containing quotations dated 1591 is compared with a set of quotations attributed to Shakespeare. The common match point is the <Q> tag (stands for the beginning of a quotation). The two quotations that appear in the first set but are not in the second make up the final set. These are quotations dated 1591 that are not attributed to Shakespeare.

The third example finds the matches common to both sets. For example, the set of quotations dated 1591 is compared to the set of quotations attributed to Shakespeare. DLXS XPAT produces a new set containing matches common to both sets, quotations that are both dated 1591 and are attributed to Shakespeare. The result is one match.
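Because a set of results is a set of match points, the three operations behave like ordinary set operations on numbers. The Python sketch below uses made-up offsets purely for illustration; the variable names are not DLXS XPAT names.

```python
# Made-up match points (offsets into the text) for two unrelated words.
clue = {120, 910}
evidence = {300, 1500}

# Union: every match from either set, with duplicates removed.
union = sorted(clue | evidence)

# Made-up <Q> match points for quotations dated 1591 and for Shakespeare.
quotes_1591 = {100, 200, 300}
quotes_shaks = {300, 400}

# Difference: matches in the first set that are not in the second.
difference = sorted(quotes_1591 - quotes_shaks)

# Intersection: matches that appear in both sets.
intersection = sorted(quotes_1591 & quotes_shaks)
```

With this sample data the difference holds two quotations and the intersection holds one, the same shape as the pictured examples. Both comparisons only work because the two quotation sets share the same kind of match point, the start of a quotation.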


>> title = "mr. " + "mrs. "
  title = 27780 matches

>> pr sample
270892615, ..h of blood to Mr. Dombey's face. </T></Q><Q><D>1873</D> `<A>Ouid..
293441616, ..T>Enquire..of Mr. Kimpson at the Castle, a *Silk-shop. </T></Q><..
134648143, ..Jan. 134/3 <T>Mr Scott's *hexalogy closes with what we may call ..
392105523, ..D>1846</D> <A>Mrs. A. Marsh</A> <W>Fr. Darcy</W> xxxiv, <T>In a ..
364579121, ..D>1844</D> <A>Mrs. Browning</A> <W>Vis. Poets</W> cxlii, <T>The ..
397138778, ..D>1779</D> <A>Mrs. Delany</A> <W>Let. to Mrs. Port</W> 17 Apr., ..
216295871, ..D>1773</D> <A>Mrs. Grant</A> <W>Lett. fr. Mount.</W> (1813) I. x..
272959533, ..D>1842</D> <A>Mrs. Kirkland</A> <W>Forest Life</W> I. 180 <T>In ..
374852111, ..D>1789</D> <A>Mrs. Piozzi</A> <W>Journ. France</W> II. 42, <T>I ..
298915502, ..people', said Mrs. Silchester slyly.</T></Q></QP></S6></S4><p><S..


>> region A including ("mrs. a. marsh" + "mrs. grant" + "mrs. kirkland")
  2: 151 matches

>> region Q including %
  3: 151 matches

>> pr.region.Q sample.4
276705709, ..<Q><D>1839</D> <A>Mrs. Kirkland</A> in Griswold <W>Prose Writers Am
er.</W> (1847) 464 <T>Fetch the broom, Betsey! and the scrub-broom, Betsey! </T>
</Q>..
355452981, ..<Q><D>1805</D> <A>Mrs. Grant</A> in Campbell <W>Mem. & Corr.</W
> (1844) I. 59 <T>Two hours of tweedle-dum and tweedle-dee were too much for me.
 </T></Q>..
367573536, ..<Q><D>1839</D> <A>Mrs. Kirkland</A> <W>New Home</W> xxxiv. 231 <T>T
hose [ladies] who had unwarily sported silks and other unwashables, looked acid
and uncomfortable.</T></Q>..
383947574, ..<Q><D>1846</D> <A>Mrs. A. Marsh</A> <W>Father Darcy</W> xxviii, <T>
 `How are the ways?'  `Deep and difficult enough, please your honour.' </T></Q>..

Combining several sets of results into one is advantageous when you are interested in collective results. If you are interested in the number of occurrences of the word Mrs. as compared to the word Mr., you can search for each individually, and produce two sets of results. If, however, you are interested in titles in general, you can use the + operator, for example:

>> title = "mr. " + "mrs. "

This command finds the set of matches for mr. and the set of matches for mrs. and combines them in one set.

When combining sets, duplications are eliminated. For example, if you combine the set of matches to mrs. with the set of matches to mrs. delany, your answer will be the number of matches to mrs., since all matches to mrs. delany already appear in the first set.

A third example searches for quotations taken from the works of three authors:

>> region A including ("mrs. a. marsh" + "mrs. grant" + "mrs. kirkland")
>> region Q including %


>> dictat - "dictator "
  4: 520 matches

>> pr sample.4
341282602, .. to have been dictated by actual theophobia. </T></Q><Q><D>1899<..
248518830, ..o follows the Dictates of his own Fancy. </T></Q></EQ><Q><D>1791..
 74125842, ..Feb. 6/3 <T>A dictation cylinder will contain from 1,000 to 1,20..
241598024, ../XR>: cf. <CF>dictatorial</CF>, <CF>senatorial</CF>, etc.</ET> <..


>> ("1880".."1884") - "1883"
  5: 57438 matches

>> pr sample.4
384817917, ..<PQP><Q><D>C. 1880</D> <W>Cassell's Nat. Hist.</W> IV. 324 <T>Fa..
143528300, ..</T></Q><Q><D>1881</D> <W>Athenaeum</W> No. 2811. 348/1 <T>For H..
231719451, ..</T></Q><Q><D>1882</D> <A>Ogilvie</A>, <T><i>Polyspermal, Polysp..
112158046, ../Q></EQ><Q><D>1884</D> <A>G. W. Sears</A> <W>Woodcraft</W> (Cent..


>> quotes1591 = region Q including (region D including "1591")
  quotes1591 = 5240 matches

>> quotesshaks = region Q including (region A including shaks)
  quotesshaks = 32867 matches

>> *quotes1591 - *quotesshaks
  8: 3814 matches

>> pr.region.Q sample.1
144408445, ..<Q><D>1591</D> <A>Percivall</A> <W>Sp. Dict.</W>, <T><i>Desatino</i
>,..rashnesse, inconsideratenesse, folly. </T></Q>..

>> region Q including %
  9: 3814 matches

>> "Tears of Muses" within %
  10: 21 matches

The - operator can be used to exclude specific matches from a search. For example, to find all words that begin with dictat except for the word dictator, the command is as follows:

>> dictat - "dictator "

This next example excludes matches to 1883 from the set of matches in the range 1880 to 1884.

>> ("1880" .. "1884") - "1883"

A third example searches the OED for all quotations dated 1591 except those attributed to Shakespeare. You can break this search into three steps:

  1. Find quotations dated 1591 (D is the component defined by the <D> and </D> tags: dates in this example).
>> quotes1591 = region Q including (region D including "1591")
  2. Find quotations by Shakespeare (A is the component defined by the <A> and </A> tags: authors in this example).
>> quotesshaks = region Q including (region A including shaks)
  3. Find the difference between the two sets.
>> *quotes1591 - *quotesshaks

The result of any set operation is a set of match points, not a set of components. The result of the set difference above is a set of match points <Q>, not the entire Q components. For this reason, you cannot now search for a pattern within this set without first searching for the Q components that contain the match points. Then you can search within those components, for example:

>> region Q including %
>> "Tears of Muses" within %
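The reason for the extra step can be seen in a small Python sketch. After a set operation only bare start points remain, so the components have to be looked up again before anything can be searched inside them. The three sample quotations and the helper `q_starts_including` are invented for this illustration and do not come from the OED data.

```python
import re

TEXT = (
    "<Q><D>1591</D><A>Spenser</A><T>Tears of Muses</T></Q>"
    "<Q><D>1591</D><A>Shaks.</A><T>Two Gent.</T></Q>"
    "<Q><D>1600</D><A>Shaks.</A><T>Hamlet</T></Q>"
)

# Quotation components as (start, end) spans; end is exclusive.
Q_REGIONS = [m.span() for m in re.finditer(r"<Q>.*?</Q>", TEXT, re.S)]


def q_starts_including(pattern):
    """Start points of the quotations that contain the literal pattern."""
    return {s for s, e in Q_REGIONS if pattern in TEXT[s:e]}


quotes1591 = q_starts_including("<D>1591</D>")
quotesshaks = q_starts_including("<A>Shaks.")

# The difference is a set of bare match points, not of components.
diff_points = quotes1591 - quotesshaks

# Re-attach each point to its quotation (like: region Q including %).
diff_regions = [(s, e) for s, e in Q_REGIONS if s in diff_points]

# Only now can a pattern be searched within those components.
hits = [
    TEXT.find("Tears of Muses", s, e)
    for s, e in diff_regions
    if "Tears of Muses" in TEXT[s:e]
]
```

In this sketch `diff_points` holds plain numbers, so there is nothing to search inside until `diff_regions` restores the span of each quotation.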


>> *quotes1591 ^ *quotesshaks
  11: 1426 matches

>> pr.region.Q sample.2
127749138, ..<Q><D>1591</D> <A>Shaks.</A> <W>Two Gent.</W> <sc>iv. </sc>ii. 136
<T>By my hallidome, I was fast asleepe. </T></Q>..
270915591, ..<Q><D>1591</D> <A>Shaks.</A> <W>Two Gent.</W> <sc>i. </sc>iii. 41 <
T>To-morrow..Don Alphonso, With other Gentlemen of good esteeme Are iournying, t
o salute the Emperor, And to commend their seruice to his will. </T></Q>..

>> region Q including %
  12: 1426 matches

>> "1 Hen. VI" within %
  13: 696 matches

>> oxford = "university " fby "oxford "
  oxford = 94 matches

>> cambridge = "university " fby "cambridge "
  cambridge = 56 matches

>> *oxford ^ *cambridge
  16: 10 matches

>> pr.135 sample.3
207959421, ..orator of the university of Cambridge. </T></Q><Q><D>1899</D> <W>Ox
ford Univ. Cal.</W> 1 <T>Public Orator.  1880 William Walter Merry, ..
337249607, ..lished in the University of Cambridge, was the earliest University
office at Oxford [<i>c</i> 1209].</T></Q></QP></S6></S4><p><S4><#>2<..
 48244107, ..n the ancient University of Paris, and the ancient colleges of Oxfo
rd and Cambridge. </DEF></S6><S6><#>b</#> <DEF>A foundation of the s..


The ^ operator lets you determine if two or more sets share any results. In the previous example, set difference was used to find quotations dated 1591 not attributed to Shakespeare. These same two sets can be intersected to find quotations dated 1591 that are attributed to Shakespeare:

>> *quotes1591 ^ *quotesshaks

This query can also be solved by searching for quotations dated 1591 in the way described previously and then using the command % including (region A including shaks) to pull out those quotations that also have shaks as the author.

The result of any set operation is a set of match points, not a set of components. The result of the set intersection above is a set of match points <Q>, not the entire Q components. For this reason, you cannot search for a pattern within this set without first searching for the Q components that contain the match points. Then you can search within those components, for example:

>> region Q including %
>> "1 Hen. VI" within %

Consider another example of intersection. Suppose you are interested in the word university only if it's followed by a reference to both oxford and cambridge. The first step is to look for university followed by oxford:

>> oxford = "university " fby "oxford "

The second step is to look for university followed by cambridge:

>> cambridge = "university " fby "cambridge "

Finally, to find matches that are common to both sets of results, the command is as follows:

>> *oxford ^ *cambridge

This query can also be solved by searching for university followed by oxford as described and then typing the command % fby cambridge.
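The intersection works here because both searches leave their match points on the word university. A rough Python sketch of that idea follows. The function `fby` and its `window` parameter are hypothetical stand-ins: the value of 80 characters is only an assumption for the example, and the real proximity distance depends on how DLXS XPAT is set up on your system.

```python
def find_all(text, pattern):
    """Offsets of every occurrence of the literal pattern."""
    found, i = [], text.find(pattern)
    while i != -1:
        found.append(i)
        i = text.find(pattern, i + 1)
    return found


def fby(text, first, second, window=80):
    """Positions of `first` that are followed by `second` within `window` characters.

    The window size is an assumption made for this sketch.
    """
    seconds = find_all(text, second)
    return {p for p in find_all(text, first) if any(p < q <= p + window for q in seconds)}


sample = "the university of oxford and the university of cambridge"

oxford = fby(sample, "university", "oxford")
cambridge = fby(sample, "university", "cambridge")

# Occurrences of "university" that are followed by both names.
both = sorted(oxford & cambridge)
```

In the sample, the first university is followed by both names and the second only by cambridge, so the intersection keeps just the first occurrence.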

Appendices

There are some common errors that people make when using DLXS XPAT the first few times. It's easy to mistype command names or leave out closing quotation marks when you're not completely familiar with the program. Appendix A explains how to interpret common error messages and solve common problems. Following Appendix A is a pullout Quick Reference Guide that lists all the commands and examples of each.

Appendix A: Solving Problems

Figuring Out Error Messages


General Problems
