XPAT command, syntax, and concept guide
From DLXS Documentation
XPAT Command Manual
The following provides a summary of XPAT commands, settings, and concepts, and is based extensively on Open Text's PAT 5.0 documentation. Many of the commands included here are not implemented in DLXS middleware.
List of Commands
- {CommandFile}
- comment
- {DefaultRegion}
- difference
- done
- double quote
- exec
- export
- {ExportFile}
- fby
- first
- ~free
- ~freeall
- history
- {History}
- {HistoryFile}
- import
- including
- index point
- intersect
- {Label}
- last set
- {LeftContext}
- macro
- naming sets
- near
- next
- ~nextemp
- not
- offsets
- pr
- {PrintLength}
- {Proximity}
- ~qnum
- quiet mode
- {QuietOff}
- {QuietOn}
- quit
- range
- rankedby
- region
- sample
- {SampleSize}
- save
- save.commands
- {SaveFile}
- save.history
- set name
- set number
- sets
- {Settings}
- shift
- signif
- {SortOrder}
- stop
- string search
- subset
- ~sync
- thesaurus
- union
- within
Command and Settings Documentation
{CommandFile}
{CommandFile string}
changes the file name used by save.commands and exec.
The CommandFile setting determines which file the save.commands command writes to and the exec command reads from. Its default value is xpat.cmd. If the string begins with a numeral or contains blanks or non-alphanumeric characters, it must be enclosed in double quote marks. The file name must also conform to the file naming conventions of the host operating system. The setting can be changed at any time during an XPAT session and remains in effect until it is changed again or until the end of the session. The current value of CommandFile is displayed by the command {Settings}.
Examples:
>> {CommandFile "/usr/new/output_file"}
This changes the setting to /usr/new/output_file, which any subsequent save.commands command writes to and any subsequent exec command reads from.
See also:
exec, save.commands, Settings
comment
#
marks the start of a comment.
The comment, that is, the # and the rest of the line following it, is ignored by XPAT. A comment can be placed on a line by itself or after an XPAT query. It is useful for annotating queries stored in a file that is processed in batch mode or read in by the exec command. The queries may be created externally or generated during an XPAT session and saved by save.commands for later use.
Examples:
>> # find all the Shakespearean quotations
>> region Quote incl (region Author incl "shaks")
The line beginning with the # is ignored by XPAT.
>> first = region "<E>" .. "</L>" # find first language
XPAT creates a new region set with this command. The rest of the line, beginning with the #, is ignored.
See also:
exec, save.commands, save.history
{DefaultRegion}
{DefaultRegion string}
determines which region set is the current default.
The DefaultRegion setting designates a special region set, known as the default region. The default region can be referred to as region without specifying the actual region name. The setting can be changed at any time during an XPAT session and remains in effect until it is changed again or until the end of the session.
If the string giving the setting value begins with a numeral or contains blanks or non-alphanumeric characters, it must be enclosed in double quote marks.
Using the default region in a command without having previously specified one is illegal and results in the following message:
No information for region in the data dictionary
For convenience, a frequently used DefaultRegion setting can be defined in an init file whose location is given in the data dictionary file. The init file is read and executed by an XPAT session when it starts (see the data dictionary documentation for details).
Examples:
>> region including "constitution"
The region set referred to in the example by region is the one designated by the DefaultRegion setting.
>> {DefaultRegion HeadLine}
>> region including constitution
The first line changes the DefaultRegion setting to the value HeadLine. The command that follows uses the region set HeadLine even though it is not named.
See also:
data dictionary documentation, including, pr, region, save, Settings, within
difference
set1 - set2
removes members from a set.
The difference operator (-) creates a new set containing the members of set1 that are not members of set2. Set1 and set2 can be either point sets or region sets. The new set is of the same type as set1.
If either set1 or set2 is a region set, the first pointer delineating each region is used to determine whether a member of set1 also occurs in set2. Thus, for set arithmetic (difference, union and intersection) in XPAT, members of a region set are considered equal if they start at the same location in the text. The end point of a region is ignored in such operations.
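The comparison rule above can be modelled outside XPAT. The following Python sketch is illustrative only and is not part of XPAT: it assumes a region is represented as a (start, end) pair of text offsets and a point as a single offset, and the function names are hypothetical.

```python
# Hypothetical model: a region is a (start, end) tuple of text offsets,
# a point is a bare integer offset. Neither is an XPAT data structure.
def start_of(member):
    """Return the offset used for set arithmetic: the first pointer."""
    return member[0] if isinstance(member, tuple) else member


def difference(set1, set2):
    """Members of set1 whose start offset does not occur in set2."""
    starts2 = {start_of(m) for m in set2}
    return [m for m in set1 if start_of(m) not in starts2]


regions_q = [(0, 40), (50, 90), (100, 140)]  # made-up members of a region set
points = [50]                                # made-up match offsets

# The region starting at offset 50 is dropped; its end point plays no role.
print(difference(regions_q, points))  # [(0, 40), (100, 140)]
```

Because only start offsets are compared, the result keeps the type of set1, which matches the rule that the new set is of the same type as set1.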
Examples:
>> "to" - "to " - "to<"
Note that these operators are parsed left to right and can be combined without bracketing. This query creates a point set that contains all the matches to the prefix to, excluding those to the string to followed by a blank or a left angle bracket. Assuming an index in which all punctuation has been mapped to blanks, the result contains words starting with to but not the word to itself.
>> ("q" - "qu") within region HeadWord
This query creates a point set. The point set includes all words located in a HeadWord region that begin with q but not with qu.
>> region Story incl "music " - region Story incl "art "
This query creates a region set. The region set comprises all Story regions that include the string music but not the string art.
>> region Q - "<Q><D>"
Assume that the regions described by region Q all begin with the string <Q>. The above query creates a region set of the members of region Q that do not have the string <D> immediately following the <Q>.
See also:
intersect, union
done
done
terminates an XPAT session.
The done command ends the session and causes the XPAT process to exit. A message may be generated reporting how much computer time was used during the session.
See also:
quit, stop
double quote
"string"
allows the use of strings that include special characters.
Normally, XPAT interprets a sequence of characters as a string and searches the database for matches to it. However, there are certain types of strings that XPAT cannot recognize as search targets unless they are enclosed in double quote marks. The special strings are: strings that begin with a numeral, for example 2nd; strings that contain blanks or non-alphanumeric characters, for example end of the year or <Author>Scott; and strings that are XPAT commands, for example near and within. In each case, a string that should be enclosed in double quote marks but is not results in a syntax error or an unexpected result.
Note that numbers not enclosed in double quote marks are interpreted as references to the numbers of sets previously calculated in the XPAT session.
A pair of quotes representing an empty string ("") stands for the set of all index points in the text being searched.
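A program that generates XPAT queries needs to decide when a search string must be quoted. The Python sketch below is one possible reading of the three rules above, not XPAT's own parser. The function name and the COMMAND_WORDS list are assumptions (the list is a small sample drawn from this guide, and case-insensitive matching of command words is a guess).

```python
# Sample of command words taken from this guide; the real list of reserved
# words is longer, and case-insensitive matching is an assumption.
COMMAND_WORDS = {"near", "within", "fby", "including", "incl", "not",
                 "region", "done", "quit", "stop", "first", "next"}


def needs_quotes(s):
    """Return True if s should be wrapped in double quote marks."""
    if s == "":
        return True                      # the empty string is written ""
    if s[0].isdigit():
        return True                      # begins with a numeral, e.g. 2nd
    if not all(c.isascii() and c.isalnum() for c in s):
        return True                      # blanks or non-alphanumeric characters
    return s.lower() in COMMAND_WORDS    # collides with a command word


for candidate in ["2nd", "end of the year", "<Author>Scott", "near", "constitution"]:
    print(candidate, needs_quotes(candidate))
```

Quoting every generated string unconditionally is the simpler and safer design; a check like this is only useful when the generated queries should stay readable.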
Examples:
>> "done "
>> done
The first command creates a point set containing matches to the word done. The second command ends the XPAT session.
>> 19 within region Date
>> "19" within region Date
The first query finds those members of the previously calculated set, identified by the number 19, that are within region Date. The second query finds the matches to the string 19 within region Date.
>> ""
This command produces a list of every point indexed in the text.
>> "_XPat_1" = "match this string "
>> "_XPat_OP1" = region "Region Set 5"
>> "_XPat_2" = *"_XPat_1" within *"_XPat_OP1"
The above sequence of commands might be produced by a program that accepts input from a user and generates commands that are sent to XPAT. Since the names contain non-alphanumeric characters, they must be enclosed in quotation marks.
See also:
index point, region, set name, string search
exec
exec
reads a file into an XPAT session and executes the commands contained in the file.
The name of the file read by the exec command is determined by the value of the CommandFile setting. By default the value is xpat.cmd, but it can be changed at any time during the XPAT session.
The exec command can be used to enter queries into an XPAT session. The queries, for example macro definitions, may be recorded in a file using an editor or saved in a file from a previous XPAT session using save.commands.
Examples:
>> {CommandFile "/usr/xpat/srch023.q"}
>> exec
The first command sets the name of the file to be read by any exec command to /usr/xpat/srch023.q. The second command reads the file /usr/xpat/srch023.q and executes the commands contained in it.
See also:
save.commands
Settings:
CommandFile
export
export set1
saves information about sets created in an XPAT session.
Export writes a detailed description of the members of set1, created during an XPAT session, to a file. The description includes the type (region or point) of the set and sufficient information to recreate a copy of the set. The name of the file is determined by the value of the ExportFile setting. By default the file name is xpat.exp, but it can be changed during a session by changing the ExportFile setting. When export writes to the named file, it overwrites anything that currently exists in the file. Assuming a default ExportFile setting of xpat.exp, the following message is given:
Exporting to xpat.exp.
The file may subsequently be read into an XPAT session by the import command.
If the saved set is a frequently used region set, it can be made available as a predefined region in future XPAT sessions by editing the data dictionary file and adding the appropriate information. If the new region set, containing 150 regions, is named newregion and saved in the file newregion_file, the following lines, added to the data dictionary, would make it available to XPAT.
<Region>
  <Name>newregion</Name>
  <Desc>This new region set describes ....</Desc>
  <File>
    <SysName>newregion_file</SysName>
    <Offset>0</Offset>
  </File>
  <Count>300</Count>
  <Type>pairs</Type>
</Region>
Examples:
>> "tax" near "increase"
>> export %
The first query creates a point set of the matches to the string tax when it is within the current Proximity of the string increase. The second command writes this point set to the file xpat.exp. The information written to the file contains header information followed by details about each element in the set.
>> {ExportFile "v.exp"}
>> verse = region "<V>" .. "</V>"
>> export *verse
The first line of the example changes the ExportFile setting to v.exp. The second line creates a region set and names it verse. The third command writes header information and a description of each member of the region set verse to the file v.exp.
See also:
data dictionary documentation, import
Settings:
ExportFile
{ExportFile}
{ExportFile string}
changes the file name used by export and import.
The ExportFile setting determines the file written by the export command and read by the import command. Its default value is xpat.exp. If the string begins with a numeral or contains blanks or non-alphanumeric characters, it must be enclosed in double quote marks. The file name must also conform to the file naming conventions of the host operating system. The setting can be changed at any time during an XPAT session and remains in effect until it is changed again or until the end of the session. The current value of the ExportFile setting is displayed by the command {Settings}.
Examples:
>> {ExportFile "/usr/new/export_file"}
This changes the value of the setting so that any subsequent export or import command uses the file /usr/new/export_file.
See also:
export, import, Settings
fby
set1 fby set2
finds members of sets that occur close to each other in a specified order.
Fby (followed by) creates a set containing those members of set1 that have one or more members of set2 within a specified number of characters to their right. Set1 and set2 may be either point sets or region sets. The new set is of the same type as set1.
The distance between members of the two sets is calculated by counting the number of characters in the text from the first character of a member of set1 to the first character of a member of set2. The measure used to determine closeness is the value of the Proximity setting, which has a default value of 80 characters. This can be changed for all subsequent uses of fby by changing the Proximity setting, or for an individual use of fby by attaching a modifier to the command. The modifier is a period followed by a number giving the maximum distance in characters.
If either set1 or set2 is a region set, the first of the two pointers delineating the region is used to determine the distance between the set members.
Multiple fby commands are not parsed left to right. A command of the form
set1 fby set2 fby set3
is handled as if parenthesized as follows:
set1 fby (set2 fby set3)
The command not fby creates a set containing the members of set1 that are not within the specified distance to the left of any member of set2.
set1 not fby (set2 fby set3)
is the same as
set1 - (set1 fby (set2 fby set3))
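The distance rule and the not fby identity can be illustrated with a small Python model. This is a sketch under stated assumptions, not XPAT's implementation: points are integer offsets, regions are (start, end) pairs, and a set2 member counts as "to the right" when it starts between 1 and Proximity characters after the set1 member (whether a distance of 0 qualifies is not stated in the text). All names and offsets are made up for illustration.

```python
def start_of(member):
    """First pointer of a region (start, end), or the offset of a point."""
    return member[0] if isinstance(member, tuple) else member


def fby(set1, set2, proximity=80):
    """Members of set1 with a set2 member starting 1..proximity chars to the right."""
    starts2 = [start_of(m) for m in set2]
    return [m for m in set1
            if any(0 < s - start_of(m) <= proximity for s in starts2)]


def not_fby(set1, set2, proximity=80):
    """set1 - (set1 fby set2), following the identity given in the text."""
    kept = fby(set1, set2, proximity)
    return [m for m in set1 if m not in kept]


law = [10, 200, 500]   # made-up match offsets for "law "
order = [30, 290]      # made-up match offsets for "order "
trial = [60]           # hypothetical third set, for the nesting example

print(fby(law, order))                 # [10]
print(fby(law, order, proximity=100))  # [10, 200], like a per-command modifier
print(not_fby(law, order))             # [200, 500]
print(fby(law, fby(order, trial)))     # [10], i.e. law fby (order fby trial)
```

The last call mirrors the right-to-left grouping: the inner fby is evaluated first and its result becomes set2 of the outer one.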
Examples:
>> "law " fby "order "
Assuming a Proximity of 80, this query creates a point set containing the matches to law with one or more matches to order within 80 characters to their right, counting from the l in law to the o in order.
>> region Title fby.30 region Author
This query creates a region set containing the members of the set region Title that have one or more members of the set region Author within 30 characters to the right. The distance is measured as the number of characters from the first character of a Title region to the first character of an Author region.
>> "law " not fby "order "
This query creates a point set containing the matches to law that do not have a match to order within 80 characters to the right, calculating the distance as in the first example.
>> "law " not fby.30 "order "
This query creates a point set containing the matches to law that do not have a match to order within 30 characters to the right.
See also:
near
Settings:
Proximity
first
first set1
finds a specific number of contiguous members from the start of a set.
First creates a set of a specified size made up of members from the beginning of set1. The members of the new set are in the order in which they appear in set1. Set1 may be either a region set or a point set. The new set is of the same type as set1.
The first command depends on three things: the set member counter that keeps track of the selected members, the size of the requested set, and the SortOrder setting, which determines which members are in the new set.
First selects members from the beginning of a set. The ordering of a set, and hence which members occur at the beginning, is controlled by the SortOrder setting. If the SortOrder setting is Alpha, the set is ordered alphabetically. If the SortOrder setting is Occur or OccurHead, the set is ordered according to occurrence in the text. If the SortOrder setting is AsIs, the set keeps its current order, which may be either alphabetic or occurrence order.
Each set that is used with a first, next or ~nextemp command has a cursor (set member counter) associated with it. The cursor indicates the location in set1 at which to start selecting members for the set being created. Each first command resets the cursor so that members for the new set are chosen from the beginning of set1. On completion of the first command, the cursor is updated to point just past the last member selected, where the next set would begin. Note that when the SortOrder setting changes and the set ordering is changed, the cursor is reset to the beginning of set1.
The size of the set created is determined by the value of SampleSize, which has a default value of 10. If the size of set1 is less than SampleSize, the new set is the same size as set1. Changing the SampleSize setting affects all subsequent uses of first during the current session. For an individual use of the command, the size of the new set can be specified by attaching a modifier to the first command. The modifier is a period followed by a number giving the desired set size.
The first command can be used by itself or with the pr, save or export commands.
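The interaction between the cursor, SampleSize and the size modifier can be sketched as a small Python class. This is a hypothetical model, not XPAT code: the class and method names are invented, the members are assumed to be already ordered according to SortOrder, and the next method assumes that the next command continues from the cursor, which this section implies but does not spell out.

```python
class SetCursor:
    """Hypothetical model of the set member counter used by first and next."""

    def __init__(self, members, sample_size=10):
        self.members = list(members)    # assumed already in SortOrder order
        self.sample_size = sample_size  # stands in for the SampleSize setting
        self.position = 0               # the cursor

    def _take(self, size):
        n = self.sample_size if size is None else size
        chunk = self.members[self.position:self.position + n]
        self.position += len(chunk)     # cursor now points past the last member taken
        return chunk

    def first(self, size=None):
        """Reset the cursor, then take members from the beginning."""
        self.position = 0
        return self._take(size)

    def next(self, size=None):
        """Assumed behaviour of next: continue from the cursor."""
        return self._take(size)


matches = SetCursor(range(100, 125), sample_size=10)
print(matches.first())    # ten members starting at the beginning
print(matches.first(3))   # like first.3: cursor reset, three members taken
print(matches.next(2))    # the two members after those three
print(matches.first(0))   # like first.0: empty result, cursor back at the start
```

A set smaller than the requested size simply yields all of its members, which matches the rule for sets smaller than SampleSize.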
Examples:
>> {SampleSize 40}
>> first 5
The first line changes the SampleSize setting to 40, and the second line creates a set that contains the first 40 members of set number 5, created earlier in the XPAT session.
>> first.10 "the best of "
This line creates a set containing the first 10 members of the set of matches to the phrase the best of.
>> first.0 3
This query resets the cursor to the first member of set number 3.
See also:
next, ~nextemp, sample, set number, subset
Settings:
SampleSize, SortOrder
~free
~free number
releases an XPAT set.
Following the ~free command, the set number is no longer available for reference in an XPAT command, and the set is no longer displayed by the history command.
If the sets freed are at the end of the current history list, their set numbers are reused for the next sets created in the XPAT session. For example, if the history list contains set numbers 1 to 8, and 6 through 8 are freed using the ~free command, the next set number assigned is 6. However, if set number 2 is freed and the history list includes set numbers 1 to 8, the next set is number 9.
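The numbering rule can be checked with a short Python model. It is only a sketch of the behaviour described above: the HistoryList class and its methods are hypothetical, and it assumes the next set number is always one more than the highest number still in the list.

```python
class HistoryList:
    """Hypothetical model of set numbering under ~free."""

    def __init__(self):
        self.sets = {}  # set number -> query text

    def next_number(self):
        # One more than the highest surviving number; 1 when the list is empty.
        return max(self.sets, default=0) + 1

    def create(self, query):
        number = self.next_number()
        self.sets[number] = query
        return number

    def free(self, number):
        del self.sets[number]


history = HistoryList()
for i in range(8):
    history.create(f"query {i + 1}")   # sets 1 to 8

history.free(2)
print(history.next_number())  # 9: set 2 was not at the end of the list

for n in (6, 7, 8):
    history.free(n)
print(history.next_number())  # 6: the freed sets were at the end
```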
Examples:
>> ~free 4
This removes set number 4 from the history list. The set can no longer be accessed by number reference.
See also:
~freeall, history
~freeall
~freeall
releases all XPAT sets.
Following the ~freeall command, all the sets that existed in the current session are no longer available for reference in an XPAT command. In addition, those sets are no longer displayed by the history command.
Following the ~freeall command, the next set number assigned is 1.
Examples:
>> ~freeall
This removes all current sets from the history list. Following the command, no previously created sets can be referenced, and the next set that is produced is assigned the number 1.
See also:
~free, history
history
history
displays the record of the current XPAT session.
Information about each set created during the XPAT session is recorded in a history list. For each of the sets, history displays a set number, the number of members in the set and the query that produced the set. Sets created during the current session can be accessed by referring to the number of the set in the history list. The results of pr, save, Settings, { }, and certain tilde (~) commands do not appear in this list since no sets are produced by these commands.
As the entire history list may become quite long, it is useful to be able to view only a part of the list. The History setting determines what portion of the history list is displayed by the history command. The History setting has a default value of 0, which indicates that the entire history list is to be displayed. When it is set to an integer n greater than zero, the final n elements in the history list are displayed by any subsequent use of the history command during the session.
The items listed can also be changed for an individual use of the history command. Modifiers may be attached to the history command to select where the display starts and how many items are displayed.
The first modifier, in the form of a period followed by a number, indicates where in the history list to begin the display. A positive integer p requests that the display start at the pth item from the start of the history list. A negative integer p requests that the display start at the pth item from the end of the history list. The number of items displayed is the value of the History setting.
The number of items displayed can also be changed for an individual use of the history command by attaching a second modifier to an already modified history command. This second modifier is also in the form of a period followed by a number, which gives the number of items to be displayed.
The default maximum size of the history list is 300 items. If more than 300 sets are created, only the last 300 sets created during the XPAT session are retained in the list. This maximum size can be altered by a command line parameter when starting an XPAT session.
Note that a set can be removed from the history list by the ~free command.
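The way the History setting and the two modifiers select a portion of the list can be modelled in a few lines of Python. This is an illustrative sketch, not XPAT's code: the function and parameter names are hypothetical, a start value of 0 is rejected because the text does not define it, and an explicit count of 0 is treated like a History setting of 0 (an assumption).

```python
def history_view(entries, history_setting=0, start=None, count=None):
    """Return the entries that a history command would display.

    entries          history list in creation order
    history_setting  value of the History setting (0 means show everything)
    start            first modifier, e.g. 3 for history.3 or -2 for history.-2
    count            second modifier, e.g. 10 for history.4.10
    """
    if start is None:
        # Plain history: the final n entries, or all of them when n is 0.
        return entries[-history_setting:] if history_setting > 0 else list(entries)
    if start == 0:
        raise ValueError("a start position of 0 is not described in the manual")
    begin = start - 1 if start > 0 else max(len(entries) + start, 0)
    n = history_setting if count is None else count
    return entries[begin:begin + n] if n > 0 else entries[begin:]


session = ['1: 11680, "univ"', '2: 209, "waterloo"', '3: 4, 1 near 2']
print(history_view(session))             # all three entries
print(history_view(session, start=3))    # history.3: the third entry onward
print(history_view(session, start=-2))   # history.-2: the last two entries
```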
Examples:
>> "univ"
>> pr sample %
>> "waterloo"
>> 1 near 2
>> pr
>> history
Assuming the above are the only commands executed in the XPAT session to this point, the result of the history command would be as follows:
1: 11680, "univ"
2: 209, "waterloo"
3: 4, 1 near 2
>> {History 5}
>> history
The first command in this example sets the value of the History setting to 5. The second command, and any subsequent use of the history command in the session, shows information about the final five sets in the history list.
>> history.3
This use of the history command gives information about the history list starting at its third element. Using the XPAT session described in the first example above, the result would be:
3: 4, 1 near 2
>> history.-2
This use of the command gives information starting at the second element from the end of the history list. Again using the first example, the result would be:
2: 209, "waterloo"
3: 4, 1 near 2
>> history.4.10
This use of history gives information from the history list starting at the fourth entry and continuing for ten entries.
>> history.-4.2
This use of history gives information about two entries, starting at the fourth entry from the end of the history list.
See also:
~free, save.commands, save.history, set number
Settings:
History
{History}
{History number}
changes the number of items from the history list displayed by the history command.
The History setting determines the number of items displayed by the history command. Note that the setting may be overridden by a modifier on an individual use of the history command. The default value of the setting is 0, indicating that all sets created in the session are shown by the history command. The setting can be changed at any time during an XPAT session and stays in effect until it is changed again or until the end of the session. The current value of the History setting is displayed by the command {Settings}.
Examples:
>> {History 30}
This changes the setting to the value 30 so that any subsequent use of the history command during the session displays 30 items.
See also:
history, Settings
{HistoryFile}
{HistoryFile string}
changes the file name used by save.history.
The HistoryFile setting determines the file written by the save.history command. Its default value is xpat.his. If the string begins with a numeral or contains blanks or non-alphanumeric characters, it must be enclosed in double quote marks. The file name must also conform to the file naming conventions of the host operating system. The setting can be changed at any time during an XPAT session and remains in effect until it is changed again or until the end of the session. The current value of the HistoryFile setting is displayed by the command {Settings}.
Examples:
>> {HistoryFile "/usr/new/history_file"}
This changes the HistoryFile setting so that any subsequent use of the save.history command during the session writes to the file /usr/new/history_file.
See also:
save.history, Settings
import
import
reads information that has been saved in a file by the export command.
Import reads data from a file and creates a new set that can be used as if it had been created during the current XPAT session. XPAT determines from the header information whether the saved set is a point set or a region set, and the new set is of the same type. The file read is determined by the ExportFile setting, which has a default value of xpat.exp. The file name can be changed during a session by changing the ExportFile setting.
Assuming the default setting of ExportFile, if the imported set is a region set, the following message is generated:
Importing regions from 'xpat.exp'
If it is a point set, the message generated is:
Importing point set from 'xpat.exp'.
Examples:
>> import
>> % within region Quote
The first command reads from the file xpat.exp. The second line uses the imported set as an operand to a within command and finds the members of the set that occur within region Quote.
>> {ExportFile "v.exp"}
>> verse = import
>> *verse including ("blind " fby "ditch ")
The first command changes the ExportFile setting to v.exp. The second reads a set from the file v.exp and names it verse. The third query finds the imported regions that include the string blind when it is followed by the string ditch (assuming that the imported set is a region set).
See also:
export
Settings:
ExportFile
including
set1 including set2
set1 incl set2
finds regions that contain members of a set.
Including, or incl, creates a set made up of the members of set1 that include one or more members of set2. Set1 must be a region set. Set2 may be either a point set or a region set. The new set is a region set.
Set1 may be a predefined region set, a region set created during the XPAT session using the region command, a region set resulting from the use of the import command, or the result of a previous query in the session.
If set2 is a point set, and if one or more of the points occur in a region from set1, then that set1 region is included in the new set.
If set2 is a region set, and the first of the pair of pointers (offsets into the text) describing a region of set2 is contained in a region of set1, that set1 region is included in the new set. The second pointer of the pair delineating the set2 region does not have to fall within the set1 region for the set1 region to be included in the new set.
The including command can also be used to find regions that contain more than one member of set2, by attaching a modifier to the including command that specifies the minimum number of members of set2. The modifier is a period followed by the minimum number of members.
The command not including creates a set containing those members of set1 that do not contain any of the members of set2.
set1 not including set2
is the same as
set1 - (set1 including set2)
Including and within are similar in that they both restrict searches to specified regions in the text. They differ in the set that is created. The including command creates a set of regions that contain one or more members of another set, while within creates a set of pointers or regions that are contained in members of a region set.
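The containment test, the minimum-count modifier and the not including identity can be modelled as follows. The Python sketch is illustrative and makes assumptions the text does not state: regions are (start, end) offset pairs with both bounds inclusive, points are integers, and the function names are hypothetical.

```python
def start_of(member):
    """First pointer of a region (start, end), or the offset of a point."""
    return member[0] if isinstance(member, tuple) else member


def including(regions, set2, minimum=1):
    """Regions that contain the start of at least `minimum` members of set2."""
    starts = [start_of(m) for m in set2]
    result = []
    for start, end in regions:
        hits = sum(1 for s in starts if start <= s <= end)  # inclusive bounds assumed
        if hits >= minimum:
            result.append((start, end))
    return result


def not_including(regions, set2):
    """set1 - (set1 including set2), following the identity in the text."""
    kept = including(regions, set2)
    return [r for r in regions if r not in kept]


stories = [(0, 100), (200, 300)]   # made-up Story regions
matches = [10, 50, 60, 250]        # made-up match offsets

print(including(stories, matches))             # [(0, 100), (200, 300)]
print(including(stories, matches, minimum=3))  # [(0, 100)], like including.3
print(not_including(stories, [250]))           # [(0, 100)]
print(including(stories, [(90, 150)]))         # [(0, 100)]: only the start must fall inside
```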
Examples:
>> region Story including ("Free trade" near "Canada")
This query finds the regions described by region Story that contain one or more matches to the string Free trade when it occurs close to the string Canada.
>> region Story including.3 ("Free trade" near "Canada")
This query finds the regions described by region Story that contain at least three matches to the string Free trade when it occurs close to the string Canada.
>> region Quote not including region Author
This query creates a set of Quote regions that do not contain the first pointer of the pair delineating an Author region.
>> dates = "1800" .. "1825"
>> region Date including *dates
The first query creates a point set containing all the numbers that are alphabetically between 1800 and 1825. The second creates the set of Date regions that contain one or more of these numbers.
>> region Quotation including "Wright"
>> % including "Waterloo"
The first query creates a region set of quotations that contain the string Wright. The second query finds the members of the new region set that also contain the string Waterloo.
>> (*speech including "republican") including "democrat"
This query is similar to the previous one. It assumes that a region set named speech has been defined, and it finds the members of this set that contain both the string republican and the string democrat.
>> (*definition incl ("men" + "women")) incl "education"
>> *definition including (("men" + "women") ^ "education")
The first query creates a set of definition regions that include the string education as well as either men or women. Note that the second query does not create the same set but instead creates a set of size 0. This is because the intersection operation (("men" + "women") ^ "education") produces an empty result: a match point for men or women can never also be a match point for education, so no members of the union set "men" + "women" are also members of the set "education" (see the definition of the union operator).
See also:
intersect, not, region, within
index point
XPAT views the entire text as one long string. In contrast to traditional text indices, which deal with words, XPAT indexes strings. The indexed strings extend from each index point to the end of the text.
The XPAT index is made up of the starting points of each string. The index points make up the possible match points for a string search. Parameters set when the index is built determine which strings are in the index. The parameters specify patterns in the text that define the beginnings of strings to be indexed. For example, one pattern could specify that every character in the text is to be indexed, while another pattern could specify that each printable character following a blank is to be indexed.
When the index is created, two additional settings can alter how XPAT sees the text. Character mappings cause XPAT to see certain characters as equivalent to other characters. For example, all upper case letters may be mapped to lower case letters so that XPAT does not distinguish between upper and lower case when searching for a string. Also, some words may be designated as stopwords. XPAT views the text as if these words are not there. XPAT ignores strings in the text that start at an index point and match the given stopword strings followed by a blank after the character mappings have been applied. The character mappings also affect the strings chosen to be index points. For example, if a > is mapped to a blank and if the index points are defined as blanks followed by printable characters, then in the text ...<tag>wisdom... the w in the string wisdom is an index point. Text with character mappings applied and stopwords removed is referred to as converted text.
When searching for a given string, a match is found if the given string (after having the character mappings applied to it and the stopwords removed) is the same as the converted text that begins one of the indexed strings.
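The effect of character mappings on index points and matching can be illustrated with a toy model in Python. This is a sketch under simplifying assumptions and not how XPAT builds its index: the only index-point pattern modelled is "a non-blank printable character following a blank", stopwords are left out entirely, and all function names are hypothetical.

```python
def convert(text, mapping):
    """Apply character mappings to produce the converted text."""
    return "".join(mapping.get(c, c) for c in text)


def index_points(text, mapping):
    """Offsets of non-blank printable characters that follow a blank."""
    converted = convert(text, mapping)
    return [i for i in range(1, len(converted))
            if converted[i - 1] == " "
            and converted[i] != " "
            and converted[i].isprintable()]


def search(text, query, mapping):
    """Index points whose converted text begins with the converted query."""
    converted = convert(text, mapping)
    target = convert(query, mapping)
    return [i for i in index_points(text, mapping)
            if converted.startswith(target, i)]


text = "a <tag>Wisdom grows"
mapping = {">": " ", "W": "w"}         # made-up mappings for illustration

print(index_points(text, {}))          # [2, 14]: the W is not an index point
print(index_points(text, mapping))     # [2, 7, 14]: mapping > to a blank exposes it
print(search(text, "wisdom", mapping)) # [7]
```

With no mappings, the W follows a > rather than a blank and is not indexed, so a search for wisdom could not match there. Mapping > to a blank makes offset 7 an index point, as in the example in the text.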
[edit] See also:
data dictionary documentation, double quote
, offsets
, quiet mode
, range
, shift
, string search
[edit] intersect
set1 ^ set2
finds members common to two sets.
The intersect
operator (^
) creates a new set consisting of the members in set1 that are also in set2. Set1 and set2 can be either point sets or region sets. The new set is of the same type as set1.
If either of set1 or set2 is a region set, only the first of the pointers describing the region is used in the comparison to determine if a member should be included in the new set. Two members of a region set are considered to be equal if they start at the same location in the text.
[edit] Examples:
>> (region Verse incl "eye") ^ (region Verse incl "seed")
This query creates a region set. It includes verse regions that contain both the string eye
and the string seed
.
>> ("research" near "medical") ^ ("research" near "biolog")
This query creates a point set. It includes the matches to research
that appear close to both the string medical
and the string biolog
.
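The comparison rule for intersect (only the start of a region is compared, and the result keeps the type of set1) can be modelled with a short Python sketch. The representation is an assumption made for illustration: a point is an integer offset, a region is a (start, end) tuple, and the names start_of and intersect are hypothetical.

```python
def start_of(member):
    """Return the offset used for comparison: a region's start, or the point itself."""
    return member[0] if isinstance(member, tuple) else member


def intersect(set1, set2):
    """Keep the members of set1 whose start offset also starts a member of set2."""
    starts2 = {start_of(m) for m in set2}
    return [m for m in set1 if start_of(m) in starts2]


# Two region sets: members match when they start at the same location.
verses_with_eye = [(10, 50), (60, 90)]
verses_with_seed = [(10, 50), (100, 120)]
both = intersect(verses_with_eye, verses_with_seed)

# A point set intersected with a region set yields a point set.
points = intersect([10, 60], [(10, 40)])
```

Because the result is built from the members of set1, its type follows set1, as the description above states.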
[edit] See also:
difference
, region
, union
[edit] {Label}
{Label string}
specifies an identifying string to be used as a label.
When XPAT is operating in quiet mode
with labels requested, any set displayed by a pr
or save
command shows the label string preceding the numeric value of the text offset. This can be used to identify which database the information is from. In a XPAT session, if a value for Label
has not been set by this command, the default value used is the name of the data dictionary. The label string must begin with an alphabetic character and contain no blanks or non-alphanumeric characters. The setting can be changed at any time during a XPAT session and remains in effect until it is changed again or until the end of the session.
[edit] Examples:
>> {Label Database1}
>> {QuietOn Label}
pr ("Ontario" near ("B.C." + "British Columbia"))
The tagged output from the pr
command shows the numeric offset in the file preceded by the string Database1
, in the form
<PSet><Start>Database1:12345</Start></PSet>
[edit] See also:
offsets
, quiet mode
[edit] Settings:
QuietOff
, QuietOn
[edit] last set
%
refers to the previous result.
%
is used as shorthand to refer to the set created most recently in the XPAT session. The set is the final one in the current history list. Some commands, such as pr
and save
, do not create sets that are saved and recorded in the history list and thus cannot be accessed by using the %
. If there is no history, the last set
is the null set which contains all index points.
[edit] Examples:
>> region Author including "Hemingway"
>> pr sample %
>> % within region Quote
The %
in the second line of the example refers to the set created by the including
command in the first query. The %
in the third line also refers to the set created by the first line and not to the result of the pr
in the second line which does not produce a set.
[edit] See also:
~free
, ~freeall
, history
[edit] {LeftContext}
{LeftContext number}
specifies how many characters of context are displayed to the left of a set member.
By default, when a set is displayed with the pr
command or written to a file by the save
command, the text has 14 characters to the left of the match point. The setting can be changed at any time during a XPAT session and remains in effect until it is changed again or until the end of the current session. The current value of the LeftContext
setting is displayed by the command {Settings}
.
[edit] Examples:
>> {LeftContext 40}
This changes the setting to the value 40 so that any subsequent pr
or save
command produces text with 40 characters to the left of the match point.
[edit] See also:
pr
, save
, Settings
[edit] Settings:
PrintLength
[edit] macro
The macro
capability facilitates the use of frequently used sequences of XPAT commands.
A macro can be defined in a XPAT session and be available only for the duration of that session, or a macro can be created externally and read into any XPAT session by an exec
command or during initialization.
The definition of a macro (here called name) begins with the following:
name = macro
After this line the system prompt changes from >>
to ||
for the duration of the macro definition. The body of the macro may begin on the same line or on a subsequent line. XPAT interprets anything immediately following the word macro
, that is not a blank or new line, as the beginning of the macro definition. The body of the macro may contain arguments. The nth argument to the macro is identified within the macro definition by the string $n$
. Any sets that are created by the macro may also be used in its definition. The string *n*
refers to the nth set created within the macro. The end of the macro definition is indicated by a @
. After the @
the system prompt returns to the form >>
.
References to other macros may be used within the definition of a macro. If the macro contains more than one XPAT query, they can be put on separate lines or on the same line with the queries separated by a semi-colon. The body of the macro is not checked for syntax errors when it is defined. Any errors are reported when the macro is used.
The macro is invoked by the following call:
name(arg1,arg2..)
If the number of arguments in the macro call is less than the number in the macro definition, a syntax error is reported. If it is greater, the extra arguments are ignored.
Each argument consists of all the text occurring between argument delimiters: parentheses and commas. That is, if a macro takes three arguments - (arg1,arg2,arg3)
- arg1
consists of the text between the opening parenthesis and the first comma, arg2
consists of the text between the first and second comma, and arg3
consists of the text between the second comma and closing parenthesis. If a macro takes only one argument - (arg1)
- the parentheses are the argument delimiters. Note that any spaces entered with an argument string will be included with the parameter substitution which is unlikely to be the intent of the user. To avoid unexpected results, enter only the exact text that you wish to be substituted in the arguments of the macro call. Also note that macros may have no arguments.
When the macro is invoked, the invocation is replaced by an exact copy of the body of the macro with the arguments substituted for the formal parameters. This means that the macro can be used within other XPAT queries. This may require that the macro definition have the closing @
on the same line as the final line of the body of the macro definition to avoid introducing an unwanted new-line character.
If improperly used, macros that produce multiple sets and are used within other queries may cause more than one syntax error to be reported. Care must be taken with bracketing in order to ensure that the results reflect what was actually intended.
A macro can be redefined during a XPAT session. When the macro is redefined, the previous definition of the macro is displayed following the first new line entered after the word macro
. The format of this previous definition consists of the macro name followed by a colon, followed by the body of the macro on subsequent lines.
For convenience, macros that are used frequently can be defined within an init file whose location is given in the data dictionary file. The init file is read and executed by a XPAT session when it is initially started. (See the data dictionary documentation for details.)
[edit] Examples:
>> word = macro
|| ( "$1$ " + "$1$<" + "$1$-" ) @
>> word(pad) within *definitions
This macro is used with text that contains tags that start with a <
and where the tags may follow text without blanks appearing before the tag. The macro defines a word as a string of characters followed by a blank, <
, or -
(in this definition an index that has all punctuation mapped to a blank is assumed). Since the macro definition has the @
sign on the same line as the body of the definition, the macro can be used within a more complicated query as shown. Note the brackets included in the macro definition. The example assumes that there is a region definition
and finds all occurrences of pad
, as a word, inside one of these regions.
>> both = macro
|| region $1$
|| *1* including $2$
|| *2* including $3$
|| @
>> both(Line,"juliet","romeo")
With the macro defined here, the members of a predefined region set which contain both of two given strings are found. In the macro call above, the macro is applied to a database of Shakespearean texts in order to find the members of the predefined region set named Line
containing references to both romeo
and juliet
. The definition of this macro returns more than one set. It also has the @
on the line following the body and thus could not be used within another query. The resulting output from XPAT showing the three sets produced by the macro would appear as below:
16: 128794 matches
17: 214 matches
18: 25 matches
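The textual substitution performed when a macro is invoked can be approximated in Python. This is a rough sketch under stated assumptions, not XPAT's parser: the function name expand_macro is hypothetical, arguments are split on every comma (including commas inside quotes, since the description above names only parentheses and commas as delimiters), and the number of formal parameters is inferred from the highest $n$ in the body.

```python
import re

PARAM = re.compile(r"\$(\d+)\$")


def expand_macro(body, call_args):
    """Replace each $n$ in `body` with the n-th argument of the call.

    `call_args` is the raw text between the parentheses of the call.
    Spaces are kept exactly as typed, as the manual warns.
    """
    args = call_args.split(",") if call_args != "" else []
    n_params = max((int(n) for n in PARAM.findall(body)), default=0)
    if len(args) < n_params:
        raise ValueError("syntax error: too few macro arguments")
    # Extra arguments are simply never referenced, so they are ignored.
    return PARAM.sub(lambda m: args[int(m.group(1)) - 1], body)


word = '( "$1$ " + "$1$<" + "$1$-" )'
expanded = expand_macro(word, "pad")
```

Calling expand_macro(word, " pad") with a leading space would carry that space into every substituted string, which is the pitfall the text describes.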
[edit] See also:
data dictionary documentation, thesaurus
[edit] naming sets
name = set1
assigns a name to a set.
A set which has been named can be referred to either by that name or its set number. Set1 can be either a point set or a region set.
A name that starts with a letter and contains only letters and numbers does not need to be enclosed within double quote
marks. However, if the name contains special characters (blanks or non-alphanumeric characters), or does not start with a letter, it must be enclosed within double quote
marks both in the assignment statement and in subsequent use.
To use the name in a query it must be preceded by an asterisk (*
). Without the asterisk (*
), XPAT interprets the name as a string rather than as the name of a set.
[edit] Examples:
>> UK = "U.K."+"Britain"+"Great Brit"+"United King"
>> region Headline including *UK
The first line assigns the name UK
to a set of matches to four alternate ways of referring to the United Kingdom. The second line finds Headline regions that contain any of the matches.
>> "min_hiring" = region Minutes incl ("hiring" near "policy")
>> region Attendees within *"min_hiring"
The first line of the example assigns the name min_hiring
to Minutes regions that include matches to hiring
appearing close to matches to policy
. The second line finds the Attendees regions that are within one of the resulting Minutes regions from the first query.
[edit] See also:
double quote
, set name
[edit] near
set1 near set2
finds members of sets that are close to each other.
Near
creates a set containing the members of set1 that are within a specified number of characters before or after one or more members of set2. Set1 and set2 may be either point sets or region sets. The new set is of the same type as set1.
The distance between members of the two sets is calculated by counting the number of characters in the text between the first character of a member of set1 and the first character of a member of set2. The measure used to determine closeness
is the value of the Proximity
setting which has a default value of 80 characters. The value can be changed for all subsequent uses of near
by changing the Proximity
setting, or it can be changed for an individual use of near
by using a modifier attached to the command. The form of the modifier is a period followed by a number representing the maximum distance (in characters).
If either set1 or set2 is a region set, the first of the two pointers describing the region is used in finding the distance between the members of the sets.
Multiple near
commands are not parsed left to right. A command of the form
set1 near set2 near set3
is handled as if parenthesized as follows:
set1 near (set2 near set3)
The command not near
creates a set containing those members of set1 that are not
within the specified distance of any member of set2.
set1 not near set2
is the same as
set1 - (set1 near set2)
[edit] Examples:
>> "love " near "hate "
Assuming a Proximity
of 80, this query creates a point set containing those matches to love
that are within 80 characters of matches to hate
, counting from the l
in love
to the h
in hate
. The string hate
can occur before or after love
in the text.
>> region Title near.30 region Author
This query creates a region set containing the members of region Title
that are within 30 characters of one or more members of region Author
. In this case the distance is measured as the number of characters between the first character of a Title region and the first character of an Author region.
>> "love " not near "hate "
This query creates a point set containing those matches to love
that do not occur within 80 characters of a match to hate
calculating the distance as in the first example.
>> "love " not near.30 "hate "
This query creates a point set containing the matches to love
that do not occur within 30 characters of a match to hate
.
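The distance test behind near and not near can be sketched in Python. This is a simplified model, not XPAT's implementation: points are integer offsets, regions are (start, end) tuples, the names start_of, near and not_near are hypothetical, and treating a distance exactly equal to the Proximity value as a match is an assumption the text above does not settle.

```python
def start_of(member):
    """Return the offset used for distances: a region's start, or the point itself."""
    return member[0] if isinstance(member, tuple) else member


def near(set1, set2, proximity=80):
    """Members of set1 whose start lies within `proximity` characters of a set2 start."""
    starts2 = [start_of(m) for m in set2]
    return [m for m in set1
            if any(abs(start_of(m) - s) <= proximity for s in starts2)]


def not_near(set1, set2, proximity=80):
    """Equivalent to set1 - (set1 near set2)."""
    close = near(set1, set2, proximity)
    return [m for m in set1 if m not in close]


love = [100, 500, 900]   # hypothetical match offsets for "love "
hate = [150, 880]        # hypothetical match offsets for "hate "
default_near = near(love, hate)        # Proximity of 80
tight_near = near(love, hate, 30)      # like near.30
far_from_hate = not_near(love, hate)   # like not near
```

With these offsets, the match at 100 is 50 characters from a hate match, so it qualifies under the default Proximity of 80 but not under the .30 modifier.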
[edit] See also:
fby
, not
[edit] Settings:
Proximity
[edit] next
next set1
finds a specified number of contiguous members of a set following members already identified by a first
or next
command.
Next
creates a set of a specified size containing the members of set1 that start at the current cursor position associated with this set. The cursor position is determined by a previous first
or next
command applied to set1. The members of the new set are in the order they appear in set1. Set1 may be either a region set or a point set. The new set is of the same type as set1.
The operation of the next
command depends on the set order established by the SortOrder
setting. If the SortOrder
setting is Alpha
, the set is ordered alphabetically; if the SortOrder
setting is Occur
or OccurHead
, the set is ordered as the members occur in the text; and if the SortOrder
setting is AsIs
, the set ordering is the current one and may thus be either alphabetic or occurrence order.
Each set that is used with a first
, next
or ~nextemp
command has a cursor (set member counter) associated with it. The cursor indicates the location in set1 at which to begin selection for the set being created. On completion of the next
command the cursor is updated to point at the beginning of the next set. Note, when the SortOrder
setting changes and the set ordering is changed, the cursor is reset to the first element.
The size of the set created is determined by the value of SampleSize
which has a default value of 10. If the size of set1 is less than SampleSize
, then the new set created is the same size as set1. Changing the SampleSize
affects all subsequent uses of next
during the current session. For an individual use of the command, the size of the new set can be specified by using a modifier attached to the next
command. The modifier is in the form of a period followed by a numeric value giving the desired set size.
The next
command can be used by itself or with the pr
, save
or export
commands. Note that next
may only be used in conjunction with these commands.
[edit] Examples:
>> {SampleSize 40}
>> first.0 5
>> next 5
The first line of the example changes the SampleSize
setting to 40. The first
command resets the cursor associated with set number 5 to the first member of the set and creates a set of size 0 (thereby leaving the cursor at the first member). The third line creates a set that contains the first 40 members of set number 5.
>> next.10 5
If this command follows the previous example, a set of ten members is created. The cursor associated with set number 5 indicates that 40 members have been used to create the set in the previous next
command and so this new set starts at the 41st member of set number 5.
[edit] See also:
first
, ~nextemp
, sample
, subset
[edit] Settings:
SampleSize
, SortOrder
[edit] ~nextemp
~nextemp set1
finds a specified number of contiguous members of a set following members already identified by a first
or next
command.
The command ~nextemp
creates a set of a specified size containing the members of set1 that start at the current cursor position associated with the set. The cursor position is determined by the previous first
or next
command applied to set1. The members of the new set are in the order they appear in set1. Set1 may be either a region set or a point set. The new set is of the same type as set1.
The ~nextemp
command is identical to the next
command except that the cursor is unchanged by the ~nextemp
command.
The operation of the ~nextemp
command depends on the set order established by the SortOrder
setting. If the SortOrder
setting is Alpha
, the set is ordered alphabetically; if the SortOrder
setting is Occur
or OccurHead
, the set is ordered as the members occur in the text; and if the SortOrder
setting is AsIs
, the set ordering is the current one and may thus be either alphabetic or occurrence order.
Each set that is used with a first
, next
or ~nextemp
command has a cursor (set member counter) associated with it. The cursor indicates the location in set1 to start selecting members for the set being created. After completion of the ~nextemp
command, the cursor is unchanged. This differs from the behaviour of the next
command, which advances the cursor past the last member of set1 selected for the new set. Note, when the SortOrder
setting and the set ordering change, the cursor is reset to the first element.
The size of the set created is determined by the value of the SampleSize
setting which has a default value of 10. If the size of set1 is less than SampleSize
, then the new set created is the same size as set1. Changing the SampleSize
setting affects all subsequent uses of ~nextemp
during the current session. For an individual use of the command, the size of the new set can be specified by using a modifier attached to the ~nextemp
command. This modifier is in the form of a period followed by a numeric value giving the desired set size.
The ~nextemp
command can be used by itself or with the pr
, save
or export
commands. Note that ~nextemp
may only be used in conjunction with these commands.
[edit] Examples:
>> {SampleSize 40}
>> first.0 5
>> ~nextemp 5
The first line changes the SampleSize
setting. The first
command initializes the cursor associated with set number 5 to the first member of the set and creates a result set of size 0 (thereby leaving the cursor at the first member). The third line creates a set that contains the first 40 members of set number 5.
>> ~nextemp.10 5
Assume this command follows the previous example. On completion of the previous query, the cursor still points to the beginning of the set as the ~nextemp
command does not change the cursor setting. The set created by this query contains 10 elements from the beginning of set number 5.
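The difference between next and ~nextemp comes down to whether the cursor moves. The following Python sketch models that behaviour. The class CursorSet and its method names are hypothetical, the cursor is kept as a 0-based position, and the default size of 10 stands in for the default SampleSize. Re-sorting and the cursor reset it triggers are not modelled.

```python
class CursorSet:
    """Toy model of a set with the cursor used by first, next and ~nextemp."""

    def __init__(self, members):
        self.members = list(members)
        self.cursor = 0  # position of the next member to hand out

    def nextemp(self, size=10):
        """~nextemp: read `size` members from the cursor without moving it."""
        return self.members[self.cursor:self.cursor + size]

    def next(self, size=10):
        """next: read `size` members, then advance the cursor past them."""
        chunk = self.nextemp(size)
        self.cursor += len(chunk)
        return chunk

    def first(self, size=10):
        """first: reset the cursor to the first member, then behave like next."""
        self.cursor = 0
        return self.next(size)


set5 = CursorSet(range(1, 101))   # stand-in for set number 5 with 100 members
set5.first(0)                     # first.0 5: reset the cursor, empty result
peek_a = set5.nextemp(40)         # members 1 to 40, cursor unchanged
peek_b = set5.nextemp(10)         # members 1 to 10 again
step_a = set5.next(40)            # members 1 to 40, cursor advances
step_b = set5.next(10)            # members 41 to 50
```

The two nextemp calls both start at the first member, while the second next call starts at the 41st, matching the examples given for each command.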
[edit] See also:
first
, next
, sample
, subset
[edit] Settings:
SampleSize
, SortOrder
[edit] not
Not is used to modify four XPAT commands. The forms in which not
can appear are not fby
, not including
, not near
, and not within
. These uses are described in the entries for fby
, including
, near
, and within
. Not
cannot be used to modify any other commands.
[edit] See also:
fby
, including
, near
, within
[edit] offsets
[number]
[label:number]
generate a point set containing a specified position in the text.
The number in the square brackets is a logical position in the text and need not be an index point. The number indicates the offset, measured in number of characters, from the beginning of the text database. The first character of the text has offset [1]. If the number used in square brackets exceeds the size of the text XPAT gives the message
Error: Input number too large.
Note that the new set is a point set with only one member.
The second form of the command, shown above, uses offsets that are produced when XPAT is operating in quiet mode and using labels. In this form, in order to produce correct results, the label string must be the current value of the setting Label
. When the label is different from the current Label
setting the resulting set has size 0.
[edit] Examples:
>> region Quote including [20000]
This query finds the Quote region that includes the offset 20000.
>> {Label news}
>> region Quote including [news:20000]
This query uses an offset in the form produced by XPAT in quiet mode (having requested labels with the offsets). Since the Label setting has been set to the value news by the previous command, the query finds the member of the region set Quote that contains the given offset. If the label prefixed to the offset were anything other than news, the query would produce a set of size 0.
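The two offset forms and the label check can be illustrated with a small Python sketch. It is a model of the rules stated above, not of XPAT itself: the function name offset_set and the text_size parameter are hypothetical, and checking the size before the label is an assumption, since the manual does not say which check comes first.

```python
def offset_set(token, current_label, text_size):
    """Turn "[number]" or "[label:number]" into a point set of size 1 or 0."""
    body = token.strip()[1:-1]                 # drop the square brackets
    label, sep, number = body.rpartition(":")  # sep is "" when no label is given
    offset = int(number)
    if offset > text_size:
        raise ValueError("Error: Input number too large.")
    if sep and label != current_label:
        return []                              # wrong label: set of size 0
    return [offset]


plain = offset_set("[20000]", "news", 1_000_000)
labelled = offset_set("[news:20000]", "news", 1_000_000)
mismatch = offset_set("[wire:20000]", "news", 1_000_000)
```

Here wire is an invented label used only to show the mismatch case, which yields an empty set.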
[edit] See also:
quiet mode
, sets
[edit] Settings:
Label
[edit] pr
pr set1
displays contents of XPAT sets.
Pr
displays each member of set1 with surrounding context. A modifier can be attached to the pr
command in order to control the context exactly. Set1 can be any region set or point set. If set1 is a region set, the first of the pair of points describing each region in the text is displayed by the pr
command. If no set1 is given, the operand for the command is the most recent set created in the session.
For each member in the given set, the output is in the form of an integer giving the offset of the set member in the text file, followed by a comma, a blank, two periods and then the characters surrounding the set member. The first character in the database is considered to be offset 1. The order in which the set is displayed depends on the current SortOrder
setting.
With no modifier, pr
prints a line of text for each element in set1. The PrintLength
and LeftContext
settings determine the content of the line printed. With the default settings, the printed text is 64 characters in length of which 14 precede the match point. The number of characters displayed to the left of the match point can be altered by changing the LeftContext
. The total number of characters printed can be altered by changing the PrintLength
setting.
The total number of characters to be displayed can be set, for a single instance of the command, by using a numeric modifier attached to the pr
. The modifier is in the form of a period followed by a number giving the total number of characters to be displayed. The left context that is displayed is still determined by the value of the LeftContext
setting.
The second form the modifier can have is a period followed by the string region
. When the modifier .region
is used, the output text starts at the match point and continues to the end of the default region in which the match point occurs. If the match point is not within the default region, no output is displayed for the match point.
The second form of the modifier can be refined to request that the text displayed is a region other than the default region. An additional modifier specifying a defined region set can be attached to the already modified pr
command (i.e. to the pr.region
). The additional modifier can specify the region in one of three ways: a string giving the name of a predefined region, the number of a region set created in the XPAT session, or a string preceded by an asterisk (*
) referring to a named region set defined in the XPAT session (see the examples below). As with the form pr.region
, described above, this use results in the displayed text starting at the match point with no left context and continuing to the end of the region. When the match point is not contained in the designated region set, no output is displayed.
[edit] Examples:
>> "Kipling"
>> pr
This command displays a line of context for each member of the previously calculated set. Assuming the PrintLength
and LeftContext
still have the default values, each line will contain 64 characters of which 14 will be before the match point.
>> {PrintLength 300}
>> pr "my dear Watson"
As with the previous example, this command prints a line of context for each member in the point set matching the string my dear Watson
. In this case, the line printed for each member in the set is 300 characters long but still has 14 characters preceding the match point.
>> pr region including "detective"
This command will print a line for each member in the set of default regions that contains the string detective
. The text displayed starts at the beginning of the default region
.
>> pr.200 shift.-100 ("city" near "oxford")
This command prints a line of 200 characters for each member in the set of matches to the string city
when it appears near the string oxford
. Since the match points in this set have been shifted 100 characters to the left the displayed text actually begins 114 characters to the left of the string city
(assuming the LeftContext
is set to 14).
>> region incl (region EQ incl ("<D>1980" .. "<D>1986"))
>> pr.region
The first query finds the members of the default region
(in this example they might be dictionary entries) that contain EQ regions which are in the period from 1980 to 1986. The second command prints these entries. After the offset, comma, blank and two periods, the displayed text starts at the match point which is at the beginning of the default region, and continues to the end of the default region.
>> region Quote including ("univ" near "waterloo")
>> pr.region.Quote
The first query finds the Quote regions which contain the string univ
occurring near the string waterloo
. The second command displays these regions. The output consists of an offset, comma, blank, two periods and the text starting at the beginning of the Quote region and continuing to the end of the Quote region.
>> pr.region.5 "law" fby "order"
This command displays data from the set of matches to the string law
when followed by order
. The text that is printed starts at the matches to the string law
and continues to the end of the region in set number 5 that contains each match point.
>> *verse including "faith, hope, charity"
>> pr.region.*verse
The first query finds the regions that contain the string faith, hope, charity
occurring in the set that has been created and named verse
during the XPAT session. For each of the members in this region set, the second command prints information starting at the beginning of the region and continuing to the end of the region described by *verse
.
[edit] See also:
history
, naming sets
, quiet mode
, region
, save
[edit] Settings:
DefaultRegion
, LeftContext
, PrintLength
, SortOrder
[edit] {PrintLength}
{PrintLength number}
specifies how many characters of text are displayed.
By default, when the members of a set are displayed with the pr
command or written to a file by the save
command, each member contains 64 characters of context, 14 to the left of the match point, the match point itself, and 49 to the right. This setting may be overridden so that the number of characters processed is determined by a modifier for an individual use of the pr
or save
commands. The PrintLength
setting determines the total number of characters processed and thus affects the number of characters shown to the right of the match point. The number of characters to the left of the match point is determined by the LeftContext
setting.
The setting can be changed at any time during a XPAT session and remains in effect until changed again or until the end of the session. The current value of the PrintLength
setting is displayed by the command {Settings}
.
[edit] Examples:
>> {PrintLength 100}
>> pr ("Yukon" near ("B.C." + "British Columbia"))
This changes the setting to the value 100 so that any subsequent pr
or save
command produces text 100 characters in length. The set displayed has 14 characters to the left of the match point and 85 characters to the right, assuming a default value of 14 characters for left context.
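The arithmetic linking PrintLength, LeftContext and the amount of right context can be checked with a few lines of Python. The function name context_window is hypothetical, offsets are 1-based as in the pr output, and clipping at the start or end of the text is ignored in this sketch.

```python
def context_window(match_offset, print_length=64, left_context=14):
    """Return (first offset shown, last offset shown, characters right of the match)."""
    first = match_offset - left_context
    last = first + print_length - 1
    right = last - match_offset   # excludes the match point itself
    return first, last, right


defaults = context_window(1000)                 # PrintLength 64, LeftContext 14
longer = context_window(1000, print_length=100)
# pr.200 shift.-100 on a match at offset 1000: the match point moves to 900.
shifted = context_window(1000 - 100, print_length=200)
```

The default case gives 49 characters to the right, and a PrintLength of 100 gives 85, in line with the figures quoted above. In the shifted case the window starts at offset 886, which is 114 characters to the left of the original match.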
[edit] See also:
pr
, save
, Settings
[edit] Settings:
LeftContext
[edit] {Proximity}
{Proximity number}
specifies the measure of closeness for the near
and fby
commands.
The Proximity
default for the fby
and near
commands is 80 characters. That is, a match point of a member of set1 must be within 80 characters of a match point of a member of set2 to be included in a new set created by the near
and fby
commands.
The Proximity
setting may be overridden for an individual use of the fby
and near
commands by appending a modifier to the command.
The Proximity
setting can also be changed at any time during a XPAT session and remains in effect until changed again or until the end of the session (see example below). The current value of the Proximity
setting is displayed by the command {Settings}
.
[edit] Examples:
>> {Proximity 200}
>> "Canada" near ("U.S." + "United States" + "the States")
The first line of the example changes the Proximity
setting to the value 200 so that any subsequent near or fby commands use this value. In the query, XPAT finds the occurrences of the string Canada
that occur within 200 characters either to the left or right of members of the set produced by the union of the sets matching the strings U.S.
, United States
and the States
.
[edit] See also:
fby
, near
, Settings
[edit] ~qnum
~qnum
outputs a query number.
The ~qnum
command operates in both standard and quiet mode. In standard mode, the number of the next query is output. In quiet mode, the information is tagged and the number of the next query is contained within <Qnum> tags.
[edit] Examples:
>> "testing"
>> ~qnum
If testing
is the first query in the XPAT session, the output from the ~qnum
command is the set number 2. In quiet mode, this appears as the string <Qnum>2</Qnum>
.
[edit] See also:
quiet mode
[edit] quiet mode
{QuietOn Raw Converted Label Persistent}
{QuietOff}
changes the mode of operation of XPAT. {QuietOn}
causes XPAT to operate in quiet mode. {QuietOff}
causes XPAT to revert to standard (non-quiet) mode.
Each of the four arguments to QuietOn
is optional and may appear in any order. When an argument is present in a QuietOn
command, the corresponding setting is turned on. Conversely, when an argument is not present in a QuietOn
command, the corresponding setting is turned off. Settings are not carried forward from one QuietOn
command to the next but are reset with each QuietOn
command.
All XPAT commands that create sets operate the same way in quiet mode and in standard mode. However, the output generated by XPAT is different in the two modes. No prompt or newline appears when XPAT is operating in quiet mode. In addition, the output from XPAT in quiet mode is in a tagged format.
In standard mode, when a command or query creates a new set, a set number and the number of matches is output. In quiet mode, the tagged output contains the number of matches within <SSize> tags but no set number. For example, if a set of 122 matches is created by a XPAT query the output is of the form:
<SSize>122</SSize>
In standard mode, information displayed about a set by a pr
command is affected if a modifier is attached to the command. The output from a pr
command is preceded by the offset in the text of the set member being printed. If the set is a region set, the offset is the start of each region in the set. In quiet mode, the output contains the numeric offset in a tagged format. The settings of Raw, Converted, Label and Persistent affect the information displayed by the pr
command. Each of the settings is discussed below.
{QuietOn}
With Persistent turned off, the values of the offsets that are output are the logical offsets into the file. The logical and persistent offsets are different and non-interchangeable. Persistent offsets are designed for use with the update system. (See the documentation for the XPAT update system.)
If the pr
command is not of the form pr.region
, the offset of each set member is contained within <Start> tags and the entire output is contained within <PSet> (for Point Set) tags. For example, if the set is of size 2, the output might look as follows (without the line breaks).
<PSet><Start>1234</Start> <Start>5554</Start></PSet>
If the modifier to the pr command is .region, in standard mode the text displayed is from the match point to the end of a specified region. In quiet mode, the tagged output contains both the offset of the match point and the offset of the end of the specified region. The offsets of the ends of the regions are contained within <End> tags and the entire output is contained within <RSet> (for Region Set) tags. For example, for the above set of size 2, output from a pr.region might look like
<RSet><Start>1234</Start><End>1444</End> <Start>5554</Start><End>6000</End></RSet>
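The tagged offsets above lend themselves to simple machine parsing. The following Python sketch is not part of XPAT or the DLXS middleware; the function name parse_offsets is hypothetical, and it assumes numeric offsets only (Label, Raw and Converted turned off) in exactly the form shown in the two examples above.

```python
import re

def parse_offsets(tagged):
    """Parse a quiet-mode <PSet> or <RSet> string (hypothetical helper).

    Returns ("point", [start, ...]) for a point set, or
    ("region", [(start, end), ...]) for a region set.
    Assumes plain numeric offsets, i.e. Label is turned off.
    """
    match = re.fullmatch(r"\s*<(PSet|RSet)>(.*)</\1>\s*", tagged, re.DOTALL)
    if match is None:
        raise ValueError("not a <PSet> or <RSet> block")
    tag, body = match.groups()
    starts = [int(n) for n in re.findall(r"<Start>(\d+)</Start>", body)]
    if tag == "PSet":
        return "point", starts
    ends = [int(n) for n in re.findall(r"<End>(\d+)</End>", body)]
    if len(starts) != len(ends):
        raise ValueError("unbalanced <Start>/<End> tags")
    return "region", list(zip(starts, ends))

point_result = parse_offsets("<PSet><Start>1234</Start> <Start>5554</Start></PSet>")
region_result = parse_offsets(
    "<RSet><Start>1234</Start><End>1444</End> "
    "<Start>5554</Start><End>6000</End></RSet>"
)
```

Pairing each <Start> with the <End> that follows it relies on the two tags alternating, as they do in the example output.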
{QuietOn Label}
When Label is turned on, the form in which the offset is printed changes. The numeric value of the offset into the text within the <Start> or <End> tags is preceded by an identifying label and a colon. This label string is the value of the Label setting. If Label has not been set, the label used in the output is the name of the data dictionary file up to the first non-alphanumeric character. For example, if the data dictionary is news.dd and Label has not been set, the output from a pr command would look like:
<PSet><Start>news:1234</Start> <Start>news:5554</Start></PSet>
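The default-label rule described above can be sketched in a few lines of Python. Both function names are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and the argument is assumed to be a bare file name rather than a path.

```python
import re

def default_label(dd_filename):
    """Leading run of letters and digits in the data dictionary file name.

    Assumption: "alphanumeric" means ASCII letters and digits only.
    """
    return re.match(r"[A-Za-z0-9]*", dd_filename).group(0)

def split_labelled_offset(value):
    """Split 'news:1234' into ('news', 1234); a bare '1234' gives (None, 1234)."""
    label, sep, number = value.rpartition(":")
    return (label if sep else None), int(number)
```

For the example above, default_label("news.dd") yields news, and split_labelled_offset("news:1234") separates the label from the numeric offset.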
{QuietOn Raw}
When Raw is turned on, in addition to the tagged offsets, the output contains text showing the match point and surrounding context. For each member of the set, this additional information is output within <Raw> tags following the tagged offset information for each member of the set.
The length of the string being output is given within <Size> tags and is followed by the text. As in standard mode, if pr has no modifier, the length of the string output is determined by the PrintLength setting and the context shown to the left of the match point is determined by the LeftContext setting. If the modifier is a numeric value, this value determines the length of the string and the left context is still determined by the LeftContext setting. If the modifier to the pr command is .region, the text starts at the match point and continues to the end of the specified region.
For example, assuming a PrintLength setting of 25, and a LeftContext setting of 5, the output from a pr command applied to a set of 2 matches to the string sample would be (without the line breaks shown here):
<PSet><Start>1234</Start><Raw><Size>25</Size> This sample is to be firs</Raw> <Start>3456</Start> <Raw><Size>25</Size> This sample is to be seco</Raw> </PSet>
If the SortOrder setting is OccurHead, in addition to the above output, the descriptive header is output in a tagged format. (See the entry for SortOrder for a description of the header.) This information is contained within <Hdr> tags and includes the length of the descriptive string within <Size> tags followed by the string of the header. If the SortOrder setting was OccurHead in the above example, the output would be (without the line breaks shown here):
<PSet><Start>1234</Start> <Hdr><Size>10</Size>First </Hdr> <Raw><Size>25</Size> This sample is to be firs</Raw> <Start>3456</Start> <Hdr><Size>10</Size>Second </Hdr> <Raw><Size>25</Size> This sample is to be seco</Raw> </PSet>
{QuietOn Converted}
When Converted is turned on, in addition to the tagged offsets, text following the match point is output for each member of the set. This text is displayed with the appropriate character mappings for the XPAT index and any stopwords removed. For example, if upper case is mapped to lower case when creating the index, the text is displayed in lower case. If the index has the word to as a stopword, to would not appear in the converted text.
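As a rough illustration of the conversion just described, the Python sketch below lower-cases the text and drops stopwords. It is only an approximation under stated assumptions: the real mappings come from the data dictionary of the index, the function name convert is hypothetical, and collapsing runs of blanks into one is assumed here.

```python
def convert(text, stopwords, mapping=str.lower):
    """Approximate Converted output: apply a character mapping, drop stopwords.

    Assumptions: the mapping is a plain lower-casing, words are separated by
    whitespace, and runs of blanks collapse to a single blank.
    """
    words = mapping(text).split()
    return " ".join(word for word in words if word not in stopwords)

converted = convert("Sample is to be first used for", {"to", "be"})
```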
For each member of the set, this additional information is output within <Cvt> tags. The length of the output text string is enclosed within <Size> tags and is followed by the text itself. For each set member, the text string shown starts at the match point. This is in contrast to the Raw text output, which shows the match point with some left context. If the pr command has no modifier, the length of the string is determined by the PrintLength setting. If the modifier is numeric, this determines the string length. If the modifier is .region, the length of the string is the difference between the offsets of the match point and the end of the region. As the displayed text is converted text, some conversions may cause output to be suppressed, such as multiple blanks resulting from character mappings or stopwords. This may result in text that occurs past the end of the region being displayed.
For example, using the above example of a set of size 2 and further assuming that to and be are stopwords, the output might be:
<PSet><Start>1234</Start> <Cvt><Size>25</Size> sample is first used for </Cvt> <Start>3456</Start> <Cvt><Size>25</Size> sample is second used for</Cvt> </PSet>
If the SortOrder setting is OccurHead, in addition to the above output, the descriptive header is given in a tagged format. (See the entry for SortOrder for a description of the header.) The information giving the descriptive string precedes the <Cvt> tag and does not have the character mappings applied to it. The previous example would change to:
<PSet><Start>1234</Start> <Hdr><Size>10</Size>First ..</Hdr> <Cvt><Size>25</Size> this sample is first used</Cvt> <Start>3456</Start> <Hdr><Size>10</Size>Second ..</Hdr> <Cvt><Size>25</Size> this sample is second use</Cvt> </PSet>
{QuietOn Persistent}
When Persistent is turned on, the offsets that are output are the persistent positions within the text database. As noted earlier, in a database that has not been initialized for update, the persistent and logical offsets are identical.
{QuietOn Raw Converted Label}
Any combination of the QuietOn arguments may be used. Thus, after the command {QuietOn Raw Converted Label}, the following would result:
<PSet><Start>news:1234</Start> <Raw><Size>25</Size> This sample is to be firs</Raw> <Cvt><Size>25</Size> sample is first used for </Cvt> <Start>news:3456</Start> <Raw><Size>25</Size> This sample is to be seco</Raw> <Cvt><Size>25</Size> sample is second used for</Cvt> </PSet>
The save command behaves identically to the pr command, except that the information is written to a designated file rather than displayed on the standard output.
Syntax errors that occur during the XPAT session are reported in a tagged format. A set size of -1 is indicated and the error information is contained within <Error> tags. For example, if a command uses the default region before it is set, the error shown is (without the line breaks shown here):
<SSize>-1</SSize> <Error>No information for default region </Error>
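A client reading quiet-mode output can use the -1 set size to detect a failed query. The Python sketch below assumes the output has exactly the form shown above; the names XpatQueryError and parse_set_size are hypothetical and do not come from XPAT or DLXS.

```python
import re

class XpatQueryError(Exception):
    """Hypothetical exception carrying the text of an <Error> element."""

def parse_set_size(tagged):
    """Return the set size from quiet-mode output, raising on a reported error."""
    size = re.search(r"<SSize>(-?\d+)</SSize>", tagged)
    if size is None:
        raise ValueError("no <SSize> element found")
    count = int(size.group(1))
    if count == -1:
        # A set size of -1 signals an error; the message follows in <Error> tags.
        error = re.search(r"<Error>(.*?)</Error>", tagged, re.DOTALL)
        raise XpatQueryError(error.group(1).strip() if error else "unknown error")
    return count
```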
Although the sets created by the signif command are the same in quiet and standard mode, signif does not display the text string associated with the set in quiet mode. If signif is modified with a negative integer n requesting n sets, only information about the last set created is shown.
History and {Settings} display no output in quiet mode.
See also:
history, XPAT update system documentation, pr, save, Settings, signif
Settings:
Label, LeftContext, PrintLength, SortOrder
{QuietOff}
{QuietOff}
See:
quiet mode
{QuietOn}
{QuietOn Raw Converted Label Persistent}
See:
quiet mode
quit
quit
terminates an XPAT session.
The use of the quit command causes the session to end and the XPAT process to exit. A message may be generated telling how much computer time has been used during the XPAT session.
See also:
done, stop
range
string1 .. string2
finds strings that begin with strings occurring within an alphabetic range.
The range operator creates a point set consisting of those indexed points in the text that fall alphabetically between string1 and string2 inclusive. String1 and string2 are patterns that may or may not actually occur in the text being searched. The resulting set contains the matches to both string1 and string2.
Both operands to the range command must be strings. Using a set number with the range command is illegal and results in a syntax error.
Examples:
>> "n" .. "z"
This query finds all indexed points in the text that occur in the alphabetic ordering between n and z.
>> "a" .. "z"
Assuming the text has been indexed on words, this query creates a set of all the words and phrases in alphabetical order (that is, it produces a concordance of the text).
>> "1" .. "200"
This query finds all the strings that fall alphabetically between 1 and 200. This gives all the indexed strings that begin with 1 or 200. For example, the strings 1929, 20034 as well as strings such as 2003/1 and 2000-15000 are in this range. The resulting set does not contain the strings 3 or 4.
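The behaviour in this example can be modelled with ordinary string comparison. The Python sketch below is an approximation, not XPAT's implementation: it uses Python's code point ordering rather than the character mappings of the index, and the function name in_range is hypothetical. A string is treated as inside the range if it sorts at or after string1 and either sorts at or before string2 or begins with string2.

```python
def in_range(candidate, string1, string2):
    """Approximate test for membership in the range string1 .. string2.

    Assumption: plain code point ordering stands in for the index ordering.
    """
    at_or_after_start = string1 <= candidate
    at_or_before_end = candidate <= string2 or candidate.startswith(string2)
    return at_or_after_start and at_or_before_end

inside = [s for s in ["1929", "20034", "2003/1", "2000-15000", "3", "4"]
          if in_range(s, "1", "200")]
```

Under this model the strings 3 and 4 fall outside the range because they sort after 200 and do not begin with it.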
>> region Date including ("1920" .. "1925")
This query finds Date regions that contain dates from 1920 to 1925 inclusive. The range 1920 .. 1925 also contains strings such as "1925000" as they also fall within the range.
>> "<Date>1920" .. "<Date>1925"
If dates are marked with the tag <Date> and begin with a 4-digit value for the year, this query reliably finds dates between 1920 and 1925 inclusive, and only those dates.
See also:
data dictionary documentation, index points
rankedby
set1 rankedby set2
ranks a region set by the number of contained members of another set.
Rankedby creates a set containing those members of set1 that contain the greatest number of occurrences of members of set2. Set1 must be a region set. Set2 may be either a point set or a region set. The new set is a region set.
Set1 may be a predefined region set, a region set that has been created within the current XPAT session using the region command, a region set resulting from the use of the import command, or the result of a previous query during the current session.
The size of the new set is by default the value of the SampleSize setting. Another size may be requested with a numeric modifier in the form of a period followed by the requested size.
The set that is created, when accessed by pr, save and subset in SortOrder AsIs, is naturally ordered by rank. That is to say, the first member will be that element of set1 that contains the most occurrences of members of set2.
In detail, the rankedby command operates as follows. It first splits all the members of set1 into groups. Each member of a group includes the same number of members of set2 as the other members of the group. In addition, within a group, the members are sorted into occurrence order. After it has grouped the members of set1, the rankedby command sorts the groups into decreasing order of number of included members of set2.
For example, say that set1 has 6 members, as follows: 3 members that each contain 2 members of set2, 2 members that each contain 4 members of set2, and 1 member that contains no members of set2. After rankedby has grouped and sorted set1, the groups are as follows. The first group consists of the 2 members of set1 that contain 4 members of set2. The second group consists of the 3 members of set1 that contain 2 members of set2, and the third group consists of the 1 member of set1 that contains no members of set2. Within each group, the members are in occurrence order. If the user has requested the top 4 members, the result set would contain both members of the first group and the first two members of the second group.
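The grouping and sorting described above amounts to a stable sort on the containment count. The Python sketch below models it under several assumptions: regions are half-open (start, end) offset pairs, set2 is reduced to a list of point offsets, and the function name rankedby and the sample offsets are invented for illustration.

```python
from bisect import bisect_left

def rankedby(regions, points, size=10):
    """Return the `size` regions containing the most points.

    Ties keep occurrence order because Python's sort is stable.
    Assumption: a region (start, end) contains offsets start <= p < end.
    """
    points = sorted(points)

    def count(region):
        start, end = region
        return bisect_left(points, end) - bisect_left(points, start)

    in_occurrence_order = sorted(regions)
    ranked = sorted(in_occurrence_order, key=lambda region: -count(region))
    return ranked[:size]

# Six regions holding 2, 4, 2, 0, 4 and 2 points, as in the example above.
regions = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
points = [1, 2, 11, 12, 13, 14, 21, 22, 41, 42, 43, 44, 51, 52]
top_four = rankedby(regions, points, size=4)
```

Here top_four holds the two regions with 4 points followed by the first two regions with 2 points, mirroring the worked example.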
Examples:
>> region Story rankedby ("Free trade" near "Canada")
This query finds the regions described by region Story that contain the greatest number of matches to Free trade when it occurs close to Canada. The number of members in the new set is the value of the SampleSize setting.
>> region Quote rankedby.5 region Author
This query creates a set whose members are the 5 members of region Quote that contain the greatest number of members of region Author.
See also:
including, region
Settings:
SampleSize
region
region
region string
region set1 .. set2
produce region sets in a text database. The first two forms of the region command refer to region sets that have been defined externally to an XPAT session and for which information is available in the data dictionary. These region sets may have been defined using patregion or any other program that generates information (in the form that XPAT understands) about regions in the text. The third form of the command defines a region set during an XPAT session. The results of any of these commands can be used as operands to any of the XPAT commands that operate on region sets.
region
Region, used with no operand, refers to the particular predefined region set that has been designated as the default region. The default region is defined by the DefaultRegion setting and can be reset for the remainder of an XPAT session by changing the setting. If no default region has been defined, using region in this form causes an error. The following message is generated:
No information for default region.
region string
The second form of the region command indicates one of the named predefined region sets. The string is the name that has been given to the region set in the data dictionary. For example, the region sets might be the chapters of a book, the entries in a dictionary or the headlines in a newspaper database. The information about certain regions in the text database is generated by a program external to XPAT and is made available during an XPAT session via the data dictionary. One program that generates the information is patregion.
Note that the string giving the name of the region set can contain blanks or special characters if it is enclosed within double quote marks.
region set1 .. set2
The third form of the region command defines a new region set. The region set that is created by this command is only available for the duration of the XPAT session. Information about this region set can be written to a file using the export command and read into a future XPAT session using the import command.
Set1 and set2 are used to define regions in the new set. Set1 and set2 can be either point sets or region sets. If either set1 or set2 is a region set, the region command uses only the first of the pair of pointers describing its members in defining the new region.
Each region in the new set is formed as follows. A member of set1 is the beginning of a region if it is followed by a member from set2 with no other member of set1 occurring between the two members. The end point of the new region is defined by the member in set2 that most closely follows the set1 member. The region contains the text from the beginning of the member of set1 up to but not including the member of set2. This produces the smallest non-overlapping region set that can be formed by set1 and set2. The size of the region set created is equal to or smaller than the size of set1. If the members of set2 are matches to a pattern, the new region set does not contain the occurrences of that pattern. For example, if set2 is the set of matches to the string End of Message, the new region set contains no occurrences of the string End of Message.
If set1 and set2 are identical, two extra regions may be included in the newly created set. These are: a region from the beginning of the text to the member of set1 that occurs earliest in the text; and a region from the last element of set1 in the text to the end of the text. If either of these regions is a substring of length zero, it is not included. If the shift command is applied to set1 or set2, the extra regions are not included in the new set.
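The construction rules above can be modelled on plain integer offsets. The Python sketch below is an illustration under assumptions, not XPAT's algorithm: regions are half-open (start, end) pairs, "followed by" is taken to mean strictly after, and the names make_regions, text_length and shifted are hypothetical. The shifted flag stands in for a shift having been applied to either operand.

```python
from bisect import bisect_right

def make_regions(set1, set2, text_length, shifted=False):
    """Model of region set1 .. set2 on offsets (hypothetical helper)."""
    starts = sorted(set1)
    ends = sorted(set2)
    regions = []
    for i, start in enumerate(starts):
        j = bisect_right(ends, start)  # nearest set2 member strictly after start
        if j == len(ends):
            continue  # no closing point follows this start
        end = ends[j]
        if i + 1 < len(starts) and starts[i + 1] < end:
            continue  # another set1 member lies between start and end
        regions.append((start, end))
    if starts and starts == ends and not shifted:
        # Identical operands: add the leading and trailing regions if non-empty.
        if starts[0] > 0:
            regions.insert(0, (0, starts[0]))
        if starts[-1] < text_length:
            regions.append((starts[-1], text_length))
    return regions

marks = [50, 150, 250, 350, 450, 550, 650, 750, 850, 950]  # ten invented offsets
with_extras = make_regions(marks, marks, text_length=1000)
without_extras = make_regions(marks, marks, text_length=1000, shifted=True)
```

With ten identical start and end points, with_extras has 11 members and without_extras has 9, the same counts as in the mail message example later in this entry.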
Some programs, such as patregion, that produce predefined region sets, define the end point of the region in a somewhat different manner. These programs deal with patterns of text (rather than points in the text) and the end point of the region that is defined is usually the last character in the pattern that is used to define the regions. If desired, the region command within an XPAT session can be used in conjunction with the shift command to create a set of regions in which the ends of the regions are at the end of a pattern. See the examples below.
XPAT does not support region sets whose members nest or overlap. As described above, using region with operands that are patterns defining nested or overlapping regions creates a region set that is the smallest non-overlapping set of regions. Patregion used on the same text creates a possibly different region set (also non-overlapping) consisting of regions from an opening pattern to the following end pattern.
Examples:
>> region including ("Smith" near "Jones")
This query creates a region set, consisting of the members of the default region which contain a match to the string Smith when it occurs within a prescribed distance of the string Jones.
>> "Campbell" within region "Speaker Name"
In this example, we assume that one of the predefined regions has been named Speaker Name. This query creates a point set that contains matches to the string Campbell occurring within members of region Speaker Name.
>> firstb = region "<A>" .. "</B>"
>> (region B within *firstb) including "requested string"
The text, in this example, contains regions that begin with <A> and end with </A>. Each of these A regions contains smaller regions that begin with <B> and end with </B>. Assume, in certain instances, that it is necessary to be able to find the first B region within each A region. The use of region in the first query creates a region set named firstb that can be used to find these regions. The members of firstb are the pieces of text that begin with the string <A> and extend to the closest string </B>. The second query finds the members of region B that are within firstb, and then selects those B regions that include requested string.
>> quote = region "<Q>" .. (shift.4 "</Q>")
If some components in the text are tagged with <Q> and </Q>, this command creates a region set describing these components. Each region in the set extends from the opening tag <Q> to the end of the closing tag </Q>. By using the shift operator, applied to the </Q>, the members of the point set used to find the ends of the new regions all point to the end of the string </Q> rather than to the beginning of the tag.
>> "From:"
>> mess1 = region "From:" .. "From:"
>> mess2 = region "From:" .. (shift.0 "From:")
>> from = region *mess1 .. "Received:"
>> "Bill" within *from
This set of queries is being applied to a database of mail messages. Each message has the string From: at the beginning. The string Received: appears at the beginning of the second line of the message indicating the time the message was received. Assume that the first query, identifying the matches to the string From:, returns a set of size 10. Further assume that there is text in the database preceding the first From:. The next query creates a region set of size 11 as two additional regions are included in the resulting set: one containing the text from the beginning of the text to the first occurrence of From: and the other containing the text from the last occurrence of From: to the end of the text. The third query creates a region set of size 9 as these two regions are not included in the new set. The next query creates a region set describing the sender of the message. The final query finds the matches to Bill in the regions describing the sender of the message. Notice the use of an asterisk (*) before the name of the new region set when it is used as an operand to an XPAT command.
See also:
data dictionary documentation, export, import, including, index point, naming sets, pr, save, set name, shift, within
Settings:
DefaultRegion
sample
sample set1
finds representative members of a larger set.
Sample creates a set containing a specified number of members of set1. Set1 may be either a region set or a point set. The new set is of the same type as set1.
The size of the set created is determined by the value of the SampleSize setting, which has a default value of 10. If the size of set1 is less than SampleSize, then the new set created is the same size as set1. The size can be changed for all subsequent uses of sample during the current session by changing the SampleSize setting. For an individual use of the sample command, the setting can be changed by using a modifier attached to the command. The form of the modifier is a period followed by a number giving the desired size of the sample set.
The members of the sample set are chosen as follows. If the size of set1 is x and the sample size requested is y, every (x/y)th member of set1 is in the sample set. For example, if a sample of size 20 is requested from a set of size 2000, the 100th, 200th, 300th members and so on are chosen. The ordering of the set, and hence the members of the sample set, is determined when the set is created. The SortOrder setting does not determine which members are included in the set created by the sample command as it does for the subset, next, ~nextemp, and first commands. However, this setting does affect how the sample set is ordered when used with a pr command (or save command).
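The selection rule above can be sketched in Python as follows. The function name sample is reused only for illustration, and the handling of a set size that is not an exact multiple of the sample size (integer division) is an assumption, since the text gives only the evenly divisible case.

```python
def sample(members, size=10):
    """Pick every (x/y)th member of a set of x members for a sample of size y.

    Assumption: when x is not a multiple of y, the step is floor(x / y).
    """
    x = len(members)
    if x <= size:
        return list(members)  # set1 is no larger than the requested sample
    step = x // size
    return [members[k * step - 1] for k in range(1, size + 1)]

ordinals = list(range(1, 2001))      # stand-ins for the 1st..2000th members
chosen = sample(ordinals, size=20)   # the 100th, 200th, ... 2000th members
```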
The sample command can be used by itself or with the pr, save or export commands. The sample command may only be used in conjunction with these commands.
Examples:
>> sample "shaks"
Assuming a SampleSize setting of 10, this query creates a set of 10 examples from the set of matches to the string shaks.
>> {SampleSize 30}
>> sample "shaks"
The first command changes the SampleSize setting to 30 and the second creates a set of 30 examples from the set of matches to the string shaks.
>> region Quote including "Doyle"
>> sample.20 %
The first query creates a region set containing Quote regions that include the string Doyle. The second query creates a sample set of 20 members from the results of the first query.
>> region Quote including (sample "Doyle")
This query is illegal and results in a syntax error, because sample cannot be used as an operand to including.
See also:
first, next, ~nextemp, subset
Settings:
SampleSize, SortOrder
{SampleSize}
{SampleSize number}
specifies the size of the set produced by the sample, subset, and rankedby commands.
By default, sample and subset create a set of 10 members of a given set. This setting may be overridden and the size of the result determined by a modifier for an individual use of these commands. The SampleSize setting can be changed at any time during an XPAT session and remains in effect until changed again or until the end of the session. The current value of the SampleSize setting is displayed by the command {Settings}.
Examples:
>> {SampleSize 200}
>> pr sample 5
This changes the SampleSize to 200 and any subsequent sample or subset command uses this value. In the second query XPAT prints information about 200 members of set number 5 created earlier in the session.
See also:
rankedby, sample, Settings, subset
save
save set1
writes the contents of a set to a file.
The save command is identical to the pr command except that the output is written to a file. The name of the file where the information is written is determined by the value of the setting SaveFile. The default value of the setting is xpat.res. The file used by the save command can be changed at any time during the XPAT session by changing the setting. The information output by the save command is appended to the save file if one of the same name already exists. Otherwise, a new file is created and the information is written to the new file. Assuming the default setting of SaveFile, the following message is printed on execution of the save command:
Saving in xpat.res.
For each member in the given set, the output is in the form of an integer giving the offset of the set member in the text file followed by a comma, a blank, two periods and the characters surrounding the set member. The order in which the set is output is determined by the current SortOrder setting.
With no modifier, save outputs a line of text for each element in set1. The PrintLength and LeftContext settings determine the content of the line saved. With the default settings, the saved text is 64 characters in length, of which 14 precede the match point. The number of characters to the left of the match point can be altered by changing the LeftContext setting. The total number of characters printed can be altered by changing the PrintLength setting.
The total number of characters to be saved can be set for a single instance of the command by using a numeric modifier attached to the save command. The modifier is in the form of a period followed by a number giving the total number of characters to be saved. The left context that is saved is still determined by the value of the LeftContext setting.
The second form the modifier can have is a period followed by the string region. When the command save.region is used, the output text starts at the match point and continues to the end of the default region in which the match point occurs. If the match point is not within a default region, no output is saved for the match point.
The second form of the modifier can be refined to request that the text output be in a region other than the default region. An additional modifier, specifying a defined set of regions, can be attached to the already modified save command (i.e. to save.region). This additional modifier can be in one of three forms: a string giving the name of a predefined region, the number of a region set created in the XPAT session, or a string preceded by an asterisk (*) referring to a named region set defined in the XPAT session. As with the form save.region, described above, this use results in the output text starting at the match point with no left context and continuing to the end of the region. When the match point is not contained in the designated set, no output is saved.
The similarly named commands save.commands and save.history result in very different behaviour and are described in separate entries.
Examples:
>> "Helen Maday"
>> save
As a result of this command, XPAT writes a line of context for each member in the most recently created set. The information is written to the file that is named by the setting SaveFile. If the setting has not been changed during the session, the file used is xpat.res. Note that the information is appended to the save file if one of the same name already exists.
>> save "From: Tony Lopez "
A line of context for each member of the set that matches the string From: Tony Lopez is written to the save file.
>> {PrintLength 120}
>> save region including "planet"
A line of context is written for each member in the set of regions created by the including query. The line that is written starts at the beginning of each region in the new set. Since the PrintLength has been set to 120, each line contains 120 characters and has 14 characters to the left of the beginning of the displayed region.
>> save.200 shift.-100 ("procedure" near "policy")
In this case, a line of 200 characters is written to the save file for each member in the set created by the query shift.-100 ("procedure" near "policy"). The text that is written starts 114 characters to the left of the string procedure: the 100-character shift plus the default LeftContext of 14.
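The arithmetic behind this example can be checked with a small Python sketch. The function names are hypothetical, the defaults of 64 and 14 are the PrintLength and LeftContext defaults quoted earlier in this entry, and clipping the context at the start of the text is an assumption.

```python
def context_window(match_offset, print_length=64, left_context=14, shift=0):
    """Return (start, end) text offsets of the line written for one match."""
    start = match_offset + shift - left_context
    return start, start + print_length

def render_line(text, match_offset, print_length=64, left_context=14):
    """Format one line as: offset, comma, blank, two periods, context."""
    start, end = context_window(match_offset, print_length, left_context)
    start = max(start, 0)  # assumption: context is clipped at the start of the text
    return f"{match_offset}, ..{text[start:end]}"

# save.200 shift.-100 applied to a match at the invented offset 1000
start, end = context_window(1000, print_length=200, shift=-100)
distance_left_of_match = 1000 - start
```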
>> region including (region EQ including "<A>Doyle</A>")
>> save.region
The first query finds the default regions that include an earliest quote (defined by the region EQ) that has Doyle as the author. The second command saves information about each of these default regions. The information that is written for each of these regions contains the offset in the text file of the region, a comma, a blank followed by two periods and the text of the default region. As no set is given as an operand to the save command, it is understood that the command applies to the previous set.
>> region Quote including ("stadium" near "Toronto")
>> save.region.Quote %
The first query finds all quotes that contain the string stadium within 80 characters of the string Toronto. The second command saves information in the save file (xpat.res unless the SaveFile setting has been reset) about each of these regions. As in the example above, the output for each set member is in the form of an integer giving the text offset, a comma, a blank followed by two periods and the text beginning at the start of region Quote and continuing to the end of the region.
>> save.region.5 "night" fby "day"
This command saves information about the set created by the query "night" fby "day". The information written to the save file (after the offset, comma, blank and two periods) starts at the text night and continues to the end of the region defined by set number 5 that contains the match.
>> minutes = region "<Min>" .. "</Min>"
>> *minutes including "examination schedule"
>> save.region.*minutes
The first query in this example defines a set of regions that are named minutes. The second query finds the regions in this set that contain the string examination schedule. The third command saves information about this set in the save file. For each member in this set, the information saved contains the offset, comma, blank and two periods followed by the text of the region in the set named minutes.
See also:
exec, export, import, pr, save.commands, save.history
Settings:
DefaultRegion, LeftContext, PrintLength, SaveFile, SortOrder
save.commands
save.commands
writes information to a file about the queries in the XPAT session. These are saved in a form that allows them to be used in another XPAT session.
Save.commands saves, in a file, all the queries that have been executed and have produced sets during the current session. These are the queries that appear in the history list. Only the command is saved in the file, not the set number or number of matches. The setting CommandFile, which determines the file where the information is written, has a default value of xpat.cmd. The output file can be changed at any time during the session by changing the CommandFile setting. If a file of this name already exists, the information is appended to the file. Otherwise a new file is created.
The saved information can be read into an XPAT session and executed using the exec command.
Examples:
>> "love" near "hate"
>> pr sample
>> region Q including %
>> save.region.Q %
>> {CommandFile "/usr/my_commandfile"}
>> save.commands
The second last command sets a new name for the file to be used by save.commands: /usr/my_commandfile. The final command saves the information about the commands that has been generated to this point in the XPAT session. In the portion of the session shown here, only two commands generated sets and so the following is saved in the file /usr/my_commandfile:
"love" near "hate"
region Q including %
See also:
exec, export, import, save, save.history
Settings:
CommandFile
{SaveFile}
{SaveFile string}
changes the file name used by save.
The SaveFile setting determines the file written by the save command. It has a default value of xpat.res. If the string begins with a numeral, or contains blanks or non-alphanumeric characters, it must be enclosed within double quote marks. The file name must also conform to the file naming convention of the host operating system. It can be changed at any time during an XPAT session and remains in effect until it is changed again or until the end of the session. The current value of the SaveFile setting is displayed by the command {Settings}.
Examples:
>> {SaveFile "output_file"}
This changes the setting to the value output_file so that any subsequent use of the save command writes to this file. The name of the file is not an absolute path name and is therefore located in the current working directory.
See also:
save, Settings
save.history
save.history
writes information to a file about the queries and results in the current XPAT session.
Save.history writes a record of an XPAT session. XPAT's history list records information about all queries that produce sets. For these queries, save.history saves the set number, the number of members in the set, and the query that produced the set in a file. The setting HistoryFile, which determines the file where the information is written, has a default value of xpat.his. A different output file can be chosen at any time during a session by changing the setting. If a file of this name already exists, the information is appended to the file. Otherwise a new file is created. Note that comments are saved only if they are on the same line as the command itself.
[edit] Examples:
>> "fish" near "fowl"
>> pr
>> region definition including %
>> save.history
This command saves the information from the history list in the file xpat.his
. After the sequence of commands shown above, the history list contains information about two sets which are saved into a file:
1: 142, "fish" near "fowl"
2: 17, region definition including %
[edit] See also:
exec
, export
, history
, import
, save
, save.commands
[edit] Settings:
HistoryFile
[edit] set name
*
name
refers to a named set.
Query result sets may be named and subsequently referred to either by set number or by name. The name must be preceded by an asterisk (*
) to reference the set; otherwise, XPAT interprets the name as a command or search string.
[edit] Examples:
>> univ = "university" near "MIT"
>> qu = region Quote including *univ
>> *qu including "Harvard"
The first query creates a set of matches to university
occurring near MIT
. The second query uses the set *univ
and creates a new set *qu
. The third query finds the Quote regions that include the set of matches from the first query as well as the string Harvard
.
>> begin = region "<Title>" .. "</Summary>"
>> "Paris" within *begin
The first query defines a new region set and calls the new set begin
. The second query creates a set containing the matches to Paris
that fall within one of these regions.
[edit] See also:
region
, including
, naming sets
, set number
, within
[edit] set number
number
references a previously created set.
After the first query in a session, XPAT displays a line of the form:
1: 300 matches
The number 1 here names the set of results and can be used in subsequent searches. The valid set numbers are those displayed by the history
command.
When an invalid set number is used, XPAT generates a message. If, for example, set number 33 is referenced before it has been calculated or after it has been freed, the message is:
Expression 33 is out of range
[edit] Examples:
>> region Author including 5
In this query, the number 5 refers to the fifth result of the session. For example, set 5 might be the set of matches to all the variants of spelling for a particular author's name.
[edit] See also:
history
, ~free
, ~qnum
, set name
[edit] Settings:
History
[edit] sets
In the XPAT system, queries are combinations of the XPAT commands described in this document. In response to each query, XPAT creates a set which is either a point set or a region set.
These sets can be used as operands in subsequent queries. In contrast to the conventional approach of a single, nested compound query, XPAT allows complex queries to be expressed as a series of simple queries. This provides an opportunity to try alternative ways of combining previous result sets to arrive at a solution. XPAT provides a history list of all previous sets created in a session and a convenient notation to access them.
A member of a point set is a location in the text which is the start of a string that continues to the end of the text. The XPAT system finds locations in the text where strings, matching pattern(s) given in the query, begin. The members of a point set are usually index points
; however, in the sets created by shift
or offsets
(the notation [n]
), the members refer to positions in the text that may or may not be index points.
The members of a region set are substrings of the text, beginning and ending at specified points. Region sets that are the result of a query within a XPAT session are available only for the duration of the session. However, region sets can also be defined externally and be made available to the XPAT session. Each member of a region set is described by two locations in the database, indicating the start and end of the region and these locations may or may not be index points.
Region sets may be used in XPAT to restrict searches to desired parts of the text. The within
command finds the members of a set that are contained in a designated region set. The including
command finds the members of the designated region set that contain one or more members of a given set.
The sets produced within XPAT can be refined using set arithmetic or proximity commands. The difference (-
) and intersection (^
) commands remove members of an existing set. The proximity (fby
and near
) commands reduce sets by finding the members of a given set that have specified text close by.
In addition to refining sets, it is possible to combine two sets to create a larger one by using the union (+
) command.
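The set arithmetic described above can be sketched with ordinary Python sets of text offsets. This is only an illustration under assumed data: the offsets and the names love and hate are hypothetical, and nothing here reflects how XPAT stores sets internally. Note that Python writes intersection as & where XPAT uses ^.

```python
# Hypothetical point sets: each member is an offset into the text.
love = {12, 87, 140, 300}
hate = {87, 300, 512}

difference = love - hate      # XPAT: love - hate  (removes members)
intersection = love & hate    # XPAT: love ^ hate  (removes members)
union = love | hate           # XPAT: love + hate  (creates a larger set)

print(sorted(difference), sorted(intersection), sorted(union))
```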
XPAT queries, applied to a text database, may create large sets; analyzing a smaller, representative subset might aid in making decisions about how to proceed appropriately with a search. Several commands in XPAT provide this capability. Sample
creates a representative subset while subset
, next
, and first
each create contiguous smaller subsets of a larger set.
[edit] See also:
difference
, fby
, first
, including
, intersect
, near
, next
, offsets
, range
, region
, sample
, shift
, subset
, union
, within
[edit] {Settings}
{Settings}
shows the current values of a number of XPAT parameters.
[edit] Examples:
>> {Settings}
The output might be:
{CharMappings " " "Aa" "Bb" "Cc" "Dd" "Ee" "Ff" "Gg" "Hh" "Ii" "Jj" "Kk" "Ll" "Mm" "Nn" "Oo" "Pp" "Qq" "Rr" "Ss" "Tt" "Uu" "Vv" "Ww" "Xx" "Yy" "Zz" "[ " "\\ " "] " "^ " "_ " "` " "{ " "| " "} " "~ "}
{StopWords}
{WordStarters " \P" "\P-" "-\P" "\P<" "\P&."}
{SortOrder AsIs}
{PrintLength 64}
{LeftContext 14}
{Proximity 80}
{SampleSize 10}
{SaveFile xpat.res}
{CommandFile xpat.cmd}
{ExportFile xpat.exp}
{HistoryFile xpat.his}
{History 0}
{QuietOff}
[edit] shift
shift
set1
creates a point set whose members are a specified distance from a set of matches.
Shift
creates a new set whose members are locations in the text which result from an equal shift being applied to all members of set1. Set1 may be either a point set or a region set. If it is a region set, the resulting set is a point set containing the first of the pair of pointers describing each member of the set. The set created by the shift
command is always a point set.
The shift
command creates a new set consisting of pointers that are (by default) 10 characters after the original set members. The set created is in occurrence order. This default shift distance (10) and direction can be changed by using a modifier attached to the shift
command. The modifier takes the form of a period followed by a positive or negative integer. If the modifier is a negative integer -n, each member of set1 is shifted to n characters before the original location. If it is a positive integer n, the shift is to n characters after that location.
The points in the new set need not be index points.
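The behaviour described above can be sketched in Python. This is a minimal model, not XPAT's implementation: it assumes a point set is a list of integer offsets and a region set is a list of (start, end) tuples, and the function name shift is used here only for illustration.

```python
def shift(members, distance=10):
    """Model of shift: move every member by `distance` characters.

    `members` holds point offsets (ints) or regions as (start, end) tuples;
    for a region only the first pointer is used. The result is always a
    point set, returned in occurrence (ascending offset) order.
    """
    starts = [m[0] if isinstance(m, tuple) else m for m in members]
    return sorted(s + distance for s in starts)


print(shift([40, 5, 17], 5))        # like shift.5 applied to a point set
print(shift([(3, 9), (20, 30)]))    # default distance applied to a region set
print(shift([50], -20))             # like shift.-20
```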
[edit] Examples:
>> shift "dog and cat"
This creates a new point set whose members are 10 characters after each match to the string dog and cat
.
>> "<Tag>"
>> pr
>> shift.5 %
>> pr
The first query creates a point set of matches to the string <Tag>
. The ordering of this set is alphabetical. The third line of the example creates a second point set whose members start 5 characters after a string <Tag>
. The members now point to the start of the contents of the tagged region. This set is in occurrence order. Thus, the order in which the members of the set are displayed by the second pr
is not necessarily the same as that seen with the first pr
command.
>> shift.5 region Tag
>> pr
Assuming that region Tag
describes a set that begins with a tag <Tag>
, the first query creates a point set whose members now point to the start of the contents of Tag regions. The set created is the same as the one in the above example.
>> pr shift.-20 "the best of times"
This displays more context before the match points without having to change the LeftContext
setting. The pr
command displays members of a set of matches to the string the best of times
with 34 characters showing before the matches (the default 14 characters plus 20 additional resulting from the shift
command). Because of the shift
command, this set is displayed in occurrence order. Note, as with other commands, since the pr
and shift
are on the same line the set is not saved, does not appear on the history list and so is not accessible by a set number.
[edit] See also:
offsets
, region
[edit] signif
signif
set1
finds frequently occurring words or phrases in the text.
Signif
finds the most frequent words or phrases following the text matching set1. Set1 can be either a point set or a region set for the first two forms of the command but has a restriction, as noted below, for the third form of the command. The set (or sets) created are point sets. If set1 is not given, signif
uses the last set produced as its operand.
The signif
command looks for words or phrases, in contrast to the other XPAT commands that operate on points and regions. For signif
, a word is defined as a string of characters ending either in a blank or a character that has been mapped to a blank by the character mappings used in building the index being used in the current XPAT session.
The signif
command examines the words or phrases that begin with a string. The string used is the longest string common to the text pointed to by each of the members of set1. If set1 is a point set resulting from a string search, signif
starts with that string. If set1 is not given and the previous result was from a signif
command, the string is the one associated with the set created. (Note that, in addition to the number of matches in the set, signif
returns a string value.) If set1 is a region set, the first of the pairs of pointers describing the regions are used to find any common string beginning in these regions. For example, this might be the pattern used to define the regions. If set1 is a point set that is not the result of a string search, XPAT checks for a common string beginning the text pointed to by each member of set1. In some cases, this common string is the null (empty) string.
Signif
has three different modes of operation. A modifier can be attached to the signif
command. The syntax of the modifier is a period followed by an integer n.
The first mode of the command, signif
with no modifier, finds a frequent word or phrase and then, by reapplying signif
to the resulting set, may be used to extend this phrase. In this mode, the command finds the string and then finds all possible extensions, in the text, of this string up to the next blank. Signif
creates the set, among the matches to these possible extensions, with the most members.
The second mode of the command, signif
with a positive integer n as modifier, finds the most commonly occurring phrase of length n words beginning with the string. The set created contains those members of set1 that are the matches to the most frequent phrase beginning with the given string that is at least n words in length.
The third mode of the command, signif
with a negative integer n as modifier, creates n sets which are the matches to the n most frequent phrases beginning with the string. This use of the signif
command is restricted to sets matching a string or a set created by a range
command. Using any other set as an operand is illegal and results in the error message:
Repetitive signif should be on strings or ranges only
Note that the set created by signif
is identical to that created by signif.-1
but not necessarily to the one created by signif.1
. (See the examples.)
The displayed output from the signif
command gives the number of matches and the word or phrase found (preceded by text=). The text is shown with the character mappings applied and stop words removed (see data dictionary documentation). This means, for example, that if >
has been mapped to a blank and the word the
is not a stopword, one would see the following:
>> signif "<HL>"
2: 604 matches, text=<hl the
[edit] Examples:
>> signif ""
>> signif
>> signif.3 ""
>> signif.-10 ""
The above queries, except the second one, operate on the entire text. In these cases, the string that the signif
command starts with is the empty string. The first query finds the most frequent word or phrase that occurs in the text. The second query operates on the set created by the first query and extends this result by one word. The third query finds the most frequent phrase of at least three words that occurs in the text. The fourth query finds the ten most frequent words or phrases within the text.
>> signif "y"
This finds the most frequent word that starts with the letter y
in the text database.
>> signif "to be"
Note that the string to be
used by the signif
command does not end in a blank. Since signif
looks at all extensions of the string up to the next blank, the only phrases eligible to be the most frequent phrase starting with this string are two-word phrases. In many texts, the most likely set created with this command would be the point set matching the two-word phrase to be
(ending in a blank).
>> signif "to be "
In this example, the string given ends in a blank so the possible extensions of this string that are examined by signif
are all three-word phrases. The point set created as the answer to this command is the set of matches to the three-word phrase starting with the two words to be
that occurs most frequently in the text.
>> signif.1 "to be"
This command creates a set of the most frequent phrase whose word length is one. The newly created set is the set of matches to the word to
that are contained in the set of matches to the phrase to be
. This means that the size of the set is probably smaller than the set of matches to to
but that the text shown is the string to
. Also note that this set is not equal to the set created in the preceding example or to the set created in the following example.
>> signif.-1 "to be"
This command finds the most frequent phrase that begins with the string to be
. The answer to this is a two-word phrase that is identical to that found in the second example.
>> signif.-3 "to be"
This command finds three sets that are the matches to the three most frequent phrases beginning with the string to be
. The first set is the same set created in the example before last. The next set is created by applying signif
to two sets and comparing the resulting sets. Signif
is first applied to a set that is created by taking the difference between the set represented by the original string and the new set just created. Signif
is also applied to the set just created. The larger set from these two signif
applications is the second answer. This same procedure is repeated on the original set and on the two new sets to obtain the third set.
>> signif.4 "to be"
>> signif
The first command creates the set of matches to the most frequent four-word phrase that begins with to be
. The second signif
is applied to the resulting four-word phrase. Since this result ends in a blank, the second Signif
searches for the most frequently occurring five-word phrase that begins with the four words located by the first command.
>> "aba" .. "abz"
>> signif
The first command creates a point set that matches all strings that are alphabetically between aba
and abz
. Signif
applied to this set creates a set of matches to the most frequent word in the text that begins with ab
.
>> auth = region "<A>" .. "</A>"
>> signif *auth
>> signif.2 *auth
>> signif.-4 *auth
The first command creates a region set. The second command creates a set representing the most frequent string at the beginning of these regions. Assuming that in the character mappings the >
is mapped to a blank, the string that signif
uses to find the extension consists at least of the word <A>
. Thus, the set created is the set of matches to the string <A>
followed by at least one other word. The third command gives the most common two word phrase starting the region set named auth
. The first word of this phrase will be <A>
. The last command is illegal and results in the following error message:
Repetitive signif should be on strings or ranges only
>> signif.-4 "<A>"
This command creates four sets. These are the sets of matches to the four most frequent phrases starting with the string <A>
. Notice that, if the >
has been mapped to a blank, the phrases are at least two words in length.
>> sample.100 "<A>"
>> signif.-2 %
This use of signif
is illegal since signif
with a negative modifier may be applied only to a set matching a string or created by a range
command. An error message is generated.
[edit] {SortOrder}
{SortOrder
number}
determines the ordering of a set.
The behaviour of first
, next
, pr
, save
, subset
and ~nextemp
is affected by the ordering associated with their operands. The SortOrder
setting indicates whether these sets are to be treated in alphabetical order or in the order that members of the set occur in the text (occurrence order). (A SortOrder
setting of OccurHead
(explanation below) also determines what is displayed by the pr
and save
commands.)
Every set in XPAT has an internal ordering which varies from set to set, as described below. The ordering is chosen by XPAT, for reasons of efficiency, and no assumptions can be made in this regard. It is often desirable, however, to present results in a certain order, and the SortOrder
setting exists to control this. When a set is an operand of a pr
or save
command or a new set is created by a first
, next
, subset
or ~nextemp
command, the ordering of the set and hence the behaviour of the command is determined by the SortOrder
setting. This may mean that the existing set must be reordered for processing with these commands. For some XPAT commands this results in a change in the internal ordering, and this change is reflected when subsequently operating with a SortOrder
setting of AsIs
(explanation below). The ordering of a set that is not an operand to one of the above commands is not affected when the SortOrder
setting changes.
The permissible values for the SortOrder
setting are AsIs
, Alpha
, Occur
and OccurHead
. The default value of SortOrder
is AsIs
. If the SortOrder
setting is AsIs
, the set is processed in the order in which it currently exists. If the SortOrder
setting is Alpha
, the set is processed in alphabetical order. If the SortOrder
setting is Occur
or OccurHead
, the set is processed in occurrence order. For sets whose internal ordering is not alphabetical, for example region sets, displaying results with a SortOrder
setting of Alpha
will require resorting which may result in additional computation delay depending on the set size.
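The difference between the orderings can be illustrated with a small Python model. It assumes that alphabetical order compares the text that begins at each match point and that occurrence order is ascending offset; character mappings are ignored, and the function name order_members is hypothetical.

```python
def order_members(text, points, sort_order="AsIs"):
    """Return the match points in the order a command such as pr would use."""
    if sort_order == "Alpha":
        # Compare the strings that start at each point and run to the end.
        return sorted(points, key=lambda p: text[p:])
    if sort_order in ("Occur", "OccurHead"):
        return sorted(points)
    if sort_order == "AsIs":
        return list(points)  # keep whatever order the set currently has
    raise ValueError(f"unknown SortOrder value: {sort_order}")


text = "banana apple cherry apple"
points = [13, 0, 20, 7]   # an arbitrary internal ordering
for value in ("AsIs", "Alpha", "Occur"):
    print(value, order_members(text, points, value))
```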
Setting the SortOrder
setting to OccurHead
results in further changes to the behaviour of the pr
and save
commands. For pr
and save
, with a SortOrder
setting of AsIs
, Alpha
or Occur
, the position offset for each set member is displayed. With a SortOrder
setting of OccurHead
, the contents of a named region set are output in place of the position offset. Setting SortOrder
to OccurHead
requires reference to two regions within the brace brackets used to change the SortOrder
setting. The first region referenced is the one whose contents are displayed in place of an offset when members of a set are displayed. The second region referenced must be one that contains both the match points of the members of a set and the first region referenced in the SortOrder
setting (OccurHead
).
The text displayed in place of the offset, as a result of SortOrder
being set to OccurHead
, is the first region found of the specified type within the containing region (also specified in the OccurHead
setting). If the text to be displayed begins with an opening angle bracket, the text until the closing angle bracket is ignored and the next character is displayed. If the next character is another angle bracket, the preceding process is repeated iteratively. A maximum of 10 characters or up to the next <
in the text is displayed. Both these region sets, named in the OccurHead
setting, must be in the data dictionary. If they are not, for example if X
is named in the setting but does not exist in the data dictionary, the following error message results:
No information for region X in the data dictionary
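The rule for choosing the text shown in place of the offset can be sketched as follows. This is an interpretation of the description above rather than XPAT's code: the function name occur_head_label is hypothetical, and the input is assumed to be the text of the first region of the named type.

```python
def occur_head_label(region_text, limit=10):
    """Skip leading <...> tags, then keep at most `limit` characters,
    stopping early at the next opening angle bracket."""
    text = region_text
    while text.startswith("<"):
        close = text.find(">")
        if close == -1:          # unterminated tag: nothing left to skip
            break
        text = text[close + 1:]
    cut = text.find("<")
    if cut != -1:
        text = text[:cut]
    return text[:limit]


print(occur_head_label("<LF><B>Hamlet, Prince of Denmark</B></LF>"))
print(occur_head_label("<LF>Act I</LF>"))
```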
The SortOrder
setting can be changed at any time during a XPAT session and remains in effect until it is changed again or until the end of the session. The current value of the SortOrder
setting is displayed by the command {Settings}
.
[edit] Examples:
>> {SortOrder Alpha}
>> pr %
The first command sets the SortOrder
setting so that the displayed set, following the pr
command, is in alphabetical order.
>> {SortOrder Occur}
>> sample "Moriarity"
>> pr %
>> {SortOrder AsIs}
>> pr %
The sample set created in the first example has an alphabetical ordering. With the SortOrder
setting of Occur
, the first pr
displays the set in occurrence order. However, after the SortOrder
is reset to AsIs
, the set that is printed after the next pr
is displayed in alphabetic order. Note that the sample set is not affected by the SortOrder
setting at the time of its creation and that the reordering for printing is temporary.
>> {SortOrder OccurHead LF E}
>> pr "shaks"
One effect of setting the SortOrder
to OccurHead
is that the set is displayed in occurrence order by the pr
command. The beginning of each line, following the pr
command, contains the starting characters of the first region, named LF
, that occurs within the region named E
containing a member of the point set matching the string shaks
.
>> Ondaatje
>> pr subset %
>> {SortOrder Occur}
>> pr subset %
>> {SortOrder AsIs}
>> pr %
The subset displayed by the first pr
command is shown in alphabetical order. With the SortOrder
set to Occur
, the next pr
command displays the subset in occurrence order. With the SortOrder
set to AsIs
, the final pr
displays the set of matches to the string Ondaatje
in occurrence order since this point set was reordered permanently as a result of the subset command executed when the SortOrder
was Occur
.
[edit] See also:
first
, next
, ~nextemp
, pr
, save
, Settings
, subset
[edit] stop
stop
terminates a XPAT session. The use of this command causes the session to end and the XPAT process to exit. A message may be generated telling how much computer time has been used during the XPAT session.
[edit] See also:
done
, quit
[edit] string search
A command consisting only of a string causes XPAT to search for occurrences of the string in the text database. A set is created whose members are matches to all index points in the text that begin with the given string. A match occurs when the given string (after having the character mappings applied to it and stopwords removed) is the same as the text that begins at an index point (also having had the character mappings applied and stopwords removed). Searching for phrases with a XPAT index is as fast as searching for a word or a prefix of a word. After a search, the number of matches to the pattern is displayed, but the results of the search are not shown unless requested by a pr
command.
[edit] Examples:
>> in
If the index currently being used is based on words, the matches returned from this input string are the matches to all the phrases in the text that begin with the two characters in
. That is, there will be matches to strings beginning with the word in
as well as to strings beginning with inside
, into
etc. In order to match only strings beginning with the word in
, a blank must be added to the search string and the string enclosed within quotation marks.
If each character is indexed, the matches returned would also include strings that appear as part of words such as within
and getting
.
If the index has been made with character mappings that map upper case to lower case, the matches would also include matches to strings that include In
.
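The word-indexed case can be modelled in a few lines of Python. The sketch assumes that index points are the starts of blank-delimited words and that the character mappings fold upper case to lower case; stop words are ignored, and the function name string_search is illustrative only.

```python
import re


def string_search(text, query):
    """Return the index points whose following text begins with `query`."""
    mapped = text.lower()      # assumed mapping: upper case to lower case
    wanted = query.lower()
    index_points = [m.start() for m in re.finditer(r"\S+", mapped)]
    return [p for p in index_points if mapped.startswith(wanted, p)]


text = "In time we went into the inn within"
print(string_search(text, "in"))    # In, into, inn (not within)
print(string_search(text, "in "))   # only the word "in"
```

With a character-based index every offset would be an index point, so the occurrences of in inside within would match as well.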
>> "to be or not to be that is the question"
If the index used when searching for the above string was created with the stopwords to
, be
, or
, not
, that
, is
and the
, this string search is equivalent to a search on the string question
.
[edit] See also:
data dictionary documentation, double quote
, index point
, offsets
, quiet mode
, range
, shift
[edit] subset
subset
set1
finds a number of contiguous members of a set.
Subset
creates a set of a specified size containing members starting at a designated location in set1. The members of the new set are in the order they appear in set1. Set1 may be a region set or a point set. The new set is of the same type as set1.
The operation of the subset
command is affected by the size of the set requested and the SortOrder
setting.
The ordering of a set, and hence which members are chosen to be in the set created by the subset
command, is controlled by the SortOrder
setting. If the SortOrder
setting is Alpha
, the set is ordered alphabetically; if it is Occur
or OccurHead
, the set is ordered as the members occur in the text; and if the SortOrder
setting is AsIs
, the set order is the current one and may thus be either alphabetic or occurrence order. The location within set1 to start selecting members for the new set is indicated by a numeric location in the ordered set. This numeric location is given to the subset
command as a modifier attached to the command. The modifier is in the form of a period followed by an integer that can be either positive or negative. A positive integer gives the desired location relative to the beginning of the set and a negative integer gives it relative to the end of the set. Without any modifier the subset
is taken from the start of set1.
The size of the set created is determined by the value of the setting SampleSize
which has a default value of 10. If the size of set1 is less than SampleSize
, then the new set created is the same size as set1. Changing the SampleSize
setting affects all subsequent uses of subset
during the current session. The size of the subset
can be specified for an individual use of the command by using a second modifier attached to the already modified subset
command. This modifier is also of the form of a period followed by a numeric value giving the desired set size.
The subset
command can be used by itself or with the pr
, save
or export
commands. The subset
command may only be used in conjunction with these commands.
[edit] Examples:
>> {SampleSize 40}
>> subset %
The first command changes the SampleSize setting and the query in the second line returns a set that contains the first 40 members of the most recent result in the session.
>> subset.10 "Montreal "
This query creates a set of 40 members (assuming the SampleSize setting in the first example) starting at the tenth member in the set of matches to the string Montreal
.
>> subset.-10 "Montreal "
This is similar to the previous query but the new set starts at the tenth member from the end of the set. Therefore, the resulting set size is only 10 even though the SampleSize setting is 40.
>> subset.5.30 %
This query creates a set of 30 members starting from the fifth member of the most recent result in the session.
>> {SortOrder Occur}
>> subset.-20.20 5
The query in the second line of the example creates a set containing the final 20 members in the set represented by set number 5. The SortOrder
setting means that both set number 5 and the new set are in occurrence order.
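The position and size modifiers used in these examples behave much like a list slice. The Python sketch below models that reading; it assumes the set has already been ordered according to SortOrder, counts positions from 1 (or from -1 at the end), and uses a hypothetical function name.

```python
def subset(members, start=1, size=10):
    """Model of subset.START.SIZE applied to an already ordered set.

    `start` is a 1-based position; a negative value counts from the end.
    `size` plays the role of the SampleSize setting or the second modifier.
    """
    if start == 0:
        raise ValueError("positions are counted from 1 or from -1")
    begin = start - 1 if start > 0 else max(len(members) + start, 0)
    return members[begin:begin + size]


members = list(range(1, 101))             # a hypothetical set of 100 members
print(len(subset(members, 10, 40)))       # subset.10 with SampleSize 40
print(len(subset(members, -10, 40)))      # subset.-10: only 10 members remain
print(subset(members, -20, 20)[0])        # subset.-20.20 starts at member 81
```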
[edit] See also:
first
, next
, ~nextemp
, sample
[edit] Settings:
SampleSize
, SortOrder
[edit] ~sync
~sync
string
outputs a tagged identifier.
The command ~sync
is available only when the XPAT session is operating in quiet mode. ~sync
outputs a message tagged with Sync tags containing the given string. This command is mainly used when XPAT is integrated into a more complex system. The output from the ~sync
command can then be used to identify a position in an input stream when information is being received from several different sources.
[edit] Examples:
>> ~sync "festival"
The output from this command is the tagged string: <Sync>festival</Sync>
.
[edit] See also:
~qnum
[edit] thesaurus
The thesaurus provides an efficient way to describe patterns that have some common quality. For example, if many searches of a database involve finding references in the text to money in different currencies, the thesaurus provides the capability to define a variable describing all the possible patterns to be used in these searches.
The thesaurus
variable is defined in a file named in the data dictionary. Within the thesaurus file, each separate variable, called a word, is surrounded by <Entry> tags. Within the <Entry> tags are other tagged areas: the name of the variable is contained within <Word> tags, followed by the associated query contained within <Query> tags. The thesaurus capability is implemented using macros
so the query may be a complex one creating more than one set. The same cautions, described for macros, apply to bracketing and syntax errors.
To invoke a thesaurus variable, precede its name with the character <. For example, a thesaurus variable named money
may be used within a XPAT session as follows:
<"money"
As with macros, thesaurus invocations are replaced by an exact copy of the definition. This means that a thesaurus variable can be used as an operand in other XPAT queries. Note, however, that it may be necessary to bracket the entire invocation in order to ensure correct results from the query. In practice, bracketing the definition itself is a good general method.
If an undefined thesaurus variable is used, for example <testing
, the following error message is generated.
The macro testing is undefined
[edit] Examples:
>> (<"policy") near (<"economy")
Assume that the following tagged data appears in the thesaurus file referenced in the data dictionary for the XPAT session.
<Entry>
<Word>economy</Word>
<Query>("economic " + "fiscal " + "monetary " + "economy")</Query>
</Entry>
<Entry>
<Word>policy</Word>
<Query>("policy " + "policies ")</Query>
</Entry>
The query shown finds any matches to either of the strings described by the thesaurus variable policy
that occur near any of the strings that are part of the union described by the thesaurus variable economy
.
>> *speaker including <"macbeth"
This assumes that the following tagged data appears in the thesaurus file.
<Entry>
<Word>macbeth</Word>
<Query>"macbeth" - (shift.5 "lady macbeth")</Query>
</Entry>
The query shown finds those members of the region set defined by the name speaker
that contain Macbeth
but not Lady Macbeth
.
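The textual substitution performed for a thesaurus invocation can be imitated in Python. The sketch below is not how XPAT reads its thesaurus file: the regular expressions, the function names load_thesaurus and expand, and the restriction to the quoted form of an invocation are all assumptions made for illustration. The entries are the ones from the first example.

```python
import re

THESAURUS_FILE = """
<Entry> <Word>economy</Word> <Query>("economic " + "fiscal " + "monetary " + "economy")</Query> </Entry>
<Entry> <Word>policy</Word> <Query>("policy " + "policies ")</Query> </Entry>
"""

ENTRY = re.compile(
    r"<Entry>\s*<Word>(.*?)</Word>\s*<Query>(.*?)</Query>\s*</Entry>", re.S
)
INVOCATION = re.compile(r'<"([^"]+)"')   # only the quoted form is handled


def load_thesaurus(source):
    """Map each <Word> to the text of its <Query>."""
    return {word: query for word, query in ENTRY.findall(source)}


def expand(query, thesaurus):
    """Replace every invocation with an exact copy of its definition."""
    def substitute(match):
        name = match.group(1)
        if name not in thesaurus:
            raise KeyError(f"The macro {name} is undefined")
        return thesaurus[name]
    return INVOCATION.sub(substitute, query)


print(expand('(<"policy") near (<"economy")', load_thesaurus(THESAURUS_FILE)))
```

Because the definition is copied verbatim, the brackets inside each <Query> are what keep the union together once it is placed next to near.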
[edit] See also:
data dictionary documentation, macro
[edit] union
set1 +
set2
combines two sets.
The union
operator (+
) creates a new set containing the members of both set1 and set2, with duplicates removed. Set1 and set2 can be either point sets or region sets. If either of set1 or set2 is a point set the new set is also a point set.
If both set1 and set2 are region sets and there is no overlap or nesting of any member from set1 and any member from set2, the union set is a region set. If overlap or nesting occurs, set1 and set2 are treated as point sets by using the first of each pair of pointers describing the regions in the sets. The new set created is the union of these point sets. The following message is generated when this occurs:
Warning: Addition of Region objects produced a region with overlaps -- simplified into a point set
Note that if both set1 and set2 are region sets and a member of set1 coincides exactly with a member of set2, this is not considered to be an instance of overlap or nesting; rather, these members are considered identical and only one will be a member of the output set.
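The rule for the union of two region sets can be sketched in Python. The model assumes that a region is a (start, end) pair of inclusive offsets and that two regions overlap or nest when their ranges share any position without being identical; the function names are hypothetical.

```python
def overlaps_or_nests(a, b):
    """True when two distinct regions share at least one position."""
    return a != b and a[0] <= b[1] and b[0] <= a[1]


def union_regions(set1, set2):
    """Return ("region", members) or, on overlap or nesting, ("point", starts)."""
    if any(overlaps_or_nests(a, b) for a in set1 for b in set2):
        # Fall back to the first pointer of every region, as a point set.
        starts = {start for start, _ in set1} | {start for start, _ in set2}
        return "point", sorted(starts)
    # Identical regions collapse into one member.
    return "region", sorted(set(set1) | set(set2))


print(union_regions([(0, 5), (10, 15)], [(10, 15), (20, 25)]))
print(union_regions([(0, 10)], [(5, 15)]))
```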
[edit] Examples:
>> USA + "U.S.A" + "United States" + "America "
This query creates a new point set containing all matches to each of the individual sets in the query.
>> region Title + region Summary
Assuming that the members of region Title and of region Summary do not overlap or nest, this query creates a new region set containing all the members of both regions.
>> *your_region + region First
Assuming that your_region was created during the current XPAT session and contains a member that overlaps one or more members of region First, this query creates a point set. The members in the new set consist of the first of the two pointers that describe the members of your_region and of region First. XPAT prints a warning message before printing the number of matches in the new set.
[edit] See also:
difference, intersect
[edit] within
set1 within set2
finds members of a set within a given region.
Within creates a set containing those members of set1 that are located in one of the regions of the text described by set2. Set1 may be either a point set or a region set. Set2 must be a region set. The new set is of the same type as set1.
Set2 may be a predefined region set, a region set that has been created within the XPAT session using the region command, a region set resulting from the use of the import command, or the result of a previous query in the session.
If set1 is a point set, each member is examined to see whether it falls within a region from set2; if it does, it is included in the new set. If set1 is a region set, the first of the pair of pointers (offsets into the text) describing each member is examined to see whether it falls within a region of set2. The second pointer of the pair does not have to fall within a region of set2 for the region to be included in the new set. That is, if set1 and set2 are both region sets and they overlap, members of set1 are included in the result of within if they begin within a member of set2.
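A minimal Python sketch of this membership test follows. It is a model for illustration only: the function names and the representation (points as integer offsets, regions as inclusive (start, end) tuples) are assumptions, not part of XPAT.

```python
# Hypothetical model: a point is an int offset; a region is a (start, end)
# tuple of inclusive offsets into the text.

def first_pointer(member):
    """Return the offset that decides membership: a region's start or the point itself."""
    return member[0] if isinstance(member, tuple) else member

def within(set1, set2):
    """Keep members of set1 whose first pointer falls inside some region of set2."""
    return [m for m in set1
            if any(start <= first_pointer(m) <= end for start, end in set2)]

speakers = [(0, 10), (40, 50)]

# Point set: only offsets inside a Speaker region survive.
print(within([5, 25, 45], speakers))

# Region set: (8, 15) starts inside (0, 10) and is kept even though it
# ends outside; (12, 20) starts outside and is dropped.
print(within([(8, 15), (12, 20)], speakers))
```

The result has the same type as set1 in both calls, since members are filtered rather than transformed.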
The command not within creates a set containing those members of set1 that are not in any of the regions described by set2.
set1 not within set2
is the same as
set1 - (set1 within set2)
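The equivalence can be checked with a small Python model. The helper names and the offset representation below are assumptions made for the example; XPAT itself evaluates these operators internally.

```python
# Hypothetical model: points are int offsets, regions are inclusive
# (start, end) tuples.

def within(points, regions):
    """Points that fall inside at least one region."""
    return [p for p in points if any(s <= p <= e for s, e in regions)]

def difference(set1, set2):
    """Members of set1 that are not members of set2 (models the - operator)."""
    return [m for m in set1 if m not in set2]

def not_within(points, regions):
    """Points that fall inside none of the regions, computed directly."""
    return [p for p in points if not any(s <= p <= e for s, e in regions)]

fontaine = [5, 25, 45, 60]          # offsets of hypothetical matches
speakers = [(0, 10), (40, 50)]      # hypothetical Speaker regions

direct = not_within(fontaine, speakers)
derived = difference(fontaine, within(fontaine, speakers))
print(direct, derived)
```

Both expressions select the matches at offsets 25 and 60, the only ones outside every Speaker region.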
Including and within are similar in that they both restrict searches to specified regions in the text. They differ in the set that is created. The including command creates a set of regions that contain one or more members of another set, while within creates a set of pointers or regions that are contained in members of a region set.
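The contrast is easiest to see by running both operators on the same inputs. The Python sketch below is an illustrative model only; the function names and the inclusive (start, end) offset representation are assumptions.

```python
# Hypothetical model: points are int offsets, regions are inclusive
# (start, end) tuples.

def within(points, regions):
    """Return the points that lie inside some region."""
    return [p for p in points if any(s <= p <= e for s, e in regions)]

def including(regions, points):
    """Return the regions that contain at least one of the points."""
    return [(s, e) for s, e in regions if any(s <= p <= e for p in points)]

speakers = [(0, 10), (20, 30), (40, 50)]
hits = [5, 8, 45, 60]

print(within(hits, speakers))      # a set of matches
print(including(speakers, hits))   # a set of regions
```

Here within returns three matches, while including returns two regions: the first region is counted once even though it holds two matches, and the middle region is dropped because it holds none.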
[edit] Examples:
>> "Cohen" within region Speaker
In this example, the predefined region Speaker defines regions of the text that contain speakers' names. This query creates a set of the matches to Cohen that fall within the regions described by region Speaker.
>> "Fontaine" not within region Speaker
This query finds all references to Fontaine that are not located within one of the regions describing a speaker.
>> first = region "<Etym>" .. "</Language>"
>> ("Spanish" within region Language) within *first
The first query defines regions of the text that start at the string <Etym> and end with the string </Language>. The second query finds all the matches to Spanish that are within a Language region and also within one of the newly defined regions.
[edit] See also:
import, including, not, region