Customizing BibClass

From DLXS Documentation

Revision as of 08:53, 10 October 2007 by Khage



Adding New Searchable Fields

Overview

A basic set of fields is declared in the bib.map file, and collmgr can be used to enable searching on any of them. The basic fields currently include:

  • author
  • title
  • entire record
  • publisher
  • place of publication
  • year (i.e., date of publication)
  • series
  • notes
  • collection (i.e., collection ID)
  • format (e.g., HTML)
  • type (e.g., text)
  • language
  • id (i.e., the ID of the record)
  • dt (i.e., the OAI-specified date of last update for a record)

and all are based on encoding that conforms to the bib.dtd.

It may be necessary to define additional fields or to substitute values (based on different encoding practices) for the values used in this bib.map. This section outlines the steps necessary to create a new map file, and the steps in the collmgr needed to take advantage of these new (i.e., additional or different) fields.

Creating and Modifying the Map File

Rather than modifying the file bib.map, create a new map file for each collection. Both bib.map and the new map file you create belong in the {DLXSROOT}/misc/b/bib/maps directory. For the purposes of this documentation, we will use the example "patent.map" for a hypothetical collection with the collection ID "patent". For more information, see the generic overview of map files.

Each mapping element in a map file consists of the following:

label
This element determines what will display in the user's browser when constructing searches, and is case sensitive. It must match the value used in the collmgr.
synthetic
This element determines the variable name that is used in the CGI.
native
The "native" element provides the XPAT search that the system uses to find the relevant content. The search may be simple (e.g., region YR) or complex (e.g., (region origin within region citation) + (region publish within region citation)).
nativeregionname
This element is not used by BibClass and so content within the element may be omitted.
summarylabel
This element determines what will display in the user's browser when reporting the search and corresponding results, and is frequently used to declare an abbreviation such as "date" for "date of publication".

Create the file patent.map with any standard text editor and copy the contents of bib.map into it. Each map file consists of a section of "operator" mappings followed by a section of "region" mappings; each section is introduced by a comment such as <!-- operator mapping -->. An example of an operator mapping is as follows:

<mapping>
   <label>And</label>
   <synthetic>AND</synthetic>
   <native>^</native>
</mapping>

Operator mappings use the synthetic element in BibClass; region mappings do not.
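One way to picture an operator mapping is as a lookup: the CGI works with the synthetic value, and the native value is what ends up in the XPAT query. The following is a minimal Perl sketch, assuming the operator mappings have already been read into a hash. The hash and the `native_operator` helper are hypothetical; only the And mapping comes from bib.map.

```perl
use strict;
use warnings;

# Hypothetical in-memory copy of the operator mapping shown above:
# synthetic (CGI) value => native (XPAT) value.
my %operator_map = ( AND => '^' );

# Translate a synthetic operator from the CGI into its native form.
# Returns undef when the map file has no mapping for the operator.
sub native_operator {
    my ($synthetic) = @_;
    return $operator_map{$synthetic};
}

printf "%s\n", native_operator('AND');    # prints ^
```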

Begin by removing all mapping groups that are not relevant to your collection. For example, let us say that our example collection, patent, consists of records without authors; in this case, we would remove the region mapping for author that looks like this:

<mapping>
   <label>author</label>
   <synthetic></synthetic>
   <native>region L</native>
   <nativeregionname></nativeregionname>
   <summarylabel>au</summarylabel>
</mapping>

Next, add new mappings for the fields that you would like to make searchable in the interface. (Remember, each BibClass record will typically be searchable across all fields through the entire record mapping, which is mapped to region A.) In the following three examples, we declare (1) a new, simple field based on a single element, (2) a re-mapping of a field already declared for BibClass, and (3) a new field that combines two searches:

<mapping>
   <label>patent number</label>
   <synthetic></synthetic>
   <native>region PATNO</native>
   <nativeregionname></nativeregionname>
   <summarylabel>patent #</summarylabel>
</mapping>
<mapping>
   <label>year</label>
   <synthetic></synthetic>
   <native>region YR within region citation</native>
   <nativeregionname></nativeregionname>
   <summarylabel>year</summarylabel>
</mapping>
<mapping>
   <label>data type</label>
   <synthetic></synthetic>
   <native>(region geoform within region citation) + (region formname)</native>
   <nativeregionname></nativeregionname>
   <summarylabel>data type</summarylabel>
</mapping>
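To check how the elements of each region mapping relate to one another, it can help to pull them into a table keyed by label. The Perl sketch below is a hypothetical helper, not part of DLXS, and uses a simple regex pass over two of the mappings above rather than a full XML parser, so it assumes the flat layout shown here.

```perl
use strict;
use warnings;

# Sample region mappings, copied from the examples above.
my $map_xml = <<'XML';
<mapping>
   <label>patent number</label>
   <synthetic></synthetic>
   <native>region PATNO</native>
   <nativeregionname></nativeregionname>
   <summarylabel>patent #</summarylabel>
</mapping>
<mapping>
   <label>year</label>
   <synthetic></synthetic>
   <native>region YR within region citation</native>
   <nativeregionname></nativeregionname>
   <summarylabel>year</summarylabel>
</mapping>
XML

# Collect label => { native, summarylabel } for every mapping block.
# Regex-based for illustration only; it is not a general XML parser.
sub read_region_mappings {
    my ($xml) = @_;
    my %fields;
    while ( $xml =~ m,<mapping>(.*?)</mapping>,gs ) {
        my $block = $1;
        my ($label)   = $block =~ m,<label>(.*?)</label>,s;
        my ($native)  = $block =~ m,<native>(.*?)</native>,s;
        my ($summary) = $block =~ m,<summarylabel>(.*?)</summarylabel>,s;
        next unless defined $label && defined $native;
        $fields{$label} = { native => $native, summarylabel => $summary };
    }
    return \%fields;
}

my $fields = read_region_mappings($map_xml);
for my $label ( sort keys %$fields ) {
    printf "%-14s %-35s %s\n", $label, $fields->{$label}{native},
        $fields->{$label}{summarylabel} // '';
}
```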

Collmgr Steps Needed to Support Additional Searchable Fields

You will need to point the colldb database to your new map file and declare the new searchable fields for the user interface. First, in collmgr's map field, enter your map file (e.g., patent.map). Next, in collmgr's regionsearch field, add your searchable fields. These must match the label elements in the mapping, described above.

You may also need to support searchable fields such as date of publication (e.g., for sorting) or ID (for record-oriented operations such as the bookbag) without showing them as searchable in the user interface. In these cases, create the mappings but do not declare the regions in collmgr.
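Because regionsearch entries must match label elements exactly (labels are case sensitive), a quick consistency check can catch typos before they reach the interface. The following is a minimal Perl sketch, assuming the regionsearch entries have already been split into a list; both lists and the `unmatched_regions` helper are hypothetical. Note that year appears only in the map file, which corresponds to the hidden-field case described above.

```perl
use strict;
use warnings;

# Labels declared in the map file (e.g., read from patent.map).
my @map_labels = ( 'patent number', 'year', 'data type', 'entire record' );

# Hypothetical contents of collmgr's regionsearch field for the collection.
my @regionsearch = ( 'patent number', 'Data Type', 'entire record' );

# Return the regionsearch entries that have no exactly matching label.
# The comparison is case sensitive, like the label element itself.
sub unmatched_regions {
    my ( $map_labels, $regionsearch ) = @_;
    my %known = map { $_ => 1 } @$map_labels;
    return grep { !$known{$_} } @$regionsearch;
}

print "No mapping for: $_\n" for unmatched_regions( \@map_labels, \@regionsearch );
```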

Creating and Editing Filtering Routines

Overview

BibClass is delivered with a default set of filtering routines based on the DTD and relatively conventional uses of it. The content you are putting online may use the tags in unusual ways, may require different display labels, or may simply use different encoding that requires different filtering routines.

The default filtering can be found in the {DLXSROOT}/cgi/b/bib/ folder. By using the keyword BibClass for the subclassmodule field in collmgr, you will be drawing on the corresponding subroutines in the file {DLXSROOT}/cgi/b/bib/BibClass.pm. New, custom filters can be written to override those in the base class BibClass.pm by creating subclass modules and placing them in {DLXSROOT}/cgi/b/bib/BibClass/, e.g., for the patents collection, patents.pm. Any methods in a new module will override the ones in BibClass.pm.
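The override mechanism is ordinary Perl inheritance: a method defined in the subclass is found first, and anything it does not define falls through to BibClass.pm. The sketch below is self-contained, so it uses a tiny stand-in for BibClass rather than the real module; the package name `BibClass::patents`, the `new` constructor, the PATNO rule, and the simplified two-argument filter signature are all assumptions for illustration.

```perl
use strict;
use warnings;

# Stand-in for the delivered base class; the real BibClass.pm is much larger.
package BibClass;
sub new { my ($class) = @_; return bless {}, $class; }
sub ShortRecordFilter { my ( $self, $i ) = @_; $$i =~ s,<K>(.*?)</K>,<strong>Title:</strong> $1<br>,gs; }
sub LongRecordFilter  { my ( $self, $i ) = @_; $$i =~ s,</?A>,,g; }

# Hypothetical collection subclass (patents.pm): inherits everything
# from BibClass and overrides only ShortRecordFilter.
package BibClass::patents;
our @ISA = ('BibClass');
sub ShortRecordFilter {
    my ( $self, $i ) = @_;
    $$i =~ s,<PATNO>(.*?)</PATNO>,<strong>Patent number:</strong> $1<br>,gs;
}

package main;

my $obj = BibClass::patents->new;
my $rec = '<A><PATNO>1234567</PATNO></A>';
$obj->ShortRecordFilter( \$rec );    # subclass method runs
$obj->LongRecordFilter( \$rec );     # not overridden, so BibClass's runs
print "$rec\n";
```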

Creating New Filters

Declaring new subroutines

Each new filter is declared by creating a method in the new .pm file in {DLXSROOT}/cgi/b/bib/BibClass/. You create the method and supply the transformations that take place within it. In this example, we will create a "short" record display filter for a collection called "patents".

sub ShortRecordFilter {
    ...    # filtering statements go here
}

Within the braces, you supply the filtering statements. For example:

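 my $self = shift;
 my ( $i, $cgi, $dso ) = @_;    # $i is a reference to the record text
 $$i =~ s,</A>,,gs;             # drop the closing record (A) tag
 $$i =~ s,<A [^>]*>,,gs;        # drop the opening record tag and its attributes
 $$i =~ s,<T>(.*?)</T>,<strong>Publisher:</strong> $1<br>,gs;    # label T as Publisher
 # more transformations follow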
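Since `$i` is a reference, the filter changes the caller's copy of the record in place and needs no return value. The runnable sketch below wraps the transformations above in a complete subroutine and feeds it a made-up record; calling it as a plain function with `undef` in place of the object is a simplification for illustration.

```perl
use strict;
use warnings;

# The transformations from the example above, wrapped in a complete method.
sub ShortRecordFilter {
    my $self = shift;
    my ( $i, $cgi, $dso ) = @_;    # $i is a reference to the record text
    $$i =~ s,</A>,,gs;
    $$i =~ s,<A [^>]*>,,gs;
    $$i =~ s,<T>(.*?)</T>,<strong>Publisher:</strong> $1<br>,gs;
    return;
}

# Illustrative record; the caller passes a reference, so the filter's
# changes are visible here afterward.
my $record = '<A ID="r1"><T>Acme Press</T></A>';
ShortRecordFilter( undef, \$record, undef, undef );
print "$record\n";    # <strong>Publisher:</strong> Acme Press<br>
```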

Creating transformations

Transformations in BibClass are done in Perl using regular expressions. Many are relatively simple, substituting HTML for an existing SGML or XML element. For example, to transform the content of the element V, which in BibClass represents an address, into text with a preceding label, an expression like the following would be used:

 $$i =~ s,<V>(.*?)</V>,<strong>Address:</strong> $1<br>,gs;
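The modifiers on that expression matter. `(.*?)` is non-greedy, so each V element is matched separately, `/s` lets `.` match a newline inside the element, and `/g` repeats the substitution for every occurrence. The sketch below applies the same expression to a made-up record, with and without `/s`.

```perl
use strict;
use warnings;

# Two V (address) elements; the first spans a line break.
my $record = "<V>1 Main St.\nAnn Arbor</V><V>PO Box 9</V>";

# Same substitution as above.
( my $filtered = $record ) =~ s,<V>(.*?)</V>,<strong>Address:</strong> $1<br>,gs;
print "$filtered\n";

# Without /s, the first element (which contains a newline) is left untouched.
( my $no_s = $record ) =~ s,<V>(.*?)</V>,<strong>Address:</strong> $1<br>,g;
print "$no_s\n";
```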

Similarly, a URL element that has an ID attribute and no other attributes could be transformed into a labeled link as follows, if the ID is of no use to users viewing the record:

 $$i =~ s,<URL ID="[^"]*">(.*?)</URL>,<strong>URL:</strong> <A HREF="$1">$1</A>,gs;
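The pattern is deliberately strict: it only matches a URL tag whose sole attribute is ID, so a URL element carrying any other attribute passes through unchanged. The sketch below compares the expression above with a looser variant; the second sample record and the `loose_url` variant are hypothetical, shown only to illustrate the trade-off.

```perl
use strict;
use warnings;

my @urls = (
    '<URL ID="u1">http://www.example.org/</URL>',
    '<URL ID="u2" TYPE="mirror">http://mirror.example.org/</URL>',
);

# The expression from the text: matches only <URL ID="...">.
sub strict_url {
    ( my $s = shift ) =~ s,<URL ID="[^"]*">(.*?)</URL>,<strong>URL:</strong> <A HREF="$1">$1</A>,gs;
    return $s;
}

# Hypothetical looser variant: accepts any attributes on the URL tag.
sub loose_url {
    ( my $s = shift ) =~ s,<URL[^>]*>(.*?)</URL>,<strong>URL:</strong> <A HREF="$1">$1</A>,gs;
    return $s;
}

printf "%s\n%s\n", strict_url($_), loose_url($_) for @urls;
```

The strict form is safer when the encoding is known to be uniform; the loose form trades that precision for tolerance of extra attributes.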

And in the following case, the contents of an IDNO element whose TYPE attribute is "BMP" could be transformed into HTML like this:

 $$i =~ s,<IDNO TYPE="BMP">(.*?)</IDNO>,<strong>BMP number:</strong> $1<br>,gs;
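When a collection uses several TYPE values, one option is to drive the labels from a table instead of writing one substitution per value. The Perl sketch below uses the `/e` modifier so the replacement can look up the label and leave unknown types untouched. The "LOCAL" and "OCLC" values and the `filter_idno` helper are hypothetical; only "BMP" comes from the example above.

```perl
use strict;
use warnings;

# Display labels keyed by the IDNO TYPE attribute. "BMP" is from the
# example above; "LOCAL" is a hypothetical second value.
my %idno_labels = (
    BMP   => 'BMP number',
    LOCAL => 'Local number',
);

sub filter_idno {
    my ($i) = @_;    # reference to the record text
    # Relabel IDNO elements with a known TYPE; leave the rest ($&) as-is.
    $$i =~ s{<IDNO TYPE="([^"]*)">(.*?)</IDNO>}
            { exists $idno_labels{$1} ? "<strong>$idno_labels{$1}:</strong> $2<br>" : $& }gse;
    return;
}

my $record = '<IDNO TYPE="BMP">B-123</IDNO><IDNO TYPE="OCLC">456</IDNO>';
filter_idno( \$record );
print "$record\n";
```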

While many transformations are relatively simple, as in these cases, it is possible to use the full range of Perl to create more sophisticated displays, even moving entire blocks of information around in the eventual browser output. At the end of this document you will find two examples that represent the range of these possibilities. The American Film Institute Index filtering routine is especially complex, and is designed to take a significant body of encoded information and create a display similar to the printed entries from the source.
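A small instance of moving a block is the author rule in the Times of London filter: a greedy `(.*)` captures everything before the L (author) element, and the replacement writes the author first, followed by that captured text. The sketch below runs the Times title and author rules on a made-up fragment. One caveat: `<L[^>]*>` would also match any other tag whose name begins with L (for example `<LANG>`), so the rule assumes no such tags remain in the record.

```perl
use strict;
use warnings;

# Made-up record fragment with the author (L) after the title (K).
my $record = '<K>Letter to the Editor</K><L>Smith, John</L>';

# Title rule first, then the author rule, as in the Times filter.
$record =~ s,<K[^>]*>(.*?)</K>,<strong>Title:</strong> $1<br>,gs;
$record =~ s,(.*)<L[^>]*>(.*?)</L>,<strong>Author:</strong> $2<br>$1,gs;
print "$record\n";
```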

Sample Filtering Routines

Times of London/Palmer's Index to the Times filtering

 sub ShortRecordFilter
 {
     my $self = shift;
 
     my ( $i, $cgi, $dso ) = @_;
 
     $$i =~ s,</A>,,gs;
     $$i =~ s,<A>,,gs;
     $$i =~ s,<A [^>]*>,,gs;
     $$i =~ s,</?B>,,gs;           ## ----- TITLESTMT
     $$i =~ s|<F>\s*<K[^>]*>(.*?)</K>\s*<Z>\s*<YR>(.*?)</YR>\s*<PG>(.*?)</PG>\s*</Z>\s*</F>|qq{<strong>Citation:</strong> $1, } . &BibClassUtils::_YYYYMMDD2English($2) . qq{, $3<br>}|gse;    ## ----- SERIESSTMT, CITE, YR, PG
     $$i =~ s,<K[^>]*>(.*?)</K>,<strong>Title:</strong> $1<br>,gs;    ## ----- TITLE
     $$i =~ s,(.*)<L[^>]*>(.*?)</L>,<strong>Author:</strong> $2<br>$1,gs;  ## ----- AUTHOR
     $$i =~ s,<H[^>]*>.*</H>,,gs;      ## ----- SOURCEDESC
     $$i =~ s,<I2[^>]*>.*</I2>,,gs;    ## ----- TEXTCLASS
 }
 
 sub LongRecordFilter
 {
     my $self = shift;
 
     my ( $i, $cgi, $dso ) = @_;
 
     $$i =~ s,</A>,,gs;
     $$i =~ s,<A>,,gs;
     $$i =~ s,<A [^>]*>,,gs;
     $$i =~ s,</?B>,,gs;
     $$i =~ s,</?I2>,,gs;
     $$i =~ s|<F>\s*<K[^>]*>(.*?)</K>\s*<Z>\s*<YR>(.*?)</YR>\s*<PG>(.*?)</PG>\s*</Z>\s*</F>|qq{<strong>Citation:</strong> $1, } . &BibClassUtils::_YYYYMMDD2English($2) . qq{, $3<br>}|gse;
     $$i =~ s,<K[^>]*>(.*?)</K>,<strong>Title:</strong> $1<br>,gs;
     $$i =~ s,(.*)<L[^>]*>(.*?)</L>,<strong>Author:</strong> $2<br>$1,gs;
     $$i =~ s,<H[^>]*>\s*<P>(.*)</P>\s*</H>,<strong>Volume:</strong> $1<br>,gs;
     $$i =~ s,<KW[^>]*>\s*<AF>(.*?)</AF>\s*</KW>,<strong>Subject:</strong> $1<br>,gs;
 }
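The citation rule above ends in `/e`, which makes Perl evaluate the replacement as code, so the YR value can be passed through a date-formatting function before it is inserted. The sketch below reproduces that rule with `yyyymmdd_to_english`, a hypothetical stand-in for `BibClassUtils::_YYYYMMDD2English`; the real helper's output format may differ, and the sample record is made up.

```perl
use strict;
use warnings;

my @months = qw(January February March April May June
                July August September October November December);

# Hypothetical stand-in for BibClassUtils::_YYYYMMDD2English.
# Input that is not a valid YYYYMMDD date is returned unchanged.
sub yyyymmdd_to_english {
    my ($date) = @_;
    my ( $y, $m, $d ) = $date =~ /^(\d{4})(\d{2})(\d{2})$/ or return $date;
    return $date if $m < 1 || $m > 12;
    return sprintf '%s %d, %d', $months[ $m - 1 ], $d, $y;
}

my $record = '<F><K>The Times</K><Z><YR>19010305</YR><PG>7</PG></Z></F>';

# Same pattern as the Times citation rule; /e runs the replacement as Perl code.
$record =~ s|<F>\s*<K[^>]*>(.*?)</K>\s*<Z>\s*<YR>(.*?)</YR>\s*<PG>(.*?)</PG>\s*</Z>\s*</F>|qq{<strong>Citation:</strong> $1, } . yyyymmdd_to_english($2) . qq{, $3<br>}|gse;
print "$record\n";
```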

American Film Institute Index filtering

 sub ShortRecordFilter
 {
     my $self = shift;
 
     my ( $i, $cgi, $dso ) = @_;
 
     # This section is getting rid of bounding elements with no
     # specific content other than other elements
     $$i =~ s,</A>,,gs;
     $$i =~ s,<A [^>]*>,,gs;
     $$i =~ s,<SOMHD>,,g;
     $$i =~ s,</SOMHD>,,g;
 
     # Now some basic formatting things for both short and long
     $$i =~ s,<PLS></PLS>,,g;
     $$i =~ s,<NUM>[^<]*</NUM>,,g;
 
     # Now some things that are for short or long
     $$i =~ s,<K>([^<]*)</K>,<strong>Title:</strong> $1<br>,gs;
     $$i =~ s,<YR>([^<]*)</YR>,<strong>Release year:</strong> $1<br>,gs;
     $$i =~ s,<DIR><NAME>([^<]*)</NAME><NAMEINV>[^<]*</NAMEINV></DIR>,<strong>Director:</strong> $1<br>,g;
     $$i =~ s,<DCO>([^<]*)</DCO>,<strong>Distribution company:</strong> $1<br>,g;
     $$i =~ s,<PCO>([^<]*)</PCO>,<strong>Production company:</strong> $1<br>,g;
 
     # Now to get rid of some things specifically for long
     $$i =~ s,<DIR><NAME>[^<]*</NAME><NAMEINV>[^<]*</NAMEINV><CREDIT>[^<]*</CREDIT></DIR>,,g;
     $$i =~ s,<DIR>.*?</DIR>,,g;
     $$i =~ s,<ANI>.*?</ANI>,,g;
     $$i =~ s,<ART>.*?</ART>,,g;
     $$i =~ s,<ATI>.*?</ATI>,,g;
     $$i =~ s,<BR>.*?</BR>,,g;
     $$i =~ s,<CAS>.*?</CAS>,,g;
     $$i =~ s,<CDT>.*?</CDT>,,g;
     $$i =~ s,<CHR>.*?</CHR>,,g;
     $$i =~ s,<CNO>.*?</CNO>,,g;
     $$i =~ s,<COC>.*?</COC>,,g;
     $$i =~ s,<COL>.*?</COL>,,g;
     $$i =~ s,<COPYRIGHT>.*?</COPYRIGHT>,,g;
     $$i =~ s,<COS>.*?</COS>,,g;
     $$i =~ s,<CREDIT>.*?</CREDIT>,,g;
     $$i =~ s,<CTL>.*?</CTL>,,g;
     $$i =~ s,<CTY>.*?</CTY>,,g;
     $$i =~ s,<DAN>.*?</DAN>,,g;
     $$i =~ s,<DRF>.*?</DRF>,,g;
     $$i =~ s,<DRM>.*?</DRM>,,g;
     $$i =~ s,<DRR>.*?</DRR>,,g;
     $$i =~ s,<EDI>.*?</EDI>,,g;
     $$i =~ s,<ESG>.*?</ESG>,,g;
     $$i =~ s,<EST>.*?</EST>,,g;
     $$i =~ s,<ETH>.*?</ETH>,,g;
     $$i =~ s,<GEN>.*?</GEN>,,g;
     $$i =~ s,<LANG>.*?</LANG>,,g;
     $$i =~ s,<MAK>.*?</MAK>,,g;
     $$i =~ s,<MTX>.*?</MTX>,,g;
     $$i =~ s,<MUS>.*?</MUS>,,g;
     $$i =~ s,<NAME>.*?</NAME>,,g;
     $$i =~ s,<NAMEINV>.*?</NAMEINV>,,g;
     $$i =~ s,<NOT>.*?</NOT>,,g;
     $$i =~ s,<PCN>.*?</PCN>,,g;
     $$i =~ s,<PDA>.*?</PDA>,,g;
     $$i =~ s,<PDQ>.*?</PDQ>,,g;
     $$i =~ s,<PHO>.*?</PHO>,,g;
     $$i =~ s,<PHY>.*?</PHY>,,g;
     $$i =~ s,<PRE>.*?</PRE>,,g;
     $$i =~ s,<PRM>.*?</PRM>,,g;
     $$i =~ s,<PRO>.*?</PRO>,,g;
     $$i =~ s,<RDO>.*?</RDO>,,g;
     $$i =~ s,<RDT>.*?</RDT>,,g;
     $$i =~ s,<SAU>.*?</SAU>,,g;
     $$i =~ s,<SBA>.*?</SBA>,,g;
     $$i =~ s,<SBI>.*?</SBI>,,g;
     $$i =~ s,<SCT>.*?</SCT>,,g;
     $$i =~ s,<SDO>.*?</SDO>,,g;
     $$i =~ s,<SER>.*?</SER>,,g;
     $$i =~ s,<SET>.*?</SET>,,g;
     $$i =~ s,<SFX>.*?</SFX>,,g;
     $$i =~ s,<SIG>.*?</SIG>,,g;
     $$i =~ s,<SOU>.*?</SOU>,,g;
     $$i =~ s,<STA>.*?</STA>,,g;
     $$i =~ s,<STX>.*?</STX>,,g;
     $$i =~ s,<SUM>.*?</SUM>,,g;
     $$i =~ s,<WRT>.*?</WRT>,,g;
 }
 
 
 sub LongRecordFilter
 {
     my $self = shift;
 
     my ( $i, $cgi, $dso ) = @_;
 
     my $output;
 
     # Construct output by matching specific elements of the A element
     # and reassembling them conditionally
 
     # Title (three of these K elements aren't followed by a YR)
     my ( $title, $x, $year ) = ( $$i =~ m,<K>(.*?)</K>(<YR>(.*?)</YR>)?, );
     $output .= $year ? "<h3>$title ($year)</h3>" : "<h3>$title</h3>";
 
     # Next, a blockquote around the production info and the alternate
     # title (no label). It appears to use these elements:
     # PDQ|DCO|RDO|RDT|PDA|DRF|SER|PRE|DRM|COC|CDT|CNO|DRR|PHY|PCN|CTL|ATI
     $output .= '<blockquote>';
     $output .= &AFI_DoElement( $cgi, $i, 'CTY', '(', '', '; ' );
     $output .= &AFI_DoElement( $cgi, $i, 'LANG', '', '', ') ' );
     $output .= &AFI_DoElement( $cgi, $i, 'ATI', 'Alternate Title ', 'i', ' ' );
     $output .= &AFI_DoElement( $cgi, $i, 'PCO', 'Production Co. ', 'i', ' ' );
     $output .= &AFI_DoElement( $cgi, $i, 'PDQ', '', '', ' ' );
     $output .= &AFI_DoElement( $cgi, $i, 'DCO', 'Distribution Co. ', 'i', ' ' );
     $output .= &AFI_DoElement( $cgi, $i, 'PRE', '', '', ' ' );
     $output .= &AFI_DoElement( $cgi, $i, 'RDT', 'Release ', 'i', ' ' );
     $output .= &AFI_DoElement( $cgi, $i, 'RDO', 'Release ', 'i', ' ' );
     $output .= &AFI_DoElement( $cgi, $i, 'PDA', 'Production ', 'i', ' ' );
 
     if ( $$i =~ m,<COPYRIGHT>Y, )
     {
         $output .= '[© ';
         $output .= &AFI_DoElement( $cgi, $i, 'COC', '', '', '; ' );
         $output .= &AFI_DoElement( $cgi, $i, 'CDT', '', '', '; ' );
         $output .= &AFI_DoElement( $cgi, $i, 'CNO', '', '', ';' );
         $output .= '] ';
     }
     $output .= &AFI_DoElement( $cgi, $i, 'DRM', '', '', ' min.; ' );
     $output .= &AFI_DoElement( $cgi, $i, 'DRF', '', '', ' ft.; ' );
     $output .= &AFI_DoElement( $cgi, $i, 'DRR', '', '', ' reels.; ' );
     $output .= &AFI_DoElement( $cgi, $i, 'PHY', '', '', '; ' );
     $output .= &AFI_DoElement( $cgi, $i, 'PCN', 'PCA cert no. ', '', ' ' );
     $output .= '</blockquote>';
 
     # Source
     $output .= "<p>" . &AFI_DoElement( $cgi, $i, 'SOU', 'Source: ', 'strong' ) . "</p>";
     # Series
     $output .= "<p>" . &AFI_DoElement( $cgi, $i, 'SER', 'Series: ', 'strong' ) . "</p>";
     # Production Credits:
     $output .= "<p><strong>Production Credits: </strong>";
     $output .= &AFI_DoProductionCredit( $i, 'PRO', 'Producer' );
     $output .= &AFI_DoProductionCredit( $i, 'DIR', 'Director' );
     $output .= &AFI_DoProductionCredit( $i, 'WRT', 'Writer' );
     $output .= &AFI_DoProductionCredit( $i, 'PHO', 'Photography' );
     $output .= &AFI_DoProductionCredit( $i, 'ART', 'Art' );
     $output .= &AFI_DoProductionCredit( $i, 'EDI', 'Editor' );
     $output .= &AFI_DoProductionCredit( $i, 'SET', 'Set' );
     $output .= &AFI_DoProductionCredit( $i, 'COS', 'Costumes' );
     $output .= &AFI_DoProductionCredit( $i, 'MUS', 'Music' );
     $output .= &AFI_DoProductionCredit( $i, 'MTX', 'Music text' );
     $output .= &AFI_DoProductionCredit( $i, 'SDO', 'Sound' );
     $output .= &AFI_DoProductionCredit( $i, 'DAN', 'Dance' );
     $output .= &AFI_DoProductionCredit( $i, 'MAK', 'Makeup/hair' );
     $output .= &AFI_DoProductionCredit( $i, 'SFX', 'Special Effects' );
     $output .= &AFI_DoProductionCredit( $i, 'PRM', 'Production misc.' );
     $output .= &AFI_DoProductionCredit( $i, 'STA', 'Stand-ins' );
     $output .= &AFI_DoProductionCredit( $i, 'COL', 'Color personnel' );
     $output .= &AFI_DoProductionCredit( $i, 'ANI', 'Animation' );
     $output .= "</p>";
 
     # Cast
     $output .= "<p><strong>Cast: </strong>";
     $output .= &AFI_DoCast( $cgi, $i ) . "</p>";
     # Songs/Music
     $output .= "<p>" . &AFI_DoElement( $cgi, $i, 'STX', 'Songs/Music: ', 'strong' ) . "</p>";
     # Genre
     $output .= "<p>" . &AFI_DoElement( $cgi, $i, 'GEN', 'Genre: ', 'strong' ) . "</p>";
     # Broad Subject
     $output .= "<p>" . &AFI_DoElement( $cgi, $i, 'SBA', 'Broad Subjects: ', 'strong' ) . "</p>";
     # Specific Subject
     $output .= "<p>" . &AFI_DoElement( $cgi, $i, 'SBI', 'Specific Subjects: ', 'strong' ) . "</p>";
     # Plot Summary
     $output .= "<p>" . &AFI_DoElement( $cgi, $i, 'SUM', 'Plot Summary: ', 'strong' ) . "</p>";
     # Note
     $output .= qq{<p><font size="-1">} . &AFI_DoElement( $cgi, $i, 'NOT', 'Note: ', 'strong' ) . "</font></p>";
     # Source Citations
     $output .= "<p>" . &AFI_DoElement( $cgi, $i, 'SCT', 'Source Citations: ', 'strong' ) . "</p>";
     # Finally, a footer
     $output .= qq{<p><font size="-1"><code>[ ]</code> = offscreen credit<br>Data copyright 1999 The American Film Institute</font></p>} ;
 
     $$i = $output;
 }
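The long run of deletions in the AFI ShortRecordFilter repeats one pattern per element, so it could also be written as a loop over a list of tag names, which keeps the list easy to review and edit. The sketch below is a hypothetical refactoring with a shortened tag list; it keeps the original's non-greedy match and omits `/s`, so, like the original lines, it only removes elements that do not span a line break.

```perl
use strict;
use warnings;

# A subset of the elements the AFI short display drops entirely.
my @long_only = qw(ANI ART ATI CAS NOT SUM WRT);

# Remove every occurrence of each listed element, content included.
sub strip_elements {
    my ( $i, @tags ) = @_;    # $i is a reference to the record text
    for my $tag (@tags) {
        $$i =~ s,<\Q$tag\E>.*?</\Q$tag\E>,,g;
    }
    return;
}

my $record = '<K>Sample Film</K><CAS>Cast list</CAS><SUM>Plot</SUM>';
strip_elements( \$record, @long_only );
print "$record\n";    # <K>Sample Film</K>
```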

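The AFI LongRecordFilter relies on helpers such as `AFI_DoElement` that are not shown on this page. The following is a hypothetical reconstruction inferred only from how the routine calls it: (cgi, record reference, element name, label, tag to wrap the label in, trailing text). The real DLXS helper may behave differently; in particular, joining repeated elements with "; " is an assumption.

```perl
use strict;
use warnings;

# Hypothetical AFI_DoElement-style helper; arguments inferred from the calls above.
sub AFI_DoElement {
    my ( $cgi, $i, $element, $label, $label_tag, $suffix ) = @_;
    $suffix = '' unless defined $suffix;
    # Collect the content of every occurrence of the element.
    my @values = ( $$i =~ m,<\Q$element\E>(.*?)</\Q$element\E>,gs );
    return '' unless @values;    # element absent: emit nothing, not even the label
    my $shown_label = $label_tag ? "<$label_tag>$label</$label_tag>" : $label;
    return $shown_label . join( '; ', @values ) . $suffix;
}

my $record = '<SOU>Variety</SOU><RDT>1 Jan 1930</RDT>';
printf "[%s]\n", AFI_DoElement( undef, \$record, 'SOU', 'Source: ', 'strong' );
printf "[%s]\n", AFI_DoElement( undef, \$record, 'RDT', 'Release ', 'i', ' ' );
printf "[%s]\n", AFI_DoElement( undef, \$record, 'SER', 'Series: ', 'strong' );
```

Returning an empty string for a missing element lets the caller concatenate unconditionally, which matches the style of the calls in LongRecordFilter.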
