Setting up Dynamic Browsing

From DLXS Documentation

Current revision (15:40, 24 July 2012)

Overview

Dynamic browsing is available through the DLXS middleware by combining database tables with collmgr field configuration. When the proper CGI-based URL is received, the middleware reads certain metadata from the database tables, packages some of it in XML, uses some of it as is (in its stored XML format), and lets XSLT format the results. At run time no XPAT queries are needed; only MySQL queries are run against the item browse tables in the dlxs database.

After the tables are prepared, that is, populated with item metadata through the $DLXSROOT/bin/browse/updatebrowsedb.pl script (aka "database populator"), configuration for the behavior of dynamic browsing is accomplished by modifying certain collmgr fields.

Provisions will soon be made for creating static HTML browse pages with the database populator script.

Release_15: Dynamic browsing is no longer limited to item metadata starting with 'a'..'z'.

  • Punctuation is removed (e.g. "These things" is filed under t).
  • Values are transliterated when indexed to match MySQL query semantics (e.g. zó is the same as zo).
  • Characters that do not transliterate to 'a'..'z' are valid browse items, e.g. æ, Greek notation, and numbers.

More items should be discoverable through browse.
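
The Release_15 rules above can be sketched roughly as follows. This is a minimal illustration in Python, not the DLXS implementation (which is Perl): the function name filing_form is hypothetical, and Unicode NFKD decomposition is only an assumed approximation of the MySQL query semantics mentioned above.

```python
import unicodedata


def filing_form(value: str) -> str:
    """Hypothetical normalisation: drop punctuation, fold accents, lowercase."""
    # Decompose accented characters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", value)
    kept = []
    for ch in decomposed:
        category = unicodedata.category(ch)
        if category.startswith("P"):  # any punctuation class
            continue
        if category == "Mn":  # combining accent left over from decomposition
            continue
        kept.append(ch)
    return "".join(kept).strip().lower()


# The first character of the filing form decides where an item is browsed.
examples = ['"These things"', "zó", "æther", "1984"]
first_chars = [filing_form(v)[0] for v in examples]
```

With these inputs, the quoted title files under t, zó and zo share the same filing form, and æ and 1 remain valid first characters.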

Item browse database tables

There are three tables in the dlxs database that are specifically used for dynamic browsing in the middleware:

  • ItemColl
  • ItemBrowse
  • ItemBrowseCounts

The ItemColl table holds, for each item/document and collection combination, one row containing the following columns:

  • the idno
  • the collection id
  • the modification date of the row's information
  • XML metadata about the item: in the case of TextClass, this is simply the DLXSTEXTCLASS/HEADER element that is retrieved from an XPAT query in the "database populator" script. In the case of ImageClass, the Perl subclass used by the database populator grabs information from the MySQL or XPAT data and wraps it in specific XML before filling in this field.

The ItemBrowse table holds, for each item/document's browseable field (e.g., author, title, etc.):

  • the idno
  • the collection id
  • the field name
  • the value of the field
  • rank (not currently used)
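
To make the run-time lookup concrete, here is a small sketch that uses an in-memory SQLite database as a stand-in for the MySQL dlxs database. The column names (idno, collid, field, value, item_rank), the collection id "demo", and the sample rows are all assumptions made for illustration; the real table definitions may differ.

```python
import sqlite3

# SQLite stands in for MySQL here; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ItemBrowse "
    "(idno TEXT, collid TEXT, field TEXT, value TEXT, item_rank INTEGER)"
)
rows = [
    ("abc0001", "demo", "title", "bleak house", 0),
    ("abc0002", "demo", "title", "middlemarch", 0),
    ("abc0003", "demo", "author", "bronte, charlotte", 0),
    ("abc0004", "demo", "title", "barnaby rudge", 0),
]
conn.executemany("INSERT INTO ItemBrowse VALUES (?, ?, ?, ?, ?)", rows)


def browse_page(conn, collid, field, prefix):
    """Return (idno, value) pairs for one field whose value starts with prefix."""
    cur = conn.execute(
        "SELECT idno, value FROM ItemBrowse "
        "WHERE collid = ? AND field = ? AND value LIKE ? "
        "ORDER BY value",
        (collid, field, prefix + "%"),
    )
    return cur.fetchall()


titles_b = browse_page(conn, "demo", "title", "b")
```

The point of the sketch is only that a browse page can be produced from a single indexed table query, with no XPAT search involved.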

The ItemBrowseCounts table holds, for each collection, a list of rows containing:

  • the collection id
  • the field name
  • the first character or the first two characters of the sortable field's value (sortable title, author's name)
  • the count of items that begin with that first character or those first two characters
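
The kind of tally held in ItemBrowseCounts can be sketched as below. This is an assumed illustration in Python: prefix_counts is a hypothetical helper, and the values are taken to be already in their sortable form.

```python
from collections import Counter


def prefix_counts(values):
    """Count items by first character and by first two characters."""
    counts = Counter()
    for value in values:
        value = value.strip().lower()
        if not value:
            continue  # nothing to file an empty value under
        counts[value[:1]] += 1
        if len(value) >= 2:
            counts[value[:2]] += 1
    return counts


counts = prefix_counts(["Burns", "Butler", "Blake", "Milton"])
```

Here counts["b"] is 3 and counts["bu"] is 2, which is the information a navigation bar needs to decide which characters to offer.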

Collmgr fields for configuration

The main fields in the collmgr that need to be set properly are:

  • locale
  • browsenav
  • browsefields
  • browseupdatemodule

See Configure the collmgr fields below for more information.


Preparing collections for browsing

Configure the collmgr fields

Start collmgr and change the following fields:

  • devhost: if you are running the database populator in a development environment (that is, one where the DLPS_DEV environment variable is set), you can have the middleware use XPAT-indexed data that is on a machine different from the usual host. This can be useful for testing purposes.
  • locale: this should be changed to a UTF-8 locale, e.g., en_US.UTF-8.

  • browseable: a "yes" (case-insensitive) value in this field enables the browse tab in the user interface. If the collection-specific web directory for this collection contains a file named browse.html, that page will be served, but only if browsefields is empty. Fallback is applied to select the correct browse.html file for collection-specific customization. This supports static browse pages. If browse.html is not present, a dynamic browse page will be served based on data from the browse database. When a dynamic browse page is served, the browsenav field value is consulted and must be defined.

  • browsenav: enter 0, 1, or 2. Enter 0 if you want no paging, that is, all items in the collection appear on one HTML page for the user to browse. Enter 1 if you want "one level of browsing", that is, a separate page is created for each first character of the value in question (e.g., title or author) and a navigation bar is built that lets the user jump to each page, for example to the page listing items whose value begins with "M". Enter 2 to create a "two-level browse" with two navigation bars. The first bar lets the user jump to items whose values begin with a particular first character (e.g., records that begin with "B"). The second bar lets the user jump to items whose values begin with a particular two-character combination (e.g., records that begin with "Bu"). This decision is left to the collection coordinator. We have found that the appropriate level depends on how many total items there are in the collection and therefore on what is a reasonable number of browseable items for a single HTML page.
  • browsefields: list the browseable fields for the collection. For example, some collections may have only title browsing, others may need both title and author, etc. For Text class and Findaid class, the first field becomes the default browse page. Leave this field empty to enable static browsing.

  • browseupdatemodule: specifies the name of the browse update Perl module that will be used by the updatebrowsedb.pl script to populate the database. This value is analogous to the appmodule and subclassmodule fields. (This field exists as of Release 12a; it supersedes a Perl configuration hash used in Release 12.)

    The module files are located in $DLXSROOT/bin/browse. If a dynamic browse page is to be served, this field must have a value. Specialized behavior can be obtained by subclassing the browse update modules. The currently available browse update module values are as follows:


      • ImageClass: BrowseUpdate/ImageMysqlBU
      • FindaidClass: BrowseUpdate/FindaidBU
      • TextClass, depending on encodingtype:
          • encodingtype = monograph: BrowseUpdate/MonographBU
          • encodingtype = serialissue: BrowseUpdate/SerialIssueBU (note that newspapers are serialissue encodingtype)
          • encodingtype = serialarticle: BrowseUpdate/SerialArticleBU
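
The three browsenav settings described in the list above can be sketched as follows. This is an illustrative Python model of the paging idea only: nav_bars is a hypothetical function, not part of the middleware, and it assumes the values are already in sortable form.

```python
def nav_bars(values, browsenav):
    """Return (first_bar, second_bars) for a browsenav level of 0, 1, or 2."""
    if browsenav not in (0, 1, 2):
        raise ValueError("browsenav must be 0, 1, or 2")
    cleaned = [v.strip().lower() for v in values if v.strip()]
    if browsenav == 0:
        return [], {}  # no paging: everything on one page, no bars
    # Level 1: one navigation entry per distinct first character.
    first_bar = sorted({v[0] for v in cleaned})
    if browsenav == 1:
        return first_bar, {}
    # Level 2: under each first character, one entry per two-character prefix.
    second = {}
    for v in cleaned:
        second.setdefault(v[0], set()).add(v[:2])
    return first_bar, {key: sorted(prefixes) for key, prefixes in second.items()}


bars = nav_bars(["Burns", "Butler", "Blake", "Milton"], 2)
```

For these four names the first bar offers b and m, and the second bar under b offers bl and bu.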


Populating the item browse tables

To populate the item browse tables initially, or to update them, use the updatebrowsedb.pl script located in $DLXSROOT/bin/browse. Running this program will populate or update the necessary rows in each of the three item browse tables. These tables are queried when the user requests browsing from the middleware.

However, please note: unless you are making changes to updatebrowsedb.pl or need to debug it, you should use the "wrapper" shell script provided in the same subdirectory. This wrapper is called ub and was written to ensure that updatebrowsedb.pl:

  • is run from the "release" directory and not from a particular developer's directory
  • runs with certain environment variables properly set
  • assumes the use of the "production" row, if no row is specified, when setting the host from which data will be read

$DLXSROOT/bin/browse/ub -C class -c collection [ -r row [ -h host ] ] [ -f ] [ -p ]

  • -f : is optional but if supplied, the wrapper will run the updatebrowsedb.pl script without asking for confirmation
  • -p : is used to "purge" all records from the browse tables for a particular collection without re-populating (updating) the collection's browse information
  • -r : Optional. Specifies which row in collmgr should be consulted for this collection. Without it, the "production" row will be assumed and used.
  • -h : if row is supplied, host is optional. The script will force updatebrowsedb.pl to use the host given. If no row is supplied, host cannot be supplied.
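
The option rules above (in particular, that -h is only allowed together with -r) can be captured in a small helper that assembles the ub command line. This is a sketch under stated assumptions: ub_command is a hypothetical Python function, the DLXSROOT path, collection id, and host name in the example are made up, and the function only builds the argument list without running anything.

```python
def ub_command(dlxsroot, cls, collection, row=None, host=None,
               force=False, purge=False):
    """Build the argument list for the ub wrapper from the documented options."""
    if host is not None and row is None:
        # Documented rule: if no row is supplied, host cannot be supplied.
        raise ValueError("-h can only be given together with -r")
    cmd = [f"{dlxsroot}/bin/browse/ub", "-C", cls, "-c", collection]
    if row is not None:
        cmd += ["-r", row]
        if host is not None:
            cmd += ["-h", host]
    if force:
        cmd.append("-f")  # skip the confirmation prompt
    if purge:
        cmd.append("-p")  # purge the collection's rows without re-populating
    return cmd


# Hypothetical values: "/l1" as DLXSROOT, "samplecoll" as collection id.
default_cmd = ub_command("/l1", "text", "samplecoll")
forced_cmd = ub_command("/l1", "text", "samplecoll",
                        row="production", host="testhost", force=True)
```

The resulting list could be handed to a process launcher such as subprocess.run; leaving row unset relies on the wrapper's documented default of the "production" row.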

If, for any reason, you must override the assumptions made by the "wrapper" script, you can always run the updatebrowsedb.pl directly by entering:

$DLXSROOT/bin/browse/updatebrowsedb.pl class=AAA c=BBB host=CCC row=DDD

where AAA is either "text" or "image"; BBB is the collection id of the collection you want to create browsing for; CCC is the name of the host on which the XPAT index for the collection resides (this is not relevant to ImageClass, which uses MySQL for all queries); and DDD is the key of the row in the database you wish to use (production, dlxsadm, or an individual developer's id). For example, you may want to point the script at new or test data on a machine that is different from your production machine. You could accomplish this by changing the host or devhost field for the collection in the collmgr. NOTE: If DLPS_DEV is set when you invoke updatebrowsedb.pl (without the wrapper ub), the devhost field will be used; otherwise, the host field will be used.
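
As a rough sketch of the direct invocation and the host/devhost rule in the note above: both function names below are hypothetical, the example values are made up, and treating "DLPS_DEV is set" as "the variable is present in the environment" is an assumption about how the script tests it.

```python
def host_field(environ):
    """Pick which collmgr field supplies the host, per the DLPS_DEV rule."""
    # Assumption: "set" means the variable is present, whatever its value.
    return "devhost" if "DLPS_DEV" in environ else "host"


def updatebrowsedb_command(dlxsroot, cls, collection, host, row):
    """Build the key=value argument list for a direct updatebrowsedb.pl run."""
    return [
        f"{dlxsroot}/bin/browse/updatebrowsedb.pl",
        f"class={cls}",
        f"c={collection}",
        f"host={host}",
        f"row={row}",
    ]


# Hypothetical values for illustration only.
cmd = updatebrowsedb_command("/l1", "text", "samplecoll", "testhost", "dlxsadm")
dev_field = host_field({"DLPS_DEV": "someone"})
prod_field = host_field({})
```

In practice the environment mapping passed to host_field would be os.environ.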

ItemColl XMLMETA and ImageClass

ImageClass will create XMLMETA from the collection's thumbnailresflds value.
