The Civil War Diaries: A Case Study

From DLXS Documentation

Revision as of 18:55, 7 August 2007 by Cboulay (Talk | contribs)

The Civil War Diaries Interface: A Case Study

Leaving aside the markup issues that are inherent in this project and generally outside the scope of the interface designer (but which nonetheless have a major impact on usability), the changes that WMU staff asked for provide a nice primer on the things that can generally be altered in the interface.

The collection was initially put online with a minimal interface and minimal options. There was no browsing (as we started with one diary, it seemed pointless). The basic search was available within “full text”, with searches limited by author, title, and citation. KWICs were enabled for results, and notes were shown inline.

Add browses, change notes to pop-ups: WMU immediately asked that a browse be provided and that notes not show inline. These are simple changes in the collmgr’s browseable/browsenav and displaynotesinline fields. browseable was set to yes and browsenav was set to 0 (as there will only ever be a handful of diaries, the alphabetic rulers seemed like overkill); displaynotesinline was set to no. The browse script (in /l1/bin/browse) was then run to populate the browse table.


Additional “search within” values: WMU next asked that we add subject as an option for “Search in” and “Limit to.” This requires that “subject” be added to the collmgr’s bibsearch and termsearch fields. However, for this to function properly and not simply appear in the pulldown menu, you also need a mapping in the mapfile for the collection to direct the search to the proper elements in the indexed text. civilwar1 initially used the philamer.map, but it was clear that this will be a collection with a fair amount of customization and will require a mapfile of its own (/l1/misc/t/text/maps/civilwar1.map). At this point (with only one diary in hand), the subject mapping could safely be defined as

 <mapping>
   <label>subject</label>
   <synthetic>SUBJECT</synthetic>
   <native>region TERM</native>
   <nativeregionname>TERM</nativeregionname>
 </mapping>

However, it seems that the markup in the collection is elaborate enough that TERM elements may eventually appear in places aside from the KEYWORDS element in the HEADER (the canonical location for subject terms), so it is more prudent to provide a more specific mapping.

 <mapping>
   <label>subject</label>
   <synthetic>SUBJECT</synthetic>
   <native>region subject</native>
   <nativeregionname>subject</nativeregionname>
 </mapping>

This required that a fabricated region “subject” be created in the extra.srch file:

(region TERM within region KEYWORDS); {exportfile "/l1/idx/c/civilwar1/subject.rgn"}; export; ~sync "subject";

and that the “make post” indexing step be rerun.

More explicit division headings for DIVs with no HEADs: WMU also asked if we could do something about the headings appearing in results lists and the TOC view. Generally, the headings are pulled from the “div1head” region et al (depending on how deeply subdivided a text is, you can have div2head, div3head, etc.), which is defined as the HEAD element in a given DIV1 or the tag itself if there is no HEAD. This diary has no HEADs, but the encoders have provided a TYPE of “entry” for each DIV1, along with the ISO date in the N attribute, so the tag looks like:

 <DIV1 NODE="USCW0001.0001.001:34" TYPE="entry" N="1862-09-14">

However, general DLXS settings show only the TYPE attribute in the absence of a HEAD, so each of these entries is labeled simply “entry”.


In order to locate where such styles are handled, grepping for TYPE within the XSL stylesheets in the /l1/web/t/text directory is almost always fruitful. The proper file in this case is scopedivs.xsl, as what needs to be changed is the label for the “scoped heads” as they are generally referred to in DLXS. A local version of the scopedivs.xsl was created in the web directory for civilwar1. It imports the main scopedivs.xsl and contains only the altered “Divhead” template. The top of the file is as follows:

 <?xml version="1.0" encoding="UTF-8" ?>
 <xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:func="http://exslt.org/functions"
   xmlns:dlxs="http://dlxs.org"
   extension-element-prefixes="func dlxs"
   exclude-result-prefixes="func dlxs">

 <xsl:import href="../../t/text/scopedivs.xsl"/>

 <xsl:strip-space elements="*"/>

 <xsl:template match="BuildDivHeadLinkLabel">

scopedivs.xsl is very complicated as a whole, as it involves printing the labels and building the links whether there are HEAD elements or not, whether we are dealing with serial articles or not, etc. However, the small portion that is relevant here is fairly straightforward, and again, searching for TYPE and realizing that you are concerned with the cases where DIVs have no HEADs will steer you to the proper portion of the template:

 <xsl:otherwise>
   <tr>
     <td>
       <xsl:choose>
         <xsl:when test="not(child::HEAD) and ../@TYPE!='entry'">
           <xsl:value-of select="../@TYPE"/>
         </xsl:when>
         <xsl:when test="not(child::HEAD) and ../@TYPE='entry'">
           <xsl:value-of select="../@TYPE"/>
           <xsl:text>: </xsl:text>
           <xsl:value-of select="../@N"/>
         </xsl:when>
         <xsl:when test="not(child::HEAD) and not(../@TYPE)">
           <xsl:text>Section</xsl:text>
         </xsl:when>
         <xsl:otherwise>
           <xsl:for-each select="HEAD">
             <xsl:apply-templates mode="procdivhead" select="."/>
             <xsl:if test="position()!=last()">
               <xsl:text> </xsl:text>
             </xsl:if>
           </xsl:for-each>
         </xsl:otherwise>

So, if there is no child HEAD and the TYPE is not “entry”, show the value of the TYPE attribute (example 1. above). If there is no child HEAD and the TYPE is “entry”, show the value of the TYPE attribute followed by a colon and space and then the value of the N attribute (example 2. above). The remainder of the template deals with cases where there is neither a HEAD nor a TYPE (print the word “Section”) and those cases where there are HEADs, in which case it will show each HEAD (some DIVs may have more than one) separated by non-breaking spaces.
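The labeling rules just described can be tried in isolation. The following is a minimal standalone sketch, not the DLXS scopedivs.xsl: it assumes the DIV1 element itself is the context node (so it tests @TYPE rather than ../@TYPE), writes plain text instead of table cells, and the template name div-label is made up for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical standalone stylesheet for experimenting with the label
     rules; it is not part of DLXS and does not import scopedivs.xsl. -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Print one label per DIV1, one per line. -->
  <xsl:template match="/">
    <xsl:for-each select="//DIV1">
      <xsl:call-template name="div-label"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

  <!-- Assumed context node: the DIV1 being labeled. -->
  <xsl:template name="div-label">
    <xsl:choose>
      <!-- One or more HEADs: show each, separated by a non-breaking space. -->
      <xsl:when test="HEAD">
        <xsl:for-each select="HEAD">
          <xsl:value-of select="normalize-space(.)"/>
          <xsl:if test="position() != last()">
            <xsl:text>&#160;</xsl:text>
          </xsl:if>
        </xsl:for-each>
      </xsl:when>
      <!-- No HEAD, TYPE is "entry": show TYPE, a colon and space, then N. -->
      <xsl:when test="@TYPE = 'entry'">
        <xsl:value-of select="@TYPE"/>
        <xsl:text>: </xsl:text>
        <xsl:value-of select="@N"/>
      </xsl:when>
      <!-- No HEAD, some other TYPE: show the TYPE alone. -->
      <xsl:when test="@TYPE">
        <xsl:value-of select="@TYPE"/>
      </xsl:when>
      <!-- Neither HEAD nor TYPE. -->
      <xsl:otherwise>Section</xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Run against a document containing the DIV1 shown earlier (TYPE="entry", N="1862-09-14", no HEAD), this sketch writes the label entry: 1862-09-14.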

Here is the results list after the scopedivs.xsl for civilwar1 is placed into its web directory:

Customized phrase-level markup rendering styles: The diary entries have a great deal of markup of individual words and phrases. Place names, personal names, dates, etc., are all wrapped in elements and have attributes expanding or clarifying them (creating a de facto authority file); additions and deletions (using the ADD and DEL elements of the TEI), and switches in handwriting (for example, <HI1 REND="underlined"> or <HI1 REND="superscript">) are all captured as well. Such things are present in the basic DLXS package, but because WMU’s encoding practices vary from our standard, more customizations needed to be made. The most straightforward is that the appropriate REND behaviors were not present; things marked as superscript or underlined appeared as plain text.

Rendering of elements is handled in the XSL as a conversion to HTML that is then styled by CSS. That is, the <HI1 REND="superscript">st</HI1> in the XML gets converted to <span class="rend-superscript">st</span> (essentially, it takes the content of the element, wraps it in a span, and gives it a class of “rend-” followed by the value of the REND attribute). Our CSS files didn’t have a class of “rend-superscript” (rend-sup and rend-super were there, though, since we tend to abbreviate such values), so a textclass-specific.css file was needed for the collection. Again, since it was clear there would be more custom styling to come, it seemed best to copy the whole “rend styles” section of the textclass.css:

 /* rend styles start */
 .rend-center { text-align: center; }
 .rend-italics, .rend-italic, .rend-i, .rend-ital { font-style: italic; }
 .rend-italicsunderlined {
        font-style: italic;
        text-decoration: underline;
 }
 .rend-u, .rend-und, .rend-underlined2x, .rend-underlined3x, .rend-underlined, .rend-underline, .rend-double-underline { text-decoration: underline; }
 .rend-indented { margin-left: 1.0em; }
 .rend-sc, .rend-smcap { font-variant: small-caps; }
 .rend-bold, .rend-b { font-weight: bold; }
 .rend-bolditalic {
        font-weight: bold;
        font-style: italic;
 }
 .rend-scital {
        font-variant: small-caps;
        font-style: italic;
 }
 .rend-strike { text-decoration: line-through; }
 .rend-sub { vertical-align: sub; }
 .rend-super, .rend-sup, .rend-superscript {
        vertical-align: super;
        /* avoid too much line offset: */
        font-size: 75%;
 }
 .rend-sup-und, .rend-supund {
        vertical-align: super;
        text-decoration: underline;
        /* avoid too much line offset: */
        font-size: 75%;
 }
 /* rend styles end */
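The conversion of a REND value into a class name, as described for HI1, can be sketched as a single template. This is an illustration only and not the template that ships with DLXS: it assumes the class is built with an attribute value template and that HI1 is the only element of interest.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch; the real DLXS stylesheets may build the class
     differently and handle more elements than HI1. -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- <HI1 REND="superscript">st</HI1> becomes
       <span class="rend-superscript">st</span> -->
  <xsl:template match="HI1[@REND]">
    <span class="rend-{@REND}">
      <xsl:apply-templates/>
    </span>
  </xsl:template>

</xsl:stylesheet>
```

Because the class name is derived mechanically from the attribute value, any REND value the encoders use needs a matching selector in the CSS, which is why the longer names had to be added to the style groups.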

All of the renderings needed were already present in the CSS; the more verbose class names merely needed to be added to the existing groups. Note superscripting of the dates on the first line:


Note the phrase “in the morning” in red above, which is encoded with an ADD element. This was not how the WMU staff envisioned additions appearing; they also wanted deletions to be shown with a strike-through, which had been considered but rejected as possibly too illegible by previous DLPS interface designers. Phrase-level markup is generally handled in text.components.xsl, and as with the changes to the div1heads, a custom version of the stylesheet was created for the collection and placed in /l1/web/c/civilwar1. Here is the beginning of the file:

 <?xml version="1.0" encoding="UTF-8" ?>
 <xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:func="http://exslt.org/functions"
   xmlns:dlxs="http://www.umdl.umich.edu/dlxs"
   extension-element-prefixes="func"
   exclude-result-prefixes="func dlxs">

 <xsl:import href="../../t/text/text.components.xsl"/>

The new definitions for the conversion of the XML to HTML were added to the collection-specific stylesheet:

<xsl:template match="DEL"> <xsl:apply-templates/> </xsl:template>

<xsl:template match="ADD"> <xsl:apply-templates/> </xsl:template>
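For the strike-through and superscript effects to appear, the content of these elements has to end up inside an HTML element that the CSS can target. A minimal sketch of what such templates could look like, assuming span wrappers and class names borrowed from the rend styles section (rend-strike and rend-superscript are this sketch's choices, not confirmed names from the collection's stylesheet):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch; the span wrappers and class names are assumptions. -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Deleted text: wrapped so the CSS can draw a line through it. -->
  <xsl:template match="DEL">
    <span class="rend-strike">
      <xsl:apply-templates/>
    </span>
  </xsl:template>

  <!-- Added text: wrapped so the CSS can raise it above the line. -->
  <xsl:template match="ADD">
    <span class="rend-superscript">
      <xsl:apply-templates/>
    </span>
  </xsl:template>

</xsl:stylesheet>
```

Reusing existing rend classes would keep all of the visual decisions in the collection's CSS file, so a later change of mind about how additions should look would not require touching the XSL again.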

Now added text is shown as superscript:


Removal of redundant page labels: Because WMU staff had provided page labels in their PB attribute values, links to page images had redundant descriptive language:


Labels like this are contained in the langmap file found in /l1/web/t/text. The label is defined there as <Item key="headerutils.str.page">Page </Item>. To override the labels available for all of Text Class in general, a langmapextra.en.xml file was placed in /l1/web/c/civilwar1 containing the following empty descriptor:

 <ColLookupTables>
   <Lookup id="headerutils">
     <Item key="headerutils.str.page"> </Item>
   </Lookup>
 </ColLookupTables>

This provides a slightly cleaner link:


Pop-up the regularized forms of names:

As mentioned before, WMU staffers have encoded many names – of battles, people, and places – to specifically identify them by providing regularized versions of the names, like

<NAME TYPE="place" REG="Jeffersonville (Ind.)"> Jeffersonville</NAME> and <NAME TYPE="person" REG="Davis, Jefferson Columbus, 1828-1879">Genl.<LB/>Jeff C. Davis</NAME>.

They wanted the regularized forms to “pop up” when the user clicked on the names. Because of our institutional commitment to minimize the use of javascript, we decided to try using CSS “tooltips” to provide this functionality. In the interface, the NAMEs are underscored with a dashed line and a text box containing the normalized value pops up when the user mouses over the NAME; the name itself is highlighted. In the example below, the third NAME is being moused over; you can see that there are two other normalized names in the entry. (The word “wagon” is highlighted in yellow because it was the search term that led to this entry.)


Implementing the tooltips is a two-step process. First, a template for NAME needed to be added to the text.components.xsl in /l1/web/c/civilwar1:

 <xsl:template match="DIV1//NAME">
   <a class="info" href="#"><xsl:value-of select="."/><span><xsl:value-of select="@REG"/></span></a>
 </xsl:template>

This wraps the content of the NAME element in an <a> tag and places the content of the REG attribute in a <span> tag. The examples from above become

<a class="info" href="#"> Jeffersonville<span>Jeffersonville (Ind.)</span></a> and <a class="info" href="#">Genl.<LB/>Jeff C. Davis<span>Davis, Jefferson Columbus, 1828-1879</span></a>.

Then, the following styles need to be added to the textclass-specific.css file in /l1/web/c/civilwar1:

 a.info {
    position: relative; /* this is the key */
    z-index: 24;
    color: #000;
    border-bottom: 1px dashed #000;
    text-decoration: none;
 }

 a.info:hover { z-index: 25; background-color: #dad1b2; }

 a.info span { display: none; }

 a.info:hover span { /* the span will display just on :hover state */
    display: block;
    position: absolute;
    top: 2em; left: 2em; width: 15em;
    border: 1px solid #dad1b2;
    background-color: #f5f5dc; color: #000;
    text-align: center;
    text-decoration: none;
 }

Customized browses in addition to author/title: In addition to regularized forms of names, the WMU staff had identified topics within the diary entries. This had been done using SEGs for topics — <SEG TYPE="transportation">crossed on boats</SEG> — in addition to the NAMEs mentioned above — <NAME TYPE="battles" ID="battles4" REG="Perryville, Battle of, Perryville, Ky., 1862">battleground</NAME>.

They wanted to have indexes of these values that users could browse, as well as the usual browse list of diary authors and titles. This meant that customized browse pages needed to be built. This hybrid approach makes use of the automatic browse building, plus additional hand-coded HTML pages, with links to the other browse pages either coded in the HTML or supplied by a collection-specific browse.xsl in /l1/web/c/civilwar1.

So, in addition to the initial customization steps of setting browseable to yes and browsenav to 0 in collmgr, I created three HTML pages: browse.html, browsetopic.html, and browsename.html. browse.html is a file that links to all the browse options, including the cgi-driven option:

<A HREF="/cgi/t/text/text-idx?page=browse;c=civilwar1">Browse the Civil War Diaries by author/title</A>

browsename.html and browsetopic.html are basically canned searches that will take users to the diary entries containing those names or topics. They were created by using xpatu to pull the values out of the collection and then wrapping them in the necessary HTML and cgi values. All the data is as it was provided in the collection, with no change of capitalization, pluralization, etc.

 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
 <html xmlns:dlxs="http://dlxs.org">
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 <title>The Civil War Diaries</title>
 <link rel="stylesheet" type="text/css" href="/t/text/textclass.css">
 <link rel="stylesheet" type="text/css" href="/c/civilwar1/textclass-specific.css">
 </head>
 <body class="defaultbody" bgcolor="#ffffff">

Browse by Topic | <a href="/c/civilwar1/browsename.html">Browse by Name</a> |  <a href="/cgi/t/text/text-idx?page=browse;c=civilwar1">Browse by Author/Title</a>
Browse by:
Topic
There are 7 topics in this collection
<thead class="browselist"></thead>
Topic
<a href="/cgi/t/text/text-idx?c=civilwar1;cc=civilwar1;type=simple;rgn=full%20text;q1=TYPE%3Dafrican-americans">african-americans</a>
<a href="/cgi/t/text/text-idx?c=civilwar1;cc=civilwar1;type=simple;rgn=full%20text;q1=TYPE%3Dclothing">clothing</a>
<a href="/cgi/t/text/text-idx?c=civilwar1;cc=civilwar1;type=simple;rgn=full%20text;q1=TYPE%3Ddeath-and-casualties">death-and-casualties</a>
<a href="/cgi/t/text/text-idx?c=civilwar1;cc=civilwar1;type=simple;rgn=full%20text;q1=TYPE%3Dfood">food</a>
<a href="/cgi/t/text/text-idx?c=civilwar1;cc=civilwar1;type=simple;rgn=full%20text;q1=TYPE%3Dmusic">music</a>
<a href="/cgi/t/text/text-idx?c=civilwar1;cc=civilwar1;type=simple;rgn=full%20text;q1=TYPE%3Dreligion">religion</a>
<a href="/cgi/t/text/text-idx?c=civilwar1;cc=civilwar1;type=simple;rgn=full%20text;q1=TYPE%3Dtransportation">transportation</a>



</body> </html>


The links to the other browse options are part of the HTML code (the line beginning “Browse by Topic |” in the HTML provided above).
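Wrapping extracted values in canned-search links is mechanical enough to script. The following sketch assumes the values have already been pulled out of the collection and saved as a small XML list; the topics/topic input format and the use of a plain unordered list are inventions for this illustration, not part of the DLXS workflow. Only the URL pattern is taken from the page shown above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical helper stylesheet. Assumed input:
     <topics><topic>food</topic><topic>music</topic></topics> -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Everything in the canned search except the value itself. -->
  <xsl:variable name="base"
    select="'/cgi/t/text/text-idx?c=civilwar1;cc=civilwar1;type=simple;rgn=full%20text;q1=TYPE%3D'"/>

  <xsl:template match="/topics">
    <ul>
      <xsl:for-each select="topic">
        <!-- Alphabetical order, matching the hand-built page. -->
        <xsl:sort select="."/>
        <li>
          <a href="{concat($base, .)}">
            <xsl:value-of select="."/>
          </a>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>

</xsl:stylesheet>
```

This works as written only for values that are safe to place in a URL unchanged, such as the hyphenated topic names. Values containing spaces or punctuation, as many of the regularized names do, would need to be percent-encoded before being appended.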


In the browse by author/title page generated by running the browsebuilder, links to the hand-coded browse options are provided by placing a collection-specific browse.xsl into /l1/web/c/civilwar1:

 <xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:exsl="http://exslt.org/common">

 <xsl:import href="../../t/text/browse.xsl"/>

 <xsl:template name="collSpecificText">
   <xsl:text> | </xsl:text>
   <xsl:element name="a">
     <xsl:attribute name="href"><xsl:text>/c/civilwar1/browsename.html</xsl:text></xsl:attribute>
     Browse by Name</xsl:element>
   <xsl:text> | </xsl:text>
   <xsl:element name="a">
     <xsl:attribute name="href"><xsl:text>/c/civilwar1/browsetopic.html</xsl:text></xsl:attribute>
     Browse by Topic </xsl:element>
 </xsl:template>

 </xsl:stylesheet>


Banners, tab colors, and customized TOC view: WMU staffers wanted to make aesthetic changes to the site, with a graphical banner instead of the plain text “Civil War Diaries” and matching the tabs to the color scheme of the banner. Additionally, because the diaries are previously unpublished, many of the labels that come “out of the box” in DLXS were not quite appropriate.

Changing the banner was very simple – the banner (named banner.jpg) they provided was placed into the directory /l1/web/c/civilwar1/graphics and the collmgr primarytitle was changed to read graphic:banner.jpg .

The colors of the navigation tabs were changed in the textclass-specific.css file.

 /* STYLES FOR NAVIGATION AND MENUS */
 td.mainnavcell {
    background-color: #A2A0AB;
    padding-left: 20px;
    padding-right: 20px;
    border-bottom: 1px solid #666666;
 }

 .navcolor { background-color: #8A7B90; }

Here is the look with the new banner and color scheme:


WMU wanted to suppress some existing metadata (Print source) and show additional pieces of the metadata in the header, so a collection-specific version of tocheader.xsl was placed in the /l1/web/c/civilwar1 directory. In the place of Print source, which shows metadata from the SOURCEDESC, they wanted to display their notes. A template was added for notes:

   <xsl:template match="NOTE">
       <xsl:apply-templates/>
   </xsl:template>

which was called instead of the SOURCEDESC, after AVAILABILITY:

   <xsl:variable name="availability">
     <xsl:copy-of select="HEADER/FILEDESC/PUBLICATIONSTMT/AVAILABILITY/P"/>
   </xsl:variable>
   <xsl:variable name="notesstmt">
     <xsl:copy-of select="HEADER/FILEDESC/NOTESSTMT/NOTE"/>
   </xsl:variable>

Labels for various metadata sections were also changed, as they wanted Publisher instead of Publication Info and Rights instead of Availability. This was done in the langmapextra.en.xml, which had been created earlier to remove the redundant page label.

 <ColLookupTables>
   <Lookup id="headerutils">
     <Item key="headerutils.str.page"> </Item>
     <Item key="headerutils.str.publicationinfo">Publisher</Item>
     <Item key="headerutils.str.22">Rights</Item>
     <Item key="civil.str.notes">Notes</Item>
   </Lookup>
 </ColLookupTables>


The Collected Works of Abraham Lincoln Interface: A Case Study

We have hosted the online version of The Collected Works of Abraham Lincoln for a number of years. Recently, the staff at the Papers of Abraham Lincoln decided to add additional Lincoln writings – some found in the Making of America, others that they are having digitized – to their online collection. However, they wanted the Collected Works to remain searchable alone, as that has been widely used and contains primarily Lincoln’s own writing, whereas the other texts contain some of Lincoln’s words interspersed among the authors’ own prose discussing Lincoln, history, etc.

Here’s what the search interface looked like initially:


Repurposing the genre restriction: The most expedient way to add a search restriction to a collection is to repurpose the existing genre and gender restrictions. Since there are only two choices here (in the Collected Works or not), the gender restriction probably could have been repurposed here. However, since this change is in response to outside input, there could easily be more changes ahead as the Papers of Abraham Lincoln staffers come up with additional possibilities for text groupings to be offered to users; using the genre restriction seemed a better long-term choice.

In order to use this restriction, DLXS requires a map of the region to look in for the identifying values, and then individual maps for each restriction to list out the search strings to be found in that region and the label you would like to appear in the interface. (The choice of “All” is there by default.) It was necessary to examine the collection carefully to find what could distinguish the Collected Works from all the other texts. It needed to be something that was identical in all the other texts (so individual IDs or titles were out) and yet different in the Collected Works. Because of changing encoding practices over time, the FILEDESC PUBLISHER values were different in the two batches of texts:

In the Collected Works, the publisher of the electronic file was listed as University of Michigan Digital Library Production Services.

In the other texts, the publisher of the electronic file was listed as University of Michigan Library.

Had such a distinction not been readily available, something would have had to be added to the HEADER to facilitate these groupings. In fact, if further, more fine-grained restrictions turn out to be desired, such metadata changes could still be a possibility. As it is, the following mapping was added to the lincoln.map file to describe the region in which the restriction search would take place:

 <mapping>
   <label>genre</label>
   <synthetic>GENRE</synthetic>
   <native>region PUBLISHER not within region SOURCEDESC</native>
   <nativeregionname>PUBLISHER</nativeregionname>
 </mapping>

These mappings order the labels in the pulldown menu and indicate the string that should be searched within the region mapped above:

 <mapping><genreorder>1</genreorder><genrelabel>Collected Works</genrelabel><genrenative>University of Michigan Digital Library Production Services</genrenative></mapping>
 <mapping><genreorder>2</genreorder><genrelabel>Other texts</genrelabel><genrenative>University of Michigan Library</genrenative></mapping>

Next, collmgr was changed to indicate that there needed to be a genre restriction; this is done by adding the word “mapped” to the singlegenre field and checking in the collection.

Here is the result:


These are not, however, genres per se. I wanted to change the label to something more relevant to the choices the users were being offered (and yes, the collection name will be changing as well, but they haven’t provided that yet). Labels like this are contained in the langmap file found in /l1/web/t/text. To override the labels available for all of Text Class in general, a langmapextra.en.xml file was placed in /l1/web/l/lincoln containing the following:

<ColLookupTables>

  <Lookup id="searchforms">
     <Item key="searchforms.str.19">Restrict to volumes:</Item>
 </Lookup>

</ColLookupTables>

While that is not an ideal label either, it does give them something neutral to respond to and possibly change. Here’s what we have now:

Match search form colors to new index page color scheme: Lincoln staffers provided a new design for the index page, to match the redesign of their own website. In order to tie the search pages in with the new color scheme, the gray bar holding the main navigation tabs was changed to bright blue. This was done in the textclass-specific.css file in /l1/web/l/lincoln/. The background-color values for mainnavcell and navcolor were both changed to #39478e.

 /* STYLES FOR NAVIGATION AND MENUS */
 td.mainnavcell {
    background-color: #39478e;
    padding-left: 20px;
    padding-right: 20px;
    border-bottom: 1px solid #666666;
 }

 .navcolor { background-color: #39478e; }

This makes the bar solid; these could have been set to two different colors to have individual tabs against a contrasting background.
