Release Notes for DLXS Release 13
Revision as of 14:38, 7 August 2007
General Information
TextClass, FindaidClass and BibClass are substantially identical to release 12 except for bug fixes as noted below. ImageClass has undergone extensive changes focused primarily on improving performance.
Release 13 comprises:
- BibClass version 3.7.0
- Bibperm (Nameresolver) version 4.7.0
- broker20 version 1.4.0
- Collmgr version 3.2.2
- FindaidClass version 6.4.0
- ImageClass version 4.7.0
- Lib version 4.10.0
- TextClass version 4.6.0
- XClass version 2.2.5
- METS Pageturner version 1.12.0
- XPAT version 5.3.2
- SRU version 1.1.0
- dlxsd version 1.0.1
- Other items released:
  - mrsid_retrieve version 1.3.1
  - tif2web version 1.0.4
  - kakadu version 4.0.2
  - cjpeg version 6b
  - dlps-auth version 1.5
  - xpatutf8check version 1.0
  - utf8conditioner version 1.0.1
- Patches
Known Problems
- None
Database Installation Notes
MySQL is now the only supported database type. CSV file-based database support has been removed. In order to run DLXS you will need to have a MySQL server installed. Sample data is delivered in the form of a MySQL dump file which can be directly imported into a MySQL database. The database upgrade script (upgrade_5_6) operates only on a MySQL database. If you have a version 5 CSV database you should run dbmove to move your CSV data into a MySQL database prior to running the upgrade. These issues are documented in detail in the installation instructions and the upgrade instructions.
New and Changed Functionality
XPAT
- No changes.
Lib
- If a user fails to supply an email address, this is now treated as a user error rather than a system error. [BookBag.pm]
- Added "value" to the list of CGI parameters excluded from cleaning. "value" needs to be excluded because it is used in a MySQL query for browse that does a partial phrase match (using "like"). If characters such as parentheses are stripped out, the query fails. [CioFactory.pm]
- A modest performance improvement was achieved through a simple code change that drastically reduced the number of calls to the SetUTF8Flag routine when reading the Collection Database. [CollsInfo.pm]
- Added slices and tag lists to browsing. Added support for field-specific browse level configuration. Browse navigation and list building were changed to optimize MySQL querying and to fix anomalies in listing certain letter pairs. [DLXSApp.pm]
- Several minor additions and enhancements were made, most notably automatic reconnection to the database, which is especially helpful when loading large datasets. [DbUtils.pm]
- Added silent assertion, which sends an assertion email to developers but does not disrupt the CGI run for the user. [DlpsUtils.pm]
- Added tpl parameter for specifying an arbitrary XML template file without needing to add it to the Perl hash and without affecting program flow (using the page param instead of tpl can affect program flow in an undesirable way). [DlpsUtils.pm]
- Code changes were made to support the addition of a collid column to the ItemBrowse table. [Browse Related Scripts]
- Fixed a bug and improved the general situation regarding removal of articles and punctuation from the beginning of browse strings. The bug had to do with the handling of multiple field values. [Browse Related Scripts]
- Implemented "purge" to be able to remove a collection's rows from the Browse tables without repopulating them. [Browse Related Scripts]
- OAITransform has been enhanced to handle character encoding problems that may exist in incoming data.
- The newly added DLXS statistics system consists of two parts: (1) a tool to run on each web server to parse web log files, calculate hits, and insert those hits into the database, and (2) a web interface for retrieving reports such as HTML or MS Excel files.
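The CioFactory.pm change above, which excludes "value" from parameter cleaning, can be illustrated with a minimal sketch. It does not reproduce the DLXS code: the routine name clean_params, the exclusion hash, and the choice to strip only parentheses are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical exclusion list. Only "value" is named in these notes.
my %EXCLUDED_FROM_CLEANING = map { $_ => 1 } qw(value);

# Hypothetical cleaner: deletes parentheses from every parameter
# except those on the exclusion list.
sub clean_params {
    my ($params) = @_;
    my %cleaned;
    for my $name ( keys %{$params} ) {
        my $val = $params->{$name};
        $val =~ tr/()//d unless $EXCLUDED_FROM_CLEANING{$name};
        $cleaned{$name} = $val;
    }
    return \%cleaned;
}

my $cleaned = clean_params(
    { q1 => 'rose (flower)', value => 'Smith, J. (John)' }
);
print "$_ = $cleaned->{$_}\n" for sort keys %{$cleaned};
```

In this sketch the "value" parameter reaches the browse query with its parentheses intact, so a "like" match against a stored phrase containing parentheses can still succeed.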
Text Class
XML template and XSL stylesheet changes
- browse.xml - Added <BrowseLevels><?BROWSE_LEVELS_XML?></BrowseLevels>
- browse.xsl - Changes for subject browse and special 'browsefields' syntax to support browse levels per field, e.g. subject=0, author=1
- navheader.xsl - Avoid adding empty "tips=" parameter to url for navbar links to fix problem where additional url params tacked onto end create invalid syntax e.g. ...tips=target=_top
- results.xsl - Stub template for PDF link in reslist, if desired. Handle common XML tags being passed through in KWICS. (Formerly handled in TextClass::CleanResidualTags.)
- resultsheader.xsl - Check for BIBLSCOPE[@TYPE='pageno'] in definePubInfoForSerialIssue. Don't display BIBLSCOPE TYPE="datesort".
- scopedivs.xsl - Support DATE in HEADs with new DATE template. In <template match="Divhead">: removed extraneous table row. Cleaned up redundant code that was also in <template name="BuildDivHeadLinkLabel">. Use BIBLSCOPE[(@TYPE='pg' or @TYPE='pageno')] instead of just pg. Separate multiple AUTHORINDs with semicolons.
- search.xsl - Move tips to below the search form so the iframe is not squeezed into a tiny box
- text.components.xsl - Filter for LIST in filterNumberedNoteWithParas. Pass through value of COLSPAN in table cells.
- tocheader.xsl - Correct formatting for multiple authors. Fixed formatting for multiple authors in printsourcestmt in OutputHeader. Made inclusion of BookmarkableUrl conditional on string not being empty. Don't display label for BIBLSCOPE TYPE="pg" if the element is empty. Multiple authors, editors formatting handled in textheader.xsl. Copied AUTHOR and EDITOR templates from textheader.xsl.
CGI/Middleware
- TextApp.pm - Backward compatibility for pagenname=browseentries/Na.xml. Removed ValidityChecks setting of default value for browse. The default is now set when building the browse page, when more is known about what is available, instead of just selecting 'a'.
- TextClass.pm - Allow parameter $idroot to sub FigureIDResolver to indicate extra dir level in image filepath, e.g. web/c/coll/images/idroot/idno.jpg. Extensive re-write of entity resolution in <FIGURE ENTITY=...> to support more types of resolution. Better error message when a query failure occurs fetching the text of a pageviewer page. TextClass::Filter_REFsForText now handles more kinds of REF targets. BEG Change regexp in FilterPBs_XML to accept <PB .../> or <PB ...></PB> since both are valid XML. Filter_REFsForText: check for TYPE="ptr" instead of assuming every REF with a TARGET attr is of the same type. Optimize Filter_REFsForText for case of no <REF ..> elements. Fix bug: Empty defaultpageview field is ok if pageimages field is empty or 0 in collmgr. Better error message in GetItemEncodingLevel()
Image Class
CGI/Middleware
Known Problems
- Metadata fields containing XML must be mapped to ic_xml in the field_admin_maps Collection Manager field. In addition, there is a bug in the released version of the data2db.pm Perl module that causes fields containing XML to be skipped completely. Download the data2db.pm patch file from /products/archive-by-CDROM/13/Patches.
- July 11, 2007. The data2db.pm parser for FMP DSO XML failed to handle fields whose names contain special regular expression characters that need to be quoted. Download the data2db.pm patch file from /products/archive-by-CDROM/13/Patches.
- The Text::CSV_XS Perl module (available from CPAN) must be installed for the Image Class data loading script to run.
- The owner of a session-based (temporary) portfolio is not allowed to open it. For example, when a non-authenticated user adds an image to a new portfolio, the user is not allowed to see the portfolio when it opens to display the new addition. To fix the problem, change the following lines of the GetBookBag method in $DLXSROOT/cgi/i/image/ImageApp.pm.
Old

    if ( ( lc($ENV{'REMOTE_USER'}) ne lc($portfolioOwner) ) && ( ! $portfolioPublic ) )
    {
        &DlpsUtils::errorBail( qq{Requested portfolio is not public. The owner may choose to make it so.} );
    }

New

    my $sessionid = &DlpsUtils::GetReadOnlySessionId();
    my $username = $ENV{'REMOTE_USER'} || qq{sid-} . $sessionid;
    if ( ( $username !~ m/^$portfolioOwner$/i ) && ( ! $portfolioPublic ) )
    {
        &DlpsUtils::errorBail( qq{Requested portfolio is not public. The owner may choose to make it so.} );
    }
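The data2db.pm problem noted in the Known Problems list above involves field names containing regular expression metacharacters. The patch itself is not reproduced here. As a rough illustration of that class of bug, the sketch below uses Perl's \Q...\E quoting so that a field name is matched literally. The helper field_value and the "Name=value" record layout are assumptions for illustration only and are not the FMP DSO XML format.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value stored under $field in a block
# of "Name=value" lines. \Q...\E quotes regex metacharacters in the
# field name so that "(" and ")" are matched literally.
sub field_value {
    my ( $text, $field ) = @_;
    return $text =~ /^\Q$field\E=(.*)$/m ? $1 : undef;
}

my $text = "Title=Vase\nSize (cm)=12\n";
print field_value( $text, 'Size (cm)' ), "\n";
```

Without the quoting, the parentheses in "Size (cm)" would be read as a capture group and the pattern would not match the line at all.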
Speed/Performance Enhancements
- Sliced viewing of portfolios was added to improve performance of large portfolios and to avoid getting throttled by the web server due to many thumbnail requests at once. This change required some rearrangement of interface functions.
- Browse results are now sliced for better performance and a list of field values with occurrence counts is presented in the sidebar for further navigation within the browse results.
- Changed Image Class searching to use a UNION of SQL select statements rather than a temp table. The temp table was contributing to MySQL database replication errors.
- Thumbnails are now embedded in XHTML as a data: URI, reducing the number of HTTP requests and improving performance. This does not work with IE, so the conventional thumbnail linking method is used for IE.
- A limited number of search results are now cached on the session in raw unformatted form and as XML. Look-ahead caching for the next slice is also done at the end of the CGI run. The caching improves overall performance.
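The thumbnail embedding described above can be sketched as follows. This is a minimal illustration and not the Image Class code: the routine names data_uri and thumbnail_src, and the User-Agent test used to detect IE, are assumptions. MIME::Base64 is a core Perl module.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Build a data: URI from raw image bytes. The empty second argument
# to encode_base64 suppresses the line breaks it would otherwise add.
sub data_uri {
    my ( $bytes, $mime_type ) = @_;
    return "data:$mime_type;base64," . encode_base64( $bytes, '' );
}

# Hypothetical chooser: fall back to an ordinary URL for IE, detected
# here by a simple (assumed) User-Agent test.
sub thumbnail_src {
    my ( $user_agent, $bytes, $mime_type, $url ) = @_;
    return $url if $user_agent =~ /MSIE/;
    return data_uri( $bytes, $mime_type );
}

# 'abc' stands in for real image bytes.
print thumbnail_src( 'Mozilla/5.0', 'abc', 'image/jpeg', '/thumb/1.jpg' ), "\n";
```

The returned string would be used as the src attribute of the thumbnail img element, so browsers that receive the data: URI make no separate HTTP request for the image.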
Other
- Made changes to related views to support direct media links.
- Minor XSL change related to entry display of real media and PDF links.
- Separated thumbnail XSL code out into new template for easy override.
- Made changes to avoid searching all collections when no collections are selected in group search mode. It is now necessary for the user to select at least one collection.
- Improved handling of missing portfolio items.
- CSS is used to change the background color of items just added to a portfolio.
- Enlarged the display size of the DIV that encloses each thumbnail in the results view. This reduces the clipping of metadata in some cases.
- Image Class was modified to use a database table (ImageClassMediaFiles) to store and retrieve technical metadata about images instead of storing this information in text files on disk.
- Previously a record with multiple media items, each with a distinct caption, resulted in a single combined caption. Now captions are independently associated with images. Please see documentation for details of how to take advantage of this new functionality relating to ic_vi field mappings.
- Added ability to output XML (rather than XHTML or HTML, and without using debug=XML) after applying XSLT to the original XML.
- Improved retrieval of items when an exact match of the ID is not possible. For example, when an id stored in a portfolio fails to work as is, there are some simple things that can be done to find the right record in most cases.
- Fixed a bug that allowed private portfolios to be retrieved by ID, even if not the owner. This situation was not likely to occur in real use, and it did not make the content accessible, just the structure of the portfolio.
- Browse level configuration can now be done per field using Collmgr.
- Image Class items can now be retrieved by filename alone if there is an exact match with an entry.
- Changed handling of captions in related views for situation where there is no image for the view to display the caption. This is helpful with a database that has a single record with "views" for digital and slide so that the slide has reasonable representation in the related views table.
- Added a status check on prep tables to make sure there are indexes before allowing the middleware to hit them. This adds stability in certain environments, such as Michigan's, where MySQL replication is in use.
- Improved handling of the AUTHZD_COLL list for comparison to the entry auth value when allowing or denying access to the full image.
- Enhanced getimage-idx to do a better job of finding an image where the viewid has changed for some reason, and a stale external link is being used.
Data Preparation
Speed/Performance Enhancements
- Image Class was modified to use a database table (ImageClassMediaFiles) to store and retrieve technical metadata about images instead of storing this information in text files on disk. Unofficially, metadata loading is 2-4 times faster than it was before because image filename checking is faster with the table.
- Tables are now locked during metadata loading. This speeds up the data loading.
- Consolidated table optimizations into a single MySQL statement rather than several. The same was done for index building. This approach is much faster.
- Combined ic_all (all fields of the record) field is now made into a unique list of words to reduce the size of ic_all a little, and in turn improve search performance.
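The ic_all reduction described above amounts to keeping each word once. A minimal sketch follows, assuming whitespace-separated words and case-insensitive comparison. Neither detail is stated in these notes, and the routine name unique_words is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first occurrence of each word,
# comparing case-insensitively, and preserve the original order.
sub unique_words {
    my ($text) = @_;
    my %seen;
    return join ' ', grep { !$seen{ lc $_ }++ } split ' ', $text;
}

print unique_words('red barn red Barn door'), "\n";
```

Because a keyword search only needs to know whether a word occurs in a record, dropping repeats shrinks the field without changing which records match.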
Other
- Made changes to the key Image Class data loading scripts to add an option that allows data records to be loaded to an alternative (e.g., development or prep) MySQL server. The option is configured globally in the icdbprep.cfg file, and can also be controlled with a command line parameter for load.pl, icdbprep.pl, fmpxml2mysql.pl (deprecated in favor of data2db.pl), and data2db.pl.
- The mediaprep script replaced imageprep as the utility for gathering information about media files for use with Image Class. mediaprep works with the new ImageClassMediaFiles table.
- Previously a record with multiple media items, each with a distinct caption, resulted in a single combined caption. Now captions are independently associated with images. Please see documentation for details of how to take advantage of this new functionality relating to ic_vi field mappings.
- Changed the way the ic_all field is assembled so that it includes "istruct_caption_" fields that are mapped to ic_all.
- Session timeouts during data loading are now avoided by making the sessions empty so that there is no attempt to write the session back to the session database.
- If an ic_id was longer than 255 characters, the index would not be added. Now it is added, for those odd situations where an ic_id is that long (probably a data error).
- The case of filename field names was not handled properly: uppercase field names were ignored. This has been fixed.
- Changed "splitrepeatingfield" routine so that it no longer splits on a comma in an ic_vi (caption) or ic_fn (filename) field. ic_vi fields often include commas not meant as delimiters.
- Added automatic database reconnection. This is most likely to aid data loading processes in certain environments. This is actually a change to DbUtils.pm, but is most beneficial to Image Class.
- Removed a filter for binary files that was getting confused about text PDFs and RAM files that simply enclose a URL for redirection.
Bib Class
- Exploratory work in BibClass IMLS subclasses for clustering/categorization support.
Bibperm (Nameresolver)
Broker20
- No changes.
Collmgr
- Labels on radio buttons are now clickable.
FindaidClass
XML template and XSL stylesheet changes
- browse.xml - Added <BrowseLevels><?BROWSE_LEVELS_XML?></BrowseLevels>
- browse.xsl - Changes for subject browse and special 'browsefields' syntax to support browse levels per field, e.g. subject=0, author=1
- navheader.xsl - Avoid adding empty "tips=" parameter to url for navbar links to fix problem where additional url params tacked onto end create invalid syntax e.g. ...tips=target=_top
- search.xsl - Move tips to below the search form so the iframe is not squeezed into a tiny box
- results.xsl - BEG Fixed bug in hit summary creation for boolean search results.
- text.xml - Removed a TextClass PI that doesn't belong: <DocEncodingType><?DOC_ENCODING_TYPE_XML?></DocEncodingType>
- text.xsl - Refer to results.str.returntoresults instead of results.str.22 so "return to results" link will show up.
CGI/Middleware
- FindaidApp.pm - Removed ValidityChecks setting of default value for browse. The default is now set when building the browse page, when more is known about what is available, instead of just selecting 'a'.
- FindaidClass.pm - Added an optional 'silent' parameter to ASSERT so that processing can continue while still sending an email warning. It is used in dao resolution so that an unresolvable dao does not prevent the entire page from rendering. Improved visibility of dao links and emit "[image not available]" for unresolved daos. Moved the list of section heads that was hardcoded in FindaidClass::_initialize to findaidclass.cfg to make it easier to edit out heads that are missing under some EAD encoding practices. Catch connect exception when doing idresolver calls. Fixed the FilterAllDaos_XML case where there is no href attribute, to prevent an infinite loop.
XClass
- No changes.
Tif2web binary
XClass
- No changes.
METS Pageturner
- Ongoing development work.