DLXS to DC

From DLXS Documentation

Revision as of 16:33, 20 August 2008 by Khage (Talk | contribs)

Converting DLXS Data to DC

This page explains how to convert DLXS collections to Dublin Core (DC) so that they can be loaded into the OAI database table for the UMProvider. The scripts and stylesheets are in $DLXSROOT/bin/o/oai/provider/. Depending on your data, they may need some adjustment before they work for you.

Extract Headers

ExtractHeaders.pl extracts the headers from XPAT for the given collections and places the extracted data in $DLXSROOT/bin/o/oai/headers/.

# ./ExtractHeaders.pl
USAGE:
	-h this usage message
	-c the full path of xml file that contains a list of collections to convert (required).
	-u update time: only process collections which have been updated since this given time 

The XML file is assumed to be in the format <COLLS><COLL>collectionName</COLL></COLLS>.
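The collection list can be written by hand, but generating it avoids malformed XML. The following is a minimal sketch, assuming the scripts accept a file containing exactly the <COLLS><COLL>...</COLL></COLLS> structure described above. The collection names and the helper function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_collection_list(collection_names):
    """Return a <COLLS> element with one <COLL> child per collection name."""
    root = ET.Element("COLLS")
    for name in collection_names:
        coll = ET.SubElement(root, "COLL")
        coll.text = name
    return root


def write_collection_list(collection_names, path):
    """Serialize the collection list and write it to `path` as UTF-8 text."""
    xml_text = ET.tostring(build_collection_list(collection_names), encoding="unicode")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(xml_text + "\n")


# Hypothetical collection names, for illustration only.
example = build_collection_list(["collA", "collB"])
print(ET.tostring(example, encoding="unicode"))
# prints <COLLS><COLL>collA</COLL><COLL>collB</COLL></COLLS>
```

Calling write_collection_list(["collA", "collB"], "listOfColls.xml") would produce a file suitable for the -c option, under the assumption stated above.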

ConvertToDc.pl

The ConvertToDc.pl script takes the XSLT stylesheets, the directory containing the header XML files to parse, and an XML file listing the collections that should be converted.

# ./ConvertToDc.pl
USAGE:
	-h this usage message
	-t the full path of the xsl file that does the Text class to dc transformation (required)
	-b the full path of the xsl file that does the Bib class to dc transformation (required)
	-a the full path of the xsl file that does the transformation from text class to articles to dc (required)
	-c the full path of xml file that contains a list of collections to convert (required).
	-d the directory that contains the header XML files to parse (required)

The collections XML file is assumed to be in the format <COLLS><COLL>collectionName</COLL></COLLS>. The headers XML file is assumed to be in the format <RSet>..<HEADER>...</HEADER>...</RSet>; it will contain extra XPAT wrappers between the <RSet> and <HEADER> tags. Converted files (in DC) are stored in $DLXSROOT/prep/o/oai/provider/.
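Because the <HEADER> elements sit at some depth below <RSet> inside XPAT wrapper elements, anything that inspects a headers file needs to search descendants rather than direct children. The sketch below shows one way to check a headers file before conversion. The wrapper element names (Wrapper, Inner) and the header contents are placeholders invented for the example, not actual XPAT output.

```python
import xml.etree.ElementTree as ET

# Made-up sample: wrapper names and header contents are placeholders.
SAMPLE = """<RSet>
  <Wrapper><Inner><HEADER><TITLE>First</TITLE></HEADER></Inner></Wrapper>
  <Wrapper><Inner><HEADER><TITLE>Second</TITLE></HEADER></Inner></Wrapper>
</RSet>"""


def extract_headers(xml_text):
    """Return every <HEADER> element found at any depth below <RSet>."""
    root = ET.fromstring(xml_text)
    if root.tag != "RSet":
        raise ValueError("expected <RSet> as the root element, found <%s>" % root.tag)
    # iter() walks all descendants, so the wrapper depth does not matter.
    return list(root.iter("HEADER"))


headers = extract_headers(SAMPLE)
print(len(headers))  # prints 2
```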

Example:

./ConvertToDc.pl -c listOfColls.xml -t textClassToDc.xsl -b bibClassToDc.xsl -a articlesToDc.xsl -d /l1/prep/o/oai/headers/
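If the conversion is driven from another program, the same invocation can be assembled as an argument list. This is only a sketch: the function name is hypothetical, the flags are the five required options from the usage message above, and the script is assumed to be in the current directory.

```python
def convert_to_dc_command(colls_xml, text_xsl, bib_xsl, articles_xsl, headers_dir,
                          script="./ConvertToDc.pl"):
    """Build the argument list for ConvertToDc.pl from its five required options."""
    return [
        script,
        "-c", colls_xml,      # list of collections to convert
        "-t", text_xsl,       # Text Class to DC stylesheet
        "-b", bib_xsl,        # Bib Class to DC stylesheet
        "-a", articles_xsl,   # Text Class articles to DC stylesheet
        "-d", headers_dir,    # directory of header XML files
    ]


command = convert_to_dc_command(
    "listOfColls.xml", "textClassToDc.xsl", "bibClassToDc.xsl",
    "articlesToDc.xsl", "/l1/prep/o/oai/headers/",
)
print(" ".join(command))

# To run it for real (not done here):
# import subprocess; subprocess.run(command, check=True)
```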

The resulting data should end up in $DLXSROOT/prep/o/oai/provider/. From there you can use the LoadOai.pl script to load that XML data into the OAI tables.

USAGE:
	-h this usage message
	-c the full path of xml file that contains a list of collections to convert (required).
	-u update time: only process collections which have been updated since this given time 

The XML file is assumed to be in the format <COLLS><COLL>collectionName</COLL></COLLS>.

XSLT

The following example XSLT stylesheets are provided:

  • textClassToDc.xsl - Stylesheet that does the XML transformation from Text Class to DC.
  • bibClassToDc.xsl - Stylesheet that does the XML transformation from Bibliographic Class to DC. This is used for static collections. It takes the series title and puts it in a dc:source tag.
  • articlesToDc.xsl - Stylesheet that does the XML transformation from Text Class serial collections to DC.
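The real transformations are done by the XSLT stylesheets listed above. The Python sketch below only illustrates the kind of mapping that bibClassToDc.xsl is described as doing, with the series title ending up in dc:source. The input element names (TITLE, SERIESTITLE) are hypothetical stand-ins for whatever your headers contain, and wrapping the output in an oai_dc:dc element is an assumption about the target format.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Use the conventional prefixes when serializing.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def bib_header_to_dc(header):
    """Map a header element to a DC record (illustrative mapping only)."""
    record = ET.Element("{%s}dc" % OAI_DC_NS)
    title = header.findtext("TITLE")          # hypothetical element name
    if title:
        ET.SubElement(record, "{%s}title" % DC_NS).text = title
    series = header.findtext("SERIESTITLE")   # hypothetical element name
    if series:
        # The series title is carried in dc:source.
        ET.SubElement(record, "{%s}source" % DC_NS).text = series
    return record


sample = ET.fromstring(
    "<HEADER><TITLE>An Example Title</TITLE>"
    "<SERIESTITLE>An Example Series</SERIESTITLE></HEADER>"
)
print(ET.tostring(bib_header_to_dc(sample), encoding="unicode"))
```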

