DLXS to DC

From DLXS Documentation

Current revision (17:33, 20 August 2008)

Converting DLXS Data to DC

This page describes how to convert DLXS collections to Dublin Core (DC) so that they can be loaded into the OAI database table for the UMProvider. Depending on your data, the scripts and stylesheets may need some adjustment before they work for you. They can be found in $DLXSROOT/bin/o/oai/provider/.

Extract Headers

ExtractHeaders.pl extracts the headers from XPAT for the given collections and places the extracted data in $DLXSROOT/bin/o/oai/headers/.

# ./ExtractHeaders.pl
USAGE:
	-h this usage message
	-c the full path of xml file that contains a list of collections to convert (required).
	-u update time: only process collections which have been updated since this given time 

The XML file is assumed to be in the format <COLLS><COLL>collectionName</COLL></COLLS>.
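
As a minimal sketch of preparing that input, the collections file can be written by hand or from a shell script. The collection names below (examplecoll1, examplecoll2) are placeholders, and listing several collections by repeating the COLL element is an assumption based on the description of the file as a list.

```shell
# Write a collections list in the <COLLS><COLL>...</COLL></COLLS> format.
# "examplecoll1" and "examplecoll2" are placeholder collection names.
cat > listOfColls.xml <<'EOF'
<COLLS><COLL>examplecoll1</COLL><COLL>examplecoll2</COLL></COLLS>
EOF

# Quick sanity check: count the <COLL> entries that were written.
coll_count=$(grep -o '<COLL>' listOfColls.xml | wc -l | tr -d ' ')
echo "collections listed: $coll_count"
```

The resulting file is what the -c option expects; note that the usage message asks for its full path.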

ConvertToDc.pl

The ConvertToDc.pl script takes the XSLT stylesheets, the directory containing the extracted header XML files, and an XML file listing the collections to convert.

# ./ConvertToDc.pl
USAGE:
	-h this usage message
	-t the full path of the xsl file that does the Text class to dc transformation (required)
	-b the full path of the xsl file that does the Bib class to dc transformation (required)
	-a the full path of the xsl file that does the transformation from text class to articles to dc (required)
	-c the full path of xml file that contains a list of collections to convert (required).
	-d the directory that contains the header XML files to parse (required)

The collections XML file is assumed to be in the format <COLLS><COLL>collectionName</COLL></COLLS>.

The headers XML file is assumed to be in the format <RSet>..<HEADER>...</HEADER>...</RSet>. (It will contain extra XPAT wrappers between the <RSet> and <HEADER> tags.)

Converted files (in DC) are stored in $DLXSROOT/prep/o/oai/provider.

Example:

./ConvertToDc.pl -c listOfColls.xml -t textClassToDc.xsl -b bibClassToDc.xsl -a articlesToDc.xsl -d /l1/prep/o/oai/headers/

The resulting data should end up in $DLXSROOT/prep/o/oai/provider/. From there you can use the LoadOai.pl script to load that XML data into the OAI tables.

USAGE:
	-h this usage message
	-c the full path of xml file that contains a list of collections to convert (required).
	-u update time: only process collections which have been updated since this given time 

The XML file is assumed to be in the format <COLLS><COLL>collectionName</COLL></COLLS>.
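
The extract and convert steps above can be chained in a small wrapper. The following is only a sketch: the run helper and the DRY_RUN variable are hypothetical additions, the paths are taken from the example invocation, and the LoadOai.pl step is left out because its options are not shown here. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Dry-run sketch of the extract-then-convert sequence described above.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN=${DRY_RUN:-1}
COLLS=listOfColls.xml              # collections list (placeholder path)
HEADERS=/l1/prep/o/oai/headers/    # header directory from the example above

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run ./ExtractHeaders.pl -c "$COLLS"
run ./ConvertToDc.pl -c "$COLLS" -t textClassToDc.xsl -b bibClassToDc.xsl \
    -a articlesToDc.xsl -d "$HEADERS"
```

Setting DRY_RUN=0 in the environment would execute the two scripts for real, from the directory that contains them.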

XSLT

The following XSLT stylesheets are provided as examples:

  • textClassToDc.xsl - Stylesheet that transforms Text Class XML to DC.
  • bibClassToDc.xsl - Stylesheet that transforms Bibliographic Class XML to DC. This is used for static collections. It takes the series title and puts it in a dc:source tag.
  • articlesToDc.xsl - Stylesheet that transforms Text Class serial collections to DC.
