Transforming bibliographic files into BibClass

From DLXS Documentation

Current revision (15:24, 14 September 2007)

Please note that, for most classes, DLXS does not formally support transformations from non-DLXS formats to DLXS XML/SGML. The following instructions and programs are provided as guides or aids only.
 
To date, DLPS has received and processed bibliographic information in a variety of formats, including USMARC records from our NOTIS catalog, SGML from Chadwyck-Healey, bibliographic information in our own generic TEI-like SGML (encoded for the Text Class), and a variety of local database schemas from applications like FMPro and MS-Access. The three Perl programs and the XSL transform linked below are representative both for their ability to transform from one of these to BibClass, and for their degree of "polish." They are provided as freely available aids for those implementing BibClass and doing similar work.
 
* [[gums2bib.pl]]: This very rudimentary program transformed data encoded in Text Class's old "grand unified markup scheme" or gums (a tongue-in-cheek name), a TEI-derived DTD, to BibClass's [[Bib.dtd|DTD]]. Wherever possible, however, we relied on data from USMARC records both for the information encoded under the gums.dtd/textclass.dtd and for the bibliographic information in BibClass, so this program is reserved for exceptions: bibliographic data found only in the online text. To use it, ensure that the path to perl in the program is correct and run <tt>gums2bib.pl</tt> with the input file as its argument, redirecting its output to the output file, as in:
 
<tt>./gums2bib.pl my-texts.sgml &gt; my-bib.sgml</tt>
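
Checking the path to perl can be sketched in shell as follows. This is a minimal sketch, assuming a Unix-like system and a script whose first line is a <tt>#!</tt> line; <tt>shebang_interpreter</tt> and <tt>check_interpreter</tt> are hypothetical helper names introduced here for illustration and are not part of DLXS.

```shell
# Hypothetical helper: print the interpreter path named on a script's first line.
shebang_interpreter() {
    # Read only the first line, strip the leading "#!" and any trailing arguments.
    head -n 1 "$1" | sed -e 's/^#![[:space:]]*//' -e 's/[[:space:]].*$//'
}

# Hypothetical helper: report whether that interpreter exists and is executable.
check_interpreter() {
    interp=$(shebang_interpreter "$1")
    if [ -x "$interp" ]; then
        echo "ok: $interp"
    else
        # Show where perl lives on this system (empty if perl is not on PATH).
        echo "not found: $interp (perl on this system: $(command -v perl))"
        return 1
    fi
}
```

Running <tt>check_interpreter ./gums2bib.pl</tt> before the conversion would show whether the first line of the program needs editing. The same check applies to the other Perl programs on this page.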
 
* [[marc2bib.pl]]: This much more thoughtful program (written primarily by Beth Kirschner) derived bibliographic information from NOTIS records in the USMARC format and produced output conforming to BibClass's bib.dtd. We often used it in conjunction with [[marc_split.pl]], which divides a file of NOTIS-generated records into individual files, each named with the NOTIS record identifier and the ".marc" extension.
 
The program looks for USMARC records in a file called <tt>records.marc</tt> or, alternatively, in a file or files identified on the command line. It produces a collection of individual files with the <tt>.bib</tt> extension, each named with the NOTIS record identifier or key, and puts those output files in a directory called <tt>sgmlout</tt>. Thus, running <tt>marc2bib.pl</tt> with no arguments, with a file called <tt>records.marc</tt> containing records with the NOTIS keys foo, bar, and foobar, will result in <tt>sgmlout/foo.bib</tt>, <tt>sgmlout/bar.bib</tt>, and <tt>sgmlout/foobar.bib</tt>. Similarly, running <tt>marc2bib.pl</tt> with the command line argument <tt>marc/*.marc</tt>, where that pattern matches the files <tt>foo.marc</tt>, <tt>bar.marc</tt>, and <tt>foobar.marc</tt> (each named for its NOTIS ID), will also result in (or overwrite) <tt>sgmlout/foo.bib</tt>, <tt>sgmlout/bar.bib</tt>, and <tt>sgmlout/foobar.bib</tt>.
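
The naming convention described above can be sketched in shell. This only illustrates the mapping from per-record input file names to output paths as this page describes it; it is not code from <tt>marc2bib.pl</tt>. The helper name <tt>expected_bib_path</tt> is hypothetical, and the sketch assumes each input file is named with the NOTIS key plus the <tt>.marc</tt> extension, as <tt>marc_split.pl</tt> is described as producing.

```shell
# Hypothetical helper: given a per-record input file such as marc/foo.marc,
# print the output path described above (sgmlout/foo.bib).
expected_bib_path() {
    key=$(basename "$1" .marc)   # strip the directory and the .marc extension
    printf 'sgmlout/%s.bib\n' "$key"
}

# List the outputs that a run over several inputs would create or overwrite.
for f in marc/foo.marc marc/bar.marc marc/foobar.marc; do
    expected_bib_path "$f"
done
```

Because the output name depends only on the key, two inputs with the same key in different directories would map to the same <tt>.bib</tt> file, which is worth checking before a large run.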
 
* [[fixtimes.pl]]: This program, written by Phil Farber, was extremely useful in processing Chadwyck-Healey's SGML from three bibliographic collections: the ''Historical Index to the New York Times, 1851-1922'', the ''Official Index to the Times (London), 1906-1980'', and ''Palmer's Index to the Times, 1790-1905'', all encoded using roughly the same Chadwyck-Healey DTD. The data were transformed, primarily by <tt>fixtimes.pl</tt>, into the Bibliographic Class's bib.dtd. To use this program, ensure that the path to perl in the program is correct, and then run <tt>fixtimes.pl</tt> with the input file as its argument, redirecting its output to the output file, as in:
 
<tt>./fixtimes.pl times.sgm &gt; newtimes.sgm</tt>
 
* [[moa-bibclass.xsl]]: This is an XSL transform written to map MOA Text Class data into BibClass. It works only on the MOA headers (originally TEI/MARC). It is used in conjunction with Saxon:
 
<tt>saxon moa-headers ../moa-bibclass.xsl moa-bib.xml</tt>
 
