Transforming bibliographic files into BibClass

From DLXS Documentation

Revision as of 11:49, 9 September 2007

Please note that, for most classes, DLXS does not formally support transformations from non-DLXS formats to DLXS XML/SGML. The following instructions and programs are provided as guides or aids only.

To date, DLPS has received and processed bibliographic information in a variety of formats, including USMARC records from our NOTIS catalog, SGML from Chadwyck-Healey, bibliographic information in our own generic TEI-like SGML (encoded for the Text Class), and a variety of local database schemas from applications like FMPro and MS-Access. The three Perl programs linked below are representative, both in their ability to transform one of these formats to BibClass and in their varying degrees of "polish." They are provided as freely available aids for those implementing BibClass and doing similar work.

  • gums2bib.pl: This very rudimentary program transformed data encoded in Text Class's old "grand unified markup scheme" or gums (tongue-in-cheek), a TEI-derived DTD, to BibClass's DTD. Wherever possible, however, we relied on USMARC records as the source both for the encoded information in the gums.dtd/textclass.dtd and for the bibliographic information in BibClass, so this program is reserved for exceptions: bibliographic data found only in the online text. To use it, ensure that the path to perl in the program is correct, then run gums2bib.pl with the input file as its argument and redirect its output to the output file, as in:

./gums2bib.pl my-texts.sgml > my-bib.sgml

  • marc2bib.pl: This much more thoughtful program (written primarily by Beth Kirschner) derived bibliographic information from NOTIS records in the USMARC format and produced output in BibClass's bib.dtd. We often used it in conjunction with a companion program, marc_split.pl, which divides a file of NOTIS-generated records into individual files, one per record, each named with the NOTIS record identifier and the ".marc" extension.

The program looks for USMARC records in a file called records.marc or, alternatively, in one or more files named on the command line. It produces a collection of individual files with the .bib extension, each named with the NOTIS record identifier (or key), and puts those output files in a directory called sgmlout. Thus, running marc2bib.pl with no arguments against a records.marc file containing records with the NOTIS keys foo, bar, and foobar results in sgmlout/foo.bib, sgmlout/bar.bib, and sgmlout/foobar.bib. Similarly, running marc2bib.pl with the command line argument marc/*.marc, where the marc directory holds the files foo.marc, bar.marc, and foobar.marc, also results in (or overwrites) sgmlout/foo.bib, sgmlout/bar.bib, and sgmlout/foobar.bib.

  • fixtimes.pl: This program, written by Phil Farber, was extremely useful in processing Chadwyck-Healey's SGML from three bibliographic collections: the Historical Index to the New York Times, 1851-1922; the Official Index to the Times (London), 1906-1980; and Palmer's Index to the Times, 1790-1905. All three were encoded using roughly the same Chadwyck-Healey DTD. The data were transformed, primarily by fixtimes.pl, into the Bibliographic Class's bib.dtd. To use this program, ensure that the path to perl in the program is correct, then run fixtimes.pl with the input file as its argument and redirect its output to the output file, as in:

./fixtimes.pl times.sgm > newtimes.sgm
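
The record-splitting and file-naming conventions described for marc_split.pl and marc2bib.pl can be illustrated with a short sketch. This is not a port of either Perl program, and it is written in Python only for illustration. It assumes that the input is in the standard MARC (ISO 2709) exchange layout and that the record identifier can be read from field 001; whether NOTIS keys actually live in that field is an assumption, as are all of the function names (control_number, split_records, write_split, expected_bib_path).

```python
from pathlib import Path

RECORD_TERMINATOR = b"\x1d"  # ends every MARC (ISO 2709) record
FIELD_TERMINATOR = b"\x1e"   # ends the directory and every field


def control_number(record: bytes) -> str:
    """Return field 001 of one record (assumed here to hold the record key)."""
    base = int(record[12:17])        # leader bytes 12-16: base address of data
    directory = record[24:base - 1]  # 12-byte entries after the 24-byte leader
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        if tag == b"001":
            field = record[base + start:base + start + length]
            return field.rstrip(FIELD_TERMINATOR).decode("ascii").strip()
    raise ValueError("record has no 001 field")


def split_records(data: bytes) -> list:
    """Split a multi-record MARC byte string into individual records."""
    records = []
    for chunk in data.split(RECORD_TERMINATOR):
        chunk = chunk.lstrip(b"\r\n")  # tolerate line breaks between records
        if chunk:
            records.append(chunk + RECORD_TERMINATOR)
    return records


def write_split(marc_file, out_dir="marc") -> list:
    """Write each record in marc_file to out_dir/<key>.marc; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for record in split_records(Path(marc_file).read_bytes()):
        path = out / (control_number(record) + ".marc")
        path.write_bytes(record)
        paths.append(path)
    return paths


def expected_bib_path(marc_path, out_dir="sgmlout") -> Path:
    """Map marc/foo.marc to sgmlout/foo.bib, the naming described above."""
    return Path(out_dir) / (Path(marc_path).stem + ".bib")
```

Under these assumptions, write_split("records.marc") would leave one .marc file per record in the marc directory, and expected_bib_path applied to each of those paths gives the name of the .bib file that a later marc2bib.pl run would be expected to create or overwrite in sgmlout.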
