Loading Image Files

From DLXS Documentation


Revision as of 20:53, 14 August 2007

Quick Start

If you are migrating from a previous version of DLXS that used imageprep rather than mediaprep, please read this entire document, especially the Migration section, before running mediaprep.

Image Class supports the JPEG2000, MrSID, JPEG, GIF, PNG, Flash, MP3, WAV, and PDF media formats by default. Support for other formats can be added.

Place media files (filenames must be unique within the collection) on the production web server following the convention below. The "incoming" directory is optional. If you use the incoming directory, files will automatically be moved out of it and into a directory named by year and month (YYYYMM). The "incoming" directory is required if adding files incrementally without reprocessing all media files (see Adding Media Files).

JPEG2000 with thumbnail:
$DLXSDATAROOT/img/c/coll/[incoming]/.../jp2/filename.jp2
$DLXSDATAROOT/img/c/coll/[incoming]/.../thumbjp2/filename.jpg

MrSID with thumbnail:
$DLXSDATAROOT/img/c/coll/[incoming]/.../sid/filename.sid
$DLXSDATAROOT/img/c/coll/[incoming]/.../thumb/filename.jpg

JPEG with thumbnail:
$DLXSDATAROOT/img/c/coll/[incoming]/.../filename.jpg
$DLXSDATAROOT/img/c/coll/[incoming]/.../thumb/filename.jpg

PDF:
$DLXSDATAROOT/img/c/coll/[incoming]/.../filename.pdf

MOV:
$DLXSDATAROOT/img/c/coll/[incoming]/.../filename.mov

RM:
$DLXSDATAROOT/img/c/coll/[incoming]/.../filename.rm

MP3:
$DLXSDATAROOT/img/c/coll/[incoming]/.../filename.mp3

WAV:
$DLXSDATAROOT/img/c/coll/[incoming]/.../filename.wav

Flash:
$DLXSDATAROOT/img/c/coll/[incoming]/.../filename.swf

JPEG, GIF, and PNG formats are supported equally. Anywhere you see ".jpg" above, you can substitute ".gif" or ".png", and in any combination.

Please refer to the Image File Naming document for more information about file naming.
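
The directory convention above can be derived from a collection ID. The following is a minimal sketch, assuming a POSIX shell and that DLXSDATAROOT is set; the function names coll_dir and stage_incoming_jp2 are hypothetical helpers for illustration and are not part of DLXS.

```shell
# Hypothetical helper: print the collection's media directory,
# $DLXSDATAROOT/img/<initial letter of collid>/<collid>.
coll_dir() {
    : "${DLXSDATAROOT:?DLXSDATAROOT must be set}"
    collid=$1
    initial=$(printf '%s' "$collid" | cut -c1)
    printf '%s/img/%s/%s\n' "$DLXSDATAROOT" "$initial" "$collid"
}

# Hypothetical helper: create incoming/jp2 and incoming/thumbjp2 for a
# collection and print the path of the incoming directory.
stage_incoming_jp2() {
    incoming=$(coll_dir "$1")/incoming
    mkdir -p "$incoming/jp2" "$incoming/thumbjp2"
    printf '%s\n' "$incoming"
}
```

For example, with collid "sampleic", coll_dir prints $DLXSDATAROOT/img/s/sampleic. JPEG2000 files would then be copied into incoming/jp2 and their thumbnails into incoming/thumbjp2.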

Running the "mediaprep" Script

With your media files in place, run the mediaprep script like this:

  $DLXSROOT/bin/i/image/mediaprep c=[collid]

mediaprep analyzes all media files and registers findings in a DLXS database table called ImageClassMediaFiles. It also creates a directory of symlinks called "index". Together, the table ImageClassMediaFiles and the "index" directory are used in the processes of loading metadata records and providing web access to the images.

mediaprep should be run before the loading of metadata records.


Migration

Migration from a previous version of DLXS that did not use mediaprep requires an understanding of the changes that have occurred. Note: DLXS 12a was the last version to use imageprep.

mediaprep differs from its predecessor imageprep in the following ways:

  • Information about media files is stored in a MySQL table called ImageClassMediaFiles, not in text files.
  • The index directory, which in the past held the text files and symlinks to thumbnails, now holds only symlinks. The text files are obsolete.
  • The index directory is only needed on web servers. It is not needed, for example, on a machine used just for loading Image Class data records. DLXS 12a and earlier required the index directory be transferred to the data preparation server, if different than the web server.
  • mediaprep recognizes duplicate files and uses the newest. Additionally, older copies are moved to an outgoing directory by default, which differs from how imageprep worked.
  • New directories/files can be placed in an incoming directory and mediaprep will move them to the prescribed directory structure based on the current date. Preexisting files may remain where they are.
  • Some steps that happened automatically in the past are now optional, and not executed by default. For example, the checking of file permissions is now optional.
  • mediaprep deletes empty directories it finds.

mediaprep resembles its predecessor imageprep in the following ways:

  • mediaprep uses the imageprep.cfg and localimageprep.cfg files for configuration.
  • mediaprep is essential for preparing images and other media for use with Image Class.

If during migration your old and new versions of ImageClass need to share the same img directory, the following is the recommended migration path:

  • Run mediaprep using the hold=1 and keepold=1 options. This allows both DLXS 12a Image Class and DLXS 13 Image Class to function simultaneously.
  • Complete the migration to the DLXS 13 Image Class middleware, including code customizations, etc.
  • Run mediaprep again without the indexonly=1 option, but not the hold and keepold options.
  • If necessary, it is OK to run imageprep. For example, if you forget to use the hold=1 option when running mediaprep, you can compensate by running imageprep as you have in the past.
  • In the future, just run mediaprep, most likely without hold, keepold, or indexonly.


Thumbnails and the Web Server Document Root

Fast delivery of thumbnails to the browser requires the thumbnail image files to be under the web server's document root directory. To achieve this, add a symlink named "thumb" in the collection's web directory, and have it point to the "index" directory.

You may have to create a web directory for the collection if one does not already exist.

  cd $DLXSROOT/web/c/coll
  ln -s $DLXSDATAROOT/img/c/coll/index thumb

Remember, "c/coll" should be replaced with the collection's initial letter and its actual collid.
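
The two commands above can be wrapped so that the web directory is created when it is missing and an existing "thumb" entry is left alone. This is a minimal sketch, assuming a POSIX shell with DLXSROOT and DLXSDATAROOT set; the function name link_thumbs is hypothetical and not part of DLXS.

```shell
# Hypothetical helper: link_thumbs <collid>
# Creates $DLXSROOT/web/<initial>/<collid>/thumb as a symlink to the
# collection's index directory, unless something named "thumb" exists.
link_thumbs() {
    : "${DLXSROOT:?DLXSROOT must be set}"
    : "${DLXSDATAROOT:?DLXSDATAROOT must be set}"
    collid=$1
    initial=$(printf '%s' "$collid" | cut -c1)
    webdir=$DLXSROOT/web/$initial/$collid
    mkdir -p "$webdir"
    if [ ! -e "$webdir/thumb" ] && [ ! -L "$webdir/thumb" ]; then
        ln -s "$DLXSDATAROOT/img/$initial/$collid/index" "$webdir/thumb"
    fi
}
```

Running link_thumbs sampleic a second time does nothing, so it is safe to repeat.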

Adding Media Files

To incrementally add media files without reprocessing the entire set, create an "incoming" directory and place new files in it, then run mediaprep with the "incoming" command line parameter set.

  mkdir $DLXSDATAROOT/img/c/coll/incoming

Place files in incoming (as shown above in Quick Start).

  $DLXSROOT/bin/i/image/mediaprep c=collid incoming=1

About the Index Directory, and Options

When mediaprep runs, it creates a directory called "indexprep" and places within it relative symlinks to the thumbnail images. When done creating symlinks, the "indexprep" directory is renamed to "index", automatically. If necessary, an existing "index" directory is first renamed "index-old" and then deleted after everything else is done.
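
The rename sequence described above can be pictured with a small shell sketch. It only illustrates the described behavior and is not mediaprep's actual implementation; the function name swap_index is hypothetical.

```shell
# Illustrative only: swap a freshly built "indexprep" into place as "index".
# Usage: swap_index <collection directory>
swap_index() {
    colldir=$1
    [ -d "$colldir/indexprep" ] || return 1
    # Clear any leftover from an earlier run so the rename below is clean.
    rm -rf "$colldir/index-old"
    if [ -d "$colldir/index" ]; then
        mv "$colldir/index" "$colldir/index-old"
    fi
    mv "$colldir/indexprep" "$colldir/index"
    # The old index is deleted only after the new one is in place.
    rm -rf "$colldir/index-old"
}
```

Building the new directory under a temporary name and renaming it at the end keeps the window in which no "index" directory exists very short.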

"Holding" an Index Directory

Optionally, mediaprep can create an "indexprep" directory but refrain from renaming it to "index". This is rarely useful, but it is a handy trick when transitioning from DLXS 12 or 12a to 13, because 13 can use an index directory created by 12a.

e.g.,

  $DLXSROOT/bin/i/image/mediaprep c=collid hold=1

Keeping Old Files

By default, mediaprep moves old duplicate files to the outgoing directory, creating the directory if necessary. The keepold=1 option leaves them where they are. Keeping old files can be useful when migrating to DLXS 13, because it allows for minor differences in the way duplicate files are handled. If mediaprep were allowed to shift duplicates to the outgoing directory, the index directory created by the older imageprep program might become disconnected from some of the media files.

e.g.,

  $DLXSROOT/bin/i/image/mediaprep c=collid keepold=1

Purge Existing ImageClassMediaFiles Rows

By default, mediaprep uses the existing rows in the ImageClassMediaFiles table, making relative modifications as necessary. Sometimes it is valuable to purge the existing rows for a collection from the table first. The purge=1 command line option does just that before proceeding with the rest of the operation.

The primary advantage of purge is that it gives you a clean slate, which is essential when using the sidjp2 equivalency option.

The only real disadvantage of purge is that none of the collection's images will be accessible to users while mediaprep runs. If the collection is small, this hardly matters. If the collection is large and mediaprep takes a couple of hours to run, the downtime may be worth considering.

e.g.,

  $DLXSROOT/bin/i/image/mediaprep c=collid purge=1

Replacing MrSID files with JPEG2000, or Vice-Versa

MrSID and JPEG2000 files are fairly interchangeable in Image Class. JPEG2000 has become preferred in recent years, and you may want to replace MrSID files with JPEG2000 (we recommend it, in fact). If the files are named the same except for the filename extension, mediaprep can replace MrSID files with JPEG2000. The opposite is also true: MrSID files can replace JPEG2000 files. The determining factor is the timestamp of the file; the newest file is always used. The sidjp2=1 command line option requires that the purge option be used as well.

e.g.,

  $DLXSROOT/bin/i/image/mediaprep c=collid purge=1 sidjp2=1
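
To predict which file would win before running mediaprep with sidjp2=1, the two candidates can be compared by timestamp. This is a rough sketch, assuming that "timestamp" means the file's modification time and that the shell's test command supports the widely available -nt operator; the function name newer_of is hypothetical.

```shell
# Hypothetical helper: print whichever of two files was modified more recently.
# If neither is newer, the second argument is printed.
newer_of() {
    if [ "$1" -nt "$2" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
}
```

For instance, newer_of abc.sid abc.jp2 prints abc.jp2 when the JPEG2000 file was written after the MrSID file.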


Make An Index Directory Only (Without Updating the Database)

A collection needs an index directory on every server running the Image Class middleware. Chances are this includes your production server, and your development server (if you have one). The actual media files only need to be on one server, since the Collmgr "mediahost" and "devmediahost" fields can be used to tell Image Class which server to hit for image delivery. Once mediaprep has been run against the media files, generating an index directory on other servers is easy (as long as DLXS is installed).

   $DLXSROOT/bin/i/image/mediaprep c=collid indexonly=1 [hold=1]

The index directory is generated by reading rows from the ImageClassMediaFiles table and creating symlinks from the table row information.

It is no longer necessary to manually tar and transfer the index directory to other servers. It is also no longer necessary to have an index directory on machines used only for dataprep, as long as the machine is not also used as an Image Class web server.


Checking File Permissions

Optionally, mediaprep will check the permissions of media files and attempt to set permissions properly for Image Class if necessary. Permissions are not checked by default. Use the checkperms command line parameter to turn this option on.

The following command does the normal mediaprep operation, plus checks file permissions.

  $DLXSROOT/bin/i/image/mediaprep c=collid checkperms=1

Image files need to be readable by the web server, which often runs as user "nobody".

Image directories should be 775 and image files should be 664. The "mediaprep" program will attempt to properly set all permissions for all image directories and files. However, if the user executing "mediaprep" does not have the necessary permissions to change the mode of the directories and files, the attempt will fail and "mediaprep" will generate a message reporting this.

Alternatively, the chmod command can be used to set permissions. There are ways to modify files in batch with UNIX commands, but this topic is beyond the scope of this document.

   chmod 775 $DLXSDATAROOT/img/m/musart/sid
   chmod 664 abc.sid
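
As one possible batch approach, the standard find command can apply the same modes to a whole collection tree. This is a minimal sketch using only standard find and chmod; the function name set_media_perms is hypothetical and not part of DLXS.

```shell
# Hypothetical helper: set_media_perms <directory>
# Sets every directory under (and including) the argument to 775 and
# every regular file to 664. Symlinks are not touched by -type f.
set_media_perms() {
    find "$1" -type d -exec chmod 775 {} +
    find "$1" -type f -exec chmod 664 {} +
}
```

A typical call would be set_media_perms $DLXSDATAROOT/img/m/musart, run by a user who owns the files.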


Filenaming

Please refer to the Image File Naming document for the details.

It is best, and easiest, if filenames are unique within a collection. If they are not, the subdirectory path can be used to force uniqueness. To enable this function, edit the file at $DLXSROOT/bin/i/image/localimageprep.cfg and add the following code to the "COLL SPECIFIC OVERRIDES/ADDITIONS" section:

if ($coll eq 'collid')
  {
    $gLoadedName = 'loaded';
  }

Be sure to replace "collid" with the collection's ID!

Use of "loaded" requires that filenames in the data records also be a concatenation of the filename with the parent directory path. After running mediaprep, look within the $DLXSDATAROOT/img/c/coll/index directory at the names of the .inf files. The media filename should be the same as the name of the .inf file, except with the .inf extension changed to the appropriate media file extension.


More about Media File Storage

Media files must be stored using the following collection level directory convention:

  • $DLXSDATAROOT/img/c/collid
  • collid is the unique alphabetic abbreviation of the collection name.

For example, the unique collid for the collection "French Architecture" is "sampleic", in which case the image file directory is...

$DLXSDATAROOT/img/s/sampleic

The software assumes that every image has a thumbnail image and a larger display image. This is not a requirement, however, and a thumbnail is not expected for media types other than still images.

Within the collection directory, images should be stored in the following locations.

  • Thumbnail images corresponding with large JPEG2000 images must be in a directory or directories named "thumbjp2".
  • All other thumbnail images must be in a directory or directories named "thumb".
  • All JPEG2000 files must be in a directory or directories named "jp2".
  • All SID format images may be anywhere in the collection directory.
  • All larger, non-thumbnail JPEG images may be anywhere in the collection directory, though not in the "index" directory.
  • Image files may be within multiple levels of sub-directories.
  • The directory name "index" is reserved for building an index directory of symlinks for all images for the collection. Do not use "index" for storing actual image files. The index directory is created automatically.
  • To clarify with an example, the Image Class middleware assumes that all JPEG files in directories named "thumbjp2" within the $DLXSDATAROOT/img/c/collid structure are thumbnails to be displayed in association with JPEG2000 images (stored in "jp2" directories). All JPEG images not in thumb directories are assumed to be single-resolution images for large display.

    Example:

    The "sampleic" collection can again be used as a simple example. $DLXSDATAROOT/img/s/sampleic has the following directories...

       drwxrwxr-x 4 jweise dlps 512 Feb 15 14:37 index
    
       drwxrwxr-x 2 jweise dlps 2048 Feb 15 13:30 sid
       drwxrwxr-x 2 jweise dlps 2048 Jun 8 1998 thumb
    

    $DLXSDATAROOT/img/s/sampleic/sid contains all of the multiple-resolution SID format files for the collection.

    $DLXSDATAROOT/img/s/sampleic/thumb contains all of the thumbnail-size JPEG format files for the collection.

    The sampleic collection does not have any large JPEG files since it relies on SID files for large display.

    The flexibility of structure within the collection specific image directory is intentional and supports the variety of directory structures that are typically encountered.

    For example, at Michigan, we find it useful to load the image files on the production server in a structure that reflects the CD that the master image files are stored on. A single collection can easily have dozens of CDs worth of master image data (typically TIFF format files). In the process of generating SID and JPEG files, we maintain at least the name of the CD in the name of the directory that the SID and JPEG images are kept in.

    • Master CDs:
      • CD0001
        • image001.TIF
          image002.TIF
          ...
          image050.TIF

        ...

        CD0005

        • image201.TIF
          image202.TIF
          ...
          image250.TIF

      Image directories (that are loaded to the production server in the $DLXSDATAROOT/img/c/collid directory):

      • CD0001
        • sid
          • image001.SID
            image002.SID
            ...
            image050.SID

          thumb

          • image001.JPG
            image002.JPG
            ...
            image050.JPG

        ...

        CD0005

        • sid
          • image201.SID
            image202.SID
            ...
            image250.SID

          thumb

          • image201.JPG
            image202.JPG
            ...
            image250.JP

    It is fine for the images to be loaded in this type of hierarchical structure. When the "mediaprep" program creates the index directory, it recursively parses directories to find all media files.
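
Because subdirectories may be nested freely, it can be handy to check the one placement rule that is strict for JPEG2000: those files belong beneath a directory named "jp2". The following is a rough sketch using standard find; the function name misplaced_jp2 is hypothetical, and the check is an assumption about what is useful to verify, not something mediaprep is documented to do.

```shell
# Hypothetical check: list .jp2 files that are not beneath a directory
# named "jp2" anywhere under the given collection directory.
misplaced_jp2() {
    find "$1" -type f \( -name '*.jp2' -o -name '*.JP2' \) ! -path '*/jp2/*' -print
}
```

An empty result from misplaced_jp2 $DLXSDATAROOT/img/c/collid suggests that every JPEG2000 file follows the convention.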

Outgoing Files

You may find that, for various reasons, media files get loaded to Image Class more than once over time. For example, maybe the JPEG2000 files need to be regenerated at a higher quality and reloaded. Whatever the reason, mediaprep helps to manage the reload process by moving the files being replaced to a directory called "outgoing" located at...

$DLXSDATAROOT/img/c/collid/outgoing

mediaprep checks filenames for duplicates and, in the case of duplicates, keeps the newest file and moves older copies to outgoing. Optionally, you can tell it not to move files to outgoing with the keepold=1 command line option, in which case the newest file will still be used, but the others will be left alone.

When moving files to outgoing, mediaprep also tries to move corresponding thumbnail images and text files. It is generally successful at this.
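
Before running mediaprep, it can help to preview which filenames occur more than once in a collection. This is a minimal sketch with standard tools; it assumes that "duplicate" means an identical base filename outside the thumb, thumbjp2, index, and outgoing directories, which may not match mediaprep's own comparison exactly. The function name list_duplicate_names is hypothetical.

```shell
# Hypothetical helper: print each base filename that appears more than
# once under the given directory, skipping thumbnail, index, and
# outgoing directories.
list_duplicate_names() {
    find "$1" \( -name thumb -o -name thumbjp2 -o -name index -o -name outgoing \) -prune \
        -o -type f -print |
        sed 's|.*/||' | sort | uniq -d
}
```

Each name printed is a candidate for having its older copies moved to outgoing on the next run.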

Another option, sidjp2=1, causes JPEG2000 and MrSID files to be treated equally in terms of replacement. That is, a JPEG2000 file can cause a MrSID file to be pushed to the "outgoing" directory, and vice versa. The file with the newest timestamp wins, as always. This is a useful feature if replacing all MrSID files with JPEG2000 files, as we are at Michigan, but it could also be used if going the other direction (replacing all JPEG2000s with MrSIDs). It is important to realize that the sidjp2 option must be used in conjunction with the purge option; otherwise, comparison of MrSIDs to JPEG2000s could not be done accurately, because there might be multiple rows already in the database table.

Files inadvertently moved to outgoing by mediaprep may be moved back to their original location, though think carefully about the implications before doing so. The process involves tarring up the relevant directories in outgoing and untarring them at the root of the old location. Proceed with caution.


Advanced Customization

Image Class can be configured to support a wide variety of directory and filenaming conventions. This configuration is done in the $DLXSROOT/bin/i/image/localimageprep.cfg file. Copy the %gTypeHash and @gTypeComparisonOrder definitions from imageprep.cfg to localimageprep.cfg and modify them to support local conventions.

It is even possible to add support for multiple image sizes for a single image in a non-MrSID format such as JPEG. That is, you could have small, medium, and large JPEG images for a single item and have them all be available to the user in the interface. Michigan has done this for the APIS (Papyrus) collection, where there is a mix of MrSID and JPEGs at multiple sizes (though the examples are few and far between; in fact, I can't find one at the moment). Zooming is not possible without MrSID, but the user may select from the multiple sizes.

Below is the %gTypeHash Michigan added to localimageprep.cfg to handle the APIS collection. The hash holds for each type of image an array of regular expressions used to match image files. For APIS, thumbnail images must either be JPEG files in a "thumb" directory or GIF files with "-tn" preceding the extension. SID images are also allowed and can reside anywhere. JPEGs that aren't thumbnails are assumed to be large image files and are given the label "1200", which is a somewhat arbitrary estimation of the maximum pixel dimension of the file. JPEGs with "-50" preceding the extension are labelled as large JPEGs with maximum pixel dimensions of 600. It does not matter too much what the labels are as long as they cause the images to sort properly by size in the user interface.

The @gTypeComparisonOrder array is important because it specifies an order of precedence for identifying images. In this case "thumb" is checked first, and if the filename matches one of the "thumb" regular expressions, it is not tested against the other types (i.e., sid, 600, 1200).


if ($coll eq 'apis')
  {
    %gTypeHash =
      (
       'image:::dynamic:-:thumb' => [
                                     '/thumb/([^/]+)\\.(jpg)',
                                     '/thumb/([^/]+)\\.(JPG)',
                                    ],
       'image:::fixed:-:thumb' => [
                                   '/([^/]+)-tn\\.(gif)',
                                   '/([^/]+)-tn\\.(GIF)',
                                  ],

       'image:::dynamic:-:sid' =>   ['/([^/]+)\\.(sid)', '/([^/]+)\\.(SID)'],
       'image:::fixed:-:600' =>   ['/([^/]+)-50\\.(jpg)', '/([^/]+)-50\\.(JPG)'],
       'image:::fixed:-:1200' =>  ['/([^/]+)\\.(jpg)', '/([^/]+)\\.(JPG)'],
      );
    @gTypeComparisonOrder = ('image:::dynamic:-:thumb', 'image:::fixed:-:thumb',
                             'image:::fixed:-:600', 'image:::fixed:-:1200', 'image:::dynamic:-:sid');
  }
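
The effect of the comparison order can be tried out without touching the configuration. The sketch below mimics the first-match-wins behavior with a shell case statement, whose branches are also tested in order. It is only an approximation: the glob patterns stand in for the regular expressions above (a glob "*" also matches "/", unlike "[^/]+"), and the function name classify_apis is hypothetical.

```shell
# Illustrative only: classify a path using the same precedence as the
# APIS @gTypeComparisonOrder (thumb, then 600, then 1200, then sid).
classify_apis() {
    case $1 in
        */thumb/*.jpg|*/thumb/*.JPG) echo 'image:::dynamic:-:thumb' ;;
        *-tn.gif|*-tn.GIF)           echo 'image:::fixed:-:thumb' ;;
        *-50.jpg|*-50.JPG)           echo 'image:::fixed:-:600' ;;
        *.jpg|*.JPG)                 echo 'image:::fixed:-:1200' ;;
        *.sid|*.SID)                 echo 'image:::dynamic:-:sid' ;;
        *)                           echo 'unrecognized' ;;
    esac
}
```

Note that a name such as p1-50.jpg also matches the plain ".jpg" pattern, so it is labelled 600 only because that branch is tested before the 1200 branch. This is the same reason the order of entries in @gTypeComparisonOrder matters.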
