DLPS Development Environment

From DLXS Documentation

Revision as of 20:30, 24 July 2007 by Cboulay

This document describes various aspects of the development environment used at DLPS. Parts of our middleware make some reasonable assumptions about the specific DLPS development environment, for example by consulting certain environment variables. However, the DLXS system does not rely on most of these assumptions in order to work, so it will also run in the more "plain vanilla" environment that a DLXS site might set up. Nonetheless, DLXS has valuable mechanisms built in to support development that you may wish to take advantage of.

Source Control

We use CVS to handle source control, merging of multiple developers' code, tagging specific configurations of many files, releasing particular tagged versions of files, etc. The CVS source repository stands between each developer's work space and what we call the release directory structure. Each developer's work space is volatile and is expected to change frequently.

Completed work is committed to the CVS repository from the developer's workspace by issuing the cvs commit command. The release directory structure is updated via the cvs update command which pulls the committed source out of the CVS repository and into the release directory structure.

The release directory structure is semi-stable and does not change as frequently. We never modify files in the release directory structure directly. We always change files in the work directory, commit to CVS, and update into the release directory structure.

The release directory structure is an integration test area used to determine whether changes from multiple developers work together. Finally, after the release directory structure has been tested, we copy it to one of our production machines. The production directory structure is the most stable environment.

In the absence of CVS you will need to copy files from your work directories to the release directories manually or with the aid of scripts. Of course, without CVS or a similar source control system, you will not have file versions, merging, etc.
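
A copy script of the kind mentioned above might look like the following minimal Python sketch. It is not part of DLXS: the function name sync_tree and the decision to compare file contents before copying are assumptions made for illustration. It copies new or changed files from a work tree into a release tree and never deletes anything from the release tree.

```python
import filecmp
import shutil
from pathlib import Path


def sync_tree(work_root, release_root):
    """Copy new or changed files from work_root into release_root.

    Returns the relative paths (POSIX style) of the files that were copied.
    Files that exist only in the release tree are left untouched.
    """
    work_root = Path(work_root)
    release_root = Path(release_root)
    copied = []
    for src in sorted(work_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(work_root)
        dest = release_root / rel
        # Skip files whose content already matches the release copy.
        if dest.exists() and filecmp.cmp(src, dest, shallow=False):
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(rel.as_posix())
    return copied
```

Unlike CVS, a script like this keeps no history and cannot merge concurrent changes, so the last developer to copy a file wins.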

Support for Multiple Virtual Host-based Work Directories

At DLPS, we have separate work directories that mirror most of the release/production directory structure. This allows us to have multiple developers working on code, making changes to collection databases, etc. at the same time.

We use a development model designed to give each developer a workspace (directory structure and environment) separate from the release directory structure. As of Release #9, this structure will supersede the work directory model that used the special URL syntax in previous DLXS releases. In the virtual-host-based model, a virtual host is configured on the web server for each developer.

Here is an example Apache virtual host configuration supporting a workspace for smith:


<VirtualHost 123.456.789.123:80>
  ServerName smith.dev.umdl.umich.edu
  ServerAlias smith smith.dev smith.dev.umdl

  DocumentRoot          /l1/dev/smith/web
  ScriptAlias   /cgi/   /l1/dev/smith/cgi/

  SetEnv DLXSROOT       /l1/dev/smith
  SetEnv DLXSDATAROOT   /l1
  SetEnv DLPS_DEV       smith
  SetEnv REMOTE_USER    smith
</VirtualHost>

Note that DocumentRoot and ScriptAlias point into /l1/dev/smith, which holds smith's set of development directories, in particular cgi and web.

Each developer has a domain, e.g., smith.dev.umdl.umich.edu. Each virtual host sets a unique DLXSROOT environment variable value that points to a complete DLXS directory structure. For example, smith has a DLXSROOT value of /l1/dev/smith. Into this directory we check out the CVS repository. This gives each developer a mirror of the entire CVS development directory structure.
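
Because every developer's virtual host follows the same pattern, the blocks can be generated rather than written by hand. The following Python sketch is one way to do that. The function render_vhost and its parameters are hypothetical, and the default values are simply copied from the smith example, including its placeholder address.

```python
def render_vhost(developer,
                 address="123.456.789.123:80",   # placeholder from the example
                 domain="dev.umdl.umich.edu",
                 dev_base="/l1/dev",
                 data_root="/l1"):
    """Return an Apache <VirtualHost> block for one developer's workspace."""
    root = f"{dev_base}/{developer}"
    lines = [
        f"<VirtualHost {address}>",
        f"  ServerName {developer}.{domain}",
        "",
        f"  DocumentRoot          {root}/web",
        f"  ScriptAlias   /cgi/   {root}/cgi/",
        "",
        f"  SetEnv DLXSROOT       {root}",       # unique per developer
        f"  SetEnv DLXSDATAROOT   {data_root}",  # shared data and indexes
        f"  SetEnv DLPS_DEV       {developer}",
        f"  SetEnv REMOTE_USER    {developer}",
        "</VirtualHost>",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # "jones" is a hypothetical second developer.
    print(render_vhost("jones"))
```

Only DLXSROOT, DLPS_DEV, REMOTE_USER and the host name vary between developers; DLXSDATAROOT stays the same so that all workspaces share one copy of the data.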

To run smith's development copy of the middleware, the basic URL (for TextClass, for example) would be: http://smith.dev.umdl.umich.edu/cgi/t/text/text-idx

DLPS_DEV is the environment variable that signals to the middleware that it is running in development mode. The virtual host should set DLPS_DEV to the user ID that identifies the developer to the system, e.g., smith. At DLPS we use the developer's uniqname. If you maintain a "release" directory tree (separate from each developer's directory tree) where code from multiple developers is brought together for integration, set DLPS_DEV to "1" for the virtual host that runs middleware from this area.
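
The three cases for DLPS_DEV can be summarized in a short Python sketch. This is an illustration only, not the middleware's code (which is Perl). The function name development_mode, the labels "production", "release" and "developer", and the treatment of an empty value as unset are all assumptions.

```python
import os


def development_mode(environ=None):
    """Classify the runtime from DLPS_DEV.

    Returns a (mode, developer) pair:
      ("production", None)   DLPS_DEV is unset (or empty, by assumption)
      ("release", None)      DLPS_DEV is "1", the shared integration tree
      ("developer", "smith") DLPS_DEV names an individual developer
    """
    env = os.environ if environ is None else environ
    value = env.get("DLPS_DEV", "")
    if not value:
        return ("production", None)
    if value == "1":
        return ("release", None)
    return ("developer", value)
```

For example, development_mode({"DLPS_DEV": "smith"}) returns ("developer", "smith"), while an environment without DLPS_DEV yields ("production", None).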

When running in development mode (the DLPS_DEV environment variable is set) the middleware supports a number of mechanisms to assist debugging and general development:

  • When there is an assertion failure, a full stack traceback is displayed instead of just the associated message intended for the general public.
  • The developer's copy of the database rows is read instead of the production copy. Note: the "release" (in the above sense) copy of the database rows is read if DLPS_DEV=1 rather than, for example, DLPS_DEV=smith.
    This mechanism is part of a facility in collmgr that gives each developer a set of collection database rows. This allows each developer to experiment with database changes without affecting the rows in use by the released middleware. Additional information on this mechanism is available in the collmgr documentation.
  • The perl "use strict" pragma is enabled to support additional error checking, such as catching references to undeclared variables.
  • It is often convenient for the developer to control exactly which collections are authorized rather than relying on the Authentication/Authorization system to supply this information. This information is transmitted to the DLXS middleware through the AUTHZD_COLL environment variable. If this variable is not set (as might be the case for an individual developer's virtual host), its value is read from a file named AUTHZD_COLL stored in DLXSROOT/cgi/c/class, e.g. /l1/dev/smith/cgi/t/text/AUTHZD_COLL. The collection IDs that the developer wants to be authorized are listed one per line in this file, and the middleware populates the AUTHZD_COLL environment variable on its own from this data.
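
The AUTHZD_COLL fallback described in the last item can be sketched as follows. This Python is illustrative only and is not the middleware's Perl. The function name fallback_collection_ids is hypothetical, and ignoring blank lines is an assumption. This document does not describe how the IDs are encoded inside the environment variable, so the sketch returns them as a list and does not attempt to build the variable's value.

```python
import os
from pathlib import Path


def fallback_collection_ids(class_dir, environ=None):
    """Return collection IDs from the AUTHZD_COLL file in class_dir.

    Returns None when the AUTHZD_COLL environment variable is already set,
    because in that case the file is not consulted. Returns an empty list
    when the variable is unset and the file does not exist.
    """
    env = os.environ if environ is None else environ
    if env.get("AUTHZD_COLL"):
        return None
    listing = Path(class_dir) / "AUTHZD_COLL"
    if not listing.is_file():
        return []
    # One collection ID per line; surrounding whitespace and blank lines
    # are dropped (an assumption, not documented behavior).
    return [line.strip()
            for line in listing.read_text().splitlines()
            if line.strip()]
```

For smith's TextClass workspace, class_dir would be /l1/dev/smith/cgi/t/text.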

The DLXSDATAROOT environment variable allows the data to be under a different directory root than the middleware programs. We created DLXSDATAROOT because data and indexes are too large to duplicate for each developer and are not generally CVS source controlled. Each virtual host sets this variable to point to the root of the release directory structure (outside of the developer's workspace) that contains the obj and idx directories where data and indexes reside. DLXSDATAROOT is optional. If it is set, the middleware favors it over DLXSROOT when accessing data and indexes. If it is not set, the middleware will use DLXSROOT to locate the data. In that event, you would need a copy of the data directories and their content in your development space.
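
The lookup order for the data root can be expressed in a few lines. The following Python sketch is an illustration under stated assumptions, not the middleware's Perl implementation. The function names data_root and data_dirs are hypothetical, and raising an error when neither variable is set is an assumption.

```python
import os


def data_root(environ=None):
    """Return DLXSDATAROOT if it is set, otherwise fall back to DLXSROOT."""
    env = os.environ if environ is None else environ
    root = env.get("DLXSDATAROOT") or env.get("DLXSROOT")
    if not root:
        raise KeyError("neither DLXSDATAROOT nor DLXSROOT is set")
    return root


def data_dirs(environ=None):
    """Locate the obj and idx directories under the chosen root."""
    root = data_root(environ)
    return {"obj": f"{root}/obj", "idx": f"{root}/idx"}
```

With the smith virtual host shown earlier (DLXSROOT=/l1/dev/smith, DLXSDATAROOT=/l1), data_dirs returns /l1/obj and /l1/idx. Without DLXSDATAROOT it would return /l1/dev/smith/obj and /l1/dev/smith/idx, which is why the data would then have to be copied into the development space.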

The REMOTE_USER environment variable is set for convenience to mimic the action of the authorization system, which is absent on the development machines. The middleware determines whether the user is logged in by testing REMOTE_USER for a value.
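
The login test amounts to checking for a non-empty value. A minimal Python sketch of that check follows. The function name is_logged_in is hypothetical, and treating an empty string the same as an unset variable is an assumption.

```python
import os


def is_logged_in(environ=None):
    """Report whether REMOTE_USER carries a value."""
    env = os.environ if environ is None else environ
    return bool(env.get("REMOTE_USER"))
```

With SetEnv REMOTE_USER smith in the virtual host, every request to the development host appears to come from a logged-in user.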