ResumptionTokens



Resumption Tokens

Protocol Definition

Resumption tokens are an optional feature of the OAI protocol that allows data providers to institute a measure of flow control. They allow data providers to chunk responses to list requests, specifically the ListRecords, ListIdentifiers, and ListSets requests. When an OAI repository receives a list request, it can respond with an incomplete list that includes a resumptionToken. The harvester then requests the next chunk of the list by including the resumptionToken in its next request. The repository, upon receiving a request containing a resumptionToken, must provide the next chunk of the response, as indicated by the resumptionToken. This sequence (the incomplete list sequence) continues until the repository sends the last incomplete list response. That final response must include an empty resumptionToken element to indicate that the list request is complete.
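The incomplete list sequence can be sketched from the harvester's side as follows. This is a minimal illustration, not a complete harvester: the function names (`http_fetch`, `harvest_identifiers`) and the injectable `fetch` parameter are assumptions made for the example, and OAI error responses are not handled.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(base_url, params):
    """Issue one OAI-PMH GET request and return the response body as bytes."""
    # urlencode also URL-encodes the resumptionToken value in the request.
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest_identifiers(base_url, metadata_prefix, fetch=http_fetch):
    """Yield every identifier from a ListIdentifiers incomplete list sequence."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url, params))
        for header in root.iter(OAI_NS + "header"):
            yield header.findtext(OAI_NS + "identifier")
        token = root.find(".//" + OAI_NS + "resumptionToken")
        # No element, or an empty element, means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests carry only the verb and the resumptionToken.
        params = {"verb": "ListIdentifiers",
                  "resumptionToken": token.text.strip()}
```

Passing a different `fetch` callable makes it possible to exercise the loop against canned responses without touching the network.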

The entities contained in an incomplete list response (i.e. the OAI records, headers, or sets) must be intact and complete. A repository may not issue half of an OAI record in one incomplete list response and the second half in the next.

A further requirement, called idempotency in the OAI protocol document (see also idempotency in the Implementation Guidelines), is that a repository must be able to accept the same resumptionToken more than once and return the same response (that is, the contents of the chunk are the same). This allows harvesters to recover from lost responses, as they can simply make the same request again. The only exception is when records in the complete list have been added, modified, or deleted since the initial request, so that their datestamps have moved into or out of the datestamp range of that request. The protocol states that in this case "strict idempotency of the incomplete-list requests using resumptionToken values is not required. Instead, the incomplete list returned in response to a re-issued request must include all records with unchanged datestamps within the range of the initial list request. The incomplete list returned in response to a re-issued request may contain records with datestamps that either moved into or out of the range of the initial request. In cases where there are substantial changes to the repository, it may be appropriate for a repository to return a badResumptionToken error, signaling that the harvester should restart the list request sequence."
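Idempotency is what makes a simple retry safe for a harvester. The sketch below re-issues the identical request after a lost response; `fetch_with_retry` and the `fetch` callable it wraps are hypothetical names, and the retry count and delay are arbitrary example values.

```python
import time


def fetch_with_retry(fetch, params, attempts=3, delay_seconds=1.0):
    """Call fetch(params) and re-issue the identical request if it fails.

    `fetch` is a hypothetical callable that performs one OAI request and
    raises OSError (the base class of urllib.error.URLError) on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            # Same params, including the same resumptionToken, every time.
            return fetch(params)
        except OSError as error:
            last_error = error
            if attempt + 1 < attempts:
                time.sleep(delay_seconds)
    raise last_error
```

A harvester that still fails after the last attempt, or that receives a badResumptionToken error, would fall back to restarting the list request sequence.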

The format of the resumptionToken is not specified by the protocol.

A resumptionToken may include attributes for the expiration date of the resumptionToken (expirationDate), the complete list size (completeListSize), and a count of the elements of the complete list returned so far (cursor, which starts at 0).

The Open Archives Initiative outlines flow control and use of the resumptionToken in the protocol (http://www.openarchives.org/OAI/openarchivesprotocol.html#FlowControl) and in the Implementation Guidelines (http://www.openarchives.org/OAI/2.0/guidelines-repository.htm#resumptionToken). Repository implementers should be familiar with both of these documents.

Data providers may also be interested in the Flow Control and Load Balancing section of the Implementation Guidelines.

Best Practices for resumptionTokens

It is a best practice to implement resumptionTokens in any sizeable (over 2 MB) OAI repository. Implementing resumptionTokens is beneficial for both the data provider (it limits the load on its server) and the service provider (it can process responses in reasonable chunks).

The incomplete list size should be reasonable. The ideal response size is probably between 0.5 and 2 MB, though this may depend on the capabilities of both the data provider and the service provider. If responses are too big, they may be difficult to retrieve reliably via HTTP. If responses are too small, a great deal of extra network traffic is required to harvest a repository's records. The number of records returned per resumptionToken will therefore often be determined by the size of the metadata rather than by a fixed record count.
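One way a repository might size its incomplete lists by bytes rather than by record count is sketched below. The function name `chunk_by_size` and the 1 MB default are assumptions chosen for illustration; the input is assumed to be an iterable of already serialized records.

```python
def chunk_by_size(serialized_records, max_bytes=1_000_000):
    """Group serialized records into chunks of at most max_bytes each.

    A record larger than max_bytes still gets a chunk of its own, because
    records must never be split across incomplete list responses.
    """
    chunk, chunk_bytes = [], 0
    for record in serialized_records:
        size = len(record.encode("utf-8"))
        # Close the current chunk before it would exceed the limit.
        if chunk and chunk_bytes + size > max_bytes:
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(record)
        chunk_bytes += size
    if chunk:
        yield chunk
```

With small metadata records a chunk holds many records, and with large ones only a few, which matches the observation that response size matters more than record count.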

A data provider does not need to format a resumptionToken so that it is understandable to a service provider.

A data provider should decide whether or not to allow resumptionTokens to expire (the best practices are neutral on this issue). However, if resumptionTokens do expire, it is a best practice to include the expirationDate attribute in the resumptionToken. Expiration dates should be set to allow harvesters adequate time to process the last incomplete list received. The Implementation Guidelines recommend that resumptionTokens remain valid for at least tens of minutes.

It is a best practice to include both the completeListSize and cursor attributes. The completeListSize attribute is often the only place where there is an indication of the total number of records that will be included in the complete response, and thus is a useful indicator of the size of the OAI repository. (This information might also be recorded in the Identify response as well as set descriptions.)
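A repository could assemble the resumptionToken element with all three attributes roughly as follows. This is a sketch under stated assumptions: `build_resumption_token` is a hypothetical helper, the 30-minute lifetime is an example value, `now` is assumed to be a timezone-aware datetime, and cursor is used in the protocol's sense of the number of list items returned before the current incomplete list (so it starts at 0).

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone


def build_resumption_token(token_value, complete_list_size, cursor,
                           now=None, lifetime=timedelta(minutes=30)):
    """Return a resumptionToken element as an XML string.

    Pass token_value="" for the final chunk; the element is then empty
    and carries no expirationDate.
    """
    element = ET.Element("resumptionToken")
    if token_value:
        now = now or datetime.now(timezone.utc)
        expires = (now + lifetime).astimezone(timezone.utc)
        # OAI-PMH datestamps are UTC in the form YYYY-MM-DDThh:mm:ssZ.
        element.set("expirationDate", expires.strftime("%Y-%m-%dT%H:%M:%SZ"))
        element.text = token_value
    element.set("completeListSize", str(complete_list_size))
    element.set("cursor", str(cursor))
    return ET.tostring(element, encoding="unicode")
```

Keeping completeListSize and cursor on the final, empty element lets a harvester confirm that it has received the whole list.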

resumptionTokens in the response should not be URL encoded. This is different from an OAI request, in which resumptionTokens MUST be URL encoded. It is a best practice not to use characters in resumptionTokens that require URL encoding.
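Since the token format is left to the data provider, one possible approach is to pack the request state into a string that uses only characters that never need URL encoding. The sketch below is one hypothetical format, not something the protocol prescribes: `encode_token`, `decode_token`, and the state fields shown in the usage note are all assumptions.

```python
import base64
import json


def encode_token(state):
    """Pack a small dict of list-request state into a URL-safe token."""
    raw = json.dumps(state, sort_keys=True, separators=(",", ":"))
    encoded = base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii")
    # The URL-safe alphabet is A-Z, a-z, 0-9, "-" and "_"; drop "=" padding
    # so that no character in the token requires URL encoding.
    return encoded.rstrip("=")


def decode_token(token):
    """Reverse encode_token; raises ValueError for a malformed token."""
    padded = token + "=" * (-len(token) % 4)
    try:
        return json.loads(base64.urlsafe_b64decode(padded))
    except ValueError as error:
        # Covers bad base64, bad UTF-8, and bad JSON alike.
        raise ValueError("malformed resumptionToken") from error
```

For example, `encode_token({"offset": 200, "metadataPrefix": "oai_dc"})` yields a token that can be placed in a request URL unchanged. A repository that catches the ValueError from `decode_token` would typically answer with a badResumptionToken error.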

It is a best practice to issue a badResumptionToken error to stop the sequence of requests if a repository cannot continue to complete a request. This might occur for a variety of reasons, including significant changes, additions, and/or deletions to a repository that would affect the idempotency of the resumptionToken. The harvester is then expected to start again from the initial list request. It is a best practice to include human-readable text that explains the badResumptionToken response (for example: The resumptionToken has expired).
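The error element with its explanatory text might be produced like this. Only the error element itself is shown; in a real response it sits inside the usual OAI-PMH envelope. The helper name `bad_resumption_token_error` is an assumption for the example.

```python
import xml.etree.ElementTree as ET


def bad_resumption_token_error(reason):
    """Return an OAI-PMH error element with a human-readable explanation."""
    error = ET.Element("error", {"code": "badResumptionToken"})
    error.text = reason
    return ET.tostring(error, encoding="unicode")


print(bad_resumption_token_error("The resumptionToken has expired"))
# <error code="badResumptionToken">The resumptionToken has expired</error>
```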

This is an example of a resumptionToken from the http://hal.ccsd.cnrs.fr/oai/oai.php repository:

    <resumptionToken expirationDate="2005-07-26T16:57:24Z" completeListSize="31979"
    cursor="4">lr42e519f4d1e58</resumptionToken>

This resumptionToken indicates when it will expire (expirationDate), the current cursor value (cursor), and the complete number of records for the ListRecords request (completeListSize). As stated above, it is a best practice to include both the completeListSize and cursor attributes in a resumptionToken.
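A harvester might read these attributes as sketched below. The element is parsed on its own here, without the OAI-PMH namespace it would have inside a full response, and `is_expired` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

example = ('<resumptionToken expirationDate="2005-07-26T16:57:24Z" '
           'completeListSize="31979" cursor="4">'
           'lr42e519f4d1e58</resumptionToken>')

element = ET.fromstring(example)
token_value = element.text                      # opaque value to send back
complete_list_size = int(element.get("completeListSize"))
cursor = int(element.get("cursor"))
# expirationDate is a UTC datestamp, so attach the UTC timezone explicitly.
expires = datetime.strptime(
    element.get("expirationDate"), "%Y-%m-%dT%H:%M:%SZ"
).replace(tzinfo=timezone.utc)


def is_expired(expiration, now=None):
    """Return True once the expiration time has passed."""
    return (now or datetime.now(timezone.utc)) >= expiration
```

Because all three attributes are optional, a more defensive harvester would check that `element.get(...)` did not return None before converting the values.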
