NamespacesAndSchemas



XML Schemas and Namespaces

Summary of Best Practices

  • Use XML Schemas endorsed by relevant communities.
  • Use XML Namespaces when required to validate XML metadata against a given XML Schema.

As pointed out in "General Areas of Competency Needed to be an OAI Data Provider," a working knowledge of XML, XML namespaces, and XML schemas is fundamental to being an OAI data provider. The following are some basic pieces of information important to providing shareable XML metadata. The NSDL XML FAQ from the NSDL Metadata Primer and the Wikipedia entry on XML are helpful resources for beginners.

XML Schemas

XML Schemas are a way to indicate the expected structure of XML documents, using a machine-readable grammar; they allow for machine validation of the contents of an XML file.

The OAI protocol requires that every OAI response validate against the OAI-PMH XML Schema at http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd. Thus, service providers can anticipate the format of the information they will harvest, and both data and service providers can automatically validate responses against the XML schema to ensure consistency. Note that the OAI-PMH schema deliberately nests multiple XML schemas: a single OAI ListRecords response uses the OAI-PMH XML schema for the OAI response elements, an XML schema for the metadata format of the records, and additionally, XML schemas for any OAI "about" blocks associated with the records.

To reiterate: every OAI response, including every metadata record you provide, must be XML schema valid. If a particular metadata record has an element not allowed by the XML schema for its metadata, then that particular record will not be valid. If a particular record has a bad value according to the XML schema for its metadata, then that record will not be valid. For example, metadata in the oai_dc format must validate according to the XML schema provided by OAI (at http://www.openarchives.org/OAI/2.0/oai_dc.xsd), which allows only the 15 Simple Dublin Core elements. Additional metadata formats served by an OAI repository must validate to the XML schema indicated both in the ListRecords responses and in the ListMetadataFormats response. More information about this is included below.
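As a rough illustration of the "element not allowed" case, the following Python sketch checks that every child of an oai_dc record is one of the 15 Simple Dublin Core elements. This is only a pre-check and is NOT a substitute for XML schema validation, which also checks structure and values. The function name unexpected_elements and the sample records are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# The 15 Simple Dublin Core element names.
SIMPLE_DC = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def unexpected_elements(record_xml):
    """Return the tags of child elements that are not Simple DC elements.

    Hypothetical helper: a quick pre-check only, not schema validation.
    """
    root = ET.fromstring(record_xml)
    allowed = {"{%s}%s" % (DC_NS, name) for name in SIMPLE_DC}
    return [child.tag for child in root if child.tag not in allowed]


# A record with an allowed element, and a copy using a non-Simple-DC element.
good = (
    '<oai_dc:dc xmlns:oai_dc="%s" xmlns:dc="%s">'
    "<dc:title>NSDL Metadata Primer</dc:title>"
    "</oai_dc:dc>" % (OAI_DC_NS, DC_NS)
)
bad = good.replace("dc:title", "dc:audience")

print(unexpected_elements(good))
print(unexpected_elements(bad))
```

A record that passes this check can still fail schema validation, so a real validator (see the tools listed on this page) remains necessary.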

When choosing which XML Schema to use for a given metadata format, best practice is to use XML schemas that have been officially vetted by specific communities, governing agencies, etc. A benefit is that the schema will have been thoroughly tested for completeness, errors, and compliance with related standards. This is demonstrable through the widespread use of the schema by other users in a given community. See Potential Metadata Formats for Use with OAI for some metadata formats with official XML schemas.

In instances where no officially vetted XML schema exists, providers may opt to generate their own. Of course, the XML schema itself must validate. The XML schema creator is also required to make the schema persistent and accessible online so that documents bound to the specification can be effectively validated. Any new version of the schema should replace the older version at the same location and be backwards compatible whenever possible. The older version should be archived. It is also advisable for schema creators to generate crosswalks that orient other users to the corresponding concepts and data elements in related metadata formats.

As far as possible, the data provider shall utilize an existing XML schema. If the data provider needs additional elements, s/he shall develop a schema. This new XML schema should refer to appropriate namespaces when using concepts from existing schemas. For example, if the schema is a profile of the Dublin Core element set with two additional elements, all elements referring to the DC concepts should be labeled with the appropriate DC namespaces.

XML Schema References

A specific overview of XML namespaces and the use of XML schemas within an OAI environment can be found in the National Science Digital Library's "XML, Namespaces, and Schemas" FAQ.

More information can be found from these general resources:

  1. the XML Schema Primer (http://www.w3.org/TR/xmlschema-0/)
  2. the XML Schema specification (http://www.w3.org/XML/Schema)
  3. the OAForum tutorial (http://www.oaforum.org/tutorial/english/page5.htm#section4)
  4. book: _Definitive XML Schema_ by Priscilla Walmsley. Prentice Hall PTR 2002.

XML Schema Validation tools

The XML schema validation of an XML document (including an XML schema itself) can be tested with a variety of XML schema tools, including:

  1. the online W3C Schema validator, XSV (http://www.w3.org/2001/03/webdata/xsv)
  2. the variety of tools listed at the XML Schema page (http://www.w3.org/XML/Schema)
  3. Xerces from Apache (http://xml.apache.org/). It has been our experience that Xerces validation is more rigorous and correct than other tools. However, by default Xerces has validation turned off, and it defaults to DTD validation rather than schema validation, so the appropriate options must be selected. Make sure you understand these features.

XML Namespaces

XML namespaces serve as mechanisms to contextualize or scope information in XML instance documents. For example, in a registrar's office, <pass> in an XML document may mean a student succeeded in a course, while <pass> in an XML document about soccer may mean one player has sent the ball to another player. XML namespaces prevent "name collision" -- when a single name has an ambiguous meaning because it means different things in different contexts. XML namespaces are used to disambiguate XML information, such as XML element or attribute names.

An XML namespace name must be a valid URI. However, there is no requirement that a namespace URI be resolvable -- many XML namespace URIs will return nothing if used as URLs in a web browser even though they use the http: URI scheme.

XML namespace declarations bind namespace prefixes to Uniform Resource Identifiers (URIs). Let's examine the following XML document:

    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"  
    xmlns:dc="http://purl.org/dc/elements/1.1/" >  
    <dc:title>NSDL Metadata Primer</dc:title>  
    </oai_dc:dc>  

There are two namespaces and two elements in the example. The two namespaces are declared with the "xmlns:oai_dc" and "xmlns:dc" attributes of the outer element. "xmlns" indicates an XML namespace declaration ("XML NameSpace" = xmlns). The attribute name characters after "xmlns:" are the XML namespace prefix that will be used to indicate this namespace in qualified names in the XML document. The value of the xmlns attribute is the URI for the namespace. Note that the oai_dc namespace URI, http://www.openarchives.org/OAI/2.0/oai_dc/, is a URL, but does not resolve to anything if it is entered into a web browser. However, this URL is clearly in the domain of the Open Archives Initiative, which provides the definition for the XML elements in the namespace. Thus, the namespace is associated with the organization responsible for setting the scope of the namespace, but in this case, it does not resolve to anything. The namespace declaration xmlns:dc="http://purl.org/dc/elements/1.1/" assigns the URI http://purl.org/dc/elements/1.1/ to the "dc" namespace prefix.

XML namespace prefixes are used in qualified names: the characters before the colon in a qualified name are the element's "namespace prefix," while the characters after the colon are the "local name" for the element. In our example, the outer element's qualified name is "oai_dc:dc" -- it has local name "dc" and is scoped to the namespace URI indicated by the namespace prefix "oai_dc". The inner element's qualified name is "dc:title" -- it has local name "title" and is scoped to the namespace URI indicated by the namespace prefix "dc". Note that the "dc" before the colon in "dc:title" refers to a namespace URI, while the "dc" in "oai_dc:dc", since it is after the colon, is the local name.
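To see how a namespace-aware parser resolves qualified names, here is a small Python sketch using the standard library's ElementTree, which reports each element name as "{namespace URI}local name". The helper split_tag is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

record = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>NSDL Metadata Primer</dc:title>
</oai_dc:dc>"""


def split_tag(tag):
    """Split ElementTree's '{uri}local' form into (uri, local name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag  # element is in no namespace


root = ET.fromstring(record)
outer = split_tag(root.tag)     # the oai_dc:dc element
inner = split_tag(root[0].tag)  # the dc:title element

print(outer)
print(inner)
```

Note that the prefixes themselves do not appear in the parsed names: only the namespace URI and the local name are kept.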

XML requires that you have an XML namespace declaration for each namespace prefix you use in your XML. OAI-PMH requires the use of namespaces, and hence their declaration in your served XML. Further, most XSLT engines require strict adherence to XML namespaces.
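A quick way to observe the declaration requirement is to feed a namespace-aware parser a prefix with no declaration. The sketch below uses Python's standard ElementTree; the helper name parses and the two sample strings are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The "dc" prefix is used but never declared.
missing_declaration = "<dc:title>NSDL Metadata Primer</dc:title>"

# The same element with the prefix declared.
with_declaration = (
    '<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "NSDL Metadata Primer</dc:title>"
)


def parses(xml_text):
    """Return True if the text parses as namespace-well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses(missing_declaration))
print(parses(with_declaration))
```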

There can be any number of XML schemas for a single namespace. One of the reasons it is suggested that namespaces be assigned in a domain controlled by the issuing organization is to encourage all schemas written for that format to adhere to the same concept of the format. For example, the Dublin Core Metadata Initiative provides documentation on usage for Simple and for Qualified Dublin Core at http://dublincore.org/documents/usageguide/. They also provide sample XML schemas for each, but organizations may use their own XML schemas for Qualified Dublin Core, as does the National Science Digital Library, for example.

Default Namespace Declarations

Default namespace declarations have a null namespace prefix. These namespace declarations look like this:

    <dc xmlns="http://www.openarchives.org/OAI/2.0/oai_dc/" >  
    <title xmlns="http://purl.org/dc/elements/1.1/" >NSDL Metadata Primer</title>  
    </dc>  

Note that the two examples above are semantically the same: they have the same locally named elements, scoped to the same namespace URIs, and containing the same values. The fact that different namespace prefixes are used in the second example is NOT a semantic difference in XML.
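The equivalence can be checked mechanically. The following Python sketch parses the prefixed example and the default-namespace example with the standard ElementTree module and compares the resolved element names and values. The helper names_and_values is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

prefixed = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>NSDL Metadata Primer</dc:title>
</oai_dc:dc>"""

defaulted = """<dc xmlns="http://www.openarchives.org/OAI/2.0/oai_dc/">
<title xmlns="http://purl.org/dc/elements/1.1/">NSDL Metadata Primer</title>
</dc>"""


def names_and_values(xml_text):
    """List ('{uri}local', text) for every element, in document order."""
    root = ET.fromstring(xml_text)
    return [(el.tag, (el.text or "").strip()) for el in root.iter()]


same = names_and_values(prefixed) == names_and_values(defaulted)
print(same)
```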

If you use the null namespace prefix, OAI-PMH requires that you have a default XML namespace declaration to indicate the appropriate namespace URI.

Scope of Namespace Declarations

The second example above contains a second default namespace declaration, on the <title> element. This illustrates that XML namespace declarations have a scope within the XML document: each XML namespace declaration (default or not) pertains to the element in which it is declared and to all of that element's descendants, unless it is superseded by a namespace declaration for the same prefix on one of those descendants. In our second example, the default namespace URI is http://www.openarchives.org/OAI/2.0/oai_dc/ for the outer element, but the <title> element overrides the default namespace, declaring it to be the URI "http://purl.org/dc/elements/1.1/" -- which is true for the <title> element and all of the <title> element's descendants. However, the closing tag </dc> is again using the default namespace URI of http://www.openarchives.org/OAI/2.0/oai_dc/ because we are no longer in the <title> child element or any of its descendants. Thus, the </dc> tag is correctly parsed as the closing tag for the first <dc> tag.
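The scoping rule can be observed with a small Python sketch. The <part> and <note> elements below are hypothetical, added only to show inheritance and the return to the outer default; they are not part of oai_dc and such a record would not be schema valid.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# <part> sits inside <title>; <note> is a sibling that follows <title>.
doc = """<dc xmlns="http://www.openarchives.org/OAI/2.0/oai_dc/">
<title xmlns="http://purl.org/dc/elements/1.1/">Primer <part>One</part></title>
<note>outside title</note>
</dc>"""

# Resolved names of all elements, in document order.
tags = [el.tag for el in ET.fromstring(doc).iter()]

for tag in tags:
    print(tag)
```

The <part> element inherits the default namespace declared on <title>, while <note>, which lies outside <title>, falls back to the default namespace declared on the outer <dc> element.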

XML Namespace References

A specific overview of XML namespaces and the use of XML schemas within an OAI environment can be found in the National Science Digital Library's "XML, Namespaces, and Schemas" FAQ.

More information can be found from these general resources:

  1. xml.com's "XML Namespaces by Example" (http://www.xml.com/pub/a/1999/01/namespaces.html)
  2. Ronald Bourret's XML Namespaces FAQ (http://www.rpbourret.com/xml/NamespacesFAQ.htm)
  3. Wikipedia entry on XML Namespaces (http://en.wikipedia.org/wiki/XML_namespace)
  4. World Wide Web Consortium's "Namespaces in XML" (http://www.w3.org/TR/REC-xml-names/)
  5. Jeni Tennison's "Handling Namespaces" for XSLT (http://www.jenitennison.com/xslt/namespaces.html)
  6. XML Namespaces by James Clark (http://www.jclark.com/xml/xmlns.htm)

Binding XML Schemas to Namespaces

XML namespaces are additionally used as part of the mechanism to bind XML instance documents to particular XML schemas. Note that there is no XML schema indicated for either of the two examples above.

XML schemas are actual documents, not abstract concepts (such as namespace URIs), so XML schema locations are indicated with URLs which resolve to an actual XML schema document. The location for an XML schema is indicated with the "schemaLocation" attribute, which resides in the schema instance namespace (see http://www.w3.org/TR/xmlschema-0/#ref40). It is conventional to use "xsi" as the namespace prefix for the schema instance namespace, so the qualified name of the attribute is "xsi:schemaLocation". Don't forget to properly declare the namespace URI for the "xsi" prefix with a namespace declaration: xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance". The value of "xsi:schemaLocation" should be the namespace URI followed by a blank, followed by the URL for the appropriate XML schema for the indicated namespace. The following is one way to correctly indicate the OAI-PMH schema in an OAI response:

   <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/  http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">  
   (xml body)  
   </OAI-PMH>  

Here's an example taken from the OAI protocol document. In this example, the namespace is http://www.openarchives.org/OAI/2.0/oai_dc/, and the XML schema's URL is http://www.openarchives.org/OAI/2.0/oai_dc.xsd

   <oai_dc:dc  
   xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"  
   xmlns:dc="http://purl.org/dc/elements/1.1/"  
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  
   xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd" >  
   <dc:publisher>Los Alamos arXiv</dc:publisher>  
   <dc:rights>Metadata may be used without restrictions as long as the oai identifier remains attached to it.</dc:rights>  
   </oai_dc:dc>  

As another example, the XML schema at http://ns.nsdl.org/schemas/nsdl_dc/nsdl_dc_v1.02.xsd for the namespace "http://ns.nsdl.org/nsdl_dc_v1.02/" would be indicated like this:

   <nsdl_dc:nsdl_dc xmlns:nsdl_dc="http://ns.nsdl.org/nsdl_dc_v1.02/"  
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  
   xsi:schemaLocation="http://ns.nsdl.org/nsdl_dc_v1.02/ http://ns.nsdl.org/schemas/nsdl_dc/nsdl_dc_v1.02.xsd">  
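Because the value of "xsi:schemaLocation" is a whitespace-separated list of namespace URI and schema URL pairs, it is easy to read back out of a document. The Python sketch below does so for an OAI-PMH root element like the one shown earlier in this section; the function name schema_locations is an assumption made for this example, and the sketch only reads the attribute without fetching or validating anything.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

response = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/  http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
</OAI-PMH>"""


def schema_locations(element):
    """Map each namespace URI to the schema URL paired with it."""
    value = element.get("{%s}schemaLocation" % XSI_NS, "")
    tokens = value.split()  # any run of whitespace separates the tokens
    return dict(zip(tokens[0::2], tokens[1::2]))


locations = schema_locations(ET.fromstring(response))
print(locations)
```

An element that carries no "xsi:schemaLocation" attribute simply yields an empty mapping.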

How to indicate XML Namespaces and Schemas in OAI Requests and Responses

OAI metadata prefix

A "namespace prefix" in XML associates a local name with the appropriate namespace declaration and therefore the URI for the namespace scoping the name. For example, <oai_dc:dc> has "oai_dc" as a namespace prefix, which indicates "dc" is scoped to the namespace URI indicated in the namespace declaration "xmlns:oai_dc=..." on the nearest ancestor element. See the NSDL XML FAQ for more information about XML namespaces.

A "metadataPrefix" in OAI-PMH is the string used to uniquely identify a particular metadata format for an OAI repository. "metadataPrefix" is a required argument for ListRecords, GetRecord and ListIdentifiers requests. An OAI repository's mappings from metadataPrefixes to metadata namespace URIs and their XML schemas are exposed via ListMetadataFormats. This is explained in section 3.4 of the OAI-PMH specification: http://www.openarchives.org/OAI/openarchivesprotocol.html#MetadataNamespaces.

While the OAI-PMH reserves "oai_dc" as a metadataPrefix, no XML namespace prefixes are dictated in OAI-PMH. In fact, the OAI-PMH metadataPrefix and the XML namespace prefix in the OAI response may differ; however, it is strongly recommended that the same characters be used in both contexts.
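To make the distinction concrete, the metadataPrefix travels in the request URL as an ordinary query argument and has nothing to do with XML prefixes. The following Python sketch builds a ListRecords request; the function name list_records_url is an assumption made for this example, and the base URL is borrowed from the sample NSDL response on this page purely for illustration.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix):
    """Build an OAI-PMH ListRecords request URL for one metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


# Illustrative base URL only; substitute your own repository's base URL.
url = list_records_url("http://services.nsdl.org:8080/nsdloai/OAI", "oai_dc")
print(url)
```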

ListMetadataFormats

Every metadata format served by an OAI repository must have a namespace URI and an XML schema. These are exposed with ListMetadataFormats responses. This is explained in section 4.4 of the OAI-PMH specification: http://www.openarchives.org/OAI/openarchivesprotocol.html#ListMetadataFormats. What follows is an example response for an OAI server providing two metadata formats: the required oai_dc, and the NSDL's version of Qualified Dublin Core.

   <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/  http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">  
   <responseDate>2006-02-08T14:27:19Z</responseDate>  
   <request verb="ListMetadataFormats">http://services.nsdl.org:8080/nsdloai/OAI</request>  
   <ListMetadataFormats>  
   <metadataFormat>  
   <metadataPrefix>oai_dc</metadataPrefix>  
   <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>  
   <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>    
   </metadataFormat>  
   <metadataFormat>  
   <metadataPrefix>nsdl_dc</metadataPrefix> 
   <schema>http://ns.nsdl.org/schemas/nsdl_dc/nsdl_dc_v1.02.xsd</schema>  
   <metadataNamespace>http://ns.nsdl.org/nsdl_dc_v1.02/</metadataNamespace>  
   </metadataFormat>  
   </ListMetadataFormats>  
   </OAI-PMH>  

The <metadataPrefix> element nested within each <metadataFormat> element must contain the metadataPrefix string used to identify this format in OAI requests. It is strongly recommended that this string be the same as the XML namespace prefix used for the namespace URI in XML metadata records.

The <schema> element nested within each <metadataFormat> must contain the URL for the schema document to be used to validate the XML metadata records, and the <metadataNamespace> element must contain the namespace URI for the metadata format.
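A harvester might read these three elements roughly as in the Python sketch below. The response string is a trimmed version of a ListMetadataFormats response (responseDate, request, and the schemaLocation attribute are omitted for brevity), and the function name metadata_formats is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used only for the search paths below; the "oai" prefix
# is local to this script and need not appear in the response itself.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

response = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
<ListMetadataFormats>
<metadataFormat>
<metadataPrefix>oai_dc</metadataPrefix>
<schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
<metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
</metadataFormat>
<metadataFormat>
<metadataPrefix>nsdl_dc</metadataPrefix>
<schema>http://ns.nsdl.org/schemas/nsdl_dc/nsdl_dc_v1.02.xsd</schema>
<metadataNamespace>http://ns.nsdl.org/nsdl_dc_v1.02/</metadataNamespace>
</metadataFormat>
</ListMetadataFormats>
</OAI-PMH>"""


def metadata_formats(xml_text):
    """Map each metadataPrefix to its schema URL and namespace URI."""
    root = ET.fromstring(xml_text)
    formats = {}
    for fmt in root.iterfind(".//oai:metadataFormat", OAI_NS):
        prefix = fmt.findtext("oai:metadataPrefix", namespaces=OAI_NS)
        formats[prefix] = {
            "schema": fmt.findtext("oai:schema", namespaces=OAI_NS),
            "namespace": fmt.findtext("oai:metadataNamespace", namespaces=OAI_NS),
        }
    return formats


formats = metadata_formats(response)
print(sorted(formats))
```

Note that the response uses a default namespace declaration, so the search paths must still name the OAI-PMH namespace URI even though no prefix appears in the response.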
