Converting New URI Schemes or URN Sub-Schemes to HTTP

David Booth
HP Software
david@dbooth.org
Latest version: http://dbooth.org/2006/urn2http/

The views expressed herein are my own and do not necessarily reflect those of HP.  Comments are invited.

Abstract

New URI schemes or URN sub-schemes are sometimes proposed for resource identification in applications where the HTTP protocol is deemed unsuitable.  This paper argues that URIs based on specialized HTTP URI prefixes would be a better choice in virtually all cases, even if the resource resolution or data transfer properties of HTTP are insufficient for these applications.   A simple recipe is presented for converting proposed URI schemes or URN sub-schemes to HTTP using specialized URI prefixes.  This technique cleanly separates the use of the URI as an identifier (to establish resource identity) from the use of the URI as a locator (to retrieve representations).  The resulting capabilities of the HTTP URIs are virtually a direct superset of those of URIs based on new URI schemes or URN sub-schemes.

Introduction: A Scenario

A (fictitious) organization, XyzConsortium, representing a group of companies and individuals working in a particular field, xyz, wishes to define a kind of persistent identifier that has resolution characteristics different from HTTP.  It defines a new URI scheme or URN sub-scheme, xyzscheme, and publishes a specification, XyzSpec, that defines:
The resulting URIs are of the form xyzscheme:foo, where xyzscheme:foo identifies a resource r, and foo obeys the syntactic conventions dictated by XyzSpec in addition to obeying the syntactic conventions for URIs in general.   Users of xyzscheme URIs are expected to know and follow the conventions published in XyzSpec, and specialized software is made available for resolving xyzscheme URIs to their associated data d or metadata md1.  Metadata md1 is expressed in a widely understood format that is not specific to the xyz field.  However, data d may be in a format that is specific to the xyz field.

Heavy users of xyzscheme URIs are happy with xyzscheme URIs and do not mind the fact that they must have special software installed to retrieve data d and/or metadata md1.  They often need both the data d and the metadata md1.  However, casual users (or users in more peripheral fields) who do not have the xyzscheme resolution software installed are not so happy.  They are unable to make much use of these URIs without the xyzscheme software, in spite of the fact that many of their applications only need metadata md1 and do not need data d at all.  A few complain to XyzConsortium, but many others quietly forego the benefit that such metadata could have provided to their applications.

To facilitate use by more casual users (and users in related fields), XyzConsortium decides to offer HTTP URIs as synonyms for its existing xyzscheme URIs using the following recipe.

Recipe for Converting to HTTP URIs

Step 1.  XyzConsortium creates a web site, http://xyzpurl.org, for forwarding HTTP requests. In setting up xyzpurl.org, XyzConsortium uses all of the institutional and legal safeguards at its disposal to ensure that the site will continue to exist and faithfully implement its intended purpose for as long as possible.

Step 2.  XyzConsortium publishes a specification, XyzHttpSpec, for interpreting the specialized HTTP URI prefix, "http://xyzpurl.org?".  In particular, it declares that for any URI of the form http://xyzpurl.org?foo:
(This technique of defining an HTTP URI prefix is the same technique used by thing-described-by.org.)
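
As a rough illustration of how software might treat the specialized prefix, the following sketch rewrites between the two URI forms by plain string manipulation.  It assumes the equivalence that Step 4 declares (http://xyzpurl.org?foo names the same resource as xyzscheme:foo) and exact, case-sensitive prefix matching; the function names are hypothetical and not part of XyzHttpSpec.

```python
HTTP_PREFIX = "http://xyzpurl.org?"   # specialized HTTP URI prefix from Step 2
SCHEME_PREFIX = "xyzscheme:"          # the original scheme prefix


def to_xyzscheme(http_uri: str) -> str:
    """Rewrite http://xyzpurl.org?foo as xyzscheme:foo (hypothetical helper)."""
    if not http_uri.startswith(HTTP_PREFIX):
        raise ValueError(f"not an xyzpurl.org URI: {http_uri!r}")
    return SCHEME_PREFIX + http_uri[len(HTTP_PREFIX):]


def to_http(xyz_uri: str) -> str:
    """Rewrite xyzscheme:foo as http://xyzpurl.org?foo (hypothetical helper)."""
    if not xyz_uri.startswith(SCHEME_PREFIX):
        raise ValueError(f"not an xyzscheme URI: {xyz_uri!r}")
    return HTTP_PREFIX + xyz_uri[len(SCHEME_PREFIX):]
```

Software that already understands xyzscheme could call to_xyzscheme on any URI carrying the prefix and then proceed exactly as it does today.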

Step 3.  XyzConsortium configures the xyzpurl.org web server such that an HTTP GET on any URI of the form http://xyzpurl.org?foo will be redirected (using an HTTP 303 See Other[1] status code) to another HTTP URI, p, where metadata, md2, may be obtained. (Think of p as a caching proxy for accessing metadata md1 of xyzscheme:foo.) 
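
The redirect in Step 3 could be sketched with Python's standard http.server module, roughly as follows.  The metadata location METADATA_BASE is a made-up placeholder for p, and the way the identifier is encoded into p is an assumption; a production deployment would more likely configure this in an existing web server.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import quote, urlsplit

# Hypothetical base URI of the metadata proxy p; not defined by the recipe.
METADATA_BASE = "http://metadata.xyzpurl.example/md2/"


def redirect_target(request_target: str) -> Optional[str]:
    """Return the metadata URI p for a request target such as '/?foo'."""
    parts = urlsplit(request_target)
    if parts.path not in ("", "/") or not parts.query:
        return None  # not of the form http://xyzpurl.org?foo
    # Percent-encode the identifier so it is safe as one path segment.
    return METADATA_BASE + quote(parts.query, safe="")


class XyzPurlHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404, "No identifier given")
            return
        self.send_response(303)  # 303 See Other
        self.send_header("Location", target)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the forwarding server until interrupted."""
    HTTPServer(("localhost", port), XyzPurlHandler).serve_forever()
```

Calling serve() and then requesting http://localhost:8000/?foo should yield a 303 response whose Location header points at the placeholder metadata URI.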

Metadata md2 should unambiguously identify[2] resource r that http://xyzpurl.org?foo (or xyzscheme:foo) names, i.e., it should provide sufficient information for users to distinguish resource r from all other resources.  It should be expressed in a widely understood format that does not require xyz-specific software to interpret.  And of course, md2 must be consistent with md1 (and d).  Normally, md1 would be sufficient to meet these requirements.  Furthermore, ideally md2 should include:
Step 4.  XyzConsortium furthermore publishes a declaration stating that: (a) for any URI http://xyzpurl.org?foo conforming to XyzHttpSpec, http://xyzpurl.org?foo identifies the same resource r that xyzscheme:foo identifies; and (b) if for any reason the xyzpurl.org web site no longer functions, or if it fails to faithfully implement the intended purpose of XyzHttpSpec, then any information it serves should be ignored and the meaning of http://xyzpurl.org?foo should be regarded as identical to the meaning of xyzscheme:foo forevermore.
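
One way a client might honor clause (b) of this declaration is sketched here: it tries HTTP first and, if the site cannot be reached, falls back to treating the URI as xyzscheme:foo.  The two resolver callables are hypothetical stand-ins supplied by the caller, and the sketch only covers the "no longer functions" case; deciding that a site is unfaithful to XyzHttpSpec is a judgment that code like this cannot make.

```python
from typing import Callable

HTTP_PREFIX = "http://xyzpurl.org?"


def resolve_metadata(
    http_uri: str,
    fetch_via_http: Callable[[str], dict],
    resolve_via_xyzscheme: Callable[[str], dict],
) -> dict:
    """Fetch metadata over HTTP, falling back to xyzscheme resolution."""
    if not http_uri.startswith(HTTP_PREFIX):
        raise ValueError(f"not an xyzpurl.org URI: {http_uri!r}")
    try:
        return fetch_via_http(http_uri)
    except OSError:
        # Network failures (including urllib.error.URLError) are OSError
        # subclasses.  Per clause (b), interpret the URI as xyzscheme:foo.
        xyz_uri = "xyzscheme:" + http_uri[len(HTTP_PREFIX):]
        return resolve_via_xyzscheme(xyz_uri)
```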

Variations

Many variations of this basic recipe are possible, of course.  Some applications for which new URI schemes or URN sub-schemes are proposed may only have metadata md1 and no data d; other applications may mix metadata md1 and data d.

Conflicts in Metadata

If both http://xyzpurl.org?foo and xyzscheme:foo URIs are used, and they both identify the same resource r, then there would be two paths for obtaining authoritative metadata about r, and hence the metadata retrieved via the two paths could potentially conflict.  If such conflicts are due to p providing stale data due to caching, then metadata md2 should indicate the time(s) when the data is/was known to be valid.  If conflicts are not due to caching or latency, then p is not faithfully implementing XyzHttpSpec. 
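
The distinction drawn here can be illustrated with a small sketch.  It assumes that metadata is available as a flat dictionary, that md2 carries a "valid at" timestamp, and that the time of the last authoritative change to md1 is known; all of these representations and names are hypothetical.  Because md2 may be a subset or a superset of md1, only the fields present in both are compared.

```python
from datetime import datetime, timezone


def shared_fields_agree(md1: dict, md2: dict) -> bool:
    """True if every field present in both md1 and md2 has the same value."""
    return all(md2[key] == value for key, value in md1.items() if key in md2)


def classify(md1: dict, md1_changed_at: datetime,
             md2: dict, md2_valid_at: datetime) -> str:
    """Classify the relationship between authoritative md1 and proxied md2."""
    if shared_fields_agree(md1, md2):
        return "consistent"
    if md2_valid_at < md1_changed_at:
        # md2 was valid before md1 last changed: the conflict can be
        # explained by caching or latency.
        return "stale"
    # md2 claims to be current yet disagrees with md1: p is not
    # faithfully implementing XyzHttpSpec.
    return "unfaithful"
```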

Example: LSIDs

Suppose a resource owner wishes to mint HTTP URIs but also wants to offer the URI resolution functionality of LSIDs[6].  To do this, the resource owner can create a special-purpose HTTP URI prefix, such as "http://entrez.example/2007/lsid:", and declare that prefix as indicating that such URIs could be accessed using the LSID protocol.  So for a URI of the form
http://entrez.example/2007/lsid:authority:namespace:identifier:revision
a naive client dereferencing the URI would thus use HTTP, but an LSID-aware client might access the data using an LSID-aware proxy, which would:
Of course, the proxy would not need to be hard-coded to recognize the prefix.  It could merely read some string pattern matching rules (or an ontology) to map http://entrez.example/2007/lsid: URIs to urn:lsid: URIs.
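
A minimal sketch of such rule-driven mapping follows.  The rule table is held as data rather than hard-coded logic, so in principle it could be loaded from an external source; the rule format (a regular expression paired with a replacement template) is an assumption made for illustration, not something the LSID specification defines.

```python
import re
from typing import Optional

# Each rule pairs a pattern for an HTTP URI with a template for the LSID.
# In a real proxy these rules might be read from a file or fetched remotely.
REWRITE_RULES = [
    (re.compile(r"^http://entrez\.example/2007/lsid:(?P<rest>.+)$"),
     r"urn:lsid:\g<rest>"),
]


def to_lsid(http_uri: str) -> Optional[str]:
    """Return the urn:lsid: form of http_uri, or None if no rule matches."""
    for pattern, template in REWRITE_RULES:
        match = pattern.match(http_uri)
        if match:
            return match.expand(template)
    return None
```

A URI that matches no rule yields None, in which case the proxy would simply dereference it over HTTP like any naive client.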

Furthermore, the resource metadata returned when the HTTP URI is naively dereferenced using HTTP could include a pointer to the URI pattern matching rules (or an ontology), so that an LSID-aware proxy that did not previously recognize the http://entrez.example/2007/lsid: prefix could be automatically bootstrapped to learn of its special meaning.

Comparing Capabilities of xyzscheme URIs with http://xyzpurl.org URIs

Because the http://xyzpurl.org URIs cleanly separate resource identification from resource resolution or data transfer issues, deferring to xyzscheme conventions for those tasks, the capabilities of http://xyzpurl.org URIs are virtually a direct superset of the capabilities of xyzscheme URIs, as the following table illustrates.

| Users | xyzscheme URIs | http://xyzpurl.org URIs |
|---|---|---|
| Heavy users willing to install special xyzscheme software | Software will recognize the "xyzscheme:" prefix on xyzscheme:foo URIs and apply the conventions defined in XyzSpec to retrieve the data or metadata associated with resource r. | Software will recognize the "http://xyzpurl.org?" prefix on http://xyzpurl.org?foo URIs and apply the conventions defined in XyzSpec to retrieve the data or metadata associated with resource r. |
| Casual users without special xyzscheme software | Software cannot access data or metadata. | Software may be able to access metadata, md2, which may include a subset of md1 or a superset of md1. |

Bootstrapping Protocol Adoption

This section was added 12-Oct-2009.
Another major benefit of HTTP URIs is that they can be used to bootstrap the adoption of a new protocol by resolving to a download of a browser extension or other software that implements the new protocol, as suggested by Graham Klyne[7].  In contrast, if a new protocol is based on a new URI scheme, a user who wishes to enjoy the features of a new protocol has no choice but to manually download and install the software that implements that protocol.  Since users are far more likely to accept a browser extension download than to manually locate, download and install new software, the use of HTTP URIs could dramatically improve the adoption rate of a new protocol.

Inherent Differences

This section was added 8-Aug-2006.
Although the above has illustrated how the capabilities of HTTP URIs can generally be a direct superset of the capabilities of URIs based on new schemes or URN sub-schemes, there are some inherent differences for which new URI schemes or URN sub-schemes could still be seen as advantageous, such as:
Are these differences important enough in practice to warrant creating a new URI scheme or URN sub-scheme?  In my opinion, no.  However, this may depend on the application.  Please email me if you know of applications where you think these differences are important enough to justify the creation of new URI schemes or URN sub-schemes, or if you know of other inherent differences that I have missed.

Conclusion

HTTP URIs with specialized prefixes provide greater capability than URIs based on new URI schemes or URN sub-schemes in virtually all cases.  Furthermore, such HTTP URIs seem better equipped to survive the test of time than URIs based on new URI schemes or URN sub-schemes:
Addendum 2006-08-02: See also Kunze and Rodgers' excellent work on Archival Resource Keys (ARKs)[5].  They provide a much more thorough discussion of how to achieve persistence.

Frequently Asked Questions (FAQ)

Q: Why does http://xyzpurl.org not violate URI opacity?
A: The principle of URI opacity[3] is intended to prevent agents from incorrectly guessing properties of the associated resource or representation.  However, in this case, software that obeys the XyzHttpSpec is not guessing; it is following the explicit declaration of the URI owner (XyzConsortium).

Q: Why does http://xyzpurl.org?foo do a 303-redirect instead of returning a representation? After all, xyzscheme:foo is supposed to identify an information resource!
A: An HTTP URI that returns a 303 status in response to an HTTP GET may be any kind of resource -- including an information resource.  (See the W3C TAG's httpRange-14 decision[4].)  This recipe suggests using a 303-redirect:
Q: Can specialized HTTP prefixes be used for transient URIs -- URIs that are not intended to persist?
A: Sure.  The prefix owner can associate any desired properties with the prefix.  The prefix could indicate that the URI is transient.

Q: New URI schemes or URN sub-schemes allow different URI owners to mint URIs independently, while a user discovering a URI will know that the URI has the property defined by that scheme, without having to know the conventions defined by each URI owner.  For example, xyzscheme:foo.com/fum and xyzscheme:bar.com/boo can be syntactically recognized as obeying the conventions for xyzscheme even though they were minted by different organizations, foo.com and bar.com.  How can this be done with HTTP URIs?
A: Here are two techniques:

References

1. HTTP 303 See Other status code:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
2. Describing Versus Identifying:
http://dbooth.org/2006/identity/#xtocid187478
3. URI Opacity:
http://www.w3.org/TR/webarch/#uri-opacity
4. W3C TAG's httpRange-14 decision:
http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html
5. The ARK Persistent Identifier Scheme, J. Kunze and R. P. C. Rodgers, Internet-Draft draft-kunze-ark-11.txt:
http://www.ietf.org/internet-drafts/draft-kunze-ark-11.txt
6. LSID specification:
http://xml.coverpages.org/lsid.html
7. Graham Klyne's suggestion of using HTTP URIs to retrieve protocol handlers:
http://lists.w3.org/Archives/Public/uri/2009Sep/0029.html


12-Oct-2009: Added section on Bootstrapping Protocol Adoption.
19-May-2009: Updated email address.
1-Mar-2007: Added LSID example
8-Aug-2006: Added ARK reference
2-Aug-2006: Initial publication