Converting New URI Schemes or URN Sub-Schemes to HTTP

David Booth
HP Software
david@dbooth.org
Latest version: http://dbooth.org/2006/urn2http/

The views expressed herein are my own and do not necessarily reflect those of HP. Comments are invited.

Abstract

New URI schemes or URN sub-schemes are sometimes proposed for resource identification in applications where the HTTP protocol is deemed unsuitable. This paper argues that URIs based on specialized HTTP URI prefixes would be a better choice in virtually all cases, even if the resource resolution or data transfer properties of HTTP are insufficient for these applications. A simple recipe is presented for converting proposed URI schemes or URN sub-schemes to HTTP using specialized URI prefixes. This technique cleanly separates the use of the URI as an identifier (to establish resource identity) from the use of the URI as a locator (to retrieve representations). The resulting capabilities of the HTTP URIs are virtually a direct superset of those of URIs based on new URI schemes or URN sub-schemes.

Introduction: A Scenario
Recipe for Converting to HTTP URIs

Variations
Conflicts in Metadata
Example: LSIDs

Comparing Capabilities of xyzscheme URIs with http://xyzpurl.org URIs

Bootstrapping Protocol Adoption
Inherent Differences

Conclusion
Frequently Asked Questions (FAQ)
References

Introduction: A Scenario

A (fictitious) organization, XyzConsortium, representing a group of companies and individuals working in a particular field, xyz, wishes to define a kind of persistent identifier that has resolution characteristics different from HTTP. It defines a new URI scheme or URN sub-scheme, xyzscheme, and publishes a specification, XyzSpec, that defines:

Syntactic conventions for constructing such URIs (over and above the syntactic requirements of URIs in general), perhaps incorporating such niceties as version numbers or checksums into the URIs;
Conventions for locating data, d, associated with such a URI; and
Conventions for locating metadata, md1, associated with such a URI.

The resulting URIs are of the form xyzscheme:foo, where xyzscheme:foo identifies a resource r, and foo obeys the syntactic conventions dictated by XyzSpec in addition to obeying the syntactic conventions for URIs in general. Users of xyzscheme URIs are expected to know and follow the conventions published in XyzSpec, and specialized software is made available for resolving xyzscheme URIs to their associated data d or metadata md1. Metadata md1 is expressed in a widely understood format that is not specific to the xyz field. However, data d may be in a format that is specific to the xyz field.

Heavy users of xyzscheme URIs are happy with xyzscheme URIs and do not mind the fact that they must have special software installed to retrieve data d and/or metadata md1. They often need both the data d and the metadata md1. However, casual users (or users in more peripheral fields) who do not have the xyzscheme resolution software installed are not so happy. They are unable to make much use of these URIs without the xyzscheme software, in spite of the fact that many of their applications only need metadata md1 and do not need data d at all. A few complain to XyzConsortium, but many others quietly forgo the benefit that such metadata could have provided to their applications.

To facilitate use by more casual users (and users in related fields), XyzConsortium decides to offer HTTP URIs as synonyms for its existing xyzscheme URIs using the following recipe.

Recipe for Converting to HTTP URIs

Step 1. XyzConsortium creates a web site, http://xyzpurl.org, for forwarding HTTP requests. In setting up xyzpurl.org, XyzConsortium uses all of the institutional and legal safeguards at its disposal to ensure that the site will continue to exist and faithfully implement its intended purpose for as long as possible.

Step 2. XyzConsortium publishes a specification, XyzHttpSpec, for interpreting the specialized HTTP URI prefix, "http://xyzpurl.org?". In particular, it declares that for any URI of the form http://xyzpurl.org?foo:

Syntactic conventions for http://xyzpurl.org?foo must conform to the XyzSpec conventions for xyzscheme:foo;
Data associated with http://xyzpurl.org?foo may be located using the XyzSpec conventions for locating data associated with xyzscheme:foo;
Metadata associated with http://xyzpurl.org?foo may be located using the XyzSpec conventions for locating metadata associated with xyzscheme:foo.

(This technique of defining an HTTP URI prefix is the same technique used by thing-described-by.org.)
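The correspondence declared in Step 2 is a plain prefix substitution, which the following minimal Python sketch illustrates. The function names are hypothetical and are not part of XyzSpec or XyzHttpSpec. The sketch compares prefixes literally and performs no URI normalization, which a real implementation would likely need.

```python
HTTP_PREFIX = "http://xyzpurl.org?"  # specialized prefix defined by XyzHttpSpec
SCHEME_PREFIX = "xyzscheme:"         # prefix of the original scheme


def to_http_uri(xyz_uri: str) -> str:
    """Rewrite xyzscheme:foo as http://xyzpurl.org?foo."""
    if not xyz_uri.startswith(SCHEME_PREFIX):
        raise ValueError(f"not an xyzscheme URI: {xyz_uri!r}")
    return HTTP_PREFIX + xyz_uri[len(SCHEME_PREFIX):]


def to_xyz_uri(http_uri: str) -> str:
    """Rewrite http://xyzpurl.org?foo as xyzscheme:foo."""
    if not http_uri.startswith(HTTP_PREFIX):
        raise ValueError(f"not an xyzpurl.org URI: {http_uri!r}")
    return SCHEME_PREFIX + http_uri[len(HTTP_PREFIX):]
```

Because the part after the prefix (foo) is carried over unchanged, any syntactic check that XyzSpec defines for xyzscheme:foo can be applied to the HTTP form without modification.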

Step 3. XyzConsortium configures the xyzpurl.org web server such that an HTTP GET on any URI of the form http://xyzpurl.org?foo will be redirected (using an HTTP 303 See Other[1] status code) to another HTTP URI, p, where metadata, md2, may be obtained. (Think of p as a caching proxy for accessing metadata md1 of xyzscheme:foo.)

Metadata md2 should unambiguously identify[2] resource r that http://xyzpurl.org?foo (or xyzscheme:foo) names, i.e., it should provide sufficient information for users to distinguish resource r from all other resources. It should be expressed in a widely understood format that does not require xyz-specific software to interpret. And of course, md2 must be consistent with md1 (and d). Normally, md1 would be sufficient to meet these requirements. Furthermore, ideally md2 should include:

as much of metadata md1 as possible (preferably all of md1), which p could supply by using the XyzSpec conventions to retrieve (and perhaps cache) md1;
(if needed) pointers to metadata md1 and data d, preferably via protocols that are not xyz specific (such as HTTP); and
a pointer to the XyzHttpSpec, so that users discovering http://xyzpurl.org?foo URIs are encouraged to learn how they may be resolved more efficiently using xyzscheme conventions.
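The redirect behavior of Step 3 could be sketched with Python's standard http.server module as shown below. This is only an illustration under stated assumptions: the address of the metadata proxy p (metadata.xyzpurl.example) and the names metadata_location and XyzPurlHandler are invented for the example, and a production server would need logging, HEAD support, and the persistence safeguards described in Step 1.

```python
from http.server import BaseHTTPRequestHandler
from typing import Optional
from urllib.parse import quote, urlsplit

# Hypothetical address of the metadata proxy p; the text does not name one.
METADATA_PROXY = "http://metadata.xyzpurl.example/describe?id="


def metadata_location(request_target: str) -> Optional[str]:
    """Map a request target such as '/?foo' to the URI p, or None if no id."""
    parts = urlsplit(request_target)
    if parts.path not in ("", "/") or not parts.query:
        return None
    # Percent-encode the identifier so it survives as a single query value.
    return METADATA_PROXY + quote(parts.query, safe="")


class XyzPurlHandler(BaseHTTPRequestHandler):
    """Answers GET http://xyzpurl.org?foo with a 303 redirect to p."""

    def do_GET(self) -> None:
        location = metadata_location(self.path)
        if location is None:
            self.send_error(404, "No identifier in query string")
            return
        self.send_response(303)  # 303 See Other
        self.send_header("Location", location)
        self.end_headers()
```

To try it locally, one could pass the handler to http.server.HTTPServer, for example HTTPServer(("localhost", 8000), XyzPurlHandler).serve_forever(), and request http://localhost:8000/?foo with a client that does not follow redirects.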

Step 4. XyzConsortium furthermore publishes a declaration stating that: (a) for any URI http://xyzpurl.org?foo conforming to XyzHttpSpec, http://xyzpurl.org?foo identifies the same resource r that xyzscheme:foo identifies; and (b) if for any reason the xyzpurl.org web site no longer functions, or if it fails to faithfully implement the intended purpose of XyzHttpSpec, then any information it serves should be ignored and the meaning of http://xyzpurl.org?foo should be regarded as identical to the meaning of xyzscheme:foo forevermore.

Variations

Many variations of this basic recipe are possible, of course. Some applications for which new URI schemes or URN sub-schemes are proposed may only have metadata md1 and no data d; other applications may mix metadata md1 and data d.

Conflicts in Metadata

If both http://xyzpurl.org?foo and xyzscheme:foo URIs are used, and they both identify the same resource r, then there would be two paths for obtaining authoritative metadata about r, and hence the metadata retrieved via the two paths could potentially conflict. If such conflicts are due to p providing stale data due to caching, then metadata md2 should indicate the time(s) when the data is/was known to be valid. If conflicts are not due to caching or latency, then p is not faithfully implementing XyzHttpSpec.
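One way a client might apply this distinction is sketched below in Python. The sketch rests on assumptions that the text does not specify: that md1 carries a last-modified time, that md2 carries the time at which it was known to be valid, and that both can be viewed as simple property maps. The class and field names are hypothetical. Only properties present in both md1 and md2 are compared, since md2 may be a subset or a superset of md1.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass(frozen=True)
class AuthoritativeMetadata:
    """md1, obtained via the XyzSpec conventions."""
    properties: Dict[str, Any]
    last_modified: datetime  # assumed field, not defined by the text


@dataclass(frozen=True)
class ProxyMetadata:
    """md2, obtained from p after the 303 redirect."""
    properties: Dict[str, Any]
    valid_as_of: datetime  # time at which p knew this metadata to be valid


def classify(md1: AuthoritativeMetadata, md2: ProxyMetadata) -> str:
    """Return 'consistent', 'stale', or 'unfaithful' for a pair md1, md2."""
    shared = md1.properties.keys() & md2.properties.keys()
    if all(md1.properties[k] == md2.properties[k] for k in shared):
        return "consistent"
    # A conflict that predates the latest change to md1 is explained by caching.
    if md2.valid_as_of < md1.last_modified:
        return "stale"
    return "unfaithful"


# Illustrative values only.
md1 = AuthoritativeMetadata({"title": "v2"}, datetime(2006, 8, 2, tzinfo=timezone.utc))
md2 = ProxyMetadata({"title": "v1"}, datetime(2006, 8, 1, tzinfo=timezone.utc))
result = classify(md1, md2)
```

In this illustration the proxy's copy was last validated before md1 changed, so the conflict is attributed to caching rather than to a failure of p to implement XyzHttpSpec.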

Example: LSIDs

Suppose a resource owner wishes to mint http URIs but also wants to offer the URI resolution functionality of LSIDs[6]. To do this, the resource owner can create a special purpose http URI prefix, such as http://entrez.example/2007/lsid: , and declare that prefix as indicating that such URIs could be accessed using the LSID protocol. So for a URI of the form

http://entrez.example/2007/lsid:authority:namespace:identifier:revision

a naive client dereferencing the URI would thus use HTTP, but an LSID-aware client might access the data using an LSID-aware proxy, which would:

recognize the http://entrez.example/2007/lsid: prefix;
convert it to the prefix "urn:lsid:"; and
resolve the result using LSID resolution.

Of course, the proxy would not need to be hard-coded to recognize the prefix. It could merely read some string pattern matching rules (or an ontology) to map http://entrez.example/2007/lsid: URIs to urn:lsid: URIs.

Furthermore, the resource metadata returned when the http URI is naively dereferenced using HTTP could include a pointer to the URI pattern matching rules (or an ontology), so that an LSID-aware proxy that did not previously recognize the http://entrez.example/2007/lsid: prefix could be automatically bootstrapped to learn of its special meaning.
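The rule-driven mapping described above might look like the following Python sketch. The rule format (a list of regular-expression and replacement pairs) and the names REWRITE_RULES and to_lsid are assumptions made for illustration; the text does not prescribe how the pattern matching rules are expressed, and the actual LSID resolution step is not shown.

```python
import re
from typing import Optional

# Rules a proxy might load at run time instead of hard-coding them.
# Each entry pairs a pattern for the HTTP form with the URN replacement.
REWRITE_RULES = [
    (re.compile(r"^http://entrez\.example/2007/lsid:(.+)$"), r"urn:lsid:\1"),
]


def to_lsid(uri: str) -> Optional[str]:
    """Return the urn:lsid: form of uri, or None if no rule applies."""
    for pattern, replacement in REWRITE_RULES:
        match = pattern.match(uri)
        if match:
            return match.expand(replacement)
    return None
```

A proxy that learns of a new prefix from served metadata would only need to append another rule to the list; a URI that matches no rule is simply dereferenced using HTTP as usual.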

Comparing Capabilities of xyzscheme URIs with http://xyzpurl.org URIs

Because the http://xyzpurl.org URIs cleanly separate resource identification from resource resolution or data transfer issues, deferring to xyzscheme conventions for those tasks, the capabilities of http://xyzpurl.org URIs are virtually a direct superset of the capabilities of xyzscheme URIs, as the following table illustrates.

Users	xyzscheme URIs	http://xyzpurl.org URIs
Heavy users willing to install special xyzscheme software	Software will recognize the "xyzscheme:" prefix on xyzscheme:foo URIs and apply the conventions defined in XyzSpec to retrieve the data or metadata associated with resource r.	Software will recognize the "http://xyzpurl.org?" prefix on http://xyzpurl.org?foo URIs and apply the conventions defined in XyzSpec to retrieve the data or metadata associated with resource r.
Casual users without special xyzscheme software	Software cannot access data or metadata.	Software may be able to access metadata, md2, which may include a subset of md1 or a superset of md1.

Bootstrapping Protocol Adoption

This section was added 12-Oct-2009.
Another major benefit of HTTP URIs is that they can be used to bootstrap the adoption of a new protocol by resolving to a download of a browser extension or other software that implements the new protocol, as suggested by Graham Klyne[7]. In contrast, if a new protocol is based on a new URI scheme, a user who wishes to enjoy the features of a new protocol has no choice but to manually download and install the software that implements that protocol. Since users are far more likely to accept a browser extension download than to manually locate, download and install new software, the use of HTTP URIs could dramatically improve the adoption rate of a new protocol.

Inherent Differences

This section was added 8-Aug-2006.
Although the above has illustrated how the capabilities of HTTP URIs can generally be a direct superset of the capabilities of URIs based on new schemes or URN sub-schemes, there are some inherent differences for which new URI schemes or URN sub-schemes could still be seen as advantageous, such as:

URI Length. HTTP URIs will generally be longer.
Governing Authority. New URI schemes must be registered with IANA, whereas specialized HTTP prefixes may be defined by any URI owner. This may be a concern, both because IANA may be perceived as being more reputable than other organizations, and because IANA provides a single place to look for scheme definitions. However, if this concern is important enough, a registry of specialized HTTP prefixes could be created by a reputable organization -- potentially even IANA.
Expectations. Users discovering an xyzscheme URI expect it to be governed by a separate specification, whereas users discovering an HTTP URI with a specialized prefix may not realize that there is a separate specification governing it (over and above the http scheme specification). This can be mitigated by educating users, and one good way to do so is to serve useful metadata (indirectly) via the URI, as described above.

Are these differences important enough in practice to warrant creating a new URI scheme or URN sub-scheme? In my opinion, no. However, this may depend on the application. Please email me if you know of applications where you think these differences are important enough to justify the creation of new URI schemes or URN sub-schemes, or if you know of other inherent differences that I have missed.

Conclusion

HTTP URIs with specialized prefixes provide greater capability than URIs based on new URI schemes or URN sub-schemes in virtually all cases. Furthermore, such HTTP URIs seem better equipped to survive the test of time than URIs based on new URI schemes or URN sub-schemes:

HTTP URIs can be dereferenced by anyone, using GET -- not just by those who are the primary intended users. Therefore, HTTP URIs are likely to more widely disperse knowledge of their intended purpose and conventions for use, thus increasing the likelihood of their survival over time.
HTTP URIs offer a lower barrier to use: applications without specialized software can still do a follow-your-nose GET on an HTTP URI to potentially retrieve useful metadata about it. Therefore they are likely to achieve greater uptake, particularly in applications beyond their primary intended use.
HTTP URIs allow a new protocol adoption to be readily bootstrapped by dereferencing to browser extension downloads.

Addendum 2006-08-02: See also Kunze and Rodgers' excellent work on Archival Resource Keys (ARKs)[5]. They provide a much more thorough discussion of how to achieve persistence.

Frequently Asked Questions (FAQ)

Q: Why does http://xyzpurl.org not violate URI opacity?
A: The principle of URI opacity[3] is intended to prevent agents from incorrectly guessing properties of the associated resource or representation. However, in this case, software that obeys the XyzHttpSpec is not guessing; it is following the explicit declaration of the URI owner (XyzConsortium).

Q: Why does http://xyzpurl.org?foo do a 303-redirect instead of returning a representation? After all, xyzscheme:foo is supposed to identify an information resource!
A: An HTTP URI that returns a 303 status in response to an HTTP GET may be any kind of resource -- including an information resource. (See the W3C TAG's httpRange-14 decision[4].) This recipe suggests using a 303-redirect:

to facilitate cases where the named resource r is not an information resource, i.e., where it does not have a representation (data d);
to facilitate cases where representations (data d) would be inefficient or inappropriate to retrieve using HTTP;
to enable data and metadata to be separately associated with the URI; and
to permit the resource to have properties that are not intrinsic to information resources, such as immutability.

Q: Can specialized HTTP prefixes be used for transient URIs -- URIs that are not intended to persist?
A: Sure. The prefix owner can associate any desired properties with the prefix. The prefix could indicate that the URI is transient.

Q: New URI schemes or URN sub-schemes allow different URI owners to mint URIs independently, while a user discovering a URI will know that the URI has the property defined by that scheme, without having to know the conventions defined by each URI owner. For example, xyzscheme:foo.com/fum and xyzscheme:bar.com/boo can be syntactically recognized as obeying the conventions for xyzscheme even though they were minted by different organizations, foo.com and bar.com. How can this be done with HTTP URIs?
A: Here are two techniques:

The owner of the specialized HTTP prefix can use the rest of the URI to delegate minting authority to other URI owners, such as:
http://xyzpurl.org?xyzscheme:foo.com/fum
http://xyzpurl.org?xyzscheme:bar.com/boo
In effect, a class of specialized HTTP prefixes can be defined, and individually owned prefixes can declare themselves to be members of that class. For example, if the term http://xyzconsortium.org/terms/xyzprefix is defined to indicate that something is a specialized xyz HTTP prefix, then metadata served (indirectly) via http://foo.com?fum can indicate that "http://foo.com?" is a http://xyzconsortium.org/terms/xyzprefix , and metadata served (indirectly) via http://bar.com?bee can also indicate that "http://bar.com?" is a http://xyzconsortium.org/terms/xyzprefix .
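Both techniques can be sketched in a few lines of Python. The function names and the in-memory dictionary of declarations are assumptions made for illustration; in practice the declarations would be gathered from metadata served (indirectly) via each prefix, and the split of the delegated part at the first "/" is only one possible convention.

```python
from typing import Dict, Optional

DELEGATING_PREFIX = "http://xyzpurl.org?xyzscheme:"
XYZ_PREFIX_CLASS = "http://xyzconsortium.org/terms/xyzprefix"


def minting_authority(uri: str) -> Optional[str]:
    """Technique 1: read the delegated minting authority out of the URI."""
    if not uri.startswith(DELEGATING_PREFIX):
        return None
    authority, _, _local_part = uri[len(DELEGATING_PREFIX):].partition("/")
    return authority or None


def is_xyz_uri(uri: str, declarations: Dict[str, str]) -> bool:
    """Technique 2: check uri against prefixes declared as members of the class."""
    return any(
        uri.startswith(prefix) and declared_class == XYZ_PREFIX_CLASS
        for prefix, declared_class in declarations.items()
    )


# Declarations as they might be collected from served metadata.
declarations = {
    "http://foo.com?": XYZ_PREFIX_CLASS,
    "http://bar.com?": XYZ_PREFIX_CLASS,
}
```

With the first technique recognition is purely syntactic, as it is for xyzscheme URIs; with the second, a client must first obtain the declaration for a prefix, after which URIs under that prefix can be recognized without further lookups.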

References

1. HTTP 303 See Other status code:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
2. Describing Versus Identifying:
http://dbooth.org/2006/identity/#xtocid187478
3. URI Opacity:
http://www.w3.org/TR/webarch/#uri-opacity
4. W3C TAG's httpRange-14 decision:
http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html
5. The ARK Persistent Identifier Scheme, J. Kunze and R. P. C. Rodgers, Internet-Draft draft-kunze-ark-11.txt:
http://www.ietf.org/internet-drafts/draft-kunze-ark-11.txt
6. LSID specification:
http://xml.coverpages.org/lsid.html
7. Graham Klyne suggestion of using HTTP URIs to retrieve protocol handlers:
http://lists.w3.org/Archives/Public/uri/2009Sep/0029.html

12-Oct-2009: Added section on Bootstrapping Protocol Adoption.
19-May-2009: Updated email address.
1-Mar-2007: Added LSID example.
8-Aug-2006: Added ARK reference.
2-Aug-2006: Initial publication.