URI Declaration Versus Use

David Booth, Ph.D.
HP Software
Comments are invited: dbooth@hp.com

Latest version: http://dbooth.org/2007/uri-decl/
This version: http://dbooth.org/2007/uri-decl/20070817.htm
Views expressed herein are those of the author and do not necessarily reflect those of HP.

Abstract

It is important to distinguish between a URI declaration and regular assertions about the URI's associated resource.  This distinction enables different parties to make different choices about which assertions to accept, while still sharing a common definition of the associated resource. This distinction is not apparent in RDF, because URIs are declared implicitly in RDF.  The need becomes apparent when the URI of a non-information resource is dereferenced in an attempt to locate related information.  This paper motivates and explains this distinction, defines the notions of URI declaration and URI declaration page, and suggests some related best practices.  It also proposes a Web architectural rule specifying how URIs for non-information resources can be conveniently declared using established 303-redirect or hash URI mechanisms.

Introduction

When an HTTP URI is used to name something that is not a web page or web site (i.e., not an information resource), it is important to distinguish between the declaration of that URI as a name for a particular resource, and regular assertions about that resource.  This difference is important to Web architecture and to other parties that wish to use the URI in assertions about the resource.   The issue arises when another party attempts to dereference the URI in order to learn about the URI and its associated resource.  The other party may wish to make use of the URI as a means of referring to the resource, without necessarily believing other assertions that are made about the resource.

This difference is particularly confusing in RDF.  Many programming languages distinguish between variable declarations and variable use, but RDF does not have a corresponding mechanism for URI declaration.  Thus, when RDF statements are served from a URI, it may not be evident which of those RDF statements are intended to constitute a URI declaration and which are intended to be regular assertions about the resource.  They all look the same.  In fact, given an RDF triple, there is no way to determine, by examining the triple, whether that triple should be considered a part of the URI declaration or a regular assertion about the resource.  It is up to the URI owner to indicate this distinction.

This paper describes the distinction between URI declaration and use, and suggests some best practices.  Even though this paper is written in terms of URIs, the concepts apply equally to IRIs. (See RFC 3986 and RFC 3987 for advice on minting URIs and IRIs.)  The following example will be used to illustrate the ideas.

Example: A URI for the Moon

Suppose I mint a URI for the moon: http://dbooth.org/2007/moon/ .  I own the domain dbooth.org, so I have the authority to do so.  (See URI ownership.)  Since the moon is not an information resource, in conformance with the W3C TAG's httpRange-14 decision I have configured my server such that an attempt to dereference that URI will result in a 303-redirect to http://dbooth.org/2007/moon/decl.html , which, when dereferenced, returns a page containing the following statements:

Statement M1: The URI http://dbooth.org/2007/moon/ hereby names a particular resource, such that:
    a: http://dbooth.org/2007/moon/ is a moon.
    b: http://dbooth.org/2007/moon/ orbits the Earth.

Statement M2: http://dbooth.org/2007/moon/ is made of green cheese.

Statement M3: For more information about http://dbooth.org/2007/moon/ , see also http://dbooth.org/2007/moon/about.html .

The role of these statements is discussed below.

URI declaration

Definition: A URI declaration is a set of statements that authoritatively declare the association between a URI and a particular resource.

A URI declaration is a performative speech act.  (See Cowen's message or Wikipedia.)  Its publication by someone who has the authority to make the declaration -- i.e., the URI owner or delegate -- defines the association between a URI and a resource.  Therefore, another party wishing to use that URI to denote that resource should take all assertions that constitute part of that URI declaration as true by definition.  This is a take-it-or-leave-it proposition: If you do not want to accept the assertions in the URI declaration, then you should not use that URI, because, in essence, you may be trying to talk about a different resource -- one that shares some, but not all, of the same characteristics. 

Suggested practice P1: A URI declaration should include sufficient information to distinguish the named resource from other resources, such that other parties can use the URI confidently to make statements about the resource. [Is there a WebArch reference for this? The closest I find is "Good practice: Identify with URIs".  -- DBooth]

For example, statement M1.a above ("http://dbooth.org/2007/moon/ is a moon") is not sufficient to uniquely identify the intended resource, because there are many moons.  However M1.a and M1.b together are sufficient, at least for many purposes.  Beware that sufficient information for one purpose may not be sufficient information for another purpose.  Pat Hayes has several times pointed out that one application may require finer (or different) distinctions than another.  (See Hayes' message on the URI/identity issue or his IRW presentation "In Defense of Ambiguity".)  Thus, P1 is a guideline -- not a hard and fast rule.

Definition: A URI declaration page is an information resource whose primary purpose is to provide URI declarations.

A URI declaration page is quite similar to the idea of a Published Subject Indicator.   However, a single URI declaration page could contain declarations for multiple URIs.  Thus, the relationship between URI declaration pages and resources is many-to-many. 

Names versus resources

We are treating a URI as a name for a resource, so that when the name is used in an assertion about the resource, it will be understood as referring to the resource.   But the treatment of a name in an explicit name declaration is very different: it is treated simply as a literal sequence of characters.  Thus, in the URI declaration phrase 'The URI "http://dbooth.org/2007/moon/" hereby names . . .',  http://dbooth.org/2007/moon/ refers only to a sequence of characters that conforms to URI syntax, whereas in the statement "http://dbooth.org/2007/moon/ is a moon" it refers to a resource.  In other words, the subject of a URI declaration as a whole (such as M1) is a URI string -- not a resource --  whereas the subject of a regular assertion is a resource, even though some subordinate parts of the URI declaration (such as M1.a and M1.b) may use resources as subjects.

This distinction is readily apparent in a language like Java or C++ that uses explicit name declarations, but not usually in RDF, because RDF does not usually use or need explicit name declarations.  (A named graph is an exception though.)  Nonetheless, the difference is important because other parties wishing to use http://dbooth.org/2007/moon/ to make statements about the moon need to know whether a statement like M2, "http://dbooth.org/2007/moon/ is made of green cheese", is a subordinate part of the URI declaration or a separate statement about the moon.  The URI declaration gives them a convenient means of ensuring that they share a common, core understanding of the resource that http://dbooth.org/2007/moon/ denotes, even though they may not agree on other assertions that are made about that resource.

Components of a URI declaration

More precisely, a URI declaration consists of:
  1. a URI u;
  2. a predicate p(x), where x is a resource; and
  3. a performative speech act, issued by the URI's owner or delegate, that indicates u and p(x).
The URI declaration can be understood as stating:

"If a resource r exists such that p(r) is true, then henceforth u denotes r.
Otherwise, if no such resource exists, the URI declaration is malformed."

If the predicate p is expressed as an RDF graph, then conceptually a URI declaration creates a named graph, where p is the graph and the URI becomes its name.

It is important to realize that the mere pairing of u and p does not constitute a URI declaration without a distinguishable speech act.  Thus, a critical aspect of any mechanism for making URI declarations is the ability to distinguish the performative speech act from other, normal speech.  There are many ways this can be done; usually context is involved.  Also, in some sense the evidence that such a speech act has occurred is more important than the act itself, because what matters is that other parties believe that such an act has actually occurred.  Thus, a digitally signed statement provides evidence that the signer made the signed statement, even if the reader did not witness the act of making or signing the statement.

In the moon example above, URI u is http://dbooth.org/2007/moon/ , predicate p(x) is the conjunction of M1.a and M1.b, and x is the moon.  Note that if M2 ("http://dbooth.org/2007/moon/ is made of green cheese") had also been a part of p(x) then the URI declaration would have been malformed, since there is no moon that orbits the Earth and is made of green cheese.  The performative speech act is the act of publishing statement M1 ("The URI http://dbooth.org/2007/moon/ hereby names . . . .").  In this example, the English phrasing " . . . hereby names . . ." distinguishes this performative speech act from M2, which is intended as normal speech.
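
The reading of a URI declaration given above can be sketched in Python.  This is only a toy illustration, not part of the proposal: the UriDeclaration class, the dictionary-based model of a resource, and the small UNIVERSE of candidate resources are all assumptions made for the example.  The sketch also ignores the performative speech act, which code cannot capture.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Toy model (an assumption for this sketch): a resource is a dict of properties.
Resource = Dict[str, object]


@dataclass(frozen=True)
class UriDeclaration:
    uri: str                               # the URI u, handled as a plain string
    predicate: Callable[[Resource], bool]  # the predicate p(x)

    def satisfiers(self, universe: List[Resource]) -> List[Resource]:
        """Return every resource r in the universe for which p(r) is true."""
        return [r for r in universe if self.predicate(r)]

    def is_malformed(self, universe: List[Resource]) -> bool:
        """A declaration is malformed when no resource satisfies p."""
        return not self.satisfiers(universe)


# Hypothetical universe of candidate resources.
UNIVERSE: List[Resource] = [
    {"name": "the Moon", "is_moon": True, "orbits": "Earth", "made_of": "rock"},
    {"name": "Phobos", "is_moon": True, "orbits": "Mars", "made_of": "rock"},
]


def m1a(r: Resource) -> bool:
    return r["is_moon"] is True


def m1b(r: Resource) -> bool:
    return r["orbits"] == "Earth"


def m2(r: Resource) -> bool:
    return r["made_of"] == "green cheese"


MOON = "http://dbooth.org/2007/moon/"
only_m1a = UriDeclaration(MOON, m1a)
m1 = UriDeclaration(MOON, lambda r: m1a(r) and m1b(r))
m1_with_m2 = UriDeclaration(MOON, lambda r: m1a(r) and m1b(r) and m2(r))
```

In this toy universe, M1.a alone matches both moons, the conjunction of M1.a and M1.b picks out exactly one resource, and adding M2 leaves nothing for the URI to denote, so that declaration is malformed.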

The word "authoritative" has sometimes caused confusion in discussions of URI declarations.  If a URI 303-redirects to a URI declaration page, or if its racine leads to a URI declaration page, in what sense is a URI declaration made by that page "authoritative"?

A URI declaration is authoritative only in defining the association between the declared URI and a particular resource.  The declaration creates a social expectation that other parties making use of that URI will use it to denote that same resource.  This is analogous to the social expectation that is created when an organization publishes a specification named XYZ and a product manufacturer then advertises an XYZ product.  If that product does not conform to the XYZ specification, the manufacturer will be viewed as having violated that social expectation.

Web architecture and implicit URI declarations

How should URI declarations be indicated on the Web?

The "following your nose" algorithm

[Editorial note: Somewhere a precise definition of this algorithm should be provided.  I didn't bother to do so here, but it is needed.  Perhaps the draft TAG Finding on "Dereferencing HTTP URIs" would be a good place for it.  That document already has a cursory description of the algorithm.  -- DBooth]

Given a URI, it is very helpful to others if that URI's declaration page can be readily located, using the URI as a starting point:

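Suggested practice P2: URI owners should mint and support their URIs such that an attempt to dereference a URI of a non-information resource will lead to a URI declaration page for that URI, using one of the following mechanisms:
  1. Hash URI: the URI contains a fragment identifier, and its racine (the part before the "#") leads to the URI declaration page.
  2. 303-redirect: the URI contains no fragment identifier, and an attempt to dereference it results in a 303-redirect to the URI declaration page.

Thus, http://dbooth.org/2007/moon/ 303-redirects to its URI declaration page at http://dbooth.org/2007/moon/decl.html .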
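
As a rough sketch of how a client might locate a declaration page by following its nose, consider the Python function below.  It is not the precise algorithm called for in the editorial note above.  The fetch callable is a hypothetical stand-in for an HTTP GET that does not follow redirects automatically, and fake_fetch simulates a server so that the example runs without network access.  The sketch also assumes a single redirect hop and does not check that the racine of a hash URI actually serves a page.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urldefrag

# Hypothetical interface: fetch(uri) returns (status code, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]


def declaration_page_uri(uri: str, fetch: Fetch) -> Optional[str]:
    """Return the URI of the declaration page for uri, or None if neither mechanism applies."""
    racine, fragment = urldefrag(uri)
    if fragment:
        # Hash mechanism: the part before the "#" leads to the declaration page.
        return racine
    status, location = fetch(uri)
    if status == 303 and location:
        # 303 mechanism: the redirect target is the declaration page.
        return location
    return None


def fake_fetch(uri: str) -> Tuple[int, Optional[str]]:
    """Simulated server responses (assumed for this example only)."""
    table = {
        "http://dbooth.org/2007/moon/": (303, "http://dbooth.org/2007/moon/decl.html"),
        "http://dbooth.org/2007/moon/decl.html": (200, None),
    }
    return table.get(uri, (404, None))
```

With the simulated server, the moon URI resolves to http://dbooth.org/2007/moon/decl.html , whereas the declaration page's own URI yields None because it answers 200 rather than 303.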

Proposed rule for implicit URI declarations

Page http://dbooth.org/2007/moon/decl.html uses English both to make clear that a URI declaration is intended, and to distinguish between the URI declaration and regular assertions about the moon.  But what should be done in other cases, such as RDF, that do not have a mechanism for explicit URI declarations?

I propose that the Web architecture treat the act of serving a page using either of the above two follow-your-nose mechanisms -- hash or 303 -- as a performative speech act of URI declaration:

Proposed rule R1:  Given a URI u, if either of the follow-your-nose mechanisms described above yields a representation r, then, unless otherwise indicated, the conjunction of assertions made in r represents an implicit URI declaration for u.

And the converse:

Proposed rule R2: Unless otherwise indicated (such as by rule R1 or by some explicit indication), publication of assertions about a resource denoted by a URI should not be construed as a performative speech act of declaring that URI.

This does not mean that rule R1 should be the only way to declare a URI.  There could be other mechanisms also, particularly explicit mechanisms.

Rule R1 clearly has the first two components of a URI declaration, but what is the performative speech act?  First, publication of the page -- regardless of the URI that leads to it -- represents the utterance of the declaration.  Second, the follow-your-nose algorithm provides prima facie evidence that the declaration is authorized by the owner of the originating URI.  This is important because the domain name in the URI of the declaration page could be quite different from the domain name of the original resource URI.  This act of publishing the page in response to the follow-your-nose algorithm from the original URI is what distinguishes this performative speech act from other, normal speech.

Rule R1 also implies that, unless otherwise indicated, every assertion in the page obtained should be considered a part of the URI declaration.  Therefore:

Suggested practice P3: A URI declaration page should avoid making assertions about the URI's associated resource that are not intended to be a part of that URI's declaration.

In the moon example above, this means that statement M2 ("http://dbooth.org/2007/moon/ is made of green cheese") should not be included in an equivalent RDF page, because if it were it would be considered a part of the URI declaration and the URI http://dbooth.org/2007/moon/ would thus be unusable to parties who wish to refer to the moon and do not choose to believe the moon is made of green cheese.  On the other hand, statement M3 ("For more information about http://dbooth.org/2007/moon/ , see also http://dbooth.org/2007/moon/about.html") is safe to include in the URI declaration page, because it is merely a suggestion: it does not affect the satisfiability of p(x).  Notice that by rule R2, page http://dbooth.org/2007/moon/about.html should not be interpreted as a URI declaration page for http://dbooth.org/2007/moon/ .
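
The consequence of R1 for a party that rejects M2 can be shown with a very small Python sketch.  The function names and the use of labels such as "M2" to stand for whole assertions are assumptions made for illustration only.

```python
from typing import FrozenSet


def declaration_under_r1(page_assertions: FrozenSet[str]) -> FrozenSet[str]:
    """Under R1, absent other indication, every assertion on the page is part of the declaration."""
    return page_assertions


def can_use_uri(declaration: FrozenSet[str], rejected: FrozenSet[str]) -> bool:
    """Take it or leave it: a party can use the URI only if it rejects no declared assertion."""
    return declaration.isdisjoint(rejected)


# Hypothetical pages: one includes M2, the other follows suggested practice P3.
page_with_m2 = frozenset({"M1.a", "M1.b", "M2"})
page_following_p3 = frozenset({"M1.a", "M1.b"})
skeptic_rejects = frozenset({"M2"})
```

Here can_use_uri(declaration_under_r1(page_with_m2), skeptic_rejects) is False, while the same call with page_following_p3 is True: keeping M2 off the declaration page keeps the URI usable by parties who do not believe M2.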

This also means that if several URIs share the same URI declaration page, examination of the URI declaration page via one of those URIs will not necessarily indicate whether the other URIs are also being declared.  To avoid the inefficiency of having to dereference each of those URIs in order to determine their URI declarations, either specialized URI prefixes can be defined (as described in "Converting New URI Schemes or URN Sub-Schemes to HTTP"), or explicit URI declaration mechanisms could be defined, such as the one proposed below.

If a URI declaration page only contains URI declarations, how can other parties find other information about the associated resources?

Suggested practice P4: A URI declaration page should provide links to other information about the resources whose URIs are declared by that page. 

This does not mean that a URI owner should be responsible for providing links to all other information about the associated resource.  But providing links to other known sources of information would be helpful to others, and the URI declaration page is a logical starting place to look for such links.  It should be understood that providing a link does not imply any particular endorsement.

Explicit URI declaration in RDF

One example of explicit URI declaration would be publication of a specification that defines certain URIs, even if those URIs are not dereferenceable.  [Thanks to Richard Cyganiak for suggesting this example. -- DBooth]  This raises the question of whether there is a recognized RDF way to express URI declarations.

I do not know of any explicit URI declaration predicate that has already been defined for RDF -- please tell me if there is one -- but it would be easy to define one using named graphs:

If g is the URI of a named graph, and u is a URI, then the following N3 statements provide an explicit URI declaration for u:

@prefix dbooth: <http://t-d-b.org?http://dbooth.org/2007/uri-decl/#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
g dbooth:declares "u"^^xsd:anyURI .

Note the quotes around URI u, because in the declaration context it must be treated as a literal string -- not a reference to a resource. 

This predicate has two of the three elements required for a URI declaration.  A performative speech act (or evidence of one) is still needed to complete the declaration.
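
For concreteness, the following Python sketch generates such a statement for given values of g and u.  It is plain string formatting, not an RDF library.  The function name and the graph URI used in the usage note are hypothetical, and the dbooth:declares predicate is only the one proposed above, not an established vocabulary term.

```python
DECL_NS = "http://t-d-b.org?http://dbooth.org/2007/uri-decl/#"
XSD_NS = "http://www.w3.org/2001/XMLSchema#"


def explicit_declaration_n3(graph_uri: str, declared_uri: str) -> str:
    """Return N3 text stating that the named graph graph_uri declares declared_uri."""
    for value in (graph_uri, declared_uri):
        # These characters are not legal in a URI and would break the N3 syntax.
        if any(ch in value for ch in '<>" \n'):
            raise ValueError(f"not usable as a URI here: {value!r}")
    return "\n".join([
        f"@prefix dbooth: <{DECL_NS}> .",
        f"@prefix xsd: <{XSD_NS}> .",
        # The graph is referenced as a resource; the declared URI is a quoted literal.
        f'<{graph_uri}> dbooth:declares "{declared_uri}"^^xsd:anyURI .',
    ])
```

For example, explicit_declaration_n3("http://example.org/graphs/moon-decl", "http://dbooth.org/2007/moon/") produces the two prefix lines followed by a single triple in which the moon URI appears only as a quoted string.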

Declaring URIs for Information Resources

The discussion above has focused on non-information resources.  How does URI declaration apply to information resources?  The best explanation I have so far is that, following the TAG's httpRange-14 decision, an HTTP 200 OK response to an HTTP GET on a URI should be interpreted as an implicit declaration of that URI: the URI is declared as a name for the information resource that responded to the GET request.  In effect, the HTTP 200 OK response from a URI u declares:

<u> a w:InformationResource .
<u> log:uri "u"^^xsd:anyURI .

where w:InformationResource is the class of information resources and log:uri indicates that the URI (string) on the right names the resource on the left. 

According to the WebArch, an information resource is independent of a URI: any number of different URIs could be associated with the same information resource.  Therefore the HTTP 200 OK response by itself is not enough to know whether some other URI might also name the same information resource.  Of course, the content returned with the HTTP 200 OK response might indicate whether there are other URIs for that resource.

Acknowledgements

Thanks to Jeremy Carroll for review comments.

Comments by all are invited.  If I have missed a reference that I should have included, please let me know.


17-Aug-2007: Added section on declaring URIs for information resources, and clarifications suggested by Richard Cyganiak.
2-Aug-2007: Mentioned evidence of a speech act.  Added more about "authoritative".  Added link to PSI document.  Added mention of URI declaration creating a named graph.
1-Aug-2007: Misc clarifications per Pat Hayes' private email.
31-Jul-2007: Corrected the datatype of u (to xsd:anyURI); misc clarifications.
30-Jul-2007: Added TOC, clarified speech act, misc minor fixes.
25-Jul-2007: Original draft.