URI Declaration Versus Use
Views
expressed herein are those of the author and do not necessarily
reflect those of HP.
Abstract
It is important to distinguish between a URI declaration and regular
assertions about the URI's associated resource. This distinction
enables different parties to make different choices about which
assertions to accept, while still sharing a common definition of the
associated resource. This distinction is not apparent in RDF, because
URIs are declared implicitly in RDF. The need becomes apparent
when the URI of a non-information resource is dereferenced in an
attempt to locate related information. This paper motivates and
explains this distinction, defines the notions of URI declaration and URI declaration page, and suggests
some related best practices. It also proposes a Web architectural
rule specifying how URIs for non-information resources can be
conveniently declared using established 303-redirect or hash URI
mechanisms.
Introduction
When an HTTP URI is used to name something that is not a web page or
web site (i.e., not an information
resource), it is important to distinguish between the declaration
of that URI as a name for a particular resource, and regular assertions
about that resource. This difference is important to Web
architecture and to other parties that wish to use the URI in
assertions about the resource. The issue arises when
another party attempts to dereference the URI in order to learn about
the URI and its associated resource. The other party may wish to
make use of the URI as a means of referring to the resource, without
necessarily believing other assertions that are made about the
resource.
This difference is particularly confusing in RDF. Many
programming languages distinguish between variable declarations
and variable use, but RDF does not have a corresponding mechanism for
URI declaration. Thus, when RDF
statements are served from a URI, it may not be evident which
of those RDF statements are intended to constitute a URI declaration
and which
are intended to be regular assertions about the resource. They
all look the same. In fact, given an RDF triple, there is no way
to determine, by examining the triple, whether that triple should be
considered a part of the URI declaration or a regular assertion about
the resource. It is up to the URI owner to indicate this
distinction.
This paper describes the distinction between URI declaration and use,
and suggests some best practices. Even though this paper is
written in terms of URIs,
the concepts apply equally to IRIs. (See RFC 3986 and RFC 3987 for advice on
minting URIs and IRIs.) The following example will be used to
illustrate the ideas.
Example: A URI for the Moon
Suppose I mint a URI for the moon: http://dbooth.org/2007/moon/ . I own
the domain dbooth.org, so I have the authority to do
so. (See URI
ownership.) Since the moon is not an information resource, in
conformance with the W3C
TAG's httpRange-14 decision I have
configured my server such that an attempt to dereference that URI will
result in a 303-redirect to http://dbooth.org/2007/moon/decl.html ,
which, when dereferenced, returns a page containing the following
statements:
The role of these statements is discussed below.
URI declaration
Definition: A URI declaration is a set of statements that
authoritatively declare
the association between a URI and a particular resource.
A URI declaration is a performative speech act. (See Cowen's
message or Wikipedia.)
Its publication by someone who has the authority to make the
declaration -- i.e., the URI owner or delegate -- defines the
association between a URI and a resource. Therefore, another
party wishing to use that URI to denote that resource should take all
assertions that constitute part of that URI declaration as true by
definition. This is a take-it-or-leave-it proposition: If you do
not want to accept the assertions in the URI declaration, then you
should not use that URI, because, in essence, you may be trying to talk
about a different resource -- one that shares some, but not all, of the
same characteristics.
Suggested
practice P1: A URI declaration
should include sufficient information to
distinguish the named resource from other resources, such that other
parties can use the URI confidently to make statements about the
resource. [Is there a WebArch reference for this? The closest I find is
Good practice: Identify with URIs. -- DBooth]
For example, statement M1.a above ("http://dbooth.org/2007/moon/
is a moon") is not sufficient to uniquely identify the intended
resource, because there are many moons. However M1.a and M1.b
together are sufficient, at least for many purposes. Beware that
sufficient information for one purpose may not be
sufficient information for another purpose. Pat Hayes has several
times pointed out that one application may require finer (or different)
distinctions than another. (See Hayes' message
on the URI/identity issue or his IRW presentation "In
Defense of Ambiguity".) Thus, P1 is a guideline -- not a hard
and fast rule.
Definition: A URI declaration page is an information resource whose primary
purpose
is to provide URI declarations.
A URI declaration page is quite similar to the idea of a Published
Subject Indicator. However, a single URI declaration
page could contain declarations for multiple URIs. Thus, the
relationship between URI declaration pages and resources is
many-to-many.
Names versus resources
We are treating a URI as a name for a resource, so that when the name
is used in an assertion about the resource, it will be understood as
referring to the resource. But the treatment of a name in
an explicit name declaration is very different: it is treated simply as
a literal sequence of characters. Thus, in the URI declaration
phrase 'The URI "http://dbooth.org/2007/moon/"
hereby names . . .', http://dbooth.org/2007/moon/
refers only to a sequence of characters that conforms to URI syntax,
whereas in the statement "http://dbooth.org/2007/moon/
is a moon" it refers to a resource. In other words, the subject
of a URI declaration as a whole (such as
M1) is a URI string -- not a resource -- whereas the subject of a
regular assertion is a resource, even though some subordinate parts of
the URI declaration (such as M1.a and M1.b) may use resources as
subjects.
This distinction is readily apparent in a language like Java or C++
that uses explicit name declarations, but not usually in RDF, because
RDF does not usually use or need explicit name declarations. (A named
graph is an exception though.) Nonetheless, the difference is
important because other parties wishing to use http://dbooth.org/2007/moon/
to make statements about the moon need to know whether a statement like
M2, "http://dbooth.org/2007/moon/
is made of green cheese", is a subordinate part of the URI declaration
or a separate statement about the moon. The URI declaration gives
them a convenient means of ensuring that they share a common, core
understanding of the resource that http://dbooth.org/2007/moon/
denotes, even though they may not agree on other assertions that are
made about that resource.
Components of a URI declaration
More precisely, a URI declaration consists of:
- a URI u;
- a predicate p(x), where
x is a resource; and
- a performative speech act, issued by the URI's owner or delegate,
that indicates u and p(x).
The URI declaration can be understood as stating:
"If a resource r exists such that p(r) is true, then henceforth u denotes r.
Otherwise, if no such resource exists, the URI declaration is
malformed."
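The three components can be modelled as a small data structure. The following is a minimal sketch, not a definitive implementation: the class name UriDeclaration, the function denotation, the dict-based representation of resources, and the toy data are all assumptions made for illustration, and M1.b is taken here to mean "orbits the Earth".

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Resource = dict  # hypothetical representation: a resource is a dict of properties


@dataclass(frozen=True)
class UriDeclaration:
    uri: str                               # u: the URI, held as a literal string
    predicate: Callable[[Resource], bool]  # p(x): true of the intended resource
    speech_act_observed: bool              # evidence of the owner's performative act


def denotation(decl: UriDeclaration, universe: Iterable[Resource]) -> Resource:
    """Return the resource r that decl.uri denotes, per the reading quoted above."""
    if not decl.speech_act_observed:
        # Pairing u with p is not a declaration without the speech act.
        raise ValueError("no URI declaration: speech act missing")
    for r in universe:
        if decl.predicate(r):
            return r  # first match; the text does not address multiple matches
    raise ValueError("malformed URI declaration: no resource satisfies p")


# Toy universe (illustrative data only).
UNIVERSE = [
    {"name": "Phobos", "kind": "moon", "orbits": "Mars", "made_of": "rock"},
    {"name": "Moon", "kind": "moon", "orbits": "Earth", "made_of": "rock"},
]

moon_decl = UriDeclaration(
    uri="http://dbooth.org/2007/moon/",
    # Assumed rendering of M1.a ("is a moon") and M1.b ("orbits the Earth").
    predicate=lambda r: r["kind"] == "moon" and r["orbits"] == "Earth",
    speech_act_observed=True,
)

print(denotation(moon_decl, UNIVERSE)["name"])  # prints Moon
```

Adding a "made of green cheese" condition to the predicate would make denotation raise the "malformed" error, because no resource in the toy universe satisfies it.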
If the predicate p is
expressed as an RDF graph, then conceptually a URI declaration creates
a named
graph, where p is the
graph and the URI becomes its name.
It is important to realize that the mere pairing of u and p does not constitute a URI
declaration without a distinguishable speech act. Thus, a
critical aspect of any mechanism for making URI declarations is the
ability to distinguish the performative speech act from other, normal
speech. There are many ways this can be done; usually context is
involved. Also, in some sense the evidence that such a speech act
has occurred is more important than the act itself, because what
matters is that other parties believe
that such an act has actually occurred. Thus, a digitally
signed statement provides evidence that the signer made the signed
statement, even if the reader did not witness the act of making or
signing the
statement.
In the moon example above, URI u is http://dbooth.org/2007/moon/ ,
predicate p(x) is the conjunction of M1.a and M1.b, and x
is the moon. Note that if M2 ("http://dbooth.org/2007/moon/
is made of green cheese") had also been a part of p(x) then the URI declaration would
have been malformed, since there is no moon that orbits the Earth and
is made of green cheese. The performative speech act is the act
of publishing statement M1 ("The URI http://dbooth.org/2007/moon/
hereby names . . . ."). In this example, the English phrasing " .
. . hereby names . . ." distinguishes this performative speech act from
M2, which is intended as normal speech.
The word "authoritative" has sometimes caused confusion in discussions
of URI declarations. If a URI 303-redirects to a URI declaration
page, or if its racine leads to a URI declaration page, in what sense
is a URI declaration made by that page "authoritative"? Does it
mean that:
- the assertions in the URI declaration are necessarily true?
No.
- the author of that page believes
that the assertions are true? Not necessarily.
- the author of that page is a recognized expert on the subject of
that page? No.
- the URI owner gets to control what others may say about the URI's
associated resource? No.
- the URI is the most popular or dominant URI for denoting the
associated resource? No.
- [Are there other examples I
should have included here?]
A URI declaration is authoritative only in defining the association between the declared
URI and a particular resource. The declaration creates a social
expectation that other parties making use of that URI will use it to
denote that same resource. This is analogous to the social
expectation that is created when an organization publishes a
specification named XYZ and
a product manufacturer then advertises an XYZ product. If that product
does not conform to the XYZ specification,
the manufacturer will be viewed as having violated that social
expectation.
Web architecture and implicit URI declarations
How should URI declarations be indicated on the Web?
The "following your nose" algorithm
[Editorial note: Somewhere a precise
definition of this algorithm should be provided. I didn't
bother to do so here, but it is needed. Perhaps the draft TAG
Finding on "Dereferencing HTTP URIs" would be a good place for
it. That document already has a cursory description of the
algorithm. -- DBooth]
Given a URI, it is very helpful to others if that URI's
declaration page can be readily located, using the URI as a starting
point:
Suggested practice P2:
URI owners should mint and support their URIs such that an attempt to
dereference a URI of a non-information resource will lead to a
URI declaration page for that URI, using one of the following
mechanisms:
- If the URI contains a fragment identifier, then the racine of the
URI
(i.e., the part before the #) should lead to a suitable URI
declaration page.
- If the URI does not contain a fragment identifier, then an
attempt to
dereference the URI should yield a 303-redirect that leads to a
suitable URI
declaration page.
Thus, http://dbooth.org/2007/moon/ 303-redirects to its URI
declaration page at http://dbooth.org/2007/moon/decl.html .
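The two mechanisms of P2 can be sketched as a lookup function. This is only an illustration under stated assumptions: the function name declaration_page_uri and the fetch callback are hypothetical, the callback stands in for an HTTP GET that does not follow redirects, and the canned responses replace real network access. A racine that itself redirects is not handled.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urldefrag, urljoin

# fetch(uri) -> (status code, Location header or None), without following redirects.
Fetch = Callable[[str], Tuple[int, Optional[str]]]


def declaration_page_uri(uri: str, fetch: Fetch) -> Optional[str]:
    """Return where to look for the declaration of uri, or None if neither mechanism applies."""
    racine, fragment = urldefrag(uri)
    if fragment:
        # Hash mechanism: the part before the '#' leads to the declaration page.
        return racine
    status, location = fetch(uri)
    if status == 303 and location:
        # 303 mechanism: the redirect target is the declaration page.
        return urljoin(uri, location)
    return None


# Canned responses standing in for a real server (assumed for illustration).
CANNED = {
    "http://dbooth.org/2007/moon/": (303, "http://dbooth.org/2007/moon/decl.html"),
}


def canned_fetch(uri: str) -> Tuple[int, Optional[str]]:
    return CANNED.get(uri, (404, None))


print(declaration_page_uri("http://dbooth.org/2007/moon/", canned_fetch))
```

Passing the fetch step in as a parameter keeps the sketch independent of any particular HTTP library.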
Proposed rule for implicit URI declarations
Page http://dbooth.org/2007/moon/decl.html
uses English both to make clear that a URI declaration is intended, and
to distinguish between the URI declaration and regular assertions about
the moon. But what should be done in other cases, such as RDF,
that do not have a mechanism for explicit URI declarations?
I propose that the Web architecture treat the act of serving a page
using either of the above two follow-your-nose mechanisms -- hash or
303 -- as a performative speech act of URI declaration:
Proposed
rule R1: Given a URI u,
if either of the follow-your-nose mechanisms described above yields a
representation r, then, unless otherwise indicated, the conjunction of
assertions made in r represents an implicit URI declaration for u.
And the converse:
Proposed rule R2:
Unless otherwise indicated (such as by rule R1 or by some explicit
indication), publication of assertions about a resource denoted by a
URI should not be construed as a performative speech act of declaring
that URI.
This does not mean that rule R1 should be the only way to declare a URI.
There could be other mechanisms also, particularly explicit mechanisms.
Rule R1 clearly has the first two components of a URI declaration, but
what is the performative speech act? First, publication of the
page -- regardless of the URI that leads to it -- represents the
utterance of the declaration. Second, the follow-your-nose
algorithm provides prima facie evidence that the declaration is authorized by the owner of the
originating URI. This is important because the domain name in the
URI of the declaration page could be quite different from the domain
name of the original resource URI. This act of publishing the
page in response to the
follow-your-nose algorithm from the original URI is what
distinguishes this performative speech act from other, normal speech.
Rule R1 also implies that, unless otherwise indicated, every assertion
in the page obtained should be considered a part of the URI
declaration. Therefore:
Suggested
practice P3: A URI declaration
page should avoid making assertions about the URI's
associated resource that are not intended to be a part of that URI's
declaration.
In the moon example above, this means that statement M2 ("http://dbooth.org/2007/moon/
is made of green cheese") should not
be included in an equivalent RDF page, because if it were it
would be considered a part of the URI declaration and the URI http://dbooth.org/2007/moon/
would thus be unusable to parties who wish to refer to the moon and do
not choose to believe the moon is made of green cheese. On the
other hand, statement M3 ("For more information about http://dbooth.org/2007/moon/ ,
see also http://dbooth.org/2007/moon/about.html")
is safe to include in the URI
declaration page, because it is merely a suggestion: it does not affect
the satisfiability of p(x).
Notice that by rule R2, page http://dbooth.org/2007/moon/about.html
should not be interpreted as a URI declaration page for
http://dbooth.org/2007/moon/ .
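Rules R1 and R2 together amount to a simple partition of statements by where they are published. The sketch below is illustrative only: the Statement type, its excluded flag (standing in for "unless otherwise indicated"), and the placement of M2 on the about.html page are assumptions, not part of the proposal.

```python
from typing import List, NamedTuple


class Statement(NamedTuple):
    label: str
    text: str
    page: str               # URI of the page on which the statement is published
    excluded: bool = False  # hypothetical marker for "otherwise indicated"


def implicit_declaration(declaration_page: str, statements: List[Statement]) -> List[Statement]:
    """R1: statements on the follow-your-nose page form the declaration.
    R2: statements published anywhere else do not."""
    return [s for s in statements if s.page == declaration_page and not s.excluded]


DECL = "http://dbooth.org/2007/moon/decl.html"
ABOUT = "http://dbooth.org/2007/moon/about.html"

STATEMENTS = [
    Statement("M1.a", "http://dbooth.org/2007/moon/ is a moon", DECL),
    Statement("M3", "see also http://dbooth.org/2007/moon/about.html", DECL),
    # Assumed placement: M2 is published on the separate 'about' page.
    Statement("M2", "http://dbooth.org/2007/moon/ is made of green cheese", ABOUT),
]

print([s.label for s in implicit_declaration(DECL, STATEMENTS)])  # prints ['M1.a', 'M3']
```

M3 is included in the result, which is harmless because it does not affect the satisfiability of p(x); M2 stays outside the declaration, so parties who reject it can still use the URI.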
This also means that if several URIs share the same URI declaration
page, examination of the URI declaration page via one of those URIs
will not necessarily indicate whether the other URIs are also being
declared. To avoid the inefficiency of having to dereference each
of
those URIs in order to determine their URI declarations, either
specialized URI prefixes can be defined (as described in "Converting New URI Schemes or
URN Sub-Schemes to HTTP"), or explicit URI declaration mechanisms
could be defined, such as the one proposed below.
If a URI declaration page only contains URI declarations, how can other
parties find other information about the associated resources?
Suggested
practice P4: A URI
declaration page
should provide links to other information about the resources whose
URIs are declared by that page.
This does not mean that a URI owner should be responsible for providing
links to all other information about the associated resource. But
providing links to other known sources of information would be helpful
to others, and the URI declaration page is a logical starting
place to look for such links. It should be understood that
providing a link does not imply any particular endorsement.
Explicit URI declaration in RDF
One example of explicit URI declaration would be publication of a
specification that defines certain URIs, even if those URIs are not
dereferenceable. [Thanks to
Richard Cyganiak for suggesting this example. -- DBooth] This
raises the question of whether there is a recognized RDF way to express
URI declarations.
I do not know of any explicit URI declaration predicate that has
already been defined for RDF -- please tell me if there is one -- but
it would be easy to define one using named graphs:
If g is the URI of a named graph, and u is a URI, then the following
N3 statements provide an explicit URI declaration for u:
@prefix dbooth: <http://t-d-b.org?http://dbooth.org/2007/uri-decl/#> .
g dbooth:declares "u"^^xsd:anyURI .
Note the quotes around URI u,
because in the declaration context it must be treated as a literal
string -- not a reference to
a resource.
This predicate has two of the three elements required for a URI
declaration. A performative speech act (or evidence of one) is
still needed to complete the declaration.
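For concreteness, the statement can be generated for actual URIs. This is a minimal sketch under assumptions: the function name explicit_declaration_n3 is hypothetical, the xsd prefix line is added because the fragment above leaves it undeclared, and using the moon declaration page as the named graph's URI is only an example.

```python
DBOOTH_NS = "http://t-d-b.org?http://dbooth.org/2007/uri-decl/#"
XSD_NS = "http://www.w3.org/2001/XMLSchema#"


def explicit_declaration_n3(graph_uri: str, declared_uri: str) -> str:
    """Build N3 stating that the named graph graph_uri declares declared_uri."""
    for value in (graph_uri, declared_uri):
        # Reject characters that would break out of <...> or "..." in N3.
        if any(ch in value for ch in '<>"\\ \n'):
            raise ValueError(f"not usable as a URI here: {value!r}")
    return "\n".join([
        f"@prefix dbooth: <{DBOOTH_NS}> .",
        f"@prefix xsd: <{XSD_NS}> .",
        # The declared URI is quoted: it is a literal string, not a reference.
        f'<{graph_uri}> dbooth:declares "{declared_uri}"^^xsd:anyURI .',
    ])


print(explicit_declaration_n3(
    "http://dbooth.org/2007/moon/decl.html",  # assumed graph URI, for illustration
    "http://dbooth.org/2007/moon/",
))
```

The output still supplies only u and p; whether it counts as a declaration depends on the speech act, as noted above.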
Declaring URIs for Information Resources
The discussion above has focused on non-information resources.
How does URI declaration apply to information resources? The best
explanation I have so far is that, following the TAG's httpRange-14
decision, an HTTP 200 OK response to an HTTP GET on a URI should
be interpreted as an implicit declaration of that URI: the URI is
declared as a name for the information resource that responded to the
GET request. In effect, the HTTP 200 OK response from a URI u declares:
<u> a w:InformationResource .
<u> log:uri "u"^^xsd:anyURI .
where w:InformationResource
is the class of information resources and log:uri indicates
that the URI (string) on the right names the resource on the
left.
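The same interpretation can be written as a function of the response status. This is a sketch of the reading proposed here, not an established rule: the function name implied_statements is hypothetical, and the w: and log: prefixes are left undeclared, as in the statements above.

```python
from typing import List


def implied_statements(uri: str, status: int) -> List[str]:
    """Return the N3 statements implicitly declared by a GET response on uri."""
    if status != 200:
        # Other status codes (such as 303) carry no such implicit declaration here.
        return []
    return [
        f"<{uri}> a w:InformationResource .",
        f'<{uri}> log:uri "{uri}"^^xsd:anyURI .',
    ]


for line in implied_statements("http://dbooth.org/2007/moon/decl.html", 200):
    print(line)
```

A 303 response on http://dbooth.org/2007/moon/ yields an empty list, consistent with the moon not being an information resource.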
According to the WebArch,
an information resource is independent of a URI: any number of
different URIs could be associated with the same information
resource. Therefore the HTTP 200 OK response by itself is not
enough to know whether some other URI might also name the same
information resource. Of course, the content returned with the
HTTP 200 OK response might indicate whether there are other URIs for
that resource.
Acknowledgements
Thanks to Jeremy Carroll for review comments.
Comments by all are invited. If I have missed a reference that I
should have included, please let me know.
17-Aug-2007: Added section
on declaring URIs for information resources, and clarifications
suggested by Richard Cyganiak.
2-Aug-2007: Mentioned
evidence of a speech act. Added more
about "authoritative". Added link to PSI document. Added
mention of URI declaration creating a named graph.
1-Aug-2007: Misc clarifications per Pat Hayes' private email.
31-Jul-2007: Corrected the datatype of u (to xsd:anyURI); misc
clarifications.
30-Jul-2007: Added TOC, clarified speech act, misc minor fixes.
25-Jul-2007: Original draft.