URI Declaration in Semantic Web Architecture

David Booth, Ph.D.
david@dbooth.org

This document: http://dbooth.org/2007/uri-decl/

Views expressed herein are those of the author and do not necessarily reflect those of the author's employer.

Abstract

A URI declaration permits assertions about a URI's associated resource to be classified into two groups: core assertions, which are provided by the URI declaration, and ancillary assertions, which are all others. This distinction enables different parties to share a common understanding of the associated resource (by accepting the core assertions) while making different choices about which ancillary assertions to accept. Resource identity is established by a two-step mapping from a URI to a set of core assertions, and hence through an interpretation to the resource that is denoted by that URI. This paper defines these concepts and proposes some related best practices and a Web architectural rule specifying how URIs for non-information resources can be conveniently declared using existing hash or hashless (303-redirect) URI mechanisms.

Introduction

Identity is a two-step mapping
History of URI declarations

Example: A URI for the Moon

URI declaration

Definition of "URI declaration"
Suggested practice P1 (URI declaration should distinguish the resource)
Definition of "URI declaration page"

Names versus resources
Ancillary assertions
RDF blank nodes (bnodes)

Formalizing URI declarations in RDF

The decl:declares property
Class decl:UriDeclaration
Class decl:EvidencedUriDeclaration

URI declaration and resource identity
Declaring URIs for Information Resources
Granularity of URI declarations
Web architecture and implicit URI declarations

The "following your nose" algorithm

Suggested practice P2 (Use follow-your-nose algorithm to publish URI declarations)

Proposed rule for implicit URI declarations

Proposed rule R1 (Publication with follow-your-nose algorithm represents implicit URI declaration)
Proposed rule R2 (Converse of R1: Publication in other ways does not automatically represent URI declaration)
Suggested practice P3 (URI declaration page should avoid assertions that are not intended to be part of the URI declaration)
Suggested practice P4 (URI declaration page should provide links to other information)

Why distinguish between URI declarations and other assertions?

A URI declaration anchors the URI's meaning
Clumping facilitates reuse

Naming
Gravity

What does "authoritative" mean?

Introduction

The Semantic Web's use of a URI as a globally scoped name conveniently enables anyone to make assertions about the denoted resource. Many such assertions may be made by many parties -- both the URI owner (minter) and other users of the URI. The URI user thus faces the question of which of potentially many assertions to use. Some of these assertions may be mutually incompatible -- contradictory -- and some may be incompatible with the user's intended application or beliefs. Clearly the question of which assertions to use will depend both on the user's application, as different sets of assertions may be best suited to different applications, and on the user's good judgement in selecting those that he/she believes are the most trustworthy and suitable. But is there a common subset of assertions that all users of the URI should use when they use the URI to make statements about its denoted resource? Yes.

This paper argues that it is important to Semantic Web architecture to distinguish between core assertions for a URI, which are mandatory for anyone choosing to use that URI to make statements about the resource it denotes, and ancillary assertions for the URI, which are optional. Core assertions are those provided in an authoritative declaration for that URI; all other assertions involving that URI are ancillary assertions.

Identity is a two-step mapping

A URI declaration represents what Pat Hayes calls a "baptism" of the URI: loosely, its purpose is to establish the association between the URI and whatever resource that URI is intended to denote. It does this by "baptising" a set of assertions -- the core assertions -- that characterize the intended resource. However, since it is not possible in a theoretical sense to ensure that these core assertions will actually be interpreted as identifying the intended resource, more precisely a URI declaration only specifies the first step in a two-step mapping:

Step 1: A mapping from the URI to a set of core assertions that are intended to characterize the resource; then
Step 2: An interpretation of those core assertions as identifying the actual resource.

Step 2 is outside the control of Semantic Web architecture. It is inherently ambiguous and is left for philosophers to ponder. Fortunately, step 1 is what matters to Semantic Web applications, and this is what URI declarations address.

History of URI declarations

The ideas behind URI declarations are not new. A number of people in the Semantic Web community have used and advocated practices that amount to URI declaration. For example, Dan Connolly's paper on "A Pragmatic Theory of Reference for the Web" recommends:

1. To mint a term in the community, choose a URI of the form doc#id and publish at doc some information that motivates others to use the term in a manner that is consistent with your intended meaning(s).
2. Use of a URI of the form doc#id implies agreement to information published at doc.

[Added 15-Jun-2010] And Sandro Hawke wrote in his RDF 2 Wishlist post (see also UseImpliesConsent, on the W3C esw wiki):

. . . the use of particular IRIs as predicates implies certain things, as defined by the IRI’s owner

[Added 15-Jun-2010] And Cool URIs for the Semantic Web recommends:

1. Be on the Web.
Given only a URI, machines and people should be able to retrieve a description about the resource identified by the URI from the Web.

[Added 6-Jul-2010] And Henry Story writes:

[We] can dereference foaf:knows to find out what it means. This is
the canonical way to find it's meaning, and is the initial procedure we
should use to arbitrate between competing understandings of its meaning.

[Added 13-Jan-2011] And Jiří Procházka writes:

[The] authoritative definition of a term is the one obtained by dereferencing it

Thus, this paper is an attempt to crystallize and standardize concepts, terminology and practices that had previously been vague, unnamed and not universally accepted in the Semantic Web community. The ideas presented here are relevant to the Semantic Web community in three ways:

They help clarify architectural issues and discussions around best practices.
For URI owners (minters), they offer guidance about how a URI declaration should be provided, and what it should contain.
For URI users (both readers and writers), they offer guidance about how a URI should be used and what additional assertions should be made using the URI.

For convenience, this paper focuses on URIs that denote something other than Web pages or Web sites, i.e., non-information resources, but the principles discussed here are also applicable to information resources. Also, although this paper is written in terms of URIs, the concepts apply equally to IRIs. (See RFC 3986 and RFC 3987 for advice on minting URIs and IRIs.)

Example: A URI for the Moon

Suppose I mint a URI to denote the moon: http://dbooth.org/2007/moon/ . I own the domain dbooth.org, so according to the AWWW's guidance on URI ownership, I have the authority to do so. Since the moon is not an information resource, in conformance with the W3C TAG's httpRange-14 decision I have configured my server such that an attempt to dereference that URI will result in a 303-redirect to http://dbooth.org/2007/moon/decl.html , which, when dereferenced, returns a page containing the following statements:

Statement M1: The URI "http://dbooth.org/2007/moon/" hereby names a particular resource, such that the following core assertions hold:
    a: http://dbooth.org/2007/moon/ is a moon.
    b: http://dbooth.org/2007/moon/ orbits the Earth.
    c: http://dbooth.org/2007/moon/ may have ancillary assertions at http://dbooth.org/2007/moon/about.html .

Although the above assertions were expressed informally in English prose, they might instead have been expressed formally in a machine-processable language such as RDF and look something like the following N3 excerpt (omitting the usual rdfs: prefix declaration and the irrelevant foo: prefix declaration):

<http://dbooth.org/2007/moon/> a foo:moon .
<http://dbooth.org/2007/moon/> foo:orbitsAround foo:Earth .
<http://dbooth.org/2007/moon/> rdfs:seeAlso <http://dbooth.org/2007/moon/about.html> .

For the purposes of this paper, it does not matter whether the assertions are expressed formally or informally, though to be most useful in the Semantic Web they should be expressed either directly in RDF or indirectly in an XML format that has a GRDDL transformation to RDF.

URI declaration

Definition: A URI declaration is a set of statements, or "core assertions", that authoritatively declare the association between a URI and a particular resource.

A URI declaration involves a performative speech act. (See Cowen's message or Wikipedia.) Its publication by someone who has the authority to make the declaration -- the URI owner or delegate -- creates the association between a URI and a resource. Therefore, any party wishing to use that URI to make statements about its denoted resource should take all assertions that constitute part of that URI declaration -- the core assertions -- as true by definition. In the moon example above, the core assertions are M1.a, M1.b and M1.c.

Proposed rule R0: Any party using a URI to make statements about the URI's denoted resource should use that URI in a manner consistent with the URI's declaration.

This is a take-it-or-leave-it proposition: If you do not want to accept the core assertions specified by the URI declaration, then you should not use that URI to make statements about its denoted resource, because in essence you may be trying to talk about a different resource -- one that shares some, but not all, of the same characteristics. Of course, violations of this rule may be completely invisible when committed by an application in the privacy of its own RAM, so in practice this rule is most relevant when statements (ancillary assertions) are written or published about the resource.

Suggested practice P1: A URI declaration should include sufficient information to distinguish the denoted resource from other resources. [Is there a WebArch reference for this? The closest I find is Good practice: Identify with URIs. -- DBooth]

For example, statement M1.a above ("http://dbooth.org/2007/moon/ is a moon") is not sufficient to uniquely identify the intended resource, because there are many moons in the universe. However, M1.a and M1.b together are sufficient, at least for many purposes. By definition M1.c is also a core assertion, but in this case its effect is merely informative: it has no impact on the identity of the denoted resource because it is trivially satisfiable regardless of what resource the URI denotes.
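The effect of P1 can be illustrated with a small Python sketch. It assumes a hypothetical toy world of three moons (the names `world`, `m1a`, `m1b`, `m1c` and `matching` are invented for illustration) and models each core assertion of M1 as a predicate over a candidate resource. For that world, a declaration meets P1 when exactly one candidate satisfies every core assertion.

```python
# Hypothetical toy world: each candidate resource has a few attributes.
world = {
    "Luna": {"type": "moon", "orbits": "Earth"},
    "Phobos": {"type": "moon", "orbits": "Mars"},
    "Titan": {"type": "moon", "orbits": "Saturn"},
}

# Core assertions of statement M1, modeled as predicates over a candidate.
def m1a(attrs):
    """M1.a: '... is a moon.'"""
    return attrs["type"] == "moon"

def m1b(attrs):
    """M1.b: '... orbits the Earth.'"""
    return attrs["orbits"] == "Earth"

def m1c(attrs):
    """M1.c: '... may have ancillary assertions at about.html.'"""
    return True  # trivially satisfiable, so it never narrows the candidates

def matching(core, world):
    """Return the names of the candidates that satisfy every core assertion."""
    return sorted(name for name, attrs in world.items()
                  if all(pred(attrs) for pred in core))

print(matching([m1a], world))            # ['Luna', 'Phobos', 'Titan']
print(matching([m1a, m1b], world))       # ['Luna']
print(matching([m1a, m1b, m1c], world))  # ['Luna']
```

In a larger world, or for an application that needs finer distinctions, M1.a and M1.b might no longer leave a single candidate, which is why P1 is only a guideline.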

Beware that sufficient information to uniquely identify the resource for one purpose may not be sufficient information for another purpose. Pat Hayes has several times pointed out that one application may require finer (or different) distinctions than another. (See Hayes' message on the URI/identity issue or his IRW presentation "In Defense of Ambiguity".) Thus, P1 is a guideline -- not a hard and fast rule. (See Splitting Identities in Semantic Web Architecture for a discussion of how a URI declaration that is considered ambiguous can be related to more specific URI declarations.)

Furthermore, although the intent of a URI declaration is to supply core assertions that uniquely identify the denoted resource, there is no requirement that the core assertions be limited to assertions about that resource. For example, if the URI declaration for http://dbooth.org/2007/moon/ had contained an additional core assertion stating "Elvis is king", users would have been required to accept this assertion or forgo use of the URI. This may seem odd, but there are two reasons for it. One is that some statements that are not directly about the moon may still represent assumptions that are important to the proper understanding and use of the URI. The other is that I do not know of any practical and objective way to judge whether an assertion is relevant or not, because its relevance may depend on the minter's intent. I would be interested in ideas for other approaches that would limit the assertions to those that are relevant to the denoted resource.

Definition: A URI declaration page is an information resource whose primary purpose is to provide URI declarations.

A URI declaration page is quite similar to the idea of a Published Subject Indicator. However, a single URI declaration page could contain declarations for multiple URIs. Thus, the relationship between URI declaration pages and resources is many-to-many.

Names versus resources

We are treating a URI as a name for a resource, so that when the name is used in an assertion about the resource, it will be understood as referring to, or denoting, the resource. But the treatment of a name in an explicit name declaration is very different: it is treated simply as a literal sequence of characters - a use-mention distinction. Thus, in the URI declaration phrase 'The URI "http://dbooth.org/2007/moon/" hereby names . . .', http://dbooth.org/2007/moon/ refers only to a sequence of characters that conforms to URI syntax, whereas in the statement "http://dbooth.org/2007/moon/ is a moon" it refers to a resource. In other words, the subject of a URI declaration as a whole (i.e., M1 as a whole) is a URI string -- not the denoted resource -- whereas the subject of a normal assertion is the denoted resource, even though the subordinate parts of the URI declaration that constitute the core assertions (M1.a, M1.b and M1.c) use resources as subjects.

This distinction between a name as a simple string versus the thing it denotes is critical to the idea of a URI declaration. It is more apparent in languages like Java or C++ that use explicit name declarations, and the distinction has been made evident in the example above by using such stilted phrasing as 'The URI "http://dbooth.org/2007/moon/" hereby names . . .'. But there is no such distinction in bare RDF, because RDF does not have (or need) name declarations. (Named graphs, however, extend RDF to make this distinction and enable a URI as a string to be associated with a graph.) This is why there is no visible difference between core assertions and ancillary assertions when they are expressed in RDF. The difference is created by their context: core assertions are those specified by the URI declaration, and all others are ancillary assertions.

Ancillary assertions

A URI declaration gives special importance to the assertions that are part of that declaration in order to distinguish them from other assertions about the associated resource. Ancillary assertions are any statements about the URI's associated resource that are not a part of the URI declaration. They may be made by the URI owner or anyone else. In contrast with assertions that comprise the URI declaration, ancillary assertions are optional when using a URI, regardless of who issues them: a URI user may choose to assert or not assert (i.e., to believe or not believe) ancillary assertions while using the URI.

For example, http://dbooth.org/2007/moon/about.html contains two ancillary assertions, M11 and M12:

Statement M11: http://dbooth.org/2007/moon/ has a diameter of 3474 kilometers.
Statement M12: http://dbooth.org/2007/moon/ is about 384400 kilometers from Earth.

A user electing to use http://dbooth.org/2007/moon/ to denote the moon must accept assertions M1.a, M1.b and M1.c, but may or may not assert M11 and M12.
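The asymmetry between core and ancillary assertions can be sketched in Python as follows. Assertions are modeled as (subject, predicate, object) tuples; the `foo:` predicates for M11 and M12 are hypothetical, and `accepts_core` is only a rough proxy for rule R0 that treats "taking the core assertions as true" as including them in the graph an application reasons over.

```python
MOON = "http://dbooth.org/2007/moon/"

# Core assertions from the URI declaration (M1.a, M1.b, M1.c) as (s, p, o) triples.
CORE = frozenset({
    (MOON, "rdf:type", "foo:moon"),
    (MOON, "foo:orbitsAround", "foo:Earth"),
    (MOON, "rdfs:seeAlso", "http://dbooth.org/2007/moon/about.html"),
})

# Ancillary assertions M11 and M12; the foo: predicate names are hypothetical.
ANCILLARY = {
    "M11": (MOON, "foo:diameterKm", 3474),
    "M12": (MOON, "foo:distanceFromEarthKm", 384400),
}

def working_graph(chosen):
    """All core assertions, plus only the ancillary assertions the user chooses."""
    unknown = set(chosen) - set(ANCILLARY)
    if unknown:
        raise KeyError(f"unknown ancillary assertions: {sorted(unknown)}")
    return set(CORE) | {ANCILLARY[label] for label in chosen}

def accepts_core(graph):
    """Rough proxy for rule R0: a working graph that mentions MOON
    must include every core assertion of MOON's declaration."""
    mentions_moon = any(MOON in (s, o) for (s, _, o) in graph)
    return (not mentions_moon) or CORE <= graph

g = working_graph(["M11"])               # accepts M11, declines M12
print(accepts_core(g))                   # True
print(accepts_core({ANCILLARY["M12"]}))  # False: uses MOON without its core assertions
```

Whatever ancillary assertions a user picks, the core assertions are always present; a real consistency check would need a reasoner rather than simple set inclusion.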

Core versus ancillary assertions

[Added 20-Jul-2010] Others have observed that contradictions are common when independently created RDF graphs from the web are merged. In an effort to avoid such contradictions, in "Scalable Authoritative OWL Reasoning for the Web", Hogan, Harth and Polleres define the notion of "terminological triples" and use a restricted set of entailment rules that are less likely to lead to contradictions. This is essentially a heuristic means of attempting to distinguish between core assertions (which are definitional) and ancillary assertions.

RDF blank nodes (bnodes)

An RDF bnode has an empty set of core assertions: its interpretation is initially unconstrained.

Formalizing URI declarations

Three things are essential to a URI declaration:

a URI u;
a predicate p(x), where x is a resource; and
a performative speech act, issued by the URI's owner or delegate, that indicates u and p(x).

The act of URI declaration can be understood as stating:

"If a resource r exists such that p(r) is true, then henceforth u denotes r."

If the predicate p is expressed as an RDF graph (a set of assertions), then the resulting URI declaration is analogous to a named graph, where p is the graph and u is its name. However, instead of u denoting p itself, u denotes the resource that satisfies p.

It is important to realize that the mere pairing of u and p does not complete a URI declaration without a distinguishable speech act or evidence thereof. Thus, a critical aspect of any mechanism for making URI declarations is the ability to distinguish the performative speech act from other, normal speech. There are many ways this can be done; usually context is involved. Also, in some sense the evidence that such a speech act has occurred is more important than the act itself, because what matters is that other parties believe that such an act has actually occurred. Thus, a digitally signed statement provides evidence that the signer made the signed statement, even if the reader did not witness the act of making or signing the statement.

In the moon example above, the performative speech act is the act of publishing statement M1 ("The URI 'http://dbooth.org/2007/moon/' hereby names . . . ."), URI u is http://dbooth.org/2007/moon/ , predicate p(x) is the conjunction of M1.a, M1.b and M1.c, and x is the moon. Note that if we had added a statement M2 saying "http://dbooth.org/2007/moon/ is made of green cheese" to the URI declaration then (at least in many models of the universe) there would be no way to satisfy p(x), because there is no moon that orbits the Earth and is made of green cheese.
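A minimal Python sketch of this reading of a URI declaration, assuming a hypothetical toy world of facts: the predicate p is a set of triples in which the placeholder `?x` stands for the resource to be denoted (much like a named graph with u as its name), and `referents` returns the resources r for which p(r) holds. M1.c is omitted because it is trivially satisfiable, and the `performed` flag is an assumed stand-in for the speech act or evidence of it.

```python
from dataclasses import dataclass

X = "?x"  # placeholder for "the resource that u will denote" inside p

@dataclass(frozen=True)
class Declaration:
    uri: str              # u, mentioned as a string rather than used as a name
    predicate: frozenset  # p(x): a set of (s, p, o) triples that may contain X
    performed: bool       # True if a distinguishable speech act (or evidence of one) exists

def satisfies(candidate, predicate, facts):
    """p(r): every triple in `predicate` holds in `facts` once X is replaced by r."""
    def bind(term):
        return candidate if term == X else term
    return all((bind(s), bind(p), bind(o)) in facts for (s, p, o) in predicate)

def referents(decl, candidates, facts):
    """Return the candidates r for which p(r) holds, or None if no act of declaration occurred."""
    if not decl.performed:
        return None  # the mere pairing of u and p declares nothing
    return [r for r in candidates if satisfies(r, decl.predicate, facts)]

# Hypothetical toy world; the resource names and properties are illustrative only.
FACTS = {
    ("moon1", "type", "moon"), ("moon1", "orbits", "Earth"), ("moon1", "madeOf", "rock"),
    ("phobos", "type", "moon"), ("phobos", "orbits", "Mars"),
}
CANDIDATES = ["moon1", "phobos"]
U = "http://dbooth.org/2007/moon/"
P_M1 = frozenset({(X, "type", "moon"), (X, "orbits", "Earth")})  # M1.a and M1.b
P_M1_M2 = P_M1 | {(X, "madeOf", "greenCheese")}                  # adds M2

print(referents(Declaration(U, P_M1, True), CANDIDATES, FACTS))     # ['moon1']
print(referents(Declaration(U, P_M1_M2, True), CANDIDATES, FACTS))  # []
print(referents(Declaration(U, P_M1, False), CANDIDATES, FACTS))    # None
```

Adding M2 makes p unsatisfiable in this world, and without the speech act the pairing of u and p yields no declaration at all.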

We cannot capture the act of URI declaration in bits, but we can capture the result of this act and we can capture evidence that the act occurred. This would allow URI declarations to be explicitly expressed in RDF. One example of explicit URI declaration would be publication of a specification that defines certain URIs, even if those URIs are not dereferenceable. (Thanks to Richard Cyganiak for suggesting this example.) As far as I know there is no established way to do this in RDF, but it would be easy to define one using named graphs or log:Formula in N3. Here are some classes and properties that could be defined for representing URI declarations explicitly in RDF.

@@ DBooth: Should I add something about rdfs:isDefinedBy, and why it doesn't fit the bill? http://www.w3.org/TR/2000/CR-rdf-schema-20000327/#s2.3.5 @@

The decl:declares property

The decl:declares property can be defined to indicate that a particular graph declares a particular URI, as follows.

If :g denotes a named graph, and u is a URI, then the following N3 statements provide an explicit URI declaration for u:

@prefix decl: <http://t-d-b.org?http://dbooth.org/2007/uri-decl/#> .

:g decl:declares "u"^^xsd:anyURI .

Note the quotes around URI u, because in the declaration context it must be treated as a literal string -- not a reference to a resource -- so the domain and range of decl:declares would be:

decl:declares a rdf:Property ;
    rdfs:domain log:Formula ;
    rdfs:range xsd:anyURI .

where log:Formula is the N3 class of formulas (quoted RDF graphs); here the formula :g represents the set of core assertions that constrain the interpretation of u.

Class decl:UriDeclaration

It may also be useful to make statements about the association between a URI and a set of core assertions that is created by the act of URI declaration. We can define a class decl:UriDeclaration to represent such pairs:

decl:UriDeclaration a rdfs:Class ;
    rdfs:label "URI Declarations" ;
    rdfs:comment """The class of URI declarations as URI-graph associations.
        A decl:UriDeclaration is a pairing (or association) between a URI
        and an rdfg:Graph that results from the act of URI declaration.
        Each decl:UriDeclaration has two properties:
            decl:hasUriOfDeclaration u ;
            decl:hasGraphOfDeclaration g .""" .

decl:hasUriOfDeclaration a rdf:Property ;
    rdfs:label "has URI of declaration" ;
    rdfs:comment """decl:hasUriOfDeclaration selects the URI component
        of a decl:UriDeclaration.""" ;
    rdfs:domain decl:UriDeclaration ;
    rdfs:range xsd:anyURI .

decl:hasGraphOfDeclaration a rdf:Property ;
    rdfs:label "has graph of declaration" ;
    rdfs:comment """decl:hasGraphOfDeclaration selects the graph component
        of a decl:UriDeclaration.""" ;
    rdfs:domain decl:UriDeclaration ;
    rdfs:range log:Formula .

This class allows assertions to be made directly about URI declarations, such as indicating that one URI declaration is broader than another, as discussed in Splitting Identities in Semantic Web Architecture.

Class decl:EvidencedUriDeclaration

If evidence of the act of URI declaration is captured, this could also be bundled with a decl:UriDeclaration to form a decl:EvidencedUriDeclaration:

decl:EvidencedUriDeclaration a rdfs:Class ;
    rdfs:label "Evidenced URI Declarations" ;
    rdfs:comment """The class of tuples representing the result of an act
        of URI declaration together with evidence that the act occurred.
        Each decl:EvidencedUriDeclaration has two properties:
                decl:hasEvidence e ;
                decl:hasUriDeclaration ud .""" .

decl:hasEvidence a rdf:Property ;
    rdfs:label "has evidence" ;
    rdfs:comment """decl:hasEvidence is for selecting the evidence
        component of a decl:EvidencedUriDeclaration.  It
        provides evidence that the act of URI declaration
        occurred.  We do not specify how this evidence
        must be expressed or what standards of quality
        it should meet.""" ;
    rdfs:domain decl:EvidencedUriDeclaration .
    # rdfs:range rdfs:Resource .

decl:hasUriDeclaration a rdf:Property ;
    rdfs:label "has URI declaration" ;
    rdfs:comment """decl:hasUriDeclaration is for selecting the
        decl:UriDeclaration component of a
        decl:EvidencedUriDeclaration.""" ;
    rdfs:domain decl:EvidencedUriDeclaration ;
    rdfs:range decl:UriDeclaration .
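The two classes above could be mirrored in application code roughly as follows. This is a sketch under stated assumptions: the serialization in `canonical_bytes` is invented, and HMAC-SHA256 from Python's standard library is swapped in for a real digital signature. Unlike a public-key signature, an HMAC can be checked only by someone holding the same secret key, who could also have produced it, so it illustrates the structure of evidence rather than third-party proof.

```python
import hashlib
import hmac
from dataclasses import dataclass

@dataclass(frozen=True)
class UriDeclaration:
    """Mirrors decl:hasUriOfDeclaration and decl:hasGraphOfDeclaration."""
    uri: str    # the URI as a string, like "u"^^xsd:anyURI
    graph: str  # the core assertions, serialized (e.g. as N3 text)

    def canonical_bytes(self) -> bytes:
        # Hypothetical serialization used as the payload that the evidence covers.
        return (self.uri + "\n" + self.graph).encode("utf-8")

@dataclass(frozen=True)
class EvidencedUriDeclaration:
    """Mirrors decl:hasUriDeclaration and decl:hasEvidence."""
    declaration: UriDeclaration
    evidence: str  # here: a hex HMAC-SHA256 tag over the declaration

def make_evidence(decl: UriDeclaration, key: bytes) -> EvidencedUriDeclaration:
    tag = hmac.new(key, decl.canonical_bytes(), hashlib.sha256).hexdigest()
    return EvidencedUriDeclaration(decl, tag)

def check_evidence(ed: EvidencedUriDeclaration, key: bytes) -> bool:
    expected = hmac.new(key, ed.declaration.canonical_bytes(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, ed.evidence)

KEY = b"owner-secret"  # hypothetical key held by the URI owner
d = UriDeclaration("http://dbooth.org/2007/moon/",
                   "<http://dbooth.org/2007/moon/> a foo:moon .")
ed = make_evidence(d, KEY)
print(check_evidence(ed, KEY))  # True

tampered = EvidencedUriDeclaration(UriDeclaration(d.uri, d.graph + " # edited"), ed.evidence)
print(check_evidence(tampered, KEY))  # False
```

A tampered declaration no longer matches its evidence, which is the property that lets other parties believe the act of declaration actually occurred.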

URI declaration and resource identity

The notion of URI declaration also helps shed light on the question of resource identity. By design, a URI denotes one resource. But what resource does it denote? This question has plagued Web architecture discussions for some time. (See WWW2006 workshop on Identity, Reference and the Web.) But if we view this question operationally as asking "What assertions should be used if the URI is used?" then the answer becomes simply: "The core assertions provided by the URI declaration".

Declaring URIs for Information Resources

The discussion above has focused on non-information resources. How does URI declaration apply to information resources? Consider a URI such as http://example/foo#bar having root http://example/foo , which dereferences to an HTTP 200 response containing some assertions about the resource denoted by http://example/foo#bar . The issue has a few factors:

It would be helpful if the HTTP 200 response were architecturally treated as authoritative, both because that would be more compatible with the non-Semantic Web and because it is simple and objective.
It seems important to be able to make statements about the information resource denoted by http://example/foo without being required to accept the assertions that it contains.
It would be helpful to be able to declare additional properties of an information resource, such as: all representations are the same, they have a particular MD5 checksum, etc.

The first two factors seem to argue in favor of treating an HTTP 200 Okay response as an implicit, minimal declaration of the URI that was dereferenced, not including assertions contained in the representation that is returned. For a URI u, the URI declaration would be equivalent to the N3 assertions:

<u> a w:InformationResource .
<u> log:uri "u"^^xsd:anyURI .

where w:InformationResource is the class of information resources and log:uri indicates that the URI (string) on the right denotes the resource on the left.

The third factor -- the desire to declare additional properties of an information resource -- seems to argue against treating an HTTP 200 Okay response as an implicit URI declaration, but since there are other ways that additional properties can be associated with an information resource, this does not seem like a compelling argument. For example, instead of merely placing a document at http://example/doc/bits and publishing that URI, another URI such as

        http://example/doc/ir

can be minted such that when this second URI is dereferenced, it 303-redirects to a metadata URI such as

        http://example/doc/metadata

and when that is dereferenced it returns a 200 OK with a URI declaration such as:

        <http://example/doc/ir> a w:InformationResource .
        <http://example/doc/ir> :hasProvenance prov:whatever .
        <http://example/doc/ir> :hasVersion "1.20" .
        <http://example/doc/ir> :hasMd5Checksum "567990020087678940" .
        <http://example/doc/ir> :hasBitsAt "http://example/doc/bits" .

And dereferencing http://example/doc/bits can yield a 200 OK with whatever data bits you were trying to denote. Note that http://example/doc/ir would be the URI that you would want to publish -- not http://example/doc/bits .
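The three-URI arrangement above can be sketched as a hypothetical in-memory server in Python. The paths come from the example; the media types, the trimmed-down declaration text, and the `respond` and `dereference` functions are assumptions made for illustration, and no real HTTP traffic is involved.

```python
from urllib.parse import urlsplit

# Declaration served at the metadata URI (a subset of the example's assertions).
DECLARATION = (
    '<http://example/doc/ir> a w:InformationResource .\n'
    '<http://example/doc/ir> :hasBitsAt "http://example/doc/bits" .\n'
)

def respond(path):
    """Hypothetical server: return (status, headers, body) for a GET on `path`."""
    if path == "/doc/ir":
        # The published URI: redirect to its metadata rather than serving bits.
        return 303, {"Location": "http://example/doc/metadata"}, ""
    if path == "/doc/metadata":
        return 200, {"Content-Type": "text/n3"}, DECLARATION
    if path == "/doc/bits":
        return 200, {"Content-Type": "application/octet-stream"}, "<data bits>"
    return 404, {}, ""

def dereference(url, max_hops=5):
    """Follow 303 redirects as a client would; return (final_url, status, body)."""
    for _ in range(max_hops):
        status, headers, body = respond(urlsplit(url).path)
        if status != 303:
            return url, status, body
        url = headers["Location"]
    raise RuntimeError("too many redirects")

final_url, status, body = dereference("http://example/doc/ir")
print(final_url, status)  # http://example/doc/metadata 200
```

A client that dereferences the published URI lands on the declaration rather than the bits; it reaches the bits only by following :hasBitsAt.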

@@ Add explanation of how <http://example/doc/ir> and <http://example/doc/bits> are related and link to paper on disambiguation. @@

According to the AWWW, an information resource is independent of a URI: any number of URIs could denote the same information resource. Therefore the HTTP 200 Okay response by itself is not enough to know whether some other URI might also name the same information resource. Of course, the content returned with the HTTP 200 Okay response might indicate whether there are other URIs for that resource.

Granularity of URI declarations

At one extreme, a URI declaration could assert everything that the URI owner believes to be true about the associated resource (at least within the URI owner's chosen model of the world), but this is likely to limit the reusability of the URI by applications that need to model some aspect of the resource differently. At the other extreme, the URI declaration could contain no assertions whatsoever, in which case the URI is not likely to be very useful, because its meaning will not be anchored. Between these extremes, what declaration granularity would best facilitate reuse of the URI?

At present, my best guess is that it is a trade-off between specifying less in order to enable more flexible uses of the URI, thereby also making the URI less useful on its own, versus specifying more in order to make the URI more immediately useful, but thereby constraining its reusability in conjunction with other, ancillary assertions. I would be interested in others' thoughts on this. In some sense Michel Dumontier's style of associating assertions with URIs amounts to a practice of providing minimal URI declarations consisting of only rdfs:label and rdfs:comment assertions -- see his email and example protein page -- except that as of this writing he is using rdfs:isDefinedBy instead of rdfs:seeAlso to point to ancillary assertions.

Web architecture and implicit URI declarations

How should URI declarations be indicated on the Web, and what should the Web architecture say about them? At present, the Web architecture does not explicitly specify any way to declare URIs.

The "following your nose" algorithm

[Editorial note: Somewhere a precise definition of this algorithm should be provided. I didn't bother to do so here, but it is needed. Perhaps the draft TAG Finding on "Dereferencing HTTP URIs" would be a good place for it. That document already has a cursory description of the algorithm. -- DBooth]

Given a URI, it is very helpful to others if that URI's declaration page can be readily located, using the URI as a starting point:

Suggested practice P2: URI owners should mint and support their URIs such that an attempt to dereference a URI of a non-information resource will lead to a URI declaration page for that URI, using one of the following "follow-your-nose" mechanisms:

Hash: If the URI contains a fragment identifier, then the part of the URI before the "#" should lead to a URI declaration page that is served with an RDF or other media type that delegates interpretation of the fragment identifier to the content that is served.
303: If the URI does not contain a fragment identifier, then an attempt to dereference the URI should yield a 303-redirect that leads to a URI declaration page.

The hash case is restricted to serving RDF or comparable media types because the semantics of a fragment identifier are governed by the media type (see RFC3986 and RFC2396), and the RDF media type delegates interpretation of the fragment identifier to the content that is served. (See the RDF Concepts section on Fragment Identifiers.)

Thus, http://dbooth.org/2007/moon/ 303-redirects to its URI declaration page at http://dbooth.org/2007/moon/decl.html . Notice that the declaration page is an information resource, so P2 does not apply to http://dbooth.org/2007/moon/decl.html . This is further discussed in the next section.
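Pending the precise definition called for in the editorial note above, the two follow-your-nose mechanisms of P2 might be sketched in Python as below. The injected `fetch` callable, the set of accepted RDF media types, and the redirect limit are all assumptions of this sketch; `urldefrag` and `urljoin` are from Python's standard `urllib.parse` module.

```python
from urllib.parse import urldefrag, urljoin

# Media types assumed (for this sketch) to delegate fragment interpretation to the content.
RDF_MEDIA_TYPES = {"application/rdf+xml", "text/turtle", "text/n3"}
REDIRECTS = {301, 302, 303, 307, 308}

def find_declaration_page(uri, fetch, max_redirects=5):
    """Return the URL of the URI declaration page for `uri` (per P2), or None.

    `fetch(url)` is an assumed callable returning (status, headers, body)
    without following redirects itself.
    """
    base, fragment = urldefrag(uri)
    if fragment:
        # Hash case: the part before '#' must be served with an RDF media type.
        # (For brevity, this sketch does not follow redirects in the hash case.)
        status, headers, _ = fetch(base)
        media_type = headers.get("Content-Type", "").split(";")[0].strip()
        return base if status == 200 and media_type in RDF_MEDIA_TYPES else None

    # 303 case: a hashless URI must answer with a 303 redirect.
    status, headers, _ = fetch(uri)
    if status != 303:
        return None  # e.g. a 200 means the URI itself names an information resource
    url = urljoin(uri, headers["Location"])
    for _ in range(max_redirects):
        status, headers, _ = fetch(url)
        if status == 200:
            return url
        if status not in REDIRECTS:
            return None
        url = urljoin(url, headers["Location"])
    return None

# Hypothetical fake server for the moon example (no network access needed).
PAGES = {
    "http://dbooth.org/2007/moon/": (303, {"Location": "decl.html"}, ""),
    "http://dbooth.org/2007/moon/decl.html": (200, {"Content-Type": "text/html"}, "..."),
}

def fake_fetch(url):
    return PAGES.get(url, (404, {}, ""))

print(find_declaration_page("http://dbooth.org/2007/moon/", fake_fetch))
# http://dbooth.org/2007/moon/decl.html
print(find_declaration_page("http://dbooth.org/2007/moon/decl.html", fake_fetch))
# None
```

The second call returns None because dereferencing decl.html yields a 200 rather than a 303, mirroring the point above that P2 does not apply to the declaration page itself.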

Proposed rule for implicit URI declarations

What act should be interpreted as URI declaration? I propose that the Web architecture treat the act of serving a page using either of the above two follow-your-nose mechanisms -- hash or 303 -- as a performative speech act of URI declaration:

Proposed rule R1: Given a URI u, if either of the follow-your-nose mechanisms described above yields a representation r, then, unless otherwise indicated, the conjunction of assertions made in r represents an implicit URI declaration for u.
@@ BUG: For the hash case, we need to say something about the media type, because the media type governs the meaning of the fragment identifier. The above is fine if the media type is some kind of RDF, but not if it is HTML, because then

And the converse:

Proposed rule R2: Unless otherwise indicated (such as by rule R1 or by some explicit indication), publication of assertions about a resource denoted by a URI should not be construed as a performative speech act of declaring that URI.

Note that rule R1 does not apply to http://dbooth.org/2007/moon/decl.html , which denotes an information resource: rule R1 requires some indirection, either by stripping a fragment identifier from the URI or by a 303 redirect when the URI is dereferenced. This is intentional, as it permits statements to be made about the document that http://dbooth.org/2007/moon/decl.html denotes without being required to accept the assertions contained in that document. Similarly (and by rule R2) page http://dbooth.org/2007/moon/about.html should not be interpreted as a URI declaration page for http://dbooth.org/2007/moon/ , even though it makes statements about the resource denoted by http://dbooth.org/2007/moon/ .

Rule R1 should not necessarily be the only way to declare a URI. There could be other mechanisms also, particularly explicit mechanisms.

Rule R1 clearly has the first two components of a URI declaration, but what is the performative speech act? First, publication of the page -- regardless of the URI that leads to it -- represents the utterance of the declaration. Second, the follow-your-nose algorithm provides prima facie evidence that the declaration is authorized by the owner of the originating URI. This is important because the domain name in the URI of the declaration page could be quite different from the domain name of the original resource URI. This act of publishing the page in response to the follow-your-nose algorithm from the original URI is what distinguishes this performative speech act from other, normal speech.

This also means that if several URIs share the same URI declaration page, examination of the URI declaration page via one of those URIs will not necessarily indicate whether the other URIs are also being declared. To avoid the inefficiency of having to dereference each of those URIs in order to determine their URI declarations, either specialized URI prefixes could be defined (as described in "Converting New URI Schemes or URN Sub-Schemes to HTTP"), or explicit URI declaration mechanisms could be defined, such as the one proposed below.
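To make that inefficiency concrete, here is a hypothetical Python sketch of the brute-force approach: dereference every candidate URI and group the URIs by the declaration page they lead to. The `resolve` callable stands in for the follow-your-nose algorithm and is an assumption, as are the example.org URIs.

```python
from collections import defaultdict


def declared_by_page(candidate_uris, resolve):
    """Group URIs by the declaration page that follow-your-nose leads to.

    `resolve` is a hypothetical callable returning the declaration page URI
    reached from u (via hash stripping or a 303), or None if there is none.
    Each call stands for one dereference, so the cost grows with the number
    of candidate URIs.
    """
    index = defaultdict(set)
    for u in candidate_uris:
        page = resolve(u)
        if page is not None:
            index[page].add(u)
    return dict(index)


# Hypothetical example: two URIs share one declaration page.
redirects = {
    "http://example.org/astro/moon": "http://example.org/astro/decl.html",
    "http://example.org/astro/sun": "http://example.org/astro/decl.html",
    "http://example.org/mars": "http://example.org/mars/decl.html",
}
index = declared_by_page(list(redirects) + ["http://example.org/other"], redirects.get)
print(sorted(index["http://example.org/astro/decl.html"]))
```

Every candidate costs one dereference, and a URI that is never tried is never discovered. A specialized URI prefix or an explicit declaration mechanism is meant to spare consumers this loop.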

Rule R1 also implies that, unless otherwise indicated, every assertion in the page obtained should be considered a part of the URI declaration and thus a core assertion. Therefore:

Suggested practice P3: A URI declaration page should not make assertions about the URI's associated resource that are not intended to be a part of that URI's declaration.
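Under rule R1, the split between core and ancillary assertions can be computed mechanically once the declaration page is known. The Python sketch below represents assertions as (subject, predicate, object) tuples; the ex: terms and the other sources are hypothetical. It treats every assertion in the declaration page as core, which is exactly why P3 asks the page to contain nothing it does not intend as part of the declaration.

```python
MOON = "http://dbooth.org/2007/moon/"


def classify(u, declaration_page, other_sources):
    """Split assertions involving u into core and ancillary sets.

    Per rule R1, every assertion in u's declaration page is core, even one
    that does not mention u. Assertions about u found elsewhere are
    ancillary unless the declaration page already contains them.
    """
    core = set(declaration_page)
    ancillary = set()
    for source in other_sources:
        for triple in source:
            if u in triple and triple not in core:
                ancillary.add(triple)
    return core, ancillary


# Hypothetical assertions, for illustration only.
declaration_page = {
    (MOON, "rdf:type", "ex:NaturalSatellite"),
    (MOON, "ex:orbits", "ex:Earth"),
}
other_sources = [
    {(MOON, "ex:orbits", "ex:Earth"), (MOON, "ex:note", "third-party remark")},
    {("ex:Earth", "ex:hasSatellite", MOON), ("ex:Earth", "rdf:type", "ex:Planet")},
]
core, ancillary = classify(MOON, declaration_page, other_sources)
print(len(core), len(ancillary))
```

The third-party copy of the ex:orbits assertion is not counted as ancillary because the declaration page already makes it, and the assertion about ex:Earth alone is ignored because it does not involve the URI.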

If a URI declaration page only contains URI declarations, how can other parties find ancillary assertions about the associated resources?

Suggested practice P4: A URI declaration page should provide links to suggested ancillary assertions about the resources whose URIs are declared by that page.

This does not mean that a URI owner should be responsible for providing links to all other information about the associated resource. But providing links to other known sources of information would be helpful to others, and the URI declaration page is a logical starting place to look for such links. It should be understood that providing a link does not imply any particular endorsement.
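One way a consumer might use P4 links is sketched below in Python. This paper does not prescribe a property for such links; the sketch assumes, hypothetically, that the declaration page uses rdfs:seeAlso, and it simply collects the link targets attached to declared URIs. The ex: terms and example URLs are placeholders.

```python
# Full IRI of rdfs:seeAlso; using it for P4 links is an assumption here.
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
MOON = "http://dbooth.org/2007/moon/"


def ancillary_links(declaration_page, declared_uris):
    """Return the sorted targets of rdfs:seeAlso links on declared URIs."""
    return sorted(
        obj
        for (subj, pred, obj) in declaration_page
        if pred == RDFS_SEE_ALSO and subj in declared_uris
    )


declaration_page = {
    (MOON, "rdf:type", "ex:NaturalSatellite"),
    (MOON, RDFS_SEE_ALSO, "http://example.org/moon-facts"),
    (MOON, RDFS_SEE_ALSO, "http://example.net/lunar-data"),
    ("ex:Unrelated", RDFS_SEE_ALSO, "http://example.com/ignored"),
}
print(ancillary_links(declaration_page, {MOON}))
```

Following such a link does not turn the assertions it leads to into core assertions: under R1 only the declaration page itself supplies the declaration, so anything found through these links remains ancillary, and the link implies no endorsement.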

Why distinguish between URI declarations and other assertions?

Why do URI declarations matter? Why is it architecturally important to distinguish between core assertions and ancillary assertions? The answer is "to facilitate data reuse", but to explain why it facilitates data reuse, we should state a couple of assumptions.

Assumption 1: When minting a URI for a resource, the URI minter publishes a description of that resource. This does not always happen, but it clearly is recommended best practice to do so. So for the purpose of this explanation we will assume that it does happen, and we will view the resource description conceptually as a set of assertions, whether they are expressed formally or not. At one extreme, it may be an empty set, but we will ignore that case since it is equivalent to not publishing a resource description at all. At the other extreme, the minter's resource description may include all assertions that the minter has reason to believe about the resource.

Assumption 2: URI users will need additional assertions about the resource, beyond what the URI minter supplied. For example, the user may be combining assertions from several sources. This act of combining assertions from multiple sources is what the Semantic Web is all about, so this seems like a reasonable assumption.

A URI declaration anchors the URI's meaning

Given these assumptions, the establishment of a set of mandatory core assertions permits the meaning of a URI to be anchored, to prevent it from drifting, and this in turn increases the likelihood that independent assertions made using the URI can be successfully joined. Here's how.

Different applications have different needs. Thus, different URI users will necessarily wish to make different sets of assertions involving the URI. Some of these sets of assertions will be mutually incompatible, in spite of the fact that each set may be useful and valuable for some applications. One reason why this happens and needs to be permitted to happen is that when the real world is modeled formally, approximations are made. An approximation that is good enough for one application may be inadequate for another application (and may lead to logical inconsistency). For example, it is not possible to completely characterize a real world entity such as a person. This does not mean that precision and correctness should be abandoned. Rather, the point is to acknowledge that precision and correctness must be evaluated in relation to a particular application: they are not universal.

On the other hand, a key point in the use of URIs in the Semantic Web is to enable independently created data to be readily integrated, and useful new conclusions to be reached, by joining assertions that use the same URI. How can the desire for integration be reconciled with the fact that some sets of assertions will not be usable together? First, observe that different users will need to make their own choices about which sets of assertions to use. There will be no universal answer to the question: "Which sets of assertions should I use?" The right answer will vary depending on the application and context. If there is no commonly agreed-upon URI declaration -- if all sets of assertions made with the URI were equally optional -- then there would still be some possibility of being able to use two sets of assertions together. But if there is a set of core assertions that all users must accept, then the likelihood of compatibility can be increased through clumping, as explained below.

Clumping assertions facilitates reuse

Clumping is the effect of assertion sets being drawn closer together, to have more assertions in common, in a manner similar to the network effect. Clumping is caused by a combination of naming and "gravity".

Naming

By giving a name to a particular set of assertions, and publicizing that name, a URI declaration makes it easier to use the assertions repeatedly:

the potential user already knows that the items in the named chunk are compatible with each other, so the effort required to evaluate them is reduced; and
once the user has gotten to know a particular named set, it can be reused repeatedly as a standard, instead of separately evaluating the details of many different bundles that vary slightly from each other.

The benefits of naming are profoundly evident in software and copyright licensing: it is far easier to evaluate and get to know a few standard software or copyright licenses, such as the GNU GPL, LGPL, BSD and Creative Commons Attribution licenses, than to separately evaluate a custom license for every work that one considers using.

Gravity

The effect of naming would help facilitate reuse even if the core assertions were not mandatory. But having them mandatory helps in a second, closely related way.

Suppose the core assertions A1 for a URI were optional (like ancillary assertions), and two Semantic Web applications AppB and AppC were independently developed to use two sets of assertions, B and C, involving that URI, as shown in Figure 1 below.

Core assertions A1 may be close enough to some of what AppB and AppC need that AppB and AppC may be written to take advantage of them. Indeed, by virtue of being provided in the URI declaration, core assertions A1 are more likely to be used than third-party ancillary assertions, so the naming effect already pulls B and C closer together even if core assertions A1 are considered optional.

But if core assertions A1 are optional, then AppB and AppC may choose to use different subsets of A1. Any assertions that are in common between B and C (the intersection) are known to be compatible, but the assertions in A1 that are not in common could conflict with other assertions in either B or C. For B and C to be used together by a third application, they must not conflict, so if the likelihood of conflict can be further reduced then reuse will be better facilitated.
Assertion sets B and C partially overlap each other and core assertions A1.

Figure 1: AppB and AppC are developed independently to use assertion sets B and C, respectively. If core assertions A1 are optional, then B and C may partially overlap them.

For any application, there is a range of ways that the application can be implemented. Thus, if core assertions A1 are mandatory instead of optional, then AppB and AppC may be written slightly differently to accommodate and potentially take advantage of them, thus using sets B' and C' of assertions instead of B and C, as shown in Figure 2 below.

Core assertions A1 cause assertion sets B and C to shift to B' and C'.

Figure 2: Core assertions A1 pull B and C closer (becoming B' and C'), thus reducing the likelihood of conflict.

In effect, the "gravitational pull" of core assertion set A1 has caused the assertion sets of AppB and AppC to move closer to each other. Because B' and C' now have more assertions in common than B and C had, and perhaps also because their remaining assertions are now more closely related to each other, the likelihood of compatibility is increased (i.e., the likelihood of conflict is reduced).
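A toy Python model of the gravity argument, using sets of hypothetical assertion labels. It models "A1 is mandatory" naively as B' = B ∪ A1 and C' = C ∪ A1 (real applications might also drop or rewrite some of their own assertions), and measures overlap with the Jaccard index. It counts shared assertions only, not logical compatibility, so it illustrates the intuition rather than quantifying clumping.

```python
def overlap(x, y):
    """Jaccard index: shared assertions divided by all assertions in x or y."""
    return len(x & y) / len(x | y)


A1 = {"a1", "a2", "a3", "a4"}   # core assertions from the URI declaration
B = {"a1", "a2", "b1", "b2"}    # AppB uses part of A1 plus its own assertions
C = {"a2", "a3", "c1", "c2"}    # AppC uses a different part of A1

# Naive model of "A1 is mandatory": each application also accepts all of A1.
B_prime = B | A1
C_prime = C | A1

print(len(B & C), round(overlap(B, C), 3))                # A1 optional
print(len(B_prime & C_prime), overlap(B_prime, C_prime))  # A1 mandatory
```

Here the number of shared assertions grows from 1 to 4 and the Jaccard index from about 0.14 to 0.5.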

@@ TO DO: Add link to paper on disambiguating, and explain what to do if AppB really needs B. @@

Admittedly, these clumping arguments are speculative and rely on a sizeable element of faith. It would be nice to quantitatively measure this clumping effect, if anyone can figure out a good way to do so.

[Are there more reasons I should add, that help explain why URI declarations matter?]

What does "authoritative" mean?

The word "authoritative" has sometimes caused confusion in discussions of URI declarations. If a URI 303-redirects to a URI declaration page, or if it has a fragment identifier and dereferencing the part before the hash "#" leads to a URI declaration page, in what sense is a URI declaration made by that page "authoritative"? Does it mean that:

the assertions in the URI declaration are necessarily true? No.
the author of that page believes that the assertions are true? Not necessarily.
the author of that page is a recognized expert on the subject of that page? No.
the URI owner gets to control what others may say about the URI's associated resource? No.
the URI is the most popular or dominant URI for denoting the associated resource? No.
[Are there other examples I should have included here?]

A URI declaration is authoritative only in defining the association between the declared URI and a particular resource. (More precisely, it defines the first part of this association, as explained above.) The declaration creates a social expectation that other parties making use of that URI will use it to denote that same resource. (More precisely, it creates the social expectation that a party using the URI to denote its resource agrees with the core assertions in the URI declaration.) This is analogous to the social expectation that is created when a standards organization publishes a specification named XYZ and a product manufacturer then advertises an XYZ product. If that product does not conform to the XYZ specification, the manufacturer will be viewed as having violated a social expectation.
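The social expectation described here can be stated as a simple check. In the Python sketch below (hypothetical ex: terms, literal set containment), a party conforms if it accepts every core assertion, while its choice of ancillary assertions is unconstrained. A realistic check would need entailment rather than literal containment, since a party could agree with a core assertion without restating it, or state every core assertion and still contradict one through its other assertions.

```python
MOON = "http://dbooth.org/2007/moon/"


def conforms(accepted, core):
    """True if a party's accepted assertions include every core assertion.

    Literal set containment only: no inference, no contradiction checking.
    """
    return core <= accepted


core = {(MOON, "rdf:type", "ex:NaturalSatellite")}
app_b = core | {(MOON, "ex:note", "ancillary assertion chosen by AppB")}
app_c = core | {(MOON, "ex:note", "ancillary assertion chosen by AppC")}
app_d = {(MOON, "ex:note", "core assertion dropped")}

for name, accepted in [("AppB", app_b), ("AppC", app_c), ("AppD", app_d)]:
    print(name, conforms(accepted, core))
```

AppB and AppC make different ancillary choices yet both meet the expectation; AppD, like the non-conforming XYZ product, does not.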

Acknowledgements

Since these ideas have been a part of the semantic web ether for quite a while, it is hard to know where they originated. If anyone knows, I'd be happy to be enlightened.

Thanks to Jeremy Carroll for early review comments.

Comments by all are invited. If I have missed a reference that I should have included, please let me know.

Change Log
13-Jan-2011: Added link to Jiří Procházka's message mentioning that the "authoritative definition of a term is the one obtained by dereferencing it". Removed the link to the previous version, as it was too much trouble to update, and the Web Archive can be used instead. Expanded the acknowledgements. Updated affiliation.
20-Jul-2010: Added reference to "Scalable Authoritative OWL Reasoning for the Web", by Aidan Hogan, Andreas Harth and Axel Polleres, and explained how their notion of "terminological triple" relates to core assertions.
6-Jul-2010: Added link to Henry Story's mention of dereferencing URIs to determine their meaning.
15-Jun-2010: Added links to Sandro Hawke's blog entry on RDF 2 Wishlist, and to Cool URIs for the Semantic Web. Also updated my affiliation.
19-May-2009: Updated my email address.
3-Dec-2008: Added entries to the TOC.
26-Nov-2008: Fixed a bug: hash case should require RDF or other delegating media type. Changed namespace prefix from dbooth: to decl:. Moved some sections. Editorial changes. Made two-step mapping of resource identity more prominent. Changed rdfg:Graph (named graph) to log:Formula because I'm using N3 and it seems easier. Added classes decl:UriDeclaration and decl:EvidencedUriDeclaration and associated properties for accessing their components. Added a section on bnodes. Changed the title from "URI Declaration Versus Use" to "URI Declaration in Semantic Web Architecture" to be more descriptive.
03-Apr-2008: Tweaked prose and added explanation of the URI-resource association as a two-step mapping.
27-Mar-2008: Made the domain and range of dbooth:declares explicit.
28-Feb-2008: Minor editorial changes.
25-Feb-2008: Lots of changes. Added sections on granularity, ancillary assertions, and why it's important to distinguish URI declarations from ancillary assertions. Added explanation of how a URI declaration relates to resource identity. Rewrote the intro. Removed statement M2 about the moon being made of green cheese because it was causing confusion.
6-Nov-2007: Added TOC entry for "authoritative".
17-Aug-2007: Added section on declaring URIs for information resources, and clarifications suggested by Richard Cyganiak.
2-Aug-2007: Mentioned evidence of a speech act. Added more about "authoritative". Added link to PSI document. Added mention of URI declaration creating a named graph.
1-Aug-2007: Misc clarifications per Pat Hayes' private email.
31-Jul-2007: Corrected the datatype of u (to xsd:anyURI); misc clarifications.
30-Jul-2007: Added TOC, clarified speech act, misc minor fixes.
25-Jul-2007: Original draft.