The URI Lifecycle in Semantic Web Architecture

David Booth
Cleveland Clinic (contractor)
david@dbooth.org

Preprint from Twenty-first International Joint Conference on Artificial Intelligence (IJCAI-09), 11-Jul-2009.
Latest version: http://dbooth.org/2009/lifecycle/

Views expressed herein are those of the author and do not necessarily reflect those of Cleveland Clinic.

Abstract.  Various parties are typically involved in the creation and use of a URI, including the URI owner, an RDF statement author, and a consumer of that RDF statement. What principles should these parties follow, to ensure that a consistent resource identity is established and (to the extent possible) maintained throughout that URI's lifetime? This paper proposes a set of roles and responsibilities for establishing and determining a URI's resource identity through its lifecycle.

Key words: Semantic Web, RDF, identity, URI declaration, URI definition

1    Introduction

Semantic web applications are based on both formal logic and web architecture.  The Architecture of the World Wide Web (AWWW) [Jacobs 2004] describes some of the most important architectural principles underlying web applications, but semantic web applications need additional architectural principles that have not yet been well established. Some of these pertain to the creation of URIs and the association of a URI to a resource, i.e., the URI's resource identity. This paper proposes some architectural responsibilities pertaining to resource identity and the lifecycle of a URI. They are intended as a starting point for discussion.

The AWWW defines the notion of information resources, which roughly correspond to web pages. But semantic web applications routinely use URIs to denote non-information resources: things such as people, proteins and cars. This paper will focus on the lifecycle of URIs that are used to denote non-information resources.

Note that the lifecycle of a URI is independent of the lifecycle of the resource that it denotes.  For example, a URI that denotes the Greek philosopher Plato may be minted long after Plato has died.  Similarly, one could mint a URI to denote one's first great-great-grandson even though such a child has not been conceived yet.

Words such as "MUST", "SHOULD" and "MAY" that are written in all capitals are used in the sense of RFC 2119 [Bradner 1997].

2  Roles in the URI lifecycle

Three roles seem critically important to the URI lifecycle: the URI owner, the RDF statement author, and the consumer of that RDF statement.

3  Events in the URI lifecycle

Four common events in the URI lifecycle are illustrated in Figure 1 and described below.

Figure 1: URI lifecycle

Event 1: Owner mints a URI

Minting a URI is the act of establishing the association between the URI and the resource it denotes. A URI MUST be minted only by the URI's owner or delegate. Minting a URI from someone else's URI space is known as URI squatting [Swick 2006].

URI owner responsibility 1: When minting a URI, the URI owner (or delegate) SHOULD publish a URI declaration [Booth 2007] at the follow-your-nose (f-y-n) location, containing core assertions whose purpose is to constrain the set of permissible interpretations [Hayes 2004] for this URI. These core assertions SHOULD NOT be changed after their publication.

Note that a single document can serve as a URI declaration for many URIs: the correspondence between URIs and URI declarations is many-to-one.
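
The many-to-one correspondence can be illustrated with a minimal sketch. It assumes hash URIs, whose follow-your-nose location is obtained by dropping the fragment identifier; URIs that rely on HTTP redirects would need an actual request instead. The example.org URIs and the function name fyn_location are hypothetical and used only for illustration.

```python
from urllib.parse import urldefrag


def fyn_location(uri: str) -> str:
    """Return the follow-your-nose location of a hash URI.

    Assumption: the declaration lives in the document obtained by
    dropping the fragment identifier from the URI.
    """
    return urldefrag(uri).url


# Hypothetical URIs minted in one URI space.
uris = [
    "http://example.org/people#plato",
    "http://example.org/people#socrates",
]

# Both URIs lead to the same declaration document.
locations = {fyn_location(u) for u in uris}
```

Here the two URIs share a single declaration document, so one publication step covers both.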

In essence, publication of a URI's declaration creates a social expectation that the URI will be used in a way that is consistent with its declaration. This is analogous to the social expectation created when a standards organization publishes a definition for a term such as "Foo Compliant". If a party later claims that their widget is "Foo Compliant", yet that widget is not actually consistent with the "Foo Compliant" definition, that party will be seen as violating this social expectation.

Other information for statement authors and consumers

Ideally, a URI declaration should also include other information (either directly or by reference) that will help statement authors and consumers make use of this URI.

Although this additional information may be included directly in a URI declaration, information that is likely to need updating independently of the core assertions would be better included by reference, so that updating it will not cause consumers to think that the core assertions had changed when they had not.

Cool URIs for the Semantic Web [Sauermann 2009] describes best practices for minting URIs and hosting associated URI declarations (though it does not use the term "URI declaration").

Avoiding URI proliferation and near aliases

URI owner responsibility 2: A URI owner SHOULD NOT mint a new URI if a suitable alternate URI already exists.

The AWWW points out that URI aliases -- multiple URIs that denote the same resource -- impose a cost on users. However, the cost of dealing with multiple URIs that denote similar but not identical resources -- near aliases -- is even greater than the cost of direct aliases, because users are forced to understand the relationships and differences between the URI declarations. Therefore, even if a new URI is deemed necessary for administrative reasons, it would be better to write the new URI declaration in terms of an existing URI's declaration than to create a new, slightly different declaration. Properties such as owl:sameAs, owl:equivalentClass and owl:equivalentProperty [Dean 2004] may be useful in some circumstances, but because they require use (rather than mention [Anonymous 2009]) of the old URI they may not be desirable in the new URI's declaration.

We do not yet have well-established conventions for indicating that one URI's declaration is equivalent to another URI's declaration, though properties such as s:isBroaderThan and s:isNarrowerThan [Booth 2009a], which are designed to be asserted between URIs themselves (rather than between the resources they denote), are a step in this direction.

Event 2: Author uses the URI in a statement.

An RDF statement author has a choice about whether to use a given URI in a statement. The guiding principle is: 

Statement author responsibility 3: Use of a URI implies agreement with the core assertions of its URI declaration.

Hence, the statement author is responsible for ensuring that he/she does indeed agree with those assertions and MUST NOT use the URI if he/she does not agree. However, this is not intended to represent a legal commitment. Rather it is an identity commitment: it indicates that the set of interpretations for that statement is intended to be constrained by the core assertions of the URI's declaration, thus constraining the resource identity of the URI.

[Added 2011-02-11] See also W3C TAG issue-39 for additional discussion on the issue of "Meaning of URIs in RDF documents".

Transitive closure of the URI declaration

Determining the complete identity commitment would involve computing the transitive closure of the URI declaration's core assertions: for each URI used in the core assertions, obtain the core assertions of that URI's declaration, etc., recursively.

Statement author responsibility 4: The statement author making new assertions SHOULD compute the transitive closure of the URI declarations for all URIs used, to ensure that they are consistent with the author's new assertions.

There is a risk if the author does not compute this transitive closure: a logical contradiction may go undetected until a consumer attempts to process the statement.
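
The recursive collection of core assertions described above might be sketched as follows. This is only an illustration under stated assumptions: declarations are held in an in-memory dictionary rather than retrieved from the web, triples are plain tuples of strings, and the ex: names and the function declaration_closure are hypothetical. Checking the result for consistency with the author's new assertions would require a reasoner and is not shown.

```python
from collections import deque


def declaration_closure(start_uris, declarations):
    """Collect the core assertions reachable from start_uris.

    declarations maps a URI to the set of core assertions (triples) in
    its declaration. A term with no known declaration adds nothing.
    """
    seen = set()
    closure = set()
    queue = deque(start_uris)
    while queue:
        uri = queue.popleft()
        if uri in seen:
            continue  # already expanded; also guards against cycles
        seen.add(uri)
        for triple in declarations.get(uri, ()):
            closure.add(triple)
            # Each term used in a core assertion may have its own declaration.
            queue.extend(term for term in triple if term not in seen)
    return closure


# Hypothetical declarations, abbreviated with prefixes for readability.
declarations = {
    "ex:plato": {("ex:plato", "rdf:type", "ex:Philosopher")},
    "ex:Philosopher": {("ex:Philosopher", "rdfs:subClassOf", "ex:Person")},
    "ex:Person": set(),
}
closure = declaration_closure(["ex:plato"], declarations)
```

Starting from ex:plato, the closure also picks up the core assertion about ex:Philosopher, because that URI is used in the declaration of ex:plato.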

Identity commitment and time

What if a URI's declaration is changed after a statement author has published a statement using that URI? Should consumers assume that the statement author agrees with the new core assertions? Clearly not, since, when the statement was written, the statement author had no way of looking into the future to know what those changes would be. Hence, a more precise way of stating the identity commitment that a statement author makes by using a URI would be something like:

Statement author responsibility 3a: Use of a URI in a statement implies agreement with the core assertions of the URI declaration that existed at the time the statement was written.

For this reason, RDF documents and URI declarations should indicate the date when they were written or updated. This will allow a consumer reading an RDF document later to determine whether any associated URI declarations are obsolete, and, if so, the consumer can make an informed choice about whether to seek out the original URI declaration or try using the latest.  Services such as the Internet Archive or Memento may be helpful in locating prior versions.
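
A consumer who has several dated versions of a declaration could select the one in effect when a statement was written, along the lines of this sketch. The version list, its string placeholders and the function declaration_as_of are hypothetical; obtaining the prior versions (for example from an archive service) is assumed to have happened already.

```python
from bisect import bisect_right
from datetime import date


def declaration_as_of(versions, written):
    """Return the declaration version in effect on the date `written`.

    versions is a list of (publication_date, declaration) pairs sorted
    by date. None is returned if the statement predates every version.
    """
    dates = [published for published, _ in versions]
    index = bisect_right(dates, written)
    return versions[index - 1][1] if index else None


# Hypothetical version history of one URI declaration.
versions = [
    (date(2009, 3, 23), "declaration v1"),
    (date(2010, 6, 24), "declaration v2"),
]
chosen = declaration_as_of(versions, date(2009, 7, 11))
```

A statement written between the two publication dates is matched to the first version, which is the one its author could have agreed with.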

Event 3: Consumer reads a statement.

A consumer attempting to interpret an RDF graph wishes to know what resource each URI denotes.

Consumer responsibility 5: The set of possible interpretations for the graph SHOULD be constrained to those that are consistent with the merge of that graph and the transitive closure of the core assertions from all of that graph's URI declarations.

Consumer responsibility 6: In selecting these URI declarations, the consumer SHOULD use the URI declaration that is believed to be current for that URI (preferably from a local cache, for efficiency).
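
The preference for a local cache might look like the following sketch. The fetch function is supplied by the caller and is hypothetical here, as is the example.org URI; a real cache would also need some policy for deciding when a cached declaration is no longer believed to be current, which this sketch omits.

```python
def current_declaration(uri, cache, fetch):
    """Return the declaration for uri, preferring the local cache.

    fetch is a caller-supplied function that retrieves the declaration
    from the URI's follow-your-nose location (hypothetical here).
    """
    if uri not in cache:
        cache[uri] = fetch(uri)
    return cache[uri]


# Stand-in for a network retrieval, recording how often it is called.
calls = []


def fake_fetch(uri):
    calls.append(uri)
    return f"declaration of {uri}"


cache = {}
first = current_declaration("http://example.org/people#plato", cache, fake_fetch)
second = current_declaration("http://example.org/people#plato", cache, fake_fetch)
```

The second lookup is served from the cache, so the declaration is retrieved only once.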

However, the consumer MAY select a different declaration. For example:

Event 4: URI is obsolete.

A URI can become obsolete if its owner deprecates it.  Ideally in this case the owner should arrange for the URI declaration to (directly or indirectly) indicate the preferred URI, as suggested above.

A URI can also become obsolete if its URI declaration has been compromised.

Statement author responsibility 7: Statement authors SHOULD NOT use a URI in new RDF statements if its URI declaration has been compromised such that use of the URI is likely to cause confusion among consumers.

This can happen, for example, if the URI declaration has been modified in violation of its published change policy or if it becomes inaccessible. In such cases, consumers may be confused about which URI declaration (or version) they should use to interpret the URI. If this occurs, a statement author should preferably find a different URI to use. If no suitable substitute is found, a new URI should be minted, and its declaration should indicate that it deprecates the old URI.
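
One way a statement author or consumer might notice that core assertions have been modified is to compare a fingerprint recorded earlier with one computed from the declaration as currently published. The sketch below is an assumption about how this could be done, not an established convention: triples are plain tuples of strings, the ex: names are hypothetical, and a changed fingerprint shows only that the assertions differ, not whether the change violates the published change policy.

```python
import hashlib


def fingerprint(core_assertions):
    """Return a SHA-256 digest of a set of triples, independent of order."""
    lines = sorted(" ".join(triple) for triple in core_assertions)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


# Hypothetical snapshots of one declaration's core assertions.
original = {("ex:plato", "rdf:type", "ex:Philosopher")}
unchanged = set(original)
modified = {("ex:plato", "rdf:type", "ex:Playwright")}

same = fingerprint(original) == fingerprint(unchanged)
changed = fingerprint(original) != fingerprint(modified)
```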

Other events in the URI lifecycle

Other, less common events in the URI lifecycle may also be of interest.

Community expropriation of a URI.

In some cases, the resource identity of a URI may be so entrenched in the community that, even if its declaration is compromised or unavailable at the follow-your-nose location, statement authors still wish to use the URI according to a declaration that is well-known in the community. For example, the original URI owner may have gone bankrupt, and the domain name may have been sold to an unscrupulous company that proceeds to publish a new, misleading declaration for the URI.  Or, the original URI owner may never have published the declaration at the URI's follow-your-nose location.

In such cases, the community MAY temporarily expropriate that URI by continuing to write RDF statements based on the URI's original declaration, if: Furthermore:

Statement author responsibility 8: For each new use of an expropriated URI in an RDF document, the statement author SHOULD include an rdfs:isDefinedBy statement that indicates the location of the new URI declaration.
[Issue: Is this the right requirement?]
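
Such a statement could be emitted as a single N-Triples line, as in this sketch. It uses the rdfs:isDefinedBy property from the RDF Schema vocabulary; the two example URIs and the function name are hypothetical.

```python
RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"


def is_defined_by_triple(uri, declaration_location):
    """Format an rdfs:isDefinedBy statement as one N-Triples line."""
    return f"<{uri}> <{RDFS_IS_DEFINED_BY}> <{declaration_location}> ."


# Hypothetical expropriated URI and community-hosted declaration.
line = is_defined_by_triple(
    "http://example.org/people#plato",
    "http://example.net/community/people-declaration",
)
```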

Cases of community expropriation should be rare. The reason to make the expropriation temporary is to avoid the indefinite accumulation of URIs that require special processing.

4    Conclusions

In understanding resource identity -- the association of a URI to a particular resource -- it is helpful to look at the roles, events and responsibilities involved in the lifecycle of a URI.  This paper proposes a set of roles and responsibilities for establishing and determining a URI's resource identity through its lifecycle.

References

[Anonymous 2009] Anonymous, Use-mention distinction, Wikipedia, retrieved 23-Mar-2009, http://en.wikipedia.org/wiki/Use%E2%80%93mention_distinction

[Booth 2007] David Booth.  URI Declaration in Semantic Web Architecture.  25-Jul-2007.  http://dbooth.org/2007/uri-decl/

[Booth 2009a] David Booth.  Splitting Identities in Semantic Web Architecture, 26-Feb-2009, http://dbooth.org/2007/splitting/

[Booth 2009b] David Booth.  Denotation as a Two-Step Mapping in Semantic Web Architecture.  Twenty-first International Joint Conference on Artificial Intelligence (IJCAI-09), 11-Jul-2009.  http://dbooth.org/2009/denotation/

[Bradner 1997] S. Bradner. RFC2119 - Key words for use in RFCs to Indicate Requirement Levels, March 1997.  http://www.faqs.org/rfcs/rfc2119.html

[Dean 2004] Mike Dean, Guus Schreiber, editors.  OWL Web Ontology Language Reference.  W3C Recommendation 10-Feb-2004.  http://www.w3.org/TR/owl-ref/

[Hayes 2004] Patrick Hayes, editor.  RDF Semantics.  W3C Recommendation 10-Feb-2004.  http://www.w3.org/TR/rdf-mt/

[Jacobs 2004] Ian Jacobs, Norman Walsh, editors.  Architecture of the World Wide Web, Volume One.  W3C Recommendation 15-Dec-2004.  http://www.w3.org/TR/webarch/

[Miles 2009] Alistair Miles and Sean Bechhofer, editors. SKOS Simple Knowledge Organization System Reference, W3C Candidate Recommendation 17-Mar-2009, http://www.w3.org/TR/skos-reference

[Sauermann 2009] Leo Sauermann and Richard Cyganiak.  Cool URIs for the Semantic Web, W3C Working Draft 21-Mar-2009, http://www.w3.org/TR/cooluris

[Swick 2006] Ralph Swick. URI squatting; please don't. 10-Mar-2006, public email message archived at http://lists.w3.org/Archives/Public/public-swbp-wg/2006Mar/0036.html


Change log
11-Feb-2011: Added mention of W3C TAG issue-39.
16-Jul-2010: Separated out statement author responsibility #8 from the discussion of community expropriation.
13-Jul-2010: Added links to Internet Archive and Memento.
24-Jun-2010: Minor editorial improvements.
27-May-2010: Added named anchors
19-May-2009: Added mention of URI deprecation and made small editorial improvements.
14-May-2009: Editorial improvements.
23-Mar-2009: Initial version