The URI Lifecycle in Semantic Web Architecture
David Booth
Cleveland Clinic (contractor)
david@dbooth.org
Preprint from Twenty-first International Joint Conference on
Artificial Intelligence (IJCAI-09), 11-Jul-2009.
Latest version: http://dbooth.org/2009/lifecycle/
Views expressed herein are those of the author and do not necessarily
reflect those of Cleveland Clinic.
Abstract.
Various parties are typically involved in the creation and use of a
URI, including the URI owner, an RDF statement author, and a consumer
of that RDF statement.
What principles should these parties follow, to ensure that a
consistent resource identity is
established and (to the extent possible)
maintained throughout that URI's lifetime?
This paper proposes a set of roles and responsibilities for
establishing and determining
a URI's resource identity through its lifecycle.
Key words: Semantic Web, RDF,
identity, URI declaration, URI definition
1 Introduction
Semantic web applications are based on both formal logic and web
architecture. The Architecture of the World Wide Web (AWWW) [Jacobs 2004]
describes some of the most important architectural principles
underlying web applications, but semantic web applications need
additional architectural principles that have not yet been well
established. Some of these pertain to the creation of URIs
and the association of a URI to a resource, i.e., the URI's resource
identity.
This paper proposes some architectural responsibilities pertaining
to resource identity and the lifecycle of a URI.
They are intended as a starting point for discussion.
The AWWW defines the notion of information resources, which
roughly correspond to web pages.
But semantic web applications routinely use URIs to denote
non-information resources: things such as people, proteins and cars.
This paper will focus on the lifecycle of URIs
that are used to denote non-information resources.
Note that the lifecycle of a URI is independent of the lifecycle of the
resource that it denotes. For example, a URI that denotes the
Greek philosopher Plato may be minted long after Plato has died.
Similarly, one could mint a URI to denote one's first
great-great-grandson even though such a child has not been conceived
yet.
Words such as "MUST", "SHOULD" and "MAY" that are written in all
capitals are used in the sense of RFC 2119 [Bradner 1997].
2 Roles in the URI lifecycle
Three roles seem critically important to the URI lifecycle:
- URI owner. This is
the person or social entity that has the authority to establish an
association between a URI and a resource, as defined in AWWW.
Normally it is the owner of the domain from which the URI is minted;
however, the owner may delegate minting authority for all or portions
of a URI space.
- Statement author.
This is a person or agent that decides to use the URI in an RDF
statement to denote a resource.
- Consumer.
This is a person or application that reads an RDF statement and wishes
to know what resource the URI was intended to denote.
3 Events in the URI lifecycle
Four common events in the URI lifecycle are illustrated
in Figure 1 and described below.
Event 1: Owner mints a URI
Minting a URI is the act of establishing
the association between the URI and the resource it denotes.
A URI MUST only be minted by the URI's owner or
delegate.
Minting a URI from someone else's URI space
is known as URI squatting [Swick 2006].
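The minting rule can be illustrated with a minimal sketch, assuming a hypothetical registry that maps URI-space prefixes to the parties allowed to mint under them. The registry, the party names and the example.org URIs are all invented for illustration; this paper does not prescribe any mechanism for recording delegation.

```python
from typing import Dict, Optional, Set

# Hypothetical registry: URI-space prefix -> parties allowed to mint under it.
# A longer prefix represents a delegated portion of the owner's URI space.
AUTHORITY: Dict[str, Set[str]] = {
    "http://example.org/": {"example-owner"},
    "http://example.org/proteins/": {"example-owner", "protein-lab"},
}


def governing_prefix(uri: str, authority: Dict[str, Set[str]]) -> Optional[str]:
    """Return the longest registered prefix that covers the URI, if any."""
    matches = [prefix for prefix in authority if uri.startswith(prefix)]
    return max(matches, key=len) if matches else None


def may_mint(party: str, uri: str, authority: Dict[str, Set[str]]) -> bool:
    """True if the party is the owner or a delegate for the URI's space."""
    prefix = governing_prefix(uri, authority)
    return prefix is not None and party in authority[prefix]
```

Under this sketch, a party that is not listed for the governing prefix would be squatting if it minted a URI there.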
URI owner responsibility 1: When
minting a URI, the URI owner (or delegate) SHOULD publish a URI
declaration [Booth 2007] at the follow-your-nose
(f-y-n) location, containing
core assertions whose purpose is to
constrain the set of permissible interpretations [Hayes 2004] for this URI.
These core assertions SHOULD NOT be changed after their
publication.
Note that a single document can serve as a URI
declaration for many
URIs: the correspondence between URIs and URI declarations is
many-to-one.
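The many-to-one correspondence can be sketched for the case of hash URIs, under the assumption that the follow-your-nose location of a hash URI is the URI with its fragment removed. The example.org URIs and the function names are hypothetical. For URIs without a fragment, the location generally has to be discovered by dereferencing the URI, which this sketch does not attempt.

```python
from collections import defaultdict
from urllib.parse import urldefrag


def declaration_location(uri: str) -> str:
    """Assumed f-y-n location for a hash URI: the URI minus its fragment."""
    return urldefrag(uri).url


def group_by_declaration(uris):
    """Map each declaration document to the URIs it declares."""
    groups = defaultdict(list)
    for uri in uris:
        groups[declaration_location(uri)].append(uri)
    return dict(groups)


uris = [
    "http://example.org/people#plato",
    "http://example.org/people#socrates",
    "http://example.org/cars#car42",
]
groups = group_by_declaration(uris)
```

Here the two people URIs share a single declaration document, while the car URI has its own.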
In essence, publication of a URI's declaration creates a social
expectation
that the URI will be used in a way that is consistent with its
declaration.
This is analogous to the social expectation created when
a standards organization publishes a definition for a term such as "Foo
Compliant".
If a party later claims that their widget is "Foo Compliant", yet that
widget is not actually consistent with the "Foo Compliant" definition,
that party will be seen as violating this social expectation.
Other information for statement authors and consumers
Ideally, a URI declaration should also include
other information (either directly or by reference)
that will help statement authors and consumers make use of this URI,
such as:
- Date written, author, copyright, revision history and other
metadata.
- The relationship between this URI declaration and other URI
declarations. For example, this URI declaration may be broader
or narrower than another URI declaration: it may permit a set of
interpretations for its URI that is a superset or subset of the set of
interpretations that the other declaration permits for its URI,
as described in Splitting Identities in Semantic Web Architecture
[Booth 2009a]. Note that, as explained in Denotation as a Two-Step
Mapping in Semantic Web Architecture [Booth 2009b], a relationship
between two URI declarations is not the same as a relationship between
two URIs, nor is it the same as a relationship between the resources
denoted by those URIs.
- The relationship between this URI and other URIs. For
example, this URI may deprecate another URI or be deprecated by another
URI.
- Change policy for the core assertions. Some ontologies, such as SKOS [Miles 2009],
have intentionally chosen to permit the definitions of their terms to
be changed without minting new URIs for them. Although such a policy
could
be disastrous for some applications, for others it may be the most cost
effective.
Although changing the core assertions may change the set of permissible
interpretations for a URI -- thus changing the URI's resource identity
-- such
changes are okay if the change policy has set expectations
appropriately.
- Pointers to ancillary assertions that are believed
to be compatible with this URI declaration.
- Pointers to related ontologies or data.
Although this additional information may be included directly in a URI
declaration, information that is likely to need updating independently of
the core assertions is better included by reference, so that
updating it will not lead consumers to think
that the core assertions have changed when they have not.
Cool URIs for the Semantic Web [Sauermann 2009] describes best
practices for minting
URIs and hosting associated URI declarations (though it does not use
the term "URI declaration").
Avoiding URI proliferation and near aliases
URI owner responsibility 2: A URI
owner SHOULD NOT mint a new URI if a suitable
alternate URI already exists.
The AWWW points out that URI aliases
-- multiple URIs that
denote the same resource -- impose a cost on users.
However, the cost of dealing with multiple URIs that denote similar
but not identical resources -- near aliases -- is even greater than the
cost of direct aliases, because users are forced
to understand the relationships and differences between the URI
declarations.
Therefore, even if a new URI is deemed necessary for administrative
reasons,
it would be better to write the new URI declaration in terms of an
existing URI's declaration
than to create a new, slightly different declaration.
Properties such as owl:sameAs,
owl:equivalentClass and owl:equivalentProperty [Dean 2004]
may be useful in some circumstances,
but because they require use (rather than mention
[Anonymous 2009])
of the old URI they may not be desirable in the new URI's declaration.
We do not yet have well established conventions for indicating
that one URI's declaration is equivalent to another URI's declaration,
though properties such as s:isBroaderThan
and s:isNarrowerThan [Booth 2009a], which
are designed to be asserted between URIs themselves (rather than
between the resources they denote),
are a step in this direction.
Event 2: Author uses the URI in a statement.
An RDF statement author has a choice about whether to use a given URI
in a statement.
The guiding principle is:
Statement author responsibility 3: Use of a URI implies agreement with the
core assertions of its URI declaration.
Hence, the statement author is responsible for
ensuring that he/she does indeed agree with those assertions
and MUST NOT use the URI if he/she does not agree.
However, this
is not intended to represent a legal commitment. Rather it is an identity
commitment:
it indicates that
the set of interpretations for that statement is intended to be
constrained by the core assertions of the URI's declaration,
thus constraining the resource identity of the URI.
[Added 2011-02-11] See also W3C TAG
issue-39 for additional discussion on the issue of "Meaning of URIs
in RDF documents".
Transitive closure of the URI declaration
Determining the complete identity commitment would involve computing
the transitive closure of the URI declaration's
core assertions: for each URI used in the core assertions, obtain the
core assertions
of that URI's declaration, etc., recursively.
Statement author responsibility 4: The statement author
making new assertions SHOULD compute the transitive
closure of the URI declarations for all URIs used, to ensure that they
are consistent with the author's new assertions.
There is a risk if the author does
not compute this transitive closure:
a logical contradiction may go undetected until a consumer
attempts to process the statement.
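The recursive step can be sketched as a worklist traversal. This is a minimal illustration, assuming a hypothetical in-memory store of core assertions keyed by URI and a toy convention in which quoted strings stand for literals; a real implementation would dereference each URI to obtain its declaration. The ex: terms are invented.

```python
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical store: URI -> core assertions of that URI's declaration.
DECLARATIONS: Dict[str, FrozenSet[Triple]] = {
    "ex:Car": frozenset({("ex:Car", "rdfs:subClassOf", "ex:Vehicle")}),
    "ex:Vehicle": frozenset({("ex:Vehicle", "rdfs:subClassOf", "ex:Artifact")}),
    "ex:Artifact": frozenset({("ex:Artifact", "rdfs:label", '"artifact"')}),
}


def is_uri(term: str) -> bool:
    """Sketch-level test: quoted strings are literals, all else is a URI."""
    return not term.startswith('"')


def declaration_closure(start_uris, declarations) -> Set[Triple]:
    """Collect core assertions reachable from the start URIs, recursively."""
    closure: Set[Triple] = set()
    visited: Set[str] = set()
    pending = list(start_uris)
    while pending:
        uri = pending.pop()
        if uri in visited:
            continue  # guards against cycles between declarations
        visited.add(uri)
        for triple in declarations.get(uri, frozenset()):
            closure.add(triple)
            # Every URI used in a core assertion has a declaration to follow.
            pending.extend(term for term in triple if is_uri(term))
    return closure
```

The resulting set of assertions is what an author would check against the new assertions for consistency; the consistency check itself is outside the scope of this sketch.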
Identity commitment and time
What if a URI's declaration is changed after a statement author has
published
a statement using that URI? Should consumers assume that the statement
author agrees with the new core assertions?
Clearly not,
since, when the statement was written,
the statement author had no way of looking into the future to know what
those changes would be.
Hence, a more precise way of stating the identity commitment that a
statement
author makes by using a URI would be something like:
Statement author responsibility 3a:
Use of a URI in a statement implies agreement with the core
assertions of the URI declaration that existed at the time the
statement was written.
For this reason, RDF documents and URI declarations should indicate the
date when they were written or updated.
This will allow a consumer
reading an RDF document later to determine whether any associated URI
declarations
are obsolete, and, if so, the consumer can make an informed
choice about whether to seek out the original URI declaration or try
using the latest. Services such as the Internet Archive or Memento may be
helpful in locating prior versions.
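The date comparison can be sketched as follows, assuming the consumer has somehow obtained a dated revision history for a URI declaration. The history, its dates and its labels are hypothetical, and the sketch does not query any archive service.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical revision history for one URI declaration, sorted by date:
# (date the revision was published, label for that revision's content).
HISTORY: List[Tuple[date, str]] = [
    (date(2008, 1, 10), "declaration-v1"),
    (date(2009, 3, 1), "declaration-v2"),
    (date(2010, 6, 1), "declaration-v3"),
]


def declaration_as_of(history, written: date) -> Optional[str]:
    """Return the revision in effect on the given date, or None if none existed."""
    dates = [published for published, _ in history]
    index = bisect_right(dates, written)
    return history[index - 1][1] if index else None


def is_obsolete(history, written: date) -> bool:
    """True if the latest revision was published after the statement was written."""
    return bool(history) and history[-1][0] > written
```

A consumer reading a statement dated between the second and third revisions would learn that the declaration has since changed and could then choose between the second revision and the latest one.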
Event 3: Consumer reads a statement.
A consumer attempting to interpret an RDF graph
wishes to know what resource each URI denotes.
Consumer responsibility 5: The set
of possible interpretations for the graph SHOULD
be constrained to those that are consistent with the merge of that
graph and the transitive closure of the core assertions from all of
that graph's URI declarations.
Consumer responsibility 6:
In selecting these URI declarations,
the consumer SHOULD use the URI declaration that is believed to be
current for that URI (preferably from a local cache, for efficiency).
However, the consumer MAY select a different declaration.
For example:
- If the consumer wishes to be assured of most accurately following
the statement author's intent, then the consumer might select
the declaration that existed at the time the statement was made.
- If the consumer believes that the current declaration has been
compromised (for example, by a management or ownership change of the
URI domain --
see community expropriation
of a URI)
then the consumer might select an alternate declaration.
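One way to picture this selection is a small policy function: use the current declaration by default, with the two exceptions listed in the examples. This is only a sketch; the class, the field names and the flags are hypothetical, and how a consumer comes to believe that a declaration is compromised is left open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidates:
    """Declarations a consumer could choose among (all names hypothetical)."""
    current: str                             # believed current, e.g. cached
    at_statement_time: Optional[str] = None  # version the author saw, if known
    alternate: Optional[str] = None          # trusted copy from elsewhere, if any


def select_declaration(c: Candidates, *, follow_author_intent: bool = False,
                       current_compromised: bool = False) -> str:
    """Pick a declaration, falling back to the current one when no
    requested alternative is available."""
    if current_compromised and c.alternate is not None:
        return c.alternate
    if follow_author_intent and c.at_statement_time is not None:
        return c.at_statement_time
    return c.current
```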
Event 4: URI is obsolete.
A URI can become obsolete if its owner deprecates it. Ideally in
this case the owner should arrange for the URI declaration to (directly
or indirectly) indicate the preferred URI, as suggested above.
A URI can also become obsolete if its URI declaration has been
compromised.
Statement author responsibility 7: Statement authors
SHOULD NOT use a URI in new RDF statements if its URI
declaration has been compromised such that use of the URI is likely to
cause confusion among consumers.
This can happen, for example,
if the URI declaration has been modified in violation of its published
change policy or if it becomes inaccessible.
In such cases, consumers may be confused about what URI declaration (or
version)
they should use to interpret the URI.
If this occurs, a statement author should preferably find a different URI
to use. If no suitable substitute is found, a new URI should be minted,
and its declaration should indicate
that it deprecates the old URI.
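When declarations indicate which URI deprecates which, a statement author or consumer can follow those indications to the preferred URI. The sketch below assumes the deprecation links have already been extracted into a hypothetical map from deprecated URI to replacing URI; the example.org URIs are invented, and the paper does not specify a property for expressing deprecation.

```python
from typing import Dict, List

# Hypothetical map: deprecated URI -> the URI that deprecates (replaces) it.
DEPRECATED_BY: Dict[str, str] = {
    "http://example.org/v1#widget": "http://example.org/v2#widget",
    "http://example.org/v2#widget": "http://example.org/v3#widget",
}


def preferred_uri(uri: str, deprecated_by: Dict[str, str]) -> str:
    """Follow deprecation links to the URI that nothing deprecates."""
    seen: List[str] = [uri]
    while uri in deprecated_by:
        uri = deprecated_by[uri]
        if uri in seen:
            raise ValueError("deprecation cycle: " + " -> ".join(seen + [uri]))
        seen.append(uri)
    return uri
```

The cycle check is a defensive choice: two declarations that each claim to deprecate the other's URI would otherwise cause an endless loop.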
Other events in the URI lifecycle
Other, less common events in the URI lifecycle may also be of interest.
Community expropriation of a URI.
In some cases, the resource identity of a URI may be so entrenched
in the community that, even if its declaration is compromised or
unavailable at the follow-your-nose location, statement authors
still wish to use the URI according to a declaration that is well-known
in the community.
For example, the original URI owner may have gone bankrupt,
and the domain name may have been sold to an unscrupulous company that
proceeds to publish a new, misleading declaration for the URI.
Or, the original URI owner may never have published the declaration at
the URI's follow-your-nose location.
In such cases, the community MAY temporarily expropriate that URI
by continuing to write RDF statements based on the URI's original
declaration, if:
- the cost of changing to a new URI would be unreasonably high;
- the original URI declaration is widely known and copies are
easily located by consumers;
- sufficient community discussion has taken place to make this
decision;
- the decision is widely publicized and documented; and
- a new URI is minted, based on the original URI declaration, with
a URI declaration that indicates that the new URI deprecates the old
URI, specifies
a cut-off date by which all new RDF statements SHOULD use the new URI,
and provides a link to the community discussion and decision.
Furthermore:
Statement author responsibility 8:
For each new use of an expropriated URI in an RDF document, the
statement author SHOULD include
an rdfs:isDefinedBy statement that indicates the location of the new
URI declaration.
[Issue: Is this the right requirement?]
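Such a statement could be produced as shown in the sketch below, which serializes a single triple as an N-Triples line using the RDF Schema property rdfs:isDefinedBy. The function name and the example URIs in the usage note are hypothetical, and the validity check is deliberately crude.

```python
RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"


def is_defined_by_triple(expropriated_uri: str, declaration_location: str) -> str:
    """Serialize one rdfs:isDefinedBy statement as an N-Triples line."""
    for term in (expropriated_uri, declaration_location):
        # Crude guard: reject characters that cannot appear unescaped in an IRI.
        if any(ch in term for ch in '<>" \n'):
            raise ValueError(f"not a plain absolute URI: {term!r}")
    return f"<{expropriated_uri}> <{RDFS_IS_DEFINED_BY}> <{declaration_location}> ."
```

For example, calling is_defined_by_triple("http://old.example/terms#widget", "http://new.example/decl") yields one line that a statement author could add to an RDF document alongside each new use of the expropriated URI.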
Cases of community expropriation should be rare. The reason to make
the expropriation temporary is
to avoid the indefinite accumulation of URIs that require special
processing.
4 Conclusions
In understanding resource identity -- the association of a URI to a
particular resource -- it is helpful to look at the roles, events and
responsibilities involved
in the lifecycle of a URI. This paper proposes a set of roles and
responsibilities for establishing and determining a URI's resource
identity through its lifecycle.
References
[Anonymous 2009] Anonymous, Use-mention distinction, Wikipedia, retrieved
23-Mar-2009, http://en.wikipedia.org/wiki/Use%E2%80%93mention_distinction
[Booth 2007] David Booth. URI Declaration
in Semantic Web Architecture. 25-Jul-2007. http://dbooth.org/2007/uri-decl/
[Booth 2009a] David Booth. Splitting Identities in
Semantic Web Architecture, 26-Feb-2009, http://dbooth.org/2007/splitting/
[Booth 2009b] David Booth. Denotation as a Two-Step
Mapping in Semantic Web Architecture.
Twenty-first International Joint Conference on Artificial Intelligence
(IJCAI-09), 11-Jul-2009. http://dbooth.org/2009/denotation/
[Bradner 1997] S. Bradner. RFC2119 -
Key words for use in RFCs to Indicate
Requirement Levels, March 1997. http://www.faqs.org/rfcs/rfc2119.html
[Dean 2004] Mike Dean, Guus Schreiber, editors. OWL Web Ontology Language Reference.
W3C Recommendation 10-Feb-2004. http://www.w3.org/TR/owl-ref/
[Hayes 2004] Patrick Hayes, editor. RDF Semantics. W3C
Recommendation 10-Feb-2004. http://www.w3.org/TR/rdf-mt/
[Jacobs 2004] Ian Jacobs, Norman Walsh, editors. Architecture of
the World Wide Web, Volume One. W3C Recommendation
15-Dec-2004. http://www.w3.org/TR/webarch/
[Miles 2009] Alistair Miles and Sean Bechhofer, editors. SKOS Simple Knowledge Organization System
Reference, W3C Candidate Recommendation 17-Mar-2009, http://www.w3.org/TR/skos-reference
[Sauermann 2009] Leo Sauermann and Richard Cyganiak. Cool URIs for the Semantic Web, W3C
Working Draft 21-Mar-2009, http://www.w3.org/TR/cooluris
[Swick 2006] Ralph Swick. URI squatting; please don't. 10-Mar-2006, public email
message archived at http://lists.w3.org/Archives/Public/public-swbp-wg/2006Mar/0036.html
Change log
11-Feb-2011: Added mention of W3C TAG issue-39.
16-Jul-2010: Separated out statement author responsibility #8 from the
discussion of community expropriation.
13-Jul-2010: Added links to Internet Archive and Memento.
24-Jun-2010: Minor editorial improvements.
27-May-2010: Added named anchors.
19-May-2009: Added mention of URI deprecation and made small editorial
improvements.
14-May-2009: Editorial improvements.
23-Mar-2009: Initial version.