- UNFINISHED DRAFT -
Splitting Identities in Semantic Web Architecture
Views expressed herein are those of the author and do not necessarily
reflect those of HP.
Abstract
A URI declaration that
is precise enough for some applications may be considered ambiguous for
other applications. Another way to say this is that the resource
identity of a URI -- the resource denoted by that URI -- may have
multiple, conflicting interpretations. This paper explains how an
ambiguous URI declaration can be related to more specific URI
declarations.
Introduction
Consider the following scenarios.
Scenario 1: AKT
A protein called AKT is discovered and described. A URI is minted
to denote it, and assertions are published using this URI.
Further exploration reveals that there are actually three
different substances, AKT1, AKT2 and AKT3, that were mistaken for
a single substance. What should happen to the original URI for
AKT? What should happen to its declaration?
Scenario 2: David Booth of HP
Someone mints a URI to denote the David Booth who works at HP.
But it turns out that there are three people named "David Booth" at
HP. What should be done?
Scenario 3: Mark Baker and his home page
Mark insists that http://markbaker.ca/ denotes himself [@@ref].
But the WebArch @@ add ref@@ says that a person is not an "information
resource" (in the WebArch sense).
Scenario 5: Dialects of Southern Zhuang
See John Cowan's post:
http://lists.w3.org/Archives/Public/www-tag/2008Mar/0038.html
What do these scenarios have in common? In each of them, something
that at one point was thought to be (or could be modeled as) a single
entity is later (or in some other context) viewed as two or more
distinct things. This paper discusses how these distinctions can be
made and named using URIs, and how the relationships between them can
be indicated in the semantic web.
@@ TODO: Add something about Pluto being de-classified as a planet. @@
Scenario 1: AKT
This scenario was inspired by the actual history of AKT. However, the
names and other details are completely fictional.
Jann, a doctor studying gene expression, discovers a gene he calls AKT
that is involved in cellular survival pathways. He mints a URI for
this gene, http://jann.example#akt, publishes a URI declaration for it
at http://jann.example, and publishes various RDF assertions
based on his observations, modeling AKT as an individual member of
class http://jann.example#Gene, which he had previously defined.
Others then use Jann's URI to publish assertions about AKT. This data
is very helpful to Katie, who develops a semantic web application
that helps her screen patients. The application improves the
survival rates of thousands of people.
A few years after Jann's initial discovery, another researcher, Luke,
uses more sophisticated equipment and techniques and discovers that
Jann's research was flawed: what Jann thought was a single gene, AKT,
is actually three distinct genes. Luke dubs them AKT1, AKT2 and AKT3
and mints corresponding new URIs http://luke.example#akt1,
http://luke.example#akt2 and http://luke.example#akt3. Colloquially,
researchers start using "AKT" to refer more specifically to AKT1.
Luke writes to Jann, informing him of the discovery, and suggests
that Jann's URI declaration for http://jann.example#akt be modified
to add assertions that would specifically identify AKT1.
It is worth noting that in this particular scenario, the problem that
Jann and Luke face could have been averted if they had initially
modeled these genes as classes rather than as individuals, because
then AKT1, AKT2 and AKT3 could simply have been subclasses of AKT.
Indeed, modeling things as classes does help avert -- or at least
postpone -- this kind of issue, though it may create other issues.
However, the point of this scenario is not to debate Jann's modeling
decisions; it is to illustrate how this ambiguity issue can be
addressed when it does arise. Thus, we need to assume that, for
whatever reason, Jann did what he did, and Katie's application then
depended on Jann's definitions.
Q: Should Jann delete his URI declaration for
http://jann.example#akt, or change it to more specifically refer to
AKT1?
A: No. Doing so may disrupt working applications, such as
Katie's screening application. However, it would be helpful if
his URI declaration offered a pointer to information on AKT1, AKT2 and
AKT3.
Q: How can Luke indicate the relationship between AKT1, AKT2 and AKT3
and Jann's AKT?
A: As described below, he can write something like:
"http://jann.example#akt"^^xsd:anyURI s:isBroaderThan
"http://luke.example#akt1"^^xsd:anyURI ,
"http://luke.example#akt2"^^xsd:anyURI ,
"http://luke.example#akt3"^^xsd:anyURI .
@@@@
Broadening or narrowing a URI declaration
There are many ways that one
might indicate that one URI declaration is broader (less constraining)
than another, and experience will show which ways turn out to be most
convenient.
Properties s:isBroaderThanDeclaration and s:isNarrowerThanDeclaration
These properties relate two decl:UriDeclarations directly.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix s: <http://t-d-b.org?http://dbooth.org/2007/splitting/#> .
@prefix decl: <http://t-d-b.org?http://dbooth.org/2007/uri-decl/#> .
s:isBroaderThanDeclaration a rdf:Property ;
    rdfs:label "isBroaderThanDeclaration" ;
    rdfs:comment """s:isBroaderThanDeclaration indicates that the subject
URI declaration is broader than the object URI declaration,
which means that any resource that satisfies the constraints
expressed by the object URI declaration also satisfies
the constraints of the subject URI declaration.
See http://dbooth.org/2007/uri-decl/#precise-def-uri-decl .""" ;
    rdfs:domain decl:UriDeclaration ;
    rdfs:range decl:UriDeclaration .
s:isNarrowerThanDeclaration a rdf:Property ;
    rdfs:label "isNarrowerThanDeclaration" ;
    rdfs:comment """isNarrowerThanDeclaration is the inverse of
s:isBroaderThanDeclaration.""" ;
    rdfs:domain decl:UriDeclaration ;
    rdfs:range decl:UriDeclaration .
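One way to read the definition above is set-theoretically. The following Python sketch assumes, purely for illustration, that a URI declaration can be modeled as a set of constraint strings that a resource must satisfy; a declaration is then broader than another whenever its constraints are a subset of the other's. The function name `is_broader_than_declaration` and the constraint texts are hypothetical and are not part of the s: or decl: vocabularies.

```python
# Hypothetical model: a URI declaration is a frozenset of constraint strings.
# A resource satisfies a declaration when it satisfies every constraint in it.

def is_broader_than_declaration(broader, narrower):
    """Return True if every constraint of `broader` also appears in `narrower`.

    Under this simplified model, anything that satisfies `narrower` then
    necessarily satisfies `broader`. This is a sufficient test only: it does
    not detect constraints that are logically implied but written differently.
    """
    return frozenset(broader) <= frozenset(narrower)


# Illustrative constraints for the fictional AKT scenario (assumed wording).
jann_akt = frozenset({"is a gene", "involved in cellular survival pathways"})
luke_akt1 = jann_akt | {"distinguished as AKT1 by Luke's technique"}

print(is_broader_than_declaration(jann_akt, luke_akt1))  # True
print(is_broader_than_declaration(luke_akt1, jann_akt))  # False
```

A real comparison would need some form of entailment checking rather than a syntactic subset test, since equivalent constraints can be expressed in different ways.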
Properties s:isBroaderThan and s:isNarrowerThan
Since it may be inconvenient to represent decl:UriDeclarations
explicitly, it may be easier in practice to express broader/narrower
relations in terms of URIs. However, because a URI could
potentially have multiple URI declarations, statements that relate URI
declarations indirectly through URIs are weaker than those made
directly on URI declarations.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix s: <http://t-d-b.org?http://dbooth.org/2007/splitting/#> .
s:isBroaderThan a rdf:Property ;
    rdfs:label "isBroaderThan" ;
    rdfs:comment """s:isBroaderThan indicates that the subject URI
has a URI declaration that is broader than some URI declaration
of the object URI. (See s:isBroaderThanDeclaration.)
This is a convenience property:
Since a URI could have more than one URI declaration,
this property makes weaker statements than
s:isBroaderThanDeclaration. """ ;
    rdfs:domain xsd:anyURI ;
    rdfs:range xsd:anyURI .
s:isNarrowerThan a rdf:Property ;
    rdfs:label "isNarrowerThan" ;
    rdfs:comment """isNarrowerThan is the inverse of s:isBroaderThan.""" ;
    rdfs:domain xsd:anyURI ;
    rdfs:range xsd:anyURI .
Example:
"http://jann.example#akt"^^xsd:anyURI s:isBroaderThan
"http://luke.example#akt1"^^xsd:anyURI ,
"http://luke.example#akt2"^^xsd:anyURI ,
"http://luke.example#akt3"^^xsd:anyURI .
Properties s:isBroaderThanResource and s:isNarrowerThanResource
Just as s:isBroaderThan indirectly relates URI declarations through
assertions on URIs, it would also be possible to indirectly relate URI
declarations through assertions on resources, such as:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix s: <http://t-d-b.org?http://dbooth.org/2007/splitting/#> .
@prefix decl: <http://t-d-b.org?http://dbooth.org/2007/uri-decl/#> .
s:isBroaderThanResource a rdf:Property ;
    rdfs:label "isBroaderThanResource" ;
    rdfs:comment """s:isBroaderThanResource indicates that the subject
has a URI whose URI declaration is broader than the URI declaration
of some URI that denotes the object.
""" ;
    rdfs:domain rdfs:Resource ;
    rdfs:range rdfs:Resource .
s:isNarrowerThanResource a rdf:Property ;
    rdfs:label "isNarrowerThanResource" ;
    rdfs:comment """isNarrowerThanResource is the inverse of
s:isBroaderThanResource.""" ;
    rdfs:domain rdfs:Resource ;
    rdfs:range rdfs:Resource .
However, since many URIs can denote the same resource, these represent
much weaker assertions than s:isBroaderThan and s:isNarrowerThan.
Therefore, use of s:isBroaderThanResource and s:isNarrowerThanResource
is not recommended. Properties s:isBroaderThan and s:isNarrowerThan
should be used instead.
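Because s:isNarrowerThan is defined as the inverse of s:isBroaderThan, an application that reads statements using one property can derive the corresponding statements for the other. The Python sketch below shows one possible way to do this, assuming statements are held as plain (subject, property, object) tuples of strings; the helper names `add_inverses` and `narrower_uris` are hypothetical.

```python
# Statements are modeled as (subject, property, object) tuples of plain strings.
BROADER = "s:isBroaderThan"
NARROWER = "s:isNarrowerThan"

# Luke's statements from the AKT scenario.
statements = {
    ("http://jann.example#akt", BROADER, "http://luke.example#akt1"),
    ("http://jann.example#akt", BROADER, "http://luke.example#akt2"),
    ("http://jann.example#akt", BROADER, "http://luke.example#akt3"),
}


def add_inverses(triples):
    """Return the triples plus the inverse statement implied by each one."""
    inverse_of = {BROADER: NARROWER, NARROWER: BROADER}
    result = set(triples)
    for s, p, o in triples:
        if p in inverse_of:
            result.add((o, inverse_of[p], s))
    return result


def narrower_uris(triples, uri):
    """List the URIs that the given URI is stated to be broader than."""
    return sorted(o for s, p, o in triples if s == uri and p == BROADER)


for triple in sorted(add_inverses(statements)):
    print(triple)
```

An application like Katie's could use a lookup such as `narrower_uris` to find the more specific URIs when the original one is too ambiguous for its purposes.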
Multiple URI declarations and URI collision
The Architecture of the World Wide Web defines URI collision
as "Using the same URI to directly identify different resources".
URI collision may occur if a URI has more than one URI
declaration. However, different declarations of a URI do not
necessarily cause URI collision, because the constraints they express
could be equivalent even though they are written differently.
How should multiple URI declarations for a URI be interpreted? If
one has a way to preferentially select one over another -- perhaps one
is more recent (thus implicitly obsoleting others), or perhaps the
evidence of the act of declaration is more compelling for one than
another, or perhaps one can determine which URI declaration was
intended when a statement author made a statement using the URI (see
slide 2 at http://dbooth.org/2008/irsw/slides.ppt) -- then it probably
makes the most sense to use that URI declaration to interpret the
meaning of the URI in an RDF statement. Otherwise, one could think of
the complete URI declaration for the URI as consisting of the
disjunction of the individual URI declarations.
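The two strategies just described can be sketched in Python. This is an illustration under simplifying assumptions: each URI declaration is modeled as a publication date plus a predicate over candidate resources, and recency is used as the only selection criterion (the text above names others). The class `UriDeclaration`, the function `interpret`, and the sample declarations are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List

Resource = Dict[str, str]


@dataclass
class UriDeclaration:
    published: date
    satisfied_by: Callable[[Resource], bool]


def interpret(declarations: List[UriDeclaration],
              prefer_most_recent: bool) -> Callable[[Resource], bool]:
    """Return a predicate saying whether a resource may be the URI's referent."""
    if prefer_most_recent:
        # A basis for selection exists: use the chosen declaration alone.
        chosen = max(declarations, key=lambda d: d.published)
        return chosen.satisfied_by
    # No basis for preferring one: treat the declarations as a disjunction.
    return lambda r: any(d.satisfied_by(r) for d in declarations)


# Two made-up declarations of the same URI that constrain it differently.
decls = [
    UriDeclaration(date(2007, 1, 1), lambda r: r.get("kind") == "gene"),
    UriDeclaration(date(2008, 1, 1), lambda r: r.get("kind") == "protein"),
]
gene = {"kind": "gene"}

print(interpret(decls, prefer_most_recent=True)(gene))   # False
print(interpret(decls, prefer_most_recent=False)(gene))  # True
```

Note that the disjunction admits more candidate referents than either declaration alone, which is why it is the fallback rather than the preferred reading.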
URI substitution by reference
Consider this scenario:
SCENARIO 6: Gary publishes an ontology of building materials, and
includes a URI for concrete: http://gary.example/lumber#concrete.
Helen creates an ontology of colors, which she publishes at
http://helen.example/colors , using
http://gary.example/lumber#concrete to denote concrete. Ian finds
Helen's assertions and wishes to use them, but he notices that Gary's
ontology at http://gary.example/lumber erroneously asserts that "30.49
inches = 1 foot", and Ian's application cannot withstand that erroneous
assertion. Ian is aware that Pat has published an ontology at
http://pat.example/lumber that is equivalent to Gary's ontology except
that it does not contain this erroneous assertion.
Question: What should Ian do?
My answer: Ian should effectively rewrite Helen's assertions to
use http://pat.example/lumber#concrete instead of
http://gary.example/lumber#concrete throughout. He can do this
either by modifying a copy of Helen's assertions, or by reference,
using special expressions to indicate proper URI substitution in
Helen's graph.
If Ian does not have permission to copy and modify Helen's assertions,
how can he effectively modify them by reference? Tim Berners-Lee and
Dan Connolly have written about Delta: an ontology for the
distribution of differences between RDF graphs, which may also be
useful in performing such graph manipulations. Nathan Rixham
suggests that these graph operations on Helen's graph could be easily
done with SPARQL:
CONSTRUCT { <http://pat.example/lumber#concrete> ?p ?o }
WHERE { <http://gary.example/lumber#concrete> ?p ?o }
or with N3 diff:
@prefix diff: <http://www.w3.org/2004/delta#>.
{ <http://gary.example/lumber#concrete> ?p ?o }
diff:replacement
{ <http://pat.example/lumber#concrete> ?p ?o }.
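The same substitution can be sketched procedurally. The Python example below assumes that Ian has loaded Helen's graph into memory as a set of (subject, predicate, object) string tuples and derives his own view from it without altering her published data. The function `substitute_uri` and the two sample properties under http://helen.example/colors# are hypothetical. Unlike the SPARQL query above, which matches Gary's URI only in subject position, this sketch replaces it wherever it occurs in a triple.

```python
GARY = "http://gary.example/lumber#concrete"
PAT = "http://pat.example/lumber#concrete"


def substitute_uri(triples, old, new):
    """Return a new set of triples with every occurrence of `old` replaced by `new`."""
    def swap(term):
        return new if term == old else term
    return {(swap(s), swap(p), swap(o)) for s, p, o in triples}


# A made-up fragment of Helen's graph; the property names are assumptions.
helen = {
    (GARY,
     "http://helen.example/colors#hasColor",
     "http://helen.example/colors#gray"),
    ("http://helen.example/colors#gray",
     "http://helen.example/colors#typicalOf",
     GARY),
}

# Ian's derived view; Helen's original set is left untouched.
ian = substitute_uri(helen, GARY, PAT)

for triple in sorted(ian):
    print(triple)
```

Whether substitution in object (or predicate) position is wanted depends on the application; restricting `swap` to the subject would mirror the SPARQL form exactly.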
httpRange-14 implications
A key argument in the httpRange-14 debate was that a person is not an
information resource, and this distinction between information
resources and non-information resources is important to Web
architecture. But does this mean that Mark Baker's use of his URI
http://markbaker.ca/ to directly denote himself is a violation of Web
architecture, given that an HTTP GET on the URI yields a 200 OK
response? (See Dan Connolly's very nice analysis of this example.)
No. The notion of s:isBroaderThan suggests that there is no
architectural need to view this "ambiguous" use of http://markbaker.ca/
as a violation of semantic web architecture, since it is conceptually
no different from the AKT scenario (above) in which the "ambiguous"
URI is good enough for some applications but not for others.
Therefore, the use of a URI to
directly denote both an information resource and a non-information
resource should be viewed as a violation of good practice, but not a
violation of Web architecture.
Thus, in my view the TAG's httpRange-14 decision was correct -- an HTTP
200 response implies an information resource -- and the AWWW's view
that a person is not an information resource is probably correct, but
the AWWW notion that a URI denotes one resource may need more
explanation in the context of the semantic web, given the two-step
way in which the referent of a URI is determined in the semantic
web.
However, the httpRange-14 decision can be interpreted as saying that
if an HTTP GET on a URI u yields a 200 response, then the resource
denoted by u is an information resource. Thus, a 200 response can be
treated as an implicit URI declaration for u, and presumably that
declaration would include a core assertion to the effect:
<u> a awww:InformationResource .
Therefore, if the class of awww:InformationResources were declared to
be disjoint with the class of sumo:Human, then a statement such as:
<http://markbaker.ca/> a sumo:Human .
would immediately cause a contradiction when http://markbaker.ca/ yields
a 200 response and its implicit URI declaration is asserted.
Thus, if this interpretation of the httpRange-14 decision is
maintained, and a 200 response is viewed as providing implicit core
assertions (rather than ancillary assertions), and an
awww:InformationResource is not a human, then Mark Baker's URI would
effectively be unusable in denoting himself as a person, even if such
use would not violate the Web architecture.
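The contradiction described above can be made concrete with a small Python sketch. It assumes that type assertions are held as (URI, class) pairs, that awww:InformationResource and sumo:Human have been declared disjoint, and that the HTTP status code is supplied as a parameter rather than fetched. The function names `implicit_declaration` and `contradictions` are hypothetical, and no network request is made.

```python
INFO = "awww:InformationResource"
HUMAN = "sumo:Human"

# Assumption for this sketch: the two classes are declared disjoint.
DISJOINT = {frozenset({INFO, HUMAN})}


def implicit_declaration(uri, http_status):
    """Type assertions implied by a GET response, under the interpretation above."""
    return {(uri, INFO)} if http_status == 200 else set()


def contradictions(type_assertions):
    """Return URIs asserted to belong to two classes that are declared disjoint."""
    classes_of = {}
    for uri, cls in type_assertions:
        classes_of.setdefault(uri, set()).add(cls)
    return sorted(
        uri for uri, classes in classes_of.items()
        if any(pair <= classes for pair in DISJOINT)
    )


uri = "http://markbaker.ca/"
explicit = {(uri, HUMAN)}

# With a 200 response, the implicit core assertion clashes with sumo:Human.
print(contradictions(explicit | implicit_declaration(uri, 200)))  # ['http://markbaker.ca/']
# With any other status, this sketch adds no implicit assertion.
print(contradictions(explicit | implicit_declaration(uri, 303)))  # []
```

If the 200 response were instead treated as contributing only ancillary assertions, an application could choose to ignore the implicit type and no contradiction would be forced.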
Change log
24-Nov-2012: Corrected scenario 6 SPARQL and n3 diff examples, thanks
to Roman Evstifeev.
6-Apr-2011: Added Nathan Rixham's suggestion to use SPARQL or n3 diff.
19-May-2009: Updated my email address.
26-Feb-2009: Updated the AKT scenario to note that modeling AKT as a
class (instead of an instance) would have made it easier to relate
AKT1, AKT2 and AKT3 to it (as subclasses). Also clarified section
#urisub, as part of it appears to have been accidentally deleted when
I was doing other edits.
16-Jan-2009: Added s:isBroaderThanResource and
s:isNarrowerThanResource.
3-Dec-2008: Corrected domain and range of s:isBroaderThanDeclaration.
26-Nov-2008: Changed namespace prefix decl: to s:. Changed from named
graphs rdfg:Graph to log:Formula, because I am using N3. Changed
properties "broadens" and "narrows" to "isBroaderThan" and
"isNarrowerThan". Added properties s:isBroaderThanDeclaration and
s:isNarrowerThanDeclaration. Deleted properties hasBroaderResource
and hasBroaderDenotedBy. Added section on multiple URI declarations
and URI collision.
4-Apr-2008: Added explanation of how blank nodes could be used instead
of decl:broadens, and added more explanation of httpRange-14
implications.
27-Mar-2008: Added intro scenarios back in. Added httpRange-14
implications section.
1-Mar-2008: Replaced draft with placeholder text on graph editing by
reference.
30-Jun-2007: Initial draft (unpublished).