- UNFINISHED DRAFT! -
Splitting Identities
@@ TODO: improve title? @@
David Booth, Ph.D.
HP Software
Comments are invited:
dbooth@hp.com
Latest version:
@@@@
This version: @@@@
Views
expressed herein are those of the author and do not necessarily
reflect those of HP.
Abstract
@@ To do @@
@@ WARNING: This document is VERY unfinished. @@
Introduction
Consider the following scenarios.
Scenario 1: AKT
A protein called AKT is discovered and described. A URI is minted
to denote it, and assertions are published using this URI.
Further exploration reveals that there are actually three
different substances, AKT1, AKT2 and AKT3, that were confused as being
a single substance. What should happen to the original URI for
AKT? What should happen to its declaration?
Scenario 2: David Booth of HP
Someone mints a URI to denote the David Booth who works at HP.
But it turns out that there are three people named "David Booth" at
HP. What should be done?
Scenario 3: Mark Baker and his home page
Mark insists that http://markbaker.ca/ denotes himself@@ref@@.
But the WebArch @@ add ref@@ says that a person is not an "information
resource" (in the WebArch sense).
Scenario 5: Dialects of Southern Zhuang
See John Cowan's post:
http://lists.w3.org/Archives/Public/www-tag/2008Mar/0038.html
What do these scenarios have in common? These scenarios all
involve the problem that something that at one point was thought to
be (or could be modeled as) a single entity is later (or in some
other context) viewed as two or more distinct things. This paper
discusses how these distinctions can be made and named using URIs, and
how the relationships between them can be indicated in the Semantic Web.
@@@@
Scenario 1: AKT
[Note: Although this scenario is adapted from the actual history of
AKT, the names Jann, Luke, and Katie are fictitious.]
Jann, a doctor studying gene expression, discovers a gene he calls AKT that is involved in
cellular survival pathways. He mints a URI for this gene,
http://jann.example#akt, publishes a URI declaration for it at
http://jann.example, and publishes various RDF assertions
based on his observations. Others then use Jann's URI to publish
assertions about AKT. This data is very
helpful to Katie, who develops a Semantic Web application
that helps her screen patients. The application improves the
survival rates of thousands of people. A few years after Jann's
initial discovery, another researcher, Luke, uses more sophisticated
equipment and techniques and discovers that Jann's research was flawed:
what Jann thought was a single gene, AKT, is actually three distinct
genes. Luke dubs them AKT1, AKT2 and AKT3 and mints corresponding
new URIs http://luke.example#akt1, http://luke.example#akt2 and
http://luke.example#akt3. Colloquially, researchers start using
"AKT" to refer more specifically to AKT1. Luke writes to Jann,
informing Jann of Luke's discovery, and suggests that Jann's URI
declaration for http://jann.example#akt be modified to add additional
assertions that would specifically identify AKT1.
Q: Should Jann delete his URI declaration for
http://jann.example#akt, or change it to more specifically refer to
AKT1?
A: No. Doing so may disrupt working applications, such as
Katie's screening application. However, it would be helpful if
his URI declaration offered a pointer to information on AKT1, AKT2 and
AKT3.
Q: How can Luke indicate the relationship between AKT1, AKT2 and AKT3
and Jann's AKT?
A: He can write something like:
"http://jann.example#akt"^^xsd:anyURI decl:broadens
"http://luke.example#akt1"^^xsd:anyURI ,
"http://luke.example#akt2"^^xsd:anyURI ,
"http://luke.example#akt3"^^xsd:anyURI .
@@@@
Broadening or narrowing a URI declaration
The decl:broadens relationship
indicates that one URI's declaration is
broader (less constraining) than another. The decl:narrows
relationship is the inverse. They are analogous to skos:narrower
and skos:broader, respectively. In particular, if '"uB"^^xsd:anyURI decl:broadens "uA"^^xsd:anyURI .' holds, then any real world
interpretation of <uA>'s
referent that satisfies the core assertions of uA's URI declaration will also
satisfy the core assertions of <uB>'s
URI declaration. Note that this relationship is between
URIs rather than being directly between the resources that those URIs
denote. This is to avoid implying acceptance of uB's core assertions.
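The condition above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a URI declaration can be reduced to a set of core assertions written as plain strings; the names DECLARATIONS and broadens and the wording of the assertions are hypothetical and not part of any published vocabulary. The sketch checks only a sufficient condition: if every core assertion of uB also appears among uA's, then anything satisfying uA's declaration satisfies uB's.

```python
# Hypothetical model: each URI declaration is reduced to a set of core
# assertions, written here as plain strings. The wording of the
# assertions is illustrative only.
DECLARATIONS = {
    "http://jann.example#akt": frozenset({
        "is a gene",
        "is involved in cellular survival pathways",
    }),
    "http://luke.example#akt1": frozenset({
        "is a gene",
        "is involved in cellular survival pathways",
        "is the gene that Luke dubbed AKT1",
    }),
}


def broadens(u_b, u_a, declarations=DECLARATIONS):
    """Return True if every core assertion of u_b is also one of u_a.

    Subset inclusion is a sufficient (not necessary) condition for
    u_b's declaration to be broader than u_a's: a full check would
    need logical entailment rather than string comparison.
    """
    return declarations[u_b] <= declarations[u_a]
```

Under these assumptions, broadens("http://jann.example#akt", "http://luke.example#akt1") holds, while the reverse does not, because Luke's declaration carries an extra constraint.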
decl:broadens a rdf:Property ;
rdfs:label "is broader than" ;
rdfs:comment "@@" ;
rdfs:domain xsd:anyURI ;
rdfs:range xsd:anyURI .
Example:
"http://jann.example#akt"^^xsd:anyURI decl:broadens
"http://luke.example#akt1"^^xsd:anyURI .
Another way to do this would be to use blank nodes:
_:akt uri:hasURI "http://jann.example#akt"^^xsd:anyURI .
_:akt1 uri:hasURI "http://luke.example#akt1"^^xsd:anyURI .
_:akt2 uri:hasURI "http://luke.example#akt2"^^xsd:anyURI .
_:akt3 uri:hasURI "http://luke.example#akt3"^^xsd:anyURI .
_:akt1 decl:hasBroaderResource _:akt .
_:akt2 decl:hasBroaderResource _:akt .
_:akt3 decl:hasBroaderResource _:akt .
where the uri:hasURI relationship indicates that the subject resource
is denoted by the object URI:
uri:hasURI a rdf:Property ;
rdfs:label "hasURI" ;
rdfs:comment """The subject resource is denoted by the object URI. It is
basically the same as log:uri, but has a range of xsd:anyURI,
so that a simple assertion like {r hasURI u} will cause u to be
recognized as type xsd:anyURI without having to assert it explicitly.""" ;
rdfs:subPropertyOf log:uri ;
# rdfs:domain rdfs:Resource ;
rdfs:range xsd:anyURI .
and decl:hasBroaderResource indicates that the real world
interpretation of the object resource is broader than the real world
interpretation of the subject resource.
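To make the correspondence between the two forms concrete, the following Python sketch joins the blank-node statements on uri:hasURI and lists the (broader URI, narrower URI) pairs they relate. The tuple representation of triples and the function name broader_uri_pairs are assumptions made for this sketch. Whether each resulting pair may also be asserted with decl:broadens is a further assumption, since decl:broadens relates URI declarations while decl:hasBroaderResource relates the resources themselves.

```python
HAS_URI = "uri:hasURI"
HAS_BROADER = "decl:hasBroaderResource"

# The blank-node statements from the text, as (subject, predicate, object).
TRIPLES = [
    ("_:akt", HAS_URI, "http://jann.example#akt"),
    ("_:akt1", HAS_URI, "http://luke.example#akt1"),
    ("_:akt2", HAS_URI, "http://luke.example#akt2"),
    ("_:akt3", HAS_URI, "http://luke.example#akt3"),
    ("_:akt1", HAS_BROADER, "_:akt"),
    ("_:akt2", HAS_BROADER, "_:akt"),
    ("_:akt3", HAS_BROADER, "_:akt"),
]


def broader_uri_pairs(triples):
    """Return the set of (broader_uri, narrower_uri) pairs in the graph."""
    # Collect every URI attached to each node; a node may have several.
    uris = {}
    for subject, predicate, obj in triples:
        if predicate == HAS_URI:
            uris.setdefault(subject, set()).add(obj)

    # For each "narrow hasBroaderResource broad" statement, pair up the URIs.
    pairs = set()
    for subject, predicate, obj in triples:
        if predicate == HAS_BROADER:
            for broad in uris.get(obj, ()):
                for narrow in uris.get(subject, ()):
                    pairs.add((broad, narrow))
    return pairs
```

Applied to TRIPLES, this yields Jann's URI paired with each of Luke's three URIs, matching the three objects of the decl:broadens statement shown for the AKT scenario.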
These two approaches could also be combined in a hybrid relationship
decl:hasBroaderDenotedBy (or inversely decl:hasNarrowerDenotedBy):
decl:hasBroaderDenotedBy a rdf:Property ;
rdfs:label "hasBroaderDenotedBy" ;
rdfs:comment "@@" ;
# rdfs:domain rdfs:Resource ;
rdfs:range xsd:anyURI .
This would allow Luke to express the relationship more concisely:
<http://luke.example#akt1>
decl:hasBroaderDenotedBy
"http://jann.example#akt"^^xsd:anyURI .
This would indicate that the real world interpretation of the resource
denoted by
the object URI (http://jann.example#akt) is broader than the real world
interpretation of the
subject resource (which is denoted by http://luke.example#akt1).
@@
URI substitution by reference
Consider this scenario:
SCENARIO 6: Helen is lazy and publishes
her color assertions at http://helen.example/colors anyway, using
http://gary.example/lumber#concrete to denote concrete. Ian finds
Helen's assertions and wishes to use them, but he notices that Gary's
ontology at http://gary.example/lumber erroneously asserts that "30.49
inches = 1 foot", and Ian's application cannot withstand that erroneous
assertion. Ian is aware that Pat has published an ontology at
http://pat.example/lumber that is equivalent to Gary's ontology except
that it does not contain this erroneous assertion.
Question: What should Ian do?
My answer: Ian should effectively rewrite Helen's assertions to
use http://pat.example/lumber#concrete instead of
http://gary.example/lumber#concrete throughout. He can either do
this by modifying a copy of Helen's assertions, or by reference, using
special expressions to indicate proper URI substitution in Helen's
graph.
If Ian does not have permission to copy and modify Helen's assertions,
how can he effectively modify Helen's assertions by
reference? One way would be to have a property that, given
an information resource for an RDF graph, and a list consisting of <oldUri, newUri> pairs,
changes each oldUri to newUri throughout the
graph. Ian could then write something like the following:
@prefix dbooth: <http://dbooth.example/splitting#> .
# . . .
# Include Helen's assertions, but replace
# <http://gary.example/lumber#concrete>
# with <http://pat.example/lumber#concrete> .
<http://helen.example/colors> dbooth:includedReplacing
( "http://gary.example/lumber#concrete"^^xsd:anyURI
"http://pat.example/lumber#concrete"^^xsd:anyURI ) .
# . . .
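The intended effect of such a property can be sketched in Python. This is only an illustration of the substitution itself, assuming a graph is held as a set of (subject, predicate, object) tuples; the function name included_replacing mirrors the property above but is hypothetical, as are the sample triple and the predicate http://helen.example/colors#hasColor, since Helen's actual assertions are not shown.

```python
def included_replacing(graph, replacements):
    """Return a copy of graph with each old URI replaced by its new URI.

    graph        -- iterable of (subject, predicate, object) tuples
    replacements -- iterable of (old_uri, new_uri) pairs
    The input graph is left unmodified, in keeping with the idea of
    editing "by reference" rather than altering Helen's own data.
    """
    mapping = dict(replacements)
    return {
        tuple(mapping.get(term, term) for term in triple)
        for triple in graph
    }


# Hypothetical stand-in for Helen's published assertions.
HELEN_GRAPH = {
    (
        "http://gary.example/lumber#concrete",
        "http://helen.example/colors#hasColor",
        "gray",
    ),
}

IAN_GRAPH = included_replacing(
    HELEN_GRAPH,
    [("http://gary.example/lumber#concrete",
      "http://pat.example/lumber#concrete")],
)
```

After this call, IAN_GRAPH refers to Pat's URI wherever Helen used Gary's, so Ian's application need not load Gary's ontology to interpret the term.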
[Note: There may be other
existing/better ways to do this. Is there a SPARQL way to do
it? Please let me know if you know
of any. Thanks! -- DBooth]
httpRange-14 implications
A key argument in the httpRange-14@@ref@@ debate was that a person is
not an information resource@@ref@@, and this distinction between
information resources and non-information resources is important to Web
architecture. But does this mean that Mark Baker's use of
his URI
http://markbaker.ca/ to directly denote himself is a violation of Web
architecture, given that an HTTP GET on the URI yields a 200 OK
response? (See Dan Connolly's very nice analysis of this example
in
@@ref@@.) No. The notion of decl:broadens suggests
that there is no
architectural need to view this "ambiguous" use of http://markbaker.ca/
as a violation of Semantic Web architecture, since it is conceptually
no different from the AKT scenario in which the "ambiguous" URI is good
enough for some applications but not for others.
Therefore, the use of a URI to
directly denote both an information resource and a non-information
resource should be viewed as a violation of good practice, but not a
violation of Web architecture.
Thus, in my view the TAG's httpRange-14 decision was correct -- an HTTP
200 response implies an information resource -- and the AWWW's view
that a person is not an information resource is probably correct, but
the AWWW notion that a URI denotes one resource may need more
explanation in the context of the Semantic Web, given the two-step
way@@ref@@ in which the referent of a URI is determined in the
Semantic
Web.
However, the httpRange-14 decision@@ref@@ can be interpreted as saying
that if an HTTP GET on a URI u
yields a 200 response, then the resource denoted by u is an information resource.
Thus, a 200 response can be treated as an implicit URI declaration for u, and presumably that declaration
would include a core assertion to the effect:
<u> a awww:InformationResource .
Therefore, if the class awww:InformationResource were declared to
be disjoint with the class sumo:Human, then a statement such as:
<http://markbaker.ca/> a sumo:Human .
would immediately cause a contradiction when http://markbaker.ca/ yields
a 200 response and its implicit URI declaration is asserted.
Thus, if this interpretation of the httpRange-14 decision is
maintained, and a 200 response is viewed as providing implicit core
assertions (rather than ancillary assertions), and an
awww:InformationResource is not a human, then Mark Baker's URI would
effectively be unusable in denoting himself as a person, even if such
use would not violate the Web architecture.
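The contradiction described above can be shown with a small Python sketch. It assumes that type assertions are given as (resource, class) pairs and that disjointness is given as explicit class pairs; the function name find_contradictions and this flat representation are hypothetical, and a real reasoner would also account for subclass relationships.

```python
def find_contradictions(type_assertions, disjoint_pairs):
    """Return (resource, class_a, class_b) for each disjointness clash."""
    # Group the asserted classes by resource.
    types = {}
    for resource, cls in type_assertions:
        types.setdefault(resource, set()).add(cls)

    # A clash occurs when one resource is typed with both disjoint classes.
    clashes = []
    for resource in sorted(types):
        for class_a, class_b in disjoint_pairs:
            if class_a in types[resource] and class_b in types[resource]:
                clashes.append((resource, class_a, class_b))
    return clashes


TYPE_ASSERTIONS = [
    # Implicit core assertion taken from the 200 response.
    ("http://markbaker.ca/", "awww:InformationResource"),
    # Mark's own use of the URI to denote himself.
    ("http://markbaker.ca/", "sumo:Human"),
]
DISJOINT_PAIRS = [("awww:InformationResource", "sumo:Human")]

CLASHES = find_contradictions(TYPE_ASSERTIONS, DISJOINT_PAIRS)
```

If the first assertion were treated as ancillary rather than core and left out of TYPE_ASSERTIONS, no clash would be reported, which is the distinction the paragraph above turns on.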
4-Apr-2008: Added
explanation of how blank nodes could be used instead of decl:broadens,
and added more explanation of httpRange-14 implications.
27-Mar-2008: Added intro
scenarios back in. Added httpRange-14 implications section.
1-Mar-2008: Replaced draft
with placeholder text on graph editing by reference.
30-Jun-2007: Initial draft (unpublished).