- UNFINISHED DRAFT -

Splitting Identities in Semantic Web Architecture

David Booth, Ph.D.
HP Software
Comments are invited: david@dbooth.org

Latest version: http://dbooth.org/2007/splitting/

Views expressed herein are those of the author and do not necessarily reflect those of HP.

Abstract

A URI declaration that is precise enough for some applications may be considered ambiguous for other applications.  Another way to say this is that the resource identity of a URI -- the resource denoted by that URI -- may have multiple, conflicting interpretations.  This paper explains how an ambiguous URI declaration can be related to more specific URI declarations.

Introduction

Consider the following scenarios.

Scenario 1: AKT

A protein called AKT is discovered and described.  A URI is minted to denote it, and assertions are published using this URI.  Further exploration reveals that there are actually three different substances, AKT1, AKT2 and AKT3, that had been mistaken for a single substance.  What should happen to the original URI for AKT?  What should happen to its declaration?

Scenario 2: David Booth of HP

Someone mints a URI to denote the David Booth who works at HP.  But it turns out that there are three people named "David Booth" at HP.  What should be done?

Scenario 3: Mark Baker and his home page

Mark insists that http://markbaker.ca/ denotes himself [@@ref].  But the WebArch @@ add ref@@ says that a person is not an "information resource" (in the WebArch sense).

Scenario 5: Dialects of Southern Zhuang

See John Cowan's post: http://lists.w3.org/Archives/Public/www-tag/2008Mar/0038.html


What do these scenarios have in common?  In each, something that at one point was thought to be (or could be modeled as) a single entity is later (or in some other context) viewed as two or more distinct things.  This paper discusses how these distinctions can be made and named using URIs, and how the relationships between them can be indicated in the semantic web.

@@ TODO: Add something about Pluto being de-classified as a planet. @@

Scenario 1: AKT

This scenario was inspired by the actual history of AKT.  However, the names and other details are completely fictional.

Jann, a doctor studying gene expression, discovers a gene he calls AKT that is involved in cellular survival pathways.  He mints a URI for this gene, http://jann.example#akt, publishes a URI declaration for it at http://jann.example, and publishes various RDF assertions based on his observations, modeling AKT as an individual member of class http://jann.example#Gene, which he had previously defined.  Others then use Jann's URI to publish assertions about AKT.  This data is very helpful to Katie, who develops a semantic web application that helps her screen patients.  The application improves the survival rates of thousands of people. 

A few years after Jann's initial discovery, another researcher, Luke, uses more sophisticated equipment and techniques and discovers that Jann's research was flawed: what Jann thought was a single gene, AKT, is actually three distinct genes.  Luke dubs them AKT1, AKT2 and AKT3 and mints corresponding new URIs http://luke.example#akt1, http://luke.example#akt2 and http://luke.example#akt3.  Colloquially, researchers start using "AKT" to refer more specifically to AKT1.  Luke writes to Jann about the discovery and suggests that Jann's URI declaration for http://jann.example#akt be modified to add assertions that would specifically identify AKT1.

It is worth noting that in this particular scenario, the problem that Jann and Luke face could have been averted if AKT had initially been modeled as a class rather than as an individual, because AKT1, AKT2 and AKT3 could then simply have been subclasses of AKT.  Indeed, modeling things as classes can help avert -- or at least postpone -- this kind of issue, though it may create other issues.  However, the point of this scenario is not to debate Jann's modeling decisions; it is to illustrate how this ambiguity issue can be addressed when it does arise.  Thus, we assume that, for whatever reason, Jann did what he did, and Katie's application then came to depend on Jann's definitions.

Q: Should Jann delete his URI declaration for http://jann.example#akt, or change it to more specifically refer to AKT1?

A:  No.  Doing so may disrupt working applications, such as Katie's screening application.  However, it would be helpful if his URI declaration offered a pointer to information on AKT1, AKT2 and AKT3.

Q: How can Luke indicate the relationship between AKT1, AKT2 and AKT3 and Jann's AKT?

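A: As described below, he can write something like:
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix s: <http://t-d-b.org?http://dbooth.org/2007/splitting/#> .
"http://jann.example#akt"^^xsd:anyURI s:isBroaderThan 
"http://luke.example#akt1"^^xsd:anyURI ,
"http://luke.example#akt2"^^xsd:anyURI ,
"http://luke.example#akt3"^^xsd:anyURI .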


 @@@@

Broadening or narrowing a URI declaration

There are many ways that one might indicate that one URI declaration is broader (less constraining) than another, and experience will show which ways turn out to be most convenient. 

Properties s:isBroaderThanDeclaration and s:isNarrowerThanDeclaration

These properties relate two decl:UriDeclarations directly.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix s: <http://t-d-b.org?http://dbooth.org/2007/splitting/#> .
@prefix decl: <http://t-d-b.org?http://dbooth.org/2007/uri-decl/#> .

s:isBroaderThanDeclaration a rdf:Property ;
rdfs:label "isBroaderThanDeclaration" ;
rdfs:comment """s:isBroaderThanDeclaration indicates that the subject
URI declaration is broader than the object URI declaration,
which means that any resource that satisfies the constraints
expressed by the object URI declaration also satisfies
the constraints of the subject URI declaration.
See http://dbooth.org/2007/uri-decl/#precise-def-uri-decl .""" ;
rdfs:domain decl:UriDeclaration ;
rdfs:range decl:UriDeclaration .

s:isNarrowerThanDeclaration a rdf:Property ;
rdfs:label "isNarrowerThanDeclaration" ;
rdfs:comment """isNarrowerThanDeclaration is the inverse of
s:isBroaderThanDeclaration.""" ;
rdfs:domain decl:UriDeclaration ;
rdfs:range decl:UriDeclaration .
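The definition of s:isBroaderThanDeclaration can be illustrated with a minimal Python sketch.  It assumes, purely for illustration, that a URI declaration can be modeled as a set of constraint functions and that the candidate resources can be enumerated as a finite universe; real URI declarations are RDF graphs whose satisfying resources generally cannot be enumerated.  All names (satisfies, is_broader_than_declaration, the AKT constraints) are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable

# Hypothetical model: a candidate resource is a dict of property values, and a
# URI declaration is a set of constraints that a denoted resource must meet.
Resource = Dict[str, str]
Constraint = Callable[[Resource], bool]
Declaration = FrozenSet[Constraint]


def satisfies(resource: Resource, decl: Declaration) -> bool:
    """True if the resource meets every constraint of the declaration."""
    return all(constraint(resource) for constraint in decl)


def is_broader_than_declaration(subject: Declaration, obj: Declaration,
                                universe: Iterable[Resource]) -> bool:
    """True if every resource in the (assumed finite) universe that satisfies
    obj also satisfies subject."""
    return all(satisfies(r, subject) for r in universe if satisfies(r, obj))


def is_narrower_than_declaration(subject: Declaration, obj: Declaration,
                                 universe: Iterable[Resource]) -> bool:
    """Inverse of is_broader_than_declaration."""
    return is_broader_than_declaration(obj, subject, universe)


# Illustrative constraints loosely based on the AKT scenario.
def is_gene(r: Resource) -> bool:
    return r.get("type") == "Gene"


def in_survival_pathway(r: Resource) -> bool:
    return r.get("pathway") == "cell-survival"


def named_akt1(r: Resource) -> bool:
    return r.get("name") == "AKT1"


universe = [{"type": "Gene", "pathway": "cell-survival", "name": f"AKT{n}"}
            for n in (1, 2, 3)]
jann_akt = frozenset({is_gene, in_survival_pathway})
luke_akt1 = frozenset({is_gene, in_survival_pathway, named_akt1})
```

Here Jann's declaration is broader than Luke's AKT1 declaration, but not the reverse, because AKT2 satisfies Jann's constraints without satisfying Luke's.  Note that under this model the check is vacuously true when no resource satisfies the narrower declaration.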

Properties s:isBroaderThan and s:isNarrowerThan

Since it may be inconvenient to represent decl:UriDeclarations explicitly, it may be easier in practice to express broader/narrower relations in terms of URIs.  However, because a URI could potentially have multiple URI declarations, statements that relate URI declarations indirectly through URIs are weaker than those made directly on URI declarations.
s:isBroaderThan a rdf:Property ;
rdfs:label "isBroaderThan" ;
rdfs:comment """s:isBroaderThan indicates that the subject URI
has a URI declaration that is broader than some URI declaration
of the object URI. (See s:isBroaderThanDeclaration.)
This is a convenience property:
Since a URI could have more than one URI declaration,
this property makes weaker statements than
s:isBroaderThanDeclaration. """ ;
rdfs:domain xsd:anyURI ;
rdfs:range xsd:anyURI .

s:isNarrowerThan a rdf:Property ;
rdfs:label "isNarrowerThan" ;
rdfs:comment """isNarrowerThan is the inverse of s:isBroaderThan.""" ;
rdfs:domain xsd:anyURI ;
rdfs:range xsd:anyURI .
Example:
"http://jann.example#akt"^^xsd:anyURI s:isBroaderThan
"http://luke.example#akt1"^^xsd:anyURI ,
"http://luke.example#akt2"^^xsd:anyURI ,
"http://luke.example#akt3"^^xsd:anyURI .
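The weaker, URI-level reading of s:isBroaderThan can be sketched in Python as an existential check over all declarations of both URIs.  This is a sketch under stated assumptions, not an implementation of the property: each declaration is modeled as a hypothetical set of core assertions written as strings, and "broader" is approximated by a subset test, which is sufficient (fewer constraints means every resource satisfying the larger set also satisfies the smaller) but not necessary.

```python
from typing import Dict, FrozenSet, List

# Hypothetical model: each URI declaration is a frozenset of core assertions,
# written here as plain strings. A URI may have several declarations.
Declaration = FrozenSet[str]


def decl_broader(a: Declaration, b: Declaration) -> bool:
    """Sufficient (not necessary) test: if a's constraints are a subset of
    b's, any resource satisfying b also satisfies a."""
    return a <= b


def is_broader_than(subject_uri: str, object_uri: str,
                    declarations: Dict[str, List[Declaration]]) -> bool:
    """s:isBroaderThan: some declaration of subject_uri is broader than
    some declaration of object_uri."""
    return any(decl_broader(a, b)
               for a in declarations.get(subject_uri, [])
               for b in declarations.get(object_uri, []))


def is_narrower_than(subject_uri: str, object_uri: str,
                     declarations: Dict[str, List[Declaration]]) -> bool:
    """Inverse of is_broader_than."""
    return is_broader_than(object_uri, subject_uri, declarations)


JANN = "http://jann.example#akt"
declarations: Dict[str, List[Declaration]] = {
    JANN: [frozenset({"a Gene", "in cell-survival pathway"})],
}
for n in (1, 2, 3):
    declarations[f"http://luke.example#akt{n}"] = [
        frozenset({"a Gene", "in cell-survival pathway", f"is AKT{n}"})
    ]
```

Because of the `any`, the statement holds if even one pairing of declarations works, which is why it says less than s:isBroaderThanDeclaration.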

Properties s:isBroaderThanResource and s:isNarrowerThanResource

Just as s:isBroaderThan indirectly relates URI declarations through assertions on URIs, it would also be possible to indirectly relate URI declarations through assertions on resources, such as:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix s: <http://t-d-b.org?http://dbooth.org/2007/splitting/#> .
@prefix decl: <http://t-d-b.org?http://dbooth.org/2007/uri-decl/#> .

s:isBroaderThanResource a rdf:Property ;
rdfs:label "isBroaderThanResource" ;
rdfs:comment """s:isBroaderThanResource indicates that the subject
has a URI whose URI declaration is broader than the URI declaration
of some URI that denotes the object.
""" ;
rdfs:domain rdfs:Resource ;
rdfs:range rdfs:Resource .

s:isNarrowerThanResource a rdf:Property ;
rdfs:label "isNarrowerThanResource" ;
rdfs:comment """isNarrowerThanResource is the inverse of
s:isBroaderThanResource.""" ;
rdfs:domain rdfs:Resource ;
rdfs:range rdfs:Resource .
However, since many URIs can denote the same resource, these represent much weaker assertions than s:isBroaderThan and s:isNarrowerThan.  Therefore, use of s:isBroaderThanResource and s:isNarrowerThanResource is not recommended.  Properties s:isBroaderThan and s:isNarrowerThan should be used instead.
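Why the resource-level properties are so weak can be shown with a small Python sketch.  It assumes hypothetical resources R and Q, hypothetical example.org URIs, one declaration per URI, and the same subset approximation of "broader" as a sufficient test.  Because R is denoted both by a loosely declared URI and by a tightly declared one, R and Q each come out "broader" than the other, even though their declarations differ.

```python
from typing import Dict, FrozenSet, List

Declaration = FrozenSet[str]


def is_broader_than_resource(subject: str, obj: str,
                             uris_of: Dict[str, List[str]],
                             decl_of: Dict[str, Declaration]) -> bool:
    """Some URI of `subject` has a declaration that is broader (here: a
    subset of the constraints) than the declaration of some URI of `obj`."""
    return any(decl_of[u] <= decl_of[v]
               for u in uris_of.get(subject, [])
               for v in uris_of.get(obj, []))


# Hypothetical data: resource R is denoted by a loosely declared URI and by a
# tightly declared one; resource Q has a single URI whose declaration is
# between the two.
decl_of: Dict[str, Declaration] = {
    "http://example.org/r-loose": frozenset(),
    "http://example.org/r-tight": frozenset({"x", "y"}),
    "http://example.org/q": frozenset({"x"}),
}
uris_of = {
    "R": ["http://example.org/r-loose", "http://example.org/r-tight"],
    "Q": ["http://example.org/q"],
}
```

Both is_broader_than_resource("R", "Q", ...) and is_broader_than_resource("Q", "R", ...) hold here, which illustrates why these properties say little about any particular URI declaration.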

Multiple URI declarations and URI collision

The Architecture of the World Wide Web defines URI collision as "Using the same URI to directly identify different resources".  URI collision may occur if a URI has more than one URI declaration.  However, different declarations of a URI do not necessarily cause URI collision, because the constraints they express could be equivalent even though they are written differently.

How should multiple URI declarations for a URI be interpreted?  If one has a way to preferentially select one over another -- perhaps one is more recent (thus implicitly obsoleting others), or perhaps the evidence of the act of declaration is more compelling for one than another, or perhaps one can determine which URI declaration was intended when a statement author made a statement using the URI (see slide 2 at http://dbooth.org/2008/irsw/slides.ppt ) -- then it probably makes the most sense to use that URI declaration to interpret the meaning of the URI in an RDF statement.  Otherwise, one could think of the complete URI declaration for the URI as consisting of the disjunction of the individual URI declarations.
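The two interpretations above, and the point that differently written declarations need not cause URI collision, can be sketched in Python.  The preference rule used here (pick the most recent declaration, but only if every declaration is dated) is one assumed policy among the possibilities mentioned, and the UriDeclaration class, its fields, the dates and the finite universe are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Optional

Resource = Dict[str, str]


@dataclass(frozen=True)
class UriDeclaration:
    """Hypothetical model of one URI declaration: a constraint on candidate
    resources, plus an optional date used as one possible preference signal."""
    constraint: Callable[[Resource], bool]
    issued: Optional[date] = None


def effective_constraint(decls: List[UriDeclaration]) -> Callable[[Resource], bool]:
    """If every declaration is dated, prefer the most recent one (an assumed
    policy); otherwise use the disjunction of all the declarations."""
    if decls and all(d.issued is not None for d in decls):
        return max(decls, key=lambda d: d.issued).constraint
    return lambda r: any(d.constraint(r) for d in decls)


def may_collide(decls: List[UriDeclaration], universe: List[Resource]) -> bool:
    """True if the declarations pick out different sets of resources in the
    (assumed finite) universe; equivalent declarations cause no collision."""
    extents = {frozenset(i for i, r in enumerate(universe) if d.constraint(r))
               for d in decls}
    return len(extents) > 1


universe = [{"name": "AKT1"}, {"name": "AKT2"}]
d1 = UriDeclaration(lambda r: r["name"] == "AKT1", date(2007, 6, 30))
d2 = UriDeclaration(lambda r: r["name"] in {"AKT1"})  # same extent as d1
d3 = UriDeclaration(lambda r: r["name"] == "AKT2", date(2008, 3, 1))
```

In this sketch d1 and d2 are written differently but pick out the same resource, so they do not collide, whereas d1 and d3 do.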

URI substitution by reference

Consider this scenario:

SCENARIO 6: Gary publishes an ontology of building materials and includes a URI for concrete: http://gary.example/lumber#concrete.  Helen creates an ontology of colors, which she publishes at http://helen.example/colors , using http://gary.example/lumber#concrete to denote concrete.  Ian finds Helen's assertions and wishes to use them, but he notices that Gary's ontology at http://gary.example/lumber erroneously asserts that "30.49 inches = 1 foot", and Ian's application cannot withstand that erroneous assertion.  Ian is aware that Pat has published an ontology at http://pat.example/lumber that is equivalent to Gary's ontology except that it does not contain this erroneous assertion.

Question: What should Ian do?

My answer: Ian should effectively rewrite Helen's assertions to use http://pat.example/lumber#concrete instead of http://gary.example/lumber#concrete throughout.  He can do this either by modifying a copy of Helen's assertions, or by reference, using special expressions to indicate the proper URI substitution in Helen's graph.

If Ian does not have permission to copy and modify Helen's assertions, how can he effectively modify them by reference?  Tim Berners-Lee and Dan Connolly have written "Delta: an ontology for the distribution of differences between RDF graphs", which may also be useful in performing such graph manipulations.  Nathan Rixham suggests that these graph operations on Helen's graph could easily be done with SPARQL:
CONSTRUCT { <http://pat.example/lumber#concrete> ?p ?o } WHERE {
<http://gary.example/lumber#concrete> ?p ?o
}
or with N3 diff:
@prefix diff: <http://www.w3.org/2004/delta#> .
{ <http://gary.example/lumber#concrete> ?p ?o }
diff:replacement
{ <http://pat.example/lumber#concrete> ?p ?o } .
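The substitution Ian needs can also be sketched directly in Python.  Unlike the CONSTRUCT query, which matches Gary's URI only in subject position and returns only those triples, this sketch replaces the URI in subject, predicate and object positions and keeps the rest of the graph.  It assumes, for illustration only, that triples are tuples of strings (no RDF library, and no distinction between URIs and literals); Helen's color terms are hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

GARY = "http://gary.example/lumber#concrete"
PAT = "http://pat.example/lumber#concrete"
COLOR = "http://helen.example/colors#"  # hypothetical term prefix


def substitute_uri(graph: Iterable[Triple], old: str, new: str) -> Set[Triple]:
    """Return a copy of `graph` with `old` replaced by `new` in subject,
    predicate and object positions; the input graph is not modified."""
    return {tuple(new if term == old else term for term in triple)
            for triple in graph}


# Hypothetical fragment of Helen's graph that uses Gary's URI.
helen = {
    (GARY, COLOR + "hasTypicalColor", COLOR + "gray"),
    (COLOR + "gray", COLOR + "isTypicalColorOf", GARY),
}
rewritten = substitute_uri(helen, GARY, PAT)
```

Because the function returns a new set, Ian's rewritten view leaves Helen's published graph untouched, which matches the "by reference" intent.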

httpRange-14 implications

A key argument in the httpRange-14 debate was that a person is not an information resource, and this distinction between information resources and non-information resources is important to Web architecture.   But does this mean that Mark Baker's use of his URI http://markbaker.ca/ to directly denote himself is a violation of Web architecture, given that an HTTP GET on the URI yields a 200 OK response?  (See Dan Connolly's very nice analysis of this example.)   No.  The notion of s:isBroaderThan suggests that there is no architectural need to view this "ambiguous" use of http://markbaker.ca/ as a violation of semantic web architecture, since it is conceptually no different from the AKT scenario (above) in which the "ambiguous" URI is good enough for some applications but not for others.

Therefore, the use of a URI to directly denote both an information resource and a non-information resource should be viewed as a violation of good practice, but not a violation of Web architecture.

Thus, in my view the TAG's httpRange-14 decision was correct -- an HTTP 200 response implies an information resource -- and the AWWW's view that a person is not an information resource is probably correct, but the AWWW notion that a URI denotes one resource may need more explanation in the context of the semantic web, given the two-step way in which the referent of a URI is determined in the semantic web.

However, the httpRange-14 decision can be interpreted as saying that if an HTTP GET on a URI u yields a 200 response, then the resource denoted by u is an information resource.  Thus, a 200 response can be treated as an implicit URI declaration for u, and presumably that declaration would include a core assertion to the effect:
<u> a awww:InformationResource .
Therefore, if the class of awww:InformationResources were declared to be disjoint with the class of sumo:Human, then a statement such as:
<http://markbaker.ca/> a sumo:Human .
would immediately cause a contradiction when http://markbaker.ca/ yields a 200 response and its implicit URI declaration is asserted.  Thus, if this interpretation of the httpRange-14 decision is maintained, and a 200 response is viewed as providing implicit core assertions (rather than ancillary assertions), and an awww:InformationResource is not a human, then Mark Baker's URI would effectively be unusable in denoting himself as a person, even if such use would not violate the Web architecture.
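The contradiction described above can be reproduced with a naive Python sketch.  It assumes the interpretation under discussion (a 200 response contributes the core assertion that the URI denotes an awww:InformationResource; in this sketch any other status contributes nothing), a hypothetical disjointness axiom between awww:InformationResource and sumo:Human, and triples as tuples of prefixed-name strings.  The function names are hypothetical, and this is a simple pairwise check, not an OWL reasoner.

```python
from typing import FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"

# Hypothetical disjointness axiom: these two classes share no members.
DISJOINT = {frozenset({"awww:InformationResource", "sumo:Human"})}


def implicit_declaration(uri: str, status: int) -> Set[Triple]:
    """Under the httpRange-14 reading discussed above, a 200 response
    contributes <uri> a awww:InformationResource as a core assertion."""
    if status == 200:
        return {(uri, RDF_TYPE, "awww:InformationResource")}
    return set()


def contradictions(graph: Set[Triple]) -> Set[Tuple[str, FrozenSet[str]]]:
    """Report every subject typed with two classes declared disjoint."""
    found = set()
    for s, p, o in graph:
        if p != RDF_TYPE:
            continue
        for s2, p2, o2 in graph:
            pair = frozenset({o, o2})
            if s2 == s and p2 == RDF_TYPE and pair in DISJOINT:
                found.add((s, pair))
    return found


MARK = "http://markbaker.ca/"
person_graph = {(MARK, RDF_TYPE, "sumo:Human")}
```

Adding implicit_declaration(MARK, 200) to person_graph yields one contradiction for MARK, while the graph alone yields none.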



Change log
24-Nov-2012: Corrected scenario 6 SPARQL and n3 diff examples, thanks to Roman Evstifeev.
6-Apr-2011: Added Nathan Rixham's suggestion to use SPARQL or n3 diff.
19-May-2009: Updated my email address.
26-Feb-2009: Updated the AKT scenario to note that modeling AKT as a class (instead of an instance) would have made it easier to relate AKT1, AKT2 and AKT3 to it (as subclasses).  Also clarified section #urisub, as part of it appears to have been accidentally deleted when I was doing other edits.

16-Jan-2009: Added s:isBroaderThanResource and s:isNarrowerThanResource.
3-Dec-2008: Corrected domain and range of s:isBroaderThanDeclaration.
26-Nov-2008: Changed namespace prefix decl: to s:.  Changed from named graphs rdfg:Graph to log:Formula, because I am using N3. Changed properties "broadens" and "narrows" to "isBroaderThan" and "isNarrowerThan".  Added properties s:isBroaderThanDeclaration and s:isNarrowerThanDeclaration.  Deleted properties hasBroaderResource and hasBroaderDenotedBy. 
Added section on multiple URI declarations and URI collision. 
4-Apr-2008: Added explanation of how blank nodes could be used instead of decl:broadens, and added more explanation of httpRange-14 implications.
27-Mar-2008: Added intro scenarios back in.  Added httpRange-14 implications section.
1-Mar-2008: Replaced draft with placeholder text on graph editing by reference.
30-Jun-2007: Initial draft (unpublished).