- UNFINISHED DRAFT! -

Splitting Identities

@@ TODO: improve title? @@

David Booth, Ph.D.
HP Software
Comments are invited: dbooth@hp.com

Latest version: @@@@
This version: @@@@

Views expressed herein are those of the author and do not necessarily reflect those of HP.

Abstract

@@ To do @@

@@ WARNING: This document is VERY unfinished. @@

Introduction


Consider the following scenarios.

Scenario 1: AKT

A protein called AKT is discovered and described.  A URI is minted to denote it, and assertions are published using this URI.  Further exploration reveals that there are actually three different substances, AKT1, AKT2 and AKT3, that were confused as being a single substance.  What should happen to the original URI for AKT?  What should happen to its declaration?

Scenario 2: David Booth of HP

Someone mints a URI to denote the David Booth who works at HP.  But it turns out that there are three people named "David Booth" at HP.  What should be done?

Scenario 3: Mark Baker and his home page

Mark insists that http://markbaker.ca/ denotes himself [@@ref@@].  But the WebArch [@@ add ref @@] says that a person is not an "information resource" (in the WebArch sense).

Scenario 5: Dialects of Southern Zhuang

See John Cowan's post: http://lists.w3.org/Archives/Public/www-tag/2008Mar/0038.html


What do these scenarios have in common?  Each involves something that at one point was thought to be (or could be modeled as) a single entity but is later (or in some other context) viewed as two or more distinct things.  This paper discusses how these distinctions can be made and named using URIs, and how the relationships between them can be indicated in the Semantic Web.

@@@@

Scenario 1: AKT

[Note: Although this scenario is adapted from the actual history of AKT,  the names Jann, Luke, and Katie are fictitious.]

Jann, a doctor studying gene expression, discovers a gene he calls AKT that is involved in cellular survival pathways.  He mints a URI for this gene, http://jann.example#akt, publishes a URI declaration for it at http://jann.example, and publishes various RDF assertions based on his observations.  Others then use Jann's URI to publish assertions about AKT.  This data is very helpful to Katie, who develops a Semantic Web application that helps her screen patients.  The application improves the survival rates of thousands of people.  A few years after Jann's initial discovery, another researcher, Luke, uses more sophisticated equipment and techniques and discovers that Jann's research was flawed: what Jann thought was a single gene, AKT, is actually three distinct genes.  Luke dubs them AKT1, AKT2 and AKT3 and mints corresponding new URIs http://luke.example#akt1, http://luke.example#akt2 and http://luke.example#akt3.  Colloquially, researchers start using "AKT" to refer more specifically to AKT1.  Luke writes to Jann, informing Jann of Luke's discovery, and suggests that Jann's URI declaration for http://jann.example#akt be modified to add additional assertions that would specifically identify AKT1.

Q: Should Jann delete his URI declaration for http://jann.example#akt, or change it to more specifically refer to AKT1?

A:  No.  Doing so may disrupt working applications, such as Katie's screening application.  However, it would be helpful if his URI declaration offered a pointer to information on AKT1, AKT2 and AKT3.

Q: How can Luke indicate the relationship between AKT1, AKT2 and AKT3 and Jann's AKT?

A:  He can write something like:
"http://jann.example#akt"^^xsd:anyURI decl:broadens 
"http://luke.example#akt1"^^xsd:anyURI ,
"http://luke.example#akt2"^^xsd:anyURI ,
"http://luke.example#akt3"^^xsd:anyURI .


 @@@@

Broadening or narrowing a URI declaration

The decl:broadens relationship indicates that one URI's declaration is broader (less constraining) than another's.  The decl:narrows relationship is its inverse.  They are analogous to skos:narrower and skos:broader, respectively.  In particular, if '"uB"^^xsd:anyURI decl:broadens "uA"^^xsd:anyURI .' holds, then any real world interpretation of uA's referent that satisfies the core assertions of uA's URI declaration will also satisfy the core assertions of uB's URI declaration.  Note that this relationship holds between the URIs themselves rather than directly between the resources that those URIs denote.  This avoids implying acceptance of uB's core assertions.
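
The satisfaction condition behind decl:broadens can be illustrated with a toy model.  The following Python sketch is only an illustration under strong assumptions: core assertions are modeled as boolean predicates over candidate referents, a small hand-written list of candidates stands in for "any real world interpretation", and the attribute names (family, isoform) and the function names are hypothetical rather than taken from any actual URI declaration.

```python
from typing import Callable, Dict, Iterable, List

# A candidate referent is modeled as a dict of attributes (hypothetical names).
Candidate = Dict[str, object]
# A core assertion is modeled as a predicate that a candidate may satisfy.
CoreAssertion = Callable[[Candidate], bool]


def satisfies(candidate: Candidate, core: Iterable[CoreAssertion]) -> bool:
    """True if the candidate satisfies every core assertion."""
    return all(assertion(candidate) for assertion in core)


def broadens(core_b: List[CoreAssertion], core_a: List[CoreAssertion],
             candidates: Iterable[Candidate]) -> bool:
    """Check 'uB decl:broadens uA' over a finite set of candidates:
    every candidate satisfying uA's core assertions must satisfy uB's."""
    return all(satisfies(c, core_b) for c in candidates if satisfies(c, core_a))


# Hypothetical core assertions standing in for Jann's and Luke's declarations.
jann_akt = [lambda c: c["family"] == "AKT"]
luke_akt1 = [lambda c: c["family"] == "AKT", lambda c: c["isoform"] == 1]

# Hypothetical candidate referents.
candidates = [
    {"family": "AKT", "isoform": 1},
    {"family": "AKT", "isoform": 2},
    {"family": "AKT", "isoform": 3},
    {"family": "other", "isoform": 1},
]

jann_broadens_luke = broadens(jann_akt, luke_akt1, candidates)  # True
luke_broadens_jann = broadens(luke_akt1, jann_akt, candidates)  # False
```

In this toy model Jann's declaration broadens Luke's because everything that counts as AKT1 also counts as AKT, but not the reverse.  A finite candidate list can only approximate the intended quantification over all interpretations.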

decl:broadens a rdf:Property ;
    rdfs:label "is broader than" ;
    rdfs:comment "@@" ;
    rdfs:domain xsd:anyURI ;
    rdfs:range xsd:anyURI .
Example:
"http://jann.example#akt"^^xsd:anyURI decl:broadens 
"http://luke.example#akt1"^^xsd:anyURI .

Another way to do this would be to use blank nodes:
_:akt  uri:hasURI "http://jann.example#akt"^^xsd:anyURI .

_:akt1 uri:hasURI "http://luke.example#akt1"^^xsd:anyURI .
_:akt2 uri:hasURI "http://luke.example#akt2"^^xsd:anyURI .
_:akt3 uri:hasURI "http://luke.example#akt3"^^xsd:anyURI .

_:akt1 decl:hasBroaderResource _:akt .
_:akt2 decl:hasBroaderResource _:akt .
_:akt3 decl:hasBroaderResource _:akt .

where the uri:hasURI relationship indicates that the subject resource is denoted by the object URI:
uri:hasURI a rdf:Property ;
rdfs:label "hasURI" ;
rdfs:comment """The subject resource is denoted by the object URI. It is
basically the same as log:uri, but has a range of xsd:anyURI,
so that a simple assertion like {r hasURI u} will cause u to be
recognized as type xsd:anyURI without having to assert it explicitly.""" ;
rdfs:subPropertyOf log:uri ;
# rdfs:domain rdfs:Resource ;
rdfs:range xsd:anyURI .
and decl:hasBroaderResource indicates that the real world interpretation of the object resource is broader than the real world interpretation of the subject resource.

These two approaches could also be combined in a hybrid relationship decl:hasBroaderDenotedBy (or inversely decl:hasNarrowerDenotedBy):
decl:hasBroaderDenotedBy a rdf:Property ;
rdfs:label "hasBroaderDenotedBy" ;
rdfs:comment "@@" ;
# rdfs:domain rdfs:Resource ;
rdfs:range xsd:anyURI .
This would allow Luke to express the relationship more concisely:
<http://luke.example#akt1> 
decl:hasBroaderDenotedBy
"http://jann.example#akt"^^xsd:anyURI .
This would indicate that the real world interpretation of the resource denoted by the object URI (http://jann.example#akt) is broader than the real world interpretation of the subject resource (which is denoted by http://luke.example#akt1).

@@


URI substitution by reference

Consider this scenario:

SCENARIO 6: Helen is lazy and publishes her color assertions at http://helen.example/colors anyway, using http://gary.example/lumber#concrete to denote concrete.  Ian finds Helen's assertions and wishes to use them, but he notices that Gary's ontology at http://gary.example/lumber erroneously asserts that "30.49 inches = 1 foot", and Ian's application cannot withstand that erroneous assertion.  Ian is aware that Pat has published an ontology at http://pat.example/lumber that is equivalent to Gary's ontology except that it does not contain this erroneous assertion. 

Question: What should Ian do?

My answer: Ian should effectively rewrite Helen's assertions to use http://pat.example/lumber#concrete instead of http://gary.example/lumber#concrete throughout.  He can do this either by modifying a copy of Helen's assertions or by reference, using special expressions that indicate the proper URI substitution in Helen's graph.

If Ian does not have permission to copy and modify Helen's assertions, how can he effectively modify them by reference?  One way would be to define a property that, given an information resource for an RDF graph and a list of <oldUri, newUri> pairs, yields the graph that results from changing each oldUri to newUri throughout.  Ian could then write something like the following:
@prefix dbooth: <http://dbooth.example/splitting#> .
# . . .
# Include Helen's assertions, but replace
# <http://gary.example/lumber#concrete>
# with <http://pat.example/lumber#concrete> .
<http://helen.example/colors> dbooth:includedReplacing
( "http://gary.example/lumber#concrete"^^xsd:anyURI
        "http://pat.example/lumber#concrete"^^xsd:anyURI ) .

# . . .
[Note: There may be other existing/better ways to do this.  Is there a SPARQL way to do it?  Please let me know if you know of any.  Thanks! -- DBooth]
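
For the copy-and-modify alternative, the substitution itself is straightforward.  The following Python sketch is a minimal illustration, not an implementation of dbooth:includedReplacing: it assumes that the graph has already been parsed into (subject, predicate, object) tuples of plain strings, and the function name include_replacing, the predicate http://helen.example/colors#hasColor and the value "gray" are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# A triple is modeled as (subject, predicate, object), each a plain string.
Triple = Tuple[str, str, str]


def include_replacing(triples: Iterable[Triple],
                      replacements: Dict[str, str]) -> List[Triple]:
    """Return a copy of the graph in which each old URI that appears as a
    whole term is replaced by its new URI; the input is left unmodified."""
    result: List[Triple] = []
    for s, p, o in triples:
        result.append((replacements.get(s, s),
                       replacements.get(p, p),
                       replacements.get(o, o)))
    return result


# Helen's assertions (the predicate and value are hypothetical).
helen_graph = [
    ("http://gary.example/lumber#concrete",
     "http://helen.example/colors#hasColor",
     "gray"),
]

# Ian's view of Helen's graph, using Pat's URI instead of Gary's.
ian_graph = include_replacing(helen_graph, {
    "http://gary.example/lumber#concrete": "http://pat.example/lumber#concrete",
})
```

Because terms are plain strings here, the sketch cannot distinguish a URI from a literal that happens to have the same text; a real implementation would presumably operate on typed RDF terms.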

httpRange-14 implications

A key argument in the httpRange-14@@ref@@ debate was that a person is not an information resource@@ref@@, and that this distinction between information resources and non-information resources is important to Web architecture.  But does this mean that Mark Baker's use of his URI http://markbaker.ca/ to directly denote himself is a violation of Web architecture, given that an HTTP GET on the URI yields a 200 OK response?  (See Dan Connolly's very nice analysis of this example in @@ref@@.)  No.  The notion of decl:broadens suggests that there is no architectural need to view this "ambiguous" use of http://markbaker.ca/ as a violation of Semantic Web architecture, since it is conceptually no different from the AKT scenario, in which the "ambiguous" URI is good enough for some applications but not for others.

Therefore, the use of a URI to directly denote both an information resource and a non-information resource should be viewed as a violation of good practice, but not a violation of Web architecture.

Thus, in my view the TAG's httpRange-14 decision was correct -- an HTTP 200 response implies an information resource -- and the AWWW's view that a person is not an information resource is probably correct, but the AWWW notion that a URI denotes one resource may need more explanation in the context of the Semantic Web, given the two-step way@@ref@@ in which the referent of a URI is determined in the Semantic Web.

However, the httpRange-14 decision@@ref@@ can be interpreted as saying that if an HTTP GET on a URI u yields a 200 response, then the resource denoted by u is an information resource.  Thus, a 200 response can be treated as an implicit URI declaration for u, and presumably that declaration would include a core assertion to the effect:
<u> a awww:InformationResource .
Therefore, if the class awww:InformationResource were declared to be disjoint with the class sumo:Human, then a statement such as:
<http://markbaker.ca/> a sumo:Human .
would immediately cause a contradiction when http://markbaker.ca/ yields a 200 response and its implicit URI declaration is asserted.  Thus, if this interpretation of the httpRange-14 decision is maintained, and a 200 response is viewed as providing implicit core assertions (rather than ancillary assertions), and an awww:InformationResource is not a human, then Mark Baker's URI would effectively be unusable in denoting himself as a person, even if such use would not violate the Web architecture.
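
The contradiction described above can be made concrete with a small check.  The following Python sketch is a toy illustration under stated assumptions, not a reasoner: type assertions are modeled as (resource, class) pairs of strings, disjointness is supplied as explicit class pairs, and the function name find_contradictions is hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

TypeAssertion = Tuple[str, str]  # (resource URI, class name)


def find_contradictions(type_assertions: Iterable[TypeAssertion],
                        disjoint_pairs: Iterable[Tuple[str, str]]
                        ) -> List[Tuple[str, str, str]]:
    """Return (resource, class1, class2) for each resource asserted to be a
    member of two classes that are declared disjoint."""
    pairs = list(disjoint_pairs)

    # Collect the set of asserted classes for each resource.
    classes_of: Dict[str, Set[str]] = {}
    for resource, cls in type_assertions:
        classes_of.setdefault(resource, set()).add(cls)

    contradictions: List[Tuple[str, str, str]] = []
    for resource, classes in sorted(classes_of.items()):
        for c1, c2 in pairs:
            if c1 in classes and c2 in classes:
                contradictions.append((resource, c1, c2))
    return contradictions


assertions = [
    # Implicit core assertion from the 200 response (per the interpretation above).
    ("http://markbaker.ca/", "awww:InformationResource"),
    # The assertion that the same URI denotes a person.
    ("http://markbaker.ca/", "sumo:Human"),
]
disjoint = [("awww:InformationResource", "sumo:Human")]

conflicts = find_contradictions(assertions, disjoint)
```

If the implicit assertion were treated as ancillary rather than core, an application could simply omit it from the assertions it passes in, and no conflict would be reported.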



4-Apr-2008: Added explanation of how blank nodes could be used instead of decl:broadens, and added more explanation of httpRange-14 implications.
27-Mar-2008: Added intro scenarios back in.  Added httpRange-14 implications section.
1-Mar-2008: Replaced draft with placeholder text on graph editing by reference.
30-Jun-2007: Initial draft (unpublished).