Resource Identity and owl:sameAs in Semantic Web Architecture

David Booth, Cleveland Clinic (contractor)
david@dbooth.org

Latest version of this document: http://dbooth.org/2009/sameas/

Views expressed herein are those of the author and do not necessarily reflect those of Cleveland Clinic.

Abstract. Although in web architecture a URI is intended to denote one resource, in semantic web architecture it is useful to think of resource identity in terms of sets of possible interpretations for that URI. A URI declaration establishes the identity of a resource by constraining the set of possible interpretations for the URI. However, additional assertions chosen by an application may further constrain the set of possible interpretations. The owl:sameAs predicate provides a useful example of how resource identity is constrained. When owl:sameAs is asserted between two URIs, the set of possible interpretations is limited to the intersection of the sets of interpretations prescribed by their URI declarations.

Key words: Semantic Web, RDF, identity, URI declaration, URI definition, owl:sameAs

1 Introduction: Ambiguity and interpretations

Semantic web architecture layers the use of RDF[1] and related technologies on top of existing web architecture. In web architecture, a URI denotes one resource.[2] But in RDF semantics[3] an interpretation determines the mapping from a URI to the resource denoted by that URI, and usually many interpretations are consistent with a given RDF graph. Although RDF semantics does specify how the range of possible interpretations is constrained for a given graph (through its semantic conditions), it is intentionally silent about how an appropriate interpretation should be chosen.

The net effect is that for semantic web applications, although a URI is only supposed to denote one resource, RDF semantics only partially tells us which resource that must be: for a given URI in an RDF graph, there will generally be a set of resources that could be denoted by that URI, corresponding to the set of possible interpretations for that graph. For brevity, we will sometimes refer to this as the set of interpretations for that URI.

2 URI declarations and constrained ambiguity

How can a URI become associated with a particular resource, such that others can determine which resource that URI is intended to denote? To side-step a long-standing philosophical debate, we will simply assume that a description is used to constrain the permissible set of interpretations for that URI. But how should such a description be associated with the URI, such that it can be easily found by others? Connolly[4] has proposed that the URI be (indirectly) dereferenceable[2] to its description, and Cool URIs for the Semantic Web[5] describes best practices for minting suitable URIs and configuring one's web server accordingly. Booth[6] terms such descriptions URI declarations, and the algorithm for directly or indirectly dereferencing the URI to find its URI declaration is known colloquially as follow-your-nose (f-y-n)[6]. Regardless of whether f-y-n or another approach is used, we will simply assume that some means is used to associate a URI declaration with a URI, such that RDF authors and consumers can find it when needed.
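The first step of follow-your-nose for a hash URI can be sketched as follows. This is only a minimal sketch under stated assumptions: it assumes the common convention, described in Cool URIs for the Semantic Web[5], that a URI containing a fragment is dereferenced by first stripping the fragment. The function name declaration_location is hypothetical, and URIs without a fragment, which would require an actual HTTP request (for example, following a redirect), are deliberately not handled.

```python
from urllib.parse import urldefrag


def declaration_location(uri):
    """Return the URL to dereference for a hash URI, or None otherwise.

    Hypothetical helper: it only covers the hash-URI convention and
    performs no network access.
    """
    url, fragment = urldefrag(uri)
    # Without a fragment we cannot tell locally where the declaration
    # lives; a real client would have to issue an HTTP request.
    return url if fragment else None


print(declaration_location("http://example#apple1"))
```

Retrieving and parsing the document found at that location is outside the scope of this sketch.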

Identity as a cloud of resource possibilities

Hayes and Halpin[7] have written about the near impossibility of describing a resource completely enough to remove all ambiguity about its identity, and advocated that this ambiguity be accepted and embraced. Indeed, there is a tension between reusability and precision: an ontology that makes fine distinctions with great precision may be essential to one application but unusable by many others. In the SKOS[8] ontology there was a conscious design decision to avoid overly constraining the definition of terms like skos:broader, in order to accommodate a wide variety of interpretations and uses. Thus, not only is ambiguity normally inescapable, in some sense it can also be valuable. But are there limits? How much ambiguity is enough, and how should those limits be indicated for a given URI?

The main idea behind a URI declaration is that it provides a simple, straightforward mechanism for prescribing the precise range of interpretations that are permissible for a given URI: the RDF semantics of the URI declaration's core assertions[6] constrain the set of possible interpretations for that URI. Any interpretation from within this set should be viewed as a valid interpretation in conjunction with the use of the URI. All applications that use the URI must choose interpretations from within this set -- and thus the resource identity is clearly constrained -- but different applications may make different choices from within this set.

Therefore, although web architecture says that a URI denotes one resource, in semantic web architecture it is more useful to think of a URI as identifying a cloud of resource possibilities, as prescribed by its URI declaration.

For example, we might have three URIs, each with a URI declaration that constrains its interpretations as follows:

http://example#apple1 has a URI declaration that constrains its interpretations to those in which it denotes an individual red, green or yellow apple -- set s1;
http://example#apple2 has a URI declaration that constrains its interpretations to those in which it denotes an individual red, pink or yellow apple -- set s2; and
http://example#apple3 has a URI declaration that constrains its interpretations to those in which it denotes an individual green or blue apple -- set s3.

This is illustrated in Figure 1.

Figure 1: owl:sameAs and resource sets for possible interpretations
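The three declarations above can be modelled with ordinary sets. The following is a minimal sketch, assuming (purely for illustration) that each candidate resource is summarised by its colour; the names DECLARATIONS and is_permitted are hypothetical and not part of any RDF library.

```python
# Hypothetical model: each candidate resource is summarised by its colour.
DECLARATIONS = {
    "http://example#apple1": frozenset({"red", "green", "yellow"}),  # s1
    "http://example#apple2": frozenset({"red", "pink", "yellow"}),   # s2
    "http://example#apple3": frozenset({"green", "blue"}),           # s3
}


def is_permitted(uri, resource):
    """True if the URI declaration allows the URI to denote this resource."""
    return resource in DECLARATIONS[uri]


# Two applications may choose different resources for the same URI ...
print(is_permitted("http://example#apple1", "red"))
print(is_permitted("http://example#apple1", "green"))
# ... but neither may choose one outside the declared set.
print(is_permitted("http://example#apple1", "blue"))
```

In this model the "cloud of resource possibilities" for a URI is simply the set stored under that URI.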

Using ancillary assertions to further constrain interpretations

Of course, when a URI is used in conjunction with additional RDF assertions that have been selected for use by a particular application -- so-called ancillary assertions[6] -- the set of possible interpretations will be further constrained. Furthermore, different applications may choose different sets of ancillary assertions, thus constraining the interpretations differently.

This means that although different applications may ultimately use the same URI to denote different resources (because they have chosen different interpretations or different ancillary assertions that have resulted in different possible interpretations), those resources are all from within the set of interpretations permitted by the URI declaration.
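The effect of ancillary assertions can be sketched in the same set-based style. This is a simplified illustration, not a description of any RDF reasoner: candidate resources are again summarised by colour, ancillary assertions are modelled as hypothetical Python predicates, and the names constrain, app1_assertions and app2_assertions are assumptions introduced here.

```python
S1 = frozenset({"red", "green", "yellow"})  # set s1 for http://example#apple1


def constrain(candidates, ancillary_assertions):
    """Keep only the candidates that satisfy every ancillary assertion."""
    return frozenset(
        c for c in candidates
        if all(holds(c) for holds in ancillary_assertions)
    )


# Hypothetical ancillary assertions chosen by two different applications.
app1_assertions = [lambda colour: colour != "green"]
app2_assertions = [lambda colour: colour in {"green", "blue"}]

app1_view = constrain(S1, app1_assertions)
app2_view = constrain(S1, app2_assertions)

print(sorted(app1_view), sorted(app2_view))
# Whatever the applications choose, their views stay inside s1.
print(app1_view <= S1 and app2_view <= S1)
```

Because constrain only ever removes candidates, each application's view is necessarily a subset of the set permitted by the URI declaration.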

Possible versus plausible interpretations

Thus far we have intentionally glossed over the fact that in general, a URI declaration may contain both formal assertions (expressed in RDF for example) and informal assertions, such as prose descriptions contained in the text of rdfs:comment values. These informal assertions, along with other background knowledge, may allow the set of possible interpretations for a given URI to be further restricted to a subset that we might call plausible interpretations. In practice, applications will choose interpretations from this subset rather than from the full set of possible interpretations. However, because the notion of plausible interpretation is inherently vague, and because informal assertions might eventually be expressed formally, for simplicity this paper focuses only on the set of possible interpretations -- those that are logically consistent with a formally expressed RDF graph. Similar reasoning could nonetheless be applied to the set of plausible interpretations.

3 Ambiguous identity and owl:sameAs

It is interesting to consider the meaning of owl:sameAs[9] in terms of the constrained ambiguity prescribed by URI declarations.

The owl:sameAs predicate is used to indicate that two URIs denote the same resource. However, this does not mean that the two URIs have the same URI declaration. Indeed, two URIs such as http://example#apple1 and http://example#apple2 may have significantly different URI declarations, individually permitting different sets of possible interpretations. But when owl:sameAs is asserted between them, the effect is to limit the range of possible interpretations for both URIs to the intersection of their individual sets of interpretations, as illustrated by set s12 in Figure 1. (To be clear, this is the intersection of the sets of resources corresponding to these URIs in the sets of possible interpretations -- not the sets of interpretations themselves. Furthermore, this intersection is an upper bound: ancillary assertions may narrow the set further.) In short, owl:sameAs gives the intersection of the clouds of resource possibilities.
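In set terms, the effect of owl:sameAs on the apple example amounts to a single intersection. The sketch below assumes the colour-based simplification of sets s1 and s2 from the example; the function name assert_same_as is hypothetical.

```python
s1 = frozenset({"red", "green", "yellow"})  # http://example#apple1
s2 = frozenset({"red", "pink", "yellow"})   # http://example#apple2


def assert_same_as(candidates_a, candidates_b):
    """Resources that both URIs may denote once owl:sameAs is asserted.

    The result is an upper bound: other assertions may shrink it further.
    """
    return candidates_a & candidates_b


s12 = assert_same_as(s1, s2)
print(sorted(s12))
```

After the assertion, both URIs share the narrowed set s12, even though their URI declarations remain different.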

Of course, different applications may choose different ancillary assertions to use in conjunction with the URI declarations. For example, appA may use docA containing the owl:sameAs assertion previously mentioned, which could be written in N3[10] as:

docA:

		@prefix owl: <http://www.w3.org/2002/07/owl#> .
		@prefix :  <http://example#> .
		:apple1 owl:sameAs :apple2 .

while appB may use docB, which equates :apple1 to :apple3 as follows:

docB:

		@prefix owl: <http://www.w3.org/2002/07/owl#> .
		@prefix :  <http://example#> .
		:apple1 owl:sameAs :apple3 .

docB limits the set of possible interpretations for both http://example#apple1 and http://example#apple3 to the intersection of those prescribed by their respective URI declarations, as illustrated by set s13 in Figure 1. In other words, appA and appB are using different (and mutually exclusive) interpretations for http://example#apple1.

These mutually exclusive interpretations cause no harm as long as docA and docB are not used together. But what if appC wishes to use both docA and docB? In this case, the set of possible interpretations for :apple1 would be the intersection of s1, s2 and s3, and this is an empty set, because s2 and s3 do not overlap. In other words, the assertions contained in docA, docB and the URI declarations for :apple1, :apple2 and :apple3 taken together are inconsistent. This does not mean that either docA or docB is wrong. It just means that they have made incompatible assumptions about the ambiguous identities of :apple1, :apple2 and :apple3.
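The three situations (appA, appB and appC) can be checked mechanically. The following is a minimal sketch rather than an OWL reasoner: it assumes the colour-based model of s1, s2 and s3, groups URIs linked by owl:sameAs into equivalence classes with a small union-find, and intersects the declared sets within each class. All names (DECLARATIONS, possible_resources, doc_a, doc_b) are hypothetical.

```python
DECLARATIONS = {
    ":apple1": frozenset({"red", "green", "yellow"}),  # s1
    ":apple2": frozenset({"red", "pink", "yellow"}),   # s2
    ":apple3": frozenset({"green", "blue"}),           # s3
}

doc_a = [(":apple1", ":apple2")]  # :apple1 owl:sameAs :apple2 .
doc_b = [(":apple1", ":apple3")]  # :apple1 owl:sameAs :apple3 .


def possible_resources(declarations, same_as_pairs):
    """Map each URI to the resources it may denote given the sameAs pairs."""
    parent = {uri: uri for uri in declarations}

    def find(uri):
        # Follow parent links to the representative of the class.
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for a, b in same_as_pairs:
        parent[find(a)] = find(b)

    # Intersect the declared sets of all URIs in the same class.
    merged = {}
    for uri, candidates in declarations.items():
        root = find(uri)
        merged[root] = merged.get(root, candidates) & candidates
    return {uri: merged[find(uri)] for uri in declarations}


for name, pairs in [("appA", doc_a), ("appB", doc_b), ("appC", doc_a + doc_b)]:
    result = possible_resources(DECLARATIONS, pairs)
    status = "consistent" if all(result.values()) else "inconsistent"
    print(name, sorted(result[":apple1"]), status)
```

An empty set for any URI signals that no interpretation satisfies all of the assertions together, which is exactly the situation appC encounters.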

This illustrates a classic complaint about owl:sameAs: that the assertion is too strong because it can easily lead to contradictions when data is merged. However, this problem is not limited to owl:sameAs. The problem is rooted in the fact that identity is ambiguous, and different data sets may make different and contradictory sets of assumptions about the nature of the denoted resource, even though each of those sets of assumptions individually falls within the bounds of that resource's ambiguity.

Avoiding or resolving identity conflicts

One way to avoid this problem would be to define a scopedSameAs predicate[11], which would be similar to owl:sameAs except that its effect would be limited to a particular named graph.[12]
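One possible reading of such a scoped predicate can be sketched as follows. This is an assumption-laden illustration, not a definition of scopedSameAs: statements are modelled as hypothetical (graph, subject, predicate, object) tuples, candidate resources are summarised by colour, and an equivalence only narrows interpretations when evaluated within its own named graph.

```python
DECLARATIONS = {
    ":apple1": frozenset({"red", "green", "yellow"}),  # s1
    ":apple2": frozenset({"red", "pink", "yellow"}),   # s2
    ":apple3": frozenset({"green", "blue"}),           # s3
}

# Hypothetical quads: (named graph, subject, predicate, object).
QUADS = [
    ("docA", ":apple1", ":scopedSameAs", ":apple2"),
    ("docB", ":apple1", ":scopedSameAs", ":apple3"),
]


def candidates_in_graph(graph_name, quads, declarations):
    """Narrow the declared sets using only the statements of one graph."""
    candidates = dict(declarations)
    changed = True
    while changed:  # repeat until no set shrinks any further
        changed = False
        for graph, subj, pred, obj in quads:
            if graph != graph_name or pred != ":scopedSameAs":
                continue
            common = candidates[subj] & candidates[obj]
            if common != candidates[subj] or common != candidates[obj]:
                candidates[subj] = candidates[obj] = common
                changed = True
    return candidates


print(sorted(candidates_in_graph("docA", QUADS, DECLARATIONS)[":apple1"]))
print(sorted(candidates_in_graph("docB", QUADS, DECLARATIONS)[":apple1"]))
```

Under this reading, an application can load both graphs without ever intersecting s2 with s3, because each equivalence is confined to its own graph.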

Finally, even if owl:sameAs is used and appC really wants to use both docA and docB together, it could still do so by splitting[13] the identity of http://example#apple1 to avoid this logical inconsistency, such that docA and docB effectively refer to different resources, as sketched in slides 15-18 of Why URI Declarations? A comparison of architectural approaches[14].
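The general idea of splitting can be illustrated by renaming one document's use of the URI before merging. This is a rough sketch and not the procedure described in [13] or [14]: the new URI :apple1_docB, the assumption that it inherits the declaration of :apple1, and the helper split_identity are all hypothetical.

```python
DECLARATIONS = {
    ":apple1": frozenset({"red", "green", "yellow"}),  # s1
    ":apple2": frozenset({"red", "pink", "yellow"}),   # s2
    ":apple3": frozenset({"green", "blue"}),           # s3
}

doc_a = [(":apple1", "owl:sameAs", ":apple2")]
doc_b = [(":apple1", "owl:sameAs", ":apple3")]


def split_identity(triples, old_uri, new_uri):
    """Return a copy of the triples with old_uri renamed to new_uri."""
    def rename(term):
        return new_uri if term == old_uri else term
    return [(rename(s), p, rename(o)) for s, p, o in triples]


# Assumption: the new URI is given the same declaration as :apple1.
declarations = dict(DECLARATIONS)
declarations[":apple1_docB"] = DECLARATIONS[":apple1"]

merged = doc_a + split_identity(doc_b, ":apple1", ":apple1_docB")

# No URI appears in more than one sameAs statement here, so a pairwise
# intersection is enough to check each statement.
for subj, _, obj in merged:
    print(subj, obj, sorted(declarations[subj] & declarations[obj]))
```

After the renaming, each owl:sameAs statement has a non-empty intersection, so the merged data no longer forces :apple2 and :apple3 to denote the same resource.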

4 Conclusions

Since identity is almost always necessarily ambiguous, it is helpful to think of it as a cloud of possible resources. With this view, asserting owl:sameAs amounts to taking the intersection of two clouds of possible resources. When two applications make different sets of assumptions about the identity of an ambiguous resource, those sets of assumptions may be mutually inconsistent even though they are individually consistent with the ambiguous resource definition.

5 Acknowledgements

Thanks to Jonathan Rees for suggesting the term plausible interpretations and for his always valuable discussion.

6 References

1. Klyne, G and Carroll, J: Resource Description Framework (RDF): Concepts and Abstract Syntax, W3C Recommendation 10-Feb-2004, http://www.w3.org/TR/rdf-concepts/

2. Jacobs, I and Walsh, N: Architecture of the World Wide Web, Volume One, W3C Recommendation 15-Dec-2004, http://www.w3.org/TR/webarch/

3. Hayes, P: RDF Semantics, W3C Recommendation 10-Feb-2004, http://www.w3.org/TR/rdf-mt/

4. Connolly, D: A Pragmatic Theory of Reference for the Web, IRW 2006, 23-May-2006, http://www.w3.org/2006/04/irw65/urisym.html

5. Sauermann, L and Cyganiak, R: Cool URIs for the Semantic Web, W3C Working Draft 21-Mar-2008, http://www.w3.org/TR/cooluris/

6. Booth, D: URI Declaration in Semantic Web Architecture, 26-Nov-2008, http://dbooth.org/2007/uri-decl/

7. Hayes, P and Halpin, H: In Defense of Ambiguity, IRW 2006, 23-May-2006, http://www.ibiblio.org/hhalpin/homepage/publications/indefenseofambiguity.html

8. Miles, A and Bechhofer, S: SKOS Simple Knowledge Organization System Reference, W3C Candidate Recommendation 17-Mar-2009, http://www.w3.org/TR/skos-reference/

9. Dean, M and Schreiber, G: OWL Web Ontology Language Reference, W3C Recommendation 10-Feb-2004, http://www.w3.org/TR/owl-ref/

10. Berners-Lee, T and Connolly, D: Notation3 (N3): A readable RDF syntax, W3C Team Submission 14-Jan-2008, http://www.w3.org/TeamSubmission/n3/

11. Booth, D: Re: blog: semantic dissonance in uniprot, email 26-Mar-2009, publicly archived at http://lists.w3.org/Archives/Public/public-semweb-lifesci/2009Mar/0179.html

12. Carroll, J: Named Graphs, page retrieved 23-Mar-2009, http://www.w3.org/2004/03/trix/

13. Booth, D: Splitting Identities in Semantic Web Architecture, 26-Feb-2009, http://dbooth.org/2007/splitting/

14. Booth, D: Why URI Declarations? A comparison of architectural approaches, ESWC-08, 16-Jan-2009, http://dbooth.org/2008/irsw/slides.ppt

23-Mar-2009: Initial version