Canonical forms for isomorphic and equivalent RDF graphs: algorithms for leaning and labelling blank nodes
Author
dc.contributor.author
Hogan, Aidan
Accession date
dc.date.accessioned
2018-05-09T17:02:49Z
Available date
dc.date.available
2018-05-09T17:02:49Z
Publication date
dc.date.issued
2017
Item citation
dc.identifier.citation
ACM Trans. Web (Sep 2017), Vol. 11, No. 4
Identifier
dc.identifier.other
10.1145/3068333
Identifier
dc.identifier.uri
https://repositorio.uchile.cl/handle/2250/147590
Abstract
dc.description.abstract
Existential blank nodes greatly complicate a number of fundamental operations on RDF graphs. In particular,
the problems of determining if two RDF graphs have the same structure modulo blank node labels (i.e., if they
are isomorphic), or determining if two RDF graphs have the same meaning under simple semantics (i.e., if
they are simple-equivalent), have no known polynomial-time algorithms. In this paper, we propose methods
that can produce two canonical forms of an RDF graph. The first canonical form preserves isomorphism
such that any two isomorphic RDF graphs will produce the same canonical form; this iso-canonical form is
produced by modifying the well-known canonical labelling algorithm Nauty for application to RDF graphs.
The second canonical form additionally preserves simple-equivalence such that any two simple-equivalent
RDF graphs will produce the same canonical form; this equi-canonical form is produced by, in a preliminary
step, leaning the RDF graph, and then computing the iso-canonical form. These algorithms have a number
of practical applications, such as for identifying isomorphic or equivalent RDF graphs in a large collection
without requiring pair-wise comparison, for computing checksums or signing RDF graphs, for applying
consistent Skolemisation schemes where blank nodes are mapped in a canonical manner to IRIs, and so forth.
Likewise, a variety of algorithms can be simplified by presupposing RDF graphs in one of these canonical
forms. Both algorithms require exponential steps in the worst case; in our evaluation we demonstrate that
there indeed exist difficult synthetic cases, but we also provide results over 9.9 million RDF graphs that suggest
such cases occur infrequently in the real world, and that both canonical forms can be efficiently computed in
all but a handful of such cases.
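
To illustrate what an iso-canonical form provides, the following is a minimal sketch and not the Nauty-based algorithm described in the paper. It assumes a graph is a list of `(subject, predicate, object)` string triples and that blank nodes are the terms starting with `_:`; the function names `iso_canonical` and `graph_checksum` are hypothetical. It tries every relabelling of the blank nodes and keeps the lexicographically smallest result, so it takes factorial time in the number of blank nodes and is only suitable for tiny graphs. It does not perform leaning, so it does not produce the equi-canonical form.

```python
from itertools import permutations
import hashlib


def is_blank(term):
    # Assumption: blank nodes are written with the "_:" prefix.
    return term.startswith("_:")


def iso_canonical(graph):
    """Return a sorted triple list that is identical for isomorphic graphs.

    Naive approach: enumerate every bijection from the graph's blank nodes
    to the labels _:c0, _:c1, ... and keep the smallest relabelled graph.
    """
    triples = set(graph)
    blanks = sorted({t for triple in triples for t in triple if is_blank(t)})
    labels = [f"_:c{i}" for i in range(len(blanks))]
    best = None
    for perm in permutations(labels):
        mapping = dict(zip(blanks, perm))
        candidate = sorted(
            tuple(mapping.get(t, t) for t in triple) for triple in triples
        )
        if best is None or candidate < best:
            best = candidate
    return best


def graph_checksum(graph):
    # Hash the canonical serialisation so isomorphic graphs share a checksum.
    canon = iso_canonical(graph)
    data = "\n".join(" ".join(t) for t in canon).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


g1 = [("_:a", "ex:p", "_:b"), ("_:b", "ex:q", "ex:o")]
g2 = [("_:y", "ex:q", "ex:o"), ("_:x", "ex:p", "_:y")]  # g1 with renamed blank nodes
g3 = [("_:a", "ex:p", "_:b"), ("_:a", "ex:q", "ex:o")]  # different structure

print(graph_checksum(g1) == graph_checksum(g2))
print(graph_checksum(g1) == graph_checksum(g3))
```

Because the checksum depends only on the canonical form, isomorphic graphs in a large collection could be grouped by hash rather than compared pair-wise, which is the kind of application the abstract mentions.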
Sponsor
dc.description.sponsorship
Millennium Nucleus Center for Semantic Web Research, NC120004
Fondecyt, 11140900