Canonical forms for isomorphic and equivalent RDF graphs: algorithms for leaning and labelling blank nodes
Article
Publication date
2017
Author
Hogan, Aidan
Abstract
Existential blank nodes greatly complicate a number of fundamental operations on RDF graphs. In particular,
the problems of determining if two RDF graphs have the same structure modulo blank node labels (i.e., if they
are isomorphic), or determining if two RDF graphs have the same meaning under simple semantics (i.e., if
they are simple-equivalent), have no known polynomial-time algorithms. In this paper, we propose methods
that can produce two canonical forms of an RDF graph. The first canonical form preserves isomorphism
such that any two isomorphic RDF graphs will produce the same canonical form; this iso-canonical form is
produced by modifying the well-known canonical labelling algorithm Nauty for application to RDF graphs.
The second canonical form additionally preserves simple-equivalence such that any two simple-equivalent
RDF graphs will produce the same canonical form; this equi-canonical form is produced by, in a preliminary
step, leaning the RDF graph, and then computing the iso-canonical form. These algorithms have a number
of practical applications, such as for identifying isomorphic or equivalent RDF graphs in a large collection
without requiring pair-wise comparison, for computing checksums or signing RDF graphs, for applying
consistent Skolemisation schemes where blank nodes are mapped in a canonical manner to IRIs, and so forth.
Likewise, a variety of algorithms can be simplified by presupposing RDF graphs in one of these canonical
forms. Both algorithms require exponential steps in the worst case; in our evaluation we demonstrate that
there indeed exist difficult synthetic cases, but we also provide results over 9.9 million RDF graphs that suggest
such cases occur infrequently in the real world, and that both canonical forms can be efficiently computed in
all but a handful of such cases.
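To make the idea of an iso-canonical form concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a graph is a Python set of (subject, predicate, object) string triples, that blank nodes are the terms starting with `_:`, and it uses hypothetical helper names (`iso_canonical`, `graph_hash`). Instead of a Nauty-style search, it naively tries every relabelling of the blank nodes and keeps the smallest result, which takes factorial time and is only feasible for tiny graphs.

```python
import hashlib
from itertools import permutations


def is_blank(term):
    # Assumption for this sketch: blank nodes are written with a "_:" prefix.
    return term.startswith("_:")


def iso_canonical(graph):
    """Return a canonical, sorted tuple of triples for a small RDF graph.

    Brute force: try every bijection from the graph's blank nodes to the
    labels _:c0, _:c1, ... and keep the lexicographically smallest result.
    Isomorphic graphs yield the same set of candidates, hence the same minimum.
    """
    blanks = sorted({t for triple in graph for t in triple if is_blank(t)})
    labels = [f"_:c{i}" for i in range(len(blanks))]
    best = None
    for perm in permutations(labels):
        mapping = dict(zip(blanks, perm))
        candidate = tuple(sorted(
            tuple(mapping.get(t, t) for t in triple) for triple in graph
        ))
        if best is None or candidate < best:
            best = candidate
    return best


def graph_hash(graph):
    # A checksum over the canonical form: equal for isomorphic graphs.
    text = "\n".join(" ".join(triple) for triple in iso_canonical(graph))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# g1 and g2 differ only in blank node labels; g3 has a different structure.
g1 = {("_:a", "<p>", "_:b"), ("_:b", "<p>", "<x>")}
g2 = {("_:y", "<p>", "<x>"), ("_:z", "<p>", "_:y")}
g3 = {("_:a", "<p>", "_:b"), ("_:a", "<p>", "<x>")}

print(graph_hash(g1) == graph_hash(g2))  # isomorphic graphs share a hash
print(graph_hash(g1) == graph_hash(g3))  # non-isomorphic graphs do not
```

Because each graph is reduced to a single hash, isomorphic graphs in a large collection could be grouped by that value rather than compared pair-wise, which is the kind of application the abstract mentions. The sketch does not perform leaning, so it does not detect simple-equivalence.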
Sponsor
Millennium Nucleus Center for Semantic Web Research, NC120004
Fondecyt, 11140900
Indexation
ISI publication article
Quote Item
ACM Trans. Web (Sep 2017), Vol. 11, No. 4