Technical Report MSC-2006-18

TR#:MSC-2006-18
Class:MSC
Title: On XML Schema Identity Constraints
Authors: Sharon Krisher
Supervisors: Oded Shmueli
PDFMSC-2006-18.pdf
Abstract: As XML becomes an increasingly popular format of representing information, it becomes more and more important to verify the integrity of this information. XML Schema is a W3C standard of defining constraints the XML documents must conform to. A schema can define the structure of documents that conform to it (i.e., which elements and attributes appear in the document, and the parent-child relationships between them), and the data types of attributes and of simple elements. One of the useful features of XML Schema is the ability to define identity constraints of three kinds: 'key', 'keyref', and 'unique'. In this work we focus on the 'key' and 'keyref' constraints. These constraints are similar to the key and foreign key constraints used in databases. However, the hierarchical nature of XML documents complicates the semantics of these constraints. These semantics are not apparent at first reading of the formal specification XMLSchema, and they are explained in detail in the introduction to this thesis.

It is very important to be able to correctly and efficiently validate a document with respect to key and keyref constraints. In our experience, few XML Schema validators validate these constraints in accordance with the formal specification. Those that do, provide only the ability to validate a whole document and not the ability to change a document and check efficiently whether the change violates identity constraints. We explain how to statically validate identity constraints for a given document, and review how it is done (correctly) in the open-source validator XSV. Then, we define several simple operations that change a document (change a value of a node or several nodes, add or delete a sub-tree), and present efficient algorithms for incrementally validating these operations, i.e., algorithms that check whether changing the document will cause identity constraints to be violated. We do this by defining data structures that capture the state of the document with respect to the identity constraints defined in an XML Schema. As we perform a change operation, we update the relevant data structures accordingly. If the operation is found to violate a constraint, all updates to the data structures are rolled back. Otherwise, the data structures are updated, so that they reflect the state of the changed document.

We present an implementation of these algorithms. This implementation is written is Python. It is based on the open-source validator XSV, and extends it with incremental validation capabilities. We perform experiments with our implementation, and show its superiority to validation of identity constraints from scratch using XSV.

Another part of this work involves XPath, a language for navigating in XML documents. XPath includes axes that enable navigation according to axes. These axes navigate from a node to its parent, children or siblings. We suggest adding foreign-key navigation F axes. Given a schema that defines a keyref KR, we suggest adding two axes, one that allows navigation from a node to a node it references (i.e., its 'parent' according to the keyref), and another that allows navigation from a node to nodes that reference it (i.e., its 'children' according to the keyref). We present a formal definition of such axes. Then, we show an algorithm for evaluating an XPath expression that uses such axes. We define a fragment of XPath that is enriched with such axes, and explore its expressive power. We also discuss the special case in which the keyref definition contains only one field.

CopyrightThe above paper is copyright by the Technion, Author(s), or others. Please contact the author(s) for more information

Remark: Any link to this technical report should be to this page (http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-info.cgi/2006/MSC/MSC-2006-18), rather than to the URL of the PDF files directly. The latter URLs may change without notice.

To the list of the MSC technical reports of 2006
To the main CS technical reports page

Computer science department, Technion
admin