Technical Report PHD-2020-03

TR#:PHD-2020-03
Class:PHD
Title: The Complexity of Database Inconsistency Measures
Authors: Ester Livshits
Supervisors: Benny Kimelfeld
PDFPHD-2020-03.pdf
Abstract: Managing data inconsistency has been one of the major challenges in the research and practice of database management. Database inconsistency arises for different reasons and in different applications. Nowadays, many applications obtain information from imprecise sources (e.g., social networks) via imprecise procedures (e.g., natural-language processing). Inconsistency may also arise when integrating conflicting data from different sources. During the past two decades, researchers have established, developed and investigated a principled approach to managing database inconsistency via the notion of database repairs. A repair of an inconsistent database is traditionally defined as a consistent database that differs from the inconsistent one in a "minimal" way.

We investigate various problems arising in the challenge of measuring how inconsistent a database is. The problem of measuring inconsistency has been studied extensively by the Knowledge Representation and Logic communities, and has been recently acknowledged by the database community. Inconsistency measures are important for estimating the extent to which a database is trustworthy, and the effort required to clean it. Specifically, we explore the computational complexity of two basic inconsistency measures. The first measure is based on the cost of a minimal repair (i.e., the minimal number of operations required to obtain consistency), and the second is based on the number of repairs. We focus on data complexity (where the schema is considered fixed and the input consists of a database instance) and establish dichotomies in (i.e., a full classification of) data complexity for the entire space of sets of functional dependencies.

Finally, repairs are often not equally legitimate, as it is desired to prefer one over another; for example, one tuple is regarded more reliable than another, or a more recent tuple should be preferred to an earlier one. Motivated by these considerations, researchers have introduced and investigated the framework of preferred repairs that incorporates preferences among database tuples. We revisit the second measure in the presence of preferences among tuples in the database. We show that the presence of preferences significantly affects the computational complexity.

CopyrightThe above paper is copyright by the Technion, Author(s), or others. Please contact the author(s) for more information
DisclaimerRecent theses may have not yet been approved by the Technion Senate, and are provided here for the purpose of fast dissemination of knowledge only. Final approval of the Technion Senate is needed for a thesis to be used for the partial fulfillment of the requirements for the degree of M.Sc. or Ph.D.

Remark: Any link to this technical report should be to this page (http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-info.cgi/2020/PHD/PHD-2020-03), rather than to the URL of the PDF files directly. The latter URLs may change without notice.

To the list of the PHD technical reports of 2020
To the main CS technical reports page

Computer science department, Technion
admin