The Complexity of Database Inconsistency Measures

Speaker:
Ester Livshits, Ph.D. Thesis Seminar
Date:
Wednesday, 23.10.2019, 13:30
Place:
Room 601 Taub Bld.
Advisor:
Prof. Benny Kimelfeld

Managing data inconsistency has been one of the major challenges in the research and practice of database management. Database inconsistency arises for different reasons and in different applications. Nowadays, many applications obtain information from imprecise sources (e.g., social encyclopedias, social networks, and sensors attached to appliances) via imprecise procedures (e.g., natural-language processing, signal processing, and image analysis). Inconsistency may also arise when integrating conflicting data from different (possibly consistent) sources. During the past two decades, researchers have established, developed and investigated a principled approach to managing database inconsistency via the notion of database repairs. A repair of an inconsistent database is traditionally defined as a consistent database that differs from the inconsistent one in a "minimal" way. We explore various problems arising in the challenge of measuring how inconsistent a database is. Inconsistency measures are important for estimating the extent to which a database is trustworthy, and the effort required to clean it. Such measures are useful for implementing progress bars in data cleaning systems, as well as for recommending next steps in interactive systems, and estimating the potential usefulness and cost of incorporating databases for downstream analytics. Specifically, we explore the computational complexity of two inconsistency measures. The first measure is based on the cost of a minimal repair (i.e., the minimal number of operations required to obtain consistency), and the second is based on the number of repairs. Then, we revisit the second measure in the presence of preferences among tuples in the database. Often, repairs are not equally legitimate, as it is desired to prefer one over another; for example, one tuple is regarded more reliable than another, or a more recent tuple should be preferred to an earlier one. Motivated by these considerations, researchers have introduced and investigated the framework of preferred repairs that incorporates preferences among database tuples. We show that the presence of preferences significantly affects the computational complexity.

Back to the index of events