Declarative Data Cleaning via Preferred Repairs

אסתר ליבשיץ, הרצאה סמינריונית למגיסטר
יום שני, 31.10.2016, 12:30
טאוב 301
Prof. Benny Kimelfeld

An inconsistent database is a database that violates some of the integrity constraints assumed to hold. Managing data inconsistency has been one of the major challenges in the research and practice of database management. In the Big Data era, applications often collect and integrate information from a large collection of information sources, containing imprecise measures, natural-language text, and so on. Sometimes these data sources contain inconsistent and imprecise information or they disagree with one another, which may result in inconsistency. During the past two decades, researchers have established, developed and investigated a principled approach to managing database inconsistency via the notion of database repairs. In its traditional definition, a repair of an inconsistent database is a consistent database that differs from the inconsistent one in a "minimal way." In this approach, all of the repairs are equally treated. Often, repairs are not equally legitimate, as it is desired to prefer one over another; for example, one fact is regarded more reliable than another, or a more recent fact should be preferred to an earlier one. Motivated by these considerations, researchers have introduced and investigated the framework of preferred repairs. There, a priority relation between facts is lifted towards a priority relation between consistent databases, and repairs are restricted to the ones that are optimal in the lifted sense. In our work we investigate the complexity of three problems related to this framework. The first is that of deciding whether the priority relation suffices to clean the database unambiguously (that is, whether there is exactly one optimal repair). The other two problems, those of counting and enumerating database repairs, are a generalization of this problem.

בחזרה לאינדקס האירועים