Rojeh Hayek, M.Sc. Thesis Seminar
Sunday, 17.11.2019, 15:30
Advisor: Prof. Oded Shmueli
Traditional query optimizer is crucially dependent on cardinality estimation, which enables choosing among different plan alternatives by using the cardinality estimation of intermediate results within query execution plans. Therefore, the query optimizer must use reasonably good estimates. However, estimates produced by all widely-used database cardinality estimation models are routinely significantly wrong, resulting in not choosing the best plans, leading to slow executions.
In this work we define a new problem, that of estimating the containment rate between pairs of SQL queries, and we introduce a novel technique for estimating queries’ result-cardinalities using estimated queries’ containment rates. For estimating containment rates between SQL queries, we propose a specialized deep learning model, CRN, which enables us to express query features using sets and vectors.
We show that using the proposed technique for estimating cardinalities with the CRN model, we improve current cardinality estimation techniques significantly. This is especially the case when there are multiple joins, where the known cardinality estimation techniques suffer from under-estimated results and errors that grow exponentially as the number of joins increases. Our technique estimates the cardinalities more robustly x150/x175 with 4 joins queries, and x1650/x120 with 5 joins queries, compared with the PostgreSQL cardinality estimation component, and MSCN, a state-of-the-art cardinality estimation model, respectively.
We also show that by employing any existing cardinality estimation method such as PostgreSQL, for containment estimation, and then embedding it in our cardinality estimation technique, we improve on its cardinality estimates as well, without changing the method itself. Additionally, we propose two techniques for extending any “limited” cardinality estimation model that only estimates cardinalities with duplicates of conjunctive queries, to support general queries that use the AND, OR, and NOT operators, and to be able to estimate the set-theoretic cardinalities (i.e., cardinalities without duplicates).
-The talk will be given in Hebrew.