Yair Feldman, M.Sc. Thesis Seminar
Question Answering (QA) is one of the core tasks in natural language understanding. This task requires the ability to process and understand natural language questions and documents, as well as to extract the answers to these questions.
There are several settings for the QA task, which provide varying levels of difficulty. In the simplest setting, a question is paired with a context that is guaranteed to contain sufficient evidence to answer it. A more difficult setting is multi-hop QA, in which multiple reasoning hops are required to derive the correct answer from the given context. Another interesting setting is open-domain QA, in which a question is given without an accompanying context; instead, the relevant context must be retrieved from a large knowledge source, e.g., Wikipedia.
In this work, we are concerned with the task of multi-hop open-domain QA. This task is particularly challenging, since it requires performing textual reasoning and efficient search simultaneously. We present a method for retrieving multiple supporting paragraphs, scattered across a large knowledge source, that together contain the evidence necessary to answer a given question. Our method retrieves supporting paragraphs iteratively, forming a joint vector representation of the question and a previously retrieved paragraph at each step. Retrieval is performed by considering contextualized sentence-level representations of the paragraphs in the knowledge source.
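The iterative scheme can be illustrated with a toy sketch. The code below is not the thesis implementation: the vectors stand in for learned contextualized sentence representations, paragraphs are scored by their best-matching sentence against the current query vector, and the joint question-paragraph representation is approximated here by a simple average, purely for illustration.

```python
def iterative_retrieve(question_vec, paragraphs, n_hops=2):
    """Toy sketch of iterative multi-hop paragraph retrieval.

    `paragraphs` maps a paragraph id to a list of sentence vectors
    (stand-ins for contextualized sentence-level representations).
    At each hop, a paragraph is scored by the maximum dot product
    between the current query vector and any of its sentences; the
    query is then combined with the retrieved paragraph's vector
    (here, a simple average) to condition the next hop.
    """
    query = list(question_vec)
    retrieved = []
    for _ in range(n_hops):
        best_id, best_score = None, float("-inf")
        for pid, sents in paragraphs.items():
            if pid in retrieved:
                continue  # do not retrieve the same paragraph twice
            score = max(sum(q * s for q, s in zip(query, sent))
                        for sent in sents)
            if score > best_score:
                best_id, best_score = pid, score
        retrieved.append(best_id)
        # Joint question-paragraph representation for the next hop:
        # average the query with the paragraph's mean sentence vector.
        para_vec = [sum(col) / len(sents) for col in
                    zip(*paragraphs[best_id])]
        query = [(q + p) / 2.0 for q, p in zip(query, para_vec)]
    return retrieved

paragraphs = {
    "p1": [[1.0, 0.0], [0.9, 0.1]],
    "p2": [[0.0, 1.0]],
    "p3": [[-1.0, 0.0]],
}
print(iterative_retrieve([1.0, 0.0], paragraphs))  # → ['p1', 'p2']
```

The point of the sketch is the second hop: after "p1" is retrieved, the updated joint query vector, not the original question alone, determines which paragraph is retrieved next.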
Our method achieves state-of-the-art performance on two well-known datasets, SQuAD-Open and HotpotQA, which serve as our single- and multi-hop open-domain QA benchmarks, respectively.