Asaf Yeshurun, M.Sc. Thesis Seminar
For password to lecture, please contact: firstname.lastname@example.org
Advisor: Prof. B. Kimelfeld
The Hebrew Bible (Tanach) has been quoted extensively in religious texts and commentaries throughout history.
Nowadays, many of these texts are publicly available online. Yet the Bible quotations within them are often only partially identified, if at all.
Knowing the exact quotations may be highly beneficial to scholars interested in studying or investigating the Bible.
We have developed and empirically analyzed a machine-learning solution for identifying these quotations.
End-to-end, our model comprises three main stages: (a) rule-based candidate generation, (b) context extraction using available historical commentary, and (c) an artificial neural network for candidate scoring.
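The three stages can be pictured with a minimal sketch. Everything in it is an assumption made for illustration and not the method of the thesis: the rule in stage (a) is plain exact token matching, the context in stage (b) is a window of neighboring tokens, and the scorer in stage (c) is a toy function standing in for the neural network. All names (`Candidate`, `generate_candidates`, `attach_context`, `find_quotations`, `toy_scorer`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    start: int                 # index of the first matched token in the text
    end: int                   # index one past the last matched token
    verse_id: str              # hypothetical identifier of the matched verse
    context: List[str] = field(default_factory=list)
    score: float = 0.0


def generate_candidates(tokens, verses, min_len=2):
    """Stage (a), assumed rule: maximal exact token runs shared with a verse."""
    candidates = []
    for verse_id, verse in verses.items():
        for i in range(len(tokens)):
            for j in range(len(verse)):
                # Skip matches that merely continue a longer match to the left.
                if i > 0 and j > 0 and tokens[i - 1] == verse[j - 1]:
                    continue
                k = 0
                while (i + k < len(tokens) and j + k < len(verse)
                       and tokens[i + k] == verse[j + k]):
                    k += 1
                if k >= min_len:
                    candidates.append(Candidate(i, i + k, verse_id))
    return candidates


def attach_context(tokens, cand, window=2):
    """Stage (b), assumed form: tokens just before and after the span."""
    left = tokens[max(0, cand.start - window):cand.start]
    right = tokens[cand.end:cand.end + window]
    cand.context = left + right


def find_quotations(tokens, verses, scorer, threshold=0.5, window=2):
    """Run stages (a) to (c) and keep candidates scoring at least threshold."""
    kept = []
    for cand in generate_candidates(tokens, verses):
        attach_context(tokens, cand, window)
        cand.score = scorer(cand, verses[cand.verse_id])  # stage (c)
        if cand.score >= threshold:
            kept.append(cand)
    return kept


def toy_scorer(cand, verse):
    """Placeholder for the neural scorer: fraction of the verse covered."""
    return (cand.end - cand.start) / len(verse)


# Illustrative English tokens; the real data is Hebrew commentary text.
tokens = "and he said let there be light and it was".split()
verses = {"Gen 1:3": "let there be light".split()}
hits = find_quotations(tokens, verses, toy_scorer)
```

In this toy run, `hits` holds a single candidate covering the four tokens of the verse. A learned scorer would replace `toy_scorer` and would also make use of `cand.context`.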
To evaluate our models, we have constructed labeled data based on the Hebrew Bible commentary known as Midrash Raba, which contains more than half a million words and over 30,000 quotations.
Our solution achieves an F-score of over 80% and considerably outperforms several state-of-the-art approaches for tasks of a similar nature. As a contribution of independent interest, our solution includes a novel word-embedding method that seeks to exploit the nature of our text and its context.
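For readers unfamiliar with the metric, the F-score mentioned above combines precision and recall into one number. The sketch below assumes the common balanced variant (F1, `beta=1`); the announcement does not say which variant was used, and the counts in the example are made up for illustration and are not results from the thesis.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only: 8 correct quotations found, 2 spurious, 2 missed.
example = f_score(tp=8, fp=2, fn=2)
```

With these counts, precision and recall are both 0.8, so the balanced F-score is also 0.8.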