Hebrew Acronyms: Identification, Expansion, and Disambiguation
Kayla Jacobs, M.Sc. Thesis Seminar
Monday, 9.12.2013, 16:30
Taub 601
Acronyms are words formed from the initial letters of a phrase. For example, "CIA" is an acronym that usually means "Central Intelligence Agency," but in other contexts could mean "Culinary Institute of America." Understanding acronyms is important for many natural language processing applications, including search and machine translation. While hand-crafted acronym dictionaries exist, they are limited and require frequent updates. We developed a new machine learning method to automatically build a Hebrew acronym dictionary from unstructured text documents. This is the first acronym dictionary construction technique, in any language, to specifically include acronyms whose expansions do not necessarily appear in the same document. We also enhanced the dictionary with contextual information to aid acronym disambiguation. Additionally, while acronyms have a long history in Hebrew and have previously been investigated from a qualitative linguistic perspective, they have never before been studied quantitatively. We'll share some new statistically based linguistic insights about acronym usage in modern Hebrew texts, of interest to Hebrew language aficionados and developers of Hebrew natural language processing systems.
