HEBREW COMPUTATIONAL LINGUISTICS - A SURVEY

The following survey is my view of the past activity in the area of Hebrew Computational Linguistics. It is based on the best of my knowledge and is not claimed to be either complete or objective. The ideas listed below are my own and do not reflect in any way the Technion's ideas or those of the Department of Computer Science. I would appreciate any comments, additions, corrections etc. Please send comments to me directly.

Shuly Wintner                              shuly@cs.technion.ac.il
Computer Science                           shuly@techunix.bitnet
Technion, Israel Institute of Technology   tel: +972-4-294315
Haifa 32000, Israel                        fax: +972-4-294353
http://www.cs.technion.ac.il/~shuly

PHONOLOGY

Sorry, nothing I am aware of.

MORPHOLOGY

Hebrew HAS morphology; unlike English, the problem of parsing a word (to get its base and added derivative features), as well as the problem of generating a word (from a base and syntactic features), is far from trivial. What makes things more problematic is that the Hebrew script is either very redundant ("full script", where the vowels are represented by small punctuation marks above, inside or under the letters) or very under-specified ("un-punctuated script", where most of the vowels are missing). The second type of script is widely used; the first one is used mostly in children's books etc.

So the problem of morphology has to be separated from that of the script. To that effect, Prof. Uzzi Ornan has designed an alternative script ("phonemic script") that is highly phonemic in the sense that most of the information carried in the word is represented, without redundancy (see [ornan86]). Much work has been done within this framework. Algorithms for converting phonemic script to Hebrew script (of both types) exist. Work on the reverse direction is under way. More information on these algorithms can be obtained from Uzzi Ornan.

Work on phonemic script involves both analysis and generation, and separate projects dealt with nouns and with verbs.
Most of the work was done in the Department of Computer Science at the Technion, Haifa. Lyor Goldstein worked on part of the nouns (see [goldstein]); Michal Shany-Klein dealt with the rest (see [shany-klein]); and Alon Lavie took care of the verbs using a Koskenniemi-style two-level technique ([lavie88, lavie89]).

Outside the Technion, work was done mainly on the regular Hebrew script. At Bar-Ilan University, Prof. Choueka has developed a system that includes both an analyzer and a generator for Hebrew. Contact him for more details. Of course, since the un-punctuated script leaves most vowels unspecified, such a system necessarily produces highly ambiguous output. A similar system was developed at the IBM Scientific Center in Haifa by a team led by Mori Rimon (see [ibm]). Both systems are being used commercially for spell checking in Hebrew word processors.

A different line of work was pursued by Moshe Levinger, who used short-context information to disambiguate Hebrew words written in the regular script. This work is described in [levinger].

SYNTAX

The first work I'm aware of dealing with Hebrew syntax is Cohen's (see [cohen]). He tried to handle syntax and morphology in one system, taking Hebrew script as input and producing as output context-free trees representing the sentences' structure. The work was done from scratch: no use was made of any linguistic theory or application. The results were rather promising, but the project was never continued and is quite archaic today.

A couple of years ago a more comprehensive parser for Hebrew was designed by Uzzi Ornan and myself. Its input is Hebrew sentences represented in phonemic script, its lexicon consists of derivations, and its output is trees representing both the syntactic structure and the functional structure of the input. This parser was developed using Tomita's Generalized LR Parser/Compiler. The work is described in [wintner91a, wintner91b].
Part of our work involved a survey of current linguistic theories and implementations and an assessment of their fitness for Hebrew. In this framework we developed small Hebrew grammars with PATR (see [wintner92]) and with McCord's Slot Grammar. The results are described in [wintner91b, wintner91c].

The other two works I'm aware of are Elana Cohen-Zamir's syntactic analyzer, which uses a Fillmore-style dictionary to decode verb phrases ([cohen-zamir]), and Orly Albeck's algorithm for expectation-based parsing.

A minor attempt was made at the IBM Scientific Center to design a Hebrew grammar under McCord's Slot Grammar framework. I designed the grammar, which was connected to a morphological analyzer and was very small in scope. Some results of this experiment can be found in [wintner91b].

OTHER WORKS

An Israeli company, Tovna, is developing systems for machine translation. As far as I know, they do not work on translation to or from Hebrew. I do not know how successful they have been.

The Ministry of Science and Technology organized a yearly symposium dedicated to Hebrew Computational Linguistics. Such symposia were held in 1988, 1989 and 1990, and some of the papers presented there were collected in [hcl92].

REFERENCES

*cohen: Cohen, Daniel: "Mechanical Syntactic Analysis of a Hebrew Sentence", PhD dissertation, Hebrew University, Jerusalem, 1984.
*cohen-zamir: Cohen-Zamir, Elana: "Syntactic Semantic Parser for a Hebrew Sentence with no Context", MSc thesis, Technion, 1991.
*goldstein: Goldstein, Lyor: "Generation and Analysis of the Possession Inflection of Hebrew Nouns", MSc thesis, Technion, 1991.
*hcl92: Ornan, Uzzi, Gideon Ariely and Edit Doron, Eds.: "Hebrew Computational Linguistics", Ministry of Science and Technology, 1992.
*ibm: Bentur, Ester, Aviela Angel, Danit Segev and Alon Lavie: "Analysis and Generation of the Nouns inflection in Hebrew", in "Hebrew Computational Linguistics", 1992.
lavie88: Lavie, Alon, Alon Itai, Uzzi Ornan and Mori Rimon: "On the Applicability of Two-level Morphology to the Inflection of Hebrew Verbs", TR 513, Computer Science, Technion; also in Proc. of the Intl. Conf. of the ALLC, Jerusalem, 1988.
*lavie89: Lavie, Alon: "Two-level Morphology for Hebrew", MSc thesis, Technion, 1989.
*levinger: Levinger, Moshe: "Morphologic Disambiguation in Hebrew", MSc thesis, Technion, 1992.
ornan86: Ornan, Uzzi: "Phonemic Script: A Central Vehicle for Processing Natural Language - The Case of Hebrew", IBM TR 88.181, IBM Haifa Scientific Center, 1986.
*shany-klein: Shany-Klein, Michal: "Generation and Analysis of Segolate Noun Inflection in Hebrew", MSc thesis, Technion, 1990.
wintner91a: Wintner, Shuly and Uzzi Ornan: "Syntactic Analysis of Hebrew Sentences", in Proceedings of the 8th Israeli Symposium on Artificial Intelligence and Computer Vision, 1991.
*wintner91b: Wintner, Shuly: "Syntactic Analysis of Hebrew Sentences", MSc thesis, Technion, 1991.
wintner91c: Wintner, Shuly and Uzzi Ornan: "Computational Models for Syntactic Analysis -- Their Fitness for Writing a Computational Grammar for Hebrew", CIS report 9103, Center for Intelligent Systems, Technion, Israel Institute of Technology, 1991.
*wintner92: Wintner, Shuly: "Syntactic Analysis of Hebrew Sentences Using PATR", in "Hebrew Computational Linguistics", 1992.

Papers marked with an asterisk are in Hebrew.