Laboratory for
Computational Linguistics
Current Projects (under construction...)
-
Probabilistic Morphological Analyzer for Hebrew Undotted Texts
Morphological analysis of words in a text is the first stage of most natural
language applications. Due to the rich morphology of the Hebrew language and
the inadequacy of the undotted script, which results in a great degree of
morphological ambiguity, the problem has not yet found a satisfactory solution.
The problem of morphological analysis of Hebrew texts is similar to the
well-studied problem of part-of-speech tagging in English, so some of the
approaches used to solve that problem can be applied here.
The resulting analyzer selects the correct morphological analysis for about 96%
of the words. This result approaches results reported for English probabilistic
part-of-speech tagging. It does so with a very small training corpus of only
5,000 words, in contrast to the million-word corpora used for English tagging.
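The analogy with part-of-speech tagging can be illustrated with a minimal sketch. It is not the project's actual analyzer: it assumes a bigram model over coarse analysis labels, and the function name `disambiguate`, the labels, the probabilities and the smoothing floor are all invented for illustration. Each word arrives with its candidate analyses, and the sketch picks the sequence of analyses with the highest overall probability.

```python
import math

FLOOR = 1e-6  # assumed smoothing value for unseen transitions


def disambiguate(candidates, trans, start="<s>"):
    """Choose one analysis per word with a Viterbi search.

    candidates: one dict per word, mapping each candidate analysis
                to P(word | analysis).
    trans:      dict mapping (previous analysis, analysis) to a
                transition probability.
    """
    # best[analysis] = (log-probability, path) of the best sequence so far
    best = {start: (0.0, [])}
    for options in candidates:
        step = {}
        for analysis, p_word in options.items():
            scored = []
            for prev, (score, path) in best.items():
                p_trans = trans.get((prev, analysis), FLOOR)
                total = score + math.log(p_trans) + math.log(p_word)
                scored.append((total, path + [analysis]))
            step[analysis] = max(scored, key=lambda item: item[0])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


# Toy, hand-picked probabilities (not estimated from any corpus).
TRANS = {
    ("<s>", "NOUN"): 0.6, ("<s>", "VERB"): 0.4,
    ("NOUN", "ADJ"): 0.5, ("NOUN", "NOUN"): 0.2,
    ("VERB", "ADJ"): 0.1, ("VERB", "NOUN"): 0.5,
}

# The first word is ambiguous between a noun and a verb reading.
ambiguous = {"NOUN": 0.5, "VERB": 0.5}
print(disambiguate([ambiguous, {"ADJ": 0.6, "NOUN": 0.4}], TRANS))
print(disambiguate([ambiguous, {"NOUN": 1.0}], TRANS))
```

In the two calls the ambiguous first word is identical, and only the following word changes which reading wins. This is the kind of contextual evidence a probabilistic tagger exploits.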
Researchers: Erel Segal, Alon Itai
-
Corpus Based Analysis of Hebrew
The project's aim is to study ways to arrive at better tools for morphological
and syntactic analysis of Hebrew using corpus-based techniques.
The project is part of a joint project conducted in collaboration with the
corpus-based NLP group of the Computer Science Institute of the Hebrew
University, Jerusalem, headed by Prof. Eli Shamir. The project is sponsored by
a grant from the Israel Ministry of Science.
Principal investigators: Alon Itai and Yoad Winter
-
Extensions and Implementations of Natural Logic:
This project develops an inference system for natural language that is based
on Natural Logic: a logic that works directly on syntactic representations
with no intermediate translation to a more familiar logical formalism.
We use the proposal of Van Benthem (1987) and Sanchez (1991) as a starting
point for a system that deals with monotonicity-based reasoning. This
system is extended to natural language coordination and non-monotonic
expressions in Categorial Grammar. Most of the recent developments in the
project appear in an ICOS-2 paper.
The next stages of the project will involve the construction of a prototype
system that computes inferences in natural language, extensions to more
linguistic phenomena, and a general platform for extending various grammar
formalisms into (fragments of) natural logic.
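Monotonicity-based reasoning can be illustrated with a small sketch. It is a deliberate simplification and not the project's system: sentences are flat (determiner, noun, verb) triples rather than Categorial Grammar derivations, and the toy inclusion order `ISA` and the function names are assumptions made for illustration. The monotonicity table reflects the standard behaviour of the determiners: "every" is downward monotone in its first argument and upward monotone in its second, "some" is upward in both, and "no" is downward in both.

```python
# Monotonicity of each determiner in its (restrictor, scope) arguments.
MONO = {
    "every": ("down", "up"),
    "some": ("up", "up"),
    "no": ("down", "down"),
}

# Toy inclusion order as (more specific, more general) pairs (assumed).
ISA = [("dog", "animal"), ("poodle", "dog"), ("run", "move")]


def replacements(word, direction):
    """Words that may replace `word` in a position of the given polarity."""
    if direction == "up":
        # Upward positions license replacement by a more general term.
        return [general for specific, general in ISA if specific == word]
    # Downward positions license replacement by a more specific term.
    return [specific for specific, general in ISA if general == word]


def entailments(det, noun, verb):
    """One-step entailments of the sentence 'det noun verb'."""
    restrictor_dir, scope_dir = MONO[det]
    result = [(det, n, verb) for n in replacements(noun, restrictor_dir)]
    result += [(det, noun, v) for v in replacements(verb, scope_dir)]
    return result


print(entailments("every", "dog", "run"))
print(entailments("some", "dog", "run"))
print(entailments("no", "dog", "run"))
```

The sketch performs a single replacement step and works on the surface words directly, with no translation into a separate logical language, which is the point of the approach described above.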
The project is a joint enterprise of the Technion CS department and the
UCLA Linguistics department, and it is sponsored by a grant from the
Binational Science Foundation (BSF).
Researchers:
Nissim Francez, Yaroslav Fyodorov, Yoad Winter (Technion)
Ed Keenan, Henk Harkema, Shannon Madsen (UCLA)
-
Semantics of Natural Language Temporal Questions and Interfaces to
Temporal Database Systems:
Traditional database systems retain snapshot information, valid at
a given moment in time. Much research has been devoted to developing temporal
databases, which allow users to store time-dependent information.
In recent years, there has been an effort to consolidate and standardize this
research into a single consensus temporal model and temporal query language,
called SQL/Temporal.
SQL/Temporal adds several temporal constructs to standard SQL, and is
therefore expected to be difficult to use for non-experts. We are
developing a natural language interface for temporal
databases, based on a semantic treatment of temporal questions.
Using the interface, the user may express questions in
natural language and have them automatically translated and submitted
to the database.
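The difference between snapshot and time-dependent information can be sketched as follows. This is an illustration only: it does not use SQL/Temporal syntax, and the record layout, the names `Fact` and `department_on`, and the sample data are all hypothetical. Each fact carries the period during which it was valid, so a question such as "Where did Dana work in March 1996?" can be answered, whereas a snapshot table would hold only the current row.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    employee: str
    department: str
    valid_from: date  # first day on which the fact holds (inclusive)
    valid_to: date    # first day on which it no longer holds (exclusive)


# Hypothetical sample data; date.max marks a fact that is still valid.
FACTS = [
    Fact("Dana", "Sales", date(1995, 1, 1), date(1997, 6, 1)),
    Fact("Dana", "Research", date(1997, 6, 1), date.max),
]


def department_on(name, day, facts=FACTS):
    """Return the department `name` belonged to on `day`, or None."""
    for fact in facts:
        if fact.employee == name and fact.valid_from <= day < fact.valid_to:
            return fact.department
    return None


print(department_on("Dana", date(1996, 3, 1)))
print(department_on("Dana", date(1998, 1, 1)))
```

A natural language interface of the kind described above would have to map the temporal expression in the question ("in March 1996") onto such a validity condition before submitting the query.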
The project is sponsored by FIRST, administered by the Israeli Academy of
Science.
Researchers: Rani Nelken, Yoad Winter, Nissim Francez
-
Distributional Clustering using Large Corpora:
A system that constructs a knowledge base describing semantic
similarity between texts by distributional clustering of expressions
extracted from large corpora using NLP techniques.
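The distributional idea behind such a system can be sketched in a few lines. This is a simplified illustration and not the project's method: it compares single words by the counts of their neighbouring words and stops short of any clustering step. The function names, the window size and the toy corpus are assumptions. Expressions that occur in similar contexts receive similar vectors, and a clustering algorithm would then group expressions whose vectors are close.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=1):
    """Count, for every word, the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sentence[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy corpus, invented for illustration.
CORPUS = [
    ["drink", "tea", "now"],
    ["drink", "coffee", "now"],
    ["read", "book", "slowly"],
]

vectors = context_vectors(CORPUS)
print(cosine(vectors["tea"], vectors["coffee"]))  # identical contexts
print(cosine(vectors["tea"], vectors["book"]))    # no shared contexts
```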
Researchers: Ron Bekkerman, Ran El-Yaniv, Yoad Winter
If you have comments or suggestions, email us at lcl@cs.technion.ac.il