236802 Seminar in Computer Science II
Statistical and Corpora-Based Methods for Processing Natural Languages
Corpus-based methods attempt to learn usage patterns from large bodies of text and then use this knowledge to solve problems in natural language processing.
For example:
- Word sense disambiguation -
How to distinguish between "bank" in the sense of a river bank and "bank" in the sense of a financial institution.
- Part of speech tagging -
How to distinguish between "book" the noun and "book" the verb.
- Alignment -
How to find the correspondence between the words, sentences and paragraphs of a source text and those of its translation.
- Morphological disambiguation in Hebrew -
How to correctly read ordinary (unvocalized) Hebrew script, in which the omitted vowel marks leave many written words with several possible readings.
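As an illustration of the corpus-based approach, here is a minimal sketch of word sense disambiguation for "bank" using a naive Bayes classifier. Naive Bayes is a standard technique chosen here for brevity, not necessarily the method covered in the seminar. The four training sentences and the sense labels `river` and `finance` are invented for illustration; a real system would be trained on a large sense-tagged corpus. Add-one smoothing keeps a context word never seen with a sense from zeroing out that sense's score.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (context words, sense of "bank").
TRAINING = [
    ("we walked along the bank of the river".split(), "river"),
    ("fish swam near the muddy bank".split(), "river"),
    ("the bank approved my loan".split(), "finance"),
    ("deposit money at the bank".split(), "finance"),
]

def train(examples):
    """Count how often each sense occurs and how often each word occurs with it."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def disambiguate(context, model):
    """Return the sense maximizing log P(sense) + sum of log P(word | sense),
    using add-one (Laplace) smoothing for the word probabilities."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        score = math.log(count / total)
        n_words = sum(word_counts[sense].values())
        for w in context:
            score += math.log((word_counts[sense][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

model = train(TRAINING)
print(disambiguate("the river bank was muddy".split(), model))        # river
print(disambiguate("the bank raised its loan rates".split(), model))  # finance
```

The classifier never looks at "bank" itself, which appears equally often under both senses; the decision comes entirely from the surrounding words learned from the corpus.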
Prerequisites
The seminar relies heavily on statistics and probability, so some knowledge of and intuition for these subjects is a must.
- Probability (required)
- Natural Language Processing (preferred, not required)
Announcements
Syllabus
- Introduction
- Hidden Markov chains
- Probabilistic Context-Free Grammars
- Word Sense Disambiguation
- Morphological Disambiguation in Hebrew
- Paragraph, sentence and word alignment
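Hidden Markov models are a common basis for part-of-speech tagging, for instance for deciding whether "book" is a noun or a verb. Below is a minimal sketch that finds the most likely tag sequence with the Viterbi algorithm. The three tags and all probabilities are hypothetical and set by hand for illustration; in a corpus-based tagger they would be estimated from a tagged corpus, and log probabilities would be used to avoid numerical underflow on long sentences.

```python
# Hypothetical hand-set HMM parameters (not estimated from any corpus).
STATES = ["DET", "NOUN", "VERB"]
START = {"DET": 0.5, "NOUN": 0.2, "VERB": 0.3}
TRANS = {  # TRANS[prev][next] = P(next tag | previous tag)
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.2, "NOUN": 0.3, "VERB": 0.5},
    "VERB": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
}
EMIT = {  # EMIT[tag][word] = P(word | tag); unlisted words get probability 0
    "DET": {"the": 0.6, "a": 0.4},
    "NOUN": {"book": 0.4, "flight": 0.6},
    "VERB": {"book": 0.6, "read": 0.4},
}

def viterbi(words, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable tag sequence for words under the HMM."""
    # best[t][s]: probability of the best tag sequence for words[:t+1] ending in tag s
    best = [{s: start[s] * emit[s].get(words[0], 0.0) for s in states}]
    back = [{}]  # back[t][s]: the tag at position t-1 on that best sequence
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for s in states:
            prev, p = max(
                ((r, best[t - 1][r] * trans[r][s]) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = p * emit[s].get(words[t], 0.0)
            back[t][s] = prev
    # Pick the best final tag, then follow the back-pointers to the start.
    tags = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(words) - 1, 0, -1):
        tags.append(back[t][tags[-1]])
    return tags[::-1]

print(viterbi("book a flight".split()))  # ['VERB', 'DET', 'NOUN']
print(viterbi("the book".split()))       # ['DET', 'NOUN']
```

With these numbers the same word "book" is tagged VERB at the start of a sentence and NOUN after a determiner: the transition probabilities, not the word alone, decide between the two readings.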
Papers
Projects
Students who have not chosen a lecture may do a project instead.
Project list
Staff
Instructors:
Prof. Alon Itai (itai@cs.technion.ac.il)
Assistants: