236802 Seminar in Computer Science II

Statistical and Corpora Based Methods for Processing Natural Languages

Corpus-based methods attempt to learn usage patterns from large bodies of text and then use this knowledge to solve problems in natural language processing. For example:

Word sense disambiguation - How to distinguish between bank as the side of a river and bank as a financial institution.
Part-of-speech tagging - How to distinguish between book the noun and book the verb.
Alignment - How to find the correspondence between the words, sentences and paragraphs of a source text and those of its translation.
Morphological disambiguation in Hebrew - How to correctly read regular (unvocalized) Hebrew script.
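As a small illustration of the corpus-based idea applied to word sense disambiguation, the following Python sketch trains a naive Bayes classifier on a handful of sense-labelled contexts of "bank" and then picks the sense of a new occurrence from its neighbouring words. The training sentences, the sense labels river and finance, and the function names train and disambiguate are invented for this example; naive Bayes with add-one smoothing is just one common choice, not necessarily the method covered in the seminar.

```python
import math
from collections import Counter, defaultdict

# Toy sense-labelled contexts for "bank" (invented for illustration, not a real corpus).
TRAINING = [
    ("river", "we sat on the bank of the river fishing"),
    ("river", "the boat drifted toward the muddy bank"),
    ("river", "water flooded the bank after the storm"),
    ("finance", "she deposited money at the bank"),
    ("finance", "the bank raised its interest rate"),
    ("finance", "he opened an account at the bank downtown"),
]

def train(examples):
    """Count sense priors and per-sense word frequencies from labelled contexts."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, text in examples:
        # The ambiguous word itself carries no evidence, so skip it.
        words = [w for w in text.split() if w != "bank"]
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def disambiguate(context, sense_counts, word_counts, vocab):
    """Return the sense maximizing log P(sense) + sum of log P(word | sense), add-one smoothed."""
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense in sense_counts:
        score = math.log(sense_counts[sense] / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context.split():
            if w == "bank":
                continue
            score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

model = train(TRAINING)
print(disambiguate("the river bank was covered with water", *model))         # river
print(disambiguate("i deposited money in my account at the bank", *model))   # finance
```

With real data the counts would come from a large sense-tagged corpus, and frequent function words such as "the" would usually be filtered out, since in this toy set they skew the per-sense probabilities.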

Prerequisites

The seminar relies heavily on statistics and probability, so some knowledge of and intuition for these subjects is a must.

Probability (required)
Natural Language Processing (preferred, not required)

Announcements

Syllabus

Introduction
Hidden Markov models
Probabilistic Context-Free Grammars
Word Sense Disambiguation
Morphological Disambiguation in Hebrew
Paragraph, sentence and word alignment
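Hidden Markov models are a standard tool for part-of-speech tagging. The Python sketch below runs the Viterbi algorithm over a toy three-tag model to decide from context whether book is a verb or a noun. The tag set (DET, NOUN, VERB), all of the probabilities, and the function name viterbi are hand-set assumptions for illustration; a corpus-based tagger would estimate these parameters from a tagged corpus.

```python
import math

# Hand-set toy parameters (hypothetical; a real tagger would estimate them from a tagged corpus).
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.5, "NOUN": 0.2, "VERB": 0.3}
TRANS = {
    "DET":  {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.3,  "NOUN": 0.2, "VERB": 0.5},
    "VERB": {"DET": 0.6,  "NOUN": 0.3, "VERB": 0.1},
}
EMIT = {
    "DET":  {"the": 0.6, "a": 0.4},
    "NOUN": {"book": 0.4, "flight": 0.6},
    "VERB": {"book": 0.5, "read": 0.5},
}

def log(p):
    """Logarithm that maps probability 0 to -inf instead of raising an error."""
    return math.log(p) if p > 0 else -math.inf

def viterbi(words):
    """Return the most probable tag sequence for words under the toy HMM."""
    # best[t][tag] = log-probability of the best path that ends in tag at position t
    best = [{tag: log(START[tag]) + log(EMIT[tag].get(words[0], 0.0)) for tag in TAGS}]
    back = [{}]  # back[t][tag] = predecessor tag on that best path
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for tag in TAGS:
            prev, score = max(
                ((p, best[t - 1][p] + log(TRANS[p][tag])) for p in TAGS),
                key=lambda pair: pair[1],
            )
            best[t][tag] = score + log(EMIT[tag].get(words[t], 0.0))
            back[t][tag] = prev
    # Trace back from the most probable final tag.
    tag = max(TAGS, key=lambda s: best[-1][s])
    path = [tag]
    for t in range(len(words) - 1, 0, -1):
        tag = back[t][tag]
        path.append(tag)
    return list(reversed(path))

print(viterbi("book a flight".split()))  # ['VERB', 'DET', 'NOUN']
print(viterbi("the book".split()))       # ['DET', 'NOUN']
```

Working in log space avoids numerical underflow on long sentences, and the dynamic program considers every tag sequence while doing only O(n·k²) work for n words and k tags.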

Papers

Projects

Students who have not chosen a lecture to present may do a project instead; see the project list.

Staff

Instructors:
Prof. Alon Itai (itai@cs.technion.ac.il)
Assistants: