Technical Report MSC-2013-05

Title: Automatic Extraction of Subcategorization Frames for Hebrew
Authors: Hanna Fadida
Supervisors: Alon Itai and Shuly Wintner
Abstract: This work automatically constructs from large text corpora the first subcategorization dictionary of Modern Hebrew. Using available resources, the corpora were morphologically analyzed and syntactically parsed. Standard collocation measures were employed to assess the degree to which potential complements tend to combine with each verb, focusing on a small set of potential complement types, which covers the vast majority of complement instances in the corpora. No attempt was made to construct full subcategorization frames; rather, each complement type was viewed in isolation, and its likelihood to combine with the verb was determined. The result is a wide-coverage dictionary of almost 3,000 verb lemmas, listing more than 6,500 verb-complement pairs, each with a statistically-derived score. The quality of the dictionary was evaluated both intrinsically and extrinsically. First, a small set of representative verbs and their canonical complements was manually constructed. The automatically-extracted dictionary achieved high precision and recall on this test set. Second, linguistic knowledge pertaining to verb subcategorization frames was incorporated in two computational tasks: reducing the ambiguity of PP-attachment and translating from Arabic to Hebrew. This demonstrated that knowledge derived from the dictionary is instrumental in significantly improving the accuracy of these two tasks. The contribution of this work is thus a digital, freely-available, wide-coverage and accurate verb subcategorization dictionary of Hebrew.
Copyright: The above paper is copyright by the Technion, Author(s), or others. Please contact the author(s) for more information.

Remark: Any link to this technical report should be to this page, rather than to the URL of the PDF files directly. The latter URLs may change without notice.

Computer science department, Technion