The Hebrew Corpus

The Hebrew Corpus is a project developed at Ben-Gurion University and can be downloaded from their site. We adapted its format to the MorphTagger unambiguous file format. We also made some fixes to the corpus so that it fits better with the Hebrew morphological analyzer. The version we use can be downloaded here.

HAMSA

As a morphological analyzer for Hebrew, we use HAMSA, developed at MILA, the Knowledge Center for Hebrew Processing. The analyzer can be run through a web interface or, alternatively, obtained free of charge by submitting a request here. The output of HAMSA over the Hebrew Corpus text, fixed and formatted in the MorphTagger ambiguous format, can be downloaded here.