Dor Zohar, M.Sc. Thesis Seminar
Thursday, 11.10.2018, 14:30
In many Natural Language Processing classification tasks, the label space consists of the entire vocabulary, and therefore might have hundreds of thousands of labels. Important tasks such as language modeling, machine translation and dialog systems all have vocabulary label sets. Due to Zipf's law, a large number of words in the vocabulary will have only a few appearances in the corpus, hindering the ability to learn proper representations for these words.
This work utilizes a prior hierarchical clustering of the words in the label set, in order to achieve better representation of the words. The hierarchical structure enables starting with a label set of coarse-grained concepts, and gradually refining it to the whole vocabulary.
In our work, we examine two tasks with vocabulary label sets - language modeling and word2vec. We present the contribution of the prior knowledge to the performance on the two tasks comparing to the baseline, both in intrinsic and extrinsic tests.