Events

Events and Lectures at the Henry and Marilyn Taub Faculty of Computer Science

Discourse-Oriented Pre-Training of Language Models

Zachary Elisha Bamberger (M.Sc. seminar lecture)

Sunday, 07.04.2024, 10:00

Room 601

Advisor: Dr. Yonatan Belinkov

Language Models (LMs) excel at many tasks, but understanding discourse (how sentences connect to form coherent text) remains a challenge. This is especially true for smaller models that aim to match the abilities of their larger counterparts in handling long and complex inputs. To address this, we introduce DEPTH, a new encoder-decoder model designed to foster robust discourse-level representations during the pre-training phase. DEPTH combines hierarchical sentence representations and the “Sentence Un-shuffling” task with the traditional span-corruption objective of encoder-decoder LMs. While span corruption helps the model learn word-level dependencies, “Sentence Un-shuffling” forces it to restore the natural order of scrambled sentences, teaching it about the logical flow of language. The encoder-decoder architecture allows DEPTH to consider words both before and after a given token, offering a more nuanced contextual understanding than decoder-only models like GPT. This is crucial for tasks that depend on how sentences interrelate.
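To make the two objectives concrete, here is a minimal sketch of how training pairs for each might be built. It is an illustration under stated assumptions, not DEPTH's actual data pipeline: the function names, the `<SENT_k>` marker format, and the choice to predict markers in the original order are all hypothetical. The `<extra_id_N>` sentinels follow the convention used by T5 tokenizers.

```python
import random


def corrupt_spans(tokens, spans):
    """T5-style span corruption on a token list.

    Each (start, end) span is replaced in the input by a sentinel token, and
    the target lists every sentinel followed by the tokens it hides.
    """
    inp, tgt, prev = [], [], 0
    for n, (start, end) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{n}>"
        inp += tokens[prev:start] + [sentinel]
        tgt += [sentinel] + tokens[start:end]
        prev = end
    inp += tokens[prev:]
    tgt.append(f"<extra_id_{len(spans)}>")  # closing sentinel, as in T5
    return inp, tgt


def make_unshuffle_example(sentences, seed=0):
    """Build one (input, target) pair for a sentence un-shuffling objective.

    Assumption: each shuffled sentence is prefixed with a hypothetical marker
    <SENT_k>, and the target is the marker sequence that restores the
    original sentence order.
    """
    rng = random.Random(seed)
    order = list(range(len(sentences)))
    rng.shuffle(order)  # order[k] = original index of the sentence in slot k
    shuffled = [f"<SENT_{k}> {sentences[i]}" for k, i in enumerate(order)]
    slot_of = {orig: k for k, orig in enumerate(order)}
    target = [f"<SENT_{slot_of[i]}>" for i in range(len(sentences))]
    return shuffled, target


if __name__ == "__main__":
    print(corrupt_spans("the quick brown fox jumps".split(), [(1, 3)]))
    print(make_unshuffle_example(["It rained.", "We stayed in.", "Then we slept."]))
```

Reading the markers in the target order and looking up the sentence attached to each one reproduces the original text, which is the signal the un-shuffling objective asks the model to recover.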

We built a pre-training and fine-tuning framework for encoder-decoder models that facilitated our experiments with both T5 and DEPTH, and that integrates seamlessly with open-source tools for distributed training in the HuggingFace ecosystem. DEPTH’s training is computationally efficient: it learns meaningful semantic- and discourse-level representations at a faster rate than its T5 counterpart. Notably, already in the early stages of pre-training, we find that DEPTH outperforms T5 by reaching a lower span-corruption loss. This occurs even though T5 is trained solely with this objective, while DEPTH is tasked with the additional sentence un-shuffling objective. Evaluations on the GLUE and DiscoEval benchmarks demonstrate DEPTH’s ability to quickly learn downstream tasks spanning syntactic acceptability (CoLA), sentiment analysis (SST-2), sentence positioning (SP), discourse coherence (DC), and natural language inference (MNLI).
