As LLMs generate increasingly long outputs, effective uncertainty estimation must identify errors at fine-grained levels rather than discard entire responses. While such methods exist, evaluating uncertainty at any resolution (from a single token to an entire generation) is challenging and highly sensitive to label imperfections, making zero-noise benchmarks essential; yet, long-form generation benchmarks tend to rely on fallible labels rather than deterministic ground truth.
We introduce Single-answer Atomic Long-form Target (SALT), a benchmark of six procedurally generated tasks, each with a single, deterministic, long textual ground truth, enabling unit-level evaluation of correctness, calibration, and ranking without external judges. Equipped with SALT, we analyze 50+ LLMs and uncover key insights: we identify which confidence functions dominate each uncertainty aspect and show that effective ranking benefits more from coarser evaluation resolutions; SALT further facilitates precise calibration tracking throughout generation, revealing a divergence in the accuracy–calibration relationship, with high- and low-performing models exhibiting degradation ($\rho=0.87$) and improvement ($\rho=-0.92$), respectively.
Finally, we demonstrate that reasoning, whether elicited via Chain-of-Thought prompting or internalized through training, introduces a trade-off, improving accuracy while degrading confidence ranking. These findings directly impact risk-critical applications requiring reliable error identification and mitigation.
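To make the unit-level evaluation concrete, the following is a minimal sketch, not the SALT codebase, with hypothetical function and variable names: given a single deterministic target, per-unit correctness reduces to exact match, so calibration (expected calibration error) and confidence ranking (AUROC) can be scored without any external judge.

import numpy as np

def evaluate_units(pred_units, gold_units, confidences, n_bins=10):
    # With a deterministic ground truth, per-unit correctness is exact match.
    correct = np.array([p == g for p, g in zip(pred_units, gold_units)], dtype=float)
    conf = np.asarray(confidences, dtype=float)

    # Calibration: expected calibration error over equal-width confidence bins.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf >= lo) & (conf <= hi) if hi == 1.0 else (conf >= lo) & (conf < hi)
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())

    # Ranking: AUROC of confidence as a predictor of per-unit correctness
    # (pairwise form; quadratic in the number of units, fine for illustration).
    pos, neg = conf[correct == 1], conf[correct == 0]
    if len(pos) and len(neg):
        auroc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))
    else:
        auroc = float("nan")
    return ece, auroc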
Protein sequences are rich in repeating segments, both exact copies and approximate segments with mutations. These repeats are important for protein structure and function, motivating decades of algorithmic work on repeat identification. Recent work, examining masked-token prediction behavior, has shown that protein language models (PLMs) identify repeats.
To elucidate their internal mechanisms, we investigate how PLMs detect both exact and approximate repeats. We find that the mechanism for approximate repeats functionally subsumes that of exact repeats.
We then characterize this mechanism, revealing two main stages: PLMs first build feature representations using both general positional attention heads and biologically specialized components, such as neurons that encode amino-acid similarity. Then, induction heads attend to aligned tokens across repeated segments, promoting the correct answer.
Our results reveal how PLMs solve this biological task by combining language-based pattern matching with specialized biological knowledge, thereby establishing a basis for studying more complex evolutionary processes in PLMs.
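As a rough illustration of the second stage, and an exposition-only assumption rather than the paper's implementation, an induction-head-like computation can be mimicked procedurally: for a masked position, align the preceding k-mer against earlier occurrences, exactly or via an amino-acid similarity table, and copy the residue that followed the best match.

def induction_predict(seq, masked_idx, k=3, similarity=None):
    # Find the earlier occurrence of the k-mer preceding the mask and copy
    # the residue that followed it; `similarity` (e.g. a BLOSUM-like dict)
    # allows approximate repeats with mutations instead of exact copies.
    context = seq[masked_idx - k:masked_idx]
    best_token, best_score = None, float("-inf")
    for i in range(k, masked_idx):
        window = seq[i - k:i]
        if similarity is None:
            score = sum(a == b for a, b in zip(window, context))
        else:
            score = sum(similarity.get((a, b), 0) for a, b in zip(window, context))
        if score > best_score:
            best_token, best_score = seq[i], score
    return best_token

# Exact repeat: the masked final residue is recovered by aligning to the first copy.
seq = list("MKTAYIAGGSMKTAYIA")
print(induction_predict(seq, masked_idx=len(seq) - 1))  # -> 'A'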
Amado 814
The study of spectral graph determination is a central and fascinating topic in spectral graph theory and algebraic combinatorics. This area investigates the spectral characterization of various classes of graphs, develops methods for constructing and distinguishing cospectral nonisomorphic graphs, and analyzes the conditions under which the spectrum of a graph uniquely determines its structure. In the first part of the seminar, we present both classical results and recent advances in spectral graph determination.
The study of graph symmetries and different notions of transitivity is also of fundamental interest in algebraic graph theory. In the second part of the talk, we examine transitivity properties of Gilbert graphs and their complements, and discuss the main ideas underlying these results.
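For readers new to the topic, a standard textbook example (not taken from the talk) shows why the spectrum need not determine the graph: the star K_{1,4} and the disjoint union C_4 ∪ K_1 are nonisomorphic yet cospectral, which a few lines of code confirm.

import numpy as np

# K_{1,4}: one center adjacent to four leaves.
star = np.zeros((5, 5))
star[0, 1:] = star[1:, 0] = 1

# C_4 together with an isolated vertex.
c4_plus_k1 = np.zeros((5, 5))
for a, b in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    c4_plus_k1[a, b] = c4_plus_k1[b, a] = 1

# Both adjacency spectra are {2, 0, 0, 0, -2}, so the spectrum alone
# does not determine the graph up to isomorphism.
print(np.sort(np.linalg.eigvalsh(star)))
print(np.sort(np.linalg.eigvalsh(c4_plus_k1)))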