
The Taub Faculty of Computer Science Events and Talks

Pixel Club: Overcoming Critical Challenges - Towards Reliable, Enhanced, and Efficient VLMs
Nimrod Shabtay (Tel-Aviv University)
Tuesday, 02.12.2025, 11:30
506, Zisapel Building

Vision-language models (VLMs) have achieved impressive performance across diverse tasks, yet they face critical challenges in how they are evaluated, how they use visual context, and how efficiently they process information. This talk presents three interconnected works addressing these limitations. First, LiveXiv provides contamination-free evaluation by automatically generating benchmarks from newly published scientific papers, revealing that some reported VLM improvements may stem from test-set contamination rather than genuine advances. Second, IPLoc exposes a surprising gap: current VLMs struggle with personalized object localization and fail to learn from visual examples the way humans naturally do. By teaching models to focus on contextual cues rather than relying solely on prior knowledge, we significantly improve their few-shot localization abilities. Finally, CARES addresses efficiency by recognizing that not all queries need high-resolution images. Using a lightweight module to predict the minimal sufficient resolution per query, we reduce computational costs by up to 80% while maintaining accuracy. Together, these works demonstrate that context-awareness (in evaluation, visual reasoning, and resource allocation) is essential for building VLMs that are more reliable, capable, and practical for real-world deployment.

Nimrod Shabtay is a PhD candidate at the Faculty of Engineering at Tel-Aviv University, supervised by Prof. Raja Giryes, and a research intern at IBM Research.
His research focuses on Large Multimodal Models (LMMs). He is particularly interested in overcoming the critical challenges that stand in the way of reliable, enhanced, and efficient LMMs.