The Taub Faculty of Computer Science Events and Talks

Enabling Single-Cell Foundation Models for Scarce Data via Dependency-Aware Masking
Speaker: Alon Hacohen (M.Sc. Thesis Seminar)
Date: Thursday, 16.04.2026, 14:00
Location: Taub 601 & Zoom
Advisor: Prof. Dvir Aran

The adaptation of Large Language Model architectures to computational biology has enabled Single-Cell Foundation Models that learn from single-cell RNA-sequencing data. However, many of these models rely on masking strategies borrowed from natural language processing. Unlike words in a sentence, gene expression is governed by highly correlated regulatory networks, which makes random masking and other structure-naive techniques biologically misaligned. Viewed through an information-theoretic lens, this introduces a key inefficiency: models can reconstruct masked genes from local correlations alone, which limits their ability to learn accurate biological representations based on higher-order structure and drives reliance on large datasets. This is a challenge in data-scarce settings such as rare disease cohorts or privacy-preserving environments.

To address this, we introduce domain-informed masking during pre-training. In this talk, we present CorrMask, a data-driven, dependency-aware masking scheme that leverages gene correlation structure to jointly mask related genes, encouraging learning from global cellular context. Across tissue-specific datasets, CorrMask matches baseline performance on both cell- and gene-level tasks using less data, with the strongest gains in underrepresented cell populations.
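
The abstract does not spell out how CorrMask forms its gene groups, so the following is only a minimal sketch of the general idea of jointly masking correlated genes. It assumes, purely for illustration, that genes are grouped as connected components of a graph whose edges link gene pairs with absolute Pearson correlation above a threshold, and that whole groups are then masked together. The function names (`correlated_groups`, `dependency_aware_mask`), the threshold, and the mask fraction are hypothetical and are not taken from the talk.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlated_groups(expr, threshold=0.8):
    """Group genes (columns of a cells x genes matrix) into connected components,
    linking two genes when |correlation| >= threshold (an assumed grouping rule)."""
    n_genes = len(expr[0])
    columns = [[row[g] for row in expr] for g in range(n_genes)]
    parent = list(range(n_genes))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for g in range(n_genes):
        groups.setdefault(find(g), []).append(g)
    return list(groups.values())


def dependency_aware_mask(groups, n_genes, mask_fraction=0.15, rng=None):
    """Mask whole groups, in random order, until at least the target number
    of genes (mask_fraction of n_genes, minimum 1) is masked."""
    rng = rng or random.Random()
    target = max(1, round(mask_fraction * n_genes))
    order = list(groups)
    rng.shuffle(order)
    masked = set()
    for group in order:
        if len(masked) >= target:
            break
        masked.update(group)
    return masked


# Toy expression matrix (4 cells x 4 genes): genes 0 and 1 are strongly
# correlated, and genes 2 and 3 are strongly anti-correlated.
expr = [
    [1.0, 2.0, 5.0, 0.0],
    [2.0, 4.1, 1.0, 3.0],
    [3.0, 6.0, 4.0, 1.0],
    [4.0, 8.2, 2.0, 2.0],
]
groups = correlated_groups(expr, threshold=0.8)
masked = dependency_aware_mask(groups, n_genes=4, mask_fraction=0.25,
                               rng=random.Random(0))
```

In this toy setting a masked gene never has its strongly correlated partner left visible, so a model could not reconstruct it from that single local correlation. Because whole groups are masked, the realized mask fraction can exceed the nominal one; a real scheme would need to decide how to handle that.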

These results position CorrMask as an effective “data multiplier” for enabling efficient, biologically grounded foundation models, with broader implications for predictive modeling in our field.