Understanding how large language models store, retain, and remove knowledge is critical for interpretability, reliability, and compliance with privacy regulations.
My work introduces a geometric perspective on memorization and unlearning through the Input Loss Landscape, which examines how a model's loss behaves over semantically similar inputs.
I show that retained, forgotten, and unseen examples exhibit distinct loss patterns that reflect active learning, suppressed knowledge, and ignored information, respectively.
Building on this observation, I propose REMIND (Residual Memorization In Neighborhood Dynamics), a black-box framework for diagnosing residual memorization. I further introduce a new semantic neighbor generation method that enables controlled exploration of local loss geometry.
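To make the idea concrete, the sketch below (not the REMIND implementation itself) illustrates one way a black-box loss-profile probe could look: it computes a model's loss on a target example and on semantically similar neighbors, then compares the two. The model choice, the hand-written paraphrase neighbors, and the `neighborhood_loss_profile` helper are illustrative assumptions rather than the method proposed here.

```python
# A minimal sketch, assuming a Hugging Face causal LM and hand-written
# paraphrases standing in for a semantic neighbor generation method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: any causal LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


@torch.no_grad()
def sequence_loss(text: str) -> float:
    """Average next-token cross-entropy of `text` under the model."""
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs, labels=inputs["input_ids"])
    return outputs.loss.item()


def neighborhood_loss_profile(target: str, neighbors: list[str]) -> dict:
    """Compare the loss on a target example with losses over its neighbors."""
    target_loss = sequence_loss(target)
    neighbor_losses = [sequence_loss(n) for n in neighbors]
    neighbor_mean = sum(neighbor_losses) / len(neighbor_losses)
    return {
        "target_loss": target_loss,
        "neighbor_mean": neighbor_mean,
        # A target loss far below its neighborhood mean is one signal that
        # the exact example is memorized rather than generalized.
        "gap": neighbor_mean - target_loss,
    }


# Hypothetical usage: in practice the neighbors would come from a
# controlled semantic neighbor generation step.
target = "The Eiffel Tower is located in Paris."
neighbors = [
    "The Eiffel Tower can be found in Paris.",
    "Paris is home to the Eiffel Tower.",
    "The Eiffel Tower stands in the city of Paris.",
]
print(neighborhood_loss_profile(target, neighbors))
```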
These contributions provide interpretable insights into knowledge retention and forgetting, and offer practical tools for auditing, debugging, and improving the transparency of large language models.