Technical Report MSC-2021-29

Title: How Debiasing Affects Internal Representations in Natural Language Understanding Models
Authors: Michael Mendelson
Supervisors: Yonatan Belinkov
PDF: Currently accessible only within the Technion network
Abstract: Natural language processing models are prone to adopting biases or spurious correlations found in the data. Models that rely on these correlations, rather than on the actual meaning of the input, can perform poorly in real-world settings. Model robustness to bias is therefore often measured by generalization to carefully designed out-of-distribution datasets (challenge sets). To mitigate artifacts and bias in model predictions, recent debiasing methods in natural language understanding (NLU) improve performance on such datasets by pressuring models into making unbiased predictions. An underlying assumption behind such methods is that this also leads to the discovery of more robust features in the model’s inner representations.

We propose a general probing-based framework that allows for post-hoc interpretation of biases in language models, and use an information-theoretic approach to measure the extractability of certain biases from the model's representations. By defining "bias-revealing" properties, we are able to measure the amount of information on bias available in model representations. Our framework is easily extensible to other domains.

We experiment with several NLU datasets and known biases and find consistent results across various combinations. We analyze models trained for natural language inference and fact verification (both challenging NLU tasks) and show that, counter-intuitively, the more a language model is pushed towards a debiased regime, the more bias is actually available in its inner representations.
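As an illustration of what measuring the extractability of a bias-revealing property can look like, the sketch below computes an online (prequential) codelength for a probe trained on frozen representations. The abstract does not say which information-theoretic measure the report uses, so this choice is an assumption. The function names (`make_data`, `train_probe`, `online_codelength`, `compression`), the binary property, and the two-dimensional synthetic "representations" are all hypothetical stand-ins, not the report's data or code.

```python
import math
import random

def make_data(informative, n=200, seed=0):
    """Synthetic stand-in for representations plus a binary bias-revealing label."""
    rng = random.Random(seed)
    ys = [i % 2 for i in range(n)]
    xs = []
    for y in ys:
        # When informative, the first dimension linearly encodes the label.
        signal = (2 * y - 1) if informative else 0.0
        xs.append([signal + rng.gauss(0, 0.5), rng.gauss(0, 1.0)])
    return xs, ys

def prob_one(w, b, x):
    """Probe's probability that the label is 1 for representation x."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def train_probe(xs, ys, epochs=200, lr=0.5):
    """Fit a logistic-regression probe with plain batch gradient descent."""
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = prob_one(w, b, x) - y
            for j in range(dim):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * g / len(xs) for wj, g in zip(w, gw)]
        b -= lr * gb / len(xs)
    return w, b

def online_codelength(xs, ys, fractions=(0.1, 0.2, 0.4, 0.8, 1.0)):
    """Bits needed to transmit the labels block by block.

    The first block is sent with a uniform code (1 bit per binary label);
    each later block is coded with a probe trained on all earlier blocks.
    Assumes the fractions give strictly increasing cut points.
    """
    n = len(xs)
    cuts = [max(1, int(f * n)) for f in fractions]
    bits = float(cuts[0])
    for start, end in zip(cuts, cuts[1:]):
        w, b = train_probe(xs[:start], ys[:start])
        for x, y in zip(xs[start:end], ys[start:end]):
            p = prob_one(w, b, x)
            p_y = p if y == 1 else 1.0 - p
            bits += -math.log2(max(p_y, 1e-12))
    return bits

def compression(xs, ys):
    """Uniform codelength (n bits for binary labels) over online codelength."""
    return len(xs) / online_codelength(xs, ys)

if __name__ == "__main__":
    print("encoded property:", compression(*make_data(True)))
    print("unrelated property:", compression(*make_data(False)))
```

Under this measure, a higher compression value means the property is easier to extract from the representations. Comparing that value between a baseline model and its debiased counterpart is the kind of comparison the abstract describes, though the numbers here come from toy data only.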

Copyright: The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.

Remark: Any link to this technical report should be to this page, rather than to the URL of the PDF files directly. The latter URLs may change without notice.


Computer science department, Technion