Simple and Effective Specialized
Representations for Fair Classifiers

1. Human Inspired Technology Research Center, University of Padua (Padova), Italy
2. Department of Information Engineering, University of Padua (Padova), Italy

* Equal contribution


Proceedings of the 39th Annual Conference on Neural Information Processing Systems (NeurIPS), San Diego, California, 2025.


            
    @inproceedings{SinigagliaSartorCecconSusto2025,
        author    = {Alberto Sinigaglia and Davide Sartor and Marina Ceccon and Gian Antonio Susto},
        title     = {Simple and Effective Specialized Representations for Fair Classifiers},
        booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
        year      = {2025},
    }
          

TL;DR

  1. We learn specialized fair representations without adversarial training using a Characteristic Function Distance penalty (FmCF).
  2. We introduce a simpler, classifier-focused variant (FmSS) that matches first/second moments and yields provable guarantees for logistic downstream models.
  3. Across standard fairness benchmarks, our methods match or surpass prior work on accuracy while reducing sensitive-attribute leakage.

Abstract

Fair classification is a critical challenge that has gained increasing importance due to international regulations and its growing use in high-stakes decision-making settings. Existing methods often rely on adversarial learning or distribution matching across sensitive groups; however, adversarial learning can be unstable, and distribution matching can be computationally intensive. To address these limitations, we propose a novel approach based on the characteristic function distance. Our method ensures that the learned representation contains minimal sensitive information while maintaining high effectiveness for downstream tasks. By using characteristic functions, we obtain a more stable and efficient solution than traditional methods. Additionally, we introduce a simple relaxation of the objective function that guarantees fairness in common classification models with no performance degradation. Experimental results on benchmark datasets demonstrate that our approach consistently matches or surpasses existing methods in both fairness and predictive accuracy. Moreover, our method maintains robustness and computational efficiency, making it a practical solution for real-world applications.

Overview of the proposed approach

Overview of the framework: the encoder maps inputs $X$ to representations $Z$ that retain task-relevant structure while minimizing sensitive information via (i) CF matching (FmCF) or (ii) sufficient-statistics alignment (FmSS).

Method at a Glance

FmCF — Fairness matching Characteristic Function

For each group $s \in \mathcal{S}$, we penalize the Characteristic Function Distance (CFD) between $\mathbb{P}(Z\mid S{=}s)$ and a target (e.g., standard Normal):

$$\operatorname{CFD}_{\mathbb{P}_T}^2\big(\mathbb{P}(Z\mid S{=}s),\, \mathcal{N}(0, I)\big)\;=\; \mathbb{E}_{T\sim\mathbb{P}_T}\big[\,\lvert \varphi_{Z\mid s}(T) - e^{-\|T\|^2/2}\rvert^2\,\big].$$

Monte Carlo draws $T\!\sim\!\mathbb{P}_T$ make the objective differentiable and easy to use with any encoder.
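A minimal PyTorch sketch of this penalty (illustrative, not the authors' released code), assuming the frequency distribution $\mathbb{P}_T$ is a standard Normal; the names `cfd_penalty` and `fmcf_loss` are placeholders:

```python
# Sketch of the FmCF penalty: squared CF distance between the representations
# of each sensitive group and a standard Normal target, estimated with
# Monte Carlo frequencies T ~ N(0, I). Assumed, not the authors' exact code.
import torch

def cfd_penalty(z: torch.Tensor, num_freqs: int = 64) -> torch.Tensor:
    """z: [batch, dim] representations of one sensitive group."""
    t = torch.randn(num_freqs, z.shape[1], device=z.device)   # sampled frequencies
    proj = z @ t.T                                             # [batch, num_freqs] inner products t^T z
    ecf_real = torch.cos(proj).mean(dim=0)                     # Re of empirical CF
    ecf_imag = torch.sin(proj).mean(dim=0)                     # Im of empirical CF
    target = torch.exp(-0.5 * (t ** 2).sum(dim=1))             # CF of N(0, I): exp(-||t||^2 / 2)
    return ((ecf_real - target) ** 2 + ecf_imag ** 2).mean()

def fmcf_loss(z: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
    # Sum the penalty over the sensitive groups present in the batch.
    return sum(cfd_penalty(z[s == g]) for g in torch.unique(s))
```

Because the empirical characteristic function is a smooth function of $Z$, the penalty backpropagates through any encoder without an adversary.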

FmCF Overview

FmSS — Fairness matching Sufficient Statistics

For classification, matching the first two moments of $\mathbb{P}(Z\mid S)$ is sufficient to limit a logistic adversary. We minimize the KL to $\mathcal{N}(0,1)$ per group:

$$\mathrm{KL}\big(\mathcal{N}(\mu_s,\sigma_s^2)\,\|\,\mathcal{N}(0,1)\big) = \tfrac{1}{2}\,\big(\sigma_s^2 + \mu_s^2 - 1 - \log \sigma_s^2\big).$$

This yields provable guarantees for logistic downstream models while remaining lightweight and stable in practice.
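A corresponding sketch of the FmSS penalty (illustrative placeholder code, not the authors' implementation): per group and per representation dimension, the closed-form KL above is applied to the empirical Gaussian moments of $Z\mid S{=}s$.

```python
# Sketch of the FmSS penalty: per-group, per-dimension KL between the
# empirical moments of Z|S=s and N(0, 1). Illustrative code only.
import torch

def fmss_penalty(z: torch.Tensor, s: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    loss = z.new_zeros(())
    for g in torch.unique(s):
        zg = z[s == g]                                  # representations of group g
        mu = zg.mean(dim=0)                             # per-dimension mean
        var = zg.var(dim=0, unbiased=False) + eps       # per-dimension variance
        # KL( N(mu, var) || N(0, 1) ) = 0.5 * (var + mu^2 - 1 - log var)
        loss = loss + 0.5 * (var + mu ** 2 - 1.0 - torch.log(var)).mean()
    return loss
```

Only batch means and variances are needed, so the penalty adds negligible cost on top of the task loss.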

FmSS Overview

Fairness Guarantees & Practical Notes

Provable control (FmSS)

Matching group-wise means and variances makes $Z$ uninformative for a logistic adversary (coefficients trend to zero as moments align), delivering post-hoc fairness certificates for that model family.
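One way to check this empirically is a post-hoc leakage probe: fit a logistic-regression adversary that tries to recover $S$ from $Z$ and report its held-out accuracy and coefficient norm. The scikit-learn sketch below is an assumed diagnostic, not part of the method itself, and it assumes a binary sensitive attribute encoded as 0/1:

```python
# Illustrative post-hoc leakage check: a logistic adversary trying to
# recover the sensitive attribute S from the learned representation Z.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def leakage_report(Z: np.ndarray, S: np.ndarray) -> None:
    Z_tr, Z_te, S_tr, S_te = train_test_split(Z, S, test_size=0.3, random_state=0)
    adv = LogisticRegression(max_iter=1000).fit(Z_tr, S_tr)
    acc = adv.score(Z_te, S_te)
    chance = max(np.mean(S_te), 1 - np.mean(S_te))   # majority-class baseline for binary S
    print(f"adversary accuracy: {acc:.3f} (chance ~ {chance:.3f})")
    print(f"coefficient L2 norm: {np.linalg.norm(adv.coef_):.3f}")
```

Accuracy near chance and a small coefficient norm indicate that the representation carries little usable sensitive information for this model family.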

Practical deployment

Unlike several competing approaches, the trained classifier does not require the sensitive attribute at inference. The penalties add little overhead and plug into standard PyTorch training loops.
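As a hypothetical illustration of how the penalty slots into a standard loop (names such as `encoder`, `classifier`, `lambda_fair`, and the `fmss_penalty` sketched above are placeholders, not the authors' training code):

```python
# Hypothetical PyTorch training step: the fairness penalty simply adds to
# the task loss; the sensitive attribute s is used only during training.
import torch
import torch.nn.functional as F

def train_step(encoder, classifier, optimizer, x, y, s, lambda_fair=1.0):
    z = encoder(x)                               # fair representation
    logits = classifier(z)                       # downstream prediction
    loss = F.cross_entropy(logits, y) + lambda_fair * fmss_penalty(z, s)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference time only `encoder` and `classifier` are applied to `x`, so the sensitive attribute is never required once training is done.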