Conference papers

Learning from Biased Data: A Semi-Parametric Approach

Abstract: We consider risk minimization problems where the (source) distribution $P_S$ of the training observations $Z_1, \ldots, Z_n$ differs from the (target) distribution $P_T$ involved in the risk that one seeks to minimize. Under the natural assumption that $P_S$ dominates $P_T$, i.e. $P_T \ll P_S$, we develop a semiparametric framework in the situation where we do not observe any sample from $P_T$, but rather have access to some auxiliary information at the target population scale. More precisely, assuming that the Radon-Nikodym derivative $dP_T/dP_S(z)$ belongs to a parametric class $\{g(z, \alpha), \alpha \in A\}$ and that some (generalized) moments of $P_T$ are available to the learner, we propose a two-step learning procedure to perform the risk minimization task. We first select $\hat{\alpha}$ so as to match the moment constraints as closely as possible and then reweight each (biased) training observation $Z_i$ by $g(Z_i, \hat{\alpha})$ in the final Empirical Risk Minimization (ERM) algorithm. We establish a $O_P(1/\sqrt{n})$ generalization bound proving that, remarkably, the solution to the weighted ERM problem thus constructed achieves a learning rate of the same order as that attained in absence of any sampling bias. Beyond these theoretical guarantees, numerical results providing strong empirical evidence of the relevance of the approach promoted in this article are displayed.
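The two-step procedure described in the abstract can be illustrated with a small toy sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the Gaussian mean-shift family g(x, α) = exp(αx − α²/2), the use of a single known target moment (the target mean of X), the grid search in step 1, and the choice of a linear least-squares model in step 2. The function names `g`, `fit_alpha` and `weighted_least_squares` are hypothetical.

```python
import math
import random


def g(x, alpha):
    """Assumed parametric density ratio dP_T/dP_S: N(alpha, 1) against N(0, 1)."""
    return math.exp(alpha * x - 0.5 * alpha ** 2)


def fit_alpha(xs, target_mean, grid):
    """Step 1: choose alpha so the reweighted source mean matches the known target mean."""
    def gap(alpha):
        reweighted_mean = sum(g(x, alpha) * x for x in xs) / len(xs)
        return abs(reweighted_mean - target_mean)
    return min(grid, key=gap)


def weighted_least_squares(xs, ys, ws):
    """Step 2: weighted ERM with squared loss over lines y = a + b * x (closed form)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    return my - b * mx, b


random.seed(0)
n = 20000
true_alpha = 1.0  # the target has X ~ N(1, 1); it is never sampled

# Biased (source) training sample: X ~ N(0, 1), Y = X^2 + small noise.
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [x ** 2 + random.gauss(0.0, 0.1) for x in xs]

# Auxiliary information assumed available at the target scale: E_T[X].
target_mean = true_alpha

grid = [0.05 * i for i in range(41)]  # candidate alphas in [0, 2]
alpha_hat = fit_alpha(xs, target_mean, grid)

weights = [g(x, alpha_hat) for x in xs]
a_w, b_w = weighted_least_squares(xs, ys, weights)    # reweighted ERM
a_u, b_u = weighted_least_squares(xs, ys, [1.0] * n)  # naive ERM on biased data

print(f"alpha_hat = {alpha_hat:.2f}")
print(f"weighted fit:   intercept = {a_w:.2f}, slope = {b_w:.2f}")
print(f"unweighted fit: intercept = {a_u:.2f}, slope = {b_u:.2f}")
```

In this toy setting the best linear predictor of Y under the target has slope 2 and intercept 0, whereas under the source it has slope 0 and intercept 1, so the gap between the weighted and unweighted fits shows what the reweighting is meant to correct.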

https://hal.telecom-paris.fr/hal-03559370
Contributor : Stephan Clémençon
Submitted on : Sunday, February 6, 2022 - 4:20:02 PM
Last modification on : Friday, February 18, 2022 - 3:32:50 AM
Long-term archiving on : Saturday, May 7, 2022 - 6:09:34 PM

File

bertail21a.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03559370, version 1

Citation

Stéphan Clémençon, Patrice Bertail, Yannick Guyonvarch, Nathan Noiry. Learning from Biased Data: A Semi-Parametric Approach. 38th International Conference on Machine Learning (ICML 2021), 2021, Bilbao, Spain. ⟨hal-03559370⟩