AUC optimization and the two-sample problem

Stéphan Clémençon; Marine Depecker; Nicolas Vayatis

Conference proceedings, Year: 2009


Stéphan Clémençon

Role: Author
PersonId : 174491
IdHAL : stephan-clemencon
ORCID : 0000-0002-5879-9500
IdRef : 08905203X

Laboratoire Traitement et Communication de l'Information

Département Images, Données, Signal

Marine Depecker

Role: Author

Laboratoire Traitement et Communication de l'Information

Nicolas Vayatis

Role: Author
PersonId : 848026

Centre de Mathématiques et de Leurs Applications

Abstract

The purpose of this paper is to explore the connection between multivariate homogeneity tests and AUC optimization, a problem that has recently received much attention in the statistical learning literature. Starting from the elementary observation that, in the two-sample setup, the null hypothesis corresponds to the situation where the area under the optimal ROC curve equals 1/2, we propose a two-stage testing method based on data splitting. A nearly optimal scoring function in the AUC sense is first learnt from one of the two half-samples. Data from the remaining half-sample are then projected onto the real line and ranked according to the scoring function computed at the first stage. The last step amounts to performing a standard Mann-Whitney-Wilcoxon test in the one-dimensional framework. We show that the learning step of the procedure affects neither the consistency of the test nor its power, provided the ranking produced is accurate enough in the AUC sense. Finally, the results of a numerical experiment are presented to illustrate the efficiency of the method.
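The two-stage procedure described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`fit_mean_difference_scorer`, `mann_whitney_u`, `two_stage_test`) are hypothetical, the scoring function is a crude linear stand-in for a genuine AUC-optimizing ranker, and the p-value uses the usual normal approximation of the Mann-Whitney-Wilcoxon statistic with a one-sided alternative (empirical AUC above 1/2) and no tie correction.

```python
import math
import random


def fit_mean_difference_scorer(first, second):
    """Stage 1 (stand-in, an assumption of this sketch): a linear score along
    the difference of the two sample means. Any scoring rule with near-optimal
    AUC could be plugged in here instead."""
    dim = len(first[0])
    mean_first = [sum(x[j] for x in first) / len(first) for j in range(dim)]
    mean_second = [sum(x[j] for x in second) / len(second) for j in range(dim)]
    w = [a - b for a, b in zip(mean_first, mean_second)]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def mann_whitney_u(scores_first, scores_second):
    """U statistic: number of pairs where the first-sample score exceeds the
    second-sample score, with ties counted as 1/2."""
    u = 0.0
    for s1 in scores_first:
        for s2 in scores_second:
            if s1 > s2:
                u += 1.0
            elif s1 == s2:
                u += 0.5
    return u


def two_stage_test(sample_x, sample_y, seed=0):
    """Split each sample in two, learn a scorer on the first halves, then run
    a one-dimensional Mann-Whitney-Wilcoxon test on the scored second halves.
    Returns (empirical AUC on the held-out halves, one-sided p-value)."""
    rng = random.Random(seed)
    x, y = list(sample_x), list(sample_y)
    rng.shuffle(x)
    rng.shuffle(y)
    half_x, half_y = len(x) // 2, len(y) // 2

    # Stage 1: learn the scoring function on the first half-samples only.
    scorer = fit_mean_difference_scorer(x[:half_x], y[:half_y])

    # Stage 2: project the held-out half-samples onto the real line.
    scores_x = [scorer(v) for v in x[half_x:]]
    scores_y = [scorer(v) for v in y[half_y:]]
    n, m = len(scores_x), len(scores_y)

    u = mann_whitney_u(scores_x, scores_y)
    auc = u / (n * m)  # empirical AUC equals the normalized U statistic

    # Normal approximation of U under the null (AUC = 1/2), no tie correction.
    mean_u = n * m / 2.0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z)
    return auc, p_value


if __name__ == "__main__":
    # Synthetic illustration: two Gaussian samples in dimension 3 whose
    # means differ (made-up data, not the experiment from the paper).
    data_rng = random.Random(1)
    xs = [[data_rng.gauss(1.0, 1.0) for _ in range(3)] for _ in range(200)]
    ys = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
    auc, p_value = two_stage_test(xs, ys)
    print(f"held-out AUC = {auc:.3f}, one-sided p-value = {p_value:.3g}")
```

Because the scorer is fitted on one half of the data and the test is run on the other, the held-out scores are independent of the learning step given the scorer, which is what lets the one-dimensional rank test be applied as usual.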

Subject areas

Mathematics [math]; Probability [math.PR]; Statistics [math.ST]; Machine Learning [stat.ML]

Main file

3838-auc-optimization-and-the-two-sample-problem.pdf (242.26 KB)

Origin: Files produced by the author(s)


https://telecom-paris.hal.science/hal-02107262

Submitted on: Tuesday, 23 April 2019, 15:47:55

Last modified on: Monday, 8 April 2024, 12:24:02

Dates and versions

hal-02107262 , version 1 (23-04-2019)

Identifiers

HAL Id: hal-02107262, version 1

Cite

Stéphan Clémençon, Marine Depecker, Nicolas Vayatis. AUC optimization and the two-sample problem. 2009, Advances in Neural Information Processing Systems 22. ⟨hal-02107262⟩


Collections

INSTITUT-TELECOM CNRS ENS-CACHAN PARISTECH LTCI IDS S2A ENS-PARIS-SACLAY

185 views

239 downloads
