Book sections

On U-processes and clustering performance

Abstract: Many clustering techniques aim at optimizing empirical criteria that take the form of a U-statistic of degree two. Given a measure of dissimilarity between pairs of observations, the goal is to minimize the within-cluster point scatter over a class of partitions of the feature space. The purpose of this paper is to define a general statistical framework, relying on the theory of U-processes, for studying the performance of such clustering methods. In this setup, under adequate assumptions on the complexity of the subsets forming the partition candidates, the excess of clustering risk is proved to be of order $O_{\mathbb{P}}(1/\sqrt{n})$. Based on recent results on the tail behavior of degenerate U-processes, it is also shown how to establish tighter rate bounds. Model selection issues, in particular those related to the number of clusters forming the data partition, are also considered.
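As an illustration of the criterion described in the abstract, the within-cluster point scatter can be sketched as an average, over all pairs of observations, of the dissimilarity between the two points when both fall in the same cell of the partition (and zero otherwise). The sketch below is a minimal example under stated assumptions, not the paper's implementation: the function names, the 2/(n(n-1)) normalization, the squared Euclidean dissimilarity, and the toy data are all chosen here for illustration.

```python
from itertools import combinations


def within_cluster_scatter(points, labels, dissimilarity):
    """Empirical within-cluster point scatter, written as a U-statistic of degree two.

    Averages dissimilarity(x_i, x_j) over all unordered pairs i < j,
    counting a pair only when both points carry the same cluster label.
    The 2 / (n (n - 1)) normalization is an assumption of this sketch.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two observations")
    total = 0.0
    for i, j in combinations(range(n), 2):
        if labels[i] == labels[j]:
            total += dissimilarity(points[i], points[j])
    return 2.0 * total / (n * (n - 1))


def squared_euclidean(x, y):
    """Squared Euclidean distance, used here as an example dissimilarity."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


# Hypothetical toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good_partition = [0, 0, 1, 1]  # groups nearby points together
bad_partition = [0, 1, 0, 1]   # mixes the two groups

good_risk = within_cluster_scatter(points, good_partition, squared_euclidean)
bad_risk = within_cluster_scatter(points, bad_partition, squared_euclidean)
print(good_risk, bad_risk)
```

On this toy data the partition that groups nearby points yields the smaller scatter, which is the quantity a clustering method of this type would try to minimize over its class of candidate partitions.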
Contributor: Stephan Clémençon
Submitted on: Tuesday, April 23, 2019 - 4:13:22 PM
Last modification on: Tuesday, October 19, 2021 - 11:14:12 AM




  • HAL Id : hal-02107349, version 1



Stéphan Clémençon. On U-processes and clustering performance. On U-processes and clustering performance, 2011. ⟨hal-02107349⟩


