G. Alain and Y. Bengio, What regularized auto-encoders learn from the data-generating distribution, J. Mach. Learn. Res, vol.15, issue.1, pp.3563-3593, 2014.

N. Alon, Y. Matias, and M. Szegedy, The space complexity of approximating the frequency moments, Journal of Computer and system sciences, vol.58, issue.1, pp.137-147, 1999.

M. A. Álvarez, L. Rosasco, and N. D. Lawrence, Kernels for vector-valued functions: A review, Foundations and Trends in Machine Learning, vol.4, pp.195-266, 2012.

M. A. Arcones and E. Giné, Limit theorems for U-processes, The Annals of Probability, pp.1494-1542, 1993.

N. Aronszajn, Theory of reproducing kernels, Transactions of the American Mathematical Society, vol.10, p.22, 1950.

J. Audibert and O. Catoni, Robust linear least squares regression, The Annals of Statistics, vol.39, issue.5, pp.2766-2794, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00522534

J. Audiffren and H. Kadri, Stability of multi-task kernel regression algorithms, Asian Conference on Machine Learning, pp.1-16, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00834994

G. Ausset, S. Clémençon, and F. Portier, Empirical Risk Minimization under Random Censorship: Theory and Practice, vol.166, p.167, 2019.

F. R. Bach and M. I. Jordan, Kernel independent component analysis, Journal of machine learning research, vol.3, pp.1-48, 2002.

P. Baldi, Autoencoders, unsupervised learning, and deep architectures, Proceedings of ICML Workshop on Unsupervised and Transfer Learning, pp.37-49, 2012.

P. L. Bartlett and S. Mendelson, Rademacher and gaussian complexities: Risk bounds and structural results, Journal of Machine Learning Research, vol.3, pp.463-482, 2002.

H. H. Bauschke and P. L. Combettes, Convex analysis and monotone operator theory in Hilbert spaces, Springer. Pages, vol.408, p.79, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00643354

A. Bellet and A. Habrard, Robustness and Generalization for Metric Learning, Neurocomputing, vol.151, issue.1, pp.259-267, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01075370

A. Bellet, A. Habrard, and M. Sebban, Similarity Learning for Provably Accurate Sparse Linear Classification, p.107, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00708401

A. Bellet, A. Habrard, and M. Sebban, A Survey on Metric Learning for Feature Vectors and Structured Data, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01666935

A. Bellet, A. Habrard, and M. Sebban, Metric Learning, p.107, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01121733

S. Ben-david, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira et al., A theory of learning from different domains, Machine Learning, vol.32, p.166, 2010.

Y. Bengio, A. Courville, and P. Vincent, Representation learning: a review and new perspectives, IEEE transactions on pattern analysis and machine intelligence, vol.35, p.72, 2013.

Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, Greedy layer-wise training of deep networks, Advances in neural information processing systems, pp.153-160, 2007.

Y. Bengio, L. Yao, G. Alain, and P. Vincent, Generalized denoising autoencoders as generative models, Advances in neural information processing systems, pp.899-907, 2013.

A. Berlinet and C. Thomas-agnan, Reproducing kernel Hilbert spaces in probability and statistics, p.25, 2011.

P. Bertail and J. Tressou, Incomplete generalized u-statistics for food risk assessment, Biometrics, vol.62, issue.1, pp.66-74, 2006.
URL : https://hal.archives-ouvertes.fr/hal-01068794

G. Blom, Some properties of incomplete U-statistics, Biometrika, vol.63, issue.3, pp.573-580, 1976.

B. Bohn, C. Rieger, and M. Griebel, A representer theorem for deep kernel learning, Journal of Machine Learning Research, vol.20, issue.64, pp.1-32, 2019.

T. Bolukbasi, K. Chang, J. Zou, V. Saligrama, and A. Kalai, Man is to computer programmer as woman is to homemaker? debiasing word embeddings, Advances in Neural Information Processing Systems (NIPS), pp.4349-4357, 2016.

L. Bottou, Online learning and stochastic approximations, vol.17, pp.142-154, 1998.

S. Boucheron, O. Bousquet, and G. Lugosi, Theory of classification : a survey of some recent advances, ESAIM: Probability and Statistics, vol.9, pp.323-375, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00017923

S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence, OUP Oxford. Pages, vol.11, p.166, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00794821

H. Bourlard and Y. Kamp, Auto-association by multilayer perceptrons and singular value decomposition, Biological cybernetics, vol.59, issue.4, p.39, 1988.

O. Bousquet and A. Elisseeff, Stability and generalization, Journal of Machine Learning Research, vol.2, pp.499-526, 2002.

S. Boyd and L. Vandenberghe, Convex optimization, p.78, 2004.

R. Brault, M. Heinonen, and F. d'Alché-Buc, Random Fourier features for operator-valued kernels, Asian Conference on Machine Learning, vol.60, p.203, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01313005

R. Brault, A. Lambert, Z. Szabo, M. Sangnier, and F. d'Alché-Buc, Infinite task learning in RKHSs, The 22nd International Conference on Artificial Intelligence and Statistics, vol.28, p.94, 2019.

C. Brouard, F. d'Alché-Buc, and M. Szafranski, Semi-supervised penalized output kernel regression for link prediction, International Conference on Machine Learning (ICML), pp.593-600, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00654123

C. Brouard, H. Shen, K. Dührkop, F. d'Alché-Buc, S. Böcker et al., Fast metabolite identification with input output kernel regression, Bioinformatics, vol.32, issue.12, p.99, 2016.
URL : https://hal.archives-ouvertes.fr/hal-02637720

C. Brouard, M. Szafranski, and F. d'Alché-Buc, Input output kernel regression: supervised and semi-supervised structured output prediction with operator-valued kernels, The Journal of Machine Learning Research, vol.17, issue.1, p.164, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01216708

C. Brownlees, E. Joly, and G. Lugosi, Empirical risk minimization for heavy-tailed losses, The Annals of Statistics, vol.43, issue.6, pp.2507-2536, 2015.

J. Bruna, W. Zaremba, A. Szlam, and Y. Lecun, Spectral networks and locally connected networks on graphs, p.35, 2013.

S. Bubeck, N. Cesa-bianchi, and G. Lugosi, Bandits with heavy tail, IEEE Transactions on Information Theory, vol.59, issue.11, pp.7711-7717, 2013.

K. Burns, L. Hendricks, K. Saenko, T. Darrell, and A. Rohrbach, Women also snowboard: Overcoming bias in captioning models, vol.15, p.166, 2018.

H. Callaert and P. Janssen, The Berry-Esseen theorem for U-statistics, The Annals of Statistics, vol.6, issue.2, pp.417-421, 1978.

A. Caponnetto, C. A. Micchelli, M. Pontil, and Y. Ying, Universal multitask kernels, Journal of Machine Learning Research, vol.9, pp.1615-1646, 2008.

C. Carmeli, E. De-vito, and A. Toigo, Vector valued reproducing kernel hilbert spaces of integrable functions and mercer theorem, Analysis and Applications, vol.4, issue.04, pp.377-408, 2006.

C. Carmeli, E. De-vito, A. Toigo, and V. Umanità, Vector valued reproducing kernel Hilbert spaces and universality, Analysis and Applications, vol.8, issue.01, p.96, 2010.

O. Catoni, Challenging the empirical mean and empirical variance: a deviation study, Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, vol.48, p.136, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00517206

F. Chatelin, Spectral approximation of linear operators, p.98, 2011.

F. Chollet, , 2015.

C. Ciliberto, L. Rosasco, and A. Rudi, A consistent regularization approach for structured prediction, Advances in Neural Information Processing Systems (NIPS) 29, pp.4412-4420, 2016.

S. Clémençon, A statistical view of clustering performance through the theory of U-processes, Journal of Multivariate Analysis, vol.124, pp.42-56, 2014.

S. Clémençon, P. Bertail, and E. Chautru, Sampling and empirical risk minimization, Statistics, vol.51, issue.1, pp.30-42, 2017.

S. Clémençon, I. Colin, and A. Bellet, Scaling-up Empirical Risk Minimization: Optimization of Incomplete U-statistics, Journal of Machine Learning Research, vol.17, pp.1-36, 2016.

S. Clémençon, G. Lugosi, and N. Vayatis, Ranking and scoring using empirical risk minimization, Proceedings of COLT, p.107, 2005.

S. Clémençon, G. Lugosi, and N. Vayatis, Ranking and empirical risk minimization of U-statistics, The Annals of Statistics, vol.36, issue.2, p.161, 2008.

S. Clémençon, S. Robbiano, and J. Tressou, Maximal deviations of incomplete u-statistics with applications to empirical risk sampling, Proceedings of the 2013 SIAM International Conference on Data Mining, p.147, 2013.

C. Cortes, M. Mohri, and J. Weston, A general regression technique for learning transductions, International Conference on Machine Learning (ICML), pp.153-160, 2005.

C. Cortes and V. Vapnik, Support-vector networks, Machine learning, vol.20, issue.3, p.79, 1995.

M. Cuturi, J. Vert, O. Birkenes, and T. Matsui, A kernel for time series based on global alignments, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP'07, vol.2, p.24, 2007.

B. Dai, B. Xie, N. He, Y. Liang, A. Raj et al., Scalable kernel methods via doubly stochastic gradients, Advances in Neural Information Processing Systems, pp.3041-3049, 2014.

V. H. De-la-peña, Decoupling and Khintchine's inequalities for U-statistics, The Annals of Probability, pp.1877-1892, 1992.

V. H. De-la-peña and E. Giné, Decoupling: from dependence to independence. Probability and its Applications, Pages, vol.113, p.161, 1999.

L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition, p.166, 1996.

L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition, p.11, 1996.

L. Devroye, M. Lerasle, G. Lugosi, and R. I. Oliveira, Sub-gaussian mean estimators, The Annals of Statistics, vol.44, issue.6, pp.2695-2725, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01204519

I. S. Dhillon, Y. Guan, and B. Kulis, Kernel k-means: spectral clustering and normalized cuts, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp.551-556, 2004.

F. Dinuzzo, C. Ong, P. Gehler, and G. Pillonetto, Learning output kernels with block coordinate descent, International Conference on Machine Learning (ICML), pp.49-56, 2011.

H. Drucker, C. J. Burges, L. Kaufman, A. J. Smola, and V. Vapnik, Support vector regression machines, Advances in neural information processing systems, pp.155-161, 1997.

J. Dubin and D. Rivers, Selection bias in linear regression, logit and probit models, Sociological Methods & Research, vol.18, issue.2-3, pp.360-390, 1989.

D. Dubois-laforgue, S. Caillat-zucman, C. Boitard, and J. Timsit, Clinical characteristics of type 2 diabetes in patients with mutations of hfe, Diabetes & metabolism, vol.26, issue.1, pp.65-68, 2000.

M. Dudík, S. Phillips, and R. Schapire, Correcting sample selection bias in maximum entropy density estimation, Advances in neural information processing systems, pp.323-330, 2006.

R. Dudley, Uniform Central Limit Theorems, p.185, 1999.

D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel et al., Convolutional networks on graphs for learning molecular fingerprints, Advances in neural information processing systems, pp.2224-2232, 2015.

D. Erhan, Y. Bengio, A. Courville, P. Manzagol, P. Vincent et al., Why does unsupervised pre-training help deep learning, Journal of Machine Learning Research, vol.11, pp.625-660, 2010.

D. Erhan, Y. Bengio, A. Courville, and P. Vincent, Visualizing higher-layer features of a deep network, Pages, vol.1341, issue.3, p.32, 2009.

L. Fei-fei, R. Fergus, and P. Perona, One-shot learning of object categories, IEEE transactions on pattern analysis and machine intelligence, vol.28, pp.594-611, 2006.

T. Gärtner, Kernels for Structured Data, volume 72 of Series in Machine Perception and Artificial Intelligence, p.24, 2008.

B. Gholami and A. Hajisami, Kernel autoencoder for semi-supervised hashing, Applications of Computer Vision (WACV), 2016 IEEE Winter Conference on, p.35, 2016.

R. Gill, Y. Vardi, and J. Wellner, Large sample theory of empirical distributions in biased sampling models, The Annals of Statistics, vol.16, issue.3, p.179, 1988.

E. Giné, R. Latała, and J. Zinn, Exponential and moment inequalities for U-statistics, High Dimensional Probability II, p.113, 2000.

E. Giné and J. Zinn, Some limit theorems for empirical processes, Ann. Probab, vol.12, issue.4, pp.929-998, 1984.

X. Glorot, A. Bordes, and Y. Bengio, Domain adaptation for large-scale sentiment classification: A deep learning approach, Proceedings of the 28th international conference on machine learning (ICML-11), pp.513-520, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00752091

C. Godsil and G. Royle, Algebraic Graph Theory, 2001.

I. J. Goodfellow, A. Courville, and Y. Bengio, Spike-and-slab sparse coding for unsupervised feature discovery, p.32, 2012.

M. Gori, G. Monfardini, and F. Scarselli, A new model for learning in graph domains, Proceedings. 2005 IEEE International Joint Conference on Neural Networks, vol.2, p.35, 2005.

W. F. Grams and R. Serfling, Convergence rates for u-statistics and related statistics, vol.1, pp.153-160, 1973.

J. Hajek, Asymptotic normality of simple linear rank statistics under alternatives, Ann. Math. Stat, vol.39, pp.325-346, 1968.

D. R. Hardoon, S. Szedmak, and J. Shawe-taylor, Canonical correlation analysis: An overview with application to learning methods, Neural computation, vol.16, issue.12, pp.2639-2664, 2004.

M. Hardt, B. Recht, and Y. Singer, Train faster, generalize better: Stability of stochastic gradient descent, p.51, 2015.

J. Heckman, Sample selection bias as a specification error, Econometrica: Journal of the econometric society, pp.153-161, 1979.

J. Heckman, Varieties of selection bias, The American Economic Review, vol.80, issue.2, p.166, 1990.

G. E. Hinton, S. Osindero, and Y. Teh, A fast learning algorithm for deep belief nets, Neural computation, vol.18, issue.7, pp.1527-1554, 2006.

G. E. Hinton and R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science, vol.313, issue.5786, p.41, 2006.

G. E. Hinton and R. S. Zemel, Autoencoders, minimum description length and helmholtz free energy, Advances in neural information processing systems, pp.3-10, 1994.

W. Hoeffding, A class of statistics with asymptotically normal distribution, Ann. Math. Stat, vol.19, p.113, 1948.

W. Hoeffding, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association, vol.58, issue.301, p.168, 1963.

T. Hofmann, B. Schoelkopf, and A. J. Smola, Kernel methods in machine learning, Ann. Statist, vol.36, issue.3, pp.1171-1220, 2008.

S. B. Hopkins, Sub-gaussian mean estimation in polynomial time, vol.120, 2018.

D. Hsu and S. Sabato, Heavy-tailed regression with a generalized median-of-means, International Conference on Machine Learning, pp.37-45, 2014.

D. Hsu and S. Sabato, Loss minimization and parameter estimation with heavy tails, The Journal of Machine Learning Research, vol.17, issue.1, p.140, 2016.

J. Huang, A. Gretton, K. Borgwardt, B. Schölkopf, and A. Smola, Correcting sample selection bias by unlabeled data, Advances in neural information processing systems, pp.601-608, 2007.

P. J. Huber, Robust estimation of a location parameter, The Annals of Mathematical Statistics, vol.86, p.90, 1964.

I. J. Goodfellow, Y. Bengio, and A. C. Courville, Deep Learning. Adaptive computation and machine learning, Pages, vol.31, p.34, 2016.

M. R. Jerrum, L. G. Valiant, and V. V. Vazirani, Random generation of combinatorial structures from a uniform distribution, Theoretical Computer Science, vol.43, pp.169-188, 1986.

T. Joachims, T. Hofmann, Y. Yue, and C.-N. Yu, Predicting structured objects with support vector machines, Commun. ACM, vol.52, issue.11, pp.97-104, 2009.

E. Joly and G. Lugosi, Robust estimation of u-statistics, Stochastic Processes and their Applications, vol.126, p.131, 2016.

E. Joly, G. Lugosi, and R. I. Oliveira, On the estimation of the mean of a random vector, Electronic Journal of Statistics, vol.11, issue.1, pp.440-451, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01345802

H. Kadri, E. Duflos, P. Preux, S. Canu, A. Rakotomamonjy et al., Operator-valued kernels for learning from functional response data, Journal of Machine Learning Research, vol.17, issue.20, p.98, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01221329

M. Kampffmeyer, S. Løkse, F. M. Bianchi, R. Jenssen, and L. Livi, Deep kernelized autoencoders, Scandinavian Conference on Image Analysis, p.35, 2017.

T. N. Kipf and M. Welling, Semi-supervised classification with graph convolutional networks, Pages, vol.20, p.35, 2016.

T. N. Kipf and M. Welling, Variational graph autoencoders, NIPS Workshop on Bayesian Deep Learning, p.35, 2016.

R. Koenker, Quantile regression, Pages, vol.28, p.95, 2005.

A. N. Kolmogorov, The local structure of turbulence in incompressible viscous fluid for very large reynolds numbers, Cr Acad. Sci. URSS, vol.30, pp.301-305, 1941.

V. Koltchinskii and S. Mendelson, Bounding the smallest singular value of a random matrix without concentration, International Mathematics Research Notices, issue.23, pp.12991-13008, 2015.

A. Krizhevsky and G. E. Hinton, Using very deep autoencoders for content-based image retrieval, ESANN, vol.1, pp.2-34, 2011.

P. Laforgue, S. Clémençon, and F. d'Alché-Buc, Autoencoding any data through kernel autoencoders, Artificial Intelligence and Statistics, vol.35, p.56, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02288519

P. Laforgue and S. Clémençon, Statistical learning from biased training samples, p.171, 2019.

P. Laforgue, S. Clemencon, and P. Bertail, On medians of (Randomized) pairwise means, Proceedings of the 36th International Conference on Machine Learning, vol.97, pp.1272-1281, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02463910

P. Laforgue, A. Lambert, L. Motte, and F. d'Alché-Buc, On the dualization of operator-valued kernel machines, p.82, 2019.

P. L. Lai and C. Fyfe, Kernel and nonlinear canonical correlation analysis, International Journal of Neural Systems, vol.10, issue.05, pp.365-377, 2000.

H. Larochelle and Y. Bengio, Classification using discriminative restricted boltzmann machines, Proceedings of the 25th international conference on Machine learning, p.32, 2008.

G. Lecué and M. Lerasle, Robust machine learning by median-of-means: theory and practice, Pages, vol.140, p.164, 2017.

G. Lecué and M. Lerasle, Learning from MOM's principles: Le Cam's approach, Stochastic Processes and their Applications, vol.129, pp.4385-4410, 2019.

G. Lecué, M. Lerasle, and T. Mathieu, Robust classification via mom minimization, Pages, vol.15, p.152, 2018.

G. Lecué and S. Mendelson, Learning subgaussian classes: Upper and minimax bounds, Pages, vol.141, p.155, 2013.

Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.

M. Ledoux and M. Talagrand, Probability in Banach Spaces: Isoperimetry and Processes, p.45, 1991.

A. J. Lee, U -statistics: Theory and practice, p.113, 1990.

M. Lerasle and R. I. Oliveira, Robust empirical mean estimators, p.140, 2011.

Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel, Gated graph sequence neural networks, p.35, 2015.

Y. Lin, Y. Lee, and G. Wahba, Support vector machines for classification in nonstandard situations, Machine learning, vol.46, issue.1-3, pp.191-202, 2002.

Z. Liu, J. Yang, H. Liu, and W. Wang, Transfer learning by sample selection bias correction and its application in communication specific emitter identification, JCM, vol.11, pp.417-427, 2016.

G. Lugosi, Learning with an unreliable teacher, Pattern Recognition, vol.25, issue.1, pp.79-87, 1992.

G. Lugosi and S. Mendelson, Risk minimization by median-of-means tournaments, Pages, vol.140, p.162, 2016.

G. Lugosi and S. Mendelson, Sub-gaussian estimators of the mean of a random vector, p.119, 2017.

G. Lugosi and S. Mendelson, Regularization, sparse recovery, and median-of-means tournaments, Bernoulli, vol.25, issue.3, pp.2075-2106, 2019.

P. Mahé and J. Vert, Graph kernels based on tree patterns for molecules, Machine learning, vol.75, issue.1, p.24, 2009.

J. Mairal, End-to-end kernel learning with supervised convolutional kernel networks, Advances in neural information processing systems, vol.35, p.52, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01387399

J. Mairal, P. Koniusz, Z. Harchaoui, and C. Schmid, Convolutional kernel networks, Advances in neural information processing systems, vol.35, p.52, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01005489

C. Manski and S. Lerman, The estimation of choice probabilities from choice based samples, Econometrica: Journal of the Econometric Society, pp.1977-1988, 1977.

S. Matsuda, J. Vert, H. Saigo, N. Ueda, H. Toh et al., A novel representation of protein sequences for prediction of subcellular location using support vector machines, Protein Science, vol.14, issue.11, pp.2804-2813, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00433582

A. Maurer, A chain rule for the expected suprema of gaussian processes, Algorithmic Learning Theory: 25th International Conference, vol.8776, p.46, 2014.

A. Maurer, A vector-contraction inequality for rademacher complexities, International Conference on Algorithmic Learning Theory, p.45, 2016.

A. Maurer, A bernstein-type inequality for functions of bounded interaction, Bernoulli, vol.25, issue.2, pp.1451-1471, 2019.

A. Maurer and M. Pontil, Bounds for vector-valued function estimation, Pages, vol.42, p.43, 2016.

J. L. Mcclelland, D. E. Rumelhart, and P. R. Group, , vol.2, p.34, 1987.

C. Mcdiarmid, On the method of bounded differences, Surveys in combinatorics, vol.141, pp.148-188, 1989.

S. Mendelson, Learning without concentration, Conference on Learning Theory, pp.25-39, 2014.

S. Mendelson, Upper bounds on product and multiplier empirical processes, Stochastic Processes and their Applications, vol.126, pp.3652-3680, 2016.

S. Mendelson, On aggregation for heavy-tailed classes, Probability Theory and Related Fields, vol.168, issue.3-4, p.159, 2017.

J. Mercer, XVI. Functions of positive and negative type, and their connection with the theory of integral equations, Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, vol.209, pp.415-446, 1909.

G. Mesnil, Y. Dauphin, X. Glorot, S. Rifai, Y. Bengio et al., Unsupervised and transfer learning challenge: a deep learning approach, Proceedings of the 2011 International Conference on Unsupervised and Transfer Learning workshop, vol.27, p.32, 2011.

C. A. Micchelli and M. Pontil, On learning vector-valued functions, Neural computation, vol.17, issue.1, p.64, 2005.

T. Mikolov, Q. V. Le, and I. Sutskever, Exploiting similarities among languages for machine translation, p.33, 2013.

S. Minsker, Geometric Median and Robust Estimation in Banach Spaces, Bernoulli, vol.21, issue.4, p.126, 2015.

S. Minsker and X. Wei, Robust modifications of u-statistics and applications to covariance estimation problems, 2018.

M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning, Pages, vol.43, p.44, 2012.

J. J. Moreau, Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l'Académie des sciences, vol.255, pp.2897-2899, 1962.
URL : https://hal.archives-ouvertes.fr/hal-01867195

A. S. Nemirovsky and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization, Pages, vol.114, p.115, 1983.

J. Nocedal and S. Wright, Numerical optimization, p.78, 2006.

S. Nowozin and C. H. Lampert, Structured learning and prediction in computer vision, Foundations and Trends in Computer Graphics and Vision, vol.6, issue.3-4, pp.185-365, 2011.

G. Obozinski, B. Taskar, and M. I. Jordan, Joint covariate selection and joint subspace selection for multiple classification problems, Statistics and Computing, vol.20, issue.2, pp.231-252, 2010.

M. Palatucci, D. Pomerleau, G. E. Hinton, and T. M. Mitchell, Zero-shot learning with semantic output codes, Advances in neural information processing systems, pp.1410-1418, 2009.

G. Papa, S. Clémençon, and P. Bertail, Learning from Survey Training Samples: Rate Bounds for Horvitz-Thompson Risk Minimizers, Proceedings of ACML, p.166, 2016.
URL : https://hal.archives-ouvertes.fr/hal-02287361

F. Pedregosa, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, vol.12, p.173, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

G. Pisier, Probabilistic methods in the geometry of banach spaces, Probability and analysis, p.45, 1986.

J. Quiñonero-Candela, M. Sugiyama, A. Schwaighofer, and N. D. Lawrence, Dataset shift in machine learning, Pages, vol.15, p.167, 2009.

A. Rahimi and B. Recht, Random features for large-scale kernel machines, Advances in neural information processing systems, vol.60, p.101, 2008.

J. O. Ramsay and B. W. Silverman, Applied functional data analysis: methods and case studies, Pages, vol.27, p.100, 2007.

M. Ranzato, Y. Boureau, and Y. LeCun, Sparse feature learning for deep belief networks, Advances in neural information processing systems, pp.1185-1192, 2008.

M. Ranzato, C. Poultney, S. Chopra, and Y. LeCun, Efficient learning of sparse representations with an energy-based model, Advances in neural information processing systems, vol.33, p.34, 2007.

S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, Contractive autoencoders: Explicit invariance during feature extraction, Proceedings of the 28th International Conference on International Conference on Machine Learning, p.34, 2011.

H. Robbins and S. Monro, A stochastic approximation method, The Annals of Mathematical Statistics, pp.400-407, 1951.

R. T. Rockafellar, Convex analysis, vol.28, p.78, 1970.

S. Rosset, J. Zhu, H. Zou, and T. Hastie, A method for inferring label sampling mechanisms in semi-supervised learning, Advances in neural information processing systems, pp.1161-1168, 2005.

H. Saigo, J. Vert, N. Ueda, and T. Akutsu, Protein homology detection using string alignment kernels, Bioinformatics, vol.20, issue.11, p.24, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00433587

R. Salakhutdinov and G. Hinton, Deep boltzmann machines, Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics, vol.5, pp.448-455, 2009.

R. Salakhutdinov and G. Hinton, Semantic hashing, International Journal of Approximate Reasoning, vol.50, issue.7, pp.969-978, 2009.

M. Sangnier, O. Fercoq, and F. d'Alché-Buc, Data sparse nonparametric regression with ε-insensitive losses, Asian Conference on Machine Learning, vol.80, p.88, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01593459

C. Saunders, A. Gammerman, and V. Vovk, Ridge regression learning algorithm in dual variables, Proc. of the 15th International Conference on Machine Learning, p.80, 1998.

B. Schölkopf, A. Smola, and K. Müller, Kernel principal component analysis, International conference on artificial neural networks, vol.24, p.40, 1997.

B. Schölkopf, A. Smola, and K. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural computation, vol.10, issue.5, p.51, 1998.

B. Schölkopf and A. J. Smola, Learning with kernels: Support vector machines, regularization, optimization, and beyond, p.25, 2002.

B. Schölkopf, K. Tsuda, and J. Vert, Support vector machine applications in computational biology, Pages, vol.21, p.25, 2004.

E. Senkene and A. Tempel'man, Hilbert spaces of operator-valued functions, Lithuanian Mathematical Journal, vol.13, issue.4, pp.665-670, 1973.

R. Serfling, Approximation Theorems of Mathematical Statistics, Wiley Series in Probability and Statistics, p.113, 1980.

R. J. Serfling, Probability inequalities for the sum in sampling without replacement, The Annals of Statistics, pp.39-48, 1974.

J. Shawe-taylor and N. Cristianini, Kernel methods for pattern analysis, p.25, 2004.

. Shehzadex, , 2017.

H. Shimodaira, Improving predictive inference under covariate shift by weighting the log-likelihood function, Journal of statistical planning and inference, vol.90, issue.2, pp.227-244, 2000.

R. Socher, M. Ganjoo, C. D. Manning, and A. Ng, Zero-shot learning through cross-modal transfer, Advances in neural information processing systems, pp.935-943, 2013.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research, vol.15, pp.1929-1958, 2014.

I. Steinwart and A. Christmann, Support vector machines, Pages, vol.25, p.95, 2008.

H. Su, M. Heinonen, J. Rousu, E. Tsivtsivadze, E. Marchiori et al., Structured output prediction of anti-cancer drug activity, Pattern Recognition in Bioinformatics - 5th IAPR International Conference, PRIB 2010, Proceedings, vol.6282, p.73, 2010.

M. Sugiyama and M. Kawanabe, Machine learning in non-stationary environments: Introduction to covariate shift adaptation, p.167, 2012.

M. Sugiyama and K. Müller, Input-dependent estimation of generalization error under covariate shift, Statistics & Decisions, vol.23, issue.4, pp.249-279, 2005.

J. A. Suykens, Least squares support vector machines, p.80, 2002.

P. Tseng, Convergence of a block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl, vol.109, issue.3, pp.475-494, 2001.

P. Tseng and S. Yun, Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization, J. Optim. Theory Appl, vol.140, issue.3, pp.513-535, 2009.

D. Valsesia, G. Fracastoro, and E. Magli, Learning localized generative models for 3d point clouds via graph convolution, p.35, 2018.

V. Van-belle, K. Pelckmans, J. Suykens, and S. Van-huffel, Learning transformation models for ranking and survival analysis, Journal of machine learning research, pp.44-166, 2011.

A. Van-der-vaart, Asymptotic Statistics, vol.113, p.168, 1998.

A. Van-der-vaart and J. Wellner, Weak convergence and empirical processes, 1996.

E. Van-miltenburg, Stereotyping and bias in the flickr30k dataset, Workshop on Multi-modal Corpora: Computer vision and language processing, p.166, 2016.

V. Vapnik, Statistical learning theory, Pages, vol.10, p.21, 1998.

Y. Vardi, Empirical distributions in selection bias models, Ann. Statist, vol.13, p.194, 1985.

F. Vella, Estimating models with sample selection bias: a survey, Journal of Human Resources, pp.127-169, 1998.

J. Vert, A tree kernel to analyse phylogenetic profiles, Bioinformatics, vol.18, issue.suppl_1, pp.276-284, 2002.
URL : https://hal.archives-ouvertes.fr/hal-00433591

R. Vert and J. Vert, Consistency and convergence rates of one-class svms and related algorithms, Journal of Machine Learning Research, vol.7, pp.817-854, 2006.

P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. Manzagol, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res, vol.11, p.35, 2010.

R. Vogel, S. Clémençon, and A. Bellet, A Probabilistic Theory of Supervised Similarity Learning: Pairwise Bipartite Ranking and Pointwise ROC Curve Optimization, International Conference in Machine Learning, p.107, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02288518

Y. Weiss, A. Torralba, and R. Fergus, Spectral hashing, Advances in neural information processing systems, pp.1753-1760, 2009.

C. K. Williams and M. Seeger, Using the nyström method to speed up kernel machines, Advances in neural information processing systems, vol.60, p.101, 2001.

C. Winship and R. Mare, Models for sample selection bias, Annual Review of Sociology, vol.18, pp.327-350, 1992.

Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang et al., A comprehensive survey on graph neural networks, p.35, 2019.

Y. Yamanishi, J. Vert, A. Nakaya, and M. Kanehisa, Extraction of correlated gene clusters from multiple genomic data by generalized kernel canonical correlation analysis, Bioinformatics, vol.19, issue.suppl_1, pp.323-330, 2003.
URL : https://hal.archives-ouvertes.fr/hal-00433589

B. Zadrozny, Learning and evaluating classifiers under sample selection bias, Proceedings of the twenty-first international conference on Machine learning, p.167, 2004.

B. Zadrozny and C. Elkan, Learning and making decisions when costs and probabilities are both unknown, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), vol.28, p.95, 2001.

M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, European conference on computer vision, p.32, 2014.

J. Zhao, T. Wang, M. Yatskar, V. Ordonez, and K.-W. Chang, Men also like shopping: Reducing gender bias amplification using corpus-level constraints, Proceedings of the Conference on Empirical Methods in Natural Language Processing, p.166, 2017.