Gribonval, Rémi; Blanchard, Gilles; Keriven, Nicolas; Traonmilin, Yann

We describe a general framework, compressive statistical learning, for resource-efficient large-scale learning: the training collection is compressed in one pass into a low-dimensional sketch (a vector of random empirical generalized moments) that captures the information relevant to the considered learning task. A near-minimizer of the risk is com...
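The one-pass sketch can be illustrated with a minimal numerical example. The sketch below uses random Fourier moments as the generalized moments; the data, dimensions, and frequency matrix are illustrative stand-ins, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: n training points in d dimensions, sketch of size m.
n, d, m = 1000, 2, 64
X = rng.normal(size=(n, d))          # stand-in training collection
W = rng.normal(size=(m, d))          # random frequencies defining the moments

def sketch(X, W):
    """One-pass sketch: empirical average of complex exponential moments."""
    return np.exp(1j * X @ W.T).mean(axis=0)   # shape (m,)

z = sketch(X, W)

# Because the sketch is a plain average, it can be computed in a single pass
# over a stream and merged across chunks of the collection.
z_merged = 0.5 * sketch(X[:n // 2], W) + 0.5 * sketch(X[n // 2:], W)
```

The mergeability shown in the last line is what makes the sketch suitable for streaming and distributed compression.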

Challa, Aditya; Danda, Sravan; Daya Sagar, B S; Najman, Laurent

Clustering has been an important problem since the early 20th century, and several solutions have been proposed. With the rise of computing machines, clustering has become an important part of many data mining tasks, with a focus on fast implementations. An important task related to clustering is image segmentation. In the set of solut...

Akimoto, Youhei; Auger, Anne; Hansen, Nikolaus

Quality gain is the expected relative improvement of the function value in a single step of a search algorithm. Quality gain analysis reveals the dependencies of the quality gain on the parameters of a search algorithm, based on which one can derive the optimal values for the parameters. In this paper, we investigate evolution strategies with weigh...
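The quality gain defined above can be estimated numerically. Below is a minimal Monte-Carlo sketch for a (1+1)-type elitist step on the sphere function; the test function, dimension, and step sizes are illustrative choices, not the weighted-recombination setting the paper analyzes:

```python
import numpy as np

def quality_gain(f, x, sigma, n_trials=20000, seed=4):
    """Monte-Carlo estimate of the expected relative one-step improvement
    under (1+1) elitist selection: an offspring is kept only if it improves."""
    rng = np.random.default_rng(seed)
    fx = f(x)
    offspring = x + sigma * rng.normal(size=(n_trials, len(x)))
    fy = np.array([f(v) for v in offspring])
    return np.maximum(fx - fy, 0.0).mean() / fx

def sphere(v):
    return float(np.dot(v, v))

x = np.ones(10)                               # current search point, f(x) = 10
g_small = quality_gain(sphere, x, sigma=0.1)  # moderate step size
g_large = quality_gain(sphere, x, sigma=10.0) # far too large a step size
```

Comparing the two estimates shows the dependence on the step-size parameter: overly large steps almost never improve the function value, so the quality gain collapses, which is exactly the kind of dependency a quality gain analysis makes explicit.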

Barbier, Jean; Krzakala, Florent; Macris, Nicolas; Miolane, Léo; Zdeborová, Lenka

We consider generalized linear models where an unknown $n$-dimensional signal vector is observed through the successive application of a random matrix and a non-linear (possibly probabilistic) componentwise function. We study these models in the high-dimensional limit, where the observation consists of $m$ points, and $m/n \to \alpha$ where ${...
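A minimal generator for this observation model is sketched below. The sizes, the Gaussian matrix scaling, and the sign non-linearity are illustrative choices; the setting described above covers general, possibly probabilistic, componentwise channels:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 500, 1000                 # signal dimension and number of observations
alpha = m / n                    # aspect ratio, held fixed in the limit

x = rng.normal(size=n)                        # unknown signal vector
Phi = rng.normal(size=(m, n)) / np.sqrt(n)    # random measurement matrix

def channel(z):
    """Componentwise non-linearity; sign() gives a 1-bit-style observation."""
    return np.sign(z)

y = channel(Phi @ x)             # observed data: non-linearity after Phi @ x
```

Replacing `channel` with a stochastic map (e.g. adding noise before the sign) yields the probabilistic channels covered by the same framework.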

Klein, John; Albardan, Mahmoud; Guedj, Benjamin; Colot, Olivier

We examine a network of learners that address the same classification task but must learn from different data sets. The learners cannot share data but instead share their models. Models are shared only once, so as to limit the network load. We introduce DELCO (standing for Decentralized Ensemble Learning with COpulas), a new approach allowin...

Rossetti, Giulio; Cazabet, Rémy

Networks built to model real-world phenomena are characterised by some properties that have attracted the attention of the scientific community: (i) they are organised according to a community structure and (ii) their structure evolves with time. Many researchers have worked on methods that can efficiently unveil substructures in complex networks, g...

Lerasle, Matthieu; Szabó, Zoltán; Lecué, Guillaume; Massiot, Gaspar; Moulines, Eric

Mean embeddings provide an extremely flexible and powerful tool in machine learning and statistics to represent probability distributions and define a semi-metric (MMD, maximum mean discrepancy; also called N-distance or energy distance), with numerous successful applications. The representation is constructed as the expectation of the feature map...
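The MMD semi-metric mentioned above can be estimated in a few lines. The sketch below is the classical biased (V-statistic) estimate with a Gaussian kernel; the kernel, bandwidth, and toy data are illustrative, and this plain empirical estimator is what robust alternatives would replace:

```python
import numpy as np

def mmd2_biased(X, Y, gamma=0.5):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * d2)
    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()

rng = np.random.default_rng(2)
X = rng.normal(0.0, 1.0, size=(200, 1))
Y = rng.normal(0.0, 1.0, size=(200, 1))   # same distribution as X
Z = rng.normal(3.0, 1.0, size=(200, 1))   # shifted distribution
```

On this toy data the estimate is near zero for the two samples from the same distribution and clearly larger for the shifted one, which is what makes MMD usable as a discrepancy between distributions.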

Lasserre, Jean B.; Pauwels, Edouard
Published in
Advances in Computational Mathematics

We illustrate the potential applications in machine learning of the Christoffel function, or, more precisely, its empirical counterpart associated with a counting measure uniformly supported on a finite set of points. Firstly, we provide a thresholding scheme which allows approximating the support of a measure from a finite subset of its moments wi...
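The empirical Christoffel function described above can be sketched in a few lines for one-dimensional data with monomial features; the degree, regularization, and toy data below are illustrative assumptions:

```python
import numpy as np

def empirical_christoffel(points, X, degree=4, reg=1e-10):
    """Empirical Christoffel function of the counting measure on X.

    Small values at a query point indicate it lies outside (or near the
    boundary of) the support of the underlying measure, which is the basis
    of a thresholding scheme for support approximation."""
    def features(t):
        return np.vander(np.asarray(t), degree + 1, increasing=True)  # 1, t, ..., t^d
    V = features(X)
    M = V.T @ V / len(X) + reg * np.eye(degree + 1)   # empirical moment matrix
    M_inv = np.linalg.inv(M)
    P = features(points)
    # Christoffel function: 1 / (p(x)^T M^{-1} p(x)) at each query point.
    return 1.0 / np.einsum('ij,jk,ik->i', P, M_inv, P)

rng = np.random.default_rng(3)
X = rng.uniform(-1.0, 1.0, size=500)                  # support is [-1, 1]
inside, outside = empirical_christoffel([0.0, 3.0], X)
```

The value at a point inside the support dominates the value at a point far outside it, so thresholding the function recovers an approximation of the support from the (empirical) moments alone.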

Guedj, Benjamin; Li, Le

When confronted with massive data streams, summarizing data with dimension reduction methods such as PCA poses theoretical and algorithmic challenges. Principal curves act as a nonlinear generalization of PCA, and the present paper proposes a novel algorithm to automatically and sequentially learn principal curves from data streams. We show that our ...

Rakotomamonjy, Alain; Gasso, Gilles; Salmon, Joseph