Rosenkrancová, Lydie
This thesis presents three machine learning models designed to detect fraudulent financial transactions and compares their performance. The models used are Support Vector Machine, Random Forest, and a simple neural network. They were trained on two different datasets: an encrypted record of real transactions and an artificially generated dataset. The thesis evaluates ...
Chen, Yizuo Huang, Haiying Darwiche, Adnan
Chiquet, Pauline Lecellier, François Carré, Philippe
For about twenty years, universities have been using digital tools that generate digital traces in order to improve the accessibility of their courses. The study presented in this article aims to characterize student profiles automatically from these data. However, in general, studies in this setting are...
Aussenac-Gilles, Nathalie
National audience
Jourdan, Astrid Le Nir, Yannick Girardin, Nicolas
International audience
Chatzimparmpas, A. Martins, R. Jusufi, I. Kucher, K. Rossi, Fabrice Kerren, A.
Machine learning (ML) models are nowadays used in complex applications in various domains, such as medicine, bioinformatics, and other sciences. Due to their black box nature, however, it may sometimes be hard to understand and trust the results they provide. This has increased the demand for reliable visualization tools related to enhancing trust ...
Briscik, Mitja Tazza, Gabriele Vidács, László Dillies, Marie-Agnes Dejean, Sébastien
Advances in high-throughput technologies have led to an ever-increasing availability of omics datasets. The integration of multiple heterogeneous data sources is currently an issue for biology and bioinformatics. Multiple kernel learning (MKL) has been shown to be a flexible and valid approach to consider the diverse nature of multi-omics inputs, de...
Denis, Christophe Dion-Blanc, Charlotte ELLA MINTSA, Eddy Tran, Chi
We study the multiclass classification problem where the features come from a mixture of time-homogeneous diffusions. Specifically, the classes are discriminated by their drift functions, while the diffusion coefficient is common to all classes and unknown. In this framework, we build a plug-in classifier which relies on nonparametric estimators of t...
Ayme, Alexis Boyer, Claire Dieuleveut, Aymeric Scornet, Erwan
Constant (naive) imputation is still widely used in practice, as it is a simple, easy-to-use first technique for dealing with missing data. Yet this simple method could be expected to induce a large bias for prediction purposes, as the imputed input may strongly differ from the true underlying data. However, recent works suggest that this bias is low in the ...
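As an illustration of what constant imputation means in practice, the following is a minimal sketch and not the authors' procedure. The function names `impute_constant` and `impute_column_mean` and the toy data are hypothetical, and missing entries are assumed to be encoded as `None`.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]


def impute_constant(rows: Sequence[Row], fill: float = 0.0) -> List[List[float]]:
    """Replace every missing entry (None) with the same constant `fill`."""
    return [[fill if v is None else v for v in row] for row in rows]


def impute_column_mean(rows: Sequence[Row]) -> List[List[float]]:
    """Replace missing entries with one constant per column: the mean of
    the observed values in that column (0.0 if the column is fully missing)."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


# Hypothetical toy data: three samples, two features, two missing entries.
data = [[1.0, None], [3.0, 4.0], [None, 8.0]]
zero_filled = impute_constant(data)
mean_filled = impute_column_mean(data)
```

Both variants fill each feature with a value that does not depend on the other features of the sample, which is why the imputed input may differ strongly from the true underlying data.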
Ros, Frédéric Riad, Rabia Guillaume, Serge
Clustering has always been an important method for problem-solving. In the era of big data, most classical clustering methods suffer from the curse of dimensionality and scalability issues. Recently, deep clustering models have garnered more attention due to their capabilities in dealing with complex, high-dimensional, and large-...