Diverse and efficient ensembling of deep networks
- Authors
- Publication Date: Oct 11, 2023
- Source: HAL-Descartes
- Keywords
- Language: English
- License: Unknown
- External links
Abstract
This thesis aims at enhancing the generalization abilities of deep neural networks, a critical step towards fair and reliable artificial intelligence. Specifically, we address the drop in performance when models are evaluated on test samples whose distribution is shifted with respect to the training samples. To this end, we focus on ensembling strategies: combining multiple models is a standard, simple yet potent way to improve robustness. After an overview of the relevant literature, we provide a new explanation of ensembling's success under distribution shifts, especially when the members of the ensemble are diverse. To foster such diversity among members, we investigate several strategies. The first, DICE, introduces an explicit regularization that removes redundant information across members. The subsequent diversity methods in this thesis are implicit, relying on diverse data augmentations (in MixMo), diverse hyperparameters (in DiWA), inter-training on auxiliary datasets (in ratatouille), and diverse objectives (in rewarded soups).

The second main challenge addressed in this thesis is ensemble efficiency, i.e., lessening the computational burden of combining multiple models: with two members, standard ensembling by averaging predictions doubles the inference cost, impeding scalability. After exploring subnetwork ensembling (in MixMo), we introduce a central contribution of this thesis: the observed ability to average the models in weights rather than in predictions. This finding was surprising given the non-linearities in deep architectures. We empirically demonstrate that, when the weights are fine-tuned from a shared pre-trained initialization, weight averaging succeeds by approximating ensembling without any inference overhead. The empirical gains are especially large on DomainBed, the reference benchmark for out-of-distribution generalization. More broadly, weight averaging enables effortless parallelization, improving updatability and data privacy in machine learning.

Finally, this thesis explores how ensembling can facilitate the alignment of models, a critical step to mitigate the societal and ethical concerns raised by the recent rapid scale-up of deep learning. To this end, we propose rewarded soups, a new strategy for multi-objective reinforcement learning, paving the way towards more transparent and reliable artificial intelligences, aligned with the world in all its diversity.
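To make the efficiency argument concrete, the sketch below contrasts the two combination strategies mentioned in the abstract: averaging predictions, whose inference cost grows with the number of members, versus averaging weights, which yields a single model at inference. This is a minimal illustration assuming PyTorch modules that share the same architecture and were fine-tuned from a shared pre-trained initialization; the function names `ensemble_predictions` and `average_weights` are illustrative and not taken from the thesis.

```python
import copy
import torch

def ensemble_predictions(models, x):
    # Standard ensembling: run every member and average their predictions.
    # Inference cost grows linearly with the number of members.
    with torch.no_grad():
        return torch.stack([model(x) for model in models]).mean(dim=0)

def average_weights(models):
    # Weight averaging: average the members' parameters into a single model,
    # so inference costs the same as a single member. This only approximates
    # ensembling when the members were fine-tuned from a shared
    # pre-trained initialization.
    averaged = copy.deepcopy(models[0])
    state = averaged.state_dict()
    for key, value in state.items():
        if value.is_floating_point():  # skip integer buffers such as counters
            state[key] = torch.stack(
                [m.state_dict()[key] for m in models]
            ).mean(dim=0)
    averaged.load_state_dict(state)
    return averaged
```

Uniform averaging is shown here for simplicity; non-uniform interpolation coefficients over the members are a natural extension of the same idea.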