Multiple Holdouts With Stability: Improving the Generalizability of Machine Learning Analyses of Brain–Behavior Relationships

  • Mihalik, Agoston (1, 2)
  • Ferreira, Fabio S. (1, 2)
  • Moutoussis, Michael (2, 3)
  • Ziegler, Gabriel (2, 4, 5)
  • Adams, Rick A. (1, 2, 3)
  • Rosa, Maria J. (1, 2)
  • Prabhu, Gita (2, 3)
  • de Oliveira, Leticia (6)
  • Pereira, Mirtes (6)
  • Bullmore, Edward T. (7, 8, 9, 10)
  • Fonagy, Peter (11)
  • Goodyer, Ian M. (7, 9)
  • Jones, Peter B. (7, 9)
  • Hauser, Tobias
  • Neufeld, Sharon
  • Romero-Garcia, Rafael
  • St Clair, Michelle
  • Vértes, Petra E.
  • Whitaker, Kirstie
  • Inkster, Becky
  • And 29 more
  • 1 Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom
  • 2 Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
  • 3 Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
  • 4 Institute of Cognitive Neurology and Dementia Research, Otto von Guericke University, Magdeburg, Germany
  • 5 German Center for Neurodegenerative Diseases, Bonn, Germany
  • 6 Laboratory of Neurophysiology of Behaviour, Department of Physiology and Pharmacology, Biomedical Institute, Federal Fluminense University, Niterói, Brazil
  • 7 Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
  • 8 Behavioural and Clinical Neuroscience Institute, University of Cambridge, Cambridge, United Kingdom
  • 9 Cambridgeshire and Peterborough NHS Foundation Trust, Cambridge, United Kingdom
  • 10 ImmunoPsychiatry, GlaxoSmithKline Research and Development, Stevenage, United Kingdom
  • 11 Research Department of Clinical, Educational, and Health Psychology, University College London, London, United Kingdom
Published Article
Publication Date: Feb 15, 2020
DOI: 10.1016/j.biopsych.2019.12.001
PMID: 32040421
PMCID: PMC6970221
PubMed Central


Background
In 2009, the National Institute of Mental Health launched the Research Domain Criteria, an attempt to move beyond diagnostic categories and ground psychiatry within neurobiological constructs that combine different levels of measures (e.g., brain imaging and behavior). Statistical methods that can integrate such multimodal data, however, are often vulnerable to overfitting, poor generalization, and difficulties in interpreting the results.

Methods
We propose an innovative machine learning framework combining multiple holdouts and a stability criterion with regularized multivariate techniques, such as sparse partial least squares and kernel canonical correlation analysis, for identifying hidden dimensions of cross-modality relationships. To illustrate the approach, we investigated structural brain–behavior associations in an extensively phenotyped developmental sample of 345 participants (312 healthy and 33 with clinical depression). The brain data consisted of whole-brain voxel-based gray matter volumes, and the behavioral data included item-level self-report questionnaires and IQ and demographic measures.

Results
Both sparse partial least squares and kernel canonical correlation analysis captured two hidden dimensions of brain–behavior relationships: one related to age and drinking and the other one related to depression. The applied machine learning framework indicates that these results are stable and generalize well to new data. Indeed, the identified brain–behavior associations are in agreement with previous findings in the literature concerning age, alcohol use, and depression-related changes in brain volume.

Conclusions
Multivariate techniques (such as sparse partial least squares and kernel canonical correlation analysis) embedded in our novel framework are promising tools to link behavior and/or symptoms to neurobiology and thus have great potential to contribute to a biologically grounded definition of psychiatric disorders.
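
The general idea named in the Methods (repeated holdout splits combined with a stability check on the fitted weights) can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in an unregularized, single-component partial least squares (the leading singular vectors of the cross-covariance matrix) for sparse partial least squares or kernel canonical correlation analysis. The number of splits, the holdout fraction, the permutation count, the stability measure (mean absolute cosine similarity of the brain-side weights across splits), and all function names are assumptions made for illustration, as is the synthetic data.

```python
import numpy as np


def first_pls_weights(X, Y):
    """Leading left/right singular vectors of the cross-covariance X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)
    return U[:, 0], Vt[0]


def holdout_corr(Xtr, Ytr, Xho, Yho):
    """Fit on training data; return X weights and the holdout score correlation."""
    u, v = first_pls_weights(Xtr, Ytr)
    return u, np.corrcoef(Xho @ u, Yho @ v)[0, 1]


def multiple_holdouts(X, Y, n_splits=10, holdout_frac=0.2, n_perm=50, seed=0):
    """Hypothetical helper: repeated random holdouts plus a weight-stability score."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_hold = int(round(holdout_frac * n))
    corrs, pvals, weights = [], [], []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        hold, train = idx[:n_hold], idx[n_hold:]
        # Standardize with training statistics only, to avoid leakage.
        mx, sx = X[train].mean(0), X[train].std(0) + 1e-12
        my, sy = Y[train].mean(0), Y[train].std(0) + 1e-12
        Xtr, Xho = (X[train] - mx) / sx, (X[hold] - mx) / sx
        Ytr, Yho = (Y[train] - my) / sy, (Y[hold] - my) / sy
        u, r = holdout_corr(Xtr, Ytr, Xho, Yho)
        # Permutation null: break the X-Y pairing in training, refit, rescore.
        null = np.array([
            holdout_corr(Xtr, Ytr[rng.permutation(len(train))], Xho, Yho)[1]
            for _ in range(n_perm)
        ])
        corrs.append(r)
        pvals.append((1 + np.sum(null >= r)) / (1 + n_perm))
        weights.append(u)
    # Weights are unit vectors, so dot products are cosines; abs() ignores sign.
    W = np.array(weights)
    sim = np.abs(W @ W.T)
    stability = float(sim[np.triu_indices(n_splits, k=1)].mean())
    return {"corr": np.array(corrs), "pval": np.array(pvals), "stability": stability}


# Synthetic example (assumed data): one latent variable shared by X and Y.
rng = np.random.default_rng(42)
n = 300
z = rng.standard_normal(n)
X = rng.standard_normal((n, 20))   # stand-in for brain features
Y = rng.standard_normal((n, 10))   # stand-in for behavioral items
X[:, :3] += 2.0 * z[:, None]
Y[:, :2] += 2.0 * z[:, None]

res = multiple_holdouts(X, Y)
print("mean holdout correlation:", res["corr"].mean())
print("median permutation p-value:", np.median(res["pval"]))
print("weight stability:", res["stability"])
```

In this sketch, a dimension would be treated as generalizable when the holdout correlations exceed the permutation null across splits, and as stable when the weight vectors agree across splits. How the published framework combines these criteria, and how it tunes regularization, is not specified in the abstract.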
