

Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox

Authors
  • Wirbel, Jakob (1)
  • Zych, Konrad (1, 2)
  • Essex, Morgan (1, 3)
  • Karcher, Nicolai (1, 4)
  • Kartal, Ece (1)
  • Salazar, Guillem (5)
  • Bork, Peer (1, 6, 7, 8)
  • Sunagawa, Shinichi (5)
  • Zeller, Georg (1)
  • 1 European Molecular Biology Laboratory (EMBL), Heidelberg, 69117, Germany
  • 2 Present Address: Clinical Microbiomics A/S, Ole Maaløes Vej 3, København, 2200, Denmark
  • 3 Present Address: Experimental and Clinical Research Center (ECRC) of the Max Delbrück Center for Molecular Medicine and Charité University Hospital, Berlin, 13125, Germany
  • 4 University of Trento, Trento, 38123, Italy
  • 5 Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, 8093, Switzerland
  • 6 Molecular Medicine Partnership Unit, Heidelberg, Germany
  • 7 Max Delbrück Centre for Molecular Medicine, Berlin, 13125, Germany
  • 8 University of Würzburg, Würzburg, 97074, Germany
Type
Published Article
Publication Date
Mar 30, 2021
Volume
22
Issue
1
Identifiers
DOI: 10.1186/s13059-021-02306-1
Source
Springer Nature
License
Green

Abstract

The human microbiome is increasingly mined for diagnostic and therapeutic biomarkers using machine learning (ML). However, metagenomics-specific software is scarce, and overoptimistic evaluation and limited cross-study generalization are prevailing issues. To address these, we developed SIAMCAT, a versatile R toolbox for ML-based comparative metagenomics. We demonstrate its capabilities in a meta-analysis of fecal metagenomic studies (10,803 samples). When naively transferred across studies, ML models lost accuracy and disease specificity, which could however be resolved by a novel training set augmentation strategy. This reveals some biomarkers to be disease-specific, with others shared across multiple conditions. SIAMCAT is freely available from siamcat.embl.de.
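The abstract names the training set augmentation strategy without describing it. The following is a minimal sketch of the data-handling side only, under two labeled assumptions: that augmentation means adding control (non-disease) samples from other studies to a model's training data, and that disease specificity is checked as the fraction of samples from a different disease that the model calls positive. SIAMCAT itself is an R package; this Python code is not its API, and `Sample`, `augment_with_external_controls`, and `false_positive_rate` are hypothetical names. No model is trained here; `predict` stands in for any fitted classifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sample:
    study: str                   # hypothetical study identifier
    label: int                   # 1 = case (disease), 0 = control
    features: Tuple[float, ...]  # e.g. taxonomic relative abundances


def augment_with_external_controls(
    train: List[Sample], external: List[Sample], max_extra: int
) -> List[Sample]:
    """Return the training set plus up to `max_extra` control samples
    drawn from studies not already in the training set.
    Cases from external studies are deliberately ignored (assumption)."""
    train_studies = {s.study for s in train}
    extra = [s for s in external
             if s.label == 0 and s.study not in train_studies]
    return train + extra[:max_extra]


def false_positive_rate(predict: Callable[[Tuple[float, ...]], int],
                        samples: List[Sample]) -> float:
    """Fraction of `samples` that `predict` calls positive. Applied to a
    cohort of a *different* disease, a high value suggests the model is
    not disease-specific."""
    if not samples:
        raise ValueError("no samples to evaluate")
    return sum(predict(s.features) for s in samples) / len(samples)
```

For example, with a toy threshold classifier standing in for a trained model:

```python
train = [Sample("A", 1, (0.9,)), Sample("A", 0, (0.1,))]
external = [Sample("B", 0, (0.2,)), Sample("B", 1, (0.8,)), Sample("C", 0, (0.3,))]
augmented = augment_with_external_controls(train, external, max_extra=5)
predict = lambda x: int(x[0] > 0.5)
other_disease = [Sample("D", 1, (0.7,)), Sample("D", 1, (0.2,))]
print(len(augmented), false_positive_rate(predict, other_disease))  # 4 0.5
```

Keeping only external controls adds negatives from other cohorts without importing another study's case definitions; whether that matches the published strategy in detail should be checked against the full text.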
