BnpC: Bayesian non-parametric clustering of single-cell mutation profiles.

Authors
  • Borgsmüller, Nico1, 2
  • Bonet, Jose3, 4
  • Marass, Francesco1, 2
  • Gonzalez-Perez, Abel3, 4
  • Lopez-Bigas, Nuria3, 5
  • Beerenwinkel, Niko1, 2
  • 1 Department of Biosystems Science and Engineering, ETH Zürich, Basel 4058, Switzerland
  • 2 SIB, Swiss Institute of Bioinformatics, Basel 4058, Switzerland
  • 3 Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona 08028, Spain
  • 4 Research Program on Biomedical Informatics, Universitat Pompeu Fabra, Barcelona, Catalonia 08002, Spain
  • 5 Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona 08010, Spain
Type
Published Article
Journal
Bioinformatics (Oxford, England)
Publication Date
Dec 08, 2020
Volume
36
Issue
19
Pages
4854–4859
Identifiers
DOI: 10.1093/bioinformatics/btaa599
PMID: 32592465
Source
Medline
Language
English
License
Unknown

Abstract

The high resolution of single-cell DNA sequencing (scDNA-seq) offers great potential to resolve intratumor heterogeneity (ITH) by distinguishing clonal populations based on their mutation profiles. However, the increasing size of scDNA-seq datasets and technical limitations, such as high error rates and a large proportion of missing values, complicate this task and limit the applicability of existing methods. Here, we introduce BnpC, a novel non-parametric method to cluster individual cells into clones and infer their genotypes based on their noisy mutation profiles. We benchmarked our method comprehensively against state-of-the-art methods on simulated data of various sizes, and applied it to three cancer scDNA-seq datasets. On simulated data, BnpC compared favorably against current methods in terms of accuracy, runtime and scalability. Its inferred genotypes were the most accurate, especially on highly heterogeneous data, and it was the only method able to run and produce results on datasets with 5000 cells. On tumor scDNA-seq data, BnpC was able to identify clonal populations missed by the original cluster analysis but supported by supplementary experimental data. With ever-growing scDNA-seq datasets, scalable and accurate methods such as BnpC will become increasingly relevant, not only to resolve ITH but also as a preprocessing step to reduce data size. BnpC is freely available under the MIT license at https://github.com/cbg-ethz/BnpC. Supplementary data are available at Bioinformatics online. © The Author(s) 2020. Published by Oxford University Press.
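
To make the clustering task concrete, the following is a minimal sketch of how a noisy mutation profile with missing values might be scored against candidate clone genotypes. It is not BnpC's implementation or API: the function names, the `MISSING` marker, the error-rate values, and the use of fixed, known genotypes are all assumptions made for illustration. A non-parametric method such as BnpC additionally infers the number of clones and their genotypes, which this sketch does not attempt.

```python
import math

# Hypothetical marker for a missing observation (assumption, not from BnpC).
MISSING = None


def log_likelihood(profile, genotype, fp_rate, fn_rate):
    """Log-probability of one cell's observed profile given a clone genotype.

    profile  : list of 0, 1 or MISSING (observed mutation calls)
    genotype : list of 0 or 1 (assumed true mutation state of the clone)
    fp_rate  : assumed probability of observing 1 when the truth is 0
    fn_rate  : assumed probability of observing 0 when the truth is 1
    """
    total = 0.0
    for observed, true in zip(profile, genotype):
        if observed is MISSING:
            continue  # missing entries contribute no information
        if true == 1:
            p = 1.0 - fn_rate if observed == 1 else fn_rate
        else:
            p = fp_rate if observed == 1 else 1.0 - fp_rate
        total += math.log(p)
    return total


def assign_cells(profiles, genotypes, fp_rate=0.01, fn_rate=0.2):
    """Assign each cell to the index of the genotype that best explains it."""
    return [
        max(
            range(len(genotypes)),
            key=lambda k: log_likelihood(p, genotypes[k], fp_rate, fn_rate),
        )
        for p in profiles
    ]


# Toy data: two clones over four mutation sites, three noisy cells.
genotypes = [[1, 1, 0, 0], [0, 0, 1, 1]]
profiles = [
    [1, MISSING, 0, 0],  # clone 0 with one missing value
    [0, 0, MISSING, 1],  # clone 1 with one missing value
    [1, 0, 0, 0],        # clone 0 with one false negative
]
assignments = assign_cells(profiles, genotypes)
print(assignments)
```

In this toy setting the third cell is still assigned to the first clone despite a dropped mutation call, because a single false negative is far more probable under the assumed error rates than one false positive combined with two false negatives.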
