Insights gained from a comprehensive all-against-all transcription factor binding motif benchmarking study

Authors
  • Ambrosini, Giovanna (1, 2)
  • Vorontsov, Ilya (3, 4)
  • Penzar, Dmitry (3, 5, 6)
  • Groux, Romain (1, 2)
  • Fornes, Oriol (7)
  • Nikolaeva, Daria D. (5)
  • Ballester, Benoit (8)
  • Grau, Jan (9)
  • Grosse, Ivo (9, 10)
  • Makeev, Vsevolod (3, 6, 11)
  • Kulakovskiy, Ivan (3, 4, 11)
  • Bucher, Philipp (1, 2)
Affiliations
  • 1 Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, CH-1015, Switzerland
  • 2 Swiss Institute of Bioinformatics (SIB), Lausanne, CH-1015, Switzerland
  • 3 Russian Academy of Sciences, Gubkina 3, Moscow, 119991, Russia
  • 4 Russian Academy of Sciences, Institutskaya 4, Pushchino, 142290, Russia
  • 5 Lomonosov Moscow State University, Leninskiye gory 1-73, Moscow, 119234, Russia
  • 6 Moscow Institute of Physics and Technology (State University), Institutskiy per. 9, Dolgoprudny, 141700, Russia
  • 7 University of British Columbia, Vancouver, BC V5Z 4H4, Canada
  • 8 Aix Marseille Université, INSERM, TAGC, Marseille, France
  • 9 Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
  • 10 German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
  • 11 Russian Academy of Sciences, Vavilova 32, Moscow, 119991, Russia
Type
Published Article
Publication Date
May 11, 2020
Volume
21
Issue
1
Identifiers
DOI: 10.1186/s13059-020-01996-3
Source
Springer Nature
License
Green

Abstract

Background

The positional weight matrix (PWM) is the de facto standard model for describing transcription factor (TF) DNA binding specificities. PWMs inferred from in vivo or in vitro data are stored in many databases and used in a wide range of biological applications. This calls for comprehensive benchmarking of public PWM models against large experimental reference sets.

Results

Here we report results from all-against-all benchmarking of PWM models for DNA binding sites of human TFs on a large compilation of in vitro (HT-SELEX, PBM) and in vivo (ChIP-seq) binding data. We observe that the best performing PWM for a given TF often belongs to another TF, usually from the same family. Occasionally, binding specificity is correlated with the structural class of the DNA binding domain, as indicated by good cross-family performance measures. Benchmarking-based selection of family-representative motifs is more effective than motif clustering-based approaches. Overall, there is good agreement between in vitro and in vivo performance measures. However, for some in vivo experiments, the best performing PWM is assigned to an unrelated TF, indicating a binding mode involving protein-protein cooperativity.

Conclusions

In an all-against-all setting, we compute more than 18 million performance measure values for different PWM-experiment combinations and offer these results as a public resource to the research community. The benchmarking protocols are provided via a web interface and as Docker images. The methods and results from this study may help others make better use of public TF specificity models, as well as public TF binding data sets.
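
For readers unfamiliar with the model, the following is a minimal sketch of how a PWM can score a DNA sequence. It is not the benchmarking protocol of the study: the count matrix, the pseudocount, the uniform background, and the function names (`counts_to_pwm`, `score_window`, `best_hit`) are all illustrative assumptions, and only the forward strand is scanned.

```python
import math

# Column order of the matrix rows (an assumption of this sketch).
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

# Hypothetical toy count matrix for a 4-bp motif "ACGT":
# one row per motif position, one column per base (A, C, G, T).
COUNTS = [
    [8, 0, 0, 0],
    [0, 8, 0, 0],
    [0, 0, 8, 0],
    [0, 0, 0, 8],
]


def counts_to_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert base counts to log2-odds weights against a uniform background."""
    pwm = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        pwm.append(
            [math.log2(((c + pseudocount) / total) / background) for c in row]
        )
    return pwm


def score_window(pwm, window):
    """Sum the position-specific weights for one window of motif length."""
    return sum(pwm[i][BASE_INDEX[base]] for i, base in enumerate(window))


def best_hit(pwm, sequence):
    """Slide the PWM along the forward strand; return (best score, offset)."""
    width = len(pwm)
    best_score, best_offset = float("-inf"), -1
    for offset in range(len(sequence) - width + 1):
        score = score_window(pwm, sequence[offset:offset + width])
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


pwm = counts_to_pwm(COUNTS)
score, offset = best_hit(pwm, "TTACGTAA")
print(f"best score {score:.3f} at offset {offset}")
```

A benchmark in the spirit of the abstract would compare such scores across many PWMs and many experimental sequence sets; the choice of performance measure is left open here.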
