Conditional generative modeling for de novo protein design with hierarchical functions.

Authors
  • Kucera, Tim (1)
  • Togninalli, Matteo (2)
  • Meng-Papaxanthos, Laetitia (3)
  • (1) Department of Biosystems Science and Engineering, ETH Zürich, Basel 4058, Switzerland
  • (2) Visium, Lausanne 1015, Switzerland
  • (3) Google Research, Brain Team, Zurich 8002, Switzerland
Type
Published Article
Journal
Bioinformatics
Publisher
Oxford University Press (OUP)
Publication Date
Jun 27, 2022
Volume
38
Issue
13
Pages
3454–3461
Identifiers
DOI: 10.1093/bioinformatics/btac353
PMID: 35639661
Source
Medline
Language
English
License
Unknown

Abstract

Protein design has become increasingly important for medical and biotechnological applications. Because of the complex mechanisms underlying protein formation, the creation of a novel protein requires tedious and time-consuming computational or experimental protocols. At the same time, machine learning has made it possible to solve complex problems by leveraging large amounts of available data, more recently with great improvements in the domain of generative modeling. Yet, generative models have mainly been applied to specific sub-problems of protein design. Here, we approach the problem of general-purpose protein design conditioned on functional labels of the hierarchical Gene Ontology. Since a canonical way to evaluate generative models in this domain is missing, we devise an evaluation scheme of several biologically and statistically inspired metrics. We then develop the conditional generative adversarial network ProteoGAN and show that it outperforms several classic and more recent deep-learning baselines for protein sequence generation. We further give insights into the model by analyzing hyperparameters and ablation baselines. Lastly, we hypothesize that a functionally conditional model could generate proteins with novel functions by combining labels, and we provide first steps in this direction of research. The code and data underlying this article are available on GitHub at https://github.com/timkucera/proteogan, and can be accessed with doi:10.5281/zenodo.6591379. Supplemental data are available at Bioinformatics online. © The Author(s) 2022. Published by Oxford University Press.
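
To make the idea of conditioning on hierarchical labels concrete, here is a minimal Python sketch of one common way to turn such labels into a conditioning vector: a multi-hot encoding in which selecting a term also switches on all of its ancestors. This is an illustration under stated assumptions, not the encoding used in ProteoGAN. The toy hierarchy, the term names, and the function names `with_ancestors` and `encode` are all hypothetical and are not taken from the article or its repository.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy (child -> parents); these are NOT real GO identifiers.
PARENTS: Dict[str, List[str]] = {
    "binding": [],
    "catalytic_activity": [],
    "ion_binding": ["binding"],
    "metal_ion_binding": ["ion_binding"],
    "kinase_activity": ["catalytic_activity"],
}

# Fixed term order so that every label vector has the same layout.
TERMS: List[str] = sorted(PARENTS)


def with_ancestors(term: str) -> Set[str]:
    """Return the term together with all of its ancestors in the hierarchy."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def encode(labels: List[str]) -> List[int]:
    """Multi-hot encode labels, switching on every ancestor of each label."""
    active: Set[str] = set()
    for label in labels:
        active |= with_ancestors(label)
    return [1 if term in active else 0 for term in TERMS]


if __name__ == "__main__":
    # A single specific label also activates its more general parents.
    print(encode(["metal_ion_binding"]))
    # Combining labels from two branches yields the union of both paths.
    print(encode(["metal_ion_binding", "kinase_activity"]))
```

In a setup like this, combining labels amounts to taking the union of the activated paths, which is one simple way to express a request for a function combination that may not occur in the training data.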
