Price, Morgan N; Arkin, Adam P
Automated annotations of protein functions are error-prone because of our lack of knowledge of protein functions. For example, it is often impossible to predict the correct substrate for an enzyme or a transporter. Furthermore, much of the knowledge that we do have about the functions of proteins is missing from the underlying databases. We discuss...
Vallat, Brinda; Webb, Benjamin; Westbrook, John; Goddard, Thomas; Hanke, Christian; Graziadei, Andrea; Peisach, Ezra; Zalevsky, Arthur; Sagendorf, Jared; Tangmunarunkit, Hongsuda
...
IHMCIF (github.com/ihmwg/IHMCIF) is a data information framework that supports archiving and disseminating macromolecular structures determined by integrative or hybrid modeling (IHM), and making them Findable, Accessible, Interoperable, and Reusable (FAIR). IHMCIF is an extension of the Protein Data Bank Exchange/macromolecular Crystallographic In...
Nasti, Lucia; Vecchiato, Giacomo; Heuret, Patrick; Rowe, Nick P; Palladino, Michele; Marcati, Pierangelo
A plant's structure is the result of constant adaptation and evolution to the surrounding environment. From this perspective, our goal is to investigate the mass and radius distribution of a particular plant organ, namely the searcher shoot, by providing a Reinforcement Learning (RL) environment, which we call Searcher-Shoot, that considers the mec...
Hong, Zhenchen; Shimagaki, Kai; Barton, John
SUMMARY: Deep mutational scanning (DMS) experiments provide a powerful method to measure the functional effects of genetic mutations at massive scales. However, the data generated from these experiments can be difficult to analyze, with significant variation between experimental replicates. To overcome this challenge, we developed popDMS, a computa...
Talwar, James V; Klie, Adam; Pagadala, Meghana S; Carter, Hannah
SUMMARY: Harmonizing variant indexing and allele assignments across datasets is crucial for data integrity in cross-dataset studies such as multi-cohort genome-wide association studies, meta-analyses, and the development, validation, and application of polygenic risk scores. Ensuring this indexing and allele consistency is a laborious, time-consuming...
Yazdani, Kimia; Mousapour, Reza; Hayes, Wayne
MOTIVATION: Protein-protein interaction (PPI) networks provide valuable insights into the function of biological systems. Aligning multiple PPI networks may expose relationships beyond those observable by pairwise comparisons. However, assessing the biological quality of multiple network alignments is a challenging problem. RESULTS: We propose two ...
Sehgal, Aarushi; Ziaei Jam, Helyaneh; Shen, Andrew; Gymrek, Melissa
MOTIVATION: Somatic mosaicism has been implicated in several developmental disorders, cancers, and other diseases. Short tandem repeats (STRs) consist of repeated sequences of 1-6 bp and comprise >1 million loci in the human genome. Somatic mosaicism at STRs is known to play a key role in the pathogenicity of loci implicated in repeat expansion dis...
Hie, Brian; Kim, Soochi; Rando, Thomas; Bryson, Bryan; Berger, Bonnie
Merging diverse single-cell RNA sequencing (scRNA-seq) data from numerous experiments, laboratories and technologies can uncover important biological insights. Nonetheless, integrating scRNA-seq data is especially challenging when the datasets differ in their cell type compositions. Scanorama offers a robust solution for improving the q...
Camargo, Antonio Pedro; Roux, Simon; Schulz, Frederik; Babinski, Michal; Xu, Yan; Hu, Bin; Chain, Patrick SG; Nayfach, Stephen; Kyrpides, Nikos C
Identifying and characterizing mobile genetic elements in sequencing data is essential for understanding their diversity, ecology, biotechnological applications and impact on public health. Here we introduce geNomad, a classification and annotation framework that combines information from gene content and a deep neural network to identify sequences...
Stcherbinine, Aurélien; Langevin, Yves; Carter, John; Vincendon, Mathieu; Leseigneur, Yann; Barraud, Océane