CAMPAREE: a robust and configurable RNA expression simulator

Authors
  • Lahens, Nicholas F.1
  • Brooks, Thomas G.1
  • Sarantopoulou, Dimitra1, 2
  • Nayak, Soumyashant3
  • Lawrence, Cris1
  • Mrčela, Antonijo1
  • Srinivasan, Anand4
  • Schug, Jonathan1
  • Hogenesch, John B.5
  • Barash, Yoseph1
  • Grant, Gregory R.1
  • 1 University of Pennsylvania
  • 2 National Institutes of Health
  • 3 Indian Statistical Institute
  • 4 Enterprise Research Applications and High Performance Computing, Penn Medicine Academic Computing Services, University of Pennsylvania
  • 5 Cincinnati Children’s Hospital Medical Center
Type
Published Article
Journal
BMC Genomics
Publisher
Springer (Biomed Central Ltd.)
Publication Date
Sep 25, 2021
Volume
22
Identifiers
DOI: 10.1186/s12864-021-07934-2
PMID: 34563123
PMCID: PMC8467241
Source
PubMed Central
Disciplines
  • Software
License
Unknown

Abstract

Background
The accurate interpretation of RNA-Seq data presents a moving target as scientists continue to introduce new experimental techniques and analysis algorithms. Simulated datasets are an invaluable tool to accurately assess the performance of RNA-Seq analysis methods. However, existing RNA-Seq simulators focus on modeling the technical biases and artifacts of sequencing, rather than on simulating the original RNA samples. A first step in simulating RNA-Seq is to simulate RNA.

Results
To fill this need, we developed the Configurable And Modular Program Allowing RNA Expression Emulation (CAMPAREE), a simulator using empirical data to simulate diploid RNA samples at the level of individual molecules. We demonstrated CAMPAREE’s use for generating idealized coverage plots from real data, and for adding the ability to generate allele-specific data to existing RNA-Seq simulators that do not natively support this feature.

Conclusions
Separating input sample modeling from library preparation/sequencing offers added flexibility for both users and developers to mix-and-match different sample and sequencing simulators to suit their specific needs. Furthermore, the ability to maintain sample and sequencing simulators independently provides greater agility to incorporate new biological findings about transcriptomics and new developments in sequencing technologies. Additionally, by simulating at the level of individual molecules, CAMPAREE has the potential to model molecules transcribed from the same genes as a heterogeneous population of transcripts with different states of degradation and processing (splicing, editing, etc.). CAMPAREE was developed in Python, is open source, and freely available at https://github.com/itmat/CAMPAREE.

Supplementary Information
The online version contains supplementary material available at 10.1186/s12864-021-07934-2.
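
To make the idea of simulating a diploid sample "at the level of individual molecules" more concrete, the following is a minimal toy sketch in Python. It is not CAMPAREE's implementation or API: the `Molecule` class, the `simulate_molecules` function, the transcript names, and all numeric values are hypothetical and chosen only for illustration. The sketch draws each molecule's transcript of origin in proportion to an assumed abundance, then assigns it to one of two parental alleles according to an assumed allelic fraction.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """One simulated RNA molecule (hypothetical record, for illustration only)."""
    transcript_id: str
    allele: int  # 1 or 2: which parental copy the molecule was transcribed from


def simulate_molecules(abundance, allele1_fraction, n_molecules, seed=0):
    """Draw n_molecules individual molecules.

    abundance: transcript_id -> relative expression weight (assumed input)
    allele1_fraction: transcript_id -> probability a molecule comes from allele 1
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    transcript_ids = list(abundance)
    weights = [abundance[t] for t in transcript_ids]

    molecules = []
    # Pick the transcript of origin for each molecule, weighted by abundance.
    for transcript_id in rng.choices(transcript_ids, weights=weights, k=n_molecules):
        # Pick the parental allele for this particular molecule.
        allele = 1 if rng.random() < allele1_fraction[transcript_id] else 2
        molecules.append(Molecule(transcript_id, allele))
    return molecules


# Made-up example values: three transcripts with different expression levels
# and different degrees of allelic imbalance.
abundance = {"tx_A": 70.0, "tx_B": 25.0, "tx_C": 5.0}
allele1_fraction = {"tx_A": 0.5, "tx_B": 0.9, "tx_C": 0.0}

molecules = simulate_molecules(abundance, allele1_fraction, n_molecules=1000, seed=42)

# Summarize molecule counts per (transcript, allele) pair.
counts = Counter((m.transcript_id, m.allele) for m in molecules)
for key in sorted(counts):
    print(key, counts[key])
```

Because every molecule is an explicit record, a downstream sequencing simulator could in principle consume this list directly, and per-molecule attributes (for example a degradation or splicing state) could be added as extra fields without changing the sampling step.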
