

SEQdata-BEACON: a comprehensive database of sequencing performance and statistical tools for performance evaluation and yield simulation in BGISEQ-500

Authors
  • Zhou, Yanqiu¹
  • Liu, Chen¹
  • Zhou, Rongfang¹
  • Lu, Anzhi¹
  • Huang, Biao¹
  • Liu, Liling¹
  • Chen, Ling¹
  • Luo, Bei¹
  • Huang, Jin¹
  • Tian, Zhijian¹
  • ¹ BGI-Wuhan Clinical Laboratories, Building B2, No.666 Gaoxin Road, Wuhan East Lake Hi-tech Development Zone, Wuhan, 430074, China
Type
Published Article
Journal
BioData Mining
Publisher
BioMed Central
Publication Date
Nov 15, 2019
Volume
12
Issue
1
Identifiers
DOI: 10.1186/s13040-019-0209-9
Source
Springer Nature
License
Green

Abstract

Background
The BGISEQ-500 sequencing platform is based on DNBSEQ technology and provides high throughput at low cost. It has been widely used in many areas of scientific and clinical research. A better understanding of the sequencing process and performance of this system is essential for stabilizing sequencing runs, interpreting sequencing results accurately and resolving sequencing problems efficiently. To address these needs, a comprehensive database, SEQdata-BEACON, was constructed to accumulate run performance data from the BGISEQ-500.

Results
Sequencing performance data were collected from a total of 60 BGISEQ-500 instruments in the BGI-Wuhan lab. Lanes sequenced in paired-end 100 (PE100) mode with a 10 bp barcode were selected, and each lane was assigned a unique entry number as its identification number (ID). From November 2018 to April 2019, 2236 entries were recorded in the database, containing 65 metrics covering sample, yield, quality, machine state and supplies information. Using a correlation matrix, 52 numerical metrics were clustered into three groups representing yield-quality, machine state and sequencing calibration. The distributions of the metrics also revealed patterns and provided clues for further explanation or analysis of the sequencing process. Using data from all 200 cycles, a linear regression model accurately simulated the final output. Moreover, the final yield could be predicted as early as the 15th cycle; the R² values of the 200th-cycle and 15th-cycle models were 0.97 and 0.81, respectively. When the model was applied to a test set obtained in May 2019 to predict yield, it achieved an R² of 0.96. These results indicate that our simulation model is reliable and effective.

Conclusions
Data sources, statistical findings and application tools provide a continually updated reference for BGISEQ-500 users to comprehensively understand DNBSEQ technology, solve sequencing problems and optimize run performance. These resources are available on our website http://seqBEACON.genomics.cn:443/home.html.
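
The Results above describe predicting final yield with a linear regression and judging the fit by R². The Python sketch below illustrates that general idea using ordinary least squares on a single predictor, evaluated on a held-out test set. It is not the authors' model. The single predictor (a hypothetical cycle-15 yield estimate), the function names fit_simple_ols, predict and r_squared, and all numbers are illustrative assumptions, not data from SEQdata-BEACON.

```python
"""Hypothetical sketch: predict final lane yield from one early-cycle metric
with ordinary least squares, then score the fit with R^2.

Assumption: one predictor per lane (e.g. a yield estimate available at
cycle 15). The actual predictors used by SEQdata-BEACON are not stated here.
"""


def fit_simple_ols(x, y):
    """Return (slope, intercept) minimising squared error of y ~ slope*x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to new predictor values."""
    slope, intercept = model
    return [slope * xi + intercept for xi in x]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true has zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up numbers in arbitrary units, NOT data from SEQdata-BEACON:
    # x = hypothetical cycle-15 yield estimate, y = final lane yield.
    train_x = [10.0, 11.0, 12.0, 13.0, 14.0]
    train_y = [20.5, 22.0, 23.8, 26.1, 27.9]
    model = fit_simple_ols(train_x, train_y)

    # Score on separate hold-out lanes, mirroring a train/test split.
    test_x = [10.5, 12.5, 13.5]
    test_y = [21.4, 25.0, 27.0]
    print("slope=%.3f intercept=%.3f" % model)
    print("test R^2 = %.3f" % r_squared(test_y, predict(model, test_x)))
```

A single-predictor fit keeps the arithmetic transparent. A model that uses several run metrics as predictors would need multiple linear regression instead.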

