SEQdata-BEACON: a comprehensive database of sequencing performance and statistical tools for performance evaluation and yield simulation in BGISEQ-500

  • Zhou, Yanqiu1
  • Liu, Chen1
  • Zhou, Rongfang1
  • Lu, Anzhi1
  • Huang, Biao1
  • Liu, Liling1
  • Chen, Ling1
  • Luo, Bei1
  • Huang, Jin1
  • Tian, Zhijian1
  • 1 BGI-Wuhan Clinical Laboratories, Building B2, No.666 Gaoxin Road, Wuhan East Lake Hi-tech Development Zone, Wuhan, 430074, China
Published Article
BioData Mining
BioMed Central
Publication Date
Nov 15, 2019
DOI: 10.1186/s13040-019-0209-9
Springer Nature


Background
The BGISEQ-500 sequencing platform is based on DNBSEQ technology and provides high throughput at low cost. This sequencer has been widely used in various areas of scientific and clinical research. A better understanding of the sequencing process and performance of this system is essential for stabilizing the sequencing process, accurately interpreting sequencing results and efficiently solving sequencing problems. To address these concerns, a comprehensive database, SEQdata-BEACON, was constructed to accumulate run performance data from the BGISEQ-500.

Results
A total of 60 BGISEQ-500 instruments in the BGI-Wuhan lab were used to collect sequencing performance data. Lanes from paired-end 100 (PE100) sequencing runs using 10 bp barcodes were chosen, and each lane was assigned a unique entry number as its identification number (ID). From November 2018 to April 2019, 2236 entries were recorded in the database, each containing 65 metrics covering sample, yield, quality, machine state and supplies information. Using a correlation matrix, 52 numerical metrics were clustered into three groups representing yield-quality, machine state and sequencing calibration. The distributions of the metrics also revealed patterns and provided clues for further explanation or analysis of the sequencing process. Using data from all 200 cycles, a linear regression model simulated the final output well. Moreover, a prediction of the final yield could be provided as early as the 15th cycle of sequencing; the R² values of the 200th-cycle and 15th-cycle models were 0.97 and 0.81, respectively. When the model was run on test sets obtained from May 2019 to predict the yield, it achieved an R² of 0.96. These results indicate that our simulation model was reliable and effective.

Conclusions
Data sources, statistical findings and application tools provide a constantly updated reference for BGISEQ-500 users to comprehensively understand DNBSEQ technology, solve sequencing problems and optimize run performance. These resources are available on our website.
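
The yield simulation described in the Results can be illustrated with a minimal sketch: fit a linear regression on training lanes, then report the coefficient of determination on both the training lanes and a held-out set. This is not the authors' model. The single predictor (an early-cycle metric), the synthetic data, and the function names `fit_line`, `r_squared` and `make_lanes` are all assumptions made for illustration; the abstract does not state which metrics the published model uses.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Coefficient of determination of the fitted line on (xs, ys)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def make_lanes(n, rng):
    """Synthetic stand-in for database entries (hypothetical values only)."""
    lanes = []
    for _ in range(n):
        early_metric = rng.uniform(0.8, 1.2)                      # assumed early-cycle signal
        final_yield = 100.0 * early_metric + rng.gauss(0.0, 3.0)  # assumed final yield
        lanes.append((early_metric, final_yield))
    return lanes


rng = random.Random(0)
train_x, train_y = zip(*make_lanes(200, rng))  # stands in for the training period
test_x, test_y = zip(*make_lanes(50, rng))     # stands in for a later test period

intercept, slope = fit_line(train_x, train_y)
r2_train = r_squared(train_x, train_y, intercept, slope)
r2_test = r_squared(test_x, test_y, intercept, slope)

print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"R2 train={r2_train:.3f} R2 test={r2_test:.3f}")
```

Evaluating on lanes that were not used for fitting mirrors the abstract's use of a separate May 2019 test set: a high R² on the training lanes alone would not show that the model generalizes to new runs.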
