High dimensional surrogacy: computational aspects of an upscaled analysis.

Authors
  • Sengupta, Rudradev (1, 2)
  • Perualila, Nolen Joy (3)
  • Shkedy, Ziv (1, 4)
  • Biecek, Przemyslaw (5)
  • Molenberghs, Geert (1, 4)
  • Bijnens, Luc (1, 2)
Affiliations
  • 1 Center for Statistics (CenStat), Hasselt University, Hasselt, Belgium
  • 2 Nonclinical Statistics, Janssen Pharmaceutical companies of Johnson and Johnson, Beerse, Belgium
  • 3 HEMAR EMEA, Janssen Pharmaceutical companies of Johnson and Johnson, Beerse, Belgium
  • 4 Interuniversity Institute for Biostatistics and statistical Bioinformatics (I-BioStat), Hasselt, Belgium
  • 5 Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
Type
Published Article
Journal
Journal of Biopharmaceutical Statistics
Publisher
Informa UK (Taylor & Francis)
Publication Date
Jan 01, 2020
Volume
30
Issue
1
Pages
104–120
Identifiers
DOI: 10.1080/10543406.2019.1657128
PMID: 31462134
Source
Medline
Language
English
License
Unknown

Abstract

Identifying genomic biomarkers is an important area of research in drug discovery experiments. These experiments typically involve several high-dimensional datasets that contain information about a set of drugs (compounds) under development, a data structure that introduces the challenge of multi-source data integration. High-performance computing (HPC) has become an important tool for everyday research tasks. In drug discovery, high-dimensional multi-source data must be analyzed to identify the biological pathways related to the new set of drugs under development, and processing all the information contained in the datasets requires HPC techniques. Although R packages for parallel computing are available, they are not optimized for a specific setting and data structure. In this article, we propose a new data analysis framework for using R on a computer cluster. The proposed workflow is applied to a multi-source, high-dimensional drug discovery dataset and compared with a few existing R packages for parallel computing.
