Forward regression for Cox models with high-dimensional covariates.

Authors
  • Hong, Hyokyoung G. (1)
  • Zheng, Qi (2)
  • Li, Yi (3)
  • (1) Department of Statistics and Probability, Michigan State University, 19 Red Cedar Road, East Lansing, MI 48823, USA.
  • (2) Department of Bioinformatics and Biostatistics, University of Louisville, 485 East Gray Street, Louisville, KY 40202, USA.
  • (3) Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109-2029, USA.
Type
Published Article
Journal
Journal of Multivariate Analysis
Publication Date
Sep 01, 2019
Volume
173
Pages
268–290
Identifiers
DOI: 10.1016/j.jmva.2019.02.011
PMID: 31007300
Source
Medline
Language
English
License
Unknown

Abstract

Forward regression, a classical variable screening method, has been widely used for model building when the number of covariates is relatively low. However, forward regression is seldom used in high-dimensional settings because of the cumbersome computation and unknown theoretical properties. Some recent works have shown that forward regression, coupled with an extended Bayesian information criterion (EBIC)-based stopping rule, can consistently identify all relevant predictors in high-dimensional linear regression settings. However, these results are based on the residual sum of squares from linear models, and it is unclear whether forward regression can be applied to more general regression settings, such as Cox proportional hazards models. We introduce a forward variable selection procedure for Cox models. It selects important variables sequentially according to the increment of partial likelihood, with an EBIC stopping rule. To our knowledge, this is the first study that investigates partial likelihood-based forward regression in high-dimensional survival settings and establishes selection consistency results. We show that, if the dimension of the true model is finite, forward regression can discover all relevant predictors within a finite number of steps and their order of entry is determined by the size of the increment in partial likelihood. As partial likelihood is not a regular density-based likelihood, we develop some new theoretical results on partial likelihood and use these results to establish the desired sure screening properties. The practical utility of the proposed method is examined via extensive simulations and analysis of a subset of the Boston Lung Cancer Survival Cohort study, a hospital-based study for identifying biomarkers related to lung cancer patients' survival.
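
The procedure described in the abstract (add, at each step, the covariate giving the largest increase in the Cox partial likelihood, and stop according to EBIC) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions: the function names `cox_fit`, `forward_cox_ebic`, and `simulate_cox_data` are hypothetical; event times are assumed to have no ties; the penalty is taken as |S| (log n + 2 γ log p), one common simplified form of EBIC that may differ from the form used in the paper; and the search stops at the first step where EBIC fails to decrease.

```python
import numpy as np

def cox_fit(X, time, event, max_iter=50, tol=1e-9):
    """Maximize the Cox log partial likelihood by Newton's method.

    Assumes no tied event times. Returns (beta_hat, maximized log partial likelihood).
    """
    order = np.argsort(-time)          # descending time: risk set of row i is rows 0..i
    X = X[order]
    d = event[order].astype(float)     # 1 = event observed, 0 = censored
    k = X.shape[1]

    def loglik_grad_hess(beta):
        eta = X @ beta
        eta = eta - eta.max()          # stability shift; cancels in every ratio below
        w = np.exp(eta)
        s0 = np.cumsum(w)                                   # risk-set sums of exp(eta)
        s1 = np.cumsum(w[:, None] * X, axis=0)              # ... weighted by x
        s2 = np.cumsum(w[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)
        xbar = s1 / s0[:, None]
        cov = s2 / s0[:, None, None] - xbar[:, :, None] * xbar[:, None, :]
        ll = np.sum(d * (eta - np.log(s0)))
        grad = (d[:, None] * (X - xbar)).sum(axis=0)
        hess = -(d[:, None, None] * cov).sum(axis=0)
        return ll, grad, hess

    beta = np.zeros(k)
    ll, grad, hess = loglik_grad_hess(beta)
    if k == 0:                         # null model: nothing to optimize
        return beta, ll
    for _ in range(max_iter):
        step = np.linalg.solve(-hess, grad)
        t = 1.0
        new = loglik_grad_hess(beta + t * step)
        while new[0] < ll and t > 1e-4:    # step halving if the likelihood drops
            t /= 2.0
            new = loglik_grad_hess(beta + t * step)
        beta = beta + t * step
        converged = abs(new[0] - ll) < tol
        ll, grad, hess = new
        if converged:
            break
    return beta, ll

def forward_cox_ebic(X, time, event, gamma=1.0, max_steps=10):
    """Forward selection by partial-likelihood increment with an EBIC stopping rule.

    The EBIC form used here is an assumption, not taken from the paper.
    """
    n, p = X.shape
    selected = []
    best_ebic = -2.0 * cox_fit(X[:, :0], time, event)[1]    # EBIC of the null model
    while len(selected) < min(max_steps, p):
        candidates = [j for j in range(p) if j not in selected]
        logliks = [cox_fit(X[:, selected + [j]], time, event)[1] for j in candidates]
        best = int(np.argmax(logliks))     # largest partial-likelihood increment
        size = len(selected) + 1
        ebic = -2.0 * logliks[best] + size * (np.log(n) + 2.0 * gamma * np.log(p))
        if ebic >= best_ebic:              # stop once EBIC no longer decreases
            break
        selected.append(candidates[best])
        best_ebic = ebic
    return selected

def simulate_cox_data(n=200, p=20, seed=0):
    """Toy data: only covariates 0 and 1 affect the hazard (illustrative values)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    eta = 1.5 * X[:, 0] - 1.5 * X[:, 1]
    t_event = rng.exponential(1.0 / np.exp(eta))   # exponential baseline hazard
    t_cens = rng.exponential(2.0, size=n)          # independent censoring
    return X, np.minimum(t_event, t_cens), t_event <= t_cens
```

Calling `forward_cox_ebic(*simulate_cox_data())` returns the selected column indices in their order of entry. Each step refits one Cox model per remaining candidate, which is the computational burden the abstract mentions; the dense Hessian in `cox_fit` is only practical because the selected model stays small.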
