Fan, Yingying; Demirkaya, Emre; Li, Gaorong; Lv, Jinchi
Published in
Journal of the American Statistical Association
Power and reproducibility are key to enabling refined scientific discoveries in contemporary big data applications with general high-dimensional nonlinear models. In this paper, we provide theoretical foundations on the power and robustness for the model-X knockoffs procedure introduced recently in Candès, Fan, Janson and Lv (2018) in high-dimensio...
Sun, Qiang; Zhou, Wen-Xin; Fan, Jianqing
Published in
Journal of the American Statistical Association
Big data can easily be contaminated by outliers or contain variables with heavy-tailed distributions, which makes many conventional methods inadequate. To address this challenge, we propose the adaptive Huber regression for robust estimation and inference. The key observation is that the robustification parameter should adapt to the sample size, di...
Zhao, Hui; Wu, Qiwei; Li, Gang; Sun, Jianguo
Published in
Journal of the American Statistical Association
The simultaneous estimation and variable selection for the Cox model has been discussed by several authors (Fan and Li, 2002; Huang and Ma, 2010; Tibshirani, 1997) when one observes right-censored failure time data. However, there does not seem to exist an established procedure for interval-censored data, a more general and complex type of failure time...
Wilson, Douglas R.; Ibrahim, Joseph G.; Sun, Wei
Published in
Journal of the American Statistical Association
The study of gene expression quantitative trait loci (eQTL) is an effective approach to illuminate the functional roles of genetic variants. Computational methods have been developed for eQTL mapping using gene expression data from microarray or RNA-seq technology. Application of these methods for eQTL mapping in tumor tissues is problematic becaus...
Mejia, Amanda F.; Nebel, Mary Beth; Wang, Yikai; Caffo, Brian S.; Guo, Ying
Published in
Journal of the American Statistical Association
Large brain imaging databases contain a wealth of information on brain organization in the populations they target, and on individual variability. While such databases have been used to study group-level features of populations directly, they are currently underutilized as a resource to inform single-subject analysis. Here, we propose leveraging th...
Rashid, Naim U.; Li, Quefeng; Yeh, Jen Jen; Ibrahim, Joseph G.
Published in
Journal of the American Statistical Association
In the genomic era, the identification of gene signatures associated with disease is of significant interest. Such signatures are often used to predict clinical outcomes in new patients and aid clinical decision-making. However, recent studies have shown that gene signatures are often not replicable. This occurrence has practical implications regar...
Sun, Ryan; Lin, Xihong
Published in
Journal of the American Statistical Association
Studying the effects of groups of single nucleotide polymorphisms (SNPs), as in a gene, genetic pathway, or network, can provide novel insight into complex diseases like breast cancer, uncovering new genetic associations and augmenting the information that can be gleaned from studying SNPs individually. Common challenges in set-based genetic associ...
Guan, Qian; Reich, Brian J.; Laber, Eric B.; Bandyopadhyay, Dipankar
Published in
Journal of the American Statistical Association
Tooth loss from periodontal disease is a major public health burden in the United States. Standard clinical practice is to recommend a dental visit every six months; however, this practice is not evidence-based, and poor dental outcomes and increasing dental insurance premiums indicate room for improvement. We consider a tailored approach that reco...
Wilson, Douglas R.; Jin, Chong; Ibrahim, Joseph G.; Sun, Wei
Published in
Journal of the American Statistical Association
Immunotherapies have attracted considerable research interest recently. The need to understand the underlying mechanisms of immunotherapies and to develop precision immunotherapy regimens has spurred great interest in characterizing immune cell composition within the tumor microenvironment. Several methods have been developed to estimate immune cell co...
Kamm, Jack; Terhorst, Jonathan; Durbin, Richard; Song, Yun S.
Published in
Journal of the American Statistical Association
The sample frequency spectrum (SFS), or histogram of allele counts, is an important summary statistic in evolutionary biology, and is often used to infer the history of population size changes, migrations, and other demographic events affecting a set of populations. The expected multipopulation SFS under a given demographic model can be efficiently...