Reproducibility of R-fMRI metrics on the impact of different strategies for multiple comparison correction and sample sizes.

  • Chen, Xiao (1, 2)
  • Lu, Bin (1, 2)
  • Yan, Chao-Gan (1, 2, 3, 4)
  • 1 CAS Key Laboratory of Behavioral Science, Institute of Psychology, Beijing, China.
  • 2 Department of Psychology, University of Chinese Academy of Sciences, Beijing, China.
  • 3 Magnetic Resonance Imaging Research Center, Institute of Psychology, Chinese Academy of Sciences, Beijing, China.
  • 4 Department of Child and Adolescent Psychiatry, NYU Langone Medical Center, School of Medicine, New York, NY, USA.
Published Article
Human Brain Mapping
Wiley (John Wiley & Sons)
Publication Date
Oct 11, 2017
DOI: 10.1002/hbm.23843
PMID: 29024299


Concerns have been raised about the reproducibility of resting-state functional magnetic resonance imaging (R-fMRI) findings. Little is known about how to operationally define R-fMRI reproducibility and to what extent it is affected by multiple comparison correction strategies and sample size. We comprehensively assessed two aspects of reproducibility, test-retest reliability and replicability, for widely used R-fMRI metrics in both between-subject contrasts of sex differences and within-subject comparisons of eyes-open and eyes-closed (EOEC) conditions. We found that the permutation test with Threshold-Free Cluster Enhancement (TFCE), a strict multiple comparison correction strategy, reached the best balance between family-wise error rate (under 5%) and test-retest reliability/replicability (e.g., 0.68 for test-retest reliability and 0.25 for replicability of the amplitude of low-frequency fluctuations (ALFF) for between-subject sex differences, and 0.49 for replicability of ALFF for within-subject EOEC differences). Although R-fMRI indices attained moderate reliabilities, they replicated poorly in distinct datasets (replicability < 0.3 for between-subject sex differences, < 0.5 for within-subject EOEC differences). By randomly drawing samples of different sizes from a single site, we found that reliability, sensitivity, and positive predictive value (PPV) rose as sample size increased. Small sample sizes (e.g., < 80 [40 per group]) not only minimized power (sensitivity < 2%) but also decreased the likelihood that significant results reflect "true" effects (PPV < 0.26) in sex differences. Our findings have implications for how to select multiple comparison correction strategies and highlight the importance of sufficiently large sample sizes in R-fMRI studies to enhance reproducibility. Hum Brain Mapp 00:000-000, 2017. © 2017 Wiley Periodicals, Inc.
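
The abstract reports reliability, replicability, sensitivity, and PPV as numbers but does not say how they are computed. The sketch below is one plausible reading, not the authors' pipeline. It assumes that each analysis yields a binary map of significant voxels, that overlap between two such maps is scored with the Dice coefficient, and that sensitivity and PPV are measured against a reference map standing in for the "true" effects. The function names and the toy arrays are hypothetical.

```python
import numpy as np


def dice_overlap(map_a, map_b):
    """Dice coefficient between two binary significance maps (assumed metric)."""
    a = np.asarray(map_a, dtype=bool)
    b = np.asarray(map_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Neither map has significant voxels, so overlap is undefined.
        return float("nan")
    return float(2.0 * np.logical_and(a, b).sum() / total)


def sensitivity_and_ppv(detected, reference):
    """Sensitivity and PPV of a detected map against a reference map.

    sensitivity = true positives / voxels significant in the reference
    PPV         = true positives / voxels significant in the detected map
    """
    d = np.asarray(detected, dtype=bool)
    r = np.asarray(reference, dtype=bool)
    true_pos = np.logical_and(d, r).sum()
    sensitivity = float(true_pos / r.sum()) if r.sum() else float("nan")
    ppv = float(true_pos / d.sum()) if d.sum() else float("nan")
    return sensitivity, ppv


# Toy one-dimensional "maps" of eight voxels (made-up data for illustration).
session_1 = [1, 1, 1, 0, 0, 0, 0, 0]
session_2 = [0, 1, 1, 1, 0, 0, 0, 0]
subsample = [0, 1, 1, 0, 0, 0, 0, 0]

reliability = dice_overlap(session_1, session_2)            # overlap of two sessions
sens, ppv = sensitivity_and_ppv(subsample, session_1)       # subsample vs. reference
```

Under these assumptions, test-retest reliability would compare maps from two sessions of the same participants, while replicability would compare maps from distinct datasets; both use the same overlap function and differ only in which maps are passed in.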
