Big data, observational research and P-value: a recipe for false-positive findings? A study of simulated and real prospective cohorts.

Authors
  • Veronesi, Giovanni1
  • Grassi, Guido2
  • Savelli, Giordano3
  • Quatto, Piero4
  • Zambon, Antonella5
  • 1 Research Center in Epidemiology and Preventive Medicine, Department of Medicine and Surgery, University of Insubria, Varese, Italy.
  • 2 Clinica Medica, Department of Medicine and Surgery, University of Milano-Bicocca, Milano, Italy.
  • 3 U.O. Medicina Nucleare, Fondazione Poliambulanza Istituto Ospedaliero, Brescia, Italy.
  • 4 Department of Economics, Management and Statistics.
  • 5 Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milano, Italy.
Type
Published Article
Journal
International Journal of Epidemiology
Publisher
Oxford University Press
Publication Date
Jun 01, 2020
Volume
49
Issue
3
Pages
876–884
Identifiers
DOI: 10.1093/ije/dyz206
PMID: 31620789
Source
Medline
Language
English
License
Unknown

Abstract

An increasing number of observational studies combine large sample sizes with low participation rates, which can cause standard inference to fail to control the false-discovery rate. We investigated whether the 'empirical calibration of P-value' method (EPCV), which relies on negative controls, can preserve type I error in the context of survival analysis. We used simulated cohort studies with a 50% participation rate and two different selection bias mechanisms, and a real-life application on predictors of cancer mortality using data from four population-based cohorts in Northern Italy (n = 6976 men and women aged 25-74 years at baseline, with 17 years of median follow-up). Type I error for the standard Cox model was above the 5% nominal level in 15 out of 16 simulated settings; for n = 10 000, the chance of a null association with hazard ratio = 1.05 having a P-value < 0.05 was 42.5%. Conversely, EPCV with 10 negative controls preserved the 5% nominal level in all the simulation settings, reducing bias in the point estimate by 80-90% when its main assumption was verified. In the real case, 15 out of 21 (71%) blood markers with no association with cancer mortality according to the literature had a P-value < 0.05 in age- and gender-adjusted Cox models. After calibration, only 1 (4.8%) remained statistically significant. In analyses of large observational studies prone to selection bias, the use of an empirical distribution to calibrate P-values can substantially reduce the number of trivial results needing further screening for relevance and external validity. © The Author(s) 2019. Published by Oxford University Press on behalf of the International Epidemiological Association.
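
As a rough illustration of the calibration idea summarised in the abstract, the Python sketch below fits an empirical null distribution to the log hazard ratios of negative controls and evaluates an estimate of interest against it. It is not the authors' implementation. It assumes a normal empirical null, estimates its mean and standard deviation from the sample of negative-control estimates while ignoring their individual standard errors (a simplification), and uses made-up numbers throughout. All function and variable names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def fit_empirical_null(negative_control_log_hrs):
    """Estimate mean and SD of the empirical null from negative-control estimates.

    Simplification (assumption): uses the plain sample mean and SD of the
    log hazard ratios and ignores each estimate's own standard error.
    """
    return mean(negative_control_log_hrs), stdev(negative_control_log_hrs)


def standard_p_value(log_hr, se):
    """Usual two-sided Wald P-value against a null centred at log(HR) = 0."""
    z = log_hr / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def calibrated_p_value(log_hr, se, null_mean, null_sd):
    """Two-sided P-value against N(null_mean, null_sd^2 + se^2)."""
    total_sd = math.sqrt(null_sd ** 2 + se ** 2)
    z = (log_hr - null_mean) / total_sd
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical log hazard ratios for 10 negative controls (made-up numbers
# chosen to mimic a small systematic bias away from the null).
negative_controls = [0.06, 0.03, 0.08, 0.05, 0.02, 0.07, 0.04, 0.09, 0.05, 0.01]
null_mean, null_sd = fit_empirical_null(negative_controls)

# Hypothetical estimate of interest: HR = 1.05 with a small standard error,
# as might arise in a very large cohort.
log_hr, se = math.log(1.05), 0.02

p_standard = standard_p_value(log_hr, se)
p_calibrated = calibrated_p_value(log_hr, se, null_mean, null_sd)

print(f"standard P-value:   {p_standard:.4f}")
print(f"calibrated P-value: {p_calibrated:.4f}")
```

With these illustrative inputs the standard P-value falls below 0.05, whereas the calibrated one does not, because the estimate is indistinguishable from the shift already seen among the negative controls.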
