It has recently been demonstrated that metrics of structural validity are severely underreported in social and personality psychology. We comprehensively assessed structural validity in a uniquely large and varied data set (N = 144,496 experimental sessions) to investigate the psychometric properties of some of the most widely used self-report measures (k = 15 questionnaires, 26 scales) in social and personality psychology. When the scales were assessed using the modal practice of considering only internal consistency, 88% of them appeared to possess good validity. Yet when validity was assessed comprehensively (via internal consistency, immediate and delayed test-retest reliability, factor structure, and measurement invariance for age and gender groups), only 4% demonstrated good validity. Furthermore, the less commonly a test was reported in the literature, the more likely the scales were to fail that test (e.g., scales failed measurement invariance much more often than internal consistency). This suggests that the pattern of underreporting in the field may represent widespread hidden invalidity of the measures used and may therefore pose a threat to many research findings. We highlight the degrees of freedom afforded to researchers in the assessment and reporting of structural validity and introduce the concept of validity hacking (v-hacking), similar to the better-known concept of p-hacking. We argue that the practice of v-hacking should be acknowledged and addressed.