Universal developmental screening is widely recommended, yet studies of the accuracy of commonly used questionnaires reveal mixed results, and previous comparisons of these questionnaires are hampered by important methodological differences across studies. The objective of this study was to compare the accuracy of 3 developmental screening instruments as standardized tests of developmental status.
This cross-sectional diagnostic accuracy study recruited consecutive parents in waiting rooms at 10 pediatric primary care offices in eastern Massachusetts between October 1, 2013, and January 31, 2017. Parents were included if they were sufficiently literate in the English or Spanish language to complete a packet of screening questionnaires and if their child was of eligible age. Parents completed all questionnaires in counterbalanced order. Participants who screened positive on any questionnaire plus 10% of those who screened negative on all questionnaires (chosen at random) were invited to complete developmental testing. Analyses were weighted for sampling and nonresponse and were conducted from October 1, 2013, to January 31, 2017.
The 3 screening instruments used were the Ages & Stages Questionnaire, Third Edition (ASQ-3); Parents' Evaluation of Developmental Status (PEDS); and Survey of Well-being of Young Children (SWYC): Milestones. Reference tests administered were Bayley Scales of Infant and Toddler Development, Third Edition, for children aged 0 to 42 months, and Differential Ability Scales, Second Edition, for older children. Age-standardized scores were used as indicators of mild (80-89), moderate (70-79), or severe (<70) delays.
A total of 1495 families of children aged 9 months to 5.5 years participated. The mean (SD) age of the children at enrollment was 2.6 (1.3) years, and 779 (52.1%) were male. Parent respondents were primarily female (1325 [88.7%]), with a mean (SD) age of 33.4 (6.3) years.
Of the 20.5% to 29.0% of children with a positive score on each questionnaire, 35% to 60% also received a positive score on a second questionnaire, demonstrating moderate co-occurrence. Among younger children (<42 months), the specificity of the ASQ-3 (89.4%; 95% CI, 85.9%-92.1%) and SWYC Milestones (89.0%; 95% CI, 86.1%-91.4%) was higher than that of the PEDS (79.6%; 95% CI, 75.7%-83.1%; P < .001 and P = .002, respectively), but differences in sensitivity were not statistically significant. Among older children (43-66 months), specificity of the ASQ-3 (92.1%; 95% CI, 85.1%-95.9%) was higher than that of the SWYC Milestones (70.7%; 95% CI, 60.9%-78.8%) and the PEDS (73.7%; 95% CI, 64.3%-81.3%; P < .001), but sensitivity to mild delays of the SWYC Milestones (54.8%; 95% CI, 38.1%-70.4%) and of the PEDS (61.8%; 95% CI, 43.1%-77.5%) was higher than that of the ASQ-3 (23.5%; 95% CI, 9.0%-48.8%; P = .012 and P = .002, respectively). Sensitivity exceeded 70% only with respect to severe delays, with 73.7% (95% CI, 50.1%-88.6%) for the SWYC Milestones among younger children, 78.9% (95% CI, 55.4%-91.9%) for the PEDS among younger children, and 77.8% (95% CI, 41.8%-94.5%) for the PEDS among older children. Attending to parents' concerns was associated with increased sensitivity of all questionnaires.
This study found that 3 frequently used screening questionnaires offer adequate specificity but modest sensitivity for detecting developmental delays among children aged 9 months to 5 years. The results suggest that trade-offs in sensitivity and specificity occurred among the questionnaires, with no one questionnaire emerging as superior overall.
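Because only about 10% of children who screened negative on all questionnaires were invited to reference testing, unweighted accuracy estimates would overstate sensitivity. The following is a minimal Python sketch of how sampling-weighted sensitivity and specificity can be computed under that design. It is not the study's analysis code. The names `Child`, `delay_severity`, and `weighted_accuracy` are hypothetical, the data are invented for illustration, the weights reflect sampling only (1 for children positive on any questionnaire, 10 for sampled all-negative children) and ignore the nonresponse adjustment, and treating any score below a chosen cutoff as "delayed" is an assumption about how each severity level was evaluated.

```python
from dataclasses import dataclass


@dataclass
class Child:
    screen_positive: bool   # result on the questionnaire being evaluated
    standard_score: float   # age-standardized reference test score
    weight: float           # inverse sampling probability (illustrative only)


def delay_severity(score: float) -> str:
    """Map a standardized score to the severity bands given in the text."""
    if score < 70:
        return "severe"
    if score < 80:
        return "moderate"
    if score < 90:
        return "mild"
    return "none"


def weighted_accuracy(children: list[Child], cutoff: float) -> tuple[float, float]:
    """Return (sensitivity, specificity), treating score < cutoff as delayed."""
    tp = fn = tn = fp = 0.0
    for c in children:
        delayed = c.standard_score < cutoff
        if delayed and c.screen_positive:
            tp += c.weight
        elif delayed:
            fn += c.weight
        elif c.screen_positive:
            fp += c.weight
        else:
            tn += c.weight
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one delayed and one non-delayed child")
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy data: weight 1 for children positive on any questionnaire,
# weight 10 for children sampled from the all-negative group.
toy = [
    Child(True, 65, 1.0),     # severe delay, detected
    Child(False, 75, 1.0),    # moderate delay, missed here but positive elsewhere
    Child(False, 85, 10.0),   # mild delay, negative on all questionnaires
    Child(True, 100, 1.0),    # no delay, false positive
    Child(False, 105, 10.0),  # no delay, true negative
    Child(False, 110, 10.0),  # no delay, true negative
]

sens_mild, spec_mild = weighted_accuracy(toy, cutoff=90)
sens_severe, spec_severe = weighted_accuracy(toy, cutoff=70)
```

In this toy example the single missed child from the all-negative group counts ten times, which pulls sensitivity at the mild cutoff far below the unweighted value. This is the kind of correction that weighting for the sampling design is meant to provide.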