Summary

Objective: To study the reliability and validity of a scoring instrument for the assessment of neonatal resuscitation skills in a training setting.

Methods: Fourteen paediatric residents performed a neonatal resuscitation on a manikin while being recorded with a video camera. The videotapes were analysed using an existing scoring instrument with established face and content validity, adjusted for use in a training setting. Intra- and inter-rater reliability were assessed by comparing the ratings of the videotapes by three raters, one of whom rated the videotapes twice. Intra-class correlation coefficients (ICCs) were calculated for the sum score, and percentages of agreement and kappa coefficients for the individual items. To study construct validity, the residents' performance of a second resuscitation was assessed after they had received feedback on their first performance.

Results: The ICCs were 0.95 and 0.77 for intra- and inter-rater reliability, respectively. The median percentage of intra-rater agreement was 100%; inter-rater agreement ranged from 78.6% to 84.0%. The median kappa was 0.85 for intra-rater reliability and 0.42–0.59 for inter-rater reliability. Residents showed a 10% improvement (95% confidence interval: −4% to 23%) on performance of a second resuscitation, which supports the instrument's construct validity.

Conclusion: A useful and valid instrument with good intra-rater and reasonable inter-rater reliability is now available for the assessment of neonatal resuscitation skills in a training setting. Its reliability can be improved by using a more advanced manikin and by training the raters.