Abstract

The aim of the present study was to develop reliable and easy-to-use tests for on-farm assessment of sow reactivity to humans in different housing systems. A total of 123 gestating sows at different parities and stages of pregnancy were successively subjected to two tests according to their housing system: an approach (AS) test and a handling (HS) test for the stall-housed sows, and an approach (AG) test and a handling (HG) test for the group-housed sows. The tests were video-recorded. Intra- and inter-observer reliabilities, test–retest reliability, and reproducibility according to the experience of the observer were assessed, and the effects of testing order and time of day were studied. Intra-observer reliability was assessed by double video observation of the same tests by one observer and was high to very high (kappa coefficient κ > 0.73). Inter-observer reliability between two observers was moderate to very high (κ > 0.61). Test–retest reliability (i.e. consistency of the sow response over time) was medium to very high for the AS and HS tests. The effect of observer experience in pig behaviour and/or management on the reliability of video observation of the AS and HS tests was also studied. The naive group obtained higher reliability than the other groups, which were more experienced in behavioural observation and/or pig management. High to very high intra- and inter-observer reliabilities were obtained regardless of the experience of the observers for continuous variables (κ > 0.83) but not for categorical ones (κ < 0.71). There was no significant effect of testing order on the sow response in the AG and HG tests, and time of day had no impact on the sow response to the AS and HS tests. Because they are quick and easy to use in various housing systems and are also reproducible and reliable, these tests are promising tools for assessing sow welfare on farms.
To ensure that the reliability of the tests is high, however, appropriate training of the observers and precise definitions of the behavioural responses are required. Moreover, the ability of these tests to discriminate animals according to their reactivity to humans needs to be further investigated. Finally, studying these tests on a larger number of sows with greater variability in their individual characteristics and housing conditions would help clarify their applicability for on-farm use.
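The reliabilities above are reported as kappa coefficients. As an illustration only, the following minimal sketch computes an unweighted Cohen's kappa for two scorings of the same video-recorded tests. The abstract does not state which kappa variant the study used, so the unweighted form is an assumption, and the function name `cohen_kappa` and the response labels ("approach", "withdraw", "no_reaction") are hypothetical and not taken from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical scores.

    Assumption: unweighted kappa; the study's exact variant is not given
    in the abstract.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both scorings must be non-empty and of equal length")

    n = len(ratings_a)

    # Observed agreement: share of tests given the same score both times.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of the marginal proportions, summed over categories.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both scorings use a single identical category; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical scores for six sows, scored twice from the same videos.
first_viewing = ["approach", "approach", "withdraw", "no_reaction", "approach", "withdraw"]
second_viewing = ["approach", "approach", "withdraw", "approach", "approach", "withdraw"]

kappa = cohen_kappa(first_viewing, second_viewing)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the two viewings agree on five of six sows (observed agreement 5/6) while chance agreement is 4/9, giving κ = 0.70. The same function applies to inter-observer reliability if the two lists come from two different observers.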