Abstract This article considers the claim that machine scoring of writing test responses agrees with human readers as much as humans agree with other humans. These claims about the reliability of machine scoring of writing are usually based on specific and constrained writing tasks, and there is reason for asking whether machine scoring of writing requires specific and constrained tasks to produce results that mimic human judgements. The conclusion of a National Assessment of Educational Progress (NAEP) report on the online assessment of writing that ‘the automated scoring of essay responses did not agree with the scores awarded by human readers’ is discussed. The article presents the results of a trial in which two software programmes for scoring writing test responses were compared with the results of the human scoring of a broad and open writing test. The trial showed that ‘automated essay scoring’ (AES) did not grade the broad and open writing task responses as reliably as human markers.