Background
In occupational safety research, narrative text analysis has been combined with coded surveillance data to improve the identification and understanding of injuries and their circumstances. Injury data give information about the incidence and direct cause of an injury, while near-miss data enable the identification of various hazards within an organization or industry. Near-miss data also provide an opportunity for surveillance and risk reduction. The National Firefighter Near-Miss Reporting System (NFFNMRS) is a voluntary reporting system that collects narrative text data on near-miss and injurious events within the fire and emergency services industry. In recent research, autocoding techniques using Bayesian models have been used to categorize (code) injury narratives with up to 90% accuracy, reducing the human effort required to code large datasets manually. Autocoding techniques have not yet been applied to near-miss narrative data.

Methods
We manually assigned mechanism-of-injury codes to previously uncoded narratives from the NFFNMRS and used them as a training set to develop two Bayesian autocoding models, Fuzzy and Naïve. We calculated sensitivity, specificity, and positive predictive value for both models. We also evaluated the effect of training set size on prediction sensitivity and compared the models' predictive ability by injury outcome. We cross-validated a subset of the prediction set to assess the accuracy of the model predictions.

Results
Overall, the Fuzzy model performed better than the Naïve model, with a sensitivity of 0.74 compared with 0.678. Where the Fuzzy and Naïve models made the same prediction, cross-validation showed a sensitivity of 0.602. As the number of records in the training set increased, both models achieved higher sensitivity, suggesting that the Fuzzy and Naïve models were essentially “learning”. Injury records were predicted with greater sensitivity than near-miss records.
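The Methods paragraph names two Bayesian autocoders and three evaluation measures without defining them. As a minimal sketch, assuming the formulations commonly described in the injury-autocoding literature (Naïve Bayes multiplies per-word likelihoods, P(E | n) ∝ P(E) ∏ P(n_j | E); Fuzzy Bayes scores each category by its single strongest word, P(E | n) = max_j P(E | n_j)), the Python below shows both classifiers and the one-vs-rest sensitivity, specificity, and positive predictive value for a code. The tokenizer, Laplace smoothing, fallback rule, the names `BayesAutocoder` and `category_metrics`, and the toy narratives and codes are all hypothetical illustrations, not the authors' implementation or NFFNMRS data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a narrative and split it on non-letter characters (assumed tokenizer)."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


class BayesAutocoder:
    """Hypothetical toy word-based Naive and Fuzzy Bayes autocoders."""

    def __init__(self, narratives, codes):
        self.word_cat = defaultdict(Counter)  # word -> token counts per category
        self.cat_tokens = Counter()           # category -> total tokens in training
        self.cat_docs = Counter(codes)        # category -> number of narratives
        for text, code in zip(narratives, codes):
            for w in tokenize(text):
                self.word_cat[w][code] += 1
                self.cat_tokens[code] += 1
        self.vocab = set(self.word_cat)
        self.categories = sorted(self.cat_docs)
        self.n_docs = len(codes)

    def naive(self, text):
        """Naive Bayes: log P(E) + sum of log P(word | E), with add-one smoothing."""
        words = [w for w in tokenize(text) if w in self.vocab]

        def log_score(c):
            score = math.log(self.cat_docs[c] / self.n_docs)
            for w in words:
                p = (self.word_cat[w][c] + 1) / (self.cat_tokens[c] + len(self.vocab))
                score += math.log(p)
            return score

        return max(self.categories, key=log_score)

    def fuzzy(self, text):
        """Fuzzy Bayes: score each category by the max over words of P(E | word)."""
        words = {w for w in tokenize(text) if w in self.vocab}
        if not words:  # no known words: fall back to the most frequent code
            return self.cat_docs.most_common(1)[0][0]

        def strongest(c):
            return max(self.word_cat[w][c] / sum(self.word_cat[w].values())
                       for w in words)

        return max(self.categories, key=strongest)


def category_metrics(true_codes, pred_codes, category):
    """One-vs-rest sensitivity, specificity and PPV for a single code."""
    pairs = list(zip(true_codes, pred_codes))
    tp = sum(t == category and p == category for t, p in pairs)
    fn = sum(t == category and p != category for t, p in pairs)
    fp = sum(t != category and p == category for t, p in pairs)
    tn = len(pairs) - tp - fn - fp
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
    }


# Hypothetical toy training set; the codes are invented, not NFFNMRS categories.
train_text = [
    "firefighter slipped on icy ladder and fell",
    "fell through weakened roof during ventilation",
    "steam burn to neck from hose line",
    "flashover caused burns to hands",
    "struck by falling debris from ceiling",
    "apparatus door struck firefighter on head",
]
train_codes = ["fall", "fall", "burn", "burn", "struck", "struck"]
model = BayesAutocoder(train_text, train_codes)
print(model.naive("firefighter fell from ladder"))  # fall
print(model.fuzzy("hose burn to hands"))            # burn
```

Because Fuzzy Bayes keys on one strong word, it can assign a code from a single term such as "burn", whereas Naïve Bayes weighs every known word in the narrative; in this sketch that difference may matter more as narratives get longer.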
Conclusion
We conclude that Bayesian autocoding methods can successfully code both near misses and injuries in longer-than-average narratives with non-specific prompts regarding injury. Such coding allowed for the creation of two new quantitative data elements: injury outcome and injury mechanism.