Background and Purpose— The modified Rankin scale (mRS) is the most commonly used outcome measure in stroke trials. However, substantial interobserver variability in mRS scoring has been reported. These studies likely underestimate the variability present in multicenter clinical trials, because exploratory work has only been undertaken in single centers by a few observers, all of similar training. We examined mRS variability across a large cohort of international observers using data from a video training resource. Methods— The mRS training package includes a series of “real-life” patient interviews for grading. Training data were collected centrally and analyzed for variability using kappa statistics. We examined variability against a standard of “correct” mRS grades; examined variability by country; and for UK assessors, examined variability by center and by professional background of the observer. Results— To date, 2942 assessments from 30 countries have been submitted. Overall reliability for mRS grading has been moderate to good with substantial heterogeneity across countries. Native English language has had little effect on reliability. Within the United Kingdom, there was no significant variation by profession. Conclusion— Our results confirm interobserver variability in mRS assessment. The heterogeneity across countries is intriguing because it appears not to be related solely to language. These data highlight the need for novel strategies to improve reliability.