Objective: Hypoglycemia occurs in 20-60% of patients with diabetes mellitus. Identifying at-risk patients can facilitate interventions to lower risk. We sought to develop a hypoglycemia prediction model. Methods: In this retrospective cohort study, we identified urban adults prescribed a diabetes drug between 2004 and 2013. Demographic and clinical data were extracted from an electronic medical record (EMR). Hypoglycemia was identified from laboratory tests, diagnostic codes, and natural language processing (NLP). We compared three models: multiple logistic regression, classification and regression trees (CART), and random forest. Models were evaluated on an independent test set or through cross-validation. Results: The 38,780 patients had a mean age of 57 years; 56% were female, 40% were African-American, and 39% were uninsured. Hypoglycemia occurred in 8,128 patients (539 identified only by NLP). In logistic regression, factors positively associated with hypoglycemia included infection, non-long-acting insulin, dementia, and recent hypoglycemia. Negatively associated factors included long-acting insulin plus sulfonylurea, and age 75 or older. The models' areas under the curve were similar (logistic regression, 89%; CART, 88%; random forest, 90%, with ten-fold cross-validation). Conclusions: NLP improved identification of hypoglycemia. Non-long-acting insulin was an important risk factor. Decreased risk with age may reflect treatment or diminished awareness of hypoglycemia. More complex models did not improve prediction.
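
The evaluation metric reported above, area under the ROC curve, and the ten-fold cross-validation split can be illustrated with a minimal sketch. This is not the study's code: the function names `auc` and `kfold_indices`, the rank-sum formulation, and the toy labels and scores are all assumptions made for illustration.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    labels: 0/1 outcomes (1 = hypoglycemia); scores: predicted risks.
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def kfold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


if __name__ == "__main__":
    # Toy data (hypothetical): two patients without and two with hypoglycemia.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(auc(toy_labels, toy_scores))
    print([len(fold) for fold in kfold_indices(25, k=10)])
```

In a cross-validated comparison, each fold would serve once as the held-out set while the model is fit on the remaining nine, and the AUC would be computed on the held-out predictions.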