Federal and state agencies are considering ICU performance assessment and public reporting; however, an accurate method for measuring performance must first be selected. In this study, we determine whether substantial variation in ICU mortality performance still exists in modern ICUs, and we compare the predictive accuracy, reliability, and data burden of existing ICU risk-adjustment models. We performed a retrospective chart review of 11,300 ICU patients from 35 California hospitals from 2001 to 2004. We calculated standardized mortality ratios (SMRs) for each hospital using three risk-adjustment models: the mortality probability model III (MPM(0) III), the simplified acute physiology score (SAPS) II, and the acute physiology and chronic health evaluation (APACHE) IV. We compared the models' discrimination, calibration, data reliability, and abstraction time. Regardless of the model used, SMRs varied widely among the ICUs studied. Discrimination and calibration were adequate for all risk-adjustment models. APACHE IV had the best discrimination (area under the receiver operating characteristic curve [AUC], 0.892) compared with MPM(0) III (AUC, 0.809) and SAPS II (AUC, 0.873; p < 0.001). The models differed substantially in data abstraction time: MPM(0) III, 11.1 min (95% confidence interval [CI], 8.7 to 13.4); SAPS II, 19.6 min (95% CI, 17.0 to 22.2); and APACHE IV, 37.3 min (95% CI, 28.0 to 46.6). We found substantial variation in ICU risk-adjusted mortality rates that persisted regardless of the risk-adjustment model. With unlimited resources, the APACHE IV model offers the best predictive accuracy. If constrained by cost and manual data collection, the MPM(0) III model offers a viable alternative without a substantial loss in accuracy.
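The two quantities the comparison rests on can be illustrated with a short sketch, assuming the conventional definitions: a hospital's SMR is its observed number of deaths divided by its expected number, where the expected number is the sum of the model-predicted death probabilities for its patients, and the AUC is the probability that a randomly chosen death was assigned a higher predicted risk than a randomly chosen survivor (ties counted as one half). The record format, the function names `smr_by_hospital` and `auc`, and the toy data are hypothetical. The sketch takes predicted probabilities as given and does not implement MPM(0) III, SAPS II, or APACHE IV themselves.

```python
from collections import defaultdict


def smr_by_hospital(records):
    """Return {hospital: observed deaths / expected deaths}.

    Each record is a hypothetical (hospital_id, died, predicted_prob) tuple,
    where predicted_prob is a risk model's predicted probability of death.
    """
    observed = defaultdict(float)
    expected = defaultdict(float)
    for hospital, died, prob in records:
        observed[hospital] += 1.0 if died else 0.0
        expected[hospital] += prob  # expected deaths = sum of predicted risks
    return {h: observed[h] / expected[h] for h in observed if expected[h] > 0}


def auc(outcomes, probs):
    """AUC as pairwise concordance between deaths and survivors (ties = 0.5)."""
    pos = [p for y, p in zip(outcomes, probs) if y]
    neg = [p for y, p in zip(outcomes, probs) if not y]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    score = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5
    return score / (len(pos) * len(neg))


# Hypothetical toy data: two hospitals, three patients each.
records = [
    ("A", True, 0.5), ("A", False, 0.25), ("A", False, 0.25),
    ("B", True, 0.5), ("B", True, 0.25), ("B", False, 0.25),
]

print(smr_by_hospital(records))  # {'A': 1.0, 'B': 2.0}
outcomes = [died for _, died, _ in records]
probs = [prob for _, _, prob in records]
print(auc(outcomes, probs))  # approximately 0.833
```

In this toy example, hospital A's deaths match the model's expectation (SMR 1.0), while hospital B has twice as many deaths as predicted (SMR 2.0). An SMR above 1 indicates more deaths than the risk model predicts, which is why the choice of model matters for any public comparison of hospitals.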