We develop a new goodness-of-fit test for validating the performance of probability forecasts. Our test statistic is particularly powerful under sparseness and dependence in the observed data. To build our test statistic, we start from a formal definition of calibrated forecasts, which we operationalize by introducing two components. The first component tests the level of the estimated probabilities; the second validates the shape, measuring the differentiation between high and low probability events. After constructing test statistics for both level and shape, we provide a global goodness-of-fit statistic, which is asymptotically $\chi^2$ distributed. In a simulation exercise, we find that our approach is correctly sized and more powerful than alternative statistics. In particular, our shape statistic is significantly more powerful than the Kolmogorov-Smirnov test. Under independence, our global test has significantly greater power than the popular Hosmer-Lemeshow $\chi^2$ test. Moreover, even under dependence, our global test remains correctly sized and consistent. As a timely and important empirical application of our method, we study the validation of a forecasting model for credit default events. This paper was accepted by Wei Xiong, finance.