Background Randomized trials have long been the gold standard for evaluating clinical practice. There is growing recognition that rigorous studies are similarly needed to assess the effects of policy. However, these studies are rarely conducted. We report on the Quality Improvement Demonstration Study (QIDS), an example of a large randomized policy experiment, introduced and conducted in a scientific manner to evaluate the impact of large-scale governmental policy interventions.

Methods In 1999 the Philippine government proposed sweeping reforms in the National Health Sector Reform Agenda. We recognized the unique opportunity to conduct a social experiment. Our ongoing goal has been to generate results that inform health policy. Early on we concentrated on developing a multi-institutional collaborative effort. The QIDS team then developed hypotheses that specifically evaluated the impact of two policy reforms on both the delivery of care and long-term health status in children. We built the experimental design by randomizing communities, in matched blocks of three, to one of the two policy interventions or a control group. Based on the reform agenda, one arm of the experiment provided expanded insurance coverage for children; the other introduced performance-based payments to hospitals and physicians. Data were collected in household, hospital-based patient exit, and facility surveys, as well as clinical vignettes, which were used to assess physician practice. Delivery of services and health status were measured at baseline and after the interventions were put in place, and effects were estimated using difference-in-differences.
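The design described in the Methods can be illustrated with a minimal sketch. It assumes that each matched block holds exactly three communities and that each community within a block receives a different arm, which is one reading of the design. The arm labels, function names, and record layout are hypothetical and are not taken from the study. The estimator shown is the simplest unadjusted difference-in-differences of group means, not the study's actual estimation procedure.

```python
import random

# Hypothetical arm labels: a control group plus the two policy interventions.
ARMS = ("control", "expanded_insurance", "performance_pay")


def randomize_blocks(blocks, seed=0):
    """Assign the three communities in each matched block to distinct arms."""
    rng = random.Random(seed)
    assignment = {}
    for block in blocks:
        if len(block) != len(ARMS):
            raise ValueError("each matched block must contain exactly three communities")
        arms = list(ARMS)
        rng.shuffle(arms)  # random order of arms within this block
        for community, arm in zip(block, arms):
            assignment[community] = arm
    return assignment


def mean(values):
    return sum(values) / len(values)


def diff_in_diff(records, assignment, arm):
    """Unadjusted difference-in-differences of mean outcomes versus control.

    records: list of (community, period, outcome) tuples, where period is
    either "baseline" or "followup".
    """
    def change(group):
        pre = [y for c, p, y in records if assignment[c] == group and p == "baseline"]
        post = [y for c, p, y in records if assignment[c] == group and p == "followup"]
        return mean(post) - mean(pre)

    # Change in the intervention arm minus change in the control arm.
    return change(arm) - change("control")
```

Subtracting the control group's change removes trends common to all communities, so what remains is attributed to the intervention under the usual parallel-trends assumption.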
Results We found and addressed numerous challenges in conducting this study, namely: formalizing the experimental design using the existing health infrastructure; securing funding to do research coincident with the policy reforms; recognizing biases and designing the study to account for these; putting in place a broad data collection effort to account for unanticipated findings; introducing sustainable policy interventions based on the reform agenda; and providing results in real time to policy makers through a combination of venues.

Conclusion QIDS demonstrates that a large, prospective, randomized controlled policy experiment can be successfully implemented at a national level as part of sectoral reform. While we believe policy experiments should be used to generate evidence-based health policy, doing so requires opportunity and trust, strong collaborative relationships, and timing. This study supports the growing view that translation of scientific findings from the bedside to the community can be done successfully and that we should raise the bar on project evaluation and the policy-making process.