Objective Medical classification accuracy studies often yield continuous data based on predictive models for treatment outcomes. A popular method for evaluating the performance of diagnostic tests is the receiver operating characteristic (ROC) curve analysis. The main objective was to develop a global statistical hypothesis test for assessing the goodness-of-fit (GOF) for parametric ROC curves via the bootstrap. Design A simple log (or logit) and a more flexible Box-Cox normality transformations were applied to untransformed or transformed data from two clinical studies to predict complications following percutaneous coronary interventions (PCIs) and for image-guided neurosurgical resection results predicted by tumor volume, respectively. We compared a non-parametric with a parametric binormal estimate of the underlying ROC curve. To construct such a GOF test, we used the non-parametric and parametric areas under the curve (AUCs) as the metrics, with a resulting p value reported. Results In the interventional cardiology example, logit and Box-Cox transformations of the predictive probabilities led to satisfactory AUCs (AUC = 0.888; p = 0.78, and AUC = 0.888; p = 0.73, respectively), while in the brain tumor resection example, log and Box-Cox transformations of the tumor size also led to satisfactory AUCs (AUC = 0.898; p = 0.61, and AUC = 0.899; p = 0.42, respectively). In contrast, significant departures from GOF were observed without applying any transformation prior to assuming a binormal model (AUC = 0.766; p = 0.004, and AUC=0.831; p = 0.03), respectively. Conclusions In both studies the p values suggested that transformations were important to consider before applying any binormal model to estimate the AUC. Our analyses also demonstrated and confirmed the predictive values of different classifiers for determining the interventional complications following PCIs and resection outcomes in image-guided neurosurgery.