Abstract

Objective. Because external validation of existing diagnostic models for distinguishing malignant from benign adnexal masses has not been reported, the present study aimed to assess the performance of these models.

Methods. We tested the performance of existing models in a prospectively assembled data set of 170 patients with an adnexal mass. Twenty-one previously reported models were assessed. The models were based on combinations of ultrasound findings, color Doppler tests, CA-125 measurement, age, and/or menopausal status. For each model, we constructed a receiver operating characteristic (ROC) curve and calculated the area under the curve (AUC).

Results. Of the 170 adnexal masses that were operated on, 30 (18%) were malignant. The AUCs of the 21 externally validated models ranged from 0.69 to 0.90. The performance of the existing models was inferior to that reported in the original studies. Even models that combined multiple diagnostic tools and were developed with logistic regression or neural networks reached an AUC of at most 0.86. When we required near-perfect sensitivity, the highest specificities ranged from 0.45 to 0.60.

Conclusion. Although diagnostic models may be of value in the preoperative assessment of adnexal masses, their diagnostic performance is not as good as that reported in the original publications.
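The two performance measures used in this kind of external validation, the AUC and the best specificity attainable at near-perfect sensitivity, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' analysis code: it assumes each model produces a numeric risk score in which higher values mean a higher likelihood of malignancy, and that a mass is called malignant when its score is at or above a threshold. The function names, the `min_sensitivity` parameter (the study does not state its exact sensitivity cut-off), and the toy scores are all hypothetical and are not data from the study. The AUC is computed with the Mann-Whitney formulation: the probability that a randomly chosen malignant mass scores higher than a randomly chosen benign one, with ties counted as one half.

```python
"""Sketch: AUC and specificity at near-perfect sensitivity for one model.

Assumptions (hypothetical, for illustration only): a higher score means a
higher likelihood of malignancy, label 1 = malignant, label 0 = benign.
"""


def roc_auc(scores, labels):
    """AUC via the Mann-Whitney U statistic: the probability that a random
    malignant case scores higher than a random benign case (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(scores, labels, min_sensitivity=1.0):
    """Highest specificity over thresholds t (score >= t is called malignant)
    whose sensitivity is at least min_sensitivity (hypothetical parameter)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        if tp / n_pos >= min_sensitivity:
            best = max(best, tn / n_neg)
    return best


if __name__ == "__main__":
    # Toy example with made-up scores (not study data): 3 malignant, 4 benign.
    scores = [0.9, 0.7, 0.4, 0.8, 0.5, 0.3, 0.2]
    labels = [1, 1, 1, 0, 0, 0, 0]
    print(roc_auc(scores, labels))                     # 0.75
    print(specificity_at_sensitivity(scores, labels))  # 0.5
```

In the toy example, keeping every malignant mass above the threshold forces the cut-off down to 0.4, so only two of the four benign masses are correctly classified. This illustrates why a model with a respectable AUC can still offer only modest specificity once near-perfect sensitivity is required.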