To reduce the variability in estradiol (E2) testing and to ensure better patient care, standardization of E2 measurements has been recommended. This study assessed the accuracy and variability of E2 measurements performed by 11 routine immunological methods and 6 mass spectrometry methods using single-donor serum materials, and compared the results to a reference method. The contributions of calibration bias, specificity or matrix effects, and imprecision to the overall variability of individual assays were evaluated. The study showed substantial variability in serum E2 measurements in samples from men and from pre- and post-menopausal women. The mean bias across all samples, for each participant, ranged from -2.4% to 235%, with 3 participants having a mean bias of over 100%. The data suggest that calibration bias is the major contributor to the overall variability for nine assays. The analytical performance of most assays measuring E2 concentrations does not meet current needs in research and patient care. Three of the 17 assays would meet performance criteria derived from biological variability: a bias within ±12.5% at concentrations ≥20 pg/mL and a maximum allowable bias of ±2.5 pg/mL at concentrations <20 pg/mL. Sensitivity differs considerably between assays, and most assays cannot measure E2 levels below 10 pg/mL. Standardization, specifically calibration to a common standard using panels of individual patient samples, can reduce the observed variability and improve the utility of E2 levels in clinical settings.
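The two-tier bias criterion quoted above can be illustrated with a minimal Python sketch. It is not the study's analysis code. The function names, the per-sample application of the criterion, and the choice to select the tier by the reference-method concentration are all assumptions made for illustration, and the numbers in the usage lines are hypothetical rather than study data.

```python
from typing import Iterable, Tuple

# Thresholds quoted in the abstract.
CUTOFF_PG_ML = 20.0          # concentration separating the two tiers
MAX_PERCENT_BIAS = 12.5      # allowed relative bias at >= 20 pg/mL
MAX_ABSOLUTE_BIAS = 2.5      # allowed absolute bias (pg/mL) at < 20 pg/mL


def meets_bias_criterion(measured_pg_ml: float, reference_pg_ml: float) -> bool:
    """Return True if one measurement is within the allowable bias.

    Assumption: the tier is chosen by the reference value, not the
    measured value; the abstract does not state which is used.
    """
    if reference_pg_ml <= 0:
        raise ValueError("reference concentration must be positive")
    bias = measured_pg_ml - reference_pg_ml
    if reference_pg_ml >= CUTOFF_PG_ML:
        # Relative criterion: percent bias against the reference value.
        return 100.0 * abs(bias) / reference_pg_ml <= MAX_PERCENT_BIAS
    # Absolute criterion for low concentrations.
    return abs(bias) <= MAX_ABSOLUTE_BIAS


def mean_percent_bias(pairs: Iterable[Tuple[float, float]]) -> float:
    """Mean of per-sample percent bias over (measured, reference) pairs."""
    biases = [100.0 * (m - r) / r for m, r in pairs]
    if not biases:
        raise ValueError("at least one sample pair is required")
    return sum(biases) / len(biases)


# Hypothetical example values, not taken from the study.
print(meets_bias_criterion(22.0, 20.0))   # 10% bias at 20 pg/mL
print(meets_bias_criterion(13.0, 10.0))   # 3 pg/mL bias at 10 pg/mL
```

A relative limit becomes very strict in absolute terms at low concentrations (12.5% of 10 pg/mL is only 1.25 pg/mL), which is one plausible reason the criterion switches to a fixed ±2.5 pg/mL below 20 pg/mL.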