Flow monitoring in sewer systems is increasingly used to calibrate urban drainage models for long-term simulation. However, models are most often calibrated without considering the associated uncertainties. The generalized likelihood uncertainty estimation (GLUE) methodology is applied here to assess parameter and flow simulation uncertainty using a simplified lumped sewer model that accounts for three separate flow contributions: wastewater, fast runoff from paved areas, and slow infiltrating water from permeable areas. Recently, the GLUE methodology has been criticised for generating prediction limits without statistical coherence and consistency, and for the subjectivity in the choice of a threshold value to distinguish "behavioural" from "non-behavioural" parameter sets. In this paper we examine how well the GLUE methodology performs when the behavioural parameter sets deduced from a calibration period are applied to generate prediction bounds in validation periods. By retaining an increasing number of parameter sets, we aim to obtain consistency between the GLUE-generated 90% prediction limits and the actual containment ratio (CR) in calibration. Due to the large uncertainties related to spatio-temporal rain variability during heavy convective rain events, flow measurement errors, possible model deficiencies, and epistemic uncertainties, it was not possible to obtain an overall CR of more than 80%. Nevertheless, the GLUE-generated prediction limits proved rather consistent, since the overall CRs obtained in calibration corresponded well with the overall CRs obtained in validation periods for all evaluated proportions of retained parameter sets. When wet and dry weather periods were considered separately, however, some inconsistencies were found between calibration and validation, and we address some of the reasons why the coverage of the prediction limits should not be expected to be identical in calibration and validation periods in real-world applications.
The large uncertainties result in wide posterior parameter limits that cannot be used to interpret, for example, the relative size of the paved area versus the infiltrating area. We should therefore try to learn from the significant discrepancies between model and observations found in this study, possibly by using some form of non-stationary error correction procedure. It seems crucial, however, to obtain more representative rain inputs and more accurate flow observations in order to reduce parameter and model simulation uncertainty. © Author(s) 2013.
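
To illustrate the procedure summarised above, the following is a minimal Python sketch of how GLUE prediction limits and the containment ratio (CR) could be computed. It is not the implementation used in this study. The function names (`weighted_quantile`, `glue_bounds`, `containment_ratio`), the Nash-Sutcliffe-type likelihood measure, and the one-parameter linear runoff model in the demonstration are all assumptions chosen for illustration. The sketch also assumes that the retained likelihood weights are non-negative and do not all equal zero.

```python
import numpy as np


def weighted_quantile(values, weights, q):
    """Quantile q (0..1) of `values` under the likelihood-weighted empirical CDF."""
    order = np.argsort(values)
    sorted_values = values[order]
    cdf = np.cumsum(weights[order]) / np.sum(weights)
    idx = np.searchsorted(cdf, q)  # first index where cdf >= q
    return sorted_values[min(idx, len(sorted_values) - 1)]


def glue_bounds(sims, likelihoods, retain_fraction, lower=0.05, upper=0.95):
    """Prediction limits from the best `retain_fraction` of parameter sets.

    sims:        array (n_sets, n_times), one simulated flow series per set
    likelihoods: array (n_sets,), informal likelihood of each set (assumed >= 0)
    The defaults give 90% prediction limits (5% and 95% weighted quantiles).
    """
    sims = np.asarray(sims, dtype=float)
    likelihoods = np.asarray(likelihoods, dtype=float)
    n_keep = max(1, int(round(retain_fraction * len(likelihoods))))
    keep = np.argsort(likelihoods)[::-1][:n_keep]  # "behavioural" sets
    s, w = sims[keep], likelihoods[keep]
    n_times = s.shape[1]
    lo = np.array([weighted_quantile(s[:, t], w, lower) for t in range(n_times)])
    hi = np.array([weighted_quantile(s[:, t], w, upper) for t in range(n_times)])
    return lo, hi


def containment_ratio(obs, lo, hi):
    """Fraction of observations that fall inside the prediction limits."""
    obs = np.asarray(obs, dtype=float)
    return float(np.mean((obs >= lo) & (obs <= hi)))


# Demonstration on synthetic data (hypothetical model: flow = a * rain).
rng = np.random.default_rng(0)
rain = rng.gamma(2.0, 1.0, size=200)
obs = 0.6 * rain + rng.normal(0.0, 0.2, size=200)
a = rng.uniform(0.0, 1.5, size=1000)           # Monte Carlo parameter samples
sims = a[:, None] * rain[None, :]
nse = 1.0 - np.sum((sims - obs) ** 2, axis=1) / np.sum((obs - obs.mean()) ** 2)
likelihoods = np.clip(nse, 0.0, None)          # assumed likelihood measure

for fraction in (0.05, 0.2, 0.5):
    lo, hi = glue_bounds(sims, likelihoods, fraction)
    print(fraction, containment_ratio(obs, lo, hi))
```

Running the loop with increasing values of `retain_fraction` mirrors the idea of retaining more parameter sets until the CR approaches the nominal coverage of the limits. Applying the same retained sets to a separate validation series and recomputing the CR would give the calibration versus validation comparison discussed above.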