Background In most clubfoot studies, the outcome instruments used are designed to evaluate classification or long-term cross-sectional results. Their variables deal mainly with factors at the body function/structure level. Wide scoring intervals and total sum scores increase the risk that important changes and information go undetected. Studies of the reliability, validity and responsiveness of these instruments are sparse. The lack of an instrument for longitudinal follow-up led the investigators to develop the Clubfoot Assessment Protocol (CAP). The aim of this article is to introduce and describe the CAP and to evaluate the inter- and intra-rater reliability of its items in relation to patient age.
Methods The CAP was created from 22 items divided between the body function/structure level (three subgroups) and the activity level (one subgroup), according to the International Classification of Functioning, Disability and Health (ICF). The focus is on item and subgroup development. Two experienced examiners assessed 69 clubfeet in 48 children with a median age of 2.1 years (range, 0 to 6.7 years). Both treated and untreated feet with different grades of severity were included. Three age groups were constructed to study the influence of age on reliability. The intra-rater study included 32 feet in 20 children with a median age of 2.5 years (range, 4 months to 6.8 years). Unweighted kappa statistics, percentage observed agreement, and the number of categories per item were used to interpret reliability.
Results Inter-rater reliability was moderate to good for all but one item. Eighteen items had kappa values > 0.40, and three items had kappa values from 0.35 to 0.38. The mean percentage observed agreement was 82% (range, 62 to 95%). Agreement was sufficient in all age groups. For intra-rater reliability, all items had kappa values > 0.40 (range, 0.54 to 1.00), with a mean percentage agreement of 89.5%. The number of categories per item varied from 3 to 5.
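The two agreement statistics reported for each CAP item, percentage observed agreement and unweighted (Cohen's) kappa, can be illustrated with a short sketch. Kappa corrects observed agreement p_o for the agreement p_e expected by chance from each rater's marginal category frequencies: kappa = (p_o - p_e) / (1 - p_e). The ratings below are hypothetical values for one three-category item, not data from the study. The verbal bands in `altman_label` follow Altman's scale (poor, fair, moderate, good, very good). This is an assumption suggested by the "moderate to good" wording, not a statement of the authors' exact cut-offs.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Percentage of feet on which both raters chose the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    agree = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * agree / len(rater_a)


def unweighted_kappa(rater_a, rater_b):
    """Cohen's unweighted kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    if n != len(rater_b) or n == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    # Observed agreement: share of identical ratings.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical category throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def altman_label(kappa):
    """Verbal band for a kappa value (assumed Altman-style cut-offs)."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Hypothetical ratings (categories 0-2) of ten feet by two examiners.
examiner_1 = [0, 1, 2, 2, 1, 0, 1, 2, 2, 1]
examiner_2 = [0, 1, 2, 1, 1, 0, 2, 2, 2, 1]

k = unweighted_kappa(examiner_1, examiner_2)
print(f"{percent_agreement(examiner_1, examiner_2):.1f}% agreement, "
      f"kappa = {k:.2f} ({altman_label(k)})")
# prints: 80.0% agreement, kappa = 0.69 (good)
```

The example shows why both statistics are reported. The raters agree on 80% of feet, but because some agreement would occur by chance (p_e = 0.36 here), kappa is lower. With few categories per item (3 to 5 in the CAP), chance agreement tends to be higher, which is one reason a kappa value should be read together with the percentage agreement and the number of categories. For routine use, scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same unweighted statistic by default.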
Conclusion The CAP contains more detailed information than previous protocols. It is a multidimensional, observer-administered, standardized measurement instrument that focuses on the item and subgroup levels. Examiners with good clinical experience can use it with sufficient reliability, independent of age, during the first seven years of childhood. A few items showed low reliability, partly depending on the child's age and/or differences in professional background between the examiners. These items should be interpreted with caution until further studies have confirmed the validity and sensitivity of the instrument.