On the distribution of source code file sizes

E.T.S.I. Caminos, Canales y Puertos (UPM)
  • Computer Science


Source code size is an estimator of software effort, and it is often used to calibrate the models and equations that estimate software cost. The literature has reported that source code file sizes follow a lognormal distribution. In this paper, we measure the size of a large collection of software (the Debian GNU/Linux distribution, version 5.0.2) and find that its source code file sizes instead follow a double Pareto distribution. Large files therefore occur more often than the lognormal distribution predicts, so the previously proposed models underestimate the cost of software.
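
The practical difference between the two distributions is in the tail. The following minimal sketch compares the probability of a file exceeding a given size under a lognormal and under a simple double Pareto (one power law below a transition point, another above it). All parameter values (`mu`, `sigma`, `m`, `alpha`, `beta`) are hypothetical and chosen only for illustration; they are not values fitted in the paper, and the paper's exact parameterization of the double Pareto may differ from the one assumed here.

```python
import math


def lognormal_sf(x, mu, sigma):
    """P(X > x) for a lognormal with log-mean mu and log-standard-deviation sigma."""
    return 0.5 * math.erfc((math.log(x) - mu) / (sigma * math.sqrt(2)))


def double_pareto_sf(x, m, alpha, beta):
    """P(X > x) for a simple double Pareto with transition point m.

    Assumed form: density proportional to (x/m)**(beta - 1) below m
    and to (x/m)**(-alpha - 1) at or above m.
    """
    if x >= m:
        return beta / (alpha + beta) * (x / m) ** (-alpha)
    return 1.0 - alpha / (alpha + beta) * (x / m) ** beta


# Hypothetical parameters, not taken from the paper.
MU, SIGMA = math.log(100), 1.0   # lognormal with a median size of 100 units
M, ALPHA, BETA = 100, 1.5, 1.0   # double Pareto with its transition at 100 units

for size in (10_000, 100_000):
    ln_tail = lognormal_sf(size, MU, SIGMA)
    dp_tail = double_pareto_sf(size, M, ALPHA, BETA)
    print(f"size > {size}: lognormal={ln_tail:.3e}, double Pareto={dp_tail:.3e}")
```

Under these assumed parameters the power-law tail assigns far more probability to very large files than the lognormal does, which is the mechanism behind the underestimation described above.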
