Improving the Software Estimation Models Based on Functional Size through Validation of the Assumptions behind the Linear Regression and the Use of the Confidence Intervals When the Reference Database Presents a Wedge-Shape Form

Valdés-Souto, F. & Naranjo-Albarrán, L.

For more than six decades, software cost/effort estimation has been a relevant research topic because of its impact on the industry. Although many estimation models exist, regression-based approaches are frequently used in the literature. However, statistical principles and assumptions are often not applied correctly, especially when wedge-shaped databases are used to build the models. This makes performance comparisons between estimation models of no scientific value, yields low accuracy, and leaves no information about the accuracy and confidence of the estimates.

Objective: to propose the statistical analysis, principles, and assumptions to follow when a regression-based model is built from a database with a wedge-shaped form, in order to improve the estimation model.

Method: we use the Mexican Software Metrics Association (AMMS) reference database, which presents a wedge-shaped form, to develop a case study demonstrating how the estimation model improves when statistical principles and assumptions are applied correctly. The methods are: including categorical variables; applying transformations to better satisfy the homoscedasticity and normality assumptions and to avoid influential observations; and applying variable selection methods to avoid multicollinearity problems.

Results: a case study was developed using the well-accepted statistical practices defined in the proposed procedure. The results show an improvement in the accuracy of the original estimation model in terms of R-squared but, more importantly, compliance with the principles required to apply a linear regression model, even when the database presents a wedge-shaped form. The gain is also observed in the intervals within which the estimates fall at the 95% confidence level. These intervals enable better decisions in software projects by providing additional information about estimation confidence, addressing the lack of scientific value of estimation models built with incorrectly applied regression techniques.
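The abstract names the ingredients of the procedure (a transformation to tame the wedge shape, categorical variables, a multicollinearity check, and 95% intervals around each estimate) without giving formulas. The Python sketch below is a minimal illustration under stated assumptions, not the authors' procedure: it uses synthetic data (not the AMMS database), a log-log ordinary least squares model with one hypothetical 0/1 category `new_dev`, variance inflation factors (VIF) as the multicollinearity check, and a prediction interval for a single new project. The helper names (`ols`, `vif`, `t_ppf`, `prediction_interval`) are hypothetical, and `t_ppf` is a small numerical stand-in for a library Student-t quantile function such as SciPy's `scipy.stats.t.ppf`.

```python
import math
import numpy as np

# Hypothetical synthetic data for illustration only (NOT the AMMS database).
# Multiplicative noise makes the spread of effort grow with size: a wedge shape.
rng = np.random.default_rng(42)
n = 60
size = rng.uniform(20, 800, n)                    # functional size (e.g., CFP)
new_dev = rng.integers(0, 2, n).astype(float)     # hypothetical 0/1 category
effort = 12.0 * size**0.95 * np.exp(0.3 * new_dev + rng.normal(0.0, 0.4, n))

def ols(X, y):
    """Ordinary least squares; returns coefficients, residuals and R-squared."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, resid, r2

def vif(X):
    """Variance inflation factor of each predictor (column 0 is the intercept)."""
    out = []
    for j in range(1, X.shape[1]):
        # Regress predictor j on all the other columns (intercept included).
        _, _, r2_j = ols(np.delete(X, j, axis=1), X[:, j])
        out.append(1.0 / (1.0 - r2_j))
    return np.array(out)

def t_ppf(p, df):
    """Student-t quantile for p > 0.5: bisection on a Simpson-integrated CDF."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda x: c * (1.0 + x * x / df) ** (-(df + 1) / 2)
    def cdf(x, steps=2000):
        # Simpson's rule on [0, x]; the t density is symmetric, so add 0.5.
        h = x / steps
        s = pdf(0.0) + pdf(x) + sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
        return 0.5 + s * h / 3.0
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2

# Log-log model with a dummy variable: ln(effort) = b0 + b1 ln(size) + b2 new_dev + e
X = np.column_stack([np.ones(n), np.log(size), new_dev])
y = np.log(effort)
beta, resid, r2 = ols(X, y)
dof = n - X.shape[1]
s2 = (resid @ resid) / dof                        # residual variance (log scale)
XtX_inv = np.linalg.inv(X.T @ X)

def prediction_interval(new_size, is_new_dev, level=0.95):
    """Interval for one new project's effort, computed on the log scale and
    back-transformed; the middle value is the median, not the mean, effort."""
    x0 = np.array([1.0, math.log(new_size), float(is_new_dev)])
    y_hat = float(x0 @ beta)
    se = math.sqrt(s2 * (1.0 + x0 @ XtX_inv @ x0))
    t = t_ppf(0.5 + level / 2, dof)
    return math.exp(y_hat - t * se), math.exp(y_hat), math.exp(y_hat + t * se)

print("coefficients:", beta.round(3), "R^2:", round(float(r2), 3))
print("VIF:", vif(X).round(2))                    # values near 1: little multicollinearity
low, point, high = prediction_interval(300, 1)
print(f"size 300: ~{point:.0f} h, 95% interval [{low:.0f}, {high:.0f}] h")
```

Because the model is fitted on the log scale, the interval is symmetric there and becomes asymmetric, and wider in hours for larger projects, once back-transformed; this mirrors the wedge shape of the data. If the intervals in the paper are confidence intervals for the mean response rather than prediction intervals for a single project, drop the `1.0 +` term inside `se`.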

Available through: Programming and Computer Software, volume 47, pages 673–693 (2021)