Oct 20 '15 by Admin

Dr. Menzies submits his paper titled “Negative Results for Software Effort Estimation” to Empirical Software Engineering. This is a joint work with Dr. Ye Yang, Mr. George Mathew, Dr. Barry Boehm and Dr. Jairus Hihn.

Title: Negative Results for Software Effort Estimation


Context: More than half the literature on software effort estimation (SEE) focuses on comparisons of new estimation methods. Surprisingly, there are no studies comparing state-of-the-art methods with decades-old approaches.

Objective: To check whether new SEE methods generate better estimates than older methods.

Method: First, collect effort estimation methods ranging from “classical” COCOMO (parametric estimation over a pre-determined set of attributes) to “modern” (reasoning via analogy using spectral-based clustering plus instance and feature selection). Second, catalog the objections that led to the development of post-COCOMO estimation methods. Third, characterize each of those objections as a comparison between newer and older estimation methods. Fourth, using four COCOMO-style data sets (from 1991, 2000, 2005 and 2010), run those comparison experiments. Fifth, compare the performance of the different estimators with a Scott-Knott procedure that uses (i) the A12 effect size to rule out “small” differences and (ii) a 99% confidence bootstrap procedure to check for statistically different groupings of treatments. Sixth, repeat the above for some non-COCOMO data sets.
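The fifth step (ranking estimators with Scott-Knott, gated by A12 and a bootstrap test) can be sketched roughly as below. This is a minimal illustration, not the paper's code. It assumes: each treatment is a list of at least two error scores where lower is better; treatments are sorted by median; each split maximizes the between-group sum of squares; a hypothetical A12 "small" threshold of 0.6; and a Welch-style bootstrap test in the Efron and Tibshirani style. The names `a12`, `bootstrap_differ`, `different` and `scott_knott` are hypothetical.

```python
import random
from statistics import mean, median, variance


def a12(xs, ys):
    """Vargha-Delaney A12: chance that a value drawn from xs beats one from ys (ties count half)."""
    gt = sum(1 for x in xs for y in ys if x > y)
    eq = sum(1 for x in xs for y in ys if x == y)
    return (gt + 0.5 * eq) / (len(xs) * len(ys))


def _t_stat(ys, zs):
    # Welch-style statistic: |difference of means| divided by the standard error.
    se = (variance(ys) / len(ys) + variance(zs) / len(zs)) ** 0.5
    diff = abs(mean(ys) - mean(zs))
    return diff / se if se > 0 else diff


def bootstrap_differ(ys, zs, b=1000, conf=0.99, rng=None):
    """True if the means of ys and zs differ at the given confidence (bootstrap test)."""
    rng = rng or random.Random(1)
    observed = _t_stat(ys, zs)
    if observed == 0:
        return False
    grand, my, mz = mean(ys + zs), mean(ys), mean(zs)
    # Shift both samples onto the grand mean so the null hypothesis (equal means) holds.
    y_null = [y - my + grand for y in ys]
    z_null = [z - mz + grand for z in zs]
    hits = sum(
        _t_stat(rng.choices(y_null, k=len(ys)), rng.choices(z_null, k=len(zs))) > observed
        for _ in range(b)
    )
    return hits / b < 1 - conf


def different(ys, zs, small=0.6):
    # Both checks must pass: the effect is not "small" AND the bootstrap says the means differ.
    effect = a12(ys, zs)
    return max(effect, 1 - effect) >= small and bootstrap_differ(ys, zs)


def scott_knott(treatments):
    """treatments: name -> list of error scores (lower is better). Returns name -> rank (1 = best)."""
    items = sorted(treatments.items(), key=lambda kv: median(kv[1]))
    ranks = {}

    def pooled(lo, hi):
        return [v for _, vs in items[lo:hi] for v in vs]

    def split(lo, hi, rank):
        everything = pooled(lo, hi)
        mu, n = mean(everything), len(everything)
        best_cut, best_score = None, 0.0
        for cut in range(lo + 1, hi):
            left, right = pooled(lo, cut), pooled(cut, hi)
            # Expected between-group sum of squares for this cut point.
            score = (len(left) * (mean(left) - mu) ** 2 + len(right) * (mean(right) - mu) ** 2) / n
            if score > best_score:
                best_cut, best_score = cut, score
        if best_cut is not None and different(pooled(lo, best_cut), pooled(best_cut, hi)):
            # Rank the better (left) half first, then continue numbering on the right half.
            return split(best_cut, hi, split(lo, best_cut, rank))
        for name, _ in items[lo:hi]:
            ranks[name] = rank
        return rank + 1

    split(0, len(items), 1)
    return ranks
```

Requiring both tests means two estimators whose scores differ only slightly share a rank even when a large sample would make the difference statistically significant, which matches the stated purpose of using A12 to rule out "small" differences.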

Results: For the non-COCOMO data sets, our newer estimation methods performed better than older methods. However, the major negative result of this paper is that for the COCOMO data sets, nothing we used did any better than Boehm’s original procedure.

Conclusions: In some projects, it is not possible to collect effort data in the COCOMO format recommended by Boehm. For those projects, we recommend using newer effort estimation methods. However, when COCOMO-style attributes are available, we strongly recommend using that data since the experiments of this paper show that, at least for effort estimation, how data is collected is more important than what learner is applied to that data.
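For readers unfamiliar with the "classical" baseline, the sketch below shows the general parametric form of COCOMO II effort estimation: effort in person-months is A × KSLOC^E × Π EM_i, where E = B + 0.01 × Σ SF_j sums five scale factors and the product runs over the effort multipliers. It assumes the published COCOMO II.2000 default constants A = 2.94 and B = 0.91. This summary does not say which COCOMO version or calibration the paper used, and the function name `cocomo2_effort` and the example inputs are hypothetical.

```python
from math import prod

# Assumed COCOMO II.2000 default calibration; a local calibration would refit these two constants.
DEFAULT_A, DEFAULT_B = 2.94, 0.91


def cocomo2_effort(ksloc, scale_factors, effort_multipliers, a=DEFAULT_A, b=DEFAULT_B):
    """Effort in person-months: a * ksloc**E * product(effort_multipliers), E = b + 0.01*sum(SF)."""
    exponent = b + 0.01 * sum(scale_factors)
    return a * ksloc ** exponent * prod(effort_multipliers)


# Illustrative inputs only: five scale-factor ratings, and 17 effort multipliers all left at 1.0.
estimate = cocomo2_effort(50, [3.72, 3.04, 4.24, 3.29, 4.68], [1.0] * 17)
print(f"{estimate:.1f} person-months")
```

Every input is a rating on a pre-determined attribute, so a model of this form can only be applied where project data has been collected in that attribute format.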