Advanced Statistical Models For Software Data

Giancarlo Succi, Milorad Stefanovic, Witold Pedrycz
2001-01-01
Abstract: In this paper, we provide a framework for investigating and quantifying the impact of object-oriented design choices on defects in software systems. We report the initial results of an extensive case study, which strongly reinforce earlier, mainly anecdotal, evidence that design aspects related to inheritance and communication between classes can be used as indicators of the most defect-prone classes. To deal with the specifics of software metrics data, we use statistical models suited to non-normally distributed count data: Poisson regression, negative binomial regression, and zero-inflated negative binomial regression. Alberg diagrams are applied to assess the models' ability to identify the most critical classes in the system. The zero-inflated negative binomial regression model, designed to explicitly model the occurrence of zero counts in the dataset, shows the best ability to describe the high variability in the dependent variable.
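To illustrate how an Alberg diagram can rank models by their ability to single out critical classes, the following is a minimal sketch, not the authors' implementation. It assumes the usual construction: classes are sorted by predicted defect count in decreasing order, and the cumulative percentage of actual defects is plotted against the percentage of classes inspected. The function name `alberg_curve` and the sample numbers are hypothetical and are not data from the paper.

```python
def alberg_curve(predicted, actual):
    """Return (pct_classes, pct_defects) points of an Alberg-style curve.

    Classes are ranked by predicted defect count, highest first; each point
    gives the share of all actual defects found in the top-ranked classes.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    total_defects = sum(actual)
    if total_defects == 0:
        raise ValueError("no defects recorded; the curve is undefined")

    # Rank class indices by predicted defects, most defect-prone first.
    order = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)

    points = [(0.0, 0.0)]
    cumulative = 0
    for rank, idx in enumerate(order, start=1):
        cumulative += actual[idx]
        points.append((100.0 * rank / len(actual),
                       100.0 * cumulative / total_defects))
    return points


# Hypothetical example: 10 classes, most with zero defects.
actual = [0, 0, 5, 1, 0, 12, 0, 2, 0, 0]
predicted = [0.2, 0.1, 3.9, 1.5, 0.3, 8.0, 0.4, 0.9, 0.2, 0.6]

model_curve = alberg_curve(predicted, actual)
ideal_curve = alberg_curve(actual, actual)  # best possible ranking

print(model_curve[3])  # share of defects in the top 30% of classes (model)
print(ideal_curve[3])  # share of defects in the top 30% of classes (ideal)
```

In this made-up example the model's top 30% of classes contain 90% of the defects, against 95% for the ideal ranking. The closer a model's curve lies to the ideal curve, the better it identifies the most defect-prone classes.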