25 August 2016

Jirayus Jiarpakdee, Chakkrit Tantithamthavorn, Akinori Ihara, Kenichi Matsumoto

Proceedings of the International Symposium on Software Reliability Engineering (ISSRE)

Defect prediction models can help Software Quality Assurance (SQA) teams understand the past pitfalls that lead to defective modules. However, conclusions derived from defect prediction models without mitigating redundant metrics issues may be misleading. In this paper, we set out to investigate whether redundant metrics issues affect defect prediction studies, and to examine the degree and causes of that redundancy. Through a case study of 101 publicly available defect datasets of systems that span both proprietary and open source domains, we observe that (1) 10%-67% of the metrics in the studied defect datasets are redundant, and (2) the redundancy of metrics is related to the aggregation functions of metrics. These findings suggest that researchers should be aware of redundant metrics prior to constructing a defect prediction model in order to maximize the internal validity of their studies.


    Author = {Jiarpakdee, Jirayus and Tantithamthavorn, Chakkrit and Ihara, Akinori and Matsumoto, Kenichi},
    Title = {A Study of Redundant Metrics in Defect Prediction Datasets},
    Booktitle = {Proceedings of the International Symposium on Software Reliability Engineering (ISSRE)},
    Pages = {To Appear},
    Year = {2016}