01 June 2017

Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, and Kenichi Matsumoto

Proceedings of the High Performance Computing Symposium (HPCS)

Software quality analytics uses statistical or machine learning classifiers that are trained to identify defect-prone software modules. The goal of software quality analytics is to help software engineers prioritize their testing effort on the riskiest modules and understand past pitfalls that lead to defective code. While the adoption of software quality analytics enables software organizations to distill actionable insights, there are still many barriers to the broad and successful adoption of such analytics systems. Indeed, even when software organizations can access invaluable software artifacts and data-analytics toolkits, researchers and practitioners often lack the knowledge needed to develop analytics systems properly. Thus, the correctness of the predictions and insights that are derived from analytics systems is one of the most important challenges of data science in software engineering. In this work, we conduct a series of empirical investigations to better understand the impact of experimental components (i.e., class mislabelling, parameter optimization of classification techniques, and model validation techniques) on the performance and interpretation of software quality analytics. To accelerate these compute-intensive experiments, we leverage the High-Performance Computing (HPC) resources of the Centre for Advanced Computing (CAC) at Queen's University, Canada. Through case studies of systems that span both proprietary and open-source domains, we demonstrate that (1) class mislabelling does not impact the precision of software quality analytics; (2) automated parameter optimization of classification techniques substantially improves the performance and stability of software quality analytics; and (3) the out-of-sample bootstrap validation technique produces a good balance between the bias and variance of performance estimates.
Our results lead us to conclude that the experimental components of analytics modelling impact the predictions and associated insights that are derived from software quality analytics. Empirical investigations into the impact of overlooked experimental components are needed to derive practical guidelines for analytics modelling.
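To make the model validation finding concrete, the out-of-sample bootstrap mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and the trivial majority-class baseline are assumptions for the example. Each iteration trains on a bootstrap sample drawn with replacement and evaluates on the rows that were not selected ("out-of-bag" rows), so every performance estimate is computed on unseen data.

```python
import numpy as np

def out_of_sample_bootstrap(X, y, train_and_score, n_boot=100, seed=42):
    """Estimate model performance with the out-of-sample bootstrap.

    Trains on a bootstrap sample (drawn with replacement) and scores on
    the out-of-bag rows, repeating n_boot times. Returns the mean and
    standard deviation of the held-out scores.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(n_boot):
        boot_idx = rng.integers(0, n, size=n)   # sample row indices with replacement
        oob_mask = np.ones(n, dtype=bool)
        oob_mask[boot_idx] = False              # rows never drawn are the test set
        if not oob_mask.any():
            continue                            # rare: every row was sampled
        scores.append(train_and_score(X[boot_idx], y[boot_idx],
                                      X[oob_mask], y[oob_mask]))
    return float(np.mean(scores)), float(np.std(scores))

def majority_score(X_tr, y_tr, X_te, y_te):
    """Toy 'classifier' for illustration: predict the majority training class."""
    majority = np.bincount(y_tr).argmax()
    return float(np.mean(y_te == majority))

# Illustrative usage on a tiny synthetic dataset.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 1, 1, 0, 0, 1, 0, 0])
mean_acc, std_acc = out_of_sample_bootstrap(X, y, majority_score, n_boot=50)
```

Because the test rows are disjoint from the training sample in every iteration, the resulting estimates tend to be less optimistically biased than repeated holdout on small datasets, while the repetition keeps their variance low, which is the bias/variance balance the abstract refers to.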