Address: Faculty of Information Technology, Monash University, Australia. Email: firstname.lastname@example.org
November 03, 2021
I have been invited to serve as an Editorial Board member for the Journal of Systems and Software (JSS), ranked #2 in Google Scholar.
October 13, 2021
I am glad to receive an ACM SIGSOFT Distinguished Paper Award for our PyExplainer paper published at ASE 2021.
October 01, 2021
I am serving as a PC member of MSR 2022, ASE 2022, and the ICSE 2023 Technical Track, and as a co-chair of the MSR 2022 Tool/Data Showcase Track.
September 22, 2021
I was invited to give a seminar talk at Oracle Labs, Brisbane, Australia on the topic of AIBugHunter 2.0: Automated Defect Prediction, Explanation, Localization, and Repair. Thanks to Paddy Krishnan for the kind invitation!
Our society is now driven by software. However, software defects and technology glitches are annoying and expensive, and they are hard to detect and prevent. Errors in safety-critical systems can result in serious injuries and even death (e.g., the massive radiotherapy overdoses caused by the Therac-25, or the explosion of the Ariane 5 rocket). We want to prevent such failures as much as possible.
Defect Prediction Technologies: Catching Software Defects Before Deployment.
Imagine you are a developer working on a software project with millions of lines of code. Developers would have to spend years reviewing and testing every single line of code, which is very time-consuming and inefficient. This leads to project overruns and high costs.
To address this problem, I develop defect prediction technologies, i.e., AI/ML models that are trained on historical data to predict which files/commits are likely to be defective in the future. To date, defect prediction technologies are widely used in many top software companies (e.g., Microsoft, Google, Facebook, Amazon, Blackberry, Sony, Huawei, Alibaba, Cisco, Ubisoft).
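As a rough illustration of the idea (a minimal sketch, not the actual technology described here), the example below trains a tiny logistic regression on past file-level metrics and ranks new files by predicted defect risk. The metrics (code churn and past defect count, both scaled to 0-1), the file names, and all values are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, epochs=2000, lr=0.1):
    """Fit a logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            err = p - y  # gradient of the log-loss with respect to the logit
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict_risk(model, x):
    """Return the predicted probability that a file is defective."""
    weights, bias = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

# Hypothetical historical data: [churn, past_defects] -> 1 if defective, else 0.
history = [
    ([0.9, 0.8], 1), ([0.8, 0.6], 1), ([0.7, 0.9], 1),
    ([0.2, 0.1], 0), ([0.1, 0.3], 0), ([0.3, 0.2], 0),
]
model = train_logistic([x for x, _ in history], [y for _, y in history])

# Hypothetical files in the next release, described by the same two metrics.
new_files = {
    "parser.py": [0.85, 0.7],
    "utils.py": [0.15, 0.2],
    "io.py": [0.5, 0.5],
}
ranked = sorted(new_files, key=lambda f: predict_risk(model, new_files[f]),
                reverse=True)
for name in ranked:
    print(f"{name}: predicted risk {predict_risk(model, new_files[name]):.2f}")
```

Reviewing files in the ranked order lets a team spend its limited inspection effort on the files most likely to be defective first.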
Goal: The goal of my research is to invent the next-generation defect prediction technologies that are practical, explainable, and actionable for practitioners, enabling developers to find software defects faster (e.g., find 20% more defects before the testing phase begins) and enabling managers to develop better software quality improvement plans to prevent defects in the future. Potential benefits include cost savings in software quality assurance activities, which are expensive and time-consuming.
How to make defect prediction models more practical, explainable, and actionable?
My research team aims to make defect prediction models more practical (i.e., precisely locate which lines are defective), more explainable (i.e., accurately explain why a file/commit is predicted as defective), and more actionable (i.e., accurately guide what developers should or should not do in the future to mitigate the risk of having defects).
Line-Level Defect Prediction Techniques. The current granularity of defect predictions is still coarse-grained at the file level, leaving practitioners to spend unnecessary effort inspecting the 97%-99% of lines that are clean. Practitioners often ask which lines are actually defective.
Explainable Defect Prediction Techniques. Existing defect prediction models have empowered software companies to support a wide range of improved decision-making and policy-making. However, the predictions made by defect models to date have not been explained or well justified. Specifically, current defect prediction models still fail to explain why they make a given prediction, and so fail to uphold privacy laws that require any decision made by an algorithm to be explained. This lack of explainability hinders the adoption of defect models in practice. Practitioners often ask why a file is predicted as defective.
*This research project is financially supported by the Australian Research Council’s Discovery Early Career Researcher Award (DECRA 2020-2023).
How to build defect prediction models that produce the most accurate predictions and reliable insights?
The successful deployment of defect prediction models relies heavily on an in-depth understanding of many intricate details of the analytical modelling process. However, because statistical and machine learning toolkits (e.g., R, Weka, Scikit-learn) are so readily accessible, many users of such toolkits have limited knowledge of important details (e.g., they often fail to handle correlated variables in defect models). Such limited knowledge often leads to major problems, which in turn invalidate the results of software engineering studies and lead to the failure of defect prediction projects in practice.
To address this problem, I develop practical guidelines on how to build defect prediction models through empirical studies. In particular, I investigate how each of the key experimental components impacts the performance and the interpretation of defect prediction models.
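To make the correlated-variable pitfall mentioned above concrete, here is a minimal sketch of one simple remedy: a greedy filter that keeps a metric only if its absolute Pearson correlation with every already-kept metric is below a threshold. The metric names, the values, and the 0.7 threshold are assumptions for illustration, and this is a generic filter rather than the specific procedure recommended by the guidelines.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant metric carries no correlation information
    return cov / (sd_x * sd_y)

def drop_correlated(metrics, threshold=0.7):
    """Greedily keep metrics that are not strongly correlated with kept ones."""
    kept = []
    for name in metrics:
        if all(abs(pearson(metrics[name], metrics[k])) < threshold
               for k in kept):
            kept.append(name)
    return kept

# Hypothetical metrics measured on five files.
metrics = {
    "lines_of_code": [100, 250, 400, 800, 1200],
    "num_statements": [80, 210, 330, 650, 1000],  # nearly duplicates LOC
    "num_authors": [3, 1, 4, 2, 2],
}
kept_metrics = drop_correlated(metrics)
print(kept_metrics)
```

Because the filter is greedy, the result depends on the order in which metrics are listed; earlier metrics win ties against later, correlated ones.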
Techniques for Mining Software Defects. Poor-quality or noisy defect datasets can lead to inaccurate predictions and insights. We found that techniques for generating ground-truth data are often not accurate, impacting the quality of defect datasets.
Techniques for Predicting Software Defects. There exist a large number of off-the-shelf classification techniques, each of which can be configured with a large number of possible combinations of hyperparameter settings. Practitioners often ask which techniques and which settings should be used.
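One common way to approach the settings question is to compare candidate settings by cross-validated performance. The sketch below is a hedged illustration only: it uses a small k-nearest-neighbours classifier, leave-one-out validation, and a hypothetical dataset and candidate list, none of which come from the studies described here.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each point from all the others."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if knn_predict(train, x, k) == y:
            correct += 1
    return correct / len(data)

def grid_search(data, candidate_ks):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: loo_accuracy(data, k) for k in candidate_ks}
    best = max(candidate_ks, key=lambda k: scores[k])
    return best, scores

# Hypothetical dataset: [churn, past_defects] -> 1 if defective, else 0.
data = [
    ([0.9, 0.8], 1), ([0.8, 0.7], 1), ([0.7, 0.9], 1), ([0.6, 0.4], 1),
    ([0.2, 0.1], 0), ([0.1, 0.3], 0), ([0.3, 0.2], 0), ([0.5, 0.6], 0),
]
best_k, scores = grid_search(data, [1, 3, 5])
print("best k:", best_k)
print("scores:", scores)
```

The same loop extends to several techniques and several hyperparameters at once, at the cost of evaluating every combination.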