research

My research focuses on developing AI-enabled software development techniques and tools to improve developers’ productivity, support better data-informed decisions, and improve the quality of software systems. This research has contributed to various activities in software development, including AI for Software Defects and Cybersecurity, AI for Code Review, AI for Agile, and Explainable AI for Software Engineering.

Our society is now driven by software. However, software defects and technology glitches are annoying and expensive, and they are very hard to detect and prevent. Errors in safety-critical systems can result in serious injuries and even death (e.g., the massive radiotherapy overdoses of the Therac-25 and the explosion of the Ariane 5 rocket). We want to prevent such failures as much as possible.

Defect Prediction Technologies: Catching Software Defects Before Deployment.

Imagine you are a developer working on a software project with millions of lines of code. Developers have to spend years reviewing and testing every single line of code, which is time-consuming and inefficient. This leads to project overruns and high costs.

To address this problem, I develop defect prediction technologies, i.e., AI/ML models that are trained on historical data to predict which files/commits are likely to be defective in the future. To date, defect prediction technologies are widely used in many top software companies (e.g., Microsoft, Google, Facebook, Amazon, Blackberry, Sony, Huawei, Alibaba, Cisco, Ubisoft).
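
As a rough illustration only, the sketch below trains a simple commit-level defect prediction model with scikit-learn on historical commit data. The CSV file, feature names, and classifier choice are hypothetical placeholders, not the specific datasets or models used in my studies.

```python
# Minimal sketch of a commit-level defect prediction model.
# "commit_history.csv" and the feature names below are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Historical commits labelled as defect-inducing (1) or clean (0).
data = pd.read_csv("commit_history.csv")
features = ["lines_added", "lines_deleted", "files_changed", "developer_experience"]
X, y = data[features], data["is_defective"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Rank unseen commits by predicted risk so reviewers can inspect the riskiest first.
risk = model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, risk))
```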

Goal: The goal of my research is to invent the next-generation defect prediction technologies that are practical, explainable, and actionable for practitioners, enabling developers to find software defects faster (e.g., find 20% more defects before the testing phase begins) and enabling managers to better develop software quality improvement plans to prevent defects in the future. Potential benefits include optimal cost savings for software quality assurance activities, which are expensive and time-consuming.

Making defect prediction models more practical, explainable, and actionable.

My research team aims to make defect prediction models more practical (i.e., precisely locating which lines are defective), more explainable (i.e., accurately explaining why a file/commit is predicted as defective), and more actionable (i.e., accurately guiding what developers should or should not do in the future to mitigate the risk of introducing defects).

  1. Line-Level Defect Prediction Techniques. The current granularity of defect predictions is still coarse-grained at the file level, leaving practitioners to spend unnecessary effort inspecting the 97%-99% of lines that are actually clean. Practitioners often ask which lines are actually defective.
  2. Explainable Defect Prediction Techniques. Existing defect prediction models have empowered software companies to support a wide range of improved decision-making and policy-making. However, the predictions made by defect models to date are not explained or well-justified. Specifically, current defect prediction models still fail to explain why they make a given prediction and fail to uphold privacy laws that require explaining any decision made by an algorithm. This lack of explainability hinders the adoption of defect models in practice. Practitioners often ask why a file is predicted as defective (a minimal illustration of one explanation technique follows this list).
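
As a rough illustration of the kind of explanation practitioners ask for, the sketch below computes model-agnostic feature importance with scikit-learn's permutation importance. It assumes the `model`, `X_test`, and `y_test` objects from a previously trained defect model (e.g., the commit-level sketch above); it is not the specific explanation technique proposed in my work.

```python
# Sketch: explaining which metrics drive a defect model's predictions.
# Assumes `model`, `X_test`, `y_test` come from a previously trained
# defect prediction model (e.g., the commit-level example above).
from sklearn.inspection import permutation_importance

result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=42, scoring="roc_auc")

# Features whose shuffling hurts AUC the most are the strongest drivers.
for name, score in sorted(zip(X_test.columns, result.importances_mean),
                          key=lambda p: p[1], reverse=True):
    print(f"{name}: {score:.3f}")
```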

*This research project is financially supported by Australian Research Council’s Discovery Early Career Researcher Award (DECRA 2020-2023).

Making defect prediction models that produce the most accurate predictions and reliable insights.

The successful deployment of defect prediction models relies heavily on an in-depth understanding of many intricate details of the analytical modelling process. However, due to the ubiquitous access to statistical and machine learning toolkits (e.g., R, Weka, Scikit-learn), many users of such toolkits have limited knowledge of many important details (e.g., they often fail to deal with correlated variables in defect models). Such limited knowledge often leads to major problems that invalidate the results of software engineering studies and lead to the failure of defect prediction projects in practice.

To address this problem, I develop practical guidelines on how to build defect prediction models through empirical studies. In particular, I investigate how each of the key experimental components impacts the performance and the interpretation of defect prediction models.
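
For example, one of the pitfalls mentioned above, correlated variables, can be checked before model construction. The sketch below flags highly correlated metric pairs with a pairwise Spearman correlation filter; the `metrics` DataFrame and the 0.7 threshold are illustrative assumptions, not the specific analysis prescribed in my guidelines.

```python
# Sketch: flagging highly correlated software metrics before building a
# defect model. `metrics` is an assumed pandas DataFrame of per-file metrics.
import pandas as pd

def correlated_pairs(metrics: pd.DataFrame, threshold: float = 0.7):
    """Return metric pairs whose |Spearman correlation| exceeds the threshold."""
    corr = metrics.corr(method="spearman").abs()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > threshold:
                pairs.append((cols[i], cols[j], corr.iloc[i, j]))
    return pairs

# Drop or combine one metric from each flagged pair before fitting the model.
```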

  1. Techniques for Mining Software Defects. Poor-quality or noisy defect datasets can lead to inaccurate predictions and insights. We found that techniques for generating ground-truth data are often inaccurate, impacting the quality of defect datasets.
  2. Techniques for Analysing Software Defects. Defect datasets are highly imbalanced, with a defective ratio of <10%. Defect models trained on class-imbalanced datasets often produce inaccurate predictions.
  3. Techniques for Predicting Software Defects. There exists a large number of off-the-shelf classification techniques, each with a large number of possible hyperparameter settings that can be configured. Practitioners often ask which techniques and which settings should be used (the sketch after this list illustrates how class rebalancing and hyperparameter tuning can be combined).
  4. Techniques for Explaining Software Defects.
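
To make items 2 and 3 above concrete, the sketch below combines class rebalancing (via scikit-learn's class_weight option) with hyperparameter tuning (via GridSearchCV). The grid values are illustrative assumptions, not recommended settings from my empirical studies.

```python
# Sketch: handling class imbalance and hyperparameter settings when training
# a defect model. Grid values are illustrative, not recommendations.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 10, 30],
    "min_samples_leaf": [1, 5],
}

# class_weight="balanced" reweights the minority (defective) class,
# mitigating the <10% defective ratio typical of defect datasets.
search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=42),
    param_grid,
    scoring="roc_auc",  # threshold-independent metric, useful under imbalance
    cv=5,
)
# search.fit(X_train, y_train)   # assumes a labelled defect dataset as above
# print(search.best_params_, search.best_score_)
```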

**This research project is financially supported by Japan Society for the Promotion of Science’s Research Fellowship (JSPS DC2 2016-2018, 4,800,000 JPY), NEC C&C Research Grant for Non-Japanese Researchers (2014-2015, 1,500,000 JPY).
