Research Impact

Recognition of Research Excellence

I have a consistent track record of achievements, awards, and grants. To date, I have received more than AU$2,000,000 in grant funding, demonstrating recognition of the excellence of my research. I have also invested significant effort in securing funding for my research projects from an early career stage, which allows me to maintain a strong research profile.

  • 2023: ACM SIGSOFT Distinguished Reviewer Award at ICSE’23.
  • 2022: CSIRO’s Next Generation Graduates AI Program, Responsible AI Software Engineering (lead-CI, $1.2m).
  • 2021: ACM SIGSOFT Distinguished Paper Award 2021.
  • 2020: Australian Research Council’s Discovery Early Career Researcher Awards (DECRA) (sole-CI, AU$404,000).
  • 2019: Medical Research Future Fund (MRFF) (co-CI, AU$250,000).
  • 2019: FIT Early Career Researcher Seed Grant (sole-CI, AU$15,000).
  • 2017: Japan Society for the Promotion of Science’s Research Fellowship (sole-CI, AU$85,000).
  • 2016: NAIST Best Ph.D. Student Award.
  • 2015: NEC C&C Research Award for Non-Japanese Researcher (sole-CI, JPY 200,000).
  • 2015: Outstanding Paper Award for Young NEC C&C Researchers by NEC C&C Cooperation, Tokyo, Japan.
  • 2015: ACM SIGSOFT CAPS Merit-based Award.

Impact to Australians

Successful deployment of the ED patient wait-time prediction software in many hospitals in Australia to improve patient experience.

I tremendously enjoy working on the ED Wait-Time Predictive Analytics project, to which I contributed significantly by applying data analytics technologies to a healthcare problem. Working closely with A/Prof Katie Walker (Emergency Physician) and Prof Burak Turhan, we developed predictive analytics to estimate patient wait-time in emergency rooms. I contributed my technical expertise in regression modelling, machine learning, data analytics, and explainable AI to develop a predictive model and interpret the results. I have learned that developing predictive analytics is not just about the algorithm itself; it is more about dealing with many software engineering challenges, including data privacy, ethical concerns, data cleaning, data analysis, model analysis, testing prediction fairness, and model deployment. Our work on the ED Wait-Time Predictive Analytics has now been successfully deployed at many hospitals in Australia, including St. Vincent Hospital, Cabrini Institute, Box Hill Hospital, Maroondah Hospital, and Angliss Hospital. This project has led to a significant improvement in patients’ experience of emergency care, ambulance workload distribution (by diverting appropriate patients to less busy facilities), offload time performance (and response time), and emergency staff experience.

Impact to Software Industry

Helping developers find bugs faster.

Software defects are prevalent, but hard to predict and to prevent. In the software engineering discipline, prevention is always better than cure (i.e., fixing a defect is much cheaper if it is detected early, before a product is launched to customers). Hence, Software Quality Assurance (SQA) activities (e.g., code review, software testing) are widely used in the software industry to ensure that software defects are detected and fixed early. However, it is infeasible and impractical to exhaustively review and test every single line of code of a whole software system due to limited SQA resources.

Aligned with my core research expertise (Defect Prediction), I regularly engage with practitioners in many companies and provide expert advice to help them design, develop, and deploy defect prediction models in practice, enabling developers to effectively prioritise their limited resources on the highest-risk areas of source code and enabling managers to chart appropriate quality improvement plans.

Media Coverage

Impact to Research Community

Developing the First Online Tutorial Book on “Explainable AI for Software Engineering”

The success of software engineering projects largely depends on complex decision-making. For example, which tasks should a developer do first, who should perform a given task, is the software of high quality, and is a software system reliable and resilient enough to deploy? Erroneous decision-making for these complex questions is costly in terms of money and reputation. Thus, Artificial Intelligence/Machine Learning (AI/ML) techniques have been widely used in software engineering for developing software analytics tools and techniques to improve decision-making, developer productivity, and software quality. However, the predictions of such AI/ML models for software engineering are still not practical (i.e., not fine-grained), not explainable, and not actionable. These concerns often hinder the adoption of AI/ML models in software engineering practices. In addition, many recent studies still focus on improving accuracy, while only a few focus on improving explainability. Are we moving in the right direction? How can we better improve the SE community (both research and education)? In this book, we first provide a concise yet essential introduction to the most important aspects of Explainable AI and a hands-on tutorial of Explainable AI tools and techniques. Then, we introduce the fundamental knowledge of defect prediction (an example application of AI for Software Engineering). Finally, we demonstrate three successful case studies on how Explainable AI techniques can be used to address the aforementioned challenges by making the predictions of software defect prediction models more practical, explainable, and actionable. To date, the book has reached more than 10,000 people with more than 5,000 page views.

Developing the ScottKnott ESD R package.

The ScottKnott ESD R package has been widely used by students, researchers, and practitioners in the software engineering community, with more than 14,000 downloads in the past 4 years and more than 200 citations for the TSE’17 paper in which it was published. It is a multiple comparisons test that ranks any techniques into statistically distinct ranks with non-negligible effect size. The package was proposed to address the shortcomings of the Nemenyi test, which often produces overlapping groups without considering the magnitude of the differences. The ScottKnott ESD R package has become a fundamental research toolset to rank and identify (1) the most important factors from regression analysis; (2) the top-performing feature selection techniques, classification techniques, and model validation techniques for software defect prediction; (3) the top-performing techniques for code clone detection, code smell detection, and test smell detection; and (4) the most important factors from a survey study (e.g., which factors are perceived as most useful by practitioners?).
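The idea of requiring a non-negligible effect size before separating two ranks can be illustrated with a small Python sketch. This is not the package's implementation and it omits the Scott-Knott clustering step entirely; the function names, the use of Cohen's d, and the 0.2 threshold are assumptions chosen for illustration only. It simply sorts techniques by mean score and keeps two adjacent techniques in the same rank when the effect size between them is negligible.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d between two samples, using a pooled sample variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if pooled_var == 0:
        # No spread at all: the difference is either zero or unbounded.
        return 0.0 if mean(a) == mean(b) else float("inf")
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def merge_negligible(scores, threshold=0.2):
    """Rank techniques by mean score (best first).

    Adjacent techniques share a rank when |d| < threshold
    (hypothetical simplification; not the Scott-Knott ESD procedure).
    """
    ordered = sorted(scores, key=lambda name: mean(scores[name]), reverse=True)
    ranks = {ordered[0]: 1}
    for prev, cur in zip(ordered, ordered[1:]):
        d = abs(cohens_d(scores[prev], scores[cur]))
        ranks[cur] = ranks[prev] if d < threshold else ranks[prev] + 1
    return ranks


# Made-up scores for three hypothetical techniques.
example = {
    "A": [0.80, 0.82, 0.81, 0.79],
    "B": [0.80, 0.81, 0.82, 0.79],
    "C": [0.60, 0.62, 0.61, 0.59],
}
print(merge_negligible(example))
```

In this made-up example, A and B have the same mean, so the effect size between them is zero and they share a rank, while C is placed in a lower rank. For real analyses, the R package itself should be used.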

Developing a practical guideline for developing software analytics.

I developed a practical guideline for developing software analytics through empirical studies. In particular, I investigated the impact of the modelling components on the predictions and interpretation of defect prediction models throughout the modelling pipeline (i.e., from mining software defects, analysing software defects, modelling defect models, and evaluating defect models, to explaining defect models). Each experimental component needs a deep empirical investigation to quantify its impact on the predictions and insights derived from defect prediction models. Reviewer 2 of my ICSE-SEIP’18 paper commented that “This is a much-needed contribution both for researchers and for practitioners: Researchers will find a checklist for the quality assurance of their defect modelling methods. Practitioners, that is software quality experts in companies, will avoid a false interpretation of their data.”

To date, these guidelines have been well adopted by practitioners (e.g., Blackberry, Alibaba) and by researchers in other areas (e.g., effort estimation, code clone analytics, code smell analytics, code review). Several of my works have been recognised at the national and international levels. For example, Professor Ahmed E. Hassan (IEEE Fellow, 19,000+ citations) mentioned my ICSE-SEIP’18 paper in his keynote speech at ICSME’20 (CORE A) to highlight the importance of rigorous statistical modelling in empirical software engineering research. Dr Thomas Zimmermann (17,000+ citations, IEEE Fellow, a Senior Principal Researcher at Microsoft Research USA) also mentioned my ICSE-SEIP’18 paper in the FSE’19 Keynote. Finally, I have been invited to give a tutorial on Software Analytics in Action: A Hands-on Tutorial on Analysing and Modelling Software Data at several conferences and universities, e.g., Mining Software Repositories (MSR’2019), Australian Computer Science Week (ACSW’2020), the University of Wollongong, and the University of Melbourne.

My 60-minute tutorial talk at MSR’19 was tremendously successful, with nearly 100 attendees ranging from senior professors to young PhD students. To date, the materials have attracted more than 5,700 views (https://www.slideshare.net/klainfo/explainable-artificial-intelligence-xai-to-predict-and-explain-future-software-defects). The tutorial is highly interactive and innovative through the use of various educational technologies, including Slido.io, Jupyter Notebook, Docker, and Binder, so that participants can reproduce the tutorial anytime and anywhere.

Promoting the Reproducibility, Replicability, and Open Science in Software Engineering.

In 2018, I became a founding member of the Open Science Initiative of Empirical Software Engineering (IF=3.478) (https://www.springer.com/journal/10664/editors). As a member, I am committed to reviewing replication packages, ensuring their reproducibility and replicability, assuring their quality, providing constructive feedback for improvement, and recommending a final decision. In 2020, I was a Co-Chair of the Artefact and ROSE (Replication and Reproducibility in Software Engineering) Festival of the International Conference on Software Maintenance and Evolution (ICSME 2020, CORE A). The goal of the ROSE community is to foster research progress via transparency, reproducibility, and replicability. The ROSE Festival that I co-chaired was tremendously successful, with a 200% increase in the number of artifact submissions compared to 2019 and 50 participants engaging in a lot of interactive discussion.

Community Services

I regularly serve as an editorial board member, a conference organiser, a program committee member, an external PhD examiner, and a referee for research agencies (e.g., Australian Research Council).

Editorial Member

Conference Organiser/Co-Chair

  • 2023: Co-Organiser and a Publicity and Social Media Co-Chair of the International Conference on Software Engineering (ICSE 2023), Melbourne, Australia.
  • 2021: Co-Organiser and a Publicity and Social Media Co-Chair of the International Conference on Automated Software Engineering (ASE 2021), Melbourne, Australia.
  • 2020: Co-Organiser and a Co-Chair of the Late-Breaking Results Track of the International Conference on Automated Software Engineering (ASE 2020), Melbourne, Australia.
  • 2020: Co-Organiser and a Co-Chair of the Artefact and ROSE (Replication and Reproducibility in Software Engineering) Track of the International Conference on Software Maintenance and Evolution (ICSME 2020), Adelaide, Australia.

Program Committee Member / Journal Referee

  • 2022: ICSE’22.
  • 2021: ICSE’21, ASE’21, MSR’21, TSE, EMSE, JSS, IST, TOSEM.
  • 2020: FSE’20, ASE’20, MSR’20, MSR-Registered Report’20, ICSME’20, SANER-ERA’20, TSE, EMSE, ASE, JSS, IST, TOSEM.
  • 2019: SANER’19, SANER-ERA’19, ASE-ToolDemo’19, TSE, EMSE, ASE, JSS, JSEP, IST.
  • 2018: MSR Data Showcase’18, ICPC-Tool’18, PROMISE’18, ROSE@ESEC/FSE’18, JSS, AUSE, EMSE, TSE.
  • 2017: ICSME’17, SANER-Tool’17, TSE, IST, JSS.
  • 2016: SANER-Tool’16, TSE.
  • 2015: EMSE.