He is a lecturer (assistant professor) in the School of Computer Science at the University of Adelaide, which was founded in 1874, is ranked in the top 125 of the QS World University Rankings, and is a member of the Group of Eight (Go8) of leading Australian universities. He was a postdoctoral research fellow at Queen’s University, Canada. He holds some of the most prestigious and selective national funding in Japan: a JSPS Research Fellowship for Young Researchers and a Grant-in-Aid for JSPS Fellows. He won the Best Ph.D. Student Award for his Ph.D. study at the Nara Institute of Science and Technology, Japan, during which he also spent two years as a visiting researcher at Queen’s University, Canada. His research has been published in top-tier software engineering venues, such as the IEEE Transactions on Software Engineering (TSE), Empirical Software Engineering (EMSE), and the International Conference on Software Engineering (ICSE). His Ph.D. thesis aims to improve the fundamentals of predictive modelling for software engineering (e.g., analytics systems for software quality assurance) in order to produce more accurate predictions and more reliable insights.

Interests: Big Data Analytics, Data Science, Predictive Modelling in Software Engineering, Empirical Software Engineering, Mining Software Repositories, Modern Statistical Analysis

July 03, 2017
I was invited to serve as a journal referee for IEEE Access (Impact Factor: 3.244).
June 23, 2017
Our paper, "Studying the Dialogue Between Users and Developers of Free Apps in the Google Play Store", has been accepted to the International Journal of Empirical Software Engineering! (2016 Impact Factor: 3.275)
June 06, 2017
I will be giving a talk on "Using HPC Resources to Improve the Experimental Design of Software Quality Analytics" at HPCS 2017, Kingston, Ontario, Canada.
May 19, 2017
I will attend MSR 2017 and ICSE 2017 in Buenos Aires, Argentina, during 20 - 27 May 2017.

Today's software development process depends on a variety of development tools (e.g., issue tracking systems, version control systems, code review, continuous integration, continuous deployment, and Q&A websites). For example, GitHub, the largest source code hosting service in the world, currently hosts over 35 million software repositories, with the most recent million repositories created within just two months. Millions of software projects also generate large quantities of unstructured software artifacts at a high frequency (so-called Big Data) in many forms, such as issue reports, source code, test cases, code reviews, execution logs, app reviews, developer mailing lists, and discussion threads.

Software analytics is a field that focuses on uncovering interesting and actionable knowledge from the unprecedented amount of data in such repositories in order to improve software development, maintenance, evolution, productivity, quality, and user experience. Indeed, many software organizations are eager to make data-driven engineering decisions rather than relying on gut feeling. They also use analytics to identify new opportunities, leading to smarter business moves, more efficient operations, higher profits, and happier customers. For example, Microsoft's data scientists uncovered the most frequently used commands of Microsoft Windows, which led to an important redesign of its user interfaces. Such insights enable software companies to work faster and stay agile, giving them a competitive edge they did not have before.

  1. Innovations for Software Analytics | Improving quality and efficiency throughout the software-development process
  2. With an increasing amount of data on every aspect of our daily software development activities - from which features we develop and where bugs are fixed, to who contributes most to a project, and beyond - we are able to measure code and process characteristics and developer behaviour, and to investigate interesting correlations that can be used to predict software quality, maintenance cost, and effort.

    Broadly speaking, I'm interested in applying statistical analysis and machine learning to build predictive models, recommendation systems, and analytics systems for software engineering.

  3. Technologies for Software Analytics Methods | Exploring new technologies to ensure highly accurate prediction models
  4. While the adoption of software analytics enables software organizations to distill actionable insights, there are still many barriers to its broad and successful adoption. Indeed, even when software organizations have access to such invaluable software artifacts and to toolkits for data analytics, researchers and practitioners often have little knowledge of how to properly develop analytics systems. Thus, the accuracy of the predictions and insights that are derived from analytics systems is one of the most important challenges of big data in software engineering. For example, the risks of not managing and monitoring analytics systems effectively can be catastrophic: analytics systems become outdated, leading to potentially erroneous and costly business decisions. Moreover, the use of inappropriate statistical methods can produce incomplete findings, given the veracity of big data in software engineering. Important decisions that are made based on misleading insights can quickly translate into lost revenue.

    Recently, my colleagues and I have shown that (1) noise in defect datasets, (2) the choice of parameter settings of classification techniques, and (3) the choice of model validation technique each have a large impact on the performance and interpretation of defect prediction models. Moreover, my recent work also shows that collinearity and multicollinearity have a large impact on the stability of the insights derived from prediction models.
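    To illustrate the multicollinearity concern, variance inflation factors (VIF) are one common way to detect redundant metrics before fitting a prediction model. Below is a minimal sketch in Python; the metrics, the synthetic data, and the cutoff of 5 are illustrative assumptions, not values taken from our studies.

```python
# A minimal sketch: detecting redundant (collinear) software metrics with
# variance inflation factors. All data here are synthetic and illustrative.
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])  # add an intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

rng = np.random.default_rng(1)
loc = rng.normal(size=500)                        # lines of code (synthetic)
churn = 0.9 * loc + 0.1 * rng.normal(size=500)    # strongly correlated with loc
ndev = rng.normal(size=500)                       # an independent metric
X = np.column_stack([loc, churn, ndev])
print(vif(X))  # the first two metrics have large VIFs; a common cutoff is 5
```

    Metrics whose VIF exceeds the chosen cutoff would typically be dropped or combined before the model is interpreted, so that the surviving coefficients are stable.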

    Broadly speaking, I'm interested in exploring the following topics (among others) to ensure highly accurate and reliable predictions and insights from software analytics:

    • Improve algorithms for software analytics
    • Characterize bias in data preparation and data preprocessing
    • Identify the most appropriate classification, model validation, and model interpretation techniques
    • Provide practical guidelines to ensure the most accurate and reliable predictions and insights
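    As an illustration of the model validation topic, the out-of-sample bootstrap is one of the validation techniques examined in my work on defect prediction models. The sketch below shows the idea in Python; the synthetic data, the choice of classifier, the AUC metric, and the number of iterations are illustrative assumptions only.

```python
# A sketch of out-of-sample bootstrap model validation: train on a bootstrap
# sample, then evaluate on the rows that were not drawn. Data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 400
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)) > 0

aucs = []
for _ in range(25):  # often 100+ iterations in practice; fewer here for brevity
    boot = rng.integers(0, n, n)            # draw n rows with replacement
    out = np.setdiff1d(np.arange(n), boot)  # rows never drawn = test set
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X[boot], y[boot])
    aucs.append(roc_auc_score(y[out], clf.predict_proba(X[out])[:, 1]))
print(f"mean AUC: {np.mean(aucs):.2f} +/- {np.std(aucs):.2f}")
```

    Repeating the train/test split many times yields a distribution of performance estimates rather than a single number, which is what makes the bias and variance of a validation technique measurable in the first place.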

  5. Addressing Big Data Challenges of Software Analytics
  6. Over 700,000 Android apps and over 900,000 iOS apps are available for download. By analyzing the entire ecosystem of apps, we can identify patterns across these apps. However, generalization issues pose a key challenge for academics in big data analytics: each app often has different characteristics (e.g., product size, team size, and the technology used), which is rarely taken into consideration.

    In order to draw conclusions that are far more generalizable, while considering the context-specific nature of each mobile app, my collaborators at Queen's University and I introduced the use of mixed-effect models to empirically study the dialogue that takes place between users and developers. In particular, we investigate (1) the impact of user-developer dialogues in the Google Play Store; (2) the relationship between app and review characteristics and the likelihood of receiving a developer reply; and (3) the drivers of developer engagement in user-developer dialogues.
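    To sketch how a mixed-effect model captures app-level context, the example below fits a random intercept per app using Python's statsmodels. This is only a sketch: the study itself used different tooling and variables, and the data, column names, and the continuous response used here are purely illustrative assumptions.

```python
# A minimal mixed-effects sketch: a response is modelled from review
# characteristics (fixed effects), with a per-app random intercept to
# account for the context-specific nature of each app. Synthetic data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_apps, n_reviews = 40, 25
app = np.repeat(np.arange(n_apps), n_reviews)
app_effect = rng.normal(scale=1.0, size=n_apps)[app]  # hidden per-app intercept
rating = rng.integers(1, 6, size=len(app))            # star rating, 1..5
length = rng.exponential(scale=100, size=len(app))    # review length in words
# Toy response: longer, lower-rated reviews score higher, plus app effects.
reply_score = 0.5 * length / 100 - 0.3 * rating + app_effect \
    + rng.normal(size=len(app))

df = pd.DataFrame({"app": app, "rating": rating, "length": length,
                   "reply_score": reply_score})
# Fixed effects: rating and review length; random intercept grouped by app.
model = smf.mixedlm("reply_score ~ rating + length", df, groups=df["app"]).fit()
print(model.summary())
```

    The random intercept absorbs systematic per-app differences, so the fixed-effect coefficients for rating and length are not distorted by apps that simply reply more (or less) across the board.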

    Through a case study of 4.5 million reviews, including 126,686 dialogues containing messages between users and developers across 2,328 free-to-download apps in the Google Play Store, we observe that (1) 34% of the studied mobile apps have at least one developer reply; (2) developers of 78% of the apps that actively engage in dialogues often reply to long reviews that have a low rating; and (3) asking the user to increase their review rating is the most important driver of developer engagement in user-developer dialogues. These findings suggest that the user-developer dialogue mechanism in the Google Play Store is a valuable channel for customer support, and that it can lead to an increase in the rating that a user gives to an app.

  7. Visualization and Infrastructure Support for Big Data Analytics in Software Engineering
  8. Because of the way the human brain processes information, using charts or graphs to visualize large amounts of complex data is easier than poring over spreadsheets or reports. Data visualization is a quick, easy way to convey concepts in a universal manner, and it allows one to experiment with different scenarios by making slight adjustments.

    I'm interested in developing visualization and infrastructure to support big data management for software analytics.

  9. Academic Software for Big Data Analytics in Software Engineering
  10. Academic software is a critical component of academic research in software engineering, as we use it to produce research results. However, academic software in software engineering is rarely made available. With the advent of open-source software, open access, reproducibility, and replicability, I'm interested in developing academic software to support research in empirical software engineering. For example, I actively develop and maintain the ScottKnott ESD test, a statistical test for multiple comparisons of treatments.
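    The ScottKnott ESD test itself is distributed as an R package. The Python sketch below only illustrates the core idea of effect-size-aware ranking: treatments are sorted by mean performance, and adjacent treatments are merged into the same rank when their difference is negligible by Cohen's d. The real test uses Scott-Knott hierarchical partitioning rather than this simplified adjacent-merge pass, and the data below are synthetic.

```python
# A simplified sketch of the idea behind the ScottKnott ESD test: treatments
# whose performance differences are negligible (|Cohen's d| < 0.2) share a
# rank; a non-negligible gap starts a new, lower rank.
import numpy as np

def cohens_d(a, b):
    """Cohen's d effect size between two samples (pooled standard deviation)."""
    pooled = np.sqrt((np.var(a, ddof=1) + np.var(b, ddof=1)) / 2)
    return (np.mean(a) - np.mean(b)) / pooled

def esd_ranks(treatments):
    """treatments: dict of name -> 1-D array of performance scores."""
    order = sorted(treatments, key=lambda k: np.mean(treatments[k]), reverse=True)
    ranks, current = {}, 1
    for prev, name in zip(order, order[1:]):
        ranks.setdefault(prev, current)
        if abs(cohens_d(treatments[prev], treatments[name])) >= 0.2:
            current += 1  # non-negligible difference: start a new rank
        ranks[name] = current
    ranks.setdefault(order[0], 1)  # handle the single-treatment case
    return ranks

base = np.linspace(-0.03, 0.03, 30)  # identical spread for each technique
scores = {
    "rf":  0.800 + base,  # synthetic AUC values per validation iteration
    "gbm": 0.797 + base,  # negligibly worse than rf -> shares rank 1
    "nb":  0.700 + base,  # clearly worse -> a new, lower rank
}
print(esd_ranks(scores))  # {'rf': 1, 'gbm': 1, 'nb': 2}
```

    Grouping by effect size rather than by p-value alone prevents statistically significant but practically negligible differences from splitting techniques into separate ranks.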

    1. Studying the Dialogue Between Users and Developers of Free Apps in the Google Play Store

      Safwat Hassan, Chakkrit Tantithamthavorn, Cor-Paul Bezemer, and Ahmed E. Hassan
      International Journal of Empirical Software Engineering (EMSE)
      2017
      PDF
    2. Leveraging HPC Resources to Improve the Experimental Design of Software Analytics

      Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, and Kenichi Matsumoto
      Proceedings of the High Performance Computing Symposium (HPCS)
      2017
      PDF
    3. (PhD Thesis) Towards a Better Understanding of the Impact of Experimental Components on Defect Prediction Models


      Chakkrit Tantithamthavorn
      Nara Institute of Science and Technology
      2016
      PDF
    4. A Study of Redundant Metrics in Defect Prediction Datasets

      Jirayus Jiarpakdee, Chakkrit Tantithamthavorn, Akinori Ihara, Kenichi Matsumoto
      Proceedings of the International Symposium on Software Reliability Engineering (ISSRE)
      2016
      PDF
    5. An Empirical Comparison of Model Validation Techniques for Defect Prediction Models


      Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, and Kenichi Matsumoto
      IEEE Transactions on Software Engineering (TSE)
      2017
      PDF
    6. Comments on "Researcher Bias: The Use of Machine Learning in Software Defect Prediction"

      Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, and Kenichi Matsumoto
      IEEE Transactions on Software Engineering (TSE)
      2016
      PDF
    7. Automated Parameter Optimization of Classification Techniques for Defect Prediction Models

      Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, and Kenichi Matsumoto
      The International Conference on Software Engineering (ICSE)
      2016
      Acceptance rate: 19% (101/530)
      PDF
    8. See more...

    Google Scholar released the 2016 version of Scholar Metrics, with the h5-index and h5-median for the top 20 conferences and journals in each research area. In the area of Software Systems, the International Conference on Software Engineering (ICSE) and the IEEE Transactions on Software Engineering (TSE) occupy the top two positions.

    August 25, 2017
    ICSE 2018, Gothenburg, Sweden
    October 12, 2017
    SANER 2018, Campobasso, Italy
    December 04, 2017
    APSEC 2017, Nanjing, China
    February 03, 2018
    MSR 2018, Gothenburg, Sweden
    March 09, 2018
    FSE 2018, Lake Buena Vista, Florida, USA
    March 30, 2018
    ICSME 2018, Madrid, Spain
    April 22, 2018
    ASE 2018, Montpellier, France

    September 04, 2017
    ESEC/FSE 2017, Paderborn, Germany
    September 04, 2017
    SSBSE 2017, Paderborn, Germany
    September 17, 2017
    ICSME 2017, Shanghai, China
    September 17, 2017
    VISSOFT 2017, Shanghai, China
    September 17, 2017
    SCAM 2017, Shanghai, China
    October 30, 2017
    ASE 2017, Urbana-Champaign, Illinois, USA
    November 06, 2017
    ESEM 2017, Toronto, Canada
    February 21, 2018
    SANER 2018, Campobasso, Italy
    May 27, 2018
    ICSE 2018, Gothenburg, Sweden
    May 27, 2018
    MSR 2018, Gothenburg, Sweden
    September 03, 2018
    ASE 2018, Montpellier, France
    September 17, 2018
    ICSME 2018, Madrid, Spain
    November 04, 2018
    FSE 2018, Lake Buena Vista, Florida, USA
    1. Journal Referee

      • IEEE Access, Impact Factor (2016): 3.244.
         2017: 1 manuscript.
    2. Program Committee (PC)

      • The International Conference on Software Maintenance and Evolution (ICSME), 2017
    3. Additional Reviewer

      • The Working Conference on Mining Software Repositories (MSR), 2015
      • The India Software Engineering Conference (ISEC), 2015
      • The IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM), 2014
    4. Web (Co)-Chair

      • The International Workshop on Empirical Software Engineering in Practice (IWESEP), 2017
      • The Japan Summer School in Mining Software Repositories (MSR Asia Summit), 2015
      • The Thailand-Japan International Academic Conference (TJIA), 2013
    5. Student Volunteer

      • The International Conference on Software Analysis, Evolution, and Reengineering (SANER), 2016
      • The International Workshop on Empirical Software Engineering in Practice (IWESEP), 2012