He is a lecturer in the School of Computer Science at the University of Adelaide (ranked in the top 125 of the QS World University Rankings and a member of the Group of Eight (Go8) leading Australian universities). He pioneered research on the pitfalls of software analytics modelling in order to provide strategies for avoiding them. His research has been published at top-tier software engineering venues, such as IEEE Transactions on Software Engineering (TSE), Empirical Software Engineering (EMSE), and the International Conference on Software Engineering (ICSE). During his Ph.D. studies, he won one of the most prestigious and selective sources of national funding in Japan, i.e., a JSPS Research Fellowship for Young Researchers and a Grants-in-Aid for JSPS Fellows, and won the "Best Ph.D. Student Award".
Interests: Big Data Analytics, Data Science, Predictive Modelling in Software Engineering, Empirical Software Engineering, Mining Software Repositories, Modern Statistical Analysis
I'm actively recruiting self-motivated and active Master's and Ph.D. students who are interested in research on big data analytics for software engineering. Check Join Us! for more information, and please contact me if you are interested.
Research Summary | Push the boundaries of software engineering through data science
Today, the software development process depends on a variety of development tools (e.g., issue tracking systems, version control systems, code review, continuous integration, continuous deployment, and Q&A websites). For example, GitHub, the largest source code hosting service in the world, currently hosts over 35 million software repositories, and the most recent million were created within 2 months. Millions of software projects also generate large quantities of unstructured software artifacts at a high frequency (so-called Big Data) in many forms, such as issue reports, source code, test cases, code reviews, execution logs, app reviews, developer mailing lists, and discussion threads.
Software analytics is a field that focuses on uncovering interesting and actionable knowledge from the unprecedented amount of data in such repositories in order to improve software development, maintenance, evolution, productivity, quality, and user experience. Indeed, many software organizations are eager to make data-driven engineering decisions rather than relying on gut feeling. They also use analytics to identify new opportunities, leading to smarter business moves, more efficient operations, higher profits, and happier customers. For example, Microsoft’s data scientists uncovered the frequently used commands of Microsoft Windows, which led to an important redesign of user interfaces. Such insights therefore enable software companies to work faster and stay agile, giving them a competitive edge they did not have before.
Innovations for Software Analytics | Improving quality and efficiency throughout the software-development process
With an increasing amount of data on every aspect of our daily software development activities, from which features we develop and where bugs are fixed to who contributes most to the project, and beyond, we are able to measure code and process characteristics and developer behaviour, and to investigate interesting correlations that can be used to predict software quality, maintenance cost, and effort.
Broadly speaking, I'm interested in applying statistical analysis and machine learning to build predictive models and recommendation and analytics systems in order to (but not limited to):
Predict software defects, risk of software changes, cost, effort, issue delays, and bug fixing time
Understand the characteristics of software defects, software development process, and developer behaviours
Technologies for Software Analytics Methods | Exploring new technologies to ensure highly accurate prediction models
While the adoption of software analytics enables software organizations to distill actionable insights, there are still many barriers to the broad and successful adoption of analytics systems. Even when software organizations can access such invaluable software artifacts and data analytics toolkits, researchers and practitioners often lack the knowledge needed to develop analytics systems properly. Thus, ensuring the accuracy of the predictions and insights that are derived from analytics systems is one of the most important challenges of big data in software engineering. For example, the risks of not managing and monitoring analytics systems effectively can be catastrophic: analytics systems that are allowed to become outdated can lead to erroneous and costly business decisions. Moreover, the use of inappropriate statistical methods could produce incomplete or misleading findings, given the veracity problems of big data in software engineering. Important decisions that are based on misleading insights can quickly translate into lost revenue.
Broadly speaking, I'm interested in exploring the following topics to ensure that the predictions and insights derived from software analytics are highly accurate and reliable (but not limited to):
Improve algorithms for software analytics
Characterize bias in data preparation and data preprocessing
Identify the most appropriate classification, model validation, and model interpretation techniques
Provide practical guidelines to ensure the most accurate and reliable predictions and insights
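To make "model validation technique" concrete, here is a minimal sketch of one commonly used option, the out-of-sample bootstrap: a model is trained on a sample drawn with replacement and tested on the rows that were never drawn. The function names, the toy labels, and the majority-class "model" are assumptions made purely for illustration; they are not the techniques or data used in the research described here.

```python
import random
from collections import Counter
from statistics import mean


def out_of_sample_bootstrap(n_rows, n_repeats, seed=0):
    """Yield (train_idx, test_idx) pairs for an out-of-sample bootstrap."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        # Training sample: n_rows draws with replacement.
        train_idx = [rng.randrange(n_rows) for _ in range(n_rows)]
        # Testing sample: every row that was never drawn for training.
        drawn = set(train_idx)
        test_idx = [i for i in range(n_rows) if i not in drawn]
        yield train_idx, test_idx


def baseline_accuracy(labels, n_repeats=100, seed=0):
    """Average out-of-sample accuracy of a majority-class baseline."""
    scores = []
    for train_idx, test_idx in out_of_sample_bootstrap(len(labels), n_repeats, seed):
        if not test_idx:
            continue  # Possible only for very small datasets.
        # "Train": pick the most frequent label in the bootstrap sample.
        majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
        # "Test": score that prediction on the unseen rows only.
        hits = sum(1 for i in test_idx if labels[i] == majority)
        scores.append(hits / len(test_idx))
    return mean(scores)


# Hypothetical toy data: 1 = defective module, 0 = clean module.
toy_labels = [0] * 80 + [1] * 20
print(round(baseline_accuracy(toy_labels), 3))
```

In practice the majority-class baseline would be replaced by a real classifier, and the same index pairs could be reused to compare several classifiers on identical samples.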
Addressing Big Data Challenges of Software Analytics
Over 700,000 Android apps and over 900,000 iOS apps are available for download. By analyzing the entire ecosystem of apps, we can identify patterns across these apps. However, generalization poses a key challenge to academics in big data analytics: each app often has different characteristics (e.g., the size of the product, the number of developers in a team, and the technology used), which are rarely taken into consideration.
In order to draw conclusions that are far more generalizable, while considering the context-specific nature of each mobile app, I, together with my collaborators at Queen's University, introduce the use of mixed-effect models to empirically study the dialog that takes place between users and developers. In particular, we investigate (1) the impact of user-developer dialogs in the Google Play Store; (2) the relationship between app and review characteristics and the likelihood of receiving a developer reply; and (3) the drivers of developer engagement in user-developer dialogs.
Through a case study of 4.5 million reviews with 126,686 dialogs that contain messages between the user and the developer across 2,328 free-to-download apps in the Google Play Store, we observe that (1) 34% of the studied mobile apps have at least one developer reply; (2) developers of 78% of the apps that actively engage in the dialogs often reply to long reviews that have a low rating; and (3) ‘asking for increasing review rating’ is the most important driver of developer engagement in user-developer dialogs. These findings suggest that the mechanism for the user-developer dialog in the Google Play Store is a valuable channel for customer support. This mechanism can lead to an increase in the rating that a user gives to an app.
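For readers unfamiliar with mixed-effect models, a generic random-intercept logistic model for the likelihood of a developer reply could be written as below. This is a textbook form under assumed notation, not the exact specification used in the study: $y_{ij}$ indicates whether review $i$ of app $j$ received a reply, $\mathbf{x}_{ij}$ holds the app and review characteristics, and $u_j$ is an app-specific random intercept that absorbs the context of each app.

```latex
\[
\mathrm{logit}\,\Pr(y_{ij} = 1)
  = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} + u_j,
\qquad
u_j \sim \mathcal{N}\!\left(0, \sigma_u^{2}\right)
\]
```

The fixed effects $\boldsymbol{\beta}$ describe relationships that hold across all apps, while the variance $\sigma_u^{2}$ captures how much the baseline reply likelihood differs from one app to another.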
Visualization and Infrastructure Support for Big Data Analytics in Software Engineering
Because of the way the human brain processes information, it is easier to grasp large amounts of complex data through charts or graphs than by poring over spreadsheets or reports. Data visualization is a quick, easy way to convey concepts in a universal manner, and it lets you experiment with different scenarios by making slight adjustments.
I'm interested in developing visualization and infrastructure to support big data management for software analytics.
Academic Software for Big Data Analytics in Software Engineering
Academic software is a critical component of academic research in software engineering: we use it to produce research results. However, academic software in software engineering is rarely made available. With the advent of open-source software, open access, reproducibility, and replicability, I'm interested in developing academic software to support research in empirical software engineering. For example, I actively develop and maintain the ScottKnott ESD test, a statistical test for multiple comparisons of treatments.
Google Scholar released the 2016 version of Scholar Metrics with the h5-index and h5-median for the top 20 conferences and journals in each area of research! In the area of Software Systems, the International Conference on Software Engineering (ICSE) and IEEE Transactions on Software Engineering (TSE) hold the top two positions.