26 February 2021

Chanathip Pornprasit, Chakkrit Tantithamthavorn

International Conference on Mining Software Repositories (MSR)

A Just-In-Time (JIT) defect prediction model is a classifier that predicts whether a commit is defect-introducing, helping developers prioritize their limited Software Quality Assurance resources on the riskiest commits. Recently, CC2Vec, a deep learning approach for Just-In-Time defect prediction, has been proposed. However, CC2Vec requires the whole dataset (i.e., training + unseen) for model training, which does not mimic a realistic model training scenario. Our replication study shows that, after excluding the unseen dataset from model training, the F-measure of CC2Vec decreases by 38.5% for OpenStack and 45.7% for Qt, highlighting the importance of considering a realistic model training scenario for Just-In-Time defect prediction. In addition, CC2Vec cannot perform fine-grained predictions at the line level (i.e., which lines are most risky for a given commit). In this paper, we propose JITLine, a Just-In-Time defect prediction approach for predicting defect-introducing commits and identifying the lines associated with each defect-introducing commit (i.e., defective lines). Through a case study of 37,524 commits from OpenStack and Qt, we find that our JITLine approach is 390%-475% more accurate, 22%-57% more cost-effective, and 70-300 times faster than the state-of-the-art approaches (i.e., CC2Vec and DeepJIT), and that its fine-grained predictions at the line level are 100% more accurate than the baseline NLP approach. Therefore, our JITLine approach should be considered for prioritizing defect-introducing commits and identifying defective lines.

  author = {Pornprasit, Chanathip and Tantithamthavorn, Chakkrit},
  title = ,
  booktitle = {Proceedings of the International Conference on Mining Software Repositories (MSR)},
  year = {2021},
  pages = {To Appear},