Today, the software development process depends on a variety of development tools (e.g., issue tracking systems, version control systems, code review tools, continuous integration, continuous deployment, and Q&A websites). For example, GitHub, the largest source code hosting service in the world, currently hosts over 35 million software repositories, and the most recent million were created within two months. Millions of software projects also generate large quantities of unstructured software artifacts at a high frequency (so-called Big Data) in many forms, such as issue reports, source code, test cases, code reviews, execution logs, app reviews, developer mailing lists, and discussion threads.
Software analytics is a field that focuses on uncovering interesting and actionable knowledge from the unprecedented amount of data in such repositories in order to improve software development, maintenance, evolution, productivity, quality, and user experience. Indeed, many software organizations are eager to make data-driven engineering decisions rather than rely on gut feeling. They also use software analytics to identify new opportunities, leading to smarter business moves, more efficient operations, higher profits, and happier customers. For example, Microsoft's data scientists uncovered the frequently used commands of Microsoft Windows, which led to an important redesign of its user interfaces. Such insights enable software organizations to work faster and stay agile, giving them a competitive edge they did not have before.