What is it about?

In this work, we show that a machine-learning model built using only the first 150 data points of a software engineering project and two features predicts software defects as well as, and sometimes better than, models built using thousands of data points and many features (data-hungry models). The data-lite method devised in this work augments transfer learning methods, speeds up the transfer learning process, and identifies defective changes in software projects better than machine learning methods that use complex ensemble and tuning (hyper-parameter optimization) approaches.
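
To make the idea concrete, here is a minimal sketch of a data-lite workflow: keep only the 150 oldest changes of a project, keep only two features, and fit a small classifier on that slice. Everything specific in it is an assumption made for illustration. The record keys ("timestamp", "defective"), the feature names ("feature_a", "feature_b"), and the use of a hand-written logistic regression are placeholders, not the features or the learner used in the paper.

```python
import math


def early_slice(changes, n_early=150, features=("feature_a", "feature_b")):
    """Return (X, y) for the n_early oldest changes, restricted to `features`.

    `changes` is a list of dicts. The keys "timestamp" and "defective" and the
    feature names are hypothetical placeholders, not the paper's schema.
    """
    oldest = sorted(changes, key=lambda c: c["timestamp"])[:n_early]
    X = [[float(c[name]) for name in features] for c in oldest]
    y = [int(c["defective"]) for c in oldest]
    return X, y


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit logistic regression with plain batch gradient descent.

    This learner is a stand-in chosen for brevity; it assumes the features
    are already on a comparable scale (for example, normalised to [0, 1]).
    """
    n, n_feat = len(X), len(X[0])
    weights, bias = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for row, label in zip(X, y):
            z = bias + sum(w * x for w, x in zip(weights, row))
            err = _sigmoid(z) - label
            for j in range(n_feat):
                grad_w[j] += err * row[j]
            grad_b += err
        # Step against the average gradient of the log-loss.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(model, row):
    """Return 1 (predicted defective) or 0 (predicted clean) for one change."""
    weights, bias = model
    z = bias + sum(w * x for w, x in zip(weights, row))
    return 1 if _sigmoid(z) >= 0.5 else 0
```

A caller would build the training set with `early_slice(all_changes)`, pass the result to `fit_logistic`, and then call `predict` on each later change. Because the slice is capped at 150 rows and two columns, training cost stays small no matter how large the project history grows.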

Why is it important?

Fixing software defects is not cheap, so it is very useful to prevent them. The machine-learning techniques currently used to predict software defects are data-hungry. Data-hungry methods are computationally heavy (they need more memory and processing power) and are not explainable. As we approach the end of Moore's Law, it is essential to build models that not only consume less memory and processing power but are also green (friendly to the environment) and fairer (explainable).


A prevalent misconception is that more data is inherently better for making accurate predictions. Very soon there will be much traction in the Artificial Intelligence space for methods that build models very cheaply. In this paper, we offer shortcuts to simplify software analytics. We can achieve similar performance with data-lite machine learning models as with data-hungry ones.

Shrikanth N.C.
North Carolina State University

Read the Original

This page is a summary of: Assessing the Early Bird Heuristic (for Predicting Project Quality), ACM Transactions on Software Engineering and Methodology, July 2023, ACM (Association for Computing Machinery), DOI: 10.1145/3583565.