What is it about?

Modern artificial intelligence systems rely on extremely large datasets, but training on all available data is expensive, time-consuming, and energy-intensive. This research explores whether neural networks really need to see all training data at all times, or whether performance can be maintained by carefully selecting which data is used at different stages of training.

The paper introduces a new method called dynamic data inclusion with a sliding window. Instead of training a model on the full dataset in every epoch, the training data is ordered according to how "easy" or "hard" each example is to learn. Difficulty is estimated with a dimensionality-reduction technique that measures how close each data point is to the centre of its class: easier examples lie closer to the centre, while harder examples lie further away. During training, only a portion of the data is used in each epoch, and this portion gradually changes as the "window" slides across the ordered dataset.

The method was tested on five widely used image datasets, ranging from simple handwritten digits to complex real-world images. The results show that large reductions in training time and computational cost are possible, with runtime reductions sometimes exceeding 80%, while classification accuracy stays almost the same.
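
As a rough illustration of the ordering and windowing idea, here is a minimal Python sketch. It is not the paper's implementation. It assumes the low-dimensional coordinates have already been produced by some dimensionality-reduction step (not shown), it takes Euclidean distance to the class centroid as the difficulty score, and it moves a fixed-size window linearly from the easiest to the hardest examples. The function names, the `window_frac` parameter, and the linear schedule are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def difficulty_order(embedded, labels):
    """Return sample indices sorted from easiest to hardest.

    embedded: low-dimensional points, assumed to come from a
              dimensionality-reduction step that is not shown here.
    labels:   class label of each point.
    Assumption: difficulty = Euclidean distance to the class centroid.
    """
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    # Centroid of a class = per-dimension mean of its points.
    centroids = {}
    for label, idxs in by_class.items():
        dims = len(embedded[idxs[0]])
        centroids[label] = [
            sum(embedded[i][d] for i in idxs) / len(idxs) for d in range(dims)
        ]

    def distance_to_centroid(i):
        return math.dist(embedded[i], centroids[labels[i]])

    # Small distance = "easy", large distance = "hard".
    return sorted(range(len(labels)), key=distance_to_centroid)


def sliding_window(order, epoch, num_epochs, window_frac=0.5):
    """Indices to train on in one epoch (hypothetical linear schedule)."""
    n = len(order)
    size = max(1, int(window_frac * n))
    # The window start moves from 0 (easiest) to n - size (hardest).
    last_epoch = max(1, num_epochs - 1)
    start = (n - size) * epoch // last_epoch
    return order[start:start + size]


if __name__ == "__main__":
    # Toy 2-D "reduced" data: two classes with three points each.
    embedded = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0),
                (10.0, 10.0), (10.0, 11.0), (10.0, 13.5)]
    labels = ["a", "a", "a", "b", "b", "b"]

    order = difficulty_order(embedded, labels)
    for epoch in range(3):
        print(epoch, sliding_window(order, epoch, num_epochs=3))
```

In a real training loop, the indices returned by `sliding_window` would select the subset of examples fed to the model in that epoch. With a window covering half the data, each epoch processes roughly half as many examples as full-dataset training, which is where the runtime saving in a scheme like this comes from.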

Why is it important?

This work is important because the environmental and financial cost of training AI models is rapidly increasing. As models grow larger and datasets expand, efficiency is becoming just as important as accuracy. What makes this research distinctive is that it moves beyond random data removal. Instead, it introduces a structured, explainable way of deciding which data matters most at different stages of learning. The findings show that many models can learn effectively by starting with easier examples and gradually introducing harder ones, reducing unnecessary computation. These insights are especially relevant for sustainable AI, edge computing, and organisations with limited computational resources. By showing how training can be made both faster and greener with minimal loss of performance, this work contributes to the growing effort to make machine learning more responsible and scalable.

Perspectives

I found this work particularly interesting because it challenges the assumption that "more data is always better." The results show that the point in training at which data is used can be just as important as how much data is used. I hope this paper encourages researchers and practitioners to think more critically about data efficiency, not only to improve performance but also to reduce the environmental impact of AI training. Making machine learning smarter rather than simply bigger is an exciting direction for future research.

Prof Tatiana Kalganova
Brunel University

Read the Original

This page is a summary of: Dynamic Data Inclusion with Sliding Window, January 2024, Springer Science + Business Media,
DOI: 10.1007/978-981-99-7886-1_44.