What is it about?
Modern artificial intelligence systems are becoming increasingly complex, and they often require ever-larger datasets to achieve small improvements in accuracy. Collecting, storing, and processing such large volumes of data is expensive, time-consuming, and energy-intensive. This research explores whether neural networks really need to use all available training data at every stage of learning.

The study investigates data reduction techniques for image classification. It starts by reducing the dimensionality of the image datasets: a dimensionality-reduction method maps the images into a lower-dimensional space where their similarity can be measured more easily. This makes it possible to separate images that lie very close to the centre of a class (more typical examples) from those that lie further away (more distinctive examples).

The research compares static data reduction, where the same reduced dataset is used throughout training, with dynamic data reduction, where a smaller dataset is used at early stages of training and the full dataset is introduced later. Experiments were conducted on moderately complex datasets, including printed circuit board (PCB) images and a general image-classification benchmark. The results show that dynamic data reduction can significantly reduce training time while maintaining almost the same accuracy, and in some cases slightly improving it. Overall, the work demonstrates that carefully choosing which data is used during training, and when, can make machine-learning systems more efficient and environmentally sustainable without compromising performance.
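The selection step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the summary does not name the dimensionality-reduction method, so PCA is assumed here, and the function names, the `keep_fraction` parameter, and the choice to keep the samples farthest from each class centre are all illustrative assumptions.

```python
import math
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto their top principal components (PCA via SVD).

    PCA is an assumption; the summary does not say which method was used.
    """
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def select_far_from_centre(Z, labels, keep_fraction=0.5):
    """Return sorted indices of the samples farthest from their class centre.

    keep_fraction is a hypothetical parameter: the share of each class to keep.
    """
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        centre = Z[idx].mean(axis=0)
        dist = np.linalg.norm(Z[idx] - centre, axis=1)
        n_keep = max(1, math.ceil(keep_fraction * len(idx)))
        # argsort is ascending, so the last n_keep entries are the farthest.
        keep.extend(idx[np.argsort(dist)[-n_keep:]])
    return np.sort(np.array(keep))

# Synthetic stand-in for 200 flattened images in two classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
labels = np.repeat([0, 1], 100)

Z = pca_project(X, n_components=2)
reduced_idx = select_far_from_centre(Z, labels, keep_fraction=0.25)
print(len(reduced_idx), "of", len(X), "samples kept")
```

Selecting per class rather than over the whole dataset keeps the class balance of the reduced set the same as that of the original.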
Why is it important?
This work is timely because the rapid growth of AI is increasingly linked to rising computational cost and carbon emissions. Rather than focusing only on building larger models, this research contributes to the growing shift toward data‑centric AI, where improving how data is used becomes just as important as improving algorithms. What makes this study distinctive is the combination of dimensionality reduction with dynamic data usage. Instead of randomly discarding data or pruning it based on model behaviour, the approach identifies less critical data before training begins and controls how much of it is used at different stages. The findings show that data far from a class’s centre often contains more useful information for classification, and that using such insights can reduce runtime while preserving accuracy. By demonstrating that smarter data selection can reduce energy consumption with minimal performance loss, this work supports more sustainable and responsible AI development, particularly for image‑based applications in engineering and manufacturing.
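The difference between static and dynamic data usage can be sketched as a per-epoch schedule. This is an illustration under stated assumptions, not the authors' training code: `epoch_indices`, `samples_processed`, and the `switch_epoch` parameter are hypothetical, the reduced index set is a placeholder, and the number of samples processed is used only as a rough proxy for runtime.

```python
import numpy as np

def epoch_indices(epoch, n_samples, reduced_idx, switch_epoch, dynamic=True):
    """Indices of the training samples to use in a given epoch.

    Static reduction: the reduced subset in every epoch.
    Dynamic reduction: the reduced subset before switch_epoch, all data after.
    switch_epoch is a hypothetical parameter, not a value from the paper.
    """
    if dynamic and epoch >= switch_epoch:
        return np.arange(n_samples)
    return np.asarray(reduced_idx)

def samples_processed(n_epochs, n_samples, reduced_idx, switch_epoch, dynamic):
    """Total samples visited over training, a rough proxy for runtime."""
    return sum(
        len(epoch_indices(e, n_samples, reduced_idx, switch_epoch, dynamic))
        for e in range(n_epochs)
    )

n_samples, n_epochs, switch_epoch = 1000, 10, 6
reduced_idx = np.arange(0, n_samples, 2)  # placeholder: every second sample

full = n_samples * n_epochs
static = samples_processed(n_epochs, n_samples, reduced_idx, switch_epoch, dynamic=False)
dynamic = samples_processed(n_epochs, n_samples, reduced_idx, switch_epoch, dynamic=True)
print("full:", full, "static:", static, "dynamic:", dynamic)
```

In this toy setting the dynamic schedule visits fewer samples than full training but still exposes the model to every sample in the later epochs, which is the trade-off the paragraph above describes.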
Perspectives
Working on this paper reinforced the idea that progress in AI does not always come from adding more data or larger models. I found it particularly interesting that reducing the amount of training data, when done carefully, can maintain or even improve performance. I hope this work encourages researchers and practitioners to think more critically about the data they use, not just how much of it they have. Small changes in training strategy can make a meaningful difference to efficiency and sustainability.
Prof Tatiana Kalganova
Brunel University
Read the Original
This page is a summary of: Dimensionality Reduction to Dynamically Reduce Data, October 2022, Institute of Electrical & Electronics Engineers (IEEE),
DOI: 10.1109/paine56030.2022.10014786.