Dimensionality Reduction to Dynamically Reduce Data

Dominic Sanderson; Ben Malin; Tatiana Kalganova; Richard Ott

doi:10.1109/paine56030.2022.10014786

What is it about?

Modern artificial intelligence systems are becoming increasingly complex, and they often require ever-larger datasets to achieve small improvements in accuracy. Collecting, storing, and processing such large volumes of data is expensive, time-consuming, and energy-intensive. This research explores whether neural networks really need to use all available training data at every stage of learning.

The study investigates data reduction techniques for image classification by first reducing the dimensionality of image datasets. Using a dimensionality-reduction method, images are mapped into a lower-dimensional space where their similarity can be measured more easily. This makes it possible to identify images that are very close to the centre of a class (more typical examples) and those that are further away (more distinctive examples). The research compares static data reduction, where the same reduced dataset is used throughout training, with dynamic data reduction, where a smaller dataset is used at early stages of training and the full dataset is introduced later.

Experiments were conducted on moderately complex datasets, including printed circuit board (PCB) images and a general image-classification benchmark. The results show that dynamic data reduction can significantly reduce training time while maintaining almost the same accuracy, and in some cases even slightly improving it. Overall, the work demonstrates that carefully selecting which data is used during training, and when, can make machine-learning systems more efficient and environmentally sustainable without compromising performance.
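To make the idea of ranking images by their distance from a class centre more concrete, here is a minimal sketch in Python. It is not the authors' implementation: this page does not name the dimensionality-reduction method, so plain PCA (via NumPy's SVD) is used as a stand-in, and the function names, the Euclidean distance, and the rule of keeping the farthest fraction per class are all assumptions made for illustration.

```python
import numpy as np

def pca_embed(x, n_components=2):
    """Project flattened images onto their top principal components.

    PCA is an assumed stand-in for the (unnamed) reduction method.
    """
    centred = x - x.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

def distance_to_class_centre(embedded, labels):
    """Euclidean distance from each sample to the centroid of its own class."""
    dist = np.empty(len(embedded))
    for c in np.unique(labels):
        mask = labels == c
        centre = embedded[mask].mean(axis=0)
        dist[mask] = np.linalg.norm(embedded[mask] - centre, axis=1)
    return dist

def keep_farthest(dist, labels, keep_fraction):
    """Per class, keep the keep_fraction of samples farthest from the centre.

    Hypothetical selection rule: it assumes distinctive samples are the
    ones worth keeping in the reduced dataset.
    """
    kept = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        n_keep = max(1, int(round(keep_fraction * len(idx))))
        order = np.argsort(dist[idx])[::-1]  # farthest first
        kept.append(idx[order[:n_keep]])
    return np.sort(np.concatenate(kept))

# Toy usage with random data standing in for flattened images.
rng = np.random.default_rng(0)
images = rng.normal(size=(200, 64))
labels = rng.integers(0, 2, size=200)
embedded = pca_embed(images, n_components=2)
dist = distance_to_class_centre(embedded, labels)
subset = keep_farthest(dist, labels, keep_fraction=0.5)
```

The resulting `subset` is an array of indices into the original dataset, so the same reduced set can be reused for every epoch (static reduction) or only for the early epochs (dynamic reduction).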


Why is it important?

This work is timely because the rapid growth of AI is increasingly linked to rising computational cost and carbon emissions. Rather than focusing only on building larger models, this research contributes to the growing shift toward data-centric AI, where improving how data is used becomes just as important as improving algorithms.

What makes this study distinctive is the combination of dimensionality reduction with dynamic data usage. Instead of randomly discarding data or pruning it based on model behaviour, the approach identifies less critical data before training begins and controls how much of it is used at different stages. The findings show that data far from a class's centre often contains more useful information for classification, and that using this insight can reduce runtime while preserving accuracy.

By demonstrating that smarter data selection can reduce energy consumption with minimal performance loss, this work supports more sustainable and responsible AI development, particularly for image-based applications in engineering and manufacturing.
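The notion of controlling how much data is used at different stages can be sketched as a simple per-epoch schedule. The following Python is an illustration under stated assumptions, not the procedure from the paper: the function names, the single switch point, and the numbers in the usage example are all hypothetical, and the input is assumed to be a list of sample indices already ordered from most to least informative.

```python
def dynamic_subset(ranked_indices, epoch, switch_epoch, early_fraction):
    """Return the sample indices to train on in a given epoch.

    Before switch_epoch only the top early_fraction of the ranked samples
    is used; from switch_epoch onward the full dataset is used.
    (Hypothetical schedule with one switch point.)
    """
    if epoch >= switch_epoch:
        return list(ranked_indices)
    n_keep = max(1, int(len(ranked_indices) * early_fraction))
    return list(ranked_indices[:n_keep])

def total_samples_seen(n_total, n_epochs, switch_epoch, early_fraction):
    """Count how many samples are processed over a whole training run."""
    ranked = range(n_total)
    return sum(
        len(dynamic_subset(ranked, epoch, switch_epoch, early_fraction))
        for epoch in range(n_epochs)
    )

# Illustrative values only, not figures from the paper.
full_cost = total_samples_seen(1000, 10, switch_epoch=0, early_fraction=1.0)
dynamic_cost = total_samples_seen(1000, 10, switch_epoch=6, early_fraction=0.5)
print(full_cost, dynamic_cost)
```

With these made-up settings, the dynamic schedule processes fewer samples in total than training on the full dataset every epoch, which is the mechanism by which runtime can fall. Whether accuracy is preserved depends on the dataset and the selection rule, and has to be measured.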

Perspectives

Working on this paper reinforced the idea that progress in AI does not always come from adding more data or larger models. I found it particularly interesting that reducing the amount of training data, when done carefully, can maintain or even improve performance. I hope this work encourages researchers and practitioners to think more critically about the data they use, not just how much of it they have. Small changes in training strategy can make a meaningful difference to efficiency and sustainability.
Prof Tatiana Kalganova
Brunel University

This page is a summary of: Dimensionality Reduction to Dynamically Reduce Data, October 2022, Institute of Electrical & Electronics Engineers (IEEE), DOI: 10.1109/paine56030.2022.10014786.

Contributors

The following have contributed to this page

Prof Tatiana Kalganova
Brunel University

Training AI faster by using only the most informative images
