Optimizing the Training Diet: Data Mixture Search for Robust Time Series Forecasting

Federico Pennino; Maurizio Gabbrielli

doi:10.1145/3748522.3779754

What is it about?

In machine learning, the common assumption is that more data always leads to better models. However, real-world sensor data is often full of redundancy and noise, meaning that not all data points contribute equally to learning. Our research introduces a novel framework that treats data curation like a "training diet". Instead of changing the AI model, we fix the architecture and optimize the composition of the training data itself. By selecting only the most informative behavioral patterns, our approach significantly improves forecasting accuracy on industrial motor datasets while using less than half of the original data. This challenges the traditional data-maximization paradigm and opens new doors for highly efficient AI training.
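
To make the idea concrete, here is a minimal sketch of what "fixing the model and searching over the data" can look like. It is not the authors' implementation: the linear autoregressive model, the random search over which pattern groups to include, the synthetic sine-wave data, and every function name are assumptions chosen only for illustration.

```python
import numpy as np

def make_windows(series, lag):
    """Turn a 1-D series into (lagged inputs, next-step target) pairs."""
    X = np.stack([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X, y

def fit_fixed_model(X, y):
    """Stand-in for the fixed architecture: a linear autoregressor (least squares)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

def mse(coef, X, y):
    """Mean squared error of the fitted model on (X, y)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return float(np.mean((Xb @ coef - y) ** 2))

def search_mixture(groups, X_val, y_val, n_trials=50, seed=0):
    """Random search over which pattern groups enter the training set.

    The model never changes; only the data composition does. The first
    candidate uses all groups, so it doubles as the "use everything" baseline.
    """
    rng = np.random.default_rng(seed)
    n = len(groups)
    candidates = [np.ones(n, dtype=bool)]
    while len(candidates) < n_trials + 1:
        mask = rng.random(n) < 0.5
        if mask.any():  # skip empty training sets
            candidates.append(mask)
    results = []
    for mask in candidates:
        X = np.concatenate([g[0] for g, keep in zip(groups, mask) if keep])
        y = np.concatenate([g[1] for g, keep in zip(groups, mask) if keep])
        results.append((mse(fit_fixed_model(X, y), X_val, y_val), mask))
    baseline_mse = results[0][0]
    best_mse, best_mask = min(results, key=lambda r: r[0])
    return best_mask, best_mse, baseline_mse

# Synthetic example (hypothetical data): two clean groups, two noisy ones.
rng = np.random.default_rng(0)
t = np.arange(300)
groups = [
    make_windows(np.sin(0.1 * t), 8),                             # clean pattern
    make_windows(np.sin(0.1 * t) + rng.normal(0, 1, t.size), 8),  # noisy copy
    make_windows(np.sin(0.1 * t + 1.0), 8),                       # clean, phase-shifted
    make_windows(rng.normal(0, 1, t.size), 8),                    # pure noise
]
X_val, y_val = make_windows(np.sin(0.1 * np.arange(300, 400)), 8)

best_mask, best_mse, baseline_mse = search_mixture(groups, X_val, y_val)
kept = sum(len(g[1]) for g, keep in zip(groups, best_mask) if keep)
total = sum(len(g[1]) for g in groups)
print(f"kept groups: {best_mask}, data used: {kept / total:.0%}")
print(f"validation MSE: best={best_mse:.4f}, all data={baseline_mse:.4f}")
```

Because the all-data mixture is always one of the candidates, the selected mixture can never score worse than the baseline on the validation set. Whether it also generalizes better has to be checked on separate held-out data.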


Why is it important?

While data selection and optimization strategies have been extensively explored in fields such as computer vision and natural language processing, data selection for time series remains a critical research gap, owing to the continuous, highly complex nature of such data. Our work directly addresses this gap by showing that sheer data maximization is not the optimal path for time-series forecasting. By demonstrating that intelligent, data-centric selection can simultaneously boost prediction accuracy and reduce training data volume by a factor of 2.3, this research provides a scalable blueprint for more efficient, higher-performing AI implementations in real-world industrial settings.

Perspectives

I hope this article makes people think about the quality of the data they use to train AI, rather than just the quantity. In the machine learning community, we are often obsessed with collecting massive datasets, but our work shows that a carefully selected "data diet" can yield more accurate forecasting models while cutting data volume by more than half.
Federico Pennino
Università degli Studi di Bologna

This page is a summary of: Optimizing the Training Diet: Data Mixture Search for Robust Time Series Forecasting, March 2026, ACM (Association for Computing Machinery), DOI: 10.1145/3748522.3779754.

Contributors

The following have contributed to this page

Federico Pennino
Università degli Studi di Bologna

