What is it about?
Modern industrial systems rely on time‑series data from sensors to detect faults and abnormal behaviour. In practice, this data often comes from multiple sources that are similar but not identical: for example, measurements collected from different machines, sites, or experiments. Traditional neural‑network approaches usually train a separate model for each dataset, which increases training time, energy use, and maintenance costs. This research introduces a new method called Dataset Fusion, which combines multiple homogeneous and periodic time‑series datasets into a single training dataset. The key idea is to carefully align, normalise, and interleave signals from different sources so that a single neural network learns the overall behaviour of the system rather than memorising the quirks of one dataset.
The method was tested on real industrial data from electric motors, using current signals from two different fault datasets. The fused dataset preserved important features from each source and enabled a single anomaly‑detection model to generalise effectively across all of the datasets. The model also maintained strong performance when trained on only a small fraction of the data: using just 6.25% of the training data reduced computational cost by 93.7%, while performance dropped by only 4.04%.
By showing that multiple datasets can be fused into one efficient training resource, this work demonstrates a practical way to build robust, data‑efficient anomaly‑detection systems for real‑world applications.
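This summary describes Dataset Fusion only at a high level (align, normalise, interleave), so the sketch below is a minimal illustration under our own assumptions, not the authors' algorithm. It assumes every source shares one known cycle length, z‑score normalises each source, aligns its phase at the first upward zero crossing, cuts it into whole cycles, and interleaves one cycle per source in round‑robin order. The function names, the zero‑crossing alignment and the `keep_every` stride used to shrink the training set are all hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def normalise(signal: Sequence[float]) -> List[float]:
    """Z-score normalise one source so every dataset shares a common scale."""
    mu = mean(signal)
    sigma = pstdev(signal) or 1.0  # guard against a constant signal
    return [(x - mu) / sigma for x in signal]


def align_phase(signal: List[float]) -> List[float]:
    """Drop samples before the first upward zero crossing so that every
    source starts at the same point of its cycle (assumes a zero-mean signal)."""
    for i in range(1, len(signal)):
        if signal[i - 1] < 0 <= signal[i]:
            return signal[i:]
    return signal  # no crossing found: leave the signal unchanged


def split_periods(signal: List[float], period: int) -> List[List[float]]:
    """Cut a signal into whole periods, discarding any incomplete tail."""
    n = len(signal) // period
    return [signal[k * period:(k + 1) * period] for k in range(n)]


def fuse(sources: Sequence[Sequence[float]], period: int,
         keep_every: int = 1) -> List[float]:
    """Hypothetical fusion: normalise, align and segment each source, then
    interleave one period per source in round-robin order. keep_every > 1
    keeps only every k-th round (e.g. 16 keeps roughly 1/16 = 6.25%)."""
    segmented = [split_periods(align_phase(normalise(s)), period)
                 for s in sources]
    rounds = min(len(seg) for seg in segmented)  # equal share from each source
    fused: List[float] = []
    for k in range(0, rounds, keep_every):
        for seg in segmented:
            fused.extend(seg[k])
    return fused


if __name__ == "__main__":
    P = 16  # samples per cycle (assumed known and equal for both sources)
    # Two synthetic "current" signals: same cycle, different gain, offset, phase.
    a = [10 + 2 * math.sin(2 * math.pi * (t + 0.5) / P) for t in range(32 * P)]
    b = [3 * math.sin(2 * math.pi * (t + 4.5) / P) - 1 for t in range(32 * P)]
    full = fuse([a, b], P)
    small = fuse([a, b], P, keep_every=16)
    print(len(full), len(small))
```

Interleaving whole cycles, rather than concatenating the sources end to end, is one way to keep a model from seeing long runs of a single source. Under the additional assumption that training cost scales linearly with the number of samples, keeping 1/16 of the data (6.25%) would remove 1 - 1/16 = 93.75% of the cost, which is in line with the 93.7% reduction reported above.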
Why is it important?
This work is important because it directly addresses two growing challenges in AI: generalisation and sustainability. Many models perform well only on the dataset they were trained on and fail when applied to new but similar data. At the same time, training large models on vast datasets consumes ever more energy. The Dataset Fusion algorithm offers a novel alternative to conventional training and transfer‑learning approaches. Instead of training multiple models or relying on sequential retraining, it enables a single model to learn from multiple data sources at once, improving robustness to shifts in data distribution. The results show that more data is not always better: better‑structured data can deliver comparable performance at a fraction of the cost. This makes the approach especially relevant to Green AI, industrial monitoring, and safety‑critical systems, where computational efficiency, reliability, and adaptability are essential, and it supports broader sustainability goals by reducing the energy consumed during model training.
Perspectives
I found this work particularly rewarding because it bridges academic research and real industrial needs. In many practical settings, engineers must work with imperfect, limited, or imbalanced datasets, and retraining models for every new data source is often unrealistic. I hope this research encourages wider adoption of data‑centric approaches that focus not only on model architecture, but also on how training data is constructed. Making AI models more general, efficient, and sustainable is critical for their long‑term impact in industry.
Prof Tatiana Kalganova
Brunel University
Read the Original
This page is a summary of: A Dataset Fusion Algorithm for Generalised Anomaly Detection in Homogeneous Periodic Time Series Datasets, IEEE Access, January 2023, Institute of Electrical & Electronics Engineers (IEEE), DOI: 10.1109/access.2023.3326725.