What is it about?
We consider a geographically distributed ML scenario in which local streaming data are processed on edge computing infrastructure and local updates are synchronized through a central parameter server. In this setting, the major challenges to ML accuracy are: 1) low diversity of the data feeding each model (i.e., insularity); 2) intermittent network connectivity; and 3) frequent changes in the input data (i.e., non-stationarity).
Why is it important?
We propose SCEDA, an efficient algorithm based on reinforcement learning that optimizes the schedule and content of model updates sent from the parameter server to the edge servers, in order to avoid stale ML models. It makes online scheduling decisions by learning the network connectivity trends of individual edge servers as well as the significance of their updates. To the best of our knowledge, SCEDA is the first staleness control mechanism in which the synchronization period is not defined by static thresholds but is learned from data and adapted to environmental changes over time. The approach extends beyond our initial use cases of electric vehicles and virtual reality; it is potentially applicable to many stateful learning tasks on distributed, streaming big data.
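To make the idea of a learned synchronization period concrete, here is a minimal sketch. It is not SCEDA itself: it swaps in a much simpler technique, an epsilon-greedy bandit, for a single edge server. The class `PeriodBandit`, the functions `simulated_reward` and `train`, and the parameters `drift_rate` and `sync_cost` are all hypothetical names introduced for illustration, and the reward model (linear staleness growth plus an amortized traffic cost) is an assumption, not something taken from the paper.

```python
import random


class PeriodBandit:
    """Epsilon-greedy bandit that picks a synchronization period for one edge server."""

    def __init__(self, periods, epsilon=0.1, seed=0):
        self.periods = list(periods)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * len(self.periods)
        self.values = [0.0] * len(self.periods)  # running mean reward per period

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.periods))
        return max(range(len(self.periods)), key=lambda i: self.values[i])

    def update(self, arm, reward):
        # Incremental update of the mean reward observed for this period.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulated_reward(period, drift_rate, sync_cost):
    """Hypothetical reward: penalize both model staleness and synchronization traffic."""
    staleness_penalty = drift_rate * period / 2.0  # mean staleness if it grows linearly
    traffic_penalty = sync_cost / period           # sync cost amortized per time step
    return -(staleness_penalty + traffic_penalty)


def train(drift_rate, sync_cost, periods=(1, 2, 4, 8, 16), rounds=2000, seed=0):
    """Return the period with the highest learned value after `rounds` decisions."""
    bandit = PeriodBandit(periods, epsilon=0.1, seed=seed)
    for _ in range(rounds):
        arm = bandit.choose()
        reward = simulated_reward(bandit.periods[arm], drift_rate, sync_cost)
        bandit.update(arm, reward)
    best = max(range(len(bandit.periods)), key=lambda i: bandit.values[i])
    return bandit.periods[best]


if __name__ == "__main__":
    print(train(drift_rate=1.0, sync_cost=8.0))  # fast-changing data
    print(train(drift_rate=0.1, sync_cost=8.0))  # slow-changing data
```

Under these assumed costs, the sketch settles on a period of 4 when the data drift quickly and 16 when they drift slowly, so the period is learned from feedback instead of being fixed by a static threshold. A realistic scheduler would also need to account for link availability and the significance of each update, both of which this toy omits.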
Perspectives
Writing this article was a great pleasure, as my co-authors are people with whom I have initiated strong collaborations on an emerging topic.
Atakan Aral
Technische Universität Wien
Read the Original
This page is a summary of: Staleness Control for Edge Data Analytics, Proceedings of the ACM on Measurement and Analysis of Computing Systems, June 2020, ACM (Association for Computing Machinery),
DOI: 10.1145/3392156.
Resources
Video Presentation
This is a video presentation of this work for the ACM SIGMETRICS conference by Atakan Aral.
Extended Abstract
This is the freely accessible two-page extended abstract of this work published in the proceedings of the ACM SIGMETRICS conference.
Full-Text PDF
This is the full-text PDF of this article published with the open-access option.
RUCON Project Website
This work has been partially funded through the RUCON project (Runtime Control in Multi Clouds), FWF Y 904 START-Programm 2015.