What is it about?
Reinforcement Learning (RL) has emerged as a promising solution for task offloading thanks to its adaptability to dynamic environments and its ability to reduce online computational overhead. This paper therefore explores RL for optimizing periodic Directed Acyclic Graph (DAG) task offloading in multi-user Mobile Edge Computing (MEC) systems, aiming to minimize overall costs, including user-device energy consumption and server computational charges. A key contribution of this work is the explicit modeling of user competition for limited edge resources, where concurrent access leads to dynamic contention that significantly affects offloading latency and energy usage. This optimization task, however, faces two main challenges: the high dimensionality of task states and the large action space, both of which increase learning complexity.

To address this, we propose a dynamic, distributed Proximal Policy Optimization (PPO)-based offloading framework. An encoder maps DAG node features and structural information into a lower-dimensional representation, reducing computational overhead and improving learning efficiency. Additionally, we incorporate behavioral cloning to imitate greedy policies as the PPO agent's initial behavior, effectively narrowing the action space and accelerating convergence. By combining representation learning with imitation-based initialization, our method enables the PPO agent to adapt quickly to environmental dynamics, leveraging both prior knowledge and real-time feedback to make informed offloading decisions. Simulation results confirm that our approach converges rapidly and outperforms existing baselines in cost reduction, demonstrating its effectiveness for periodic task offloading in MEC scenarios. A minimal sketch of the two learning components appears below.
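To make the encoder-plus-behavioral-cloning idea concrete, here is a minimal PyTorch sketch of how the two components could fit together. The summary above does not specify implementation details, so every name, dimension, and design choice here (DagEncoder, PolicyNet, a single neighbor-aggregation round, the greedy expert label) is an illustrative assumption, not the authors' published code.

```python
# Hedged sketch: a DAG encoder + policy head warm-started by behavioral
# cloning of a greedy expert. All names and shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DagEncoder(nn.Module):
    """Maps per-node DAG features plus structure to a compact embedding."""
    def __init__(self, node_feat_dim: int, hidden_dim: int = 64, embed_dim: int = 32):
        super().__init__()
        self.node_mlp = nn.Sequential(
            nn.Linear(node_feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, embed_dim),
        )

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # node_feats: (num_nodes, node_feat_dim); adj: (num_nodes, num_nodes)
        h = self.node_mlp(node_feats)
        h = h + adj @ h          # one aggregation round injects DAG structure
        return h.mean(dim=0)     # fixed-size embedding of the whole DAG

class PolicyNet(nn.Module):
    """Actor head (the PPO policy) on top of the DAG embedding."""
    def __init__(self, embed_dim: int, num_actions: int):
        super().__init__()
        self.head = nn.Linear(embed_dim, num_actions)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.head(z)      # action logits over offloading targets

def behavioral_cloning_step(encoder, policy, optimizer,
                            node_feats, adj, expert_action) -> float:
    """One supervised step pulling the policy toward a greedy expert's choice."""
    logits = policy(encoder(node_feats, adj))
    loss = F.cross_entropy(logits.unsqueeze(0), expert_action)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# Usage with toy shapes: one 4-node diamond DAG, 5 features/node, 3 targets.
enc, pol = DagEncoder(node_feat_dim=5), PolicyNet(embed_dim=32, num_actions=3)
opt = torch.optim.Adam(list(enc.parameters()) + list(pol.parameters()), lr=3e-4)
feats = torch.randn(4, 5)
adj = torch.tensor([[0, 1, 1, 0], [0, 0, 0, 1],
                    [0, 0, 0, 1], [0, 0, 0, 0]], dtype=torch.float)
expert = torch.tensor([1])  # assumed label: the greedy expert chose target 1
behavioral_cloning_step(enc, pol, opt, feats, adj, expert)
```

After warm-start steps like this, the same encoder and policy would be trained further with the standard PPO clipped-surrogate objective on environment rollouts, which is where the real-time feedback mentioned above comes in.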
Why is it important?
While existing studies have investigated generic task offloading, the structural complexity of DAG-structured tasks introduces two distinctive challenges that have not been sufficiently addressed in multi-user MEC environments:

1) DAG-Driven State Space Explosion: Unlike conventional linear task models, the topological ordering and inter-task dependencies in DAG workflows exponentially expand the state representation. The state must encode not only the conventional UE-device parameters but also precedence constraints between vertices and dynamic branch execution probabilities, yielding a combinatorial state space complexity of $O(N^D)$, where $N$ denotes the number of concurrent UEs and $D$ the average DAG depth.

2) The Expansive Action Search Space: The parallel execution constraints imposed by DAG edges fundamentally alter the action space. Each offloading decision must simultaneously satisfy a) parent-node execution precedence, b) branch synchronization requirements, and c) heterogeneous resource contention across multi-user edge servers (constraint a is illustrated in the sketch below). This creates a cascading action space in which local decisions at one node propagate constraints through the entire DAG.

As the number of users generating real-time computation tasks grows, minimizing the offloading cost for the collective user base substantially lengthens the episodes over which RL algorithms must optimize. This enlarged action search space makes training RL algorithms a formidable challenge.
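Constraint a) above, parent-node execution precedence, is the easiest to make concrete. The hedged Python sketch below shows one common way to encode it: an action mask that only exposes offloading targets for DAG nodes whose parents have all completed. The names (dag_parents, finished, NUM_TARGETS) are hypothetical and not taken from the paper.

```python
# Hedged sketch of a precedence-based action mask for a DAG offloading agent.
from typing import Dict, List, Set

NUM_TARGETS = 3  # assumed targets, e.g. {local CPU, edge server 1, edge server 2}

def valid_actions(dag_parents: Dict[int, List[int]],
                  finished: Set[int]) -> Dict[int, List[int]]:
    """Return, per schedulable node, the offloading targets it may use now."""
    ready = [n for n, parents in dag_parents.items()
             if n not in finished and all(p in finished for p in parents)]
    # In this sketch every ready node may go to any target; multi-user
    # contention would further prune or re-weight these choices.
    return {n: list(range(NUM_TARGETS)) for n in ready}

# Tiny diamond DAG: 0 -> {1, 2} -> 3
parents = {0: [], 1: [0], 2: [0], 3: [1, 2]}
print(valid_actions(parents, finished=set()))   # {0: [0, 1, 2]}: only node 0 is schedulable
print(valid_actions(parents, finished={0, 1}))  # {2: [0, 1, 2]}: node 3 still blocked by node 2
```

Masks of this kind shrink the effective action space the RL agent must search at each step, which is exactly the pressure point described in the paragraph above.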
Perspectives
Experimental results demonstrate the effectiveness of this algorithm in optimizing joint periodic task offloading for multiple UEs in MEC environments, highlighting improvements in task management efficiency.
Yan Wang
Guangzhou University
Read the Original
This page is a summary of: Cost-Optimized Periodic DAG-Structured Task Offloading in Multi-User MEC Systems Using Reinforcement Learning, ACM Transactions on Internet Technology, August 2025, ACM (Association for Computing Machinery).
DOI: 10.1145/3762993.