

Poster

Graph Assisted Offline-Online Deep Reinforcement Learning for Dynamic Workflow Scheduling

Yifan Yang · Gang Chen · Hui Ma · Cong Zhang · Zhiguang Cao · Mengjie Zhang

Hall 3 + Hall 2B #453
[ Project Page ]
Thu 24 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract:

Dynamic workflow scheduling (DWS) in cloud computing presents substantial challenges due to heterogeneous machine configurations, unpredictable workflow arrivals and patterns, and constantly evolving environments. Existing research, however, often assumes homogeneous setups and static conditions, limiting flexibility and adaptability in real-world scenarios. In this paper, we propose a novel Graph assisted Offline-Online Deep Reinforcement Learning (GOODRL) approach to building an effective and efficient scheduling agent for DWS. Our approach features three key innovations: (1) a task-specific graph representation and a Graph Attention Actor Network that enable the agent to dynamically assign focused tasks to heterogeneous machines while explicitly considering the future impact of each machine on these tasks; (2) a system-oriented graph representation and a Graph Attention Critic Network that facilitate efficient processing of new information and an understanding of its impact on the current state, which is crucial for managing unpredictable workflow arrivals and patterns in real time; and (3) an offline-online method that uses imitation learning for effective offline training and applies gradient control and decoupled high-frequency critic training during online learning to sustain the agent's robust performance in rapidly changing environments. Experimental results demonstrate that GOODRL significantly outperforms several state-of-the-art algorithms, achieving substantially lower mean flowtime and high adaptability across a variety of online and offline scenarios.
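For readers who want to connect the abstract's architecture to concrete machinery, the sketch below shows what a graph-attention actor for task-to-machine dispatch can look like in PyTorch. It is a minimal illustration under assumed interfaces, not the authors' implementation: the class names (GraphAttentionLayer, ActorHead), the feature dimensions, and the dense-adjacency encoding of the workflow DAG are all hypothetical, and the single-head attention here stands in for whatever attention variant the paper actually uses.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head GAT-style attention over a dense 0/1 adjacency matrix.

    Hypothetical sketch: assumes `adj` includes self-loops so every row
    has at least one neighbor (otherwise the softmax row is all -inf).
    """
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h, adj):
        # h: (N, in_dim) task-node features; adj: (N, N) DAG adjacency
        z = self.W(h)                                    # (N, out_dim)
        n = z.size(0)
        pairs = torch.cat(
            [z.unsqueeze(1).expand(n, n, -1),            # pairs[i, j] = z[i]
             z.unsqueeze(0).expand(n, n, -1)], dim=-1)   # pairs[i, j] |= z[j]
        e = F.leaky_relu(self.attn(pairs).squeeze(-1))   # (N, N) raw scores
        e = e.masked_fill(adj == 0, float('-inf'))       # attend to neighbors only
        alpha = torch.softmax(e, dim=-1)
        return F.elu(alpha @ z)                          # aggregated embeddings

class ActorHead(nn.Module):
    """Scores (ready task, machine) pairs and returns a dispatch distribution."""
    def __init__(self, task_dim, machine_dim, hidden=64):
        super().__init__()
        self.gat = GraphAttentionLayer(task_dim, hidden)
        self.score = nn.Sequential(
            nn.Linear(hidden + machine_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, task_feats, adj, ready_idx, machine_feats):
        h = self.gat(task_feats, adj)                    # embed tasks on the DAG
        ready = h[ready_idx]                             # (R, hidden) focused tasks
        r, m = ready.size(0), machine_feats.size(0)
        pair = torch.cat(
            [ready.unsqueeze(1).expand(r, m, -1),
             machine_feats.unsqueeze(0).expand(r, m, -1)], dim=-1)
        logits = self.score(pair).squeeze(-1).flatten()  # one logit per (task, machine)
        return torch.distributions.Categorical(logits=logits)

In this reading, the critic side would embed a system-oriented graph in the same way and regress a value estimate instead of producing pair logits, while the offline-online scheme (imitation pretraining, then gradient-controlled online updates with a decoupled, higher-frequency critic) is a training-loop concern layered on top of modules like these.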
