Bilevel Scheduling in Downstream Oil Supply Chain: Integrating Reinforcement Learning with Mathematical Programming

Published in Computers & Chemical Engineering, 2026

With the growth of global energy demand, optimizing the oil supply chain has become crucial. This paper proposes a hybrid scheduling approach that combines reinforcement learning (RL) with mathematical programming (MP) to optimize downstream oil supply chain operations, including refinery production scheduling, logistics distribution, and inventory management. The approach uses a rolling-horizon (RH) method to decompose the complex problem into multiple sub-problems, enhancing computational efficiency and flexibility. We conduct a comparative analysis of two RL training algorithms combined with RH: Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), denoted PPO-RH and SAC-RH, respectively. Results from the simulation-based evaluation show that SAC-RH excels in handling complex dynamic environments and continuous action spaces, significantly reducing the number of early warnings and improving overall optimization results. This study demonstrates the applicability of RL in industrial automation and identifies potential avenues for future research.
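As a rough illustration of the rolling-horizon idea described in the abstract, the following minimal Python sketch plans over a short look-ahead window, commits only the first few periods, and then rolls forward. It is not the paper's implementation: the single-product inventory model, the greedy `solve_window` routine (a stand-in for the MP sub-problem), the `policy` callable (a stand-in for a trained PPO or SAC agent), and the shortage counter (a loose proxy for early warnings) are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def solve_window(inventory: float, demand: List[float],
                 target: float, capacity: float) -> List[float]:
    """Hypothetical stand-in for the lower-level MP sub-problem.

    Greedily produces enough to end each period at `target` inventory,
    limited by the per-period production capacity.
    """
    plan = []
    for d in demand:
        produce = min(capacity, max(0.0, target + d - inventory))
        inventory = inventory + produce - d
        plan.append(produce)
    return plan


def rolling_horizon(demand: List[float], window: int, step: int,
                    policy: Callable[[float, List[float]], float],
                    inventory: float = 0.0,
                    capacity: float = 10.0) -> Tuple[List[float], float, int]:
    """Plan over `window` periods, commit the first `step`, then roll forward."""
    if not 1 <= step <= window:
        raise ValueError("step must satisfy 1 <= step <= window")
    schedule: List[float] = []
    shortages = 0  # assumed proxy for early warnings: periods ending below zero
    t = 0
    while t < len(demand):
        lookahead = demand[t:t + window]
        # Upper level: the policy (assumed to be an RL agent) sets a target.
        target = policy(inventory, lookahead)
        # Lower level: solve the sub-problem for the current window only.
        plan = solve_window(inventory, lookahead, target, capacity)
        # Commit only the first `step` decisions of the window plan.
        for produce, d in zip(plan[:step], demand[t:t + step]):
            inventory += produce - d
            if inventory < 0:
                shortages += 1
            schedule.append(produce)
        t += step
    return schedule, inventory, shortages


def fixed_target_policy(inventory: float, lookahead: List[float]) -> float:
    """Placeholder policy: always request a safety stock of 2 units."""
    return 2.0


if __name__ == "__main__":
    demand = [4.0, 6.0, 3.0, 8.0, 5.0, 7.0]
    print(rolling_horizon(demand, window=3, step=2, policy=fixed_target_policy))
```

In this sketch the policy only chooses a target inventory level per window, which loosely mirrors a bilevel split between a learned upper-level decision and an optimized lower-level schedule. In practice the sub-problem would be solved with a proper MP solver rather than a greedy rule.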

Recommended citation: Qipeng Yang, Wentian Fan, Nan Ma, Shu Lin, Jiawen Chang, Zhiqiang Zou, Liang Sun and Haifeng Zhang, "Bilevel Scheduling in Downstream Oil Supply Chain: Integrating Reinforcement Learning with Mathematical Programming," Computers & Chemical Engineering, Volume 204, January 2026.