Abstract
This study addresses the multi-product capacitated lot-sizing problem (CLSP) under stochastic demand, with the objective of minimizing the total cost, comprising production, inventory, setup, and shortage costs. To this end, it proposes an optimization algorithm based on deep reinforcement learning (DRL) that employs proximal policy optimization (PPO) over a continuous action space. The problem is first described in detail and then formulated as a reinforcement learning model via a Markov decision process (MDP). In addition, a new state space and a continuous-to-discrete action conversion process are designed to handle demand uncertainty and incorporated into the PPO algorithm. The proposed algorithm is verified by experiments, whose results demonstrate that the improved PPO algorithm minimizes the total cost and is applicable to instances under various conditions. The proposed algorithm is also compared with the aggregate modified base-stock (AMBS) heuristic, and the comparison confirms its effectiveness on instances of different scales.
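The abstract does not specify how the continuous PPO action is converted to discrete lot sizes. As a purely illustrative sketch (the function name, clip-scale-round scheme, and proportional capacity trimming are all assumptions, not the paper's actual method), one common approach maps each component of a continuous action vector in [-1, 1] to an integer production quantity per product and enforces the shared capacity:

```python
import numpy as np

def continuous_to_lot_sizes(action, capacity, max_lot):
    """Hypothetical continuous-to-discrete conversion: map a PPO
    action vector in [-1, 1]^n to integer lot sizes for n products,
    then trim proportionally if the shared capacity is exceeded."""
    action = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
    # Rescale [-1, 1] -> [0, max_lot] and round to integer lots.
    lots = np.rint((action + 1.0) / 2.0 * max_lot).astype(int)
    total = lots.sum()
    if total > capacity:
        # Scale all lots down proportionally so the total fits capacity.
        lots = np.floor(lots * capacity / total).astype(int)
    return lots
```

For example, with `max_lot=50` and `capacity=100`, the action `[1.0, -1.0, 0.0]` would yield lots `[50, 0, 25]`; if the rounded lots exceeded capacity, each would be scaled down before the shortage cost is assessed in the environment.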
