Production Planning with Markovian Production Relationships

Abstract

We study a production planning problem in which the relationships between production plans and actual production quantities may be linear or nonlinear, and deterministic or stochastic. We start by introducing a stochastic dynamic programming formulation of the problem with a Markovian assumption on the production relationships. Under specific conditions, we establish the convexity of the optimal cost-to-go function and derive closed forms of the optimal policies. To solve the original problem in the general case, we propose a solution framework based on sequential policy optimization and deep reinforcement learning. We discuss the theoretical properties of the framework and evaluate its numerical performance with linear and nonlinear production relationships. In the linear case, our framework performs in line with state-of-the-art optimization-based methods while improving computational efficiency. In the nonlinear case, our framework achieves an optimality gap of roughly 10%–20%. We also illustrate that the proposed methodology can be extended to the problem of joint production planning and scheduling.

Keywords

Production Planning; Stochastic Dynamic Programming; Deep Reinforcement Learning
