
Abstract

In modern complex and changing manufacturing environments, unforeseen dynamic events such as machine breakdown or unexpected job arrival make the required production resources unpredictable. A scheduling scheme should therefore maintain high stability in dynamic manufacturing environments. To cope with the classic disturbance of machine breakdown, this paper proposes a robust pro-active scheduling scheme that inserts the repair time into a disjunctive graph for reinforcement learning (IRDRL). Firstly, a new mathematical model is developed to predict machine faults, which are assumed to be determined by service time and bearing load. Secondly, a disjunctive graph with breakdown information is designed to express the dynamic scheduling status. Then, an online scheduling framework is built on models trained with the proximal policy optimization (PPO) algorithm. Finally, compared with classical methods such as the right-shift strategy and the static model of reinforcement learning (RL), the proposed robust pro-active scheduling scheme is shown to achieve high robustness, high stability, and short running time.

Keywords

Dynamic manufacturing environment, machine breakdown, reinforcement learning, online scheduling framework, robust pro-active scheduling

Introduction

In the modern complex and variable manufacturing industry, scheduling plays an essential role in improving efficiency and competitiveness. The job shop scheduling problem (JSSP) has been extensively studied in academia and industry.^1–4 Traditionally, it is assumed that all resources in production are deterministic in JSSP. However, various random disturbances will inevitably occur in actual manufacturing,^5,6 resulting in reduced productivity and delayed deliveries, etc.

Therefore, robust pro-active scheduling, one of the most extensively studied schemes for dynamical job shop scheduling problems (DJSP),^7–9 has been researched by many scholars to minimize the impact of disturbances on the stability of the scheduling scheme. It is designed to consider disturbances in advance and to absorb their influence during the execution of the schedule.^10,11 To the best of the authors' knowledge, most researchers have used meta-heuristic algorithms and heuristic rules to solve robust pro-active scheduling problems. A hybrid local-search algorithm, combining the tabu technique with simulated annealing¹² or the genetic algorithm,¹³ has been used to address robust pro-active scheduling in scenarios with uncertain processing times. In Xiong et al.,¹⁴ robust scheduling for dual-objective optimization with random machine breakdown was tackled by a knowledge-based heuristic search architecture. The genetic algorithm was used to minimize the dispatching risk.¹⁵ Under unstable and time-varying resource availability, a variable neighborhood-search-based local-search heuristic was proposed to identify the priority rule for dynamical scheduling.¹⁶

However, on the one hand, because they depend heavily on the specific problem, meta-heuristic algorithms and heuristic rules are only effective in specific cases,^17–19 and cannot adjust adaptively when the predicted disturbance deviates from reality. In addition, even minor changes require a recalculation since the schemes have weak generalization ability. On the other hand, single-objective scheduling is usually insufficient for real-world manufacturing, which often requires two or more objectives to be met simultaneously, as has been studied in academia.^20–22

Given the above issues, a robust pro-active scheduling scheme for dual-objective optimization is proposed, in which the repair time for unexpected machine breakdown is inserted into a disjunctive graph for reinforcement learning (IRDRL) to enhance the generalization and adaptability of the scheduling scheme. The primary contributions of this paper are as follows. (1) A new practical mathematical model for the dual-objective problem is established, considering the reality of machine wear and breakdown. (2) An adaptive robust pro-active scheduling approach is proposed by inserting the repair time into a disjunctive graph for reinforcement learning. (3) An online scheduling framework is developed to form the mapping relationship between the manufacturing environment and the optimal scheduling model. (4) Comparisons between IRDRL and other methods are carried out to demonstrate the effectiveness and improvement of the proposed approach.

The remainder of this paper is arranged as follows. A brief review of dynamical scheduling based on RL and mathematical models is given in Section “Literature review.” Section “Problem description and modeling” presents the updated mathematical model of dynamical scheduling with machine breakdown. Section “The online scheduling framework and IRDRL approach for DJSP with machine breakdown” offers the online scheduling framework and the design details of the proposed IRDRL approach. Section “Experiments analysis” shows the results of numerical experiments, which demonstrate the effectiveness and advantages of the proposed IRDRL approach. Finally, conclusions and suggested future work are given in the last section.

Literature review

RL-based dynamical scheduling

In the actual manufacturing process, random job arrivals, machine breakdown, and variable processing time are typical disturbances. Many researchers have applied RL algorithms, such as Q-learning, deep-Q-network (DQN), proximal policy optimization (PPO), and other variants, to the DJSP under various disturbances. These studies opened new directions for efficient decision-making in uncertain resource environments, as summarized in Table 1. The general principle of RL-based scheduling is to transform the manufacturing environment into a Markov Decision Process (MDP),²³ extracting the three key elements: state, action, and reward. To represent the state of the manufacturing environment, most works adopt manually designed feature indicators,^24–31 which rely heavily on prior expert knowledge and experience. As a result, the state of the manufacturing environment can be expressed inaccurately and incompletely. The action space is mainly designed as priority rules^32,33 or parameter optimization,³⁴ which do not achieve the desired execution efficiency in action exploration. In addition, some work even generalizes a model developed in the static environment to an uncertain resource environment,³⁵ without any learning process in the dynamic environment.

Table 1.

DJSP with various disturbances based on RL.

Reference	Disturbances	Algorithm	State representations	Action space	Optimization
Zhou et al.²⁴	Urgent orders and machine failures	DQN	Artificial features indicators	Machine-id	Makespan
Park et al.²⁵	Stochastic processing time	Q-network	Artificial features indicators	Machine-id	Makespan
Zhao et al.²⁶	New job arrival and machine breakdown	DQN + HGA	State characteristic index	Machine	Makespan
Luo et al.²⁷	Random job arrivals	DLDQN	Artificial features indicators	Operation	Makespan
Aydin and Öztemel²⁸	Random job arrivals	Q-III	Artificial features indicators	Dispatching rule	Mean tardiness
Zhou et al.²⁹	Random job arrivals	Deep Q-learning	Artificial features indicators	Machine-id	Makespan
Kardos et al.³⁰	Random job arrivals	Q-learning	Artificial features indicators	Machine-id	Early delivery
Yang and Yan³¹	Uncertain material supply	B-Q-learning	Artificial features indicators	Dispatching rule	Mean tardiness
Zeng et al.³²	Machine breakdown	D3QPN	Disjunctive graph	Dispatching rule	Makespan
Shiue et al.³³	Uncertain resource allocation	Q-learning	Artificial features indicators	Dispatching rule	Maximum throughput
Shahrabi et al.³⁴	Random job arrivals and machine breakdowns	Q + VNS	Artificial features indicators	Parameter optimization	Makespan
Wang et al.³⁵	Machine breakdown and job rework	PPO	Artificial features indicators	Operations	Machine utilization
Zhang et al.³⁶	Stochastic job insertions	PPO	Operation attributes	Machine allocation	Tardiness, execution time (s)

The above literature offers useful guidance on how to solve dynamical scheduling using RL. However, two issues remain: (1) how to represent the job shop state comprehensively and extract dynamic disturbance features objectively; (2) how to design the action space to improve the action execution. Considering these two issues, a robust pro-active scheduling scheme based on RL with an objective expression of dynamic disturbance is proposed in this paper.

Commonly adopted mathematical model of machine breakdown

As one of the most classic disturbances, machine breakdown has attracted significant attention.^37–39 In the common modeling of machine breakdown, the machine failure probability and repair time are assumed to be the same for all machines;^40,41 alternatively, the machines break down in order of their load.⁴² In fact, machine failure is related to the machine's service time, load, etc., and thus it varies from machine to machine. Moreover, a machine does not necessarily break down only under enormous load; a minor load can also lead to failure. Considering the above issues, a new practical mathematical model is established in this paper.

Problem description and modeling

Deterministic scheduling

The JSSP has been regarded as a sequential decision-making problem with limited resources. To improve production efficiency, the makespan, that is, the largest completion time $C_{b, k}$ over all operations, is one of the most common objectives of scheduling optimization and the basis of other optimization objectives. Notations in the problem description and modeling are listed in Table 2.

Table 2.

Notations in the problem description.

Category	Notation	Definition
Deterministic scheduling-related parameters	n	The number of workpieces
	J_b	The bth workpiece
	m	The number of the machines
	M_k	The kth machine
	O_b,k	The kth operation of job J_b
	p_b,k	The processing time of the job J_b on the machine M_k
	A_b,k	The start processing time of job J_b on the machine M_k
	C_b,k	The completion time of job J_b on the machine M_k
Machine breakdown-related parameters	ρ_k	The machine failure probability for M_k
	T_busy^k	The busy time of machine M_k
	W_tot	The total workload of all machines
Breakdown scenarios and optimization-related parameters	BT_k	The time when machine breakdown occurs
	RT_k	The repair time for M_k
	MS	The makespan of a schedule
	α	The parameter that determines when the failure occurs
	β	The coefficient that denotes the breakdown level
	M(s)	The makespan of the actual schedule suffering machine breakdowns
	M_0(s)	The makespan of the static schedule with deterministic machines

To facilitate the modeling, some assumptions are made as follows: (1) the sequential relationship and processing time of different operations in the same workpiece are known in advance, (2) each machine can process at most one operation at a time, (3) each operation can only be processed on one machine at a time, (4) any operation should be processed continuously without interruption until completion, (5) there is no sequential constraint between processes of different workpieces.

Formulation for machine breakdown

Given that machine failure is related to its load and service time, a new mathematical model is proposed as follows. The failure probability $ρ_{k}$ is approximated empirically by equation (1), and $M_{k}$ denotes the machine that breaks down. Any machine, whatever its $ρ_{k}$, may fail. Different from other literature, the failed machine is related to $ρ_{k}$ but is not chosen in a rigid order of $ρ_{k}$. $M_{k}$ follows the distribution in equation (2) and is sampled according to the weights $ρ_{k}$. The machine with a larger $ρ_{k}$ is more likely to break down, but its failure is not certain, and $M_{k}$ could also be a machine with a smaller $ρ_{k}$.

ρ_{k} = \frac{T_{busy}^{k}}{W_{tot}}

(1)

M_{k} \sim \mathrm{Categorical} (ρ_{k})

(2)
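As a rough illustration of equations (1) and (2), the following Python sketch computes $ρ_{k}$ from per-machine busy times and draws the failed machine by weighted sampling. It assumes that $W_{tot}$ equals the sum of the busy times, so that the $ρ_{k}$ sum to one. The function names and the example busy times are hypothetical and are not taken from the paper.

```python
import random

def failure_probabilities(busy_times):
    """Equation (1): rho_k = T_busy^k / W_tot.

    Assumption: W_tot is the sum of all machine busy times.
    """
    w_tot = sum(busy_times)
    return [t / w_tot for t in busy_times]

def sample_broken_machine(busy_times, rng):
    """Equation (2): draw the index of the failed machine, weighted by rho_k."""
    rho = failure_probabilities(busy_times)
    return rng.choices(range(len(busy_times)), weights=rho, k=1)[0]

# Hypothetical busy times for three machines.
busy = [30.0, 50.0, 20.0]
rho = failure_probabilities(busy)

# Empirical check: heavily loaded machines fail more often, but not exclusively.
rng = random.Random(0)
draws = [sample_broken_machine(busy, rng) for _ in range(10000)]
freq = [draws.count(k) / len(draws) for k in range(len(busy))]
```

In this sketch the least loaded machine is still selected in roughly a fifth of the draws, which mirrors the point that a smaller $ρ_{k}$ does not rule out failure.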

$C_{b, k}$ is directly or indirectly affected by machine breakdown, as shown in Figure 1, and its formula changes as follows in equations (3) and (4).

C_{b, k} = A_{b, k} + p_{b, k}

(3)

A_{b, k} = \begin{cases} \max \{C_{b - 1, k}, C_{b, k - 1}\}, & \text{machine } M_{k} \text{ has not broken down} \\ \max \{C_{b - 1, k} + RT_{k}, C_{b, k - 1}\}, & \text{machine } M_{k} \text{ has broken down} \end{cases}

(4)
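Equations (3) and (4) can be sketched in a few lines of Python. The function and argument names are hypothetical, and the numbers in the example are made up for illustration rather than read from Figure 1.

```python
def start_time(c_prev_on_machine, c_prev_in_job, repair_time=0.0, broken=False):
    """Equation (4): earliest start A_{b,k} of an operation.

    c_prev_on_machine: C_{b-1,k}, completion of the preceding operation on M_k
    c_prev_in_job:     C_{b,k-1}, completion of the preceding operation of J_b
    repair_time:       RT_k, inserted only when M_k has broken down
    """
    machine_ready = c_prev_on_machine + (repair_time if broken else 0.0)
    return max(machine_ready, c_prev_in_job)

def completion_time(start, processing_time):
    """Equation (3): C_{b,k} = A_{b,k} + p_{b,k}."""
    return start + processing_time

# Hypothetical values: machine free at t=5, job predecessor done at t=6, p=3, RT=4.
a_ok = start_time(5, 6)
c_ok = completion_time(a_ok, 3)
a_bd = start_time(5, 6, repair_time=4, broken=True)
c_bd = completion_time(a_bd, 3)
```

Without a breakdown the job predecessor is the binding constraint; with the repair time inserted, the machine becomes the binding constraint and the completion time moves from 9 to 12.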

Figure 1.

Change in $C_{b, k}$ before and after machine breakdown: (a) deterministic scheduling and (b) an example suffered from machine breakdowns. $M_{f}$ is M3, $B T_{f}$ is 2, $R T_{f}$ is 4.

For a more accurate description of machine disturbance scenarios, the time at which the machine breakdown occurs and the repair time of the failed machine are drawn from the uniform distributions in equations (5) and (6). Following the literature,⁴³ the problem is divided into the four scenarios presented in Table 3.

BT_{k} \sim U [α_{1} MS, α_{2} MS]

(5)

RT_{k} \sim U [β_{1} T_{busy}^{k}, β_{2} T_{busy}^{k}]

(6)

Table 3.

Machine breakdown scenarios.

Breakdown type	Break level and break off time	$β_{1}$	$β_{2}$	$α_{1}$	$α_{2}$
MB1	Low, early	0.1	0.15	0	0.5
MB2	Low, late	0.1	0.15	0.5	1
MB3	High, early	0.35	0.4	0	0.5
MB4	High, late	0.35	0.4	0.5	1
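A minimal sketch of how a disturbance could be sampled from equations (5) and (6) with the coefficients of Table 3 is given below. The dictionary layout, the function name, and the example busy time are assumptions made for illustration.

```python
import random

# Table 3: (beta_1, beta_2, alpha_1, alpha_2) for each breakdown scenario.
SCENARIOS = {
    "MB1": (0.10, 0.15, 0.0, 0.5),
    "MB2": (0.10, 0.15, 0.5, 1.0),
    "MB3": (0.35, 0.40, 0.0, 0.5),
    "MB4": (0.35, 0.40, 0.5, 1.0),
}

def sample_breakdown(scenario, makespan, busy_time, rng):
    """Draw (BT_k, RT_k) uniformly, following equations (5) and (6)."""
    beta1, beta2, alpha1, alpha2 = SCENARIOS[scenario]
    bt = rng.uniform(alpha1 * makespan, alpha2 * makespan)   # breakdown time
    rt = rng.uniform(beta1 * busy_time, beta2 * busy_time)   # repair time
    return bt, rt

# Hypothetical machine with a busy time of 400 in a schedule of makespan 574.
rng = random.Random(42)
bt, rt = sample_breakdown("MB3", makespan=574.0, busy_time=400.0, rng=rng)
```

For scenario MB3 the breakdown falls in the first half of the schedule and the repair time lies between 35% and 40% of the busy time of the failed machine.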

Dual-objective optimization for DJSP

The dual-objective optimization problem, including the quality and robustness of the scheduling scheme, is considered synthetically in this paper.

Makespan

As one of the most critical scheduling quality objectives, makespan will be inevitably affected by machine breakdown. It indicates the earliest completion time for all the operations, as shown in equation (7).

M (s) = \min \max \{C_{b, k}\}

(7)

Robustness measure

The robustness of the scheduling scheme is defined as the relative stability between the static makespan and the actual makespan under machine breakdowns. The formula is shown in equation (8); the higher the value, the stronger the robustness.

δ (s) = 1 - \frac{M (s) - M_{0} (s)}{M_{0} (s)}

(8)
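As a quick check of equation (8), the short Python sketch below evaluates the robustness for the makespans reported for instance Ex1 (6 * 6) under scenario MB1 in Table 5. Only the function name is an assumption.

```python
def robustness(actual_makespan, static_makespan):
    """Equation (8): delta(s) = 1 - (M(s) - M_0(s)) / M_0(s)."""
    return 1.0 - (actual_makespan - static_makespan) / static_makespan

# Makespans reported for Ex1 (6 * 6) under MB1; the static makespan is 574.
right_shift = robustness(621.8, 574)
irdrl = robustness(576.5, 574)
print(f"right-shift: {right_shift:.2%}, IRDRL: {irdrl:.2%}")
```

Rounded to two decimals, the two values agree with the 91.67% and 99.56% listed for this instance.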

The online scheduling framework and IRDRL approach for DJSP with machine breakdown

This section proposes an online scheduling framework that avoids time-consuming recalculation and designs the IRDRL approach for robust pro-active scheduling. As shown in Figure 2, the framework includes four blocks: Env-Block, Training-Block, Model-Block, and Implementation-Block. The operational process is as follows. The Env-Block collects production data through sensors in the job shop and extracts the current state as the input of the Training-Block. The Training-Block trains the scheduling model offline with the PPO algorithm. The trained models are verified and saved in the Model-Block. In the Implementation-Block, the suitable model can be invoked in real time according to the production state. The details of the scheduling framework, including the IRDRL approach and scheduling model invocation, are presented as follows.

Figure 2.

Online scheduling framework for the dynamic environment.

The IRDRL approach scheduling for DJSP with machine breakdown

This part mainly introduces the design details of the IRDRL approach. In the Env-Block, a method for status perception and feature extraction with machine breakdown is developed based on graph neural networks (GNN). In the Training-Block, the production process is translated into an MDP, and the PPO algorithm is introduced for training the networks. Finally, the well-trained models are validated and saved in the Model-Block.

Perception and feature extraction for machine breakdown in the Env-Block

The production data is collected by the sensor equipment and transmitted to the Env-Block, where the dynamical disturbance features of the job shop are expressed. Unlike manually designed feature indicators, the GNN is designed to transform the attributes in the disjunctive graph so that dynamical disturbance features are extracted comprehensively and objectively.

Disjunctive graph for workshop scheduling

Production scheduling can be represented as a disjunctive graph. The longest distance from the starting node to the end node in the disjunctive graph is the makespan, which is determined by the direction of the disjunctive arcs. As shown in Figure 3, the longest distance of strategy A is shorter than that of strategy B, which means strategy A is superior to strategy B.

Figure 3.

Disjunctive graph for different scheduling strategies: (a) strategy A and (b) strategy B.
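The relation between arc orientation and makespan can be sketched as a longest-path computation. Once every disjunctive arc has a direction, the graph is acyclic and the makespan is the largest finish time reached in topological order. The two-job, two-machine instance below is hypothetical and is not the instance of Figure 3.

```python
from collections import defaultdict, deque

def makespan(processing, arcs):
    """Longest path through an oriented (acyclic) disjunctive graph.

    processing: dict operation -> processing time (node weight)
    arcs: (u, v) pairs, conjunctive arcs plus oriented disjunctive arcs
    """
    succ = defaultdict(list)
    indeg = {op: 0 for op in processing}
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
    start = {op: 0 for op in processing}
    finish = {}
    # Kahn's algorithm: visit operations in topological order.
    queue = deque(op for op, d in indeg.items() if d == 0)
    while queue:
        u = queue.popleft()
        finish[u] = start[u] + processing[u]
        for v in succ[u]:
            start[v] = max(start[v], finish[u])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(finish) != len(processing):
        raise ValueError("orientation contains a cycle")
    return max(finish.values())

# Hypothetical instance: job 1 = (M1, 3) -> (M2, 2); job 2 = (M2, 2) -> (M1, 4).
p = {"O11": 3, "O12": 2, "O21": 2, "O22": 4}
conjunctive = [("O11", "O12"), ("O21", "O22")]
# Strategy A: M1 runs O11 before O22; M2 runs O21 before O12.
strategy_a = conjunctive + [("O11", "O22"), ("O21", "O12")]
# Strategy B: M1 runs O22 before O11; M2 runs O21 before O12.
strategy_b = conjunctive + [("O22", "O11"), ("O21", "O12")]
ms_a = makespan(p, strategy_a)
ms_b = makespan(p, strategy_b)
```

Here strategy A gives a makespan of 7 and strategy B a makespan of 11, so flipping a single disjunctive arc changes the quality of the schedule.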

Disjunctive graph representation for machine breakdown

The disjunctive graph can express the production scheduling process comprehensively by embedding resource information into nodes.

As described in Section “Formulation for machine breakdown,” machine breakdown directly or indirectly affects the completion time $C_{b, k}$ and the starting time $A_{b, k}$ of an operation. $A_{b, k}$ is obtained by inserting the repair time. As shown in Figure 4, the above two parameters together with the processing time $p_{b, k}$ can be embedded into the feature vector of the node to indicate the comprehensive machine breakdown status.

Figure 4.

Representation of machine breakdown.

Feature extraction of the disjunctive graph by GNN

Neighborhood aggregation of the disjunctive graph is defined by equation (9), whose purpose is an aggregate representation of neighborhood node information.⁴⁴

\begin{matrix} a_{v}^{(l)} = \mathrm{AGGREGATE}^{(l)} (\{h_{u}^{(l - 1)} : u \in N (v)\}) \\ h_{v}^{(l)} = \mathrm{COMBINE}^{(l)} (h_{v}^{(l - 1)}, a_{v}^{(l)}) \end{matrix}

(9)

Where $h_{u}^{(l - 1)}$ is the feature of a neighboring node $u$ at the previous layer, $a_{v}^{(l)}$ denotes the neighborhood aggregation, and $h_{v}^{(l)}$ represents the feature of the central node $v$ at the current layer.

Combined with the multi-layer perceptron, the iterative update formula for the nodes in the disjunctive graph can be defined as equation (10) as follows⁴⁴:

h_{v}^{(l)} = \mathrm{MLP}^{(l)} ((1 + ϵ^{(l)}) \cdot h_{v}^{(l - 1)} + \sum_{u \in N (v)} h_{u}^{(l - 1)})

(10)

Where $ϵ$ is a learnable parameter, $h_{v}^{(l - 1)}$ is the feature of the central node at the previous layer, and $h_{u}^{(l - 1)}$ is the feature of a neighboring node at the previous layer. MLP is the multi-layer perceptron of layer $l$, which generates the feature of the central node after iteration $l$ following normalization in the PPO algorithm.

Whole-map feature extraction of the disjunctive graph, which represents the current status of the manufacturing environment, is read out by the following formula as equation (11).

h_{G} = \mathrm{READOUT} (\{h_{v}^{(l)} \mid v \in G\})

(11)

Where, $h_{G}$ denotes the whole-map feature of the current manufacturing environment.
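The update of equation (10) and the readout of equation (11) can be illustrated with a dependency-free Python sketch. Several choices here are assumptions made only for the example: the perceptron has a single ReLU layer, READOUT is taken to be the mean over nodes, and the three-node graph with (processing, start, completion) features is invented.

```python
def mlp(x, weights, bias):
    """A single-layer perceptron with ReLU, standing in for the MLP of layer l."""
    out = []
    for w_row, b in zip(weights, bias):
        z = sum(w * xi for w, xi in zip(w_row, x)) + b
        out.append(max(0.0, z))
    return out

def gin_layer(h, neighbors, eps, weights, bias):
    """Equation (10): h_v <- MLP((1 + eps) * h_v + sum of neighbor features)."""
    new_h = {}
    for v, h_v in h.items():
        agg = [(1.0 + eps) * x for x in h_v]
        for u in neighbors[v]:
            agg = [a + x for a, x in zip(agg, h[u])]
        new_h[v] = mlp(agg, weights, bias)
    return new_h

def readout(h):
    """Equation (11), assuming READOUT is the mean of all node features."""
    n = len(h)
    dim = len(next(iter(h.values())))
    return [sum(h_v[i] for h_v in h.values()) / n for i in range(dim)]

# Hypothetical graph: node "a" precedes nodes "b" and "c".
h0 = {"a": [3.0, 0.0, 3.0], "b": [2.0, 3.0, 5.0], "c": [4.0, 3.0, 7.0]}
nbrs = {"a": [], "b": ["a"], "c": ["a"]}
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
h1 = gin_layer(h0, nbrs, eps=0.0, weights=identity, bias=[0.0, 0.0, 0.0])
h_g = readout(h1)
```

With identity weights the effect of aggregation is easy to read off: node "b" becomes the sum of its own features and those of "a", and the graph-level vector summarizes all nodes in a fixed-size representation regardless of graph size.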

Markov process modeling and PPO algorithm for training in Training-Block

The agent continuously interacts with the environment to acquire a maximum cumulative reward in the process of RL. Through trial and error and the accumulation of experience, the reward can guide the agent to explore the best action strategy, as shown in Figure 5. The MDP model has several key elements as follows:

Figure 5.

Interaction between the agent and the job shop.

State of the dynamical job shop

In Section “Perception and feature extraction for machine breakdown in the Env-Block,” the scheduling is expressed by a disjunctive graph, and the scheduling resource information is embedded into the feature vector of the node. Each node includes: (1) the processing time $p_{b, k}$ of the operation $O_{b, k}$ ; (2) the start processing time $A_{b, k}$ of the operation $O_{b, k}$ ; (3) the completion time $C_{b, k}$ of the operation $O_{b, k}$ . The status of the manufacturing environment, which changes as the schedule progresses, is extracted by the space-based GNN.

Action space modeling

As the bridge between the agent and the environment, an action is taken by the agent under the current status. In this paper, the operations of the workpieces are used as a more direct and effective action space. However, the action space would then grow sharply with the scheduling size, placing a significant burden on exploration.

To solve this problem, a mark indicating whether an operation is completed is added to the feature vector. The action space is thereby reduced to the next operation of each job, so its size equals the number of currently unfinished jobs, as $A = \{a_{i}, i \in N\}$.
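The reduced action space can be sketched as follows. The data layout (an ordered operation list per job and a counter of scheduled operations) and all names are hypothetical, chosen only to show that the number of candidate actions equals the number of unfinished jobs.

```python
def eligible_actions(jobs, done_count):
    """Return the next unscheduled operation of every unfinished job.

    jobs: dict job -> ordered list of operation identifiers
    done_count: dict job -> number of operations already scheduled
    """
    actions = []
    for job, ops in jobs.items():
        k = done_count[job]
        if k < len(ops):          # the job still has operations left
            actions.append(ops[k])
    return actions

jobs = {
    "J1": ["O11", "O12", "O13"],
    "J2": ["O21", "O22", "O23"],
    "J3": ["O31", "O32", "O33"],
}
done = {"J1": 1, "J2": 3, "J3": 0}   # J2 is finished
actions = eligible_actions(jobs, done)
```

In this example nine operations exist in total, but only two are candidate actions at the current step.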

Reward designing

The instant reward is the feedback that the environment gives to the agent for its action choice. The reward guides the agent toward the direction of the optimal strategy, and the greater the cumulative reward, the more correct the exploration. In this paper, the reward function is defined as equation (12), where $r (a_{t}, s_{t})$ is positively related to the magnitude of the effect after taking action $a_{t}$ . The maximum cumulative reward is exactly our optimization objective.

r (a_{t}, s_{t}) = C (s_{t}) - C (s_{t + 1})

(12)

Where, $C (s_{t})$ and $C (s_{t + 1})$ indicate the estimated completion time of production in state $s_{t}$ and the next state $s_{t + 1}$ respectively.
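To see why the cumulative reward matches the optimization objective, the sum can be written out under two assumptions: the discount factor is $γ = 1$ (the value listed in Table 4), and an episode runs from the initial state $s_{0}$ to a terminal state $s_{T}$ in which all operations are scheduled. The sum then telescopes:

```latex
\sum_{t = 0}^{T - 1} r (a_{t}, s_{t})
  = \sum_{t = 0}^{T - 1} [C (s_{t}) - C (s_{t + 1})]
  = C (s_{0}) - C (s_{T})
```

If $C (s_{0})$ depends only on the instance and not on the agent's decisions, maximizing the cumulative reward is equivalent to minimizing $C (s_{T})$, the completion time of the finished schedule.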

PPO algorithm for training the network

PPO is a DRL algorithm based on the actor-critic framework, where the actor network is used for action selection and the critic network is used to evaluate the decisions made by the actor. The interaction between the actor and critic networks can guide the agent to explore in a more promising direction. Therefore, this algorithm is more likely to find the optimal JSSP scheme in the large solution space of shop scheduling. To keep network updates stable, it is necessary to limit the scale of each update, as done by the clip operation adopted in PPO₂. As shown in equations (13) and (14), the update ratio is restricted to the range between $1 - ϵ$ and $1 + ϵ$, where $ϵ$ is a hyperparameter and $A (s_{t}, a_{t})$ is the advantage function.⁴⁵

J_{PPO_{2}}^{θ^{'}} (θ) = \sum_{(s_{t}, a_{t})} \min [\frac{p_{θ} (a_{t} | s_{t})}{p_{θ^{'}} (a_{t} | s_{t})} A^{θ^{'}} (s_{t}, a_{t}), \mathrm{clip} (\frac{p_{θ} (a_{t} | s_{t})}{p_{θ^{'}} (a_{t} | s_{t})}, 1 - ϵ, 1 + ϵ) \cdot A^{θ^{'}} (s_{t}, a_{t})]

(13)

A (s_{t}, a_{t}) = \sum_{t^{'} > t} γ^{t^{'} - t} r_{t^{'}} - V (s_{t})

(14)
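The clipping in equation (13) can be illustrated with plain Python. This is a sketch of the objective value only, not of the training loop. The probability ratios and advantages are passed in as hypothetical precomputed numbers, and the default $ϵ = 0.2$ follows Table 4.

```python
def clipped_surrogate(ratios, advantages, eps=0.2):
    """Equation (13): sum of min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratios: p_theta(a_t | s_t) / p_theta'(a_t | s_t) for each sample
    advantages: A(s_t, a_t) for each sample
    """
    total = 0.0
    for r, adv in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        total += min(r * adv, clipped * adv)
    return total

# Positive advantages: gains from raising the ratio are capped at 1 + eps.
j_pos = clipped_surrogate([0.5, 1.0, 1.5], [1.0, 1.0, 1.0])
# Negative advantages: gains from lowering the ratio are capped at 1 - eps.
j_neg = clipped_surrogate([0.5, 1.5], [-1.0, -1.0])
```

For the positive-advantage samples the third term contributes 1.2 rather than 1.5, so pushing the ratio further above $1 + ϵ$ yields no extra objective value, which is what keeps each update small.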

Through training, the networks converge well and represent the mapping relationship between the state ( $S_{t}$ ) and the workpiece operations ( $O_{t}$ ), as equation (15).

O_{t} = f (S_{t})

(15)

Model validation and preservation in the Model-Block

The role of the Model-Block is to validate the trained model and save the model with the best performance, constructing a matching relationship between the scheduling environment and the scheduling model.

Scheduling model invocation

The second part of the online framework focuses on the Implementation-Block; the matched model can be invoked and implemented instantly to avoid prolonged downtime waits. A mapping between the manufacturing environment ( $M E_{t}$ ) and the optimal scheduling model ( $S M_{t}$ ) is formed as equation (16).

S M_{t} = f (M E_{t})

(16)
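One possible reading of equation (16) is a lookup from a description of the manufacturing environment to a saved model. The sketch below is purely illustrative: the key layout (number of jobs, number of machines, breakdown scenario), the model identifiers, and the fallback to the closest trained scale are all assumptions and are not specified in the paper.

```python
def select_model(registry, n_jobs, n_machines, scenario):
    """Equation (16) as a lookup: SM_t = f(ME_t).

    registry: dict (n_jobs, n_machines, scenario) -> saved model identifier
    Falls back to the closest trained scale of the same scenario (assumption).
    """
    key = (n_jobs, n_machines, scenario)
    if key in registry:
        return registry[key]
    candidates = [k for k in registry if k[2] == scenario]
    if not candidates:
        raise KeyError(f"no model trained for scenario {scenario}")
    size = n_jobs * n_machines
    nearest = min(candidates, key=lambda k: abs(k[0] * k[1] - size))
    return registry[nearest]

# Hypothetical registry of trained models.
registry = {
    (10, 10, "MB1"): "model_10x10_mb1",
    (30, 20, "MB1"): "model_30x20_mb1",
}
exact = select_model(registry, 10, 10, "MB1")
fallback = select_model(registry, 50, 15, "MB1")
```

Because the lookup involves no retraining, the cost of invoking a model is independent of how long the model took to train.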

The online scheduling framework provides a basis for real-time scheduling for complex and changing manufacturing scenarios.

Experiments analysis

The proposed IRDRL approach is validated with various conditions and parameters. This section successively provides: the parameter settings, the benchmark problem, and the experimental comparison between the proposed method and others.

Parameters setting

The training process follows the IRDRL approach described in Section “The online scheduling framework and IRDRL approach for DJSP with machine breakdown,” and the processing time of operations in various scales is randomly generated in the range of 1–99. Experimentally, convergence is achieved when the number of training trajectories reaches 10,000. The proposed IRDRL approach is coded and implemented in Python 3.6 on a PC with an Intel Core i7-6700 @ 4.0 GHz CPU, a GeForce RTX 2080Ti GPU, and 8 GB RAM. All the parameters of the training process, which were carefully set through preliminary experiments, are shown in Table 4. The notations “randi” and “randf” denote the uniform distribution of integers and real numbers, respectively.

Table 4.

Parameters setting of the training process.

Parameter	Value
Processing time	randi[1, 99]
Probability of machine breakdown	[0, 1]
Machine breakdown level $β_{1}$	randf[0.1, 0.15]
Machine breakdown level $β_{2}$	randf[0.35, 0.4]
Machine breakdown time $α_{1}$	randf[0, 0.5]
Machine breakdown time $α_{2}$	randf[0.5, 1]
Learning rate lr	2e−5
Decay factor of learning rate	0.9
Clipping parameter $ϵ$	0.2
Discount factor γ	1
GAE parameter λ	0.98
Optimizer	Adam

Benchmark problem

There is no standard benchmark problem for training and testing with machine breakdown. The training data is generated randomly and endowed with dynamic properties by introducing random disturbance. The testing data is divided into two classes. One is randomly generated following the literature⁴⁶ as Ex1 consisting of 6 * 6, Ex2 consisting of 10 * 10, Ex3 consisting of 15 * 15, Ex4 consisting of 20 * 20, and Ex5 consisting of 30 * 20. The other is based on standard benchmark problems, such as ABZ,⁴⁷ FT,⁴⁸ TA,⁴⁹ YN,⁵⁰ DMU,⁵¹ and LA.⁵² To simulate a dynamic environment with machine breakdown, a series of random factors are embedded into the standard benchmark, including the probability of breakdown $ρ_{k}$, the machine breakdown time $BT_{f}$, and the repair time $RT_{f}$.

Comparisons experiments

Comparisons with right-shift strategy

Random machine breakdown is inevitable in the actual manufacturing environment. The right-shift strategy has been proposed to deal with machine fault disturbance in job shop scheduling.⁵³ When a machine fault occurs, the interrupted operation is paused and resumes only after the faulty machine has been repaired. The interrupted operation and all other remaining operations are right-shifted by the amount of the repair time.
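A minimal sketch of the right-shift rule is given below. It makes a simplifying assumption: the interrupted operation keeps its start time and finishes later by the full repair time, while operations already running on other machines are left untouched. The schedule format and the small example are hypothetical.

```python
def right_shift(schedule, machine, breakdown_time, repair_time):
    """Shift the interrupted operation and all not-yet-started ones by RT.

    schedule: list of (operation, machine, start, end) tuples
    """
    shifted = []
    for op, m, start, end in schedule:
        if m == machine and start <= breakdown_time < end:
            # Interrupted on the failed machine: resumes after the repair.
            shifted.append((op, m, start, end + repair_time))
        elif start >= breakdown_time:
            # Not yet started: pushed to the right as a whole.
            shifted.append((op, m, start + repair_time, end + repair_time))
        else:
            # Finished, or in progress on a healthy machine.
            shifted.append((op, m, start, end))
    return shifted

static = [
    ("O11", "M1", 0, 3), ("O21", "M2", 0, 2),
    ("O12", "M2", 3, 5), ("O22", "M1", 3, 7),
]
disturbed = right_shift(static, machine="M1", breakdown_time=4, repair_time=2)
static_ms = max(end for _, _, _, end in static)
disturbed_ms = max(end for _, _, _, end in disturbed)
```

In this example the makespan grows from 7 to 9, that is, by exactly the repair time, which is the pattern visible in the right-shift columns of the comparison tables.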

Under all four machine breakdown scenarios, the proposed IRDRL scheduling is compared with the right-shift strategy in terms of makespan and robustness. As shown in Tables 5 to 8, the static makespan can be obtained under the deterministic environment without machine breakdown. When the manufacturing environment is disturbed, the makespan will be extended, disrupting the production schedule. Makespan is expected to remain as stable as possible to ensure the robustness of the scheduling strategy.

Table 5.

Comparison between IRDRL and the right-shift strategy under scenario MB1.

Instance	Static makespan	Breakdown time (MB1)	Repair time (MB1)	Right-shift makespan	Right-shift robustness (%)	IRDRL makespan	IRDRL robustness (%)	IRDRL running time (s)
Ex1 (6 * 6)	574	141.1	47.8	621.8	91.67	576.5	99.56	0.082
Ex2 (10 * 10)	988	253.8	80.5	1068.5	91.85	991.5	99.65	0.22
Ex3 (15 * 15)	1504	337.3	117.6	1621.6	92.18	1510	99.6	0.49
Ex4 (20 * 20)	2007	517.8	150	2157	92.53	2016.1	99.55	1.01
Ex5 (30 * 20)	2508	640.2	223.7	2731.7	91.08	2526	99.28	1.55
Abz5 (10 * 10)	1382	501	103.4	1485.4	92.52	1423	97.03	0.93
ft10 (10 * 10)	1263	169.1	70.2	1333.2	94.44	1307	96.52	0.94
La20 (10 * 10)	1251	246.8	108.9	1359.9	91.29	1276	98.00	0.99
Tai 15 * 15	1553	419	115.4	1668.4	92.57	1660	93.11	0.59
Tai 20 * 20	2138	647.8	151.9	2289.9	92.9	2167	98.81	1.05
Tai 30 * 20	2608	580.8	208.3	2816.3	92.01	2613	99.81	1.91
Yn1 (20 * 20)	1113	450.5	82.7	1195.7	92.57	1126	98.83	1.78

Note: Robustness in bold means better stability of the scheduling scheme.

Table 6.

Comparison between IRDRL and the right-shift strategy under scenario MB2.

Instance	Static makespan	Breakdown time (MB2)	Repair time (MB2)	Right-shift makespan	Right-shift robustness (%)	IRDRL makespan	IRDRL robustness (%)	IRDRL running time (s)
Ex1 (6 * 6)	574	437.6	48.4	622.4	91.57	577.6	99.37	0.13
Ex2 (10 * 10)	988	740.8	79.1	1067.1	91.99	995.2	99.27	0.22
Ex3 (15 * 15)	1504	1090.1	113.4	1617.4	92.46	1513.6	99.36	0.48
Ex4 (20 * 20)	2007	1538.5	148.8	2155.8	92.59	2018	99.45	1.86
Ex5 (30 * 20)	2508	1862.1	219.6	2727.6	91.24	2530	99.12	1.62
Abz5 (10 * 10)	1382	906.1	94.5	1476.5	93.16	1401	98.63	0.94
ft10 (10 * 10)	1263	622.8	92.2	1355.2	92.7	1271	99.37	0.95
La20 (10 * 10)	1251	946.2	95.3	1346.3	92.38	1273	98.24	0.95
Tai 15 * 15	1553	1184.7	116.3	1669.3	92.51	1565.5	99.20	0.57
Tai 20 * 20	2138	1652.4	160.5	2298.5	92.49	2155.7	99.17	1.55
Tai 30 * 20	2608	2131.9	215	2823	91.7	2648	98.23	1.7
Yn1 (20 * 20)	1113	917.2	70.8	1183.8	93.64	1138.8	97.68	1.7

Note: Robustness in bold means better stability of the scheduling scheme.

Table 7.

Comparison between IRDRL and the right-shift strategy under scenario MB3.

Instance	Static makespan	Breakdown time (MB3)	Repair time (MB3)	Right-shift makespan	Right-shift robustness (%)	IRDRL makespan	IRDRL robustness (%)	IRDRL running time (s)
Ex1 (6 * 6)	574	141.9	143	717	75.09	583	98.43	0.13
Ex2 (10 * 10)	988	230.1	240.5	1228.5	75.66	1006.7	98.11	0.21
Ex3 (15 * 15)	1504	396.7	350.4	1854.4	76.7	1530	98.27	0.61
Ex4 (20 * 20)	2007	490.8	455.9	2462.9	77.28	2045	98.11	0.98
Ex5 (30 * 20)	2508	645.5	667.8	3175.8	73.37	2522	99.04	1.6
Abz5 (10 * 10)	1382	624.7	341.5	1723.5	75.29	1403	98.48	0.89
ft10 (10 * 10)	1263	213.8	224.2	1487.2	82.25	1271	99.37	0.99
La20 (10 * 10)	1251	393.9	294.4	1545.4	76.47	1310	95.28	0.92
Tai 15 * 15	1553	208.9	339.1	1892.1	78.16	1575.4	98.56	0.58
Tai 20 * 20	2138	590.2	462	2600	78.39	2177.3	98.16	1.05
Tai 30 * 20	2608	589.1	660.6	3286.6	74.67	2657	98.12	1.71
Yn1 (20 * 20)	1113	232.5	242.7	1355.7	78.19	1162	95.60	1.76

Note: Robustness in bold means better stability of the scheduling scheme.

Table 8.

Comparison between IRDRL and the right-shift strategy under scenario MB4.

Instance	Static makespan	Breakdown time (MB4)	Repair time (MB4)	Right-shift makespan	Right-shift robustness (%)	IRDRL makespan	IRDRL robustness (%)	IRDRL running time (s)
Ex1 (6 * 6)	574	448	142.5	716.5	75.17	590	97.2	0.13
Ex2 (10 * 10)	988	740.7	239.4	1227.6	75.75	1028.1	95.94	0.21
Ex3 (15 * 15)	1504	1090.6	350.5	1854.5	76.7	1547.5	97.11	0.48
Ex4 (20 * 20)	2007	1359.3	455	2462	77.33	2075	96.61	0.92
Ex5 (30 * 20)	2508	1880	666.8	3174.8	73.41	2605	96.13	1.65
Abz5 (10 * 10)	1382	8836.9	323.1	1705.1	76.62	1417	97.47	0.93
ft10 (10 * 10)	1263	971.5	251.3	1514.3	80.1	1301	96.99	0.95
La20 (10 * 10)	1251	762.4	267.4	1518.4	78.63	1294.4	96.53	0.92
Tai 15 * 15	1553	1189.8	347	1900	77.66	1646.8	93.96	0.58
Tai 20 * 20	2138	1679.6	453.9	2591.9	78.77	2240	95.23	1.2
Tai 30 * 20	2608	1928.6	662.9	3270.9	74.58	2748.5	94.61	1.73
Yn1 (20 * 20)	1113	992.3	230.1	1343.1	79.33	1277.1	85.26	1.72

Note: Robustness in bold means better stability of the scheduling scheme.

The experiments in Tables 5 and 6 are carried out under scenarios MB1 and MB2. The manufacturing environment in both scenarios is characterized by machine breakdown with a low fault level in the early or late stage, so repairs can be completed quickly. The robustness indicators in bold are overwhelmingly above 90%, indicating that minor machine faults have a limited impact on the scheduling strategy. In Tables 7 and 8, the experiments are carried out under scenarios MB3 and MB4 with high fault levels. The repair time is longer, and most robustness indicators of the right-shift strategy fall between 70% and 80%, indicating that the disturbance has a significant impact on scheduling. The experiments also show that the running time of IRDRL is short enough to ensure the feasibility of real-time scheduling.

In all the tables above, the makespan of the right-shift strategy is always longer and its robustness is always lower than that of the IRDRL method, which indicates that the IRDRL scheduling performs better, especially in responding to disturbances with high fault levels. The reason is that the IRDRL agent observes and evaluates the affected and unaffected operations through the disjunctive graph, and adjusts the scheduling scheme dynamically to avoid wasting time during the whole shutdown.

Comparisons with static-model of RL

To further confirm the advantage of the IRDRL scheduling, this part provides a comparison with the previous RL method. The method proposed in reference³⁵ is to generalize the model produced in the static environment to the dynamic environment. In this paper, the proposed IRDRL approach puts the agent into a dynamic backdrop to learn and accumulate experience. The comparison between the previous static-model and the proposed IRDRL model in terms of makespan and robustness is shown in Figure 6.

Figure 6.

Comparison between the IRDRL model and static-model: (a) Abz5 (6 * 6), (b) ft10 (10 * 10), (c) Tai 15 * 15, (d) Tai 20 * 20, (e) Tai 30 * 20, and (f) yn1 (20 * 20).

As can be seen, the selected instances cover both small and large scales. The columns for the static-model are almost always higher than those for the IRDRL model, which indicates that the static-model generalizes to some extent but cannot achieve a makespan as good as that of the IRDRL model. This verifies the effectiveness and superiority of IRDRL across instance scales.

Generalization of IRDRL approach compared with heuristic

Heuristic methods need to be recalculated whenever the problem changes, even slightly, which is time-consuming and impractical. Therefore, the purpose of this experiment is to demonstrate the adaptability of the IRDRL model.

Generalization of IRDRL model in scheduling scale

In actual manufacturing, the scheduling scale, that is, the number of workpieces, varies from batch to batch. It is expected that a well-trained model can perform well on problems of a similar scale, especially large-scale scheduling problems. Therefore, this experiment takes the 30 * 20 model under scenario MB1 as an example and verifies its generalization ability on other large-scale cases selected from the standard benchmarks Tai and DMU. The results are shown in Figure 7 and Table 9.

Figure 7.

Generalization of the 30 * 20 model in the scheduling scale.

Table 9.

Generalization of the 30 * 20 model in scheduling scale under scenario MB1.

Instance	Static makespan	Repair time	IRDRL robustness (%)	IRDRL running time (s)	Right-shift delay time
Tai 30 * 20	2603	226	95.01	0.18	96
Tai 50 * 15	3393.8	354	98.73	0.26	310.8
Tai 50 * 20	3593.9	342	98.05	0.44	271.9
Tai 100 * 20	6097	682	99.18	3.61	632
DMU 30 * 20	5879	423	98.54	0.183	337
DMU 40 * 15	6735.6	535	99.73	0.187	516.6
DMU 40 * 20	7380	573	99.28	0.29	520
DMU 50 * 15	8175.8	707	98.02	0.24	544.9
DMU 50 * 20	8783	726	98.62	0.44	605

In Figure 7, the static makespan is the solution in the deterministic manufacturing environment, while IRDRL denotes the solutions generalized by the 30 * 20 model to different large-scale cases. For each instance, IRDRL is better than the right-shift strategy. In Table 9, the robustness of all instances is over 90%, demonstrating the excellent generalization ability of the IRDRL models. The delay time is the extra time consumed by the right-shift strategy compared with IRDRL. In addition, the running time is below 1 s for every instance except Tai 100 * 20 (3.61 s), which supports implementing the online scheduling framework.
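
The columns of Table 9 appear to be linked by simple arithmetic. The sketch below rests on two assumptions that the text does not state explicitly: the right-shift makespan equals the static makespan plus the repair time, and robustness equals one minus the relative makespan increase. Under these assumptions the delay time can be recovered from the other columns; the function name `delay_time` is hypothetical.

```python
def delay_time(static_makespan, repair_time, robustness_percent):
    """Right-shift makespan minus IRDRL makespan, under the stated assumptions."""
    # Assumption: right-shift postpones the schedule by the full repair time.
    right_shift_makespan = static_makespan + repair_time
    # Assumption: robustness = 1 - (IRDRL makespan - static) / static.
    irdrl_makespan = static_makespan * (2.0 - robustness_percent / 100.0)
    return right_shift_makespan - irdrl_makespan


# (static makespan, repair time, robustness %, reported delay) from Table 9
rows = {
    "Tai 30 * 20": (2603.0, 226.0, 95.01, 96.0),
    "Tai 100 * 20": (6097.0, 682.0, 99.18, 632.0),
    "DMU 40 * 20": (7380.0, 573.0, 99.28, 520.0),
}
for name, (static, repair, robustness, reported) in rows.items():
    computed = delay_time(static, repair, robustness)
    print(f"{name}: computed {computed:.1f}, reported {reported}")
```

For these rows the computed and reported delays agree to within about one time unit, which is consistent with rounding of the robustness percentages.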

Generalization of the IRDRL model in similar breakdown scenarios

It is expected that a well-trained model can be used in a similar machine breakdown environment. Therefore, this experiment takes the models under scenarios MB1 and MB3 as examples. Models trained for break level 0.1–0.15 are generalized to an environment with break level 0.15–0.25, while models trained for break level 0.35–0.4 are generalized to break level 0.4–0.5. To test the generalization of the proposed IRDRL model, standard benchmarks ranging from small to large are selected, as shown in Table 10. The experimental results show that the robustness is above 90% in all cases except La20 (89.6% and 86.97%), which indicates good generalization performance of the IRDRL models. Similarly, the short running time, below 2 s in every case, demonstrates the feasibility of calling the model in real time.

Table 10.

Generalization of the model to similar break levels.

Model of scenario MB1
Instance	Static makespan	Break level (0.15–0.25) makespan	Repair time	Robustness (%)	Running time (s)
Abz5 (10 * 10)	1382	1396	183	98.9	1.0
ft10 (10 * 10)	1263	1307	111.70	96.52	0.94
La20 (10 * 10)	1251	1236	166.4	89.6	0.93
Tai (15 * 15)	1553	1580	158.70	97.89	0.59
Tai (20 * 20)	2138	2160	215.00	98.97	1.06
Tai (30 * 20)	2608	2796	316.00	92.79	1.78
Yn1 (20 * 20)	1113	1126	68.4	98.80	1.74
Model of scenario MB3
Instance	Static makespan	Break level (0.4–0.5) makespan	Repair time	Robustness (%)	Running time (s)
Abz5 (10 * 10)	1382	1388	382.5	99.50	0.91
ft10 (10 * 10)	1263	1289	263	97.94	1.03
La20 (10 * 10)	1251	1274	358	86.97	0.86
Tai (15 * 15)	1553	1607	416	96.15	0.67
Tai (20 * 20)	2138	2194	559	97.38	1.10
Tai (30 * 20)	2608	2650	792	98.39	1.70
Yn1 (20 * 20)	1113	1153	311	96.53	1.63

The satisfactory generalization performance of the IRDRL model helps the implementation of the online scheduling framework.

Conclusion and future work

In this paper, the IRDRL approach is developed for a dynamic job shop to minimize the makespan and maintain the robustness of the schedule. The starting time and completion time of operations are inserted into the feature vector of the disjunctive graph to represent the machine breakdown status. The optimal dynamic scheduling model is formed through the learning of the agent. Based on the IRDRL model, an online scheduling framework is proposed, which establishes a mapping between the manufacturing environment and the optimal scheduling model.

Numerical experiments are carried out on many instances, including well-known benchmarks and randomly generated instances, to demonstrate the advantage of the proposed approach. The approach achieves a balance between scheduling quality and speed, which demonstrates the feasibility of the online scheduling framework.

Although the proposed approach shows improved performance, some factors ignored in this paper need to be considered in future work. (1) Regarding the mathematical model, the possibility of repeated machine failure after repair is ignored. This possibility is small but cannot be neglected in actual production, so future research will continue to improve the mathematical model to make it more accurate and closer to actual production. (2) Regarding the disturbance and job shop scenario, multiple disturbances, such as uncertain processing times and urgent orders, usually co-exist in actual production, and these coupled disturbances need to be considered simultaneously. Meanwhile, the research scenario should be extended from a job shop to a flexible job shop, which is widely used in modern industrial manufacturing. In addition, the research on a single job shop can be extended to distributed job shops to meet new production requirements. (3) Regarding optimization objectives, other objectives in actual production management, such as tardiness, machine utilization, and cost, should be considered in future research.

Footnotes

Author contributions

The authors declare their contributions to the published paper “Dynamic scheduling for dual-objective job shop with machine breakdown by reinforcement learning” as follows: XG: Investigation, conceptualization, methodology, formal analysis, writing – original draft; YZ: Analysis, writing – review and editing; GY: Data curation, validation, writing – review and editing; AZ: Software visualization, analysis, validation; FT: Resources, supervision. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the National Key R&D Program of China (2020YFB1713300), in part by the Natural Science Foundation of China (61863005), and in part by the Fundamental Research Funds for the Central Universities under Grant YWF-22-L-1144.

ORCID iD

Ying Zuo

Availability of data and material

All data generated or analyzed during the current study are included in this article.

Code availability

The experiments were implemented in PyTorch. The code is shared on Baidu network disk; the link is , and the extraction code is “s9mx”.

References

1. Zhang, Yan. Integrated optimization of production planning and scheduling for a kind of job-shop. Int J Adv Manuf Technol 2005; 26: 876–886.

2. Sharma, Jain. A review on job shop scheduling with setup times. Proc IMechE, Part B: J Engineering Manufacture 2016; 230: 517–533.

3. Akram, Kamal, Zeb. Fast simulated annealing hybridized with quenching for solving job shop scheduling problem. Appl Soft Comput 2016; 49: 510–523.

4. Zheng, Yang, et al. A two-stage integrating optimization of production scheduling, maintenance and quality. Proc IMechE, Part B: J Engineering Manufacture 2020; 234: 1448–1459.

5. Sharma, Jain. Effect of routing flexibility and sequencing rules on performance of stochastic flexible job shop manufacturing system with setup times: simulation approach. Proc IMechE, Part B: J Engineering Manufacture 2017; 231: 329–345.

6. Chang, Jia, et al. Digital twin and deep reinforcement learning enabled real-time scheduling for complex product flexible shop-floor. Proc IMechE, Part B: J Engineering Manufacture 2022. DOI: 10.1177/09544054221121934

7. José Palacios, González-Rodríguez, Vela, et al. Robust multiobjective optimisation for fuzzy job shop problems. Appl Soft Comput 2017; 56: 604–616.

8. Souza RLC, Ghasemi, Saif, et al. Robust job-shop scheduling under deterministic and stochastic unavailability constraints due to preventive and corrective maintenance. Comput Ind Eng 2022; 168: 108130.

9. Zheng, Tang, Giret, et al. Dynamic shop floor re-scheduling approach inspired by a neuroendocrine regulation mechanism. Proc IMechE, Part B: J Engineering Manufacture 2015; 229: 121–134.

10. Shafaei, Brunn. Workshop scheduling using practical (inaccurate) data Part 2: an investigation of the robustness of scheduling rules in a dynamic and stochastic environment. Int J Prod Res 1999; 37: 4105–4117.

11. Jamili. Robust job shop scheduling problem: mathematical models, exact and heuristic algorithms. Expert Syst Appl 2016; 55: 341–350.

12. Wang, Lan, et al. A hybrid local-search algorithm for robust job-shop scheduling under scenarios. Appl Soft Comput 2018; 62: 259–271.

13. Wang, Zhang, et al. Two-objective robust job-shop scheduling with two problem-specific neighborhood structures. Swarm Evol Comput 2021; 61: 100805.

14. Xiong, Xing, Chen. Robust scheduling for multi-objective flexible job-shop problems with random machine breakdowns. Int J Prod Econ 2013; 141: 112–126.

15. Sun, Xiao. Risk measure of job shop scheduling with random machine breakdowns. Comput Oper Res 2018; 99: 1–12.

16. Chakrabortty, Rahman, Ryan. Efficient priority rules for project scheduling under dynamic environments: a heuristic approach. Comput Ind Eng 2020; 140: 106287.

17. Luo. Dynamic scheduling for flexible job shop with new job insertions by deep reinforcement learning. Appl Soft Comput 2020; 91: 106208.

18. Nie, Gao, et al. A GEP-based reactive scheduling policies constructing approach for dynamic flexible job shop scheduling problem with job release dates. J Intell Manuf 2013; 24: 763–774.

19. Wang, Pan, Wang. A review of reinforcement learning based intelligent optimization for manufacturing scheduling. Complex Syst Model Simul 2021; 1: 257–270.

20. Liu, Zhou. Multi-objective energy-saving scheduling for a permutation flow line. Proc IMechE, Part B: J Engineering Manufacture 2018; 232: 879–888.

21. Chen, Zou, Wang. Digital twin oriented multi-objective flexible job shop scheduling model and its hybrid particle swarm optimization. Proc IMechE, Part B: J Engineering Manufacture 2022. Epub ahead of print 9 September 2022. DOI: 10.1177/095440542211219

22. Zhang, et al. Multi-objective optimisation in flexible assembly job shop scheduling using a distributed ant colony system. Eur J Oper Res 2020; 283: 441–460.

23. Feinberg, Shwartz. Handbook of Markov decision processes: methods and applications. Springer Science & Business Media, New York, 2012.

24. Zhou, Zhu, Tang, et al. Reinforcement learning for online optimization of job-shop scheduling in a smart manufacturing factory. Adv Mech Eng 2022; 14: 16878132221086120.

25. Park, Huh, Kim, et al. A reinforcement learning approach to robust scheduling of semiconductor manufacturing facilities. IEEE Trans Autom Sci Eng 2019; 17: 1420–1431.

26. Chien, Lan. Agent-based approach integrating deep reinforcement learning and hybrid genetic algorithm for dynamic scheduling for Industry 3.5 smart production. Comput Ind Eng 2021; 162: 107782.

27. Luo, Wang, Yang, et al. An improved deep reinforcement learning approach for the dynamic job shop scheduling problem with random job arrivals. J Phys Conf Ser 2021; 1848: 12029.

28. Aydin, Öztemel. Dynamic job-shop scheduling using reinforcement learning agents. Robot Auton Syst 2000; 33: 169–178.

29. Zhou, Zhang, Horn BKP. Deep reinforcement learning-based dynamic scheduling in smart manufacturing. Procedia CIRP 2020; 93: 383–388.

30. Kardos, Laflamme, Gallina, et al. Dynamic scheduling in a job-shop production system with reinforcement learning. Procedia CIRP 2021; 97: 104–109.

31. Yang, Yan. An adaptive approach to dynamic scheduling in knowledgeable manufacturing cell. Int J Adv Manuf Technol 2009; 42: 312–320.

32. Zeng, Liao, Dai, et al. Hybrid intelligence for dynamic job-shop scheduling with deep reinforcement learning and attention mechanism. arXiv preprint arXiv:2201.00548.

33. Shiue, Lee. Real-time scheduling for a smart factory using a reinforcement learning approach. Comput Ind Eng 2018; 125: 604–614.

34. Shahrabi, Adibi, Mahootchi. A reinforcement learning approach to parameter estimation in dynamic job shop scheduling. Comput Ind Eng 2017; 110: 75–82.

35. Wang, et al. Dynamic job-shop scheduling in smart manufacturing using deep reinforcement learning. Comput Netw 2021; 190: 107969.

36. Zhang, Zhu, Tang, et al. Dynamic job shop scheduling based on deep reinforcement learning for multi-agent manufacturing systems. Robot Comput Integr Manuf 2022; 78: 102412.

37. Lei. Scheduling stochastic job shop subject to random breakdown to minimize makespan. Int J Adv Manuf Technol 2011; 55: 1183–1192.

38. Wang, Zhang, Fuh JYH. Job rescheduling by exploring the solution space of process planning for machine breakdown/arrival problems. Proc IMechE, Part B: J Engineering Manufacture 2011; 225: 282–296.

39. Seidgar, Zandieh, Fazlollahtabar, et al. Simulated imperialist competitive algorithm in two-stage assembly flow shop with machine breakdowns and preventive maintenance. Proc IMechE, Part B: J Engineering Manufacture 2016; 230: 934–953.

40. Kim. Insertion of new idle time for unrelated parallel machine scheduling with job splitting and machine breakdowns. Comput Ind Eng 2020; 147: 106630.

41. Nababan, SalimSitompul, Barsoum, et al. Manipulating tabu list to handle machine breakdowns in job shop scheduling problems. AIP Conf Proc 2011; 1337: 224–228.

42. Yang, Huang, Yu Wang, et al. Robust scheduling based on extreme learning machine for bi-objective flexible job-shop problems with machine breakdowns. Expert Syst Appl 2020; 158: 113545.

43. Al-Hinai, ElMekkawy. Robust and stable flexible job shop scheduling with random machine breakdowns using a hybrid genetic algorithm. Int J Prod Econ 2011; 132: 279–291.

44. Leskovec, et al. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826.

45. Schulman, Wolski, Dhariwal, et al. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.

46. Zhang, Song, Cao, et al. Learning to dispatch for job shop scheduling via deep reinforcement learning. Adv Neural Inf Process Syst 2020; 33: 1621–1632.

47. Adams, Balas, Zawack. The shifting bottleneck procedure for job shop scheduling. Manage Sci 1988; 34: 391–401.

48. Fisher. Probabilistic learning combinations of local job-shop scheduling rules. Ind Sched 1963: 225–251.

49. Taillard. Benchmarks for basic scheduling problems. Eur J Oper Res 1993; 64: 278–285.

50. Yamada, Nakano. A genetic algorithm applicable to large-scale job-shop problems. In: Parallel problem solving from Nature 2, PPSN-II, Brussels, Belgium, 28–30 September 1992, pp.281–290.

51. Demirkol, Mehta, Uzsoy. Benchmarks for shop scheduling problems. Eur J Oper Res 1998; 109: 137–141.

52. Lawrence. Resource constrained project scheduling: an experimental investigation of heuristic scheduling techniques (Supplement). Carnegie-Mellon University, Oakland, 1984.

53. Ghaleb, Zolfagharinia, Taghipour. Real-time production scheduling in the industry-4.0 context: addressing uncertainties in job arrivals and machine breakdowns. Comput Oper Res 2020; 123: 5031.