Abstract
Safety supervision is identified as a crucial tool for encouraging safe production within chemical enterprises, yet the existing safety supervision methods often struggle to deter unsafe behaviors, leaving these enterprises susceptible to safety accidents. The current literature, predominantly based on evolutionary game theory, largely focuses on optimizing supervision methods while lacking effective guidance for enterprises to ensure rule compliance. Furthermore, this research predominantly centers on the analysis of two key stakeholders using static reward and punishment strategies, neglecting other potential participants and dynamic reward and punishment strategies. To address these gaps, this paper introduces an evolutionary game model encompassing the three primary stakeholders in chemical production safety supervision: government regulators, chemical enterprises, and employees. The study assesses the stability of these three subjects under static reward and punishment strategies, dynamic punishment strategies, and dynamic reward and punishment strategies. In conjunction with the system dynamics model, numerical simulations are utilized to analyze shifts in stakeholders' decision-making behavior across different scenarios. Simulation results show that, under the static mechanism, there is no evolutionary equilibrium solution for the three-game subjects. While increasing reward and punishment coefficients can temporarily enhance enterprise compliance, it also escalates system volatility. The linear dynamic punishment mechanism can mitigate subject volatility but does not yield optimal evolutionary results. Finally, a novel nonlinear dynamic punishment-reward mechanism is proposed, effectively controlling the instability within the game scenario and making compliant production the optimal strategic choice for chemical enterprises.
Plain Language Summary
This paper introduces an evolutionary game model encompassing the three primary stakeholders in chemical production safety supervision: government regulators, chemical enterprises, and employees. The study assesses the stability of these three subjects under static reward and punishment strategies, dynamic punishment strategies, and dynamic reward and punishment strategies. In conjunction with the system dynamics model, numerical simulations are utilized to analyze shifts in stakeholders' decision making behavior across different scenarios. Simulation results show that, under the static mechanism, there is no evolutionary equilibrium solution for the three-game subjects. While increasing reward and punishment coefficients can temporarily enhance enterprise compliance, it also escalates system volatility. The linear dynamic punishment mechanism can mitigate subject volatility but does not yield optimal evolutionary results. Finally, a novel nonlinear dynamic punishment-reward mechanism is proposed, effectively controlling the instability within the game scenario and making compliant production the optimal strategic choice for chemical enterprises.
Keywords
Introduction
The chemical industry is a foundation of national economic development and a pillar industry in many countries. However, the production process of chemical enterprises involves highly hazardous chemicals and raw materials that are flammable, explosive, and toxic. The chemical industry is therefore a key target for the prevention of serious production and safety accidents (Bai et al., 2023; X. Wang, 2021). To prevent production safety accidents, China's Ministry of Emergency Management has released various production safety regulations. Despite these efforts, safety incidents continue to occur in chemical enterprises because of inadequate safety protocols in some companies, low levels of safety awareness, and incomplete day-to-day risk assessments (Pan et al., 2022; L. Sun et al., 2022). The causes of safety accidents in chemical enterprises are multifaceted and include hazardous work environments, malfunctioning safety equipment, outdated technology, and ineffective safety supervision (Han et al., 2022; Yu et al., 2019; Khrais et al., 2013). Studies have shown that production disorder in enterprises is a significant factor that contributes to safety accidents (Abbasinia & Mohammadfam, 2022; Mohandes et al., 2022; R. Zhao et al., 2018). Production disorder in enterprises is typically characterized by overproduction, infrequent investigation of hidden hazards, and neglect of production safety systems, and is often driven by the pursuit of increased economic profits (Tong et al., 2021). Government regulation can curb irregularities by enterprises to a certain extent, but some enterprises still take chances and neglect their responsibility for work safety. Therefore, it is imperative to encourage chemical companies to proactively comply with rules and regulations to reduce potential safety hazards.
At present, relevant research focuses on the factors leading to safety accidents in chemical enterprises, which fall into two categories: external factors and internal factors. External factors include material properties, equipment operation capabilities, and the operating environment (Guo et al., 2022; Khakzad et al., 2018; X. Zhou & Peng, 2020). Serafin et al. (2013) simulated chemical dust explosions under different airflow intensities to study the secondary effects of explosions that may occur during the reproduction process. Danzi and Marmo (2019) studied the characteristics of different chemical metal dusts and determined the correlation between chemical production processes and the level of chemical explosion risk. Lv et al. (2017) established an index weight model to evaluate the ammonia synthesis reaction equipment of a chemical industry group. The results showed that the operation of chemical production equipment affects the company's safety production management capability. The internal factors mainly include hazardous operation by employees, poor supervision, and illegal production by the company (Al-Mousa et al., 2022; Jung et al., 2020; Z. Yuan et al., 2015). Y. Gao et al. (2019) proposed a method for categorizing safety production in chemical enterprises and discussed the effectiveness of different safety supervision approaches to help chemical enterprises achieve more efficient safety production. Lu et al. (2020) used Bayesian network modeling to assess the main factors triggering explosions in chemical companies, confirming that an effective government response and a high level of corporate safety management can reduce corporate production risk. Moreover, J. Wang et al. (2020) used an accident causality model to dissect safety incidents in existing chemical firms.
These studies provide valuable insights into the causes of safety accidents in chemical firms and discuss risk control strategies, but they have yet to delve into the actual impact of government regulation on firms and the intricate behavioral interactions among various production stakeholders.
In the production process of chemical companies, many interest groups actively participate, and each participant has different demands. These individuals and entities make behavioral choices guided by their own interests under the constraints of bounded rationality, which may cause conflicts of interest. To effectively alleviate conflicts between different stakeholders and maintain the security and stability of the entire system, it is imperative to improve the level of safety supervision (X. Li et al., 2020; X. Xie & Guo, 2018; Yang et al., 2022). Traditional game theory assumes that players are completely rational, which is contrary to reality. Therefore, it is necessary to use the evolutionary game theory proposed by Smith and Price (1973) to analyze the conflicts of interest between participants. Evolutionary game theory has a wide range of applications, such as the evolution of biological populations (Antoci et al., 2023; Shaw et al., 2023), dynamic resource allocation (Barreiro-Gomez et al., 2019; Loumiotis et al., 2014; Noailly et al., 2009), social network evolution (Iyer & Killingback, 2014; N. Zhao et al., 2022; Zong et al., 2015), and behavioral strategy optimization (X. Liu et al., 2022; D. Wang & Li, 2020). As one of the effective methods to study multi-agent behavior, evolutionary game theory is often used to explain the dynamic changes in competitors’ strategies under different game situations (Ahsan Habib et al., 2020). For example, B. T. Gao et al. (2021) simulated changes in residential integrated demand response (IDR) and differences in user strategies under different contract prices. In recent years, some scholars have applied evolutionary game theory to corporate safety production supervision issues. From a macro perspective, the need for intervention by external regulators (e.g., government and third-party regulators) has been explored. Y. B. Zhang et al. (2023) combined evolutionary game theory with nonlinear safety supervision to simulate the changes in the safety behavior of construction firms and the stability of the game system under different levels of government regulatory effort. Zhong and Li (2022) discussed the evolutionary game model between enterprise production-safety-service procurement strategies and local government supervision strategies under hierarchical supervision, and verified that strict supervision by local governments promotes the convergence of enterprise production strategies to a more ideal state. Xin (2019) introduced third-party supervision services for chemical enterprise safety production on the basis of traditional government supervision, and used system dynamics simulation to verify the leading role of government supervision and the effectiveness of third-party supervision services. Combining macro and micro perspectives, the choice of an enterprise’s production safety strategy is jointly influenced by all stakeholders. Y. C. Xie et al. (2023) studied the relationship between employee safety behavior strategies and corporate safety investments, and explored the changes in the strategies of both parties under different investment costs.
Overall, the existing studies emphasize the importance of safety regulation and analyze the evolutionary path of the decision-making behaviors of stakeholders involved in corporate safety. Some scholars (Chen et al., 2021; T. Sun & Feng, 2021; K. Zhou et al., 2022) have pointed out that strong regulatory pressure forces firms to change their bad strategies, but long-term high-intensity regulation and punitive tactics lead to slackening on the part of the regulator and a tendency to take chances on the part of the firms, which is not a desirable evolutionary strategy. However, the existing research mainly focuses on the supervisory behavior of the regulator, lacks a game analysis of those directly exposed to safety production risk, and ignores the impact of the reward and punishment mechanism on the enterprise's safety production decisions. What we seek is for regulators to use reasonable means to guide corporate behavior, encourage chemical companies to proactively comply with production regulations, and reduce the probability of accidents, rather than to rely on severe “one size fits all” violation penalties and subsequent accountability. To fill this research gap, the main contributions of this paper are as follows: (1) In the context of preventing safety accidents in chemical enterprises, this study constructs an evolutionary game model between governmental supervisory departments, chemical enterprises, and employees, and explores the game relationship among the three parties using evolutionary game theory. (2) System dynamics simulation is combined with the game model to further analyze the dynamic evolution process and stability of the game system, which extends the single static game analysis of traditional research. (3) The implementation effects of different reward and punishment strategies are analyzed, the effectiveness of the dynamic punishment mechanism in constraining the safety production of chemical enterprises is verified, and a nonlinear dynamic punishment-reward mechanism is proposed that is closer to the actual scenario of safety supervision in chemical enterprises.
Model Construction
Evolutionary Game Model Assumptions and Description
At present, the main participants in production safety supervision in the chemical industry are local governments, chemical enterprises and employees. The safety regulatory department of the local government carries out safety regulations, chemical enterprises make decisions on production behavior, and front-line employees supervise and report production behavior and safety risks. Therefore, this paper treats the local government as a whole, and all three parties are bounded rational beings, all aiming at maximizing their interests. Government regulators can choose whether to regulate or not, chemical enterprises can choose whether to produce according to regulations, and employees can choose whether to conduct daily supervision.
Assume that the probability of government regulation is p (0 ≤ p ≤ 1), with a larger p representing stronger government regulation. When p = 1, the government chooses strong regulation; when p = 0, the government regulates poorly. Assume that the probability of the enterprises following the regulations is q (0 ≤ q ≤ 1), with a larger q representing a higher probability of compliance. When the enterprises fully follow the regulations, q = 1; when they fully violate the regulations, q = 0. Assume that the probability of employees participating in safety supervision is z (0 ≤ z ≤ 1), with a larger z representing a higher probability of participation. When employees are highly involved in safety supervision, z = 1; when they are not involved, z = 0.
The government needs to spend a certain amount of manpower and material resources on daily supervision, which is assumed to be the cost of comprehensive supervision of production safety D1. Active supervision helps the government establish a good law enforcement image and enhance its credibility K1. Under supervision, the government gives incentives W1 to compliant enterprises for safe production, and the enterprises also reap intangible rewards such as a good reputation B2. As the first actors in production activities, employees have direct knowledge of the enterprise’s irregularities and other malpractices. When they successfully report the irregularities or hidden risks of the enterprise, the employees receive a supervision reward B1, and the chemical enterprise receives a penalty C for irregular production.
The effectiveness of the government’s supervision is affected by a variety of factors, such as the frequency of supervision, the intensity of supervision, and the supervisory personnel. If the government is ineffective in monitoring and allows unsafe production, it will lose credibility M3. Meanwhile, the government needs to incur a series of costs D3 to control unsafe production, and the daily supervision by employees also incurs costs D2. If the employees do not participate in the supervision and the government allows unsafe production, the enterprise not only receives an additional gain ΔQ from the unsafe production, but also incurs an additional loss ΔF due to the unsafe production. Once a firm's unsafe production activities are reported by its employees, it will incur tangible losses, such as reduced profits, M1, and intangible losses, such as damage to its reputation, M2.
Three further assumptions are made. First, because employees are the direct beneficiaries of the enterprise’s safe and compliant production, they need to voluntarily investigate safety risks during the production process to protect their own safety; it is assumed that B1 < W1. Second, under the situation of resumption of work and production, the additional payment to employees for production violations of chemical enterprises is greater than the cost to employees of taking supervisory action; it is assumed that D2 < ΔQ. Third, chemical enterprises carry the expectations of employees, and if the enterprise violates the production rules, its reputation suffers; it is assumed that B2 < M2.
The variables involved in the above game process of chemical enterprise production safety regulation are shown in Table 1.
Meaning of Each Variable in the Three-Party Game System.
From the above assumptions on the behavioral strategies of each party and the setting of each variable, the three-party benefit payment matrix of the government, chemical enterprises and employees can be obtained, as shown in Table 2.
Payment Matrix of Three-Party Evolutionary Game Model of Government, Chemical Enterprise and Employees.
Replicator Dynamic Equations
The expected benefits under the government's regulation and non-regulation strategies are represented by U11 and U12 respectively, and their average benefit is
Variation of the proportion of government regulation amount:
The expected benefits under the enterprises' production strategies of following and violating regulations are denoted by
Variation of the proportion of chemical enterprises' following the regulations production amount:
The expected benefits under employee supervised and non-supervised strategies are denoted by
Variation of the proportion of employees supervision amount:
From the above results, a three-party system of replicated dynamic equations
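For orientation, the standard two-strategy replicator form that the government's equation follows can be written out as a short derivation. This is a sketch of the general textbook form only: U11 and U12 are the expected benefits named above, \bar{U}_1 is an assumed symbol for the government's average benefit, and the explicit payoff expressions depend on the payment matrix in Table 2, which is not restated here.

```latex
\begin{aligned}
\bar{U}_1 &= p\,U_{11} + (1-p)\,U_{12}, \\
H(p) = \frac{dp}{dt} &= p\,\bigl(U_{11} - \bar{U}_1\bigr) = p\,(1-p)\,\bigl(U_{11} - U_{12}\bigr).
\end{aligned}
```

The equations for q and z have the same structure, with the enterprise's and the employees' expected benefits in place of U11 and U12. The factor p(1 - p) is why p = 0 and p = 1 are always rest points of H(p), whatever the payoffs.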
Game System Stability Analysis
Government Stability Analysis
When
(1) If
(2) If
Chemical enterprises stability analysis
When
(1) If
(2) If
Employees stability analysis
When
(1) If
(2) If
Game System Stability Analysis
When the replicator dynamic equations are all equal to 0, the speed and direction of strategic adjustment of the three parties in the evolutionary game system of safety production supervision in chemical enterprises no longer change, and the game system reaches a relatively stable equilibrium state. Therefore, letting H(p) = H(q) = H(z) = 0 in equation (13), the equilibrium solutions of the evolutionary game system of safety production supervision in chemical enterprises are:
Substitute each equilibrium solution into the above Jacobian matrix and obtain the corresponding eigenvalues, as shown in Table 3.
Equilibrium Solutions and Their Eigenvalues.
Since all parameters in the game system are greater than 0, there is
Multi-player Game Simulation Based on System Dynamics (SD)
The evolutionary game of chemical enterprise safety regulation exhibits group feedback behavior: as the government works to prevent safety risks in chemical enterprises, each subject adjusts its behavioral strategy in response to the strategic choices of the other participants. System dynamics (SD) is a simulation method for studying the feedback behaviors and strategy changes of complex systems. It can capture the interdependence and feedback loops within the system and support in-depth analysis of how the participants' behavioral decisions change under different variable conditions (He & Sun, 2022; Q. Liu, 2021; Long et al., 2019; Song et al., 2023; H. Zhang et al., 2022).
Construction of Simulation Model
Specification of Variables
Based on the above assumptions about the game subjects and the stability analysis, this study uses Vensim software to construct a multi-player evolutionary game SD model composed of three subsystems, namely government, chemical enterprises, and employees, to investigate the strategy changes of each participant (Figure 1). The SD model consists of three stock variables, three rate variables, nine auxiliary variables, and thirteen constants. The stock variables are the willingness of the government to regulate, the willingness of enterprises to produce in accordance with regulations, and the willingness of employees to supervise. The rate variables are the change in the proportion of government regulation, the change in the proportion of enterprises producing in compliance, and the change in the proportion of employee supervision. The remaining variables are auxiliary variables and constants.

Multi-player game SD model of government, chemical enterprises, and employees.
Expert Consultation
Initial values for the model variables were established by referencing research data from X. Wang et al. (2023) and You et al. (2020) and by interviewing experts: practitioners engaged in chemical production activities, senior management of chemical enterprises (including safety management personnel), and government regulators. The process is as follows:
Expert Selection
The criteria for expert selection are as follows: (1) Possess over 10 years of professional experience. (2) Hold an intermediate-level professional title or higher. (3) Hold a master’s degree or higher. (4) Come from government regulatory bodies, chemical enterprise management, or frontline operational roles, and be capable of providing representative insights. Exclusion criteria are a lack of relevant professional experience or a direct conflict of interest with the research team. Ultimately, 15 qualified experts were recruited through recommendations from local industry associations. Basic information is presented in Table 4.
Expert Profile (N = 15, n(%)).
Interview Method
Interviews were conducted both in-person (nine experts) and via telephone (six experts) to establish initial values for the model’s variables that accurately reflect actual situations. Each interview, lasting 40 to 60 min, followed a standardized semi-structured interview guide and was audio-recorded with the participant’s consent. To guarantee consistency and accurate data recording, all research team members were formally trained in interview methodologies.
Data Collection
Interviews were conducted by research team members (interviewers) with experts using semi-structured questioning, focusing on the initial value settings for model variables. Each expert was interviewed independently. To ensure data objectivity, the research team adopted a “dual recording” approach. During interviews, two other team members (recorders) independently documented expert responses, which were later cross-checked and corrected. All numerical opinions provided by experts underwent standardized consolidation. To mitigate the impact of outliers on results, the aggregated opinion data for each variable underwent “truncated averaging”: the highest and lowest values were discarded, and the arithmetic mean of the remaining experts’ numerical inputs was adopted as the final parameter setting. During consultations with 15 experts, values provided by the 12th expert and subsequent participants showed high consistency with earlier results, with parameter fluctuations within a 5% range, further enhancing the reliability of the findings.
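The "truncated averaging" step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming exactly one highest and one lowest value are dropped per variable as the text states; the function name and the sample numbers are hypothetical and are not data from the study.

```python
from statistics import mean


def truncated_mean(values):
    """Drop the single highest and single lowest value, then average the rest."""
    if len(values) < 3:
        raise ValueError("need at least three expert values to truncate")
    ordered = sorted(values)
    return mean(ordered[1:-1])


# Hypothetical expert estimates for one model parameter (illustrative only).
expert_values = [0.3, 0.4, 0.4, 0.5, 0.9]
print(truncated_mean(expert_values))
```

In this made-up sample the outlying 0.9 is discarded, so the aggregate is pulled toward the values most experts agree on.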
Ethical Considerations
In accordance with Chinese laws, regulations, and institutional requirements, the model parameter setting process for this study constitutes expert interviews within the social sciences. It does not involve human clinical trials or animal experiments and is therefore exempt from formal Institutional Review Board (IRB) review. Prior to conducting expert interviews, the research team thoroughly reviewed the interview content to ensure it did not involve sensitive or inappropriate information and contained no potentially offensive material. Furthermore, the semi-structured interview format is non-invasive, ensuring no physical harm or psychological intervention to participants. The research team strictly adhered to scientific ethics standards to safeguard the rights of participating experts:
Informed Consent: Prior to interviews, experts were thoroughly briefed on the study’s purpose, significance, methodology, and data usage. Interviews commenced only after obtaining their informed consent. All experts confirmed participation verbally or in writing, retaining the right to withdraw.
Voluntary Participation: Experts participated entirely voluntarily, with no exchange of benefits involved.
Anonymity: Interview materials were coded for each expert, with no identifiable personal information recorded. Findings were presented as aggregated data and anonymous information.
Data Security: Interview recordings and transcribed texts were stored on encrypted hard drives accessible only to the research team, minimizing data leakage risks.
Based on interview findings, initial values for each variable are shown in Table 5.
Initial Values of Each Constant.
Model Simulation Analysis
The model settings are as follows: INITIAL TIME = 0, FINAL TIME = 120, TIME STEP = 0.125; Units for Time: Day.
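To make the run settings concrete, the following Python sketch steps a three-variable replicator system with the stated INITIAL TIME, FINAL TIME, and TIME STEP. It assumes Euler integration, since the text does not state the integration method. The payoff-gap function is a hypothetical stand-in and is not derived from Table 2 or Table 5, so only the stepping scheme, not the trajectory, should be read as relevant to the model.

```python
INITIAL_TIME = 0.0   # days, from the stated model settings
FINAL_TIME = 120.0   # days
TIME_STEP = 0.125    # days


def payoff_gaps(p, q, z):
    """Hypothetical payoff differences (strategy 1 minus strategy 2) per party."""
    gap_government = 0.5 - 0.8 * q
    gap_enterprise = 0.6 * p - 0.2
    gap_employee = 0.3 - 0.4 * q - 0.2 * p
    return gap_government, gap_enterprise, gap_employee


def simulate(p, q, z):
    """Euler integration of dx/dt = x (1 - x) * gap for each of p, q, z."""
    steps = round((FINAL_TIME - INITIAL_TIME) / TIME_STEP)
    trajectory = [(p, q, z)]
    for _ in range(steps):
        gp, gq, gz = payoff_gaps(p, q, z)  # evaluate all gaps at the old state
        p += TIME_STEP * p * (1 - p) * gp
        q += TIME_STEP * q * (1 - q) * gq
        z += TIME_STEP * z * (1 - z) * gz
        trajectory.append((p, q, z))
    return trajectory


trajectory = simulate(0.5, 0.5, 0.5)
print(len(trajectory) - 1, "steps; final state:", trajectory[-1])
```

With these settings the run takes 960 steps. Because each update has the form x + dt·x(1 − x)·gap and the step is small relative to the gaps, the probabilities stay inside [0, 1] without clipping.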
Stability Analysis
Substituting the initial values of each variable in Table 5 into the replicator dynamic equations of the internal safety supervision evolutionary game system and setting H(x) = 0, a total of 10 evolutionary equilibrium solutions can be obtained, including eight pure strategies and two mixed strategies. The details are shown in Table 6.
Equilibrium Points and Steady State of the Game.
The simulation results of pure strategy x1 are shown in Figure 2a. The strategy probabilities of all three parties stay at 0, and the system appears to be in equilibrium. However, a small perturbation of the initial values breaks this equilibrium. For example, when the initial value p = 0 is changed to p = 0.05, the simulation evolves into the state shown in Figure 2b, so x1 is not a stable strategy of the evolutionary game system. By the same reasoning, none of the other pure strategies is evolutionarily stable. The simulation results of the mixed strategy x10 are shown in Figure 2c. When p is perturbed from 3/26 to 5/26, the simulation evolves into Figure 2d: no evolutionarily stable state forms, and the fluctuation range gradually increases. The government and enterprises change their behavioral strategies in response to the change in the initial strategy. In the same way, the mixed strategy x9 is also unstable. In summary, the game system has no evolutionarily stable equilibrium solution. Stability can also be judged with Lyapunov’s first method from the eigenvalues of the Jacobian matrix of the tripartite evolutionary system: an equilibrium point is asymptotically stable when all eigenvalues have negative real parts. Table 3 shows the eigenvalues and stable states corresponding to each equilibrium point. No equilibrium point of the system is stable, which is consistent with the SD simulation results.

Simulation evolution process for different strategies: (a) (p, q, z) = (0, 0, 0). (b) (p, q, z) = (0.05, 0, 0). (c) (p, q, z) = (3/26, 13/17, 1). (d) (p, q, z) = (5/26, 13/17, 1).
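The eigenvalue test can be illustrated for pure-strategy corners with a short Python sketch. For a system of the form dx/dt = x(1 − x)·gap(x), the Jacobian at a corner where every probability is 0 or 1 is diagonal, with entries (1 − 2x)·gap, so the eigenvalues can be read off directly. The payoff gaps below are hypothetical and chosen only for illustration; they are not the expressions behind Table 3.

```python
from itertools import product


def payoff_gaps(p, q, z):
    """Hypothetical payoff differences per party (not taken from Table 2)."""
    return (0.5 - 0.8 * q, 0.6 * p - 0.2, 0.3 - 0.4 * q - 0.2 * p)


def corner_eigenvalues(corner):
    """Eigenvalues of the Jacobian at a pure-strategy corner.

    At a corner the off-diagonal terms x(1 - x) * d(gap)/dy vanish, so the
    Jacobian is diagonal with entries (1 - 2x) * gap.
    """
    gaps = payoff_gaps(*corner)
    return [(1 - 2 * x) * gap for x, gap in zip(corner, gaps)]


def is_asymptotically_stable(eigenvalues):
    """Lyapunov's first method: every eigenvalue needs a negative real part."""
    return all(complex(ev).real < 0 for ev in eigenvalues)


for corner in product((0, 1), repeat=3):
    eigs = corner_eigenvalues(corner)
    label = "stable" if is_asymptotically_stable(eigs) else "unstable"
    print(corner, [round(e, 3) for e in eigs], label)
```

With these illustrative gaps every corner has at least one positive eigenvalue, so none is stable. Mixed-strategy equilibria need the full Jacobian, since the off-diagonal terms no longer vanish there.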
Static Reward and Punishment Strategy Analysis
In the actual safety regulation of chemical enterprises, reducing violations by enterprises is the main goal of safety production management. Existing research shows that reasonable rewards and punishments for enterprises can improve the level of safe production. On this basis, this section varies the strength of rewards and punishments in the three-party evolutionary game system of chemical enterprise safety production supervision. Specifically, the reward and punishment parameters that government authorities apply to the production behavior of chemical enterprises are adjusted, and the evolution of each party's behavioral strategy under the static reward and punishment strategy is observed.
Each participant is boundedly rational, and their initial strategy selection is random. They usually adjust their strategies dynamically by observing and comparing changes in benefits. Assume that the initial strategies of the three parties in the game are p = 0.5, q = 0.5, z = 0.5. The baseline reward and penalty coefficients, W1 = 0.4 and C = 0.9, are adjusted to (1) W1 = 0.2, C = 0.6 and (2) W1 = 0.6, C = 1.2. The simulation results are shown in Figure 3a, b, and c.

(a) Evolution results under static strategy (W1 = 0.4, C = 0.9). (b) Evolution results under static strategy (W1 = 0.2, C = 0.6). (c) Evolution results under static strategy (W1 = 0.6, C = 1.2).
Comparing Figure 3a, b, and c shows that increasing the reward and punishment coefficients can raise the probability of compliant production by enterprises and of efficient supervision by the government in the short term, but it cannot suppress the fluctuations of the three parties, whose behavioral strategies remain unstable. Therefore, a completely static reward and punishment mechanism, in which only the size of the fixed reward and punishment parameters is changed, is not feasible. In the long-term game, static rewards and punishments cannot be adjusted according to the behavior of other subjects, and the system has no stable equilibrium solution. The oscillating strategies of each party also create conditions for speculative behaviors such as illegal production by enterprises and ineffective government supervision, which is an important reason for the failure of law enforcement by regulators.
Analysis of Dynamic Punishment Strategy
Penalizing chemical enterprises for production violations can encourage them to follow the rules of safe production. In practice, the degree of unsafe production by chemical companies varies, as do the consequences and negative impacts, so the government needs to set the level of penalties according to the degree of non-compliance. Drawing on the results of X. Wang et al. (2023), this study assumes a linear relationship between the government’s penalty C for chemical companies’ illegal production and the probability of their illegal production strategy (1 - q). Therefore, the dynamic penalty variable C* is introduced, as shown in formula (14).
Here, a1 is a tuning parameter. Tuning and testing of the dynamic variable gave a1 = 2.851. The SD model under the linear dynamic penalty strategy is shown in Figure 4.

SD model under dynamic penalty strategy.
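The linear dynamic penalty can be illustrated with a small Python function. This sketch takes the substituted form quoted later in this section, C* = C(1 - q) + 2.851, and reads the constant 2.851 as a1, which is an assumption about how formula (14) is parameterized. The function name is hypothetical, and C = 0.9 is simply the baseline penalty coefficient from the static analysis.

```python
def dynamic_penalty(base_penalty, q, a1=2.851):
    """Linear dynamic penalty C* = C * (1 - q) + a1 (assumed form).

    base_penalty: static penalty coefficient C
    q: probability that enterprises follow the regulations, in [0, 1]
    a1: tuning constant; 2.851 is the value reported after tuning
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    return base_penalty * (1.0 - q) + a1


# The penalty shrinks linearly as compliance q rises.
for q in (0.0, 0.5, 1.0):
    print(q, dynamic_penalty(0.9, q))
```

Under this reading, the penalty is largest when enterprises never comply (q = 0) and falls to the constant a1 under full compliance (q = 1).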
After introducing the dynamic penalty strategy, the initial probabilities of the government, chemical enterprises, and employees are set to (a) (p, q, z) = (0.5, 0.5, 0.5) and (b) (p, q, z) = (0.6, 0.4, 0.2). The simulation end time is extended to 1,200, and the convergence results are shown in Figure 5a and b.

(a) Evolution results under dynamic penalty strategy (p, q, z) = (0.5, 0.5, 0.5). (b) Evolution results under dynamic penalty strategy (p, q, z) = (0.6, 0.4, 0.2).
The figure shows that under the dynamic penalty scheme, different initial probabilities of the three parties lead to different initial fluctuations and to corresponding changes in behavioral strategies. However, as time progresses, the behavioral strategies of the three parties eventually stabilize at the fixed point E0 = (0.175676, 0.5, 0)^T. Replacing C in formula (13) with C* = C(1 - q) + 2.851 gives the replicator dynamic equation system H1(x):
From the previous assumptions and formula (15), 9 equilibrium solutions under the dynamic penalty mechanism can be obtained, which are:
At E0, the corresponding eigenvalues are λ1 = -0.01976 + 0.3267i, λ2 = -0.01976 - 0.3267i, and λ3 = -0.06608 (i is the imaginary unit). All three have negative real parts, which satisfies the stability condition of Lyapunov’s first method and shows that E0 = (0.175676, 0.5, 0)T is a stable equilibrium solution of the game system under the dynamic penalty mechanism. Combined with the SD simulation results, this shows that, compared with the static game system, the dynamic penalty mechanism effectively suppresses the strategic fluctuations of the three game subjects: given a sufficiently long simulation time, each subject converges to its stable point.
Under this mechanism, although the three parties in the game have stable evolutionary strategies, chemical enterprises still have the possibility of illegal production. The differences in the initial strategies of the three parties in the game will only affect the convergence speed, but will not affect the convergence results. It is clear that the dynamic adjustment of the penalty variable has a positive effect on the overall stability of the game system, but its effect in controlling the selection of chemical enterprises’ compliance and safety production strategies is not obvious.
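The stability test used above can be sketched in a few lines of Python. The sketch assumes that the complex eigenvalues at E0 form a conjugate pair with real part -0.01976, as they must for a real Jacobian. The function names are hypothetical. The period and decay-time estimates come from the linearization near E0 and are only indicative of the full nonlinear dynamics.

```python
import math


def is_locally_stable(eigenvalues) -> bool:
    """Lyapunov's first method: stable if every eigenvalue has a negative real part."""
    return all(complex(lam).real < 0.0 for lam in eigenvalues)


def oscillation_summary(lam):
    """Return (period, decay_time) implied by one eigenvalue of the linearized system."""
    lam = complex(lam)
    period = 2.0 * math.pi / abs(lam.imag) if lam.imag != 0 else math.inf
    decay_time = 1.0 / abs(lam.real) if lam.real != 0 else math.inf
    return period, decay_time


# Eigenvalues at E0, with the complex pair taken as conjugates (assumption).
EIGENVALUES_E0 = [
    complex(-0.01976, 0.3267),
    complex(-0.01976, -0.3267),
    complex(-0.06608, 0.0),
]

print("locally stable:", is_locally_stable(EIGENVALUES_E0))
period, decay_time = oscillation_summary(EIGENVALUES_E0[0])
print(f"oscillation period ~ {period:.1f}, amplitude e-folding time ~ {decay_time:.1f}")
```

Under these assumptions the oscillation period is about 19 time units, while the amplitude shrinks by a factor of e only about every 51 time units. Such slow damping is consistent with the long simulation horizon needed before the curves settle.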
Analysis of Dynamic Punishment-Reward Strategy
The previous section analyzed the evolutionary results under the dynamic punishment mechanism. This section improves on that mechanism and explores the impact of the dynamic punishment-reward dual policy on the behavioral strategies of chemical enterprises. First, the dynamic penalty variable C* is adjusted. Government penalties for enterprises depend not only on the probability of production violations but also on two further factors: the probability of government regulation and the extra revenue an enterprise gains from violating production rules. We therefore hypothesize:
Second, a dynamic incentive mechanism is introduced. Most existing studies assume a linear correlation between government incentives and enterprises’ willingness to produce according to regulations. You et al. (2020), however, pointed out that, unlike punishment, excessive government incentives do not promote compliant production and may leave enterprises without sufficient motivation and awareness of safe production. The incentive mechanism should therefore be set with the actual situation in mind: once the policy has taken effect and the expected goals have been achieved, the incentive should be appropriately weakened. We therefore hypothesize a parabolic relationship between the incentive W1 for compliant enterprises and the probability q that enterprises follow the regulations, and introduce the dynamic reward variable:
Here, a2 and a3 are parameters. Test adjustment gives the optimal values a2 = 1.85 and a3 = 0.405, under which enterprises move toward the compliant production strategy fastest. The adjusted SD model is shown in Figure 6.

SD model under dynamic punishment-reward strategy.
It is assumed that the initial probabilities of the tripartite subjects of the government, the chemical enterprises, and the employees are: a: (p, q, z) = (0.5, 0.5, 0.5), and b: (p, q, z) = (0.6, 0.4, 0.2), respectively. The end time of the model simulation is 120 and the convergence results are shown in Figure 7a and b.

(a) Evolution result under dynamic punishment-reward strategy (p, q, z) = (0.5, 0.5, 0.5). (b) Evolution result under dynamic punishment-reward strategy (p, q, z) = (0.6, 0.4, 0.2).
As Figure 7 shows, under the dynamic punishment-reward mechanism the behavioral strategies of the three parties finally converge to E1 = (0, 1, 0)T, forming an ideal and stable evolutionary strategy. The fluctuations of chemical enterprises are effectively suppressed: they choose regulated production as their optimal strategy and stabilize after the 9th day.
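Replacing C and W1 in equation (13) with the dynamic variables C* and W1* gives the replicator dynamic equation system H2(x), formula (18).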
From the previous assumptions and formula (18), we can get 9 equilibrium solutions x12∼x92 under the dynamic punishment-reward mechanism. As before, x12∼x82 are not stable equilibrium solutions. In the replicator dynamic equation system, H2(p) and H2(q) contain p in the denominator, so p ≠ 0. The remaining equilibrium solution is therefore written as x92 = (f, 1, 0)T, where f is a placeholder for p and f → 0. The Jacobian matrix corresponding to H2(x) is shown in formula (19). The eigenvalues obtained are λ1 = -1.15 + 2.105f, λ2 = -0.5 - 1.255f, and λ3 = -0.2, all of which are negative as f → 0. This indicates that x92 = (f, 1, 0)T (f → 0) is the stable equilibrium solution of this game system, which is consistent with the simulation results of the SD model.
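The sign condition on the three eigenvalues can be written out explicitly. The following worked check uses only the eigenvalue expressions quoted above and treats f as a small non-negative number; the numerical thresholds are rounded.

```latex
\begin{aligned}
\lambda_1(f) &= -1.15 + 2.105\,f < 0 \iff f < \tfrac{1.15}{2.105} \approx 0.546,\\
\lambda_2(f) &= -0.5 - 1.255\,f < 0 \iff f > -\tfrac{0.5}{1.255} \approx -0.398,\\
\lambda_3 &= -0.2 < 0,\\[4pt]
\lim_{f \to 0}\,(\lambda_1, \lambda_2, \lambda_3) &= (-1.15,\ -0.5,\ -0.2).
\end{aligned}
```

All three eigenvalues are therefore negative for any 0 ≤ f < 0.546, so the conclusion does not depend on how quickly f approaches 0.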
In the evolution toward the stable strategy E1 = (0, 1, 0)T, the relationship between the government’s reward W1* for compliant production and the probability q that chemical enterprises choose the compliant production strategy is shown in Figure 8. W1* initially increases with q. Once the probability that enterprises follow the regulations is high enough (q = 0.925), the government gradually reduces the incentive, which is in line with reality.

The relationship diagram between W1* and q.
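The turning point at q = 0.925 can be illustrated with a small Python sketch. The functional form W1* = W1(-q² + a2·q + a3) is an assumption made here for illustration only; the text states only that the relationship is parabolic with parameters a2 = 1.85 and a3 = 0.405. A parabola of this form peaks at q = a2/2 = 0.925, which matches the reported turning point. The base reward W1 = 4.0 and the function name are hypothetical.

```python
def parabolic_reward(q: float, w1: float, a2: float = 1.85, a3: float = 0.405) -> float:
    """Assumed illustrative form: W1* = W1 * (-q**2 + a2*q + a3), with q in [0, 1]."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability in [0, 1]")
    return w1 * (-q * q + a2 * q + a3)


W1 = 4.0  # hypothetical base reward, not taken from the paper

# Scan q on a grid of width 0.005 and locate the compliance level with the largest reward.
grid = [i / 200 for i in range(201)]
q_peak = max(grid, key=lambda q: parabolic_reward(q, W1))
print(f"reward peaks at q = {q_peak:.3f}")
print(f"W1*(0.5) = {parabolic_reward(0.5, W1):.3f}, W1*(1.0) = {parabolic_reward(1.0, W1):.3f}")
```

Under this assumed form the reward rises with q over most of the range and is withdrawn only slightly once compliance is nearly universal.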
Sensitivity Analysis
Under both the linear dynamic punishment strategy and the nonlinear dynamic punishment-reward strategy, each participant converges to a steady state. To analyze how sensitive the game subjects are to different factors, this study adjusts parameter values and simulates each participant’s behavior under different reward and punishment levels (L. Yuan et al., 2024).
Sensitivity Analysis of Dynamic Punishment Strategy
Because the evolution results under the linear dynamic punishment strategy are highly correlated with the parameter C, C is increased by 10% and 20% from its original setting to examine how punishment strength affects each participant’s strategy selection, as shown in Figure 9a, b and c (the initial strategies of all participants are 0.5). The simulation results show that as C increases, the early-stage volatility of all three parties’ curves decreases and convergence accelerates. The system is therefore sensitive to this parameter, but the adjustment effect is limited.

(a) Impact of parameter C on firm strategy choice (dynamic punishment strategy). (b) Impact of parameter C on government strategy choice (dynamic punishment strategy). (c) Impact of parameter C on employee strategy choice (dynamic punishment strategy).
Sensitivity Analysis of Dynamic Punishment-Reward Strategy
Under the nonlinear dynamic punishment-reward strategy, the core parameters affecting the convergence result are C, ΔQ and W1. Each core parameter is increased by 10% and 20% from its original level, which, together with the original values, constitutes three sets of simulation scenarios for sensitivity analysis. The results show that the system converges stably to the ideal state regardless of the changes in the core parameters. The evolutionary trajectories of the enterprise compliance probability q all converge rapidly to the fully compliant state, with minimal changes in convergence speed and path, verifying the adaptability and stability of the nonlinear incentive mechanism in uncertain environments (Figure 10a, b and c).

(a) Effect of parameter C on enterprise strategy choice (nonlinear dynamic punishment-reward strategy). (b) Effect of parameter ΔQ on enterprise strategy choice (nonlinear dynamic punishment-reward strategy). (c) Effect of parameter W1 on enterprise strategy choice (nonlinear dynamic punishment-reward strategy).
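The scenario design described above (original value, +10%, +20% for each core parameter) can be sketched as a one-at-a-time parameter sweep. The baseline values below are hypothetical placeholders, not the paper's settings, and the function name is illustrative. Each generated scenario would then be passed to the SD simulation.

```python
def one_at_a_time_scenarios(baseline, parameters, increases=(0.0, 0.10, 0.20)):
    """Build scenarios that scale one parameter at a time and keep the others at baseline."""
    scenarios = []
    for name in parameters:
        for increase in increases:
            values = dict(baseline)  # copy so the baseline is never mutated
            values[name] = baseline[name] * (1.0 + increase)
            scenarios.append({"varied": name, "increase": increase, "values": values})
    return scenarios


# Hypothetical baseline values for the three core parameters (dQ stands for ΔQ).
BASELINE = {"C": 10.0, "dQ": 5.0, "W1": 4.0}

for scenario in one_at_a_time_scenarios(BASELINE, ["C", "dQ", "W1"]):
    name = scenario["varied"]
    print(f"{name} +{scenario['increase']:.0%}: {name} = {scenario['values'][name]:.2f}")
```

Varying one parameter at a time matches the per-parameter panels of Figure 10, but it does not capture interactions between parameters.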
Additionally, this study examines the effects of varying the constant variables (a1, a2, a3). As shown in Figures 11a, b, and c, altering the constant parameter values affects convergence speed but does not change the convergence outcome. Compared to the altered constant values, the original parameter settings exhibit faster convergence rates. This further validates the scientific and rational nature of the parameter configuration and provides a stable reference framework for sensitivity analyses of other parameters.

(a) Effect of parameter a1 on enterprise strategy selection. (b) Effect of parameter a2 on enterprise strategy selection. (c) Effect of parameter a3 on enterprise strategy selection.
Discussions
Different reward and punishment strategies lead to significantly different behavioral strategies for each participant. Under a completely static reward and punishment mechanism, 10 equilibrium solutions of the government-chemical enterprise-employee tripartite game system are obtained, but none is an evolutionarily stable strategy. In this scenario, regardless of the parties’ initial strategies, the final state of evolution is highly susceptible to small perturbations that break the equilibrium. This shows that the fixed punishment-reward mechanism is ineffective and fails to mobilize chemical enterprises’ motivation to choose safe production, a situation that is common in actual production. Several factors may explain this result. Chemical enterprises, seeking to maintain economic benefits, take chances and are reluctant to comply with production regulations. The government, as regulator, finds it difficult to control enterprises’ unsafe production behaviors with a single reward and punishment method, which makes regulation inefficient. Employees, as daily production participants, take part in supervision or not depending on the initial strategy value; in practice, however, constrained by enterprise management, they are unwilling to proactively supervise production safety. The static mechanism is thus conducive neither to promoting compliant production by chemical enterprises nor to improving the efficiency of government law enforcement.
Furthermore, the introduction of a dynamic penalty mechanism can control the fluctuations in the strategic behavior of the three-party game subjects to a certain extent. However, the chemical enterprises still have a certain probability of choosing illegal production. The analysis shows that with the increase in penalties, chemical enterprises will inevitably increase their original production costs and reduce expected profits, resulting in some enterprises choosing to produce in violation of regulations; only if the penalties are higher than the cost of safe production according to regulations, they will be forced to carry out safety rectification. In addition, SD simulation results show that the government’s evolutionary strategy fluctuates in the early stage and stabilizes at a lower value in the later stage, indicating that as punishment increases, the government will reduce its willingness to regulate. Therefore, the implementation of dynamic punishment mechanisms by government departments can inhibit some illegal production behaviors of chemical enterprises, but it cannot promote larger-scale regulation.
In addition, the initial safety supervision evolutionary game system contains an unstable point x2. In this situation, the strategy of each participant is always affected by the strategies of the other subjects, and the overall system is unstable. This can be explained by dilemma intensity in social dilemmas, which represents the degree of conflict of interest and the difficulty of cooperation among participants. Arefin et al. (2020) proposed the concept of social efficiency deficit, which reflects the difference between the optimal solution of the social group and the expected utility at the evolutionary equilibrium, and which characterizes the intensity of a dilemma by quantifying the potential for social improvement. The smaller the social efficiency deficit, the greater the intensity of the dilemma. The ideal solution considered in this article is that government departments choose weak regulation, employees choose weak supervision, and chemical enterprises choose production according to regulations. Since the difference between the equilibrium point x2 and the optimal solution of the social group is very small, the social efficiency deficit is small. The benefit of breaking the evolutionary balance to reach the group optimum is then very small, and the intensity of the dilemma is relatively large.
We found that, once the dynamic punishment-reward strategy is optimized, the three-party game system of the government, chemical enterprises, and employees has a stable ideal equilibrium solution and converges to the point E1 = (0, 1, 0). Regardless of the initial strategies chosen by all parties, chemical enterprises eventually choose compliant production as their evolutionary strategy. This mechanism adds a penalty variable and replaces the static reward used earlier with a quadratic function of q. The three game entities then evolve quickly to the ideal state, and the nonlinear incentive function conforms to the marginal incentive principle that “excessive incentives weaken the endogenous motivation”. The intensity of the penalty increases nonlinearly with the probability of violation, which reflects the law enforcement idea that “major violations must be severely punished” in real supervision. The sensitivity analysis also indicates that safe production needs to return to corporate responsibility; S. Li et al. (2025) reached a similar conclusion. To supervise and reduce illegal production, it is feasible for the government to use rewards and subsidies to stimulate enterprises’ enthusiasm for compliance. However, once the policy is sufficiently effective, fiscal pressure should be considered, and reward expenditure should be gradually reduced to foster enterprises’ own awareness of compliance.
Conclusions
Safety regulation of chemical enterprises plays an important role in reducing the incidence of accidents and optimizing process flows. By constructing a three-party game system among the government, chemical enterprises, and employees, we explore the paths by which each participant reaches a stable equilibrium state under different reward and punishment mechanisms, and combine the SD model to simulate the evolution process under these mechanisms. This reveals the complex nature of chemical enterprise safety risk supervision and the connections between the various stakeholders, so as to promote more effective government supervision and more active selection of compliant production strategies by enterprises. The main conclusions of this article are as follows:
(1) Under the static reward and punishment mechanism, there is no stable equilibrium solution for the government, chemical enterprises, and employees. The game system always fluctuates with small changes in the initial strategy, and the static reward and punishment scheme cannot be reasonably adjusted according to the performance of each subject. This is reflected in chemical enterprises’ negative attitude toward safe production and is an important reason for the low efficiency of government management departments.
(2) Under the linear dynamic punishment mechanism, the degree of corporate illegal production behavior is linearly related to the intensity of punishment, which can promote the stability of the game system and form a stable equilibrium solution, but the strategy selection is still not ideal. The evolution results of the government, chemical enterprises, and employees are not affected by the initial probability, which will only affect the time required for the system to evolve to a stable state.
(3) Under the optimization scheme that combines quadratic dynamic rewards and nonlinear dynamic penalties, the game system is stable and improving, and all parties in the game have stable ideal solutions. That is, even if the government regulatory authorities do not choose a strong supervision strategy, chemical enterprises will independently produce according to regulations, reducing the risk of safety accidents. In addition, economic benefits drive chemical enterprises to make diversified strategic choices, and the goal of maximizing profits may be incompatible with strict performance of their safety production responsibilities. The ideal evolutionary strategy can only emerge with external intervention such as reasonable rewards and punishments.
Overall, the theoretical contribution of this paper lies in introducing a dynamic nonlinear incentive and penalty function, which effectively addresses issues such as poor stability and limited mechanisms in traditional multi-agent evolutionary game models. This provides a new theoretical paradigm for dynamic regulation in high-risk industries. In terms of practical value, the proposed nonlinear reward-punishment mechanism demonstrates the potential to reduce regulatory costs while incentivizing enterprises toward “self-compliance”. This governance approach can be extended to multiple public management domains such as healthcare and environmental protection. However, this study still has some limitations. First, the bifurcation analysis and global sensitivity analysis of the model are not sufficiently in-depth. In the future, integrating relevant theories and methods could enhance the interpretation of critical points for system mutations and strengthen the robustness of conclusions. Second, parameter selection is based on the specific industry context of China. Future cross-national case studies could improve the model’s accuracy and practicality. Addressing these issues will further bridge the gap between theoretical research and management applications, providing more reliable decision-making support for achieving efficient social governance.
Footnotes
Acknowledgements
This work was supported by the Philosophy and Social Science Research Projects of Anhui Colleges and Universities (2022AH010054) and National Social Science Foundation (22ZDA112). We would like to thank all participants in this study.
Ethical Considerations
In accordance with Chinese laws, regulations, and institutional requirements, this study was exempt from formal approval by an Institutional Review Board (IRB), as it consisted of expert consultations in the social sciences and did not involve human clinical trials or animal experiments. The interview protocol was internally reviewed by the research team prior to data collection to ensure objectivity and the avoidance of sensitive topics. All participants provided informed consent, and strict measures were in place to guarantee their anonymity, the confidentiality of their data, and the voluntary nature of their participation. Thus, this study fully complied with the ethical standards for academic research throughout all its stages.
Author Contributions
Conceptualization, Yue Xu; Methodology, Li Yang; Software, Li Yang; Validation, Li Yang; Formal Analysis, Yue Xu; Investigation, Yue Xu; Resources, Junqi Zhu and Li Yang; Data Curation, Yue Xu; Writing – Original Draft Preparation, Yue Xu and Li Yang; Writing – Review & Editing, Yue Xu; Visualization, Yue Xu; Supervision, Yue Xu; Project Administration, Junqi Zhu and Li Yang; Funding Acquisition, Li Yang. All authors reviewed the manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Philosophy and Social Science Research Projects of Anhui Colleges and Universities (2022AH010054) and National Social Science Foundation (22ZDA112).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
