Abstract
Bearings are essential elements of rotating machinery, and their malfunction can cause considerable operational interruptions and financial loss. This paper investigates Proximal Policy Optimization (PPO), a reinforcement learning (RL) technique, for formulating data-driven bearing-maintenance policies. A custom OpenAI Gym environment was developed to replicate the maintenance decision-making process using experimental vibration data from normal bearings as well as bearings with ball, inner-race, and outer-race faults. The RL agent was trained to select appropriate maintenance actions (inspection, repair, and replacement) so as to reduce total cost and prevent breakdowns. Training performance was evaluated using key metrics: cumulative reward, policy loss, KL divergence, and value loss. The experimental findings show that the PPO agent achieved 94.2% decision-making accuracy within 10 epochs, with limited improvement from additional training. However, the method exhibited instability in policy updates and value loss, as well as sensitivity to the sparse-reward structure. These findings indicate that PPO holds considerable potential for vibration-based condition-based maintenance (CBM), although its performance in real-world operational environments remains highly dependent on reward design and hyperparameter tuning. This research presents a balanced evaluation of PPO's strengths and limitations in bearing maintenance and provides a foundation for future studies on hybrid and alternative reinforcement-learning strategies.
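To illustrate the kind of decision process the abstract describes, the following is a minimal sketch of a Gym-style maintenance environment. The state labels, action costs, degradation probabilities, and failure penalty are all illustrative assumptions, not the authors' actual values; in the paper, observations are derived from experimental vibration data rather than the raw state index.

```python
import random

# Hypothetical sketch of a bearing-maintenance environment.
# All numeric values (costs, probabilities, penalty) are assumed.
STATES = ["normal", "ball_fault", "inner_race_fault", "outer_race_fault"]
ACTIONS = ["inspect", "repair", "replace"]
ACTION_COST = {"inspect": 1.0, "repair": 10.0, "replace": 25.0}
FAILURE_PENALTY = 100.0  # sparse penalty for running a faulty bearing to failure


class BearingMaintenanceEnv:
    """Gym-style environment: the agent pays maintenance costs and is
    penalized when a faulty bearing is allowed to fail."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = "normal"

    def reset(self):
        self.state = "normal"
        return self._observe()

    def _observe(self):
        # In the paper the observation comes from vibration features;
        # here we simply expose the state index.
        return STATES.index(self.state)

    def step(self, action_idx):
        action = ACTIONS[action_idx]
        reward = -ACTION_COST[action]
        if action == "replace":
            self.state = "normal"      # replacement always restores health
        elif action == "repair" and self.state != "normal":
            self.state = "normal"      # repair clears an existing fault
        # Degradation: a healthy bearing may develop a random fault.
        if self.state == "normal" and self.rng.random() < 0.1:
            self.state = self.rng.choice(STATES[1:])
        # Sparse failure event while operating with a fault.
        done = False
        if self.state != "normal" and self.rng.random() < 0.05:
            reward -= FAILURE_PENALTY
            done = True
        return self._observe(), reward, done, {}
```

A PPO agent (e.g., from a library such as Stable-Baselines3) could then be trained on this environment once it is wrapped in the standard Gym interface; the sparse failure penalty above is the kind of reward structure the abstract identifies as a source of training sensitivity.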
