Abstract
With the rapid development of electric drive technology for new energy vehicles, fault data identification for key components of electric drives has become a crucial issue in improving the stability and safety of electric drive systems. However, traditional fault data recognition methods have many limitations in dealing with complex and variable operating fault conditions. To address this problem, this paper proposes Vision Transformer Plus (ViT++), a deep learning model based on the self-attention mechanism and combined with data enhancement strategies, for fault identification in the electric drive systems of new energy vehicles. Fault types are identified by transforming electric drive system fault data into image matrices and performing feature extraction and learning with the ViT++ model. To validate the effectiveness of the proposed method, we conducted extensive comparative experiments on a large amount of real fault data from key electric drive components, applying a data enhancement strategy. The experimental results show that the ViT++-based fault data recognition method achieves higher accuracy and robustness than traditional convolutional neural network (CNN)-based methods. The proposed method therefore helps improve the accuracy and efficiency of fault data identification for key components in electric vehicles and supports the analysis of electric drive system faults.
Keywords
Introduction
With the increasing global concern for environmental protection and energy sustainability, the research and development of new energy vehicles has become an important direction for the global automotive industry. A new energy vehicle uses a new type of power system, such as an electric motor, battery, or fuel cell, instead of a traditional internal combustion engine as its driving power.1,2 The development of new energy vehicles is of great significance for improving environmental quality, promoting energy transformation, and realizing the sustainable development of the automobile industry. Further research on new energy vehicles can improve vehicle performance and range, reduce manufacturing costs, promote the adoption of new energy vehicles, and create a greener and cleaner way of traveling. The electric drive system is the core component of new energy vehicles, and whether it can provide power efficiently and reliably has attracted widespread attention.3,4
The rapid development of electric drive technology for new energy vehicles makes fault data identification for key components of the electric drive system an important challenge in improving the stability and reliability of electric vehicles.5,6 However, traditional fault data identification methods have limitations in dealing with complex and variable operating fault conditions. This article proposes a novel approach to these issues: fault data enhancement and identification for electric vehicle drive systems based on an improved Vision Transformer, Vision Transformer Plus (ViT++), which introduces Quiet Attention, an adaptation of the traditional Softmax function. By leveraging the self-attention mechanism of the ViT++ model, we aim to enable the diagnosis system to automatically learn and extract key fault features from raw data.7,8 In addition, we introduce data enhancement strategies to enrich fault data features and improve the robustness and generalization of the model.9,10
Our focus is on the potential of the ViT++ algorithm for fault data recognition in electric vehicle drive systems. By converting fault data into image matrices and using the ViT++ model for feature extraction and learning, we aim to achieve more accurate and efficient fault type recognition. Data playback and image preprocessing techniques allow randomly sampled fault data to be diagnosed, further enhancing the model’s adaptability to different real-world fault situations. To evaluate the effectiveness of the proposed method, we conducted in-depth experiments on real datasets containing many faults in critical electric drive components. The experimental results demonstrate the superior performance of the ViT++ algorithm compared with traditional Convolutional Neural Network (CNN)-based methods and provide insights into the key features and patterns in fault identification.
The main contributions can be summarized as:
We propose an architecture for fast and intelligent diagnosis of electric drive fault data under multiple operating conditions.
We propose a practical data enhancement algorithm to increase the robustness of fault identification.
We fix a flaw in the attention formula used in machine learning (its Softmax normalization) and apply the resulting ViT++ algorithm to electric drive fault data identification.
We propose a sliding window-based data recognition method to replace long short-term memory (LSTM) networks, which improves the response speed of the system, enlarges the receptive field of recognition, and strengthens the recognition of fault relations.
The rest of this paper is organized as follows: Section “Related work” reviews current research on electric drive system fault diagnosis. Section “Proposed method” introduces an efficient architecture for data enhancement and data identification and proposes the ViT++ algorithm. Section “Experimental results” presents the experimental results. Finally, conclusions are drawn in Section “Conclusions and future work.”
Related work
In recent years, the study of fault data enhancement and intelligent identification methods for electric drive systems of new energy vehicles has attracted widespread attention, mainly focusing on the following aspects:
Research on Fault Data Enhancement Methods: Existing research in China and abroad on fault data enhancement includes techniques such as data augmentation, Generative Adversarial Networks (GANs), and autoencoders, which have been used to increase the diversity and quantity of fault data samples.11
Research on Machine Learning and Deep Learning Methods: Researchers in China and abroad have applied traditional machine learning algorithms such as Support Vector Machine (SVM), Decision Tree, and Random Forest, as well as deep learning models such as CNN, Recurrent Neural Networks (RNN), and Transformer, to fault data recognition.12
Research on Fault Diagnosis and Intelligent Recognition Methods: Existing methods for fault diagnosis and intelligent recognition of electric drive systems include rule-based expert systems, data-driven methods, physical model-based methods, and hybrid methods, which have been applied to the fault diagnosis of key components of electric vehicles. Studies have also described existing fault datasets and benchmarking systems and discussed their size, quality, and applicability for evaluating new algorithms and methods.13,14
Research on Heterogeneous Data Fusion Methods: Heterogeneous electric drive system fault data may come from different sensors, devices, or platforms and are important for comprehensive fault diagnosis. Existing data fusion methods include sensor fusion, feature fusion, and decision fusion, and studies have examined how effectively they improve fault identification performance in applications.15
Dataset Construction for New Energy Vehicles: Existing studies have constructed fault datasets for the electric drive systems of new energy vehicles. Their construction methods, data collection strategies, labeling methods, and dataset size and representativeness have been discussed, emphasizing the importance of high-quality datasets for fault identification research.7
Transfer Learning Methods Based on Pre-trained Models: Pre-trained models such as BERT and GPT have made significant breakthroughs in natural language processing, and studying their transfer learning effects on electric drive fault data recognition provides a new research direction.16
Real-time Fault Diagnosis Methods: Real-time fault diagnosis is crucial for ensuring the safety and reliability of vehicle operation. Existing approaches for the electric drive systems of new energy vehicles include model-based real-time prediction and data stream processing, applied in the in-vehicle environment. A related problem is cross-domain fault diagnosis between different EV brands or models, which involves data distribution shift and knowledge transfer and is a key problem to be solved in practical applications.17,18
Reviewing and synthesizing the above aspects provides a comprehensive understanding of research on fault data enhancement and intelligent identification for the electric drive systems of new energy vehicles, and it provides the theoretical basis and motivation for the framework and methodology of this paper. It also points out the shortcomings of current research and suggests new directions for future work. On this basis, this paper aims to solve problems in the fault diagnosis of electric drive systems in new energy vehicles, to ensure timely and accurate identification of electric drive system faults, and to provide useful academic and practical references for the development of the new energy vehicle field.
Proposed method
This section consists of two parts. The first part explains the overall architecture of the intelligent fault data diagnosis system for the electric drive system. The second part introduces the visualization and application of electric drive system fault data, proposes a fault data recognition model based on the ViT++ algorithm, and explains the data enhancement method.
System architecture design
This paper focuses on the architecture of an intelligent diagnosis system for electric drive system fault data, as shown in Figure 1. The system architecture includes four main modules: fault components, fault tree, data storage and analysis, and fault diagnosis.

Overall architecture of intelligent diagnosis system for electric drive system fault data.
The fault components, taken from the electric drive system of a certain model of new energy vehicle, are the basic source of research data. From all observed faults, nine common fault types are identified: Motor Encoder Error (MEE), Battery Over-discharge (BOD), Battery Pack Voltage Fault (BPVF), Motor Gear Scratch (MGS), Motor Startup Speed Abnormal (MSSA), Speed Sensor Fault (SSF), Hub Motor Speed Abnormal (HMSA), Gearbox Gear Wear Fault (GGF), and Bearing Inner Ring Fault (BIRF). These faults have a significant impact on the safety and stability of the electric drive system and are thus the focus of this paper. To collect a large amount of real fault data, the fault components must be fitted with sensors whose analog signals are converted to digital signals by an Analog-to-Digital Converter (ADC). As the amount of fault data grows, this paper adopts the fault tree approach to improve the system’s extensibility and facilitate data analysis and storage. A fault tree is a graphical logical analysis method for describing and analyzing the likelihood and impact of system failures. The tree is unfolded layer by layer: each node represents a failure event or cause, and logic gates indicate the relationships between events. Fault tree analysis assesses system reliability, identifies the causes of faults and the major fault paths, and guides preventive and maintenance measures to improve system safety and reliability. In this paper, the faults are divided into three main categories: Motor Component Failure (MEE, MGS, MSSA, HMSA), Power Battery Failure (BOD, BPVF), and Other Component Failure (SSF, GGF, BIRF).
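As a minimal illustration of the three-category fault tree, the sketch below encodes the nine fault codes under their parent categories and evaluates each category as an OR gate, so a category is marked as failed if any of its leaf faults is active. The dictionary layout, the OR-gate semantics, and the helper names `categorize` and `failed_categories` are assumptions for illustration, not the authors' implementation.

```python
from typing import Dict, Iterable, List, Set

# Three top-level categories and their leaf fault codes, as listed in the text.
FAULT_TREE: Dict[str, List[str]] = {
    "Motor Component Failure": ["MEE", "MGS", "MSSA", "HMSA"],
    "Power Battery Failure": ["BOD", "BPVF"],
    "Other Component Failure": ["SSF", "GGF", "BIRF"],
}

# Invert the tree so each leaf fault code points to its parent category.
CATEGORY_OF: Dict[str, str] = {
    code: category for category, codes in FAULT_TREE.items() for code in codes
}


def categorize(fault_code: str) -> str:
    """Return the top-level category of a leaf fault code."""
    if fault_code not in CATEGORY_OF:
        raise ValueError(f"unknown fault code: {fault_code!r}")
    return CATEGORY_OF[fault_code]


def failed_categories(active_faults: Iterable[str]) -> Set[str]:
    """Evaluate each category as an OR gate over its leaf faults (assumed semantics)."""
    active = set(active_faults)
    return {cat for cat, codes in FAULT_TREE.items() if active.intersection(codes)}


if __name__ == "__main__":
    print(categorize("BPVF"))  # Power Battery Failure
    print(sorted(failed_categories(["MGS", "SSF"])))
```

Other gate types (for example AND gates for faults that only matter in combination) would need a richer node structure than this flat two-level dictionary.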
The process from electric drive system components through data acquisition and processing to fault diagnosis is shown in Figure 1. First, faulty electric drive system components are identified. The fault data are then obtained through the data acquisition system, in which the analog sensor signals are converted to digital signals by the ADC. The fault data are categorized by fault type (motor faults, battery pack faults, and other component faults) and then stored in the big data service platform.19–21
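The ADC step in the acquisition chain can be pictured as ideal uniform quantization. The sketch below assumes a hypothetical 12-bit unipolar converter with a 5 V reference; the paper does not state the resolution or reference voltage of its acquisition hardware, and `adc_quantize` and `adc_to_voltage` are illustrative names.

```python
def adc_quantize(voltage: float, v_ref: float = 5.0, bits: int = 12) -> int:
    """Ideal unipolar ADC: map a voltage in [0, v_ref] to an integer code.

    Inputs outside the range are clipped, and values are truncated to the
    level below them, so full scale maps to the top code 2**bits - 1.
    """
    levels = 2 ** bits
    clipped = min(max(voltage, 0.0), v_ref)
    code = int(clipped / v_ref * levels)
    return min(code, levels - 1)


def adc_to_voltage(code: int, v_ref: float = 5.0, bits: int = 12) -> float:
    """Voltage at the lower edge of the quantization step for `code`."""
    return code * v_ref / 2 ** bits


if __name__ == "__main__":
    for v in (0.0, 1.234, 2.5, 5.0):
        code = adc_quantize(v)
        print(v, code, round(adc_to_voltage(code), 4))
```

With these assumed parameters the quantization step is 5 / 4096 ≈ 1.22 mV, which bounds the reconstruction error of any in-range sample.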
Data analysis and fault diagnosis are the focus of this research. The cause of a fault is analyzed from the perspective of data characteristics and fault type; the fault diagnosis algorithm can then quickly locate electric drive system faults in massive real-time data and remind the user in time to carry out vehicle maintenance, which plays an important role in the development of new energy vehicles.
Proposed ViT++ algorithm
The ViT algorithm is an image classification model based on the attention mechanism. Although it has achieved significant results on many image tasks, we found a large gap between its performance on public competition datasets and its performance on the data studied in this paper. To address this problem, we improved the ViT algorithm by optimizing the attention mechanism and adding data enhancement. We then applied the improved algorithm, called ViT++, to fault data diagnosis of the electric drive system and achieved satisfactory results. The optimization process is described in detail below.
In the Vision Transformer algorithm, the traditional Softmax function is usually used to calculate the attention weights, which determine the correlation between different image regions.22,23 However, the Softmax function may produce unstable values and high computational complexity when handling many image regions. To improve this, we introduce a new function, called Quiet Attention, to replace the traditional Softmax function and improve the performance and efficiency of the model. The goal of Quiet Attention is to reduce computational complexity and numerical instability while preserving the original characteristics of the attention computation. Using mathematical techniques and approximations, the Quiet Attention function is designed to be more stable and efficient, and it can be adjusted to specific application scenarios and requirements. With Quiet Attention, the Vision Transformer algorithm can better handle many image regions, improving the efficiency and performance of the model in image recognition and other visual tasks.
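This affects all Transformer models, including GPT and LLaMA, because they share the standard attention formula used in machine learning:
$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
where $Q$, $K$, and $V$ are the query, key, and value matrices and $d_k$ is the dimension of the keys. The attention mechanism calculates the similarity scores between the query matrix and the key matrix, normalizes the scores with the Softmax function
$$\mathrm{softmax}(x)_i=\frac{e^{x_i}}{\sum_{j} e^{x_j}}$$
and uses the resulting weights to combine the rows of the value matrix.
A new function, Quiet Attention, also called $\mathrm{softmax}_1$, adds 1 to the denominator:
$$\mathrm{softmax}_1(x)_i=\frac{e^{x_i}}{1+\sum_{j} e^{x_j}}$$
Adding 1 to the denominator allows the output vector as a whole to tend to 0. Otherwise, the function only shrinks the values slightly, and the shrinkage is compensated for in the normalization process. The optimized attention formula is therefore:
$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}_1\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
The derivative of each output with respect to its own input is positive, so there is always a non-zero gradient, and the outputs sum to a value between 0 and 1, so the output does not get out of control. The following property is also satisfied:
$$\frac{\mathrm{softmax}_1(x)_i}{\mathrm{softmax}_1(x)_j}=\frac{\mathrm{softmax}(x)_i}{\mathrm{softmax}(x)_j}=e^{x_i-x_j}$$
That is, the relative values in the output vector remain unchanged.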
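A minimal pure-Python sketch comparing the standard Softmax with the Quiet Attention normalizer, i.e. Softmax with 1 added to the denominator (a form often written softmax_1), inside single-head scaled dot-product attention. The function names are hypothetical, and the sketch only illustrates the normalization properties discussed in this subsection (outputs summing to less than 1, unchanged ratios between entries, and the whole vector being able to approach 0); it does not model the ViT++ network or its computational cost.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def softmax(x: Vector) -> Vector:
    """Standard Softmax; subtracting max(x) avoids overflow without changing the result."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_1(x: Vector) -> Vector:
    """Quiet Attention normalizer: exp(x_i) / (1 + sum_j exp(x_j)).

    Shifting by m = max(0, max(x)) keeps every exponent <= 0; the constant
    1 in the denominator becomes exp(-m) after the shift.
    """
    m = max(0.0, max(x))
    exps = [math.exp(v - m) for v in x]
    total = math.exp(-m) + sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix,
              normalize: Callable[[Vector], Vector] = softmax) -> Matrix:
    """Single-head scaled dot-product attention on nested lists."""
    d_k = len(k[0])
    output = []
    for q_row in q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k)
                  for k_row in k]
        weights = normalize(scores)
        # Weighted sum of the value rows.
        output.append([sum(w * v_row[j] for w, v_row in zip(weights, v))
                       for j in range(len(v[0]))])
    return output


if __name__ == "__main__":
    x = [2.0, 1.0, 0.5]
    s, s1 = softmax(x), softmax_1(x)
    print(round(sum(s), 6), round(sum(s1), 4))  # 1.0 0.9216
    print(math.isclose(s1[0] / s1[1], s[0] / s[1]))  # True: ratios unchanged
    print(max(softmax_1([-20.0, -20.0])))  # close to 0: the vector can "switch off"
```

Passing `normalize=softmax_1` to `attention` swaps the normalizer without touching the rest of the computation, which mirrors the drop-in replacement described in the text.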
The data used in this paper have several special properties: the lengths of the diagnosed data are inconsistent, possible fault features occur at random positions in the time series, the amount of data to be diagnosed is large, and the faulty components in the electric drive system are also random. A comprehensive and fast method for processing the data and identifying features is therefore necessary, so we designed a time series-based equal-step fault diagnosis sliding window (T-DSW), shown in the left half of Figure 2. The design of T-DSW is as follows:
Figure 2 shows the image plotted for a total of 3 s of data, with nine types of fault state data, a group of three sliding windows (Fbox, Mbox, Bbox), and five reference quantities on the timeline
The experiments in this paper used intensive data acquisition to generate 2D image features, where the image’s dimensions are
Each time we diagnose data, we must simultaneously acquire the data for all nine fault states. From a visualization point of view, we can intercept the fault state data image for the current period from the data area covered by the sliding window Fbox from time

Data Augmentation: graphical visualization of electric drive system state data. The sliding windows Fbox, Mbox, and Bbox acquire the electric drive operating state data during a specific period to generate and intercept block images of P × P (32 × 32) size. The block images of all nine states (MEE, BOD, BPVF, MGS, MSSA, SSF, HMSA, GGF, BIRF) are stitched into a 3 × 3 nine-grid image and displayed on the real-time fault diagnosis and monitoring interface of the electric drive system.
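The equal-step windowing and nine-grid stitching described for T-DSW can be sketched as follows. Because the exact window positions and image dimensions are not recoverable from the text, this sketch assumes a hypothetical 1 kHz sampling rate (3000 samples over 3 s), a window of P × P = 1024 samples with P = 32 as in the figure caption, a step of 512 samples, min-max scaling, and a row-by-row folding of each window into a P × P matrix. The Fbox, Mbox, and Bbox grouping is not modeled, the row-major cell order of the nine fault types is an assumption, and all function names are illustrative.

```python
import math
from typing import List, Sequence

Image = List[List[float]]

# Cell order follows the caption's listing; row-major placement is an assumption.
FAULT_ORDER = ["MEE", "BOD", "BPVF", "MGS", "MSSA", "SSF", "HMSA", "GGF", "BIRF"]


def equal_step_windows(signal: Sequence[float], window: int, step: int) -> List[List[float]]:
    """Cut a 1-D signal into fixed-length windows advanced by a fixed step.

    Any tail shorter than `window` is dropped.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [list(signal[start:start + window])
            for start in range(0, len(signal) - window + 1, step)]


def to_block_image(segment: Sequence[float], p: int = 32) -> Image:
    """Min-max scale p*p samples to [0, 1] and fold them row by row into p x p."""
    if len(segment) != p * p:
        raise ValueError(f"expected {p * p} samples, got {len(segment)}")
    lo, hi = min(segment), max(segment)
    span = (hi - lo) or 1.0  # a constant segment maps to all zeros
    scaled = [(s - lo) / span for s in segment]
    return [scaled[r * p:(r + 1) * p] for r in range(p)]


def stitch_nine_grid(blocks: List[Image]) -> Image:
    """Tile nine P x P blocks into one 3P x 3P image, filling rows left to right."""
    if len(blocks) != 9:
        raise ValueError("expected exactly nine blocks")
    p = len(blocks[0])
    if any(len(b) != p or any(len(row) != p for row in b) for b in blocks):
        raise ValueError("every block must be P x P")
    mosaic: Image = []
    for grid_row in range(3):
        row_blocks = blocks[3 * grid_row:3 * grid_row + 3]
        for r in range(p):
            # Row r of this strip = row r of the three blocks, side by side.
            mosaic.append([value for block in row_blocks for value in block[r]])
    return mosaic


if __name__ == "__main__":
    # Nine hypothetical channels, 3 s each at an assumed 1 kHz sampling rate.
    channels = [[math.sin(2 * math.pi * (k + 1) * i / 1000) for i in range(3000)]
                for k in range(len(FAULT_ORDER))]
    print(len(equal_step_windows(channels[0], 32 * 32, 512)))  # 4
    # Take the first window of every channel as its P x P block.
    blocks = [to_block_image(equal_step_windows(ch, 32 * 32, 512)[0]) for ch in channels]
    mosaic = stitch_nine_grid(blocks)
    print(len(mosaic), len(mosaic[0]))  # 96 96
```

Sliding the window start by `step` samples and re-running the stitching yields the next nine-grid image, so consecutive images overlap by `window - step` samples per channel under these assumptions.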
Based on the above improvement scheme, data enhancement is needed to improve the recognition rate of the model, and the specific design scheme is shown in Figure 3 and can be described in detail as follows:
To conduct multi-model comparison experiments, we divide the input images into two categories (multi-class fault spliced images and single-class fault images), and the size of both input images
The multi-class fault spliced image is generated by splicing the images of the nine classes of fault data intercepted by the sliding window into a single image. If the model diagnoses this image as fault-free, the nine single-class images do not need to be fed into the model for verification, which greatly saves system diagnostic resources.
If the multi-class diagnosis shows that the spliced image is faulty, the next step is to classify the single-class fault image of each of the nine classes and determine the faulty electric drive system component.
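The two-stage screening logic can be sketched as below. `is_faulty` and `is_class_faulty` are hypothetical callables standing in for the trained models; treating stage 2 as one yes/no check per single-class image is an assumption. The sketch only shows how a normal result on the spliced image skips the single-class checks.

```python
from typing import Callable, Dict, List

Image = List[List[float]]


def two_stage_diagnosis(
    spliced: Image,
    singles: Dict[str, Image],
    is_faulty: Callable[[Image], bool],
    is_class_faulty: Callable[[str, Image], bool],
) -> List[str]:
    """Return the fault codes flagged by the two-stage screening.

    Stage 1 classifies the spliced nine-grid image as faulty or normal.
    Only when it is faulty does stage 2 examine each single-class image.
    """
    if not is_faulty(spliced):
        return []  # normal: none of the single-class checks are run
    return [code for code, image in singles.items() if is_class_faulty(code, image)]


if __name__ == "__main__":
    # Three of the nine fault codes, with placeholder images, for brevity.
    singles = {code: [[0.0]] for code in ("MEE", "BOD", "BPVF")}
    calls: List[str] = []

    def stub_stage2(code: str, image: Image) -> bool:
        calls.append(code)
        return code == "BOD"  # pretend only BOD is faulty

    print(two_stage_diagnosis([[0.0]], singles, lambda img: False, stub_stage2), calls)  # [] []
    print(two_stage_diagnosis([[0.0]], singles, lambda img: True, stub_stage2))  # ['BOD']
```

In the common case where the vehicle is healthy, this costs one model call per diagnosis cycle instead of ten.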

Fault data recognition model based on ViT++ algorithm.
Above are the key algorithms and the core design methods involved in this paper. In the next section, we will verify the effectiveness of the design scheme through experiments.
Experimental results
The dataset used in this paper is sourced from a precision instrument company in Jiangsu province that specializes in the maintenance and fault handling of new energy vehicles, with which we collaborate closely on data analysis and mining algorithms. The objective of this research is to improve the efficiency of fault diagnosis in the electric drive systems of new energy vehicles. Moreover, the data used in this study are highly competitive within the industry.
To validate the performance of the proposed method, Visual Geometry Group-16 (VGG-16), InceptionV3, and Residual Network-101 (ResNet-101) are used for comparison. VGG-16 is a deep CNN architecture for image classification and object recognition; InceptionV3 is designed to improve the performance of image classification and object detection; and ResNet-101 is a 101-layer residual network designed for computer vision tasks such as image classification, object detection, and semantic segmentation. We use training accuracy, testing accuracy, and loss as evaluation metrics. Training accuracy is the ratio of the number of training samples correctly classified by the model to the total number of training samples; testing accuracy is the corresponding proportion on the test dataset; and loss measures the error between the model’s predictions and the actual targets. Our goal is to minimize the loss function and thereby improve the performance of the model.
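The three evaluation metrics can be computed as below. The paper does not name its loss function, so categorical cross-entropy, a common choice for classification, is assumed here; the function names are illustrative.

```python
import math
from typing import Sequence


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int],
                  eps: float = 1e-12) -> float:
    """Mean negative log-probability of the true class (assumed loss function)."""
    if len(probs) != len(labels) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


if __name__ == "__main__":
    # Binary faulty (1) / normal (0) example with four samples.
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
    print(round(cross_entropy([[0.9, 0.1], [0.2, 0.8]], [0, 1]), 4))  # 0.1643
```

Training accuracy and testing accuracy are the same `accuracy` function applied to the training and test splits respectively.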
The proposed method transforms the fault diagnosis problem from discrete data analysis into visual image recognition, which also provides clearer insight into how the data change dynamically. Among the numerous image recognition models, the Transformer has demonstrated strong recognition capabilities in domains such as text, speech, and images. We initially considered image classification backbones commonly used in academia and industry, such as VGG, Inception, and ResNet, and compared them with the Transformer model. The Transformer performed strongly in our experiments, so we chose the ViT neural network algorithm, which is built on a Transformer backbone. We then observed that its performance varied with the training parameter settings; in general, model performance depends on the chosen model structure, the dataset, and the training parameters. In our parametric experiments with the Transformer model, we found that the input patch size of the image directly affects performance. The ViT++ model supports various patch sizes, including 14 × 14, 16 × 16, and 32 × 32, and we evaluated its performance under each of these patch sizes. To ensure comparability, we set the number of epochs to 50 based on our experimental experience. The dataset is divided into training data and test data at a ratio of 6:4.
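The patch-size trade-off can be made concrete by counting patches. Assuming a hypothetical 224 × 224 single-channel input (a common ViT resolution; the paper's image size is not stated here), 14 × 14, 16 × 16, and 32 × 32 patches yield 256, 196, and 49 tokens respectively, so larger patches mean shorter token sequences. The sketch also shows a seeded 6:4 train/test split; the shuffling strategy, seed, and function names are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Image = List[List[float]]


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W image into non-overlapping patch x patch blocks, each
    flattened row-major, as fed to a ViT-style linear patch embedding."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError(f"{h}x{w} image is not divisible by patch size {patch}")
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def split_train_test(items: Sequence, train_ratio: float = 0.6,
                     seed: int = 0) -> Tuple[list, list]:
    """Shuffle with a fixed seed, then cut at train_ratio (6:4 by default)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    image = [[0.0] * 224 for _ in range(224)]  # hypothetical 224 x 224 input
    for p in (14, 16, 32):
        print(p, len(patchify(image, p)))  # 14 256 / 16 196 / 32 49
    train, test = split_train_test(range(100))
    print(len(train), len(test))  # 60 40
```

Because self-attention cost grows with the square of the token count, the 32 × 32 setting is also the cheapest of the three per image under this assumed input size.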
In our experiments, we tested the performance of the proposed scheme on a deep learning service and conducted comparative experiments with multiple deep learning image classification models (VGG-16, InceptionV3, ResNet-101, and ViT++). We exclude the time consumed on the communication channel because it depends heavily on network traffic. The parameter settings of the experimental environment are shown in Table 1. The experiments are divided into two main parts, the recognition of multi-class fault spliced images and the recognition of single-class fault images, as follows.
1. Recognition of multi-class fault spliced images, with only two classification results: faulty or normal. We input the processed multi-class fault spliced images into the VGG-16, InceptionV3, ResNet-101, and proposed ViT++ models, respectively, to verify the recognition effect.
Configuration parameters of the experimental environment.
From Table 2, InceptionV3 performs worse than the other models in both training accuracy and testing accuracy and also has higher loss values. VGG-16 slightly outperforms InceptionV3 in training accuracy, testing accuracy, and loss, but its overall recognition accuracy remains below 80%. ResNet-101 achieves a high training accuracy of 99.1%, but its testing accuracy is only 78.8%, leaving a large gap between training and testing accuracy; its loss is also not the lowest in the column. Across Table 2, the ViT++ algorithm consistently achieves over 90% in both training and testing accuracy and also has an advantage in loss within the same column. This shows the superiority of the ViT++ model; moreover, its accuracy improves as the patch size increases, with the best performance observed at a patch size of
Comparison of recognition results of multi-class fault spliced images with multi-model experimental effects.
Figure 4 shows the multi-model training accuracy and loss for multi-class fault spliced images. The ViT++ models with patch sizes of 14, 16, and 32 converge fastest, and the models that exceed 90% accuracy are ResNet-101 and ViT++. Considering both accuracy and loss, the ViT++ algorithm achieves the best results.
2. Recognition of single-class fault images, with nine classification results: MEE, BOD, BPVF, MGS, MSSA, SSF, HMSA, GGF, and BIRF. We input each type of processed fault image into the VGG-16, InceptionV3, ResNet-101, and ViT++ models, respectively, to verify the recognition effect; the results are shown in Table 3. Among the four models, ViT++ achieves the highest average recognition rate over the nine fault categories, and VGG-16 the lowest. Moreover, the proposed ViT++ algorithm achieves the best result among the compared models on each individual category, and its recognition rate increases with patch size.

Multi-model training accuracy and loss results for multi-class fault spliced images.
Comparison of recognition results of single-class fault images with multi-model experimental results.
These results demonstrate the superior performance of the proposed ViT++ algorithm. This is because, unlike traditional CNNs, the ViT++ deep learning architecture processes image data in a fundamentally different way: it uses the self-attention mechanism to capture global and local relationships in an image, which suits the massive, complex data with multiple feature dimensions in this paper. Specifically, we make the following observations:
Traditional image classification methods mainly rely on CNNs. In contrast, the ViT++ algorithm breaks the limitations of traditional CNNs in terms of image size by introducing the Transformer model, which converts image data into sequence data and uses the self-attention mechanism to learn global features. This enables the ViT++ algorithm to handle images of arbitrary size, providing greater flexibility for image classification tasks.
The proposed ViT++ algorithm can learn global features and contextual information of images through the self-attention mechanism instead of being limited to a local receptive field. This enables it to understand the overall content and contextual relationships of an image and to better capture its important features.
The proposed ViT++ algorithm achieves efficient feature extraction through the Transformer model, reaching accuracy comparable to that of large CNNs with relatively few parameters. This makes ViT++ more efficient under resource constraints and provides new ideas for designing lightweight models.
Conclusions and future work
In this paper, we propose a data enhancement and real-time diagnosis method based on the optimized ViT++ algorithm for fault data identification problems in electric drive systems. We have achieved remarkable results by applying the proposed ViT++ algorithm to fault data diagnosis of the electric drive system. By improving the attention mechanism in the ViT++ algorithm and introducing a data enhancement strategy, we were able to identify faults in electric drive systems more accurately and efficiently, improving the intelligent diagnosis ability of fault data in electric drive systems. However, some aspects still need to be further discussed and improved. Although we have introduced a data enhancement strategy to enrich the characteristics of the fault data, there is still a possibility of overfitting or underfitting problems. More data enhancement methods can be further explored, and experiments can be conducted to verify their effectiveness. Complex and variable working conditions and environments may affect the electric drive system in practical applications. Therefore, we need to consider the robustness of the model to ensure high accuracy even under different conditions.
Footnotes
Handling Editor: Chenhui Liang
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Quzhou Science and Technology Key Research Project: Research on key technologies for Intelligent diagnosis and predictive maintenance of faults in new energy vehicle Integrated electric drive systems based on big data (2022K105); Research on intelligent detection methods for electromagnetic interference attacks in industrial IoT (2023K252); Research on intelligent visual networking platform for pump station clusters used in urban sewage lifting (2023K248). Natural Science Foundation of Zhejiang Province: Multi-agent based hierarchical cooperative control strategy for torque distribution of distributed drive electric vehicle (LY21E050001).
