Abstract
Assigning inspection trains to monitor track quality is a standard procedure for maintaining railway system safety. The main challenges lie in lacking time and resources to perform the inspections because of the increasing traffic nowadays. To overcome these challenges, many consider adopting the on-board monitoring (OBM) technique for performing the inspections. This technique assigns commercial trains, instead of traditional track recording vehicles (TRVs), to monitor the track status, allowing railway operators to perform more inspections without affecting the traffic and using expensive inspection trains as well. However, compared with TRV data, the new OBM data are of lower data quality and have fewer features, although they can be recorded more frequently. Therefore, new methods should be developed for effectively applying the new data. This study develops four models, namely the linear regression model, Markov model, ordinary Kriging model, and Kalman filter model, for predicting the track status based on the OBM data. Data collected from the Switzerland railway network are used for verifying the models. Results show that the proposed models can effectively predict the degradation of the track status in different ways and, therefore, assist railway operators in scheduling maintenance tasks.
Assigning track recording vehicles (TRVs) to monitor track quality is a standard procedure for maintaining railway system safety. Railway operators usually schedule TRVs to monitor the railway network and use the recorded data for future prediction (
To deal with these problems, railway operators are investing in new methods for monitoring the track status of the railway network. One possible way is to apply on-board monitoring (OBM) techniques for monitoring the track status (
Despite the advantages mentioned above, the new data collected based on the OBM technique still has several problems, including relatively higher sensor errors and fewer features recorded, compared to traditional data collected by TRVs. Therefore, it is crucial to preprocess the raw data before applying the OBM data for prediction. Moreover, the past regulations (e.g., the tamping threshold designed based on the track quality index [TQI]) and methods (e.g., the linear regression [LR] model and the Markov model) used for quantifying and predicting the future track status are designed for using accurate and periodic measurements. Therefore, railway operators must compare the different performances between traditional TRV data and the newly collected OBM data when applying the common methods they used before.
This study reviews various track status degradation models and proposes several models (e.g., the LR model, Markov process model, ordinary Kriging model, and Kalman filter [KF] model) for examination and comparison, to assist railway operators in using OBM data for predicting the future track status. We first review multiple track status degradation models, then introduce the proposed four models, followed by case examples to validate the model, a discussion, and conclusions.
Review of Track Status Degradation Models and the Track Quality Index
The prediction of the future track status can be classified into two steps. The first is to define and apply the method for quantifying the track status. Next is to develop and use the track status degradation models based on the selected quantifying method.
Track Quality Index
A standard method to quantify the track status is the TQI. The TQI is defined as a single parameter or a set of parameters of engineering quality value converted from the raw geometric data (
Track Geometry Degradation Models
Track geometry degradation models have been studied by many experts. They can be classified into three categories based on the applied method, namely the mechanistic approach, statistical approach, and artificial intelligence approach. The mechanistic approach applies the mechanical properties of rail structures to predict the future track status. The literature within this category usually proposes models formulated based on influencing factors such as forces and stresses. For example, Sato (
The statistical approach mainly applies the influencing factors from observation data (traffic density, monitoring track geometry data, and so forth) for simulating the track status degradation in real life. For instance, Guler et al. (
Finally, the artificial intelligence approach applies methods such as artificial neural networks and neuro-fuzzy models. The literature within this category is usually known for its higher prediction accuracy. For example, Karimpour et al. (
Several conclusions can be drawn from the literature review. Firstly, we found that the previous literature usually applied data from TRVs. Most of them seldom applied the new OBM data for prediction. Compared with traditional inspection data, the OBM data have higher data collecting frequencies (periodic or non-periodic) and sensor errors. Therefore, there is a need to develop models for applying such data. Secondly, among all the reviewed methods for predicting track geometry degradation, the LR method is the most commonly used in practice because of its simplicity. The second most common is the Markov model, which is also simple, interpretable, and considers the uncertainty of the degradation rate. However, the past literature about these models seldom considers the effect of the low quality and non-periodic inspection period of the raw data, which results in a higher prediction error. Therefore, an analysis should be performed to quantify these data features. Thirdly, although the Kriging method has been used in the literature in spatial prediction problems, applications on railways are scarce. For example, this method has been applied to other topics, such as weather forecasts (
Model Formulation
From the literature review, many models have been proposed for predicting the track status, while a few of them are developed based on OBM data. In this study, we propose four models, two commonly used ones and two new ones, with the aim of understanding the difference between applying OBM data and TRV data. Notice that for all of the proposed models, we assume that the TQI will only be affected by the traffic density measured as MGT. Moreover, our target is to predict the track degradation process within two major maintenance tasks. In other words, we do not model the recovery effect of the maintenance.
Linear Regression Model
The LR method assumes that the relationship between a set of independent or dependent variables and scalar values has a simple linear or affine form. In this study, we mainly focus on analyzing the relation between the TQI value and the MGT, where the MGT is the independent variable (Equation 1):
In Equation 1,
Markov Model
The Markov model is a common discrete statistical approach for predicting future track status with the consideration of uncertainty. It works by considering the quality (e.g., TQI) of the infrastructure as a state. For instance, between a lower and upper threshold for the TQI considered, the degradation will result in a change of state. The primary assumption is that the probability of going from one state to another only depends on the current state, which is known as the memoryless property. In other words, we assume the degradation process follows the Markov property (Equation 3):
where
where
Ordinary Kriging Model
The Kriging model is an interpolation method that weights the surrounding inspection point values to derive a prediction for an unmeasured location (Equation 5):
where
To find
where
In addition, the Kriging method can only be applied when the stationary assumption (Equation 7) is fulfilled. The primary idea for the stationary assumption is that all the values collected at the inspection points follow the same distribution:
Therefore, the correlation between different inspection points can be simply demonstrated as the function of distance (time/MGT). Notice that in this study, we assume that the TQI value of a unit track corridor (100 m long according to practical usage) will follow the same distribution at a different time/cumulative MGT.
Kalman Filter Model
The KF model (
where
Based on the model and the practical noisy data
The result is optimal when the Kalman gain matrix
Prediction Accuracy
This study applied the root mean square error (RMSE) to compare the performance of the models’ prediction results (Equation 17):
where
Case Studies
From the literature review, multiple track status degradation models are proposed, while they were rarely applied with low-quality, high-frequency, and irregularly sampled OBM data. Therefore, we perform two case studies to understand different models’ performances on hypothetical and real OBM datasets.
Case I: Model Analysis Based on Hypothetical Data
For Case I, we compare the LR, Markov, Kriging, and KF models based on hypothetical linear degraded TQI data (40 samples, following
Dataset A (linear distributed sample): sample frequency one per timestep.
Dataset B (frequency reduced sample): sample frequency one per four timesteps.
Dataset C (unbalanced sample I): where 35 samples are recorded below the threshold and five are recorded above the threshold.
Dataset D (unbalanced sample II): where 20 samples are recorded within time intervals 0–10 and 20 samples are recorded within time intervals 30–40.
Dataset E (unbalanced sample III): where all samples are recorded within time intervals 10–30.
Dataset F (uniform sample with high data error): where samples correspond to the same samples from Dataset A plus an error following
Dataset G (uniform reduced samples with high data error): where samples correspond to the same samples from Dataset A plus an error following
The first case is meant to analyze the characteristics of the proposed models considering the data frequency, data error, and data collection pattern, which are the main difficulties faced when dealing with OBM data. Notice that the Markov model applied here only has two states. As for Case II, we fit the proposed models based on the practical OBM data and TRV data to compare the models’ prediction results. Both datasets are collected from four track segments (100 m each), which have different degradation rates (T1–T4, high to low), in the western region of Switzerland. The target track geometry for the models is the longitudinal level, while the target TQI is the standard deviation of longitudinal level over 100 m. The timestep
Notice that regularly sampled data is required for applying the KF model. Therefore, in Cases I and II, we preprocess the data by first dividing the time region based on the applied timesteps and then assigning the original value to the nearest timesteps.
The results for Case I are summarized in Figure 1. In Figure 1, the

Results of the analysis based on hypothetical data. (Color online only.)
On the other hand, for the KF model’s result, we first find out that it can also predict the degradation trend with a relatively small amount of prediction error when applying datasets A and B. However, the prediction error significantly increases when no observations (monitoring points) are used. Secondly, by comparing the results generated from datasets A and B, we find that the prediction error does not significantly increase (0.03%) when the number of samples decreases from 40 to 10. Moreover, by comparing the prediction error generated from fitting datasets A–E, we find that the prediction error will be small and similar when there are observations available for update. Thirdly, looking through the results generated using datasets C–E, we find that the prediction error will vastly increase when fewer samples are recorded. Fourthly, from the result obtained from dataset F, we find that the KF model’s prediction result only slightly smooths out the curve compared with the fitted raw data. This indicates that the model’s ability to remove data errors when predicting is limited.
As for the Markov model, we first find that it predicts well when applying datasets A, B, E, and F with accuracies (number of correct predictions over the total number of predictions) of 100%, 80%, 80%, and 100%, respectively. However, it does not perform well when dealing with unbalanced samples (both datasets C and D have an accuracy of only 40%). Secondly, comparing the prediction accuracy of datasets A and B, we find that the model’s prediction accuracy decreases when reducing samples from 40 to 10. Thirdly, based on the prediction result generated by dataset C, we find that the models tend to underestimate the degradation rate when more samples are recorded in state 1 (under the threshold). Fourthly, we find that the model’s prediction accuracy is still relatively high when dealing with high sensor error dataset F. This indicates that the Markov model can withstand a certain level of low-quality data.
Last but not least, all models, except the LR model, perform poorly when no monitoring data is available in dataset D. In addition, the results obtained from datasets A, B, F, and G indicate that even with a high data error, the prediction accuracy for all models, except the Markov model, increases when the number of samples increases.
Case II: Model Analysis Based on Practical TRV and OBM Data
In Case II, we apply the practical OBM data (collected in 2019, 201 data points) and TRV data (collected from 2017 to 2019, 21 data points) to compare the models’ prediction accuracy. Firstly, we selected the data recorded before tamping (which occurred in September 2019) for fitting the models. Later on, we used the TRV data collected after tamping as the ground truth for prediction. Notice that the time period for collecting the TRV data is longer than that for collecting the OBM data to increase the number of data points for fitting the models.
In addition, for the Markov model in Case II, our target is to predict the state transfer between two inspection points where no maintenance occurs. The states are defined based on a practical regulation (EN 13848-5, 2008). The proposed Markov model contains four states, including State 1, where TQI is less than 50% of the IL threshold; State 2, where TQI is larger than 50% of the IL threshold but less than 100% of the IL threshold; State 3, where TQI is larger than the IL threshold but less than the IAL threshold; and State 4, where TQI is larger than the IAL threshold. The model’s structure is demonstrated in Figure 2, where a circle indicates a state while an arc indicates the transfer of the states.

The structure of the Markov prediction model.
In addition, this study applied the RMSE to quantify the models’ prediction accuracy. For the Markov model, assume that we applied timestep
The timesteps
The results of Case II are summarized in Figure 3 and Table 1. In Figure 3, a–c , the red color lines indicate the results using TRV data, while the blue colors indicate the results using OBM data. The dotted lines are the 95% confidence region, and the continuous lines are the prediction results. The black dots indicate the data used for fitting the models. As for Figure 3d, the color represents the probability of the analyzed track segment staying in certain states. This includes blue indicating staying in State 1; orange indicating staying in State 2; green indicating staying in State 3; and red indicating staying in State 4. In Figure 3e, we summarize the raw data used for fitting the models, in which TRV data is red and OBM data is blue.
The Corresponding Root Mean Square Error Values for Different Models Predicting Future Track Status at Different Tracks

Models’ fitting results based on practical data for each model at one of the analyzed track segments: (
Several conclusions can be observed from Figure 3. Firstly, we find out that OBM data’s variance and frequency are both larger in comparison to those of the TRV data. Secondly, despite the OBM dataset’s high variance, the predicted degradation rate is similar when applying the KF, the Kriging, and the LR models. Thirdly, the KF model, Markov model, and LR model all proposed a similar linear prediction model when monitoring data are available. However, when the monitoring points (fitting using the TRV dataset) are not available, they perform quite differently. From the results in Figure 3, the Kriging model predicts that the TQI will recover with the prediction error being increased. The LR model predicts that the TQI will increase with the same degradation rate, while the prediction error only slightly increases. The KF model predicts that both the TQI value and the related prediction error increased significantly. Fourthly, the Markov model predicted that the TQI would exceed half of the IL after 280 days, which is true since all recorded data recorded after 280 days stay in State 1. However, it also predicts that the TQI will be in State 2 forever, mainly because no data are collected in States 3 and 4. Notice that the Markov models developed based on OBM data are not shown in Figure 3. This is because the model is not formulated because of the lack of monitoring data in State 1.
After formulating the models, we applied the TRV dataset recorded after tamping as the ground truth to estimate the prediction accuracy (RMSE) of the models. The models are translated to the first monitoring point, which is set as the origin. The results are summarized in Table 1. The average prediction accuracies for the TRV dataset from high to low are Kriging > LR > Markov > KF, while for the OBM dataset they are Markov > LR > Kriging > KF. Moreover, the Kriging model tends to perform slightly better than the LR model when dealing with tracks with higher degradation rates (T1, T2), while the KF model sometimes performs better than the LR model when applying OBM data (T1, T3). For the Kriging model, we observe that it performs slightly better when applying TRV data. As for the KF model, it performs better when applying the OBM dataset. The Markov model’s RMSE values vary from case to case, which indicates the model is unstable. Moreover, if we apply the accuracy used in Case I, the results are (62.5%, 100%, 100%, 100%), which is quite good on average. The LR model performance for both datasets is quite similar.
Discussion
In this research, we aim to assist railway operators in using OBM data to predict future track status. The literature review discusses multiple track status degradation models previously applied and their characteristics related to data usage. Four models are later proposed for examination and comparison. The goals are to understand the models’ characteristics when dealing with irregular sampling, sampling frequency, and data error and, moreover, to understand whether commonly used models (LR and Markov) and the newly proposed models (Kriging and KF) can be applied to TRV data and OBM data or not.
For the LR model, which is the most commonly used track degradation model, the resulting analysis shows that it can be easily applied to irregular sampling. The model can also be applied to high-frequency data, where the prediction accuracy may increase as the monitoring frequency increases. When dealing with data with larger errors, the models tend to generate results with a higher RMSE, while the RMSE value is still lower than other proposed models. The major limitation of the LR model is that it is unable to predict nonlinear degradation rates. As stated in Table 1, a track with higher degradation rates tends to degrade nonlinearly, which results in the model having a higher RMSE compared to the Kriging model’s RMSE at tracks T1 and T2. When applying models to the OBM and TRV data, we find that the RMSE values are similar, while the RMSE value from OBM data is slightly larger than that from TRV data. This indicates that the model can be applied using both datasets with similar prediction accuracy.
As for the Markov model, the result shows that it is strongly affected by irregular and unbalanced sampling. This is because the formulation of the cumulative distribution function (CDF) is closely related to the timestep that is used for prediction. When the prediction timestep used for prediction is less than the minimum interval between recorded monitoring points, those samples will be neglected when formulating the cumulative distribution. This eventually results in reducing the accuracy of the prediction. One can improve this process by fitting the discrete CDF with continuous CDF. However, the unbalanced recorded samples will make the proposed CDF biased. For example, in the result obtained in dataset C, most of the samples are collected within State 1 (under the threshold). This eventually results in overestimating the probability of samples staying in State 1 for each timestep and underestimating the degradation process. This type of problem has not been widely discussed by the literature that applied the Markov model for prediction, because the previous studies usually applied data recorded by inspection trains. This type of data is usually periodically recorded two to three times per year, which enables one to assume that all data are recorded uniformly. However, for OBM data, this is usually not the case. In addition, the prediction accuracy of the Markov model is not sensitive to the monitoring frequency and the data error compared to other models. In fact, the number of samples distributed in each state, whether they are distributed uniformly or not, and the size of the states are the factors that significantly affect the model’s prediction result. To sum up, we suggest applying this model with datasets that are periodically collected. The model can also be applied to low-quality data since it can withstand a certain amount of data error. As for determining the state setting of the model, one should consider the distribution of the monitoring points before formulating the model.
The Kriging model performs similarly to the LR models. It can be applied using irregularly and unbalanced samples, while the prediction error significantly increases if no monitoring point is recorded with a long time interval. Unlike the LR models, the Kriging model can better demonstrate nonlinear degradation. It can also smooth the prediction curve, removing a certain amount of data error. However, the Kriging model can be easily affected by extreme values (high data error) since it firmly believes in the monitoring point values. The prediction will become even more biased if no samples are available. Take the results obtained from datasets F and G as an example. The prediction accuracy reduces significantly when samples are removed. Since the number of samples would significantly affect the result for the Kriging model and the high-frequency data can reduce the prediction error, we suggest applying this model using OBM data.
The KF model is commonly applied by using periodic monitoring data since it predicts and updates its prediction by uniform timesteps. In this study, we preprocess the distance between each applied data into multiple timesteps to create the periodically sampled data. The timestep is set small enough to not significantly affect the data. However, the small timestep will result in the model predicting multiple times without an update from monitoring points, which significantly increases the prediction error. Therefore, we strongly suggest applying periodic monitoring data when applying this model. The KF model can predict the linear and nonlinear degradation processes with relatively small prediction errors by receiving a periodic monitoring dataset. On the other hand, the KF model, similar to the Kriging model, is strongly affected by the frequency of sampling. The prediction error will significantly increase when no data points are updated. Therefore, we suggest applying this model using OBM data.
As a result, all four proposed models can be applied by using OBM data. The LR model can provide the most stable and accurate prediction result when the track is assumed to be degraded linearly. When dealing with a heavily loaded or old track, in which the degradation process tends to perform nonlinearly, the Kriging and the KF models can better predict the result. In addition, the KF model performs better when the data is sampled periodically, while both of them require a high-frequency dataset for better prediction. As for the Markov model, it can apply data with lower frequency and data quality. It can also demonstrate a nonlinear degradation rate. The major limitation of the Markov model is that it is strongly affected by the sampling distribution, which may require filters being applied before applying OBM datasets
Conclusion
This study reviews various track status degradation models and proposes several models for examination and comparison to understand the models’ characteristics and performance when dealing with different datasets. The goals are to assist railway operators in using OBM data for predicting future track status. By understanding the degradation models’ characteristics, railway operators can efficiently predict the future track status, which is useful in scheduling maintenance and inspections. The ultimate goal is to attain a lower life cycle cost at the system level while safety and availability are guaranteed.
The literature review shows the application of OBM data for prediction has received more and more attention in academia so far. We argue that the OBM data have higher data collecting frequencies and sensor errors compared to traditionally used TRV data, which requires new models for application. The commonly used model performance on the new dataset should also be considered and analyzed.
The study proposes four models (i.e., the LR model, Markov process model, ordinary Kriging model, and KF model) for predicting future track status based on the application of hypothetical, TRV, and OBM data. The results show that the LR models can withstand irregular sampling and provide the most stable and accurate prediction result when the track status degrades linearly. The Kriging and KF models can better predict the result when the track status degrades nonlinearly, while the prediction accuracy would increase when more samples are available. The Markov and KF models are more likely to be affected by irregular sampling because of their model formulations, whereas the Markov model’s prediction accuracy would be strongly affected by the sample distribution rather than sample frequency.
To sum up, we suggest railway operators apply the LR model for applying OBM data and specifically apply the Kalman or the Kriging model at places (highly loaded or old track) that are most likely to degrade nonlinearly. The Markov model can be applied when the data quality is relatively low, while the state setting of the model should further consider the sampling instances. The analysis of the characteristics of different possible machine learning approaches (for example, the difference between the
Footnotes
Author Contributions
The authors confirm their contribution to the paper as follows: study conception and design: T.H. Yan, M.A. Costa, F. Corman; data collection: T.H. Yan; analysis and interpretation of results: T.H. Yan, M.A. Costa, F. Corman; draft manuscript preparation: T.H. Yan, M.A. Costa, F. Corman. All authors reviewed the results and approved the final version of the manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by a grant (project OMISM) from the ETH Zurich Mobility Initiative and a Swiss Government Excellence Scholarship (grant No. 2021.0152).
