Abstract
Current measures of railway track fixity in the UK remain at a relatively low level of granularity. This paper presents a pilot proof-of-concept study on the development of an integrated computing framework for improving the measurement, prediction, and analysis of profile-specific track fixity in the context of the UK’s rail network. The framework aims to produce a data integration and mining tool that can determine track fixity parameters for any given section of track. In this study, we propose to measure track movement based on point cloud data and assess the track fixity by a set of parameters, such as the direction and rate of the track movement relative to the plane of the rail within a certain period. We seek to integrate a data mining algorithm into the framework to predict these parameters, given vast amounts of disparate and heterogeneous data on potential influencing factors in the area. From the study, we have developed a prototype framework, which allows the rapid implementation of data workflows with the necessary functionality. The feasibility of the prototype was demonstrated by training a random forest model on real data from an approximately 80-km section of the East Coast Main Line, southeast of Edinburgh, Scotland. The modeling results indicate that the curvature, cant, and maximum speed of trains are among the key factors that affect profile-specific track fixity and are critical for predicting and analyzing it.
Introduction
Track fixity, which refers to the degree to which the position of a railway track remains unchanged over time, is one of the key measures used to calculate clearances between rolling stock and structures. In the UK’s rail industry, track fixity is typically presented as a simple value of “low,” “medium,” or “high,” with unrestrained ballast being the lowest and slab track being the highest. The measurement of fixity, however, remains at a low level of granularity, and there is a lack of predictive tools that can provide more detailed, constantly updated information about the movement of tracks through an automated process. As Network Rail, the major railway infrastructure manager in the UK, seeks to improve the accuracy and reliability of its gauging assessments, there is an immediate requirement for improved resolution of the track fixity across its railway network. Achieving this would require explicit computation and prediction of the rate and direction of the track movement within a given period (e.g., six months or longer), depending on the speed and use of the track. With such short-, medium-, and long-term calculations, track engineers would be better able to assess the risk of foul clearance developing in the foreseeable future based on changes in the track alignment and historic surveys, and thus to make better-informed decisions about proactive actions against the risk. The assessment results would in turn help identify and verify key factors influencing the track fixity in the area and thus allow more intelligent prioritization of survey campaigns and preventative maintenance activities when resources are constrained.
In essence, the possibility of some movement (i.e., shift or displacement) of a railway track directly describes the track fixity. In the study discussed in this paper, accordingly, the track fixity is measured with respect to both the rate and direction of the track movement, given a certain confidence level. More specifically, the parameters of the track fixity examined in this study include the displacements of any rail head in both the horizontal and vertical planes (relative to the plane of the rail) within a certain period.
The track movement can be caused by numerous factors. Apart from known factors, such as track geometry ( 1 ), track subgrade ( 2 , 3 ), track conditions ( 4 ), and train dynamics ( 5 – 7 ), there can be an interplay of various factors, which can potentially have some direct or indirect impact on track movement. Better understanding the relative influence of these factors on track movement, and identifying any previously unrecognized risk, requires mining a large volume (e.g., terabytes) and a wide spectrum of data, allied to the elicitation of information from experts. Usually, these data are made available from disparate sources across different rail subsystems, such as engineering structures and rolling stock; many of them, such as LiDAR (light detection and ranging) data, are likely to be cluttered with redundant or irrelevant information, and some may be unstructured. The diversity and unobserved heterogeneity of the data resources often pose a major obstacle to data integration for examining the profile-specific track fixity at a higher degree of granularity. To overcome these difficulties, there is clearly a critical need to develop an integrated computing framework for facilitating a congruent workflow, involving the following: (a) effective integration of the heterogeneous data into a unified view, followed by (b) implementation of appropriate tools for timely calculation of track fixity and prediction of fixities of new structures. Despite the growing salience of such models (e.g., Damme et al. [ 8 ], Young et al. [ 9 ], Hall [ 10 ], and Smarzaro et al. [ 11 ]), few studies in the literature have investigated this subject.
This paper presents a pilot proof-of-concept study that seeks to (a) design a data pre-processing workflow, which enables the smooth integration and management of a structured corpus of data that is relevant to track fixity and (b) create a data mining tool as a prototype, which can assist track engineers in measuring, predicting, and analyzing the track fixity parameters for any given section of a railway track. In this phase of the framework development, we compile a comprehensive database for a representative section of the railway track in the UK. It is used as the fundamental building block of the computing framework for generating track fixity values, which would serve as a basis for all reference curves used across the whole rail network. On this basis, we develop data-driven models to investigate the relationships between potential influencing factors and track movement. Through this analysis, we identify and verify some key parameters that can serve as predictors of future track movements. Accordingly, the present paper is dedicated to the following:
1) proposing an effective metric and method of calculating track movements using LiDAR point cloud data;
2) developing an integrated data model with a machine learning model (e.g., a random forest [RF] model), which is trained on data of calculated track movements and a set of empirically selected relevant factors that potentially influence the movements; and
3) verifying the identified influencing factors on the track movement.
The remainder of the paper consists of three sections. In the following section, we provide detailed information on the data resources used in the study and propose a method for calculating the rate and direction of track movement. Next, we demonstrate the method with a case-study example on a selected section of the railway track of the East Coast Main Line between Edinburgh and the Scottish border. We present a prototype computing framework for track fixity, which considers a selected set of key factors influencing track movement in the UK context. In the concluding section, we summarize the outcomes of the work, discuss limitations of our study, and suggest possible avenues for further research.
Methodology
To address the challenges outlined above, we gathered a wide spectrum of survey data, including track geometry, tonnage, line speed, ballast age, geology, and many other potential factors (e.g., type of ballast materials, track quality measurements, and lineside vegetation) that may influence track fixity. These factors were empirically identified following an interview with Network Rail’s engineering staff. The disparate data resources come in various data formats and can be extracted from different subsystems (e.g., the Civil Asset Register and Reporting System) and data models (e.g., the Corporate Network Model) of Network Rail. Together with historical track positions and movements, they were processed in a unified computing environment (i.e., using the Python programming language). By doing so, a comprehensive database was created, allowing the analysis, restructuring, integration, and cross-referencing of the data associated with the influencing factors, thus enabling the extraction of the spatial-temporal information that is most relevant to track movement. On this basis, we can further construct a new data model specifically for track movement, which incorporates suitable machine learning algorithms to predict gauging issues with a quantifiable confidence level of their risk of occurrence. However, it needs to be pointed out that, in this study, not all of the above-named data resources were thoroughly evaluated for their compatibility and potential for integration into the same data set. Therefore, only a selected set of factors and their data were considered for the development of a prototype model.
Data Integration
The data utilized in the study were from the following four different resources.
a) LiDAR data, which contains point cloud data showing the spatial position of every profile-specific railway track, represented as three-dimensional (3D) geographic coordinates, of rail heads—this data was available in LAS/LAZ file format ( 12 ). With data pre-processing, position data aligning with the rail head was made available for every approximately 1-m length. The positions are described by geographic coordinates measured in the OSGB36 (Ordnance Survey Great Britain 1936) reference system ( 13 ). Note that all the geographic coordinates utilized in this study are OSGB36 based.
b) Survey data that is related to the infrastructure that supports the rails. This data provides basic information such as the curvature, cant, maximum allowable train speed, and axle load, as well as the types of embankments, cuttings, rails, and sleepers used, for track segments of varying length (measured in meters). It was available in CSV file format ( 14 ).
c) Data of structures, including overline and underline bridges, retaining walls, tunnels, and stations—to further clarify, an “overline bridge” refers to a bridge structure that spans over a railway line, while an “underline bridge” refers to a railway bridge that passes over a road. This data encompasses information about their locations, materials, structural forms, and construction details. It was available in DGN file format ( 15 , 16 ).
d) Data of a range of parameters associated with track geometry—this data comprises information on the layout and geographic locations of the railway tracks within the UK’s rail network, as well as reference data that associates the tracks with different infrastructure. It was available in shapefile format ( 17 ).
Noticeably, these resources present distinctly different data and file formats. To integrate the data, we transform them into a unified format and visualize the pre-processed data through the use of the Python programming language (hereafter referred to as Python). All the pre-processed data were stored in a database managed by a PostgreSQL server. On this basis, an application programming interface (API) is established, as a prototype, for further data processing in a fully Python-supported computing environment. Utilizing this approach not only facilitates the efficient storage, retrieval, and extraction of the most relevant information among all the available data, but also offers greater flexibility and extensibility with respect to software engineering for modeling and future development of an integrated computing framework, as compared to using commercial tools. With the prototype API in this pilot study, we shall be able to achieve the following:
1) calculate the displacement of rail heads with respect to both rate and direction;
2) cross-reference the track fixity measures with data of any identified influencing factors (given their availability);
3) integrate these data in both spatial and temporal contexts to create a comprehensive data set; and
4) develop a prototype track fixity prediction model using an appropriate machine learning model and the data set.
To be more specific, the data set can be created at a specified level of resolution, such as 1, 10, or 100 m intervals, by matching the location and time across the different data resources. In this study, the framework relies entirely on open-source tools, including PyHelpers ( 18 ), PyRCS ( 19 ), PyDriosm ( 20 ), LAStools ( 21 ), Laspy ( 22 ), folium ( 23 ), and Open3D ( 24 ), all of which are under free licenses. The method for calculating the displacement is detailed and illustrated in the case-study example in the next section of the paper.
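As a rough illustration of matching by location, the sketch below assigns attributes from variable-length survey records to fixed-length track sections. The record layout (a start position in meters along the track plus a dictionary of attributes), the attribute values, and all function names are assumptions made for illustration; they are not the schema of the source data or the API of the prototype.

```python
from bisect import bisect_right


def attributes_at(position_m, starts, records):
    """Return the attributes of the survey record covering position_m.

    starts must be sorted in ascending order; records[i] is assumed to
    apply from starts[i] up to (but not including) starts[i + 1].
    """
    index = bisect_right(starts, position_m) - 1
    if index < 0:
        raise ValueError("position precedes the first survey record")
    return records[index]


def build_sections(track_length_m, interval_m, starts, records):
    """Build one row per fixed-length section, sampled at its midpoint."""
    rows = []
    start = 0.0
    while start < track_length_m:
        end = min(start + interval_m, track_length_m)
        midpoint = (start + end) / 2
        row = {"start_m": start, "end_m": end}
        row.update(attributes_at(midpoint, starts, records))
        rows.append(row)
        start = end
    return rows


# Hypothetical survey records: start positions and their attributes.
starts = [0.0, 35.0, 80.0]
records = [
    {"curvature": 0.0, "cant_mm": 0, "max_speed_mph": 125},
    {"curvature": 0.0008, "cant_mm": 60, "max_speed_mph": 110},
    {"curvature": 0.0, "cant_mm": 0, "max_speed_mph": 125},
]
sections = build_sections(100.0, 10.0, starts, records)
```

Sampling at the section midpoint is only one possible choice; a length-weighted average over the overlapping records may be preferable where a section straddles two survey records.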
Calculation of Track Movement
Traditionally, the calculation of rail displacements has relied on in situ measurements taken on site; some researchers have also used digital images ( 25 , 26 ). Inspired by the work of Ye et al. ( 27 ), we propose in our study to use LiDAR data to calculate the track movement. The LiDAR data provides comprehensive information on all visible objects within the area that the scanners can reach. It generates a “point cloud,” namely, a set of 3D points, which can be used to describe and represent the shape and relative spatial position of an object, given a reference system such as the OSGB36.
In this pilot study, the point cloud data extracted from the raw LiDAR data is a dense, ordered set of 3D Cartesian coordinates, representing the spatial position of the external surface of the rail heads. Each point is in the format of (X, Y, Z), referring to Easting, Northing, and elevation, respectively. Because wheel–rail contact is where the most significant impact on track movement occurs, the focus in this study is on the top surface and running edge of the rail heads. More specifically, with any two sets of the point cloud data (collected from scanning the same railway track at two different times), we shall be able to reproduce two trajectories, which are also referred to as curves or “polylines,” of the mobile scanner moving along the track. Calculating the track movement is therefore essentially equivalent to measuring the relative displacement (or shift) of one polyline to another formed by the point clouds.
One way to approach this problem is to measure the similarity between the two polylines. In this regard, there are several methods for conducting the comparisons of two given polylines. One possible option is to compute Fréchet distance ( 28 – 30 ), which, however, usually describes the smallest of the maximum pairwise distances. Alternatively, one may consider the Pompeiu–Hausdorff distance (or Hausdorff distance) that, as suggested by Aronov et al. ( 31 ), may not be adequate for measuring the polylines’ similarities. Besides the notion of mathematical distance, one may also utilize statistical theory and consider Kolmogorov–Smirnov statistics ( 32 ), and weighted least squares (i.e., the sum of the absolute differences between observed values and expectations divided by the observed values). The investigation of such approaches is beyond the scope of this pilot study but may usefully be explored in the next phase of the study. In this study, we propose a more straightforward method, which can have two options:
a) to calculate the distance between a point (of a later observed data set) and its nearest line segment formed of two adjacent points (of an earlier observed data set); or
b) to calculate the distance between a line segment formed of two adjacent points (of a later observed data set) and its corresponding one (of an earlier observed data set).
The line segment can be of arbitrary length (e.g., 1, 10, or 100 m) given the granularity of the point cloud data provided. To minimize computing errors, we consider the minimum distance between any two points as a unit line segment. On this basis, we shall be able to calculate the average and/or moving average of track movement at equal distances, or that at any greater distances by aggregating the displacements of their unit line segments. In this study, the unit line segment is approximately 1 m. After an interview with experienced railway track engineers at Network Rail, we adopt method (b) in this pilot study, as illustrated in Figure 1.

Figure 1. Illustration of the displacement (movement) of the top surface of a 1-m length of rail head (color online only).
Figure 1 shows an example of calculating the movement of the top surface of a rail head, with the blue line representing a 1-m rail head measured in October 2019 and the orange line the nearest line segment, measured in April 2020. Taking a point on the blue line—empirically, the centroid—we draw and compute the length of a perpendicular line segment (in dotted purple in Figure 1) from that point to the orange line, and therefore obtain the displacement of the blue line from October 2019 to April 2020. On this basis, the lateral displacement (dotted green) and vertical displacement (dotted red) of the top surface can be easily computed.
It must be noted that (a) the line segments of the same rail head observed/measured at different times would not be necessarily parallel with each other; (b) the two ends of any line segment would not necessarily correspond to the same ones at the different observation times; and (c) the two line segments (e.g., the blue and orange lines in Figure 1) may not be of exactly the same length, because of measuring errors. For these reasons, using the above proposed method would entail offsetting the errors by cutting out a section of the same length for each line segment in either the horizontal or the vertical plane. In that way we could directly connect the two centroids each located on a cut-out section (slightly shorter than 1 m) to obtain a perpendicular line between the two segments.
Besides the method illustrated in Figure 1, there can be different alternatives. For example, one could also project either an end or the centroid of a line segment onto the other line in the space to obtain a perpendicular line, whereby calculations of the lateral and vertical displacements could also be obtained. However, this method is likely to result in more errors when many pairs of line segments are highly mismatched with respect to the spatial position, in which case the cut-out section can be shorter than 0.5 m. In this paper, we consider only the method illustrated in Figure 1.
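The calculation illustrated in Figure 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in the study: the direction of travel is taken from the point order of the earlier segment, the lateral component is measured in the horizontal plane of the coordinate system (ignoring cant, rather than in the plane of the rail), and positive lateral values denote movement to the left of the direction of travel. The function names are hypothetical.

```python
import math


def centroid(segment):
    """Midpoint of a line segment given as two (x, y, z) points."""
    start, end = segment
    return tuple((s + e) / 2 for s, e in zip(start, end))


def segment_displacement(earlier, later):
    """Return (lateral, vertical) displacement in the units of the input.

    A perpendicular is dropped from the centroid of the earlier segment
    onto the line through the later segment; the resulting shift vector
    is split into a horizontal (lateral) and a vertical component.
    """
    c = centroid(earlier)
    a, b = later
    d = [b[i] - a[i] for i in range(3)]
    # Parameter of the foot of the perpendicular along the later line.
    t = sum((c[i] - a[i]) * d[i] for i in range(3)) / sum(v * v for v in d)
    foot = [a[i] + t * d[i] for i in range(3)]
    shift = [foot[i] - c[i] for i in range(3)]

    # Horizontal direction of travel, assumed from the earlier segment.
    ex = earlier[1][0] - earlier[0][0]
    ey = earlier[1][1] - earlier[0][1]
    norm = math.hypot(ex, ey)
    # Left-hand unit normal: direction of travel rotated 90 degrees anticlockwise.
    nx, ny = -ey / norm, ex / norm

    lateral = shift[0] * nx + shift[1] * ny  # positive = leftward (assumed)
    vertical = shift[2]                      # positive = upward
    return lateral, vertical


# Hypothetical 1-m segments (Easting, Northing, elevation) in meters.
earlier = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
later = ((0.0, 0.003, 0.002), (1.0, 0.003, 0.002))
lateral, vertical = segment_displacement(earlier, later)
```

In this example the track runs eastward, so a shift of 3 mm to the north is reported as a leftward movement of 3 mm, alongside a 2 mm rise.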
Predicting Track Movements
The data integration and further data processing generate a comprehensive data set that contains both of the following:
1) calculated results of average displacements for every 1-m track section and for a given period (i.e., six months); and
2) specific information of the associated factors affecting the track fixity, translated into numerical forms.
With the data of calculated track movements for every 1 m, the track fixity is measured as the direction and average rate of movement of the rail head for any track section length of 1 m and above. As part of the computing framework discussed in this study, we propose to further integrate it with an appropriate machine learning model (e.g., an RF model) to establish a functional link between the track fixity and the influencing factors. Such a model should be capable of assisting track engineers in the analysis of the movement of the rail heads over time.
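Aggregating the 1-m displacements into averages over longer sections can be sketched as below. The function names, the choice to discard an incomplete trailing block, and the expression of the rate as millimeters per month are assumptions for illustration only.

```python
from statistics import fmean


def block_averages(displacements_mm, block=10):
    """Average consecutive, non-overlapping blocks of unit-segment values.

    A trailing block with fewer than `block` values is discarded.
    """
    usable = len(displacements_mm) - len(displacements_mm) % block
    return [fmean(displacements_mm[i:i + block]) for i in range(0, usable, block)]


def moving_averages(displacements_mm, window=10):
    """Moving average over a sliding window of unit-segment values."""
    count = len(displacements_mm) - window + 1
    return [fmean(displacements_mm[i:i + window]) for i in range(count)]


def monthly_rate(displacement_mm, months=6):
    """Average rate of movement over the survey interval (mm per month)."""
    return displacement_mm / months


# Hypothetical signed displacements (mm) for twenty 1-m segments.
unit_values = [1.0] * 10 + [3.0] * 10
ten_metre = block_averages(unit_values, block=10)   # one value per 10 m
smoothed = moving_averages(unit_values, window=10)  # one value per 1-m step
```

Because the values are signed, averaging preserves the direction of movement, but opposite movements within one block partly cancel; averaging absolute values instead would report magnitude only.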
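Given the immediate availability of data in this pilot study, we consider the following nine variables as major influencing factors: curvature, cant, maximum allowable axle load, maximum allowable train speed, and the presence of overline bridges, underline bridges, retaining walls, tunnels, and stations.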
A Case-Study Example
This section demonstrates the methodology proposed in the previous section with a case-study example in the context of the UK’s railway system.
Case-Study Region
To develop and demonstrate our proposed computing framework, we selected an approximately 80-km section of railway track along the East Coast Main Line in the UK. This section of track, as highlighted in blue in Figure 2a, passes through four stations located between Prestonpans and Berwick-upon-Tweed, southeast of Edinburgh in Scotland, and is primarily used for passenger rail, although there may also be some freight train traffic. In this study, we refer to this section of track as the “example rail line” (or “example track”), which we treat as a representative of the entire network. The area highlighted in yellow in Figure 2a is magnified in Figure 2b.

Figure 2. Illustration of an 80-km section of the East Coast Main Line and its survey grids for the case study: (a) an 80-km section of the East Coast Main Line for the case study, (b) the west end of the track section, and (c) an example 100 m × 100 m survey grid (color online only).
The point cloud data for the example rail line was available for two survey periods: October 2019 and April 2020. The data was collected from about 2000 survey grids, each of which was a 100 m × 100 m area measured in the OSGB36 reference system, as illustrated in Figure 2b. In this study, we randomly selected a survey grid, highlighted in yellow in Figure 2b, and magnified it in Figure 2c. The example grid was originally labeled “Tile_X+0000340500_Y+0000674200,” indicating that its lower left corner is located at the OSGB36 coordinates (340500, 674200). We used this example grid to further demonstrate the framework and methodology for calculating and predicting track movement.
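The tile naming convention described above can be handled with a few helper functions, sketched here. The regular expression assumes that every tile name follows exactly the pattern of the example (a “+” sign followed by a zero-padded, ten-digit integer); the function names are hypothetical.

```python
import re

TILE_PATTERN = re.compile(r"^Tile_X\+(\d+)_Y\+(\d+)$")
TILE_SIZE_M = 100  # each survey grid covers 100 m x 100 m


def tile_origin(tile_name):
    """Return the OSGB36 (Easting, Northing) of the lower left corner."""
    match = TILE_PATTERN.match(tile_name)
    if match is None:
        raise ValueError(f"unrecognized tile name: {tile_name!r}")
    return int(match.group(1)), int(match.group(2))


def tile_contains(tile_name, easting, northing):
    """Check whether a point falls inside the given tile."""
    x0, y0 = tile_origin(tile_name)
    return (x0 <= easting < x0 + TILE_SIZE_M
            and y0 <= northing < y0 + TILE_SIZE_M)


def tile_name_for(easting, northing):
    """Build the name of the tile that would contain a given point."""
    x0 = int(easting // TILE_SIZE_M) * TILE_SIZE_M
    y0 = int(northing // TILE_SIZE_M) * TILE_SIZE_M
    return f"Tile_X+{x0:010d}_Y+{y0:010d}"


example = "Tile_X+0000340500_Y+0000674200"
origin = tile_origin(example)
```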
Track Movement
The raw LiDAR data from the example grid, Tile_X+0000340500_Y+0000674200, for the two survey periods, October 2019 and April 2020, are shown in Figure 3, a and b, respectively. The colors in the figures represent different elevations of objects, where warmer colors (e.g., red, orange, and yellow) indicate relatively higher elevations, while cooler colors (e.g., green, blue, and purple) indicate relatively lower elevations. By comparison with Figure 2c, it can be inferred that the blue lines in both Figure 3, a and b, represent the railway tracks; the warmer colors surrounding the tracks represent mostly lineside vegetation, which had notably grown considerably higher since October 2019.

Figure 3. Representation of the rail heads based on their point cloud data within the example 100 m × 100 m grid (340500, 674200): (a) raw LiDAR (light detection and ranging) data (October 2019), (b) raw LiDAR data (April 2020), (c) point cloud of the rail heads (October 2019), (d) point cloud of the rail heads (April 2020), (e) polyline based on the point cloud (October 2019), and (f) polyline based on the point cloud (April 2020) (color online only).
Further, Figure 3, c and d, illustrates the point cloud data of the rail heads in Figure 3, a and b, respectively. We linked every two adjacent points in sequential order to create a polyline for each of the elements, including the top surfaces, running edges, and the center of a track, as illustrated in Figure 3, e and f. On this basis, we could use the method described in the methodology section to calculate the displacements of each of the polylines.
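Linking adjacent points into a polyline and pairing the segments from the two surveys can be sketched as follows. Pairing each later segment with the earlier segment whose midpoint is nearest is an assumption about how “corresponding” segments might be found, and the brute-force search is for clarity only; the function names are hypothetical.

```python
import math


def to_segments(points):
    """Link every two adjacent points, in order, into line segments."""
    return list(zip(points, points[1:]))


def midpoint(segment):
    """Midpoint of a segment given as two (x, y, z) points."""
    start, end = segment
    return tuple((s + e) / 2 for s, e in zip(start, end))


def match_segments(earlier_points, later_points):
    """Pair each later segment with its nearest earlier segment.

    Returns a list of (earlier_segment, later_segment) tuples. The
    search compares every pair of midpoints, which is quadratic in the
    number of segments.
    """
    earlier = to_segments(earlier_points)
    later = to_segments(later_points)
    pairs = []
    for segment in later:
        centre = midpoint(segment)
        nearest = min(earlier, key=lambda other: math.dist(centre, midpoint(other)))
        pairs.append((nearest, segment))
    return pairs


# Hypothetical rail-head points (Easting, Northing, elevation) in meters.
survey_2019 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
survey_2020 = [(0.1, 0.002, 0.0), (1.1, 0.002, 0.0), (2.1, 0.002, 0.0)]
pairs = match_segments(survey_2019, survey_2020)
```

For a track section with tens of thousands of points, a spatial index (for example, a k-d tree) would be a more practical way to find the nearest segment.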
For example, consider the top surface of the rail head of the left-hand rail of the track (hereafter referred to as “left rail top”) in the Up direction, in which all trains run toward Edinburgh. Note that in the UK, the terms “Up” and “Down” are conventionally used to indicate, respectively, the directions of trains running toward and away from a major destination, such as Edinburgh or London. This predefined reference system is used across the entire UK rail network and provides a convenient and straightforward way for on-site railway staff to conduct inspections and maintenance work. In our case-study example, Edinburgh is identified as the major destination, and the direction in which trains run toward it is thus referred to as the Up direction.
Figure 4 shows violin plots for the calculated movements of the left rail top of the example 80-km track in the Up direction. The plots depict the probability density and boxplot information of average displacements in both lateral and vertical planes for every 10-m track section between October 2019 and April 2020. The direction of track movement is indicated by a positive (+) or negative (−) sign, where a positive sign (+) denotes that the track had moved toward the left in the Up direction since October 2019, and a negative sign (−) denotes movement to the right.

Figure 4. Violin plot of the average lateral and vertical track movements for every 10-m section of the example track.
As illustrated in Figure 4, the lateral track movement exhibited an average displacement of less than 5 mm per 10-m track section, with the maximum displacement of nearly 20 mm observed in the rightward movement (i.e., movement of the left rail top toward the center of the track). Notably, there was a greater range of values for leftward movement (i.e., movement of the left rail top away from the center of the track), which may indicate a need for further investigation into the underlying factors contributing to this variation. On the vertical track movement, however, the 10-m average displacement was mostly around 2 mm or less; a few extreme values ranging from 5 to 15 mm were also observed.
Based on the calculation results, Figure 5, a and b, shows the hotspots of significant lateral and vertical displacements of the left rail top, respectively. The color scale ranges from cooler colors like green, indicating relatively lower rates of movement, to warmer colors like yellow for moderate rates and red for higher rates. It should be noted that these heatmaps are based only on the absolute values of the calculated track movements.

Figure 5. Average track movement (mm per 10-m length) of the top surface of rail head of the left-hand rail in the Up direction (October 2019 versus April 2020): (a) lateral track movement and (b) vertical track movement (color online only).
Further to the calculation of the track fixity parameters, we proceeded to integrate all the available data of the several selected factors influencing the track fixity to create a comprehensive data set for developing a prototype machine learning model capable of analyzing the movement of the rail heads.
Prototype Predictive Model for the Region
This section describes how we could develop a prototype model that is capable of predicting the track movement under specific conditions, drawing on a comprehensive data set of recorded asset data, local operational factors, and a knowledge of the track design.
Influencing Factors and Modeling
For the data integration of influencing factors, we created a series of contiguous circular “buffer zones” overlaying the example track, as illustrated in Figure 6. Each of the buffer zones represents a virtual area surrounding a specific section of the track, where we calculate the track movement and collect data of any factors that may influence the track fixity. The diameter of the buffer zone is equal to the length of the track section (or rather, the rail head), for which average track movement is calculated. This approach allows us to gather information on influencing factors and associate them with the track movement within the same designated buffer zone. For instance, in Figure 6, the green dots indicate the presence of overline bridges within each of the buffer zones. It should be noted that the buffer zones in Figure 6 have a diameter of 1 km, which is used for demonstration purposes only.

Figure 6. Illustration of buffer zones and the presence of overline bridges on the example rail line (color online only).
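The buffer-zone association can be sketched as follows, assuming that each zone is centered on the midpoint of its track section and that each structure is represented by a single point location. Both assumptions, as well as the function names and coordinates, are illustrative only.

```python
import math


def buffer_zones(section_midpoints, section_length_m):
    """Create circular zones whose diameter equals the section length.

    Each zone is returned as (center_easting, center_northing, radius).
    """
    radius = section_length_m / 2
    return [(x, y, radius) for x, y in section_midpoints]


def flag_presence(zones, structure_locations):
    """Return 1 for each zone containing at least one structure, else 0."""
    flags = []
    for cx, cy, radius in zones:
        inside = any(
            math.hypot(sx - cx, sy - cy) <= radius
            for sx, sy in structure_locations
        )
        flags.append(int(inside))
    return flags


# Hypothetical midpoints of three contiguous 10-m sections, and one
# hypothetical overline bridge location, in local coordinates (meters).
zones = buffer_zones([(5.0, 0.0), (15.0, 0.0), (25.0, 0.0)], 10.0)
overline_bridge_flags = flag_presence(zones, [(16.0, 3.0)])
```

The resulting 0/1 flags can then be stored alongside the averaged track movement of the same section as one of the structural presence factors.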
As mentioned in the methodology, our modeling trial in this case-study example was limited to nine factors, including curvature, cant, and maximum allowable train axle load and speed, as well as the presence of overline bridges, underline bridges, retaining walls, tunnels, and stations. To create a prototype predictive tool for the track movement in this study, we trialed an RF model ( 35 ), considering only the nine factors.
The RF model is one of the most popular machine learning methods used in many applications ( 36 ). We chose the RF as a starting point from among various machine learning models because of its robustness and interpretability. In comparison to other models, the RF is less susceptible to overfitting and is capable of handling irrelevant factors in the data. Moreover, the output of an RF model can offer valuable insights into which factors are most important for making predictions. In essence, an RF model is an ensemble learning method that combines a set of decision tree models (hereafter referred to as “trees”), each of which may have insufficient individual competence in using the same data for making predictions ( 37 ). More specifically, an RF model evaluates all the predictions made independently by its component trees and provides a comprehensive prediction result ( 38 ).
Take, for example, the lateral displacement of the left rail top. Following consultation with Network Rail engineers, we categorized the displacement into five ranges, namely “≤−4.45 mm,” “(−4.45 mm, −3.5 mm],” “(−3.5 mm, −2.5 mm],” “(−2.5 mm, 0.0 mm],” and “>0.0 mm.” Note again that a negative sign (−) indicates that the left rail top moves rightward in the Up direction, and a positive sign indicates leftward movement. The predicted class for a given case corresponds to the class with the highest probability across all the decision trees in the RF model. In this case-study example, a total of 6792 valid cases of 10-m track movements were obtained; we shuffled and divided the data set into a training set of 5433 cases and a test set of 1359 cases. We trained a commonly used RF classifier on the training set using a Python package, scikit-learn ( 39 ), considering different numbers of individual trees (i.e., 50, 150, 200, 250, 300, and 350) and different maximum allowed depths of the trees (i.e., 5, 10, 15, 20, 25, and 30). Through an exhaustive search over all combinations of the two sets of values, with each combination evaluated by five-fold cross-validation on the training set, we identified the best RF model, which was formed of 300 decision trees, each with a maximum allowed depth of 15.
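The training procedure described above can be sketched with scikit-learn as follows. This is a minimal sketch rather than the code used in the study: the 80/20 split ratio is inferred from the reported set sizes (5433 + 1359 = 6792), the default accuracy scoring and the random seed are assumptions, and the function and constant names are hypothetical.

```python
from bisect import bisect_left

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Upper edges (mm) of the first four right-closed ranges; values above
# the last edge fall into the fifth class (> 0.0 mm).
BIN_EDGES_MM = [-4.45, -3.5, -2.5, 0.0]
N_ESTIMATORS_GRID = [50, 150, 200, 250, 300, 350]
MAX_DEPTH_GRID = [5, 10, 15, 20, 25, 30]


def displacement_class(displacement_mm):
    """Map a lateral displacement to a class index from 0 to 4."""
    return bisect_left(BIN_EDGES_MM, displacement_mm)


def fit_random_forest(features, displacements_mm,
                      n_estimators_grid=N_ESTIMATORS_GRID,
                      max_depth_grid=MAX_DEPTH_GRID, seed=0):
    """Grid-search an RF classifier with five-fold cross-validation.

    Returns the fitted search object and the accuracy on the held-out
    test set.
    """
    labels = [displacement_class(value) for value in displacements_mm]
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, shuffle=True, random_state=seed
    )
    param_grid = {
        "n_estimators": list(n_estimators_grid),
        "max_depth": list(max_depth_grid),
    }
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed), param_grid, cv=5
    )
    search.fit(x_train, y_train)
    return search, search.score(x_test, y_test)
```

Here `features` would hold one row of the nine influencing factors per 10-m section. After fitting, `search.best_params_` gives the selected combination, and `search.best_estimator_.feature_importances_` gives relative importance values that sum to one.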
Results
The results from the trained RF model presented valuable evidence of the relative importance of the identified factors on predicting track fixity, which could be used to better understand how much impact these factors would have on the track fixity and therefore the prediction of future track movement.
To assess the overall performance of the RF model in predicting the lateral track movement, Figure 7 shows a confusion matrix on the test data set, which compares the values predicted by the model with the actual (calculated) values. For instance, the model predicted that the lateral displacement of 255 of the 10-m left-rail-top sections fell within the range of (−4.45 mm, −3.5 mm], which was consistent with their corresponding calculated values. Because of limited data availability, the model’s overall accuracy is only around 50%. However, the confusion matrix shows that most cases are concentrated on or near the diagonal, indicating that the model’s predictions of lateral displacements mostly fall in the correct class or a neighboring one. The output of the trained RF model therefore demonstrates promising predictive capability and suggests that the proposed computing framework has considerable potential as a tool for predicting track fixity and further exploring its sensitivity to the influencing factors.

Figure 7. Confusion matrix on the test data set.
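How closely the predictions cluster around the diagonal of a confusion matrix can be quantified as sketched below, by counting predictions that fall within a given number of classes of the actual class. The class indices (0 to 4 for the five displacement ranges), the function names, and the example labels are assumptions for illustration and do not reproduce the results in Figure 7.

```python
def confusion_matrix(actual, predicted, n_classes=5):
    """Count cases by actual class (rows) and predicted class (columns)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def accuracy_within(matrix, tolerance=0):
    """Share of cases predicted within `tolerance` classes of the truth.

    tolerance=0 gives the usual accuracy (diagonal only); tolerance=1
    also counts predictions in a neighboring class.
    """
    total = sum(sum(row) for row in matrix)
    hits = sum(
        count
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if abs(i - j) <= tolerance
    )
    return hits / total


# Hypothetical class labels for six test cases.
actual = [0, 1, 2, 3, 4, 2]
predicted = [0, 2, 2, 4, 0, 1]
matrix = confusion_matrix(actual, predicted)
exact = accuracy_within(matrix, tolerance=0)
near = accuracy_within(matrix, tolerance=1)
```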
The importance of the different influencing factors (relative to all the others) being considered in the RF model is presented in Table 1, where the values associated with the different factors sum to one. The table is sorted in descending order of the importance values; the higher the value, the more significant the impact that the corresponding factor can exert on the track fixity in the lateral plane.
Table 1. Relative Importance of Factors Influencing the Track Fixity
As expected, curvature and cant proved to be the most important of all the factors considered in the model. Besides the track geometry, axle load and train speed would also be expected to have a significant impact on track fixity (33). With respect to the presence of structures, ballasted track in open sections can be more vulnerable to movement than track at fixed structures, such as retaining walls and tunnels, which may offer a greater degree of track bed stability. Track sections within station areas are much less likely to suffer from fixity issues, given the much slower train speeds and the lack of track curvature.
In summary, the integrated computing framework in the current phase of this study is highly flexible and can be developed further as more, and more adequate, data become available across the railway network. There remains, however, a need to trial and validate the data model in different areas of the railway network to help further refine the methodology.
Conclusion and Discussion
This paper presents a pilot study that seeks to design the most comprehensive integrated computing framework to date for track fixity in the context of the UK’s railway system, allowing for a step change in the temporal and spatial resolution of understanding profile-specific track fixity. Although it is currently at the prototype stage, the design of the data flow pipeline enables it to gather, process, and combine as much relevant information as possible for track fixity. With the established workflow, we propose a new metric for assigning track fixity values to a given track profile with respect to its movement relative to the plane of the rail within a given period. An RF model was trialed to predict future movement of the track. The prototype framework has demonstrated its capability of predicting track movement with a level of confidence acceptable for engineering purposes, with most movements classified either in the correct predefined range or within a single bin width of the true value (based on calculation with point cloud data). There is thus the potential to use the framework to explore the sensitivity of track fixity to the factors affecting it and to calculate future track fixity for new structures. In addition, the prediction model developed in this research is highly adaptable to different contexts, provided that similar data resources are available for the factors that the model accounts for.
However, it is recognized that there are three main limitations in this pilot study. Firstly, the calculation of track movement was based only on 3D geographical coordinates of the rail head position, which were made available in the form of point cloud data. Errors generated during data collection, as well as discrepancies between data collected at different times, are unavoidable, even when the same geographical coordinate system is used. As the actual track movement is mostly within a few millimeters, it may even be smaller than the measurement error at the same point. On the one hand, the errors can depend largely on the technology used for collecting the data; on the other hand, the prediction model was not intended to predict the precise movement but rather the range, within a predefined set of displacement ranges, into which the movement would be most likely to fall. Whether the measurement errors can be fully contained within the predefined ranges would be worth further investigation for a specific data collection technology. Secondly, the data integration for the model development relies heavily on mapping heterogeneous information (about the factors influencing track movement) onto the same geographical system. Because of a deficiency of accurate location referencing data, it was not possible to consider all influencing factors in the prototype model. Thirdly, data were available for only two time periods (October 2019 and April 2020). Therefore, to further the development of the framework, the following issues should be addressed.
1) The quality of reference data across the various location identifiers used in different data resources should be improved.
2) The comprehensive data set should be extended to include additional line sections with differing reference curves and speed profiles. In this way, it will become possible to test the potential of the model in a more general context, and to begin to (a) determine what and how many prediction models would need to be trained to obtain national coverage and (b) investigate the trade-off between the type and number of models and the individual discriminatory power of each instance.
3) Data from additional measurement campaigns should be used to develop the existing model further, leading to improved accuracy and greater confidence in the results produced by the model.
4) In the longer term, a more harmonized and unified data codification system across the rail industry would be required to accelerate the further development of the proposed framework and the implementation of a full-fledged, automated computing platform to be integrated into the railway track system.
Footnotes
Acknowledgements
The authors extend their sincere gratitude to Network Rail and its staff for providing the data for this study, as well as for their continued support. Special thanks are owed to Dr. Huan Tong for her kind assistance in the validation process.
Author Contributions
The authors confirm contribution to the paper as follows: study conception and design: Q. Fu, J.M. Easton, M.P.N. Burrow; data collection: J.M. Easton, M.P.N. Burrow; analysis and interpretation of results: Q. Fu; draft manuscript preparation: Q. Fu, J.M. Easton. All authors reviewed the results and approved the final version of the manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors would like to express their gratitude to Network Rail for funding this study.
