Sage Journals: Discover world-class research

Abstract

This paper proposes a data-centric method for identifying geometric defects in railway tracks using acceleration data collected from high-speed trains. Unlike many existing drive-by monitoring approaches that rely on classical supervised learning models requiring extensive labeled data from every line, a novel framework based on unsupervised domain adaptation (UDA) is proposed. This framework transfers the geometric defects diagnosis model learned from one line (source domain) to a new line (target domain) without necessitating labeled data from the latter. Given the variability in operational conditions, a detection model trained on one known scenario cannot be directly applied to another. Thus, the framework learns features sensitive to geometric defects and invariant to different tracks using the progressive distribution alignment based on label correction (PDALC) algorithm. Input data comprises labeled time-domain features extracted from acceleration data of the source line and unlabeled data from the target line. Output predictions are health status (target domain labels) for each track zone of the target line. The framework is evaluated using a comprehensive dataset of field measurements from a high-speed train traversing four different lines of France’s high-speed rail network, representing four distinct domains. Comparative results across 12 cross-domain recognition tasks reveal that the UDA framework based on PDALC outperforms four other UDA algorithms: transfer component analysis, maximum mean and covariance discrepancy, learning via low-rank and sparse representation, and geodesic flow kernel algorithm. Compared to the basic method (a classical supervised learning model without UDA), the proposed framework achieves a 12% increase in defect detection accuracy. Furthermore, the paper investigates the impact of different sensor layouts, tuning parameters of PDALC, classification algorithms, and the number of features on the accuracy of the approach.

Keywords

Unsupervised domain adaptation, drive-by approaches, data driven, track geometry defects, high-speed rail network

Introduction

Railway systems serve as critical infrastructure assets, necessitating ongoing monitoring of track networks to ensure the safety of passengers and to minimize maintenance expenses through early fault detection.¹ Tracks, integral components of railway systems, are engineered with specific geometric parameters that can vary along routes due to topographical constraints. Operational tracks can deteriorate over time due to multiple factors, resulting in deviations from their intended geometry that need to be rectified to maintain passenger comfort and safety. Regular inspections are carried out to assess the condition of the tracks, identify degradation, and detect faults, utilizing both manual inspection methods and specialized equipment like track geometry cars (TGCs) or track recording vehicles (TRVs). Generally, specific fault thresholds and statistical indicators of track quality are defined, prompting maintenance actions when these thresholds are surpassed.² While TRVs offer precise data, their fleet sizes are often limited. Inspections may occur infrequently depending on the track category; for instance, in the United Kingdom, track geometry recordings may have a maximum interval of up to 52 weeks,¹ with more vulnerable lines undergoing more frequent assessments. TRVs may necessitate line closures if operating below line speed. Failure to detect faults between inspections can compromise ride quality and result in higher maintenance expenses than if identified earlier.³

To address these challenges, sensors have been installed on operational trains to monitor track geometry.⁴ Known as “onboard” or “drive-by” monitoring techniques, these methods offer continuous data collection, although with fewer parameters measured and reduced accuracy compared to TRVs. Nevertheless, the increased frequency of data collection holds potential for enhancing preventative maintenance practices.⁵

In onboard techniques, accelerometers mounted on the train measure vibrations transmitted from the interaction between the wheel and rail.^6,7 These methods offer scalability in sensing, requiring minimal cost and maintenance, as each equipped in-service train can effectively monitor multiple lines.⁸ However, drive-by data contain higher levels of noise due to the indirect nature of the measurement and the influence of various environmental and operational variations (EOVs).⁹ Thus, data analysis techniques play an important role in reducing these measurement uncertainties. Data-driven approaches applied to the dataset recorded by a high-speed train in France show that the drive-by scheme can be used to monitor track irregularities.^1,10 In addition, acceleration responses recorded in field tests⁷ or obtained from a train–track–bridge interaction model¹¹ can be used to monitor the frequencies of railway bridges¹² and to detect scour in bridge piers.¹³

On the other hand, in recent years, machine learning-based techniques have gained attention for track condition monitoring. Pires et al.¹⁴ proposed a data-driven approach to estimating geometric track irregularities using instrumented railway vehicle data, employing eight different regression machine learning models. De Rosa et al.¹⁵ utilized vehicle acceleration response data and track irregularities to establish severity thresholds. Similarly, Paglia et al.¹⁶ proposed a condition monitoring approach for railway tracks that involves estimating vertical track alignment. This is achieved by analyzing the vertical acceleration of the bogie and linking synthetic indicators, which represent vehicle dynamic behavior, to track geometry measurements obtained from a diagnostic train. Tsunashima¹⁷ applied vertical and lateral accelerations, along with the carbody roll rate, to identify degraded track segments and faults using support vector machines and clustering methods. Yuan et al.¹⁸ introduced a method using axle box data for squat detection, employing convolutional variational autoencoders to extract damage-sensitive features, followed by anomaly detection algorithms in the latent space. These studies highlight the potential of machine learning approaches for track defect detection.

However, a key limitation in these approaches is the assumption that training and testing data are independent and identically distributed (IID). In practice, training data often come from specific conditions, while testing data can encompass diverse operational and environmental scenarios, violating the IID assumption and leading to domain shift between datasets.¹⁹ Directly applying models trained on existing labeled data to new datasets (e.g., different track lines) may result in poor performance. While new labeled data could improve model accuracy, collecting such data, particularly damage-state data, is often impractical.²⁰

To address these challenges, transfer learning (TL) has emerged as a powerful strategy for handling domain shift. TL enables models to leverage knowledge from previous tasks and apply it to new, similar tasks, improving performance.²¹ A specific type of TL, known as domain adaptation (DA), aligns data distributions from the source and target domains, allowing the model to generalize well to unseen data by either adjusting sample weights or mapping data into a shared feature space.²² Specifically, for multi-line monitoring, DA transfers a damage diagnostic model learned from one line to other lines, eliminating the need for labeled training data from every line in the railway network. In recent studies, DA has been applied to structural damage detection across different domains.^20,22 For instance, Giglioni et al.²³ developed a DA approach for bridge monitoring, transferring damage knowledge between monitored bridges. They used DA techniques like joint DA (JDA) to align feature spaces between source and target domains while minimizing distribution discrepancies, thereby enabling damage detection across different bridge structures even when exposed to varying operational and environmental conditions. The study validated this approach on two benchmark bridges (Z24 and S101) and their finite element models, demonstrating that DA can significantly improve damage classification in structural networks. Similarly, Yano et al.²¹ applied DA for TL across bridges, showing that even with distributional shifts, domain-invariant features could be extracted, enabling successful knowledge transfer across different bridge conditions.

In railway engineering, DA has been applied to damage detection in railway vehicles. For instance, Yu et al.²⁴ used conditional adversarial DA to predict faults in gearboxes and shafts at varying running speeds. Qin et al.²⁵ proposed a Stepwise Adaptive Convolutional Network for fault diagnosis of high-speed train bogies under varying speeds. The model leverages DA to handle changing operational conditions, achieving a fault classification accuracy of 96.1% for key components like air springs, anti-yaw dampers, and lateral dampers. Chen et al.²⁶ introduced a semi-supervised adversarial DA model for assessing high-speed train wheel conditions under varying operational environments. By leveraging both labeled and unlabeled data, the model reduces discrepancies between different operational conditions, ensuring reliable performance in wheel condition monitoring. Their model outperformed baseline approaches in cross-domain assessments using real-world onboard monitoring data from the Lanxin high-speed rail line. Jiang et al.¹⁹ also proposed a DA approach to diagnose the health conditions of maglev rail joints under complex operational conditions using an unsupervised discrepancy-based DA network. This approach was validated on a dataset of time–frequency spectrograms derived from experimental acceleration data of maglev rail joints. It successfully identified two conditions: bolt-looseness-caused rail step and misalignment-caused lateral dislocation.

In this study, we propose a novel framework based on a DA architecture for detecting geometrical defects across multiple railway lines. The framework uses the track line with labeled data as the “Source Domain” and applies the model to a new, unlabeled track line, referred to as the “Target Domain.” To achieve this, unsupervised DA (UDA) is employed, which transfers the model trained on the source domain’s labeled data (e.g., train vibration data with damage labels) to predict defects in the target domain without the need for labeled data.⁹ By addressing the domain shift, the proposed UDA method enables effective defect detection across railway networks, eliminating the need for extensive new labeling efforts.

The proposed framework has three modules. In the first module, the data for UDA are prepared by cleansing the data and extracting time-domain features from the raw acceleration responses of various sensors on the body and bogies of a train. In the second module, a UDA algorithm is trained to obtain domain-invariant and class-discriminative feature representations from the input data. This is carried out using a novel UDA method named progressive distribution alignment based on label correction (PDALC).²⁷ PDALC leverages class-discriminative information to conduct subspace learning, facilitating the acquisition of a domain-invariant subspace.²⁷ Additionally, it incorporates a pseudo-label correction mechanism to assess the reliability of pseudo labels and rectify any inaccuracies.²⁷ In the final module, the domain-invariant features extracted by PDALC are used as input to the damage classifier to predict the damage states of the target line.

The proposed framework is validated using extensive field measurements obtained from a specialized high-speed measuring train operated by the Société Nationale des Chemins de Fer Français (SNCF).^28,29 The dataset includes track geometry measurements recorded by the track geometry measuring system (TGMS) and acceleration responses collected by multiple accelerometer sensors installed on the IRIS320, a high-speed instrumented train used on the French high-speed line (HSL).^29,30

The IRIS320 train functions as a mobile probe, capturing real-time vibration and sensor data during regular operation to detect and analyze track conditions. In previous work, it was demonstrated that a data-driven algorithm using the instrumentation from the IRIS320 was effective in detecting geometric defects in track zones along a specific line in France.^4,31 For this paper, the dataset recorded on four different lines is used as the main case study, and the results of 12 different cross-domain recognition tasks in the UDA framework are assessed.

In addition, to compare the performance of PDALC with other UDA methods, its results are compared with those obtained from four other well-known algorithms: learning via low-rank and sparse representation (LRSR),³² geodesic flow kernel (GFK),³³ transfer component analysis (TCA),³⁴ and maximum mean and covariance discrepancy (McDA).³⁵

The outline of this paper is as follows: the second section formulates the domain shift problem and discusses the concept of condition monitoring in a UDA system and the formulation of PDALC, as proposed by Li et al.²⁷ The third section presents details of the onboard monitoring system on the IRIS320 high-speed train. The proposed UDA framework is presented in the fourth section. In the fifth section, a series of comparisons is conducted on real-world datasets across 12 different tasks to verify the effectiveness and efficiency of the proposed method. Finally, the sixth and seventh sections present the discussion and conclusion of the work.

Theoretical background

Problem statement based on the DA concept

In machine learning, a domain ( $D$ ) refers to a feature space $X$ , characterized by a marginal distribution $P (X)$ , where samples are drawn as $X = \{x_{1}, x_{2}, \dots, x_{n}\} \in X$ . Each domain is tied to a task ( $T$ ) that involves learning a conditional distribution $P (Y | X)$ , where $Y$ represents the label space, containing samples $Y = \{y_{1}, y_{2}, \dots, y_{n}\} \in Y$ . In typical machine learning models, there are two primary domains: the source domain ( $D^{s} = \{X^{s}, P (X^{s})\}$ ) with task $T^{s}$ , and the target domain ( $D^{t} = \{X^{t}, P (X^{t})\}$ ) with task $T^{t}$ . Traditional methods assume that the task and domain are identical across both source and target ( $T^{s} = T^{t}$ and $D^{s} = D^{t}$ ), using labeled data $\{x_{i}, y_{i}\}$ . However, when faced with task variations $T^{s} \neq T^{t}$ or domain shifts $D^{s} \neq D^{t}$ , the performance of these models declines. DA addresses this by utilizing knowledge from both $D^{s}$ and $D^{t}$ .¹⁹

DA assumes that while the tasks in the source and target domains are identical ( $T^{s} = T^{t}$ ), the domains themselves are different ( $D^{s} \neq D^{t}$ ). DA is categorized based on the divergence between domains. In homogeneous DA, the feature spaces remain the same ( $X^{s} = X^{t}$ ) but the data distributions differ ( $D^{s} \neq D^{t}$ ), while in heterogeneous DA, the feature spaces differ ( $X^{s} \neq X^{t}$ ).² Figure 1 illustrates how DA can mitigate domain shift in a railway fault detection scenario. Here, the feature space is made up of data points representing various defect categories. Without DA, a model trained on source domain data may misclassify target domain data. To overcome this, DA aligns features from both domains into a shared feature space, allowing the model to generalize better to the target domain and reduce classification errors.

Figure 1.

Illustration of how DA addresses domain shift in fault detection. DA: domain adaptation.

One of the key challenges in railway track monitoring and structural health monitoring (SHM) applications is the variability in the operating conditions across different environments, structures, or time periods. For example, the vibration characteristics of a track section can vary significantly depending on external factors like weather, load conditions, or geographic location.¹ In the case of UDA, where labeled data are not available for the target domain, this variability poses a significant challenge. The ability to transfer knowledge from a source domain (such as a specific railway line or a numerically simulated environment) to a target domain (such as another railway line or operational environment) without labeled data becomes crucial for effective and scalable monitoring systems. UDA methods are particularly useful in these contexts, as they allow models to account for these variations without requiring extensive manual labeling efforts in each new domain.

In SHM, the source domain $D^{s}$ often consists of labeled data obtained from numerical simulations or specific structural conditions.³⁶ The target domain $D^{t}$ consists of unlabeled data collected from experimental structures or operational states. In such cases, DA aims to classify target domain data using knowledge transferred from both source and target datasets.³⁶ In this study, $D^{s}$ comprises labeled vibration data from a high-speed train’s onboard monitoring system for a specific railway line, while $D^{t}$ contains unlabeled data from the same system but for different lines. The assumption is that the distributions between the two domains are different ( $D^{s} \neq D^{t}$ ), though they share the same label space. Specifically, the input distributions between the source and target domains differ, $p (X^{s}) \neq p (X^{t})$ , and the conditional distributions for inference may also vary, indicated by $p (Y^{s} | X^{s}) \neq p (Y^{t} | X^{t})$ .
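As a rough illustration of the marginal shift $p (X^{s}) \neq p (X^{t})$ described above, the sketch below computes the squared distance between the feature means of two sample sets, which equals the squared maximum mean discrepancy under a linear kernel. This measure is swapped in here only for illustration and is not the alignment criterion used by PDALC; the function name `linear_mmd` and the two small feature tables are hypothetical.

```python
from statistics import fmean


def linear_mmd(source, target):
    """Squared distance between the feature means of two sample sets.

    Equals the squared maximum mean discrepancy (MMD) with a linear kernel.
    It is zero when both domains share the same feature means, so it only
    captures first-order differences between the marginal distributions.
    """
    n_features = len(source[0])
    mean_s = [fmean(row[j] for row in source) for j in range(n_features)]
    mean_t = [fmean(row[j] for row in target) for j in range(n_features)]
    return sum((a - b) ** 2 for a, b in zip(mean_s, mean_t))


# Hypothetical two-feature samples (for example, RMS and peak of one channel).
line_a = [[1.0, 2.0], [1.2, 2.2], [0.8, 1.8]]
# Same pattern as line_a, shifted by +1 in both features.
line_b = [[2.0, 3.0], [2.2, 3.2], [1.8, 2.8]]

shift_ab = linear_mmd(line_a, line_b)  # offset of 1 in each of 2 features
shift_aa = linear_mmd(line_a, line_a)  # identical sets, no shift
```

A value near zero does not prove that two lines share the same distribution, since differences in spread or in the conditional distribution $p (Y | X)$ are invisible to this first-order measure.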

Progressive distribution alignment based on label correction

The existing methodologies for UDA can be categorized into two main types: instance-based UDA and feature alignment-based UDA techniques.³⁷ Instance-based UDA assumes that data from both source and target domains are available for training and focuses on aligning individual instances from the source to the target domain.³⁸ In contrast, feature alignment-based UDA seeks to reduce the disparity between marginal or conditional distributions of the source and target domains by analyzing their geometric structure in a shared subspace.³⁹ Generally, in UDA, pseudo labels need to be generated for the target domain,^40,41 which effectively transforms the UDA problem into a supervised learning task. However, the reliability of these pseudo labels is often questionable, potentially leading to misalignment of conditional distributions and incorrect pseudo label generation during UDA model iterations. Despite this, many existing UDA approaches overlook the issue of pseudo label reliability. To address this challenge, this study adopts a novel UDA method called progressive distribution alignment based on label correction (PDALC),²⁷ which combines progressive learning, label correction, and subspace structure learning to reduce the distribution gap between domains. PDALC employs a label correction mechanism to assess and rectify pseudo-label inaccuracies as the learning process unfolds. The framework of PDALC is illustrated in Figure 2, and it consists of two primary phases: (1) learning the geometric structure of the data and (2) label correction. More details of PDALC are provided by Li et al.²⁷

Figure 2.

Illustration of PDALC. PDALC: progressive distribution alignment based on label correction.

Learning the geometric structure

Pseudo-labels assigned to the target domain are used to learn the structural relationships between data points in both the source and target domains. Assuming the availability of pseudo labels, the objective is to ensure that data belonging to the same class (e.g., healthy class) are clustered closely together, while data points from different classes (e.g., healthy class vs damaged class) are kept apart. This is achieved using the following equation:

\begin{matrix} \arg \min_{A} \sum_{c \in Y} \sum_{x_{i} \in D^{c}} ‖ A^{T} x_{i} - A^{T} {\bar{x}}_{c} ‖_{2}^{2} \, v_{i, c} \\ - β \sum_{c \in Y} \sum_{k \in Y} ‖ A^{T} {\bar{x}}_{c} - A^{T} {\bar{x}}_{k} ‖_{2}^{2} + λ ‖ A ‖_{F}^{2} \end{matrix}

(1)

Here, $β$ and $λ$ are regularization parameters, $v_{i, c}$ represents the confidence level of a data point, and JDA is applied to minimize domain discrepancies in both marginal and conditional distributions.³⁹ The optimal solution is determined through the following:

\begin{matrix} \arg \min_{A} \operatorname{Tr} (A^{T} Q A) + μ \operatorname{Tr} (A^{T} X M X^{T} A) \\ - 2 β \operatorname{Tr} (A^{T} {\bar{X}}_{s} E {\bar{X}}_{s}^{T} A) + λ ‖ A ‖_{F}^{2} \\ \text{s.t.} \quad A^{T} X H X^{T} A = I, \end{matrix}

(2)

Solving this eigenvalue problem provides the final projection matrix, allowing for the transformation of source and target data into a shared subspace.
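To make the three terms of equation (1) concrete, the sketch below evaluates that objective for a given candidate projection matrix $A$ on a toy dataset. It is not the PDALC solver, which obtains $A$ from the eigenvalue problem above. The function names, the toy data, and the use of plain (unweighted) class means are assumptions made for illustration.

```python
def project(A, x):
    """Return A^T x for a d-by-k matrix A (list of rows) and a length-d vector x."""
    k = len(A[0])
    return [sum(A[i][j] * x[i] for i in range(len(x))) for j in range(k)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def eq1_objective(A, samples, labels, conf, beta, lam):
    """Evaluate equation (1): within-class scatter weighted by confidence,
    minus beta times the between-class-mean scatter, plus lam * ||A||_F^2."""
    classes = sorted(set(labels))
    # Class means in the original feature space (unweighted: an assumption).
    proj_means = {}
    for c in classes:
        members = [x for x, y in zip(samples, labels) if y == c]
        mean_c = [sum(col) / len(members) for col in zip(*members)]
        proj_means[c] = project(A, mean_c)
    within = sum(
        v * sq_dist(project(A, x), proj_means[y])
        for x, y, v in zip(samples, labels, conf)
    )
    between = sum(
        sq_dist(proj_means[c], proj_means[k]) for c in classes for k in classes
    )
    frobenius = sum(a * a for row in A for a in row)
    return within - beta * between + lam * frobenius


# Toy data: class 0 near x = 1, class 1 near x = 11; the second feature is constant.
samples = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]]
labels = [0, 0, 1, 1]
conf = [1.0, 1.0, 1.0, 1.0]

identity = [[1.0, 0.0], [0.0, 1.0]]   # keeps both features
second_axis = [[0.0], [1.0]]          # keeps only the uninformative feature

objective_identity = eq1_objective(identity, samples, labels, conf, beta=0.1, lam=0.5)
objective_second = eq1_objective(second_axis, samples, labels, conf, beta=0.1, lam=0.5)
```

On this toy data the projection that preserves the discriminative first feature gives the lower objective, which is the behavior equation (1) is designed to reward.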

Labels correction

To tackle the challenge of misclassification and linear inseparability in the new shared subspace, the samples are first mapped to a kernel space to improve separability. A classifier based on structural risk minimization (SRM) is then used to generate pseudo-labels for the target domain data.⁴² This process can be expressed as:

\arg \min_{f \in H_{K}} \sum_{i = 1}^{n_{s} + n_{t}} Λ_{ii} {(y_{i} - f (z_{i}))}^{2} + η ‖ f ‖_{K}^{2}

(3)

where $Λ$ is a diagonal domain indicator matrix with $Λ_{ii} = 1$ if the sample belongs to the source domain, and $Λ_{ii} = 0$ otherwise. $H_{K}$ is the kernel space, $η$ is a regularization parameter, and $z_{i}$ represents the mapped feature vector in the shared space. This allows the classifier to assign pseudo-labels to the target domain samples.
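One standard way to minimize an objective of the form of equation (3) is to write $f (z) = \sum_{j} α_{j} K (z_{j}, z)$ and solve the linear system $(Λ K + η I) α = Λ y$, which follows from setting the gradient with respect to $α$ to zero. The sketch below applies this to a binary toy problem with labels $\pm 1$. The RBF kernel, the parameter values, the small Gaussian-elimination helper, and all names are assumptions for illustration and may differ from the implementation in the cited work.

```python
import math


def rbf(u, v, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def pseudo_label_targets(z_src, y_src, z_tgt, eta=0.1, gamma=1.0):
    """Fit f on labeled source samples only and return +1/-1 pseudo-labels
    for the target samples (binary case)."""
    z = z_src + z_tgt
    n, n_s = len(z), len(z_src)
    K = [[rbf(z[i], z[j], gamma) for j in range(n)] for i in range(n)]
    lam = [1.0 if i < n_s else 0.0 for i in range(n)]   # diagonal of Lambda
    y = list(y_src) + [0.0] * len(z_tgt)                # target labels unknown
    M = [[lam[i] * K[i][j] + (eta if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(M, [lam[i] * y[i] for i in range(n)])
    scores = [sum(alpha[j] * K[i][j] for j in range(n)) for i in range(n_s, n)]
    return [1 if s >= 0 else -1 for s in scores]


# Hypothetical one-dimensional shared-subspace features.
z_source = [[0.0], [0.2], [3.0], [3.2]]
y_source = [1.0, 1.0, -1.0, -1.0]
z_target = [[0.1], [3.1]]
pseudo = pseudo_label_targets(z_source, y_source, z_target)
```

Because $Λ_{ii} = 0$ for target samples, their coefficients come out as zero and the fitted function depends on the labeled source samples only.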

However, pseudo-labels generated during this process are not always reliable. To mitigate the impact of inaccurate pseudo-labels, a confidence coefficient matrix $V_{t}$ is introduced. Initially, all elements of $V_{t}$ are set to 1. As the algorithm progresses, the confidence in the pseudo-labels is updated iteratively:

{(V_{t})}_{ic} = \begin{cases} {(V_{t})}_{ic} + φ, & \text{if the pseudo label of } x_{ti} \text{ is } c, \\ {(V_{t})}_{ic}, & \text{otherwise,} \end{cases}

(4)

where $φ$ is a constant used to increment the confidence for a specific class. The confidence in the pseudo-labels is assessed using information entropy, which evaluates the reliability of each pseudo-label:

G ({(V_{t})}_{i}) = - \sum_{c = 1}^{C} p ({(V_{t})}_{ic}) \log p ({(V_{t})}_{ic}),

(5)

where $p ({(V_{t})}_{ic})$ is the probability of the $i$ th sample belonging to class $c$ , calculated as:

p ({(V_{t})}_{ic}) = \frac{{(V_{t})}_{ic}}{\sum_{j = 1}^{C} {(V_{t})}_{ij}},

(6)

If the entropy $G ({(V_{t})}_{i})$ is sufficiently low, the pseudo-label corresponding to the maximum confidence value is considered reliable and is used to update the target domain pseudo-labels. The pseudo-label correction mechanism ensures that reliable labels are preserved while inaccurate ones are adjusted iteratively, leading to improved model performance.
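The confidence update and entropy check of equations (4) to (6) can be sketched as follows. The entropy threshold, the value of $φ$, the choice to keep the current pseudo-label when the entropy is not low enough, and all function names are assumptions for illustration; the source only states that low-entropy samples take the label with maximum confidence.

```python
import math


def update_confidence(V, pseudo_labels, phi):
    """Equation (4): add phi to the confidence of each sample's current pseudo-label."""
    for i, c in enumerate(pseudo_labels):
        V[i][c] += phi
    return V


def class_probabilities(row):
    """Equation (6): normalize one row of the confidence matrix."""
    total = sum(row)
    return [v / total for v in row]


def entropy(row):
    """Equation (5): information entropy of one sample's class probabilities."""
    return -sum(p * math.log(p) for p in class_probabilities(row) if p > 0)


def correct_labels(V, pseudo_labels, threshold):
    """Replace a pseudo-label by the max-confidence class when entropy is low.

    Samples with entropy at or above the threshold keep their current label
    (an assumption; the source does not specify this case).
    """
    corrected = []
    for row, label in zip(V, pseudo_labels):
        if entropy(row) < threshold:
            corrected.append(max(range(len(row)), key=row.__getitem__))
        else:
            corrected.append(label)
    return corrected


# Two target zones, two classes (0 = healthy, 1 = damaged); V starts at 1.
V = [[1.0, 1.0], [1.0, 1.0]]
# Hypothetical pseudo-labels of the two zones over five iterations.
history = [[0, 0], [0, 1], [0, 0], [0, 1], [1, 1]]
for labels in history:
    update_confidence(V, labels, phi=1.0)

corrected = correct_labels(V, history[-1], threshold=0.65)
```

In this toy run the first zone was labeled healthy in four of five iterations, so its entropy is low and its last (damaged) label is corrected back to healthy, while the second zone remains ambiguous and keeps its latest label.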

Experimental dataset

The experimental dataset utilized in this study is collected as part of the International Union of Railways (UIC) Harmotrack project run by SNCF Réseau.²⁹ The HSL train IRIS320 is equipped with multiple advanced sensors, cameras, and lasers, including accelerometers, inertial measurement units, high-speed imaging systems, line-scan cameras, effort sensors, surface pressure sensors, and an electric arc detection system. This specialized inspection train conducts assessments at high speeds, reaching up to 320 km/h, and can operate during the daytime between commercial high-speed train services. The data were gathered over 3 years, encompassing a range of geographic, environmental (weather), and seasonal conditions. The instrumentation recorded train acceleration responses and track geometry data every 15 days for each line.²⁹

Accelerometers positioned at the front, middle, and rear sections of the train (as illustrated in Figure 3) are used to measure indicators of vehicle-track interaction, including vertical and lateral vibrations transmitted to the train from the forces generated at the contact surface between the wheels and rails. These accelerometers are installed on both the train’s body (on the floor) and the bogies, as shown in Figure 4(a).²⁹ For each section, two accelerometers were installed to measure the Acceleration in Transversal (lateral) direction of Bogie ( $ATB$ ), Acceleration in Transversal direction of Carbody ( $ATC$ ), Acceleration in Vertical direction of Bogie ( $AVB$ ), and Acceleration in Vertical direction of Carbody ( $AVC$ ). In total, there were 12 acceleration measurements (channels). The sampling frequency of IRIS320 is 400 Hz for bogie and car body accelerations, and the raw data are processed to give a sample every 25 cm. The train position is calculated by a localizer which uses an odometer that counts the number of wheel turns and deduces the covered distance. As wheels are not perfectly round, a database is used to correct the covered distance as it gives the exact positions of specific points on the network. On HSLs, the IRIS320 train measures the track quality every 2 weeks. Figure 3 displays the arrangement of the installed accelerometers on the HSL train. Additionally, Table 1 lists the sensor names in the dataset along with the corresponding sensor numbers utilized in this study. More details of the instrumentation can be found in the study by Sorrentino et al.²⁹

Figure 3.

Layout of sensors on the IRIS320 train.

Figure 4.

Monitoring systems: (a) accelerometer on bogie and (b) track geometry measuring system.⁴⁷

Table 1.

Number and locations of sensors.

Sensor number	Placement on train	Location	Direction	Sensor name in dataset
1	Bogie no. 4	Head	Transversal	ATB-B4
2	Bogie no. 7	Middle	Transversal	ATB-B7
3	Bogie no. 10	Tail	Transversal	ATB-B10
4	Car-body no. 2–3	Head	Transversal	ATC-B4
5	Car-body no. 5–6	Middle	Transversal	ATC-B7
6	Car-body no. 8–9	Tail	Transversal	ATC-B10
7	Bogie no. 4	Head	Vertical	AVB-B4
8	Bogie no. 7	Middle	Vertical	AVB-B7
9	Bogie no. 10	Tail	Vertical	AVB-B10
10	Car-body no. 2–3	Head	Vertical	AVC-B4
11	Car-body no. 5–6	Middle	Vertical	AVC-B7
12	Car-body no. 8–9	Tail	Vertical	AVC-B10

In addition to the accelerometer sensors, the GEOV2 beam is utilized as a TGMS to accurately measure track geometry (both vertical and lateral) under realistic loading conditions transmitted by rolling stock, as depicted in Figure 4(b). Various parameters, including track level and alignment, twist, gauge, and cant, are processed according to SNCF standards (short and long wave bases). This system provides real-time or batch-processed measurement data that comply with the Mauzin and European standard (EN13848-2) for ranges D1, D2, and D3.⁴³ The IRIS320 system quantifies track irregularities by measuring the distance between a train’s bogie and four designated points on the rails.⁴⁴ This assessment is performed using two cameras positioned beneath the bogie, with lasers used to illuminate the rails. The system maintains a distance of 0.25 m between consecutive measurements along the track, which allows for precise estimation of track irregularities, considering that the smallest detected wavelength of these irregularities is 3 m.⁴⁵ However, movements of the bogie introduce bias into the track geometry measurements. To mitigate this bias, the data undergo post-processing, which includes eliminating the translational and rotational effects caused by bogie movements, using accelerometer and gyroscope data to make the necessary corrections.⁴⁶ It is important to highlight that TGMS measurements recorded while the train is traveling below 80 km/h are not correctly processed and have been excluded from the dataset. Additionally, instances where the cameras fail to capture accurate measurements, resulting in constant rail position values over several meters, have also been removed from the track geometry dataset.⁴⁵

In the framework of the Harmotrack project, the track geometry dataset is divided into 60-m-long zones in order to associate acceleration data with track geometry in each zone.²⁹ The condition of each track zone is determined by analyzing the geometry data recorded within that zone and comparing it against standard values outlined in the SNCF Réseau maintenance standard.⁴⁸ Any deviation from these standard thresholds is regarded as a track geometry defect. The recorded dataset contains several defect types, which are listed in Table 2 together with their terms in the SNCF Réseau maintenance standard and the track geometry safety limits for high-speed track.⁴⁹ As the main purpose of this paper is to develop a defect detection algorithm, all defect types are grouped into one class labeled as the damaged state.

Table 2.

The track geometry defect.

Term used in the SNCF Réseau maintenance standard	Type of defect	Chord	Limit	Interpretation
Ed	Cant variation	—	max: 180 mm	Maximum cant (cross-level) allowed on curves
Dres	Alignment	10 m	12 mm	Short-wavelength alignment variation
Dall	Elongated alignment	31 m	20 mm	Long-wavelength alignment variation
Niv	Longitudinal level	12.2 m	15 mm	Short-wavelength vertical irregularities
Nall	Elongated longitudinal level	31 m	24 mm	Long-wavelength vertical irregularities
G3	Twist	10 m (base)	18 mm	Cross-level difference over a fixed base length
Emax	Minimum gauge over a track section	—	1427.5 mm	Minimum track gauge
Emean	Mean	—	1430 mm (over 100 m)	Gauge uniformity (average in 100 m window)
Emin	Maximum	—	1454 mm	Maximum permissible gauge

SNCF: Société Nationale des Chemins de Fer Français.

It’s important to note that the “normal state” refers to a condition of track geometry that complies with SNCF Réseau standards and does not require maintenance intervention. In this study, if one or more types of defects are observed within a zone, it is labeled as a defected zone (DZ),²⁹ indicating that the track within that zone is considered damaged. Conversely, zones without observed defects, where maintenance threshold exceedance is absent, are labeled as healthy zones (HZs).
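The zone labeling rule can be sketched as follows. With one geometry sample every 0.25 m, a 60-m zone holds 240 samples, and a zone is labeled DZ as soon as any sample exceeds the limit. The sketch uses a single parameter with the 15 mm longitudinal level limit from Table 2 and a synthetic profile; the actual procedure checks several chord-based parameters against the SNCF Réseau standard, so the function name, the data, and the single-threshold rule are simplifying assumptions.

```python
ZONE_LENGTH_M = 60.0      # zone length used in the Harmotrack project
SAMPLE_SPACING_M = 0.25   # spacing between consecutive geometry samples
SAMPLES_PER_ZONE = int(ZONE_LENGTH_M / SAMPLE_SPACING_M)  # 240 samples


def label_zones(values_mm, limit_mm):
    """Split a geometry record into 60-m zones and label each one.

    A zone is a defected zone ("DZ") if any sample exceeds the limit in
    absolute value, and a healthy zone ("HZ") otherwise. A trailing
    incomplete zone is ignored.
    """
    labels = []
    last_start = len(values_mm) - SAMPLES_PER_ZONE
    for start in range(0, last_start + 1, SAMPLES_PER_ZONE):
        zone = values_mm[start:start + SAMPLES_PER_ZONE]
        exceeded = any(abs(v) > limit_mm for v in zone)
        labels.append("DZ" if exceeded else "HZ")
    return labels


# Synthetic longitudinal-level record covering three zones (180 m).
profile = [2.0] * (3 * SAMPLES_PER_ZONE)
profile[SAMPLES_PER_ZONE + 10] = 17.5   # one exceedance inside the second zone

zone_labels = label_zones(profile, limit_mm=15.0)
```

With several geometry parameters, the same rule would be applied per parameter and a zone would be marked DZ if any of them reports an exceedance, matching the grouping of all defect types into one damaged class.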

The proposed framework

The overview flowchart of the proposed UDA-based framework for geometrical defect detection is shown in Figure 5, illustrating the system’s data flow. The framework comprises three main modules:

Data pre-processing module: This module extracts time-domain features from the recorded acceleration responses across different railway lines. These features are essential for subsequent analysis.

UDA module: The UDA module minimizes the domain discrepancy between the source and target lines. This ensures that the data distribution of the source line (source domain) becomes similar to that of the target line (target domain), overcoming the challenges posed by differences between the lines.

Damage diagnosis module: Once sufficient alignment of the data distributions is achieved, this module enables the model to detect geometrical defects in the track zones of the target line. It utilizes the aligned data to predict the condition of track zones in the target domain.

Figure 5.

The schematic of the proposed UDA framework for drive-by railway track monitoring. UDA: unsupervised domain adaptation.

The framework uses labeled data from the source domain and unlabeled data from the target domain as input, ultimately producing predicted labels for the target domain. In the following subsections, each module is explained in detail.

Data pre-processing module

In this module, the input data for UDA are prepared by cleansing the data and running the “Data segmentation” and “Feature transform” submodules. The module pre-processes the dataset in three steps, illustrated in the flowchart of Figure 6. Initially, train acceleration responses and track geometry data are collected. The recorded track geometry dataset is then processed as outlined in the third section to identify the HZs/DZs.

Figure 6.

Flowchart of the pre-processing module for creating HZs and DZs based on geometry dataset and normal/damaged state of the corresponding acceleration responses and features extraction. DZ: defected zone; HZ: healthy zone.

It is worth noting that the track layout can induce quasi-static lateral acceleration due to centrifugal forces. To eliminate this effect from the measured acceleration signals and ensure consistency across the dataset, frequencies below 0.4 Hz were filtered out to remove quasi-static rigid body movements (caused by the spatial evolution of the track design).⁴⁵ A band-pass filter with a gradient of −24 dB per octave was applied to the acceleration signals. This filtering process improves the correlation between track geometry and measured acceleration by ensuring that the retained acceleration components primarily reflect perturbations due to actual track geometry conditions, rather than effects introduced by the track layout.²⁸ Each zone is categorized based on its state: if a zone is in good condition, its corresponding acceleration response is classified as “Normal.” In contrast, if a zone exhibits defects, its acceleration response is marked as “Damaged.” By the end of step 2, a coherent dataset is created that pairs the geometry data with the acceleration measurements, so that the set of acceleration responses belonging to each HZ and DZ is specified.
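As an illustration of the low-frequency edge of this filtering step, the sketch below cascades two second-order high-pass sections into a fourth-order Butterworth filter, whose four poles give the stated roll-off of 24 dB per octave below the 0.4 Hz cut-off. The sampling rate, the function names, and the choice of a Butterworth design are assumptions made for illustration only; the upper cut-off of the band-pass filter is not stated in the text and is therefore not modeled.

```python
import math


def highpass_biquad(f_cut, fs, q):
    """Coefficients of one second-order high-pass section (bilinear transform)."""
    w0 = 2.0 * math.pi * f_cut / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 + cos_w0) / (2.0 * a0), -(1.0 + cos_w0) / a0, (1.0 + cos_w0) / (2.0 * a0)]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a


def apply_biquad(x, b, a):
    """Filter the sequence x with one section (direct form I)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def remove_quasi_static(x, fs, f_cut=0.4):
    """Suppress content below f_cut (Hz); fs is the sampling rate in Hz (assumed)."""
    # Q values of the two sections of a 4th-order Butterworth filter:
    # 1 / (2 cos(pi/8)) and 1 / (2 cos(3 pi/8)).
    for q in (0.5412, 1.3066):
        b, a = highpass_biquad(f_cut, fs, q)
        x = apply_biquad(x, b, a)
    return x
```

With a hypothetical sampling rate of 100 Hz, a constant (quasi-static) offset decays to zero after the initial transient, while a 5 Hz vibration component passes essentially unchanged.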

The feature transforms are then applied to the acceleration responses of this coherent dataset to create the feature space. To do so, the statistical metrics of each acceleration record are calculated. Statistical metrics, whether in the time domain, frequency domain, or time-frequency domain (wavelets), are frequently employed for extracting features from raw signal data.⁵⁰ Significant track irregularities can alter the amplitude and distribution of the time-domain signal, compared to signals induced by minor irregularities. Similarly, changes may occur in the frequency spectrum and its distribution, potentially resulting in the emergence of new frequency components linked to track severity.⁵¹ In this study, only time-domain metrics are utilized due to their ease of implementation. These metrics are: mean, standard deviation (Std), root mean square (RMS), skewness, peak value, crest factor, clearance factor, and impulse factor. These indexes are selected based on recommendations from previous works in this field.⁵²,⁵³

The size of the feature matrix is $C \times W$ , where $C$ is the number of sensor channels on the train and $W$ is the number of statistical indexes calculated for each response. Furthermore, the naming convention for the statistical metric step takes an existing sensor channel name and appends a string that refers to the applied metric. For example, “ATB_B4_RMS” means the RMS value of the transversal acceleration response collected by the sensor mounted on bogie number 4.
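The feature transform can be sketched as follows. The formulas for crest, clearance, and impulse factor are the definitions commonly used in vibration analysis and are assumed here, as is the use of the population standard deviation; apart from “RMS”, the metric suffixes and the function names are hypothetical.

```python
import math


def time_domain_features(x):
    """Eight time-domain metrics of one acceleration record (non-zero signal)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)  # population Std (assumed)
    rms = math.sqrt(sum(v * v for v in x) / n)
    skew = sum((v - mean) ** 3 for v in x) / (n * std ** 3) if std > 0 else 0.0
    peak = max(abs(v) for v in x)
    mean_abs = sum(abs(v) for v in x) / n
    mean_sqrt = sum(math.sqrt(abs(v)) for v in x) / n
    return {
        "Mean": mean,
        "Std": std,
        "RMS": rms,
        "Skewness": skew,
        "Peak": peak,
        "CrestFactor": peak / rms,                 # peak over RMS
        "ClearanceFactor": peak / mean_sqrt ** 2,  # peak over squared mean root
        "ImpulseFactor": peak / mean_abs,          # peak over mean absolute value
    }


def zone_feature_vector(zone_signals):
    """Map {channel name: samples} of one zone to named features, e.g. ATB_B4_RMS."""
    row = {}
    for channel, samples in zone_signals.items():
        for metric, value in time_domain_features(samples).items():
            row[f"{channel}_{metric}"] = value
    return row
```

Applied to a zone recorded on $C$ channels, `zone_feature_vector` returns $C \times W$ named values with $W = 8$.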

Hence, the final dataset is obtained, linking the various accelerometric metrics to the corresponding track geometry defects.

UDA module

In this module, the UDA algorithm is trained to obtain a domain-invariant and class-discriminative feature representation, which is extracted from the input data using the PDALC algorithm. It is worth noting that the proposed framework is general, so any of the previously mentioned UDA algorithms can be used for this module. However, PDALC is presented here as the main algorithm because of its unique mechanisms of geometric structure alignment and pseudo-label correction. As detailed in the second section, PDALC leverages class-discriminative information for subspace learning to acquire the domain-invariant subspace. Additionally, a pseudo-label correction mechanism is employed to assess the reliability of pseudo labels and rectify any inaccuracies present. Through the integration of subspace learning and label correction, PDALC demonstrates a continuous enhancement in performance, thereby mitigating the occurrence of inaccurate pseudo labels.

The problem addressed in this study represents a typical homogeneous DA scenario. Specifically:

the source and target tasks are identical, both aiming to identify the conditions of railway track zones;

the feature spaces in both the source and target domains are the same, as both domains contain the same set of statistical features extracted from the acceleration responses of a moving train across different track zones; and

the data (or samples) are distributed unevenly between the source and target domains, as the feature matrices are collected from distinct railway lines, leading to distributional difference.

More precisely, the goal of each task is to develop a model capable of determining whether a 60-m-long track segment is in a healthy (normal) or defective condition, in accordance with the SNCF maintenance standard. The model is trained using acceleration-based features extracted from one railway line (source domain) and then tested on corresponding features from another line (target domain), thus evaluating the model’s ability to generalize across different track environments.

Target domain prediction module

The UDA algorithms employed in this study aim to discover the latent space where the source and target datasets are aligned. Using these UDA methods, any classifier can be trained on datasets that are transformed to the latent space. Therefore, the classifier can transfer knowledge from a labeled source dataset to an unlabeled target dataset on that space. In this paper, to ensure a fair comparison, the SRM classifier is employed for all UDA algorithms. Also, to assess the effect of changing the classification algorithm on the overall efficiency of the proposed framework, the results of SRM (kernel = linear) have been compared with support vector machine (SVM) (kernel = polynomial), $k$ -nearest neighbor (kNN),⁵⁴ and fully connected feedforward neural network (FNN) (number of hidden layers = 2) algorithms.

Moreover, the performance metric utilized in this study is the classification accuracy on the target domain. This metric, commonly employed in previous research, is defined as:

$Accuracy = \frac{| \{ x : x \in D_{t} \land \hat{y} (x) = y (x) \} |}{| \{ x : x \in D_{t} \} |}$

(7)

where $y (x)$ and $\hat{y} (x)$ represent the true and predicted labels, respectively, for the target domain.
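Equation (7) amounts to the fraction of target-domain zones whose predicted label matches the true label. A minimal sketch, with a hypothetical function name and label strings:

```python
def target_accuracy(y_true, y_pred):
    """Share of target-domain samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```

For instance, three correct predictions out of four zones give an accuracy of 0.75.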

The results

In this section, the experimental dataset detailed in the third section is used to evaluate the efficacy of the proposed framework for DA in railway track monitoring, as elaborated in the fourth section. The collected datasets of IRIS320 for four different railway lines in the French railways TGV high-speed trains network are used as case study for this paper. Table 3 provides detailed information about each dataset, including the scale of the feature matrix used as input for the UDA algorithm and the train speed statistical indices (mean, Std, min/max) for each line. These four lines were selected from the available dataset to represent a diverse range of geographic locations, defect types, and dataset sizes. This selection ensures that the chosen datasets are appropriately representative of the broader collection gathered during the Harmotrack project.

Table 3.

The details of different lines in Harmotrack dataset.

Line name in dataset*	Name abbreviation	Total number of data	Number of data in each class	Dimension of features	Statistics of train speed
					Min	Mean	Max	Std
MjcyNTc0	Mj	1755	Damaged = 1226 Healthy = 529	96	160	192	200	16
NTcwNTc0	NT	710	Damaged = 297 Healthy = 413	96	100	174	220	36
NzUyOTA0	Nz	6244	Damaged = 2590 Healthy = 3654	96	90	182	270	82
NDMxODg5	ND	201	Damaged = 96 Healthy = 105	96	160	160	160	0

Std: standard deviation.

*The names assigned to each dataset do not correspond to the actual names of the railway lines and were selected by SNCF to ensure data anonymity.

With 12 sensor directions and 8 features corresponding to the measured signals in each direction, a total of 96 feature values are computed for each HZ/DZ.

Since the data extracted from the four lines are treated as four different domains, 12 different permutations of source and target data can be considered. Table 4 presents the list of tasks designed for this study. In each task, the feature matrix of one line is selected as the source domain data and the feature matrix of another line is selected as the target domain data. The designed tasks cover potential scenarios typically encountered in real applications.

Table 4.

The details of different tasks ( $D^{s} \to D^{t}$ ).

Task	1stTask	2ndTask	3rdTask	4thTask	5thTask	6thTask	7thTask	8thTask	9thTask	10thTask	11thTask	12thTask
Source line ( $D^{s}$ )	Mj	Mj	Mj	NT	NT	NT	ND	ND	ND	Nz	Nz	Nz
Target line ( $D^{t}$ )	NT	ND	Nz	Mj	ND	Nz	Mj	NT	Nz	Mj	NT	ND
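The count of 12 tasks follows from choosing an ordered (source, target) pair of distinct lines out of four, that is, 4 × 3 = 12. The short sketch below enumerates these pairs; the variable names are illustrative.

```python
from itertools import permutations

lines = ["Mj", "NT", "ND", "Nz"]

# Every ordered (source, target) pair with source != target.
tasks = list(permutations(lines, 2))

for number, (source, target) in enumerate(tasks, start=1):
    print(f"Task {number}: {source} -> {target}")
```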

In order to evaluate the performance of the proposed framework based on PDALC, several state-of-the-art UDA methods are employed for comparison: LRSR,³² GFK,³³ TCA,³⁴ and McDA.³⁵ Moreover, considering the time-consuming nature of the two loops within PDALC, the numbers of iterations $T$ and $t$ are simply set to 5 and 15, respectively. To streamline the process, $φ$ is set to $1 / t$ , while the threshold $ϵ$ is set to the third smallest value in $G$ . The specific parameter settings for PDALC are provided in Table 5. A linear kernel is chosen for the SRM classifier.

Table 5.

The details of parameter settings.

$μ$	$β$	$λ$	Dimension
1.00	0.060	0.32	30

In addition, to assess the effectiveness of the UDA method, the disparity between domains in the latent space is quantified. Ben-David et al.⁵⁵ introduced the Proxy A-distance (PAD) metric to quantify the similarity between the feature representations of samples from the source and target domains in DA scenarios. The PAD is computed using samples from the latent space, in which the discrepancies between domains are aligned. It is defined as:

$PAD = 2 (1 - 2 ϵ)$

(8)

Here, $ϵ$ denotes the generalization error of a classifier trained on the combined source and target samples to distinguish which domain each sample comes from. A lower PAD corresponds to a higher generalization error, meaning the classifier struggles to differentiate between source and target samples. Therefore, a smaller PAD implies that the feature representations of the source and target domains are more similar, indicating greater domain alignment and proximity.⁵⁵ The PAD is calculated using a binary classifier based on a linear support vector machine.
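The relation in Equation (8) can be sketched as a pair of helper functions that convert between the domain-classifier error and the PAD. The function names are hypothetical, and the error is assumed to be given as a fraction between 0 and 1.

```python
def proxy_a_distance(eps):
    """PAD = 2 (1 - 2 eps), with eps the domain-classifier error as a fraction."""
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    return 2.0 * (1.0 - 2.0 * eps)


def error_from_pad(pad):
    """Invert Equation (8): the domain-classifier error implied by a PAD value."""
    return (1.0 - pad / 2.0) / 2.0
```

A domain classifier at chance level (error of 0.5) gives a PAD of 0, whereas a classifier that separates the two domains perfectly (error of 0) gives the maximum PAD of 2.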

Data distribution shift for train-track systems

The joint distributions of the train vibrations and damage labels are shifted as the train passes by different lines. To intuitively observe the data distribution, two datasets, Line Mj and Line NT, are examined. With a total of 96 features extracted from all the sensors, the feature set is too high-dimensional to be observed directly. High-dimensional features can be visualized in a low-dimensional space using t-distributed stochastic neighbor embedding (t-SNE), a nonlinear dimensionality reduction technique.⁵⁶ Figure 7 shows the t-SNE-based feature visualization results of the recorded samples, where the features are mapped onto a two-dimensional (2D) scatter plot. In this plot, each point represents a sample, with its category indicated by color: red for normal conditions and blue for defected conditions. As seen in Figure 7, the data distributions from the two domains do not exhibit clear separations between categories, indicating that the feature spaces of the two domains are similar but not identical. This suggests that addressing both the marginal and joint distributions is crucial for effective DA. Moreover, the features appear scattered and unclustered, both in the source domain (Mj) (Figure 7(a)) and the target domain (NT) (Figure 7(b)). This disordered distribution implies that a classifier trained on the source domain may not generalize well when applied to the target domain.

Figure 7.

The 2D t-SNE visualization of feature data distributions: (a) source line and (b) target line (before UDA). 2D: two dimensional; t-SNE: t-distributed stochastic neighbor embedding; UDA: unsupervised domain adaptation.

One of the main reasons for the distribution shift observed between datasets from different railway lines is the variation introduced by changes in EOVs during data collection. These EOV-induced shifts significantly affect the statistical properties of the recorded signals, contributing to discrepancies between source and target domains. This distributional mismatch underscores the necessity of applying DA methodologies to develop a generalized railway track geometrical defect detection system. The primary goal of such methods is to discover a shared latent feature space where the source and target datasets can be better aligned, thereby enabling more robust and transferable defect detection across different railway environments.

In order to investigate the impact of PDALC implementation on data distribution, the distribution of the first two features is specifically analyzed. Figure 8 illustrates the distribution of the maximum of the acceleration response extracted from sensors ATB_B4 and ATB_B10 for both the source and target datasets. Prior to applying the PDALC algorithm, differences in distribution indicators such as mean and Std are observed between the source and target data, as depicted in Figure 8. After implementing the PDALC algorithm, as shown in Figure 9, the data distributions in these two domains become more aligned.

Figure 8.

Scatter plot with marginal histograms of feature data distributions: (a) source line and (b) target line (before UDA). UDA: unsupervised domain adaptation.

Figure 9.

Scatter plot with marginal histograms of feature data distributions: (a) source line and (b) target line (after UDA). UDA: unsupervised domain adaptation.

To demonstrate this process for all extracted features, 2D visualization of feature data distribution following the application of the PDALC algorithm is presented in Figure 10. A comparison with Figure 7 reveals a closer alignment between the data distribution of the source and target domains. Furthermore, the classification of data into normal and defective states of the rail showed improvement post-PDALC algorithm application. A clearer boundary between these two classes emerged, aiding in enhancing the performance of the classification algorithm.

Figure 10.

The 2D t-SNE visualization of feature data distributions: (a) source line and (b) target line (after UDA). 2D: two dimensional; t-SNE: t-distributed stochastic neighbor embedding; UDA: unsupervised domain adaptation.

Performance comparison with baseline method

To quantitatively evaluate the impact of the proposed UDA framework, SRM models are trained as straightforward benchmark machine learning approaches using the labeled source data and subsequently tested on the target data. The UDA algorithms employed include TCA, GFK, McDA, LRSR, and PDALC. Line “Mj” is designated as the source domain, while line “NT” is designated as the target domain. The results are shown in Figure 11. The classification accuracies displayed in the header of each plot are calculated using SRM as the damage classifier. Results indicate that PDALC outperforms the other UDA algorithms, achieving an 83% accuracy rate in detecting geometric defects on the railway track of the target line. Compared to the baseline model, PDALC exhibits a notable increase of over 12% in classification accuracy. Among the other algorithms, the performance order is as follows: GFK > LRSR > TCA > McDA > Baseline model. This suggests that utilizing any UDA algorithm enhances classifier performance by at least 4%.

Figure 11.

Comparison of detection accuracy of TCA, GFK, McDA, LRSR, and PDALC on the first task, plotting the data in different spaces: (a) the original space, (b) projected space of PDALC, (c) projected space of TCA, (d) projected space of GFK, (e) projected space of McDA, and (f) projected space of LRSR. TCA: transfer component analysis; McDA: maximum mean and covariance discrepancy; LRSR: learning via low-rank and sparse representation; GFK: geodesic flow kernel; PDALC: progressive distribution alignment based on label correction.

To verify the validity of the proposed method on a larger dataset and to investigate the effect of changing the source and target data on the accuracy of the PDALC algorithm, a comparison is made between the 12 tasks listed in Table 4. The experimental results are summarized in Table 6, with the best results highlighted in bold. As anticipated, the SRM classifier trained on unadapted features generalizes poorly when applied to another line; this case corresponds to employing a classifier trained via a traditional SHM approach. As shown in Table 6, the proposed method based on PDALC yields promising results on the Harmotrack dataset. The UDA framework leveraging PDALC achieves the best results in 9 of the 12 tasks, with an average accuracy about 11% higher than that of the baseline model. Moreover, the average accuracy of the PDALC method exceeds that of GFK (the second-best algorithm) by about 3%, and its Std is lower than that of GFK. This suggests that while domain shift may impact classification accuracies, such influence can be mitigated using an appropriate DA algorithm. One possible reason for the superior performance of the PDALC method is its integration of a label correction mechanism with manifold subspace learning, which helps preserve the geometric structure of the original data. In each inner loop of the PDALC algorithm, the confidence coefficient matrix (Equation (4)) is updated to enhance the reliability of pseudo-labels for each sample. On the Harmotrack dataset, this label correction procedure improves classification accuracy by approximately 5–7% across different tasks. For example, in task 1, the initial accuracy of the PDALC algorithm is 76%, which increases to 82.9% after applying the label correction mechanism. This improvement corresponds to correcting the labels of approximately 42 samples (calculated as (Final Accuracy − Initial Accuracy) × Number of Target Samples) that were misclassified in the initial step. As a result, the model’s performance improves iteratively, progressively reducing the number of incorrect pseudo labels over time. This iterative refinement enhances the overall accuracy and effectiveness of the model.

Table 6.

Classification accuracy (%) of various UDA algorithms on the Harmotrack dataset.

	1stTask	2ndTask	3rdTask	4thTask	5thTask	6thTask	7thTask	8thTask	9thTask	10thTask	11thTask	12thTask	Average	Std
Base model (SRM)	70.9	69.6	70.1	65.5	70.6	69.1	68.7	67.0	70.0	70.0	73.3	73.1	69.8	2.1
LRSR	73.8	71.6	72.4	71.1	73.1	71.0	71.8	72.7	73.8	72.5	75.4	82.1	73.4	2.9
TCA	73.5	74.4	71.7	69.0	79.2	75.2	78.8	76.6	78.7	75.1	76.6	77.6	75.5	2.9
GFK	78.8	73.7	75.6	73.0	80.9	77.9	75.2	84.0	78.0	77.5	80.3	86.7	78.4	3.9
McDA	71.9	70.7	73.9	68.0	72.6	73.4	75.4	71.7	76.2	71.7	75.3	75.6	73.0	2.3
PDALC	82.9	79.9	80.9	79.9	81.6	79.5	78.6	78.9	79.8	82.6	83.4	84.6	81.0	1.9
Max.	82.9	79.9	80.9	79.9	81.6	79.5	78.8	84.0	79.8	82.6	83.4	86.7

UDA: unsupervised domain adaptation; PDALC: progressive distribution alignment based on label correction; TCA: transfer component analysis; McDA: maximum mean and covariance discrepancy; LRSR: learning via low-rank and sparse representation; GFK: geodesic flow kernel; Std: standard deviation.

On the other hand, the mean classification accuracy of PDALC for tasks 7, 8, and 9 is 79.08%, which is lower than for the other tasks. This suggests that when the source dataset (ND = 201) is considerably smaller than the target datasets (Mj = 1755, NT = 710, and Nz = 6244), the effectiveness of the DA framework is limited. Conversely, in tasks where line Nz serves as the source dataset (tasks 10, 11, and 12), the mean classification accuracy is 83.52%. This indicates that when the number of samples in the source data exceeds that of the target data, UDA algorithms can more effectively learn the underlying distribution of each class in the dataset.
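The summary statistics quoted for Table 6 can be rechecked with a few lines of code. The sketch below copies the accuracies from Table 6 and recomputes the number of tasks won by each method, the PDALC means for the tasks with ND and Nz as source, and the gap between the PDALC and baseline averages; the variable names are illustrative.

```python
# Accuracies (%) copied from Table 6, tasks 1 to 12.
table6 = {
    "Base model (SRM)": [70.9, 69.6, 70.1, 65.5, 70.6, 69.1, 68.7, 67.0, 70.0, 70.0, 73.3, 73.1],
    "LRSR": [73.8, 71.6, 72.4, 71.1, 73.1, 71.0, 71.8, 72.7, 73.8, 72.5, 75.4, 82.1],
    "TCA": [73.5, 74.4, 71.7, 69.0, 79.2, 75.2, 78.8, 76.6, 78.7, 75.1, 76.6, 77.6],
    "GFK": [78.8, 73.7, 75.6, 73.0, 80.9, 77.9, 75.2, 84.0, 78.0, 77.5, 80.3, 86.7],
    "McDA": [71.9, 70.7, 73.9, 68.0, 72.6, 73.4, 75.4, 71.7, 76.2, 71.7, 75.3, 75.6],
    "PDALC": [82.9, 79.9, 80.9, 79.9, 81.6, 79.5, 78.6, 78.9, 79.8, 82.6, 83.4, 84.6],
}


def mean(values):
    return sum(values) / len(values)


# Count, for each method, the tasks on which it reaches the highest accuracy.
wins = {}
for j in range(12):
    best = max(table6, key=lambda name: table6[name][j])
    wins[best] = wins.get(best, 0) + 1

pdalc = table6["PDALC"]
mean_source_nd = mean(pdalc[6:9])    # tasks 7-9, source line ND
mean_source_nz = mean(pdalc[9:12])   # tasks 10-12, source line Nz
gap_vs_baseline = mean(pdalc) - mean(table6["Base model (SRM)"])

print(wins, round(mean_source_nd, 2), round(mean_source_nz, 2), round(gap_vs_baseline, 2))
```

Because the table entries are rounded to one decimal place, the recomputed group means can differ from the reported ones in the second decimal.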

The computing time required for DA is also compared across the different algorithms, and the results are illustrated in Figure 12. Notably, for most tasks, the PDALC algorithm demands more computing time, and this difference grows in tasks with larger datasets, such as tasks 3, 6, and 10. In these cases, the computing time of the PDALC method is more than four times that of the other algorithms.

Figure 12.

Comparison of the training time of four UDA algorithms on different tasks. UDA: unsupervised domain adaptation.

To illustrate the changes in data distribution before and after applying DA, as well as the differences between UDA algorithms in aligning the data, Figure 13 depicts the histogram of the data before and after DA for the second task. It can be seen that the PDALC algorithm performs better, and that both the marginal and conditional distributions of the source and target datasets are aligned after DA.

Figure 13.

Data distribution of different UDA algorithms: (a) original space, (b) PDALC, (c) TCA, (d) GFK, (e) McDA, and (f) LRSR. UDA: unsupervised domain adaptation; PDALC: progressive distribution alignment based on label correction; TCA: transfer component analysis; McDA: maximum mean and covariance discrepancy; LRSR: learning via low-rank and sparse representation; GFK: geodesic flow kernel.

To quantitatively assess this discrepancy alignment, Equation (8) is employed to calculate the PAD values of all UDA algorithms for the second task. The resulting PAD values are as follows: GFK: 1.74, TCA: 1.75, PDALC: 1.06, LRSR: 1.22, and McDA: 1.68.

Among these, the PAD value associated with the proposed PDALC method is the lowest, indicating better alignment of the feature distributions between the source and target domains. This reduced domain discrepancy highlights the effectiveness of PDALC in facilitating DA.

A lower PAD value reflects a smaller divergence between source and target feature distributions, suggesting more successful domain alignment. In the context of railway track monitoring, PAD values typically range between 1 and 2. Values approaching 1 indicate effective adaptation between datasets from different railway lines, which is essential for ensuring robust and reliable defect detection across varying operational and geographic conditions.

Sensitivity analysis of the PDALC to machine learning approaches and number of features

To evaluate the effect of the classifier on the accuracy of the UDA framework based on PDALC, three algorithms (SVM, FNN, and kNN) are selected, and their results are compared with the classification accuracy of SRM for all tasks. Figure 14 shows the classification accuracy of these algorithms; it can be seen that SRM and SVM outperform the other methods. In addition, Figure 15 compares the average classification accuracy of the different classification algorithms across the 12 tasks before and after applying the PDALC algorithm. Regardless of the classification algorithm used, the application of PDALC results in a minimum increase in average accuracy of 5% (for FNN) and a maximum increase of 10% (for SRM).

Figure 14.

Performance comparison on the Harmotrack dataset with respect to different machine learning approaches.

Figure 15.

Comparison of the mean classification accuracy of different classifiers: (a) before applying UDA and (b) after applying UDA. UDA: unsupervised domain adaptation.

In most data-driven approaches within the field of SHM, the careful selection of damage-sensitive features significantly influences the system’s overall performance.⁵⁷ Therefore, this section explores the impact of feature selection on the proposed system’s performance, utilizing the minimum redundancy maximum relevance (MRMR) algorithm as a filter-based feature selection method.⁵⁸ The MRMR algorithm identifies an optimal feature set that is mutually dissimilar yet effectively represents the response variable. By minimizing redundancy and maximizing relevance within the feature set, the algorithm ensures efficient representation. Further details on this algorithm can be found in the study by Peng et al.⁵⁸

To assess the effects of employing MRMR as a feature selection step, a process similar to the framework shown in Figure 5 is implemented, with a key variation being the sorting of features based on their sensitivity to defects using MRMR’s feature importance index.⁵⁸ Subsequently, a subset of these features is inputted into the PDALC algorithm. Notably, as the target data label is presumed to be unavailable during the UDA training phase, the same subset of features used in the source data matrix is also applied for testing the PDALC algorithm on the target dataset. The impact of utilizing 5, 10, 20, 40, and 80 selected features on the framework’s performance is depicted in Table 7. Additionally, the system’s classification accuracy when employing all features is provided for comparison.
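The selection step can be sketched as a greedy ranking computed on the labeled source data only, with the resulting column indices then reused unchanged on the target data. The sketch below is a simplified stand-in and not the MRMR formulation of Peng et al.: it assumes absolute Pearson correlation in place of mutual information for both relevance and redundancy, labels coded as 0/1, and hypothetical function names.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def mrmr_rank(X_source, y_source, k):
    """Greedily pick k column indices: high relevance to y, low redundancy."""
    n_features = len(X_source[0])
    cols = [[row[j] for row in X_source] for j in range(n_features)]
    relevance = [abs(pearson(c, y_source)) for c in cols]
    selected, remaining = [], list(range(n_features))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(abs(pearson(cols[j], cols[s])) for s in selected) / len(selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def keep_columns(X, indices):
    """Apply the source-derived selection to any feature matrix (source or target)."""
    return [[row[j] for j in indices] for row in X]
```

Since target labels are unavailable, `mrmr_rank` is called on the source matrix only, and `keep_columns` applies the same indices to both the source and target matrices before they are passed to the UDA algorithm.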

Table 7.

Performance comparison with respect to different number of features.


No. of features	1stTask	2ndTask	3rdTask	4thTask	5thTask	6thTask	7thTask	8thTask	9thTask	10thTask	11thTask	12thTask
5	72.8	74.1	68.7	82.8	77.6	77.4	78.3	67.5	74.9	79.0	70.0	73.1
10	74.2	70.1	69.0	81.0	79.6	76.3	76.9	73.5	79.1	80.2	71.8	83.1
20	77.9	68.7	78.0	81.7	80.6	78.9	77.5	69.6	80.4	75.7	73.0	72.1
40	80.0	77.1	77.1	80.6	80.6	76.2	75.6	73.1	75.7	75.0	78.5	73.6
80	79.9	76.1	78.6	75.0	77.6	75.9	72.9	73.7	75.2	78.8	78.5	80.6
96	82.9	79.9	80.9	79.9	81.6	79.5	78.6	78.9	79.8	82.6	83.4	84.6

It can be seen that in most tasks, feature selection based on the source dataset not only fails to improve system performance but also reduces it. This indicates that while conventional classification methods in the SHM domain typically demonstrate performance improvements with feature selection within a single domain,⁵⁹ the presence of multiple domains introduces domain shifts. Consequently, the features selected based on the source dataset may not necessarily be optimal for the target dataset. Moreover, given that discrepancy-based UDA algorithms map data to a latent space where domain differences are minimized, reducing the input data dimensions in the original space negatively impacts overall system performance.

It should be noted that, as demonstrated in our previous work,⁴ applying a filter-based feature selection algorithm on data from one of the railway lines identified the Std and RMS of sensor direction 2, along with Std of sensor direction 3, as highly informative features for assessing track condition. Notably, about 95% of the selected features originated from sensors mounted on the train’s bogie, highlighting their greater sensitivity to track geometry defects. Furthermore, 95% of the most relevant features were concentrated among three statistical indices: RMS, Std, and peak value, underscoring their significance despite some correlation between them. The MRMR algorithm inherently addresses such correlations by selecting features with high relevance and minimal redundancy. However, as shown in Table 7, applying MRMR in the current study did not yield a significant improvement in performance. This is likely due to the domain shift present, where features selected from the source domain do not necessarily generalize well to the target domain.

Sensitivity analysis of the PDALC parameters

As mentioned in the second section, PDALC has three adjustment parameters: $λ$ , $β$ , and $μ$ . In addition, the dimension of the subspace and the type of kernel used by the algorithm can also be changed. To explore the parameter sensitivity of the PDALC algorithm, a sensitivity analysis is carried out on tasks 1, 2, and 3. It can be seen from Figure 16(a) to (c) that the accuracy remains stable across wide ranges of $λ$ , $β$ , and the dimension. In contrast, the proposed method is robust to variations in $μ$ only within a specific range: as $μ$ increases, the performance of the method on the datasets improves, eventually stabilizing within the range of 0.8 to 1.1. In addition, the effect of the kernel type on PDALC accuracy is evaluated. As can be seen in Figure 16(e), the best performance of PDALC is achieved when a linear kernel is chosen to map the data into the kernel space.

Figure 16.

Classification accuracy of PDALC with different parameters: (a) dimension, (b) beta $β$ , (c) lambda $λ$ , (d) mu $μ$ , and (e) kernel. PDALC: progressive distribution alignment based on label correction.

Sensitivity analysis of PDALC to sensors layout

This section examines the influence of accelerometer locations and orientations on the detection accuracy of the PDALC algorithm by defining six distinct combination layouts. The layouts are organized based on the sensors’ positions along the train (front, middle, and rear) and at two vertical levels (car body and bogie). Additionally, since acceleration data are captured in both horizontal and vertical directions, the influence of sensor orientation on the detection performance of the proposed framework is examined for each layout. This ensures a comprehensive analysis of how sensor placement and direction affect the system’s detection capabilities. Therefore, three sensor configurations are defined for each sensor layout: (1) only transversal direction, (2) vertical direction, and (3) all available directions. Table 8 gives the detail of the layouts.

Table 8.

Proposed sensor layouts based on sensor locations.

	Sensor direction	Sensor configuration	Tail, middle, and head	Sensor configuration	Tail and head
Car body and bogie			Layout 1		Layout 2
	Transversal	$SC 1$	$S 1, S 2, S 3, S 4, S 5, S 6$	$SC 4$	$S 1, S 3, S 4, S 6$
	Vertical	$SC 2$	$S 7, S 8, S 9, S 10, S 11, S 12$	$SC 5$	$S 7, S 9, S 10, S 12$
	All direction	$SC 3$	$S 1$ to $S 12$	$SC 6$	$S 1, S 3, S 4, S 6, S 7, S 9, S 10, S 12$
Car body			Layout 3		Layout 4
	Transversal	$SC 7$	$S 4, S 5, S 6$	$SC 10$	$S 4, S 6$
	Vertical	$SC 8$	$S 10, S 11, S 12$	$SC 11$	$S 10, S 12$
	All direction	$SC 9$	$S 4, S 5, S 6, S 10, S 11, S 12$	$SC 12$	$S 4, S 6, S 10, S 12$
Bogie			Layout 5		Layout 6
	Transversal	$SC 13$	$S 1, S 2, S 3$	$SC 16$	$S 1, S 3$
	Vertical	$SC 14$	$S 7, S 8, S 9$	$SC 17$	$S 7, S 9$
	All direction	$SC 15$	$S 1, S 2, S 3, S 7, S 8, S 9$	$SC 18$	$S 1, S 3, S 7, S 9$
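One way to apply these configurations is to map each one to the columns of the 96-dimensional feature matrix that belong to its sensors. The sketch below assumes that the columns are grouped sensor by sensor (eight metrics for S1, then eight for S2, and so on) and that the vertical sensors of the full layout are S7 to S12; both the column ordering and the names used are hypothetical.

```python
FEATURES_PER_SENSOR = 8  # eight time-domain metrics per sensor direction

# Sensor numbers per configuration, following Table 8.
SENSOR_CONFIGS = {
    "SC1": [1, 2, 3, 4, 5, 6],
    "SC2": [7, 8, 9, 10, 11, 12],
    "SC3": list(range(1, 13)),
    "SC4": [1, 3, 4, 6],
    "SC5": [7, 9, 10, 12],
    "SC6": [1, 3, 4, 6, 7, 9, 10, 12],
    "SC7": [4, 5, 6],
    "SC8": [10, 11, 12],
    "SC9": [4, 5, 6, 10, 11, 12],
    "SC10": [4, 6],
    "SC11": [10, 12],
    "SC12": [4, 6, 10, 12],
    "SC13": [1, 2, 3],
    "SC14": [7, 8, 9],
    "SC15": [1, 2, 3, 7, 8, 9],
    "SC16": [1, 3],
    "SC17": [7, 9],
    "SC18": [1, 3, 7, 9],
}


def feature_columns(config):
    """Zero-based column indices of the features belonging to one configuration."""
    columns = []
    for sensor in SENSOR_CONFIGS[config]:
        start = (sensor - 1) * FEATURES_PER_SENSOR  # assumed sensor-major ordering
        columns.extend(range(start, start + FEATURES_PER_SENSOR))
    return columns
```

Under these assumptions, SC3 keeps all 96 columns, whereas SC16 keeps only the 16 columns of the two transversal bogie sensors at the tail and head.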

As in section 5.4, the first task (Mj → NT) is chosen for the comparative study. The classification accuracy for the different layouts before and after implementing the UDA scheme on the source and target datasets is plotted in Figure 17. In this figure, the black circles indicate the positions of the accelerometers in each layout, while the blue bodies and red bogies represent those equipped with accelerometers.

Figure 17.

Comparative results of various sensor layouts using the UDA framework based on PDALC. UDA: unsupervised domain adaptation; PDALC: progressive distribution alignment based on label correction.

According to the figure, layouts generally exhibit greater efficiency when employing both body and bogie accelerometers (layouts 1 and 2). Additionally, the accuracy of layouts utilizing sensors installed on the bogie (layouts 5 and 6) tends to be slightly higher than that of layouts using sensors on the car body (layouts 3 and 4) in most cases. This finding is logical, as the secondary suspension mitigates some of the track defects’ impact on the car body acceleration. This also indicates that utilizing sensors placed on the axle is necessary to achieve better detection results. In other words, while the implementation of the UDA framework enhances classification accuracy by mapping data to the latent space, the quality of the input data significantly influences the system’s final performance. Thus, selecting more sensitive sensors may increase the likelihood of accuracy improvement.

Another representation of the results is shown in Figure 18. As can be seen, the UDA approach based on PDALC reduced the classification error in all sensor configurations, with a maximum reduction of 12% in SC3 and SC6. However, since the measurements of the vehicle’s response are obtained in real time under normal operating conditions, and the datasets are not drawn from a pure multivariate Gaussian distribution, a relatively high number of false alarms is expected for PDALC. This is evidenced in the latent feature space, where the classification error (17.13%) of the best configuration (SC3) exceeds the 5% level of significance. Nevertheless, in-service trains run far more often than the inspections by TGCs, which allows more frequent passes over critical areas of interest. This results in greater statistical confidence regarding the condition of the track, enabling more reliable monitoring and maintenance decisions.

Figure 18.

Comparison of classification error of baseline model versus PDALC. PDALC: progressive distribution alignment based on label correction.
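
The statistical-confidence argument for repeated passes can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that each pass misclassifies a track zone independently with the same probability and that the final decision is a majority vote over an odd number of passes; neither assumption is stated in the paper, and real passes over the same zone are likely to be correlated. The function name `majority_vote_error` is hypothetical.

```python
from math import comb


def majority_vote_error(p_error, n_passes):
    """Probability that a majority of n independent passes misclassify a zone.

    Assumes every pass has the same error probability p_error and that
    passes are statistically independent (an idealisation).
    """
    if n_passes % 2 == 0:
        raise ValueError("use an odd number of passes to avoid ties")
    k_min = n_passes // 2 + 1  # smallest number of wrong passes that forms a majority
    return sum(
        comb(n_passes, k) * p_error**k * (1 - p_error) ** (n_passes - k)
        for k in range(k_min, n_passes + 1)
    )


# Single-pass error of the best configuration (SC3) reported above.
p_single = 0.1713
for n in (1, 3, 5, 7):
    print(n, round(majority_vote_error(p_single, n), 4))
```

Under these idealised assumptions, the combined error is about 7.8% after three passes and about 3.8% after five, so a handful of repeated passes would be enough to bring it below the 5% level mentioned above.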

Finally, to assess the impact of noise on the robustness of the proposed system, a sensitivity analysis was conducted by introducing artificial noise into one of the tasks. Specifically, 5% Gaussian noise was added to the acceleration response data used in task 1 (line Mj → line NT), and the same procedure outlined in Figure 5 was followed. The results show a decrease in classification accuracy from 82% to 78%. Although the accuracy was reduced, the system demonstrated acceptable robustness to 5% noise, a level that aligns with commonly accepted noise levels in the SHM domain.⁶⁰
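
A minimal sketch of such a noise injection is given below. The text does not specify how the 5% level is defined, so the code assumes the common convention of zero-mean Gaussian noise whose standard deviation is 5% of the signal's root-mean-square value; the function name `add_gaussian_noise` and the synthetic sine signal are illustrative only.

```python
import math
import random


def add_gaussian_noise(signal, level=0.05, seed=None):
    """Return a copy of `signal` with zero-mean Gaussian noise added.

    The noise standard deviation is `level` times the RMS of the signal
    (an assumed convention; the paper does not define the noise level).
    """
    rng = random.Random(seed)
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    sigma = level * rms
    return [x + rng.gauss(0.0, sigma) for x in signal]


# Synthetic stand-in for an acceleration record: a 5 Hz sine sampled at 1 kHz.
clean = [math.sin(2 * math.pi * 5 * i / 1000) for i in range(2000)]
noisy = add_gaussian_noise(clean, level=0.05, seed=0)
print(len(noisy), noisy[:3])
```

The noisy record would then pass through the same feature extraction and classification steps as the clean one, so that any change in accuracy can be attributed to the added noise.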

Discussion

The results indicate that the proposed PDALC algorithm effectively mitigates the distribution discrepancy between source and target data from different lines in the task of developing a generalizable model for railway track geometrical defect detection. By integrating progressive learning, label correction, and geometric structure learning, PDALC demonstrates a synergistic effect that enhances the model’s ability to preserve data integrity across domains. This results in consistently better performance compared to other UDA algorithms evaluated in this study.
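
To make the notion of distribution discrepancy concrete, the sketch below computes one common measure, the squared maximum mean discrepancy (MMD) with a linear kernel, which reduces to the squared distance between the feature means of the two domains. This is a generic illustration and not the PDALC objective; the function names and the toy two-feature data are assumptions made for the example.

```python
def feature_means(data):
    """Column-wise mean of a list of equal-length feature vectors."""
    dim = len(data[0])
    return [sum(row[j] for row in data) / len(data) for j in range(dim)]


def linear_mmd2(source, target):
    """Squared MMD with a linear kernel: ||mean(source) - mean(target)||^2."""
    mean_s = feature_means(source)
    mean_t = feature_means(target)
    return sum((a - b) ** 2 for a, b in zip(mean_s, mean_t))


def center(data):
    """Subtract the domain mean from every sample (a trivial alignment step)."""
    mean = feature_means(data)
    return [[x - m for x, m in zip(row, mean)] for row in data]


# Toy source and target domains with shifted feature means.
source = [[0.0, 0.0], [2.0, 0.0]]
target = [[3.0, 4.0], [5.0, 4.0]]
print("before alignment:", linear_mmd2(source, target))
print("after centering: ", linear_mmd2(center(source), center(target)))
```

Centring each domain removes only the difference in means. Discrepancy-based UDA methods go further by learning a shared subspace in which richer statistics of the two domains also agree.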

However, as illustrated in Figure 12, the computational demand of the PDALC algorithm across the 12 tasks examined is approximately two to four times higher than that of other algorithms. The highest runtime occurs when the Nz dataset—the largest one, comprising 6244 samples, each representing 60 m of track—is used as the source or target domain. In these cases, PDALC achieves classification accuracy that is at least 10% higher than the baseline model and 5% higher than the next-best UDA algorithm, but with a runtime of approximately 17 min. This underscores a key trade-off between computational efficiency and model performance. In real-time monitoring scenarios, this trade-off must be carefully considered when deploying the method in practice.

Furthermore, noise remains a critical challenge in real-world applications of data-driven methods, particularly those that rely on statistical features derived from acceleration signals. The dataset used in this work was collected from various high-speed railway lines in France, each operating under different EOVs, which naturally introduce a level of noise into the data. This variability is a significant factor contributing to the observed classification accuracy remaining below 90%, even when using different machine learning/UDA models.

Addressing the influence of noise is an essential direction for future research—particularly through the development of preprocessing techniques⁶¹ or robust denoising methods—to further improve the system’s reliability and performance under real-world conditions.

Conclusions

This paper introduces a discrepancy-based UDA framework designed to tackle the domain shift issue in assessing the condition of railway tracks across various lines. A data-driven method utilizing recorded acceleration responses from high-speed trains has been developed to identify track geometric defects. By leveraging data from labeled source lines and unlabeled target lines, the PDALC algorithm facilitates the transfer of both datasets into a common subspace, allowing for effective integration of samples from different domains and reduction of domain discrepancies.

The framework’s effectiveness is demonstrated using a dataset of field measurements obtained from the IRIS320 train monitoring system as it traverses four distinct lines. The results highlight the potential of a discrepancy-based UDA approach for detecting geometric defects. Notably, PDALC surpasses traditional classification methods by achieving higher average classification accuracies, minimizing domain distances, enhancing sample clustering, and ensuring more uniform feature data distributions across domains. This enhanced performance demonstrates the effectiveness of PDALC in handling cross-domain adaptation challenges. Compared with the four other UDA methods evaluated, PDALC demonstrates the best performance for distribution alignment.

Furthermore, the framework’s performance is assessed across 12 tasks, highlighting its robustness in cross-domain condition assessment under varied operational conditions. Comparative results reveal that the proposed framework can enhance classification accuracy by up to 12% compared to traditional methods. Additionally, a study evaluates the influence of 18 sensor configurations within six sensor selection layouts on the framework’s accuracy, demonstrating reasonable accuracy levels using an acceleration dataset from sensors mounted on the bogie. However, incorporating a feature selection step on the source dataset results in reduced classification performance due to domain discrepancy shifts.

In addition to the insights gained through the proposed UDA framework and PDALC algorithm, further work can be directed towards investigating the integration of environmental and operational variables such as weather conditions, seasonal effects, and train loading conditions. These variables may introduce further domain shifts in railway track monitoring, potentially affecting defect detection accuracy. Moreover, expanding the dataset to cover a broader range of railway networks, including those with more diverse track geometries and defect types, could provide a more comprehensive validation of the framework’s scalability and robustness. Lastly, incorporating a semi-supervised DA strategy could further improve performance by utilizing a small amount of labeled data from the target domain, potentially offering a balance between fully supervised and unsupervised approaches.

Footnotes

Acknowledgements

The authors are thankful to SNCF Réseau Track and Surroundings Department colleagues and IP3M experts who provided treated HS train IRIS320 data and shared their knowledge and technical results through the international project UIC Harmotrack. The UIC Harmotrack project is an international program that gathers more than 200 railway experts working from 65 railway companies and research institutions in 40 countries.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This publication has emanated from research conducted with the financial support of Research Ireland under grant number 20/FFP-P/8706.

ORCID iDs

Ramin Ghiasi

Abdollah Malekjafarian

References

1. Peinado Gonzalo, Horridge, Steele, et al. Review of data analytics for condition monitoring of railway track geometry. IEEE Trans Intell Transp Syst 2022; 23: 22737–22754.

2. Jamshidi, Hajizadeh, et al. A decision support approach for condition-based maintenance of rails based on big data analysis. Transp Res Part C 2018; 95: 185–206.

3. Malekjafarian, OBrien, Quirke, et al. Railway track loss-of-stiffness detection using bogie filtered displacement data measured on a passing train. Infrastructures 2021; 6: 93.

4. Ghiasi, Khan, Sorrentino, et al. An unsupervised anomaly detection framework for onboard monitoring of railway track geometrical defects using one-class support vector machine. Eng Appl Artif Intell 2024; 133: 108167.

5. Malekjafarian, OBrien, Quirke, et al. Railway track monitoring using train measurements: an experimental case study. Appl Sci 2019; 9: 4859.

6. Malekjafarian, OBrien, Quirke, et al. Railway track loss-of-stiffness detection using bogie filtered displacement data measured on a passing train. Infrastructures 2021; 6: 1–17.

7. Malekjafarian, Khan, OBrien, et al. Indirect monitoring of frequencies of a multiple span bridge using data collected from an instrumented train: a field case study. Sensors 2022; 22: 7468.

8. Quirke, OBrien, Bowe, et al. The calibration challenge when inferring longitudinal track profile from the inertial response of an in-service train. Can J Civ Eng 2022; 49: 274–288.

9. Liu, Bergés, et al. HierMUD: hierarchical multi-task unsupervised domain adaptation between bridges for drive-by damage diagnosis. Struct Heal Monit 2023; 22: 1941–1968.

10. Chenariyan Nakhaee, Hiemstra, Stoelinga, et al. The recent applications of machine learning in rail track maintenance: a survey. In: Reliability, safety, and security of railway systems. Modelling, analysis, verification, and certification: third international conference, RSSRail 2019, Lille, France, 4–6 June 2019, Proceedings 3, 2019, pp. 91–105. Cham, Switzerland: Springer.

11. Arvidsson, Karoumi. Train–bridge interaction – a review and discussion of key model parameters. Int J Rail Transp 2014; 2: 147–186.

12. Jos, Flammini, Vittorini, et al. A systematic review of artificial intelligence public datasets for railway applications. Infrastructures 2021; 6: 136.

13. OBrien, McCrum, Khan, et al. Wavelet-based operating deflection shapes for locating scour-related stiffness losses in multi-span bridges. Struct Infrastruct Eng 2023; 19: 238–253.

14. Pires, Viana MCA, Scaramussa, et al. Measuring vertical track irregularities from instrumented heavy haul railway vehicle data using machine learning. Eng Appl Artif Intell 2024; 127: 107191.

15. De Rosa, Kulkarni, Qazizadeh, et al. Monitoring of lateral and cross level track geometry irregularities through onboard vehicle dynamics measurements using machine learning classification algorithms. Proc Inst Mech Eng Part F J Rail Rapid Transit 2021; 235: 107–120.

16. La Paglia, Carnevale, Corradi, et al. Condition monitoring of vertical track alignment by bogie acceleration measurements on commercial high-speed vehicles. Mech Syst Signal Process 2023; 186: 109869.

17. Tsunashima. Condition monitoring of railway tracks from car-body vibration using a machine learning technique. Appl Sci; 9. Epub ahead of print 2019. DOI: 10.3390/APP9132734.

18. Yuan, Zhu, Chang, et al. An unsupervised method based on convolutional variational auto-encoder and anomaly detection algorithms for light rail squat localization. Constr Build Mater 2021; 313: 125563.

19. Jiang G-F, Wang S-M, Y-Q, et al. Unsupervised discrepancy-based domain adaptation network to detect rail joint condition. IEEE Trans Instrum Meas 2023; 72: 1–19.

20. Gardner, Liu, Worden. On the application of domain adaptation in structural health monitoring. Mech Syst Signal Process 2020; 138: 106550.

21. Omori Yano, Figueiredo, da Silva, et al. Foundations and applicability of transfer learning for structural health monitoring of bridges. Mech Syst Signal Process 2023; 204: 110766.

22. Gardner, Bull, Gosliga, et al. Foundations of population-based SHM, Part III: Heterogeneous populations – mapping and transfer. Mech Syst Signal Process 2021; 149: 107142.

23. Giglioni, Poole, Venanzi, et al. A domain adaptation approach to damage classification with an application to bridge monitoring. Mech Syst Signal Process 2024; 209: 111135.

24. Zhao, Zhang, et al. Conditional adversarial domain adaptation with discrimination embedding for locomotive fault diagnosis. IEEE Trans Instrum Meas 2020; 70: 1–12.

25. Qin, Huang, et al. Stepwise adaptive convolutional network for fault diagnosis of high-speed train bogie under variant running speeds. IEEE Trans Ind Informatics 2022; 18: 8389–8398.

26. Chen S-X, Zhou, Y-Q. Wheel condition assessment of high-speed trains under various operational conditions using semi-supervised adversarial domain adaptation. Mech Syst Signal Process 2022; 170: 108853.

27. et al. Progressive distribution alignment based on label correction for unsupervised domain adaptation. In: 2021 IEEE international conference on multimedia and expo (ICME), Shenzhen, China, 5–9 July 2021, pp. 1–6. IEEE.

28. Diaine, Sorrentino, Panunzio. Detection of track geometry defects using machine learning techniques on onboard accelerations. UIC Harmotrack Project – Technical Report, Sub Working Group 1 B Beta 3, 2022.

29. Sorrentino, Latestere, et al. Improved condition monitoring of railway tracks through the analysis of on-board dynamic and geometry measurements on High-Speed Lines. In: World congress on railway research 2022 (WCRR2022), Birmingham, 6–10 June 2022, pp. 1–6. Paris, France: International Union of Railways (UIC).

30. Gatin, L’Henoret, Isasi, et al. Track geometry condition monitoring system for non intrusive measurements on commercial trains based on wireless sensor networks. In: 10th world congress railway research (WCRR), Sydney, NSW, Australia, 24–28 November 2013, pp. 25–27. Paris, France: International Union of Railways (UIC).

31. Ghiasi, Lestoille, Diaine, et al. Unsupervised domain adaptation for drive-by condition monitoring of multiple railway tracks. Eng Appl Artif Intell 2025; 139: 109516.

32. Fang, et al. Discriminative transfer subspace learning via low-rank and sparse representation. IEEE Trans Image Process 2015; 25: 850–863.

33. Gong, Shi, Sha, et al. Geodesic flow kernel for unsupervised domain adaptation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, Providence, Rhode Island, 16–21 June 2012, pp. 2066–2073. New York: IEEE.

34. Pan, Tsang, Kwok, et al. Domain adaptation via transfer component analysis. IEEE Trans Neural Netw 2011; 22: 199–210.

35. Zhang, Lan, et al. Maximum mean and covariance discrepancy for unsupervised domain adaptation. Neural Process Lett 2020; 51: 347–366.

36. Ozdagli, Koutsoukos. Domain adaptation for structural health monitoring. In: Proceedings of the Annual Conference of the PHM Society, 2020, vol. 12, p. 9. Detroit, Michigan: The PHM Society.

37. Fang, Liu, et al. Open set domain adaptation: theoretical bound and algorithm. IEEE Trans Neural Netw Learn Syst 2021; 32: 4309–4322.

38. Jamil, Verstraeten, Nowé, et al. A deep boosted transfer learning method for wind turbine gearbox fault detection. Renew Energy 2022; 197: 331–341.

39. Wickramarachchi, Gardner, Poole, et al. Damage localisation using disparate damage states via domain adaptation. Data Centric Eng; 5. Epub ahead of print 2024. DOI: 10.1017/dce.2023.29.

40. Gholenji, Tahmoresnezhad. Joint discriminative subspace and distribution adaptation for unsupervised domain adaptation. Appl Intell 2020; 50: 2050–2066.

41. Wang, Zhang, X-L. Improving pseudo labels with intra-class similarity for unsupervised domain adaptation. Pattern Recognit 2023; 138: 109379.

42. Belkin, Niyogi, Sindhwani. Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res; 7: 2399–2434.

43. CEN. EN 13848-1 Railway applications – Track – Track geometry quality – Part 1: Characterization of track geometry. London: BSI Standards Publication.

44. Lestoille, Soize, Funfschilling. Stochastic prediction of high-speed train dynamics to long-term evolution of track irregularities. Mech Res Commun 2016; 75: 29–39.

45. Lestoille. Stochastic model of high-speed train dynamics for the prediction of long-time evolution of the track irregularities. Université Paris-Est, 2015.

46. Lestoille, Soize, Funfschilling. Sensitivity of train stochastic dynamics to long-term evolution of track irregularities. Veh Syst Dyn 2016; 54: 545–567.

47. Ballereau. The train that is revolutionizing railway maintenance and safety, https://www.eurailtest.com/en/train-that-is-revolutionizing-railway-maintenance-and-safety/ (accessed 4 October 2021).

48. MT 40200 internal SNCF réseau national standard. Paris, France: SNCF Réseau, 2022.

49. El-Sibaie, Jamieson, Tyrell, et al. Engineering studies in support of the development of high-speed track geometry specifications. In: ASME international mechanical engineering congress and exposition. New York: American Society of Mechanical Engineers, 1997, pp. 143–150.

50. Silik, Noori, Altabey, et al. Selecting optimum levels of wavelet multi-resolution analysis for time-varying signals in structural health monitoring. Struct Control Heal Monit; 28. Epub ahead of print 2021. DOI: 10.1002/stc.2762.

51. Lei, et al. Fault diagnosis of rotating machinery based on multiple ANFIS combination with GAs. Mech Syst Signal Process 2007; 21: 2280–2294.

52. Buckley, Ghosh, Pakrashi. A feature extraction & selection benchmark for structural health monitoring. Struct Heal Monit 2023; 22: 2082–2127.

53. Jawalageri, Ghiasi, Jalilvand, et al. A data-driven approach for scour detection around monopile-supported offshore wind turbines using Naive Bayes classification. Mar Struct 2024; 95: 103565.

54. Taunk, Verma, et al. A brief review of nearest neighbor algorithm for learning and classification. In: 2019 international conference on intelligent computing and control systems (ICCS), 2019, pp. 1255–1260. IEEE.

55. Ben-David, Blitzer, Crammer, et al. Analysis of representations for domain adaptation. Adv Neural Inf Process Syst 2007; 19: 137–144.

56. Anowar, Sadaoui, Selim. Conceptual and empirical comparison of dimensionality reduction algorithms (PCA, KPCA, LDA, MDS, SVD, LLE, ISOMAP, LE, ICA, t-SNE). Comput Sci Rev 2021; 40: 100378.

57. Ghiasi, Malekjafarian. Feature subset selection in structural health monitoring data using an advanced binary slime mould algorithm. J Struct Integr Maint 2023; 8: 1–17.

58. Peng. Maximum relevance mRMR: discrete variables mRMR: continuous variables additive combination multiplicative combination, 2005, pp. 1–17.

59. Ghiasi, Malekjafarian. Monitoring of railway tracks maintenance needs using dynamic responses collected by an in-service train. Railw Eng Sci 2025; 1–28. https://doi.org/10.1007/s40534-025-00380-w

60. de Castro, Baptista, Ciampa. Comparative analysis of signal processing techniques for impedance-based SHM applications in noisy environments. Mech Syst Signal Process 2019; 126: 326–340.

61. Ashkarkalaei, Ghiasi, Pakrashi, et al. Optimum feature selection for the supervised damage classification of an operating wind turbine blade. Struct Heal Monit 2025; 14759217251313816. https://doi.org/10.1177/14759217251313815

A novel domain adaptation method for drive-by railway track monitoring using progressive distribution alignment based on label correction
