Sage Journals: Discover world-class research

Abstract

Robust and safe industrial operations depend on asset criticality, reliability and availability, which are sustained through effective maintenance that minimizes failures and costly overhauls. Existing maintenance strategies often fail to explain the underlying mechanisms that drive equipment degradation patterns. Prescriptive maintenance advances beyond failure prediction by generating tailored, granular and actionable strategies as knowledge-driven decision support. Causal discovery is central to prescriptive maintenance because it investigates the root causes of degradation and the propagation of failures, enabling evidence-grounded proactive interventions. However, systematic methodologies for uncovering and quantifying the causality behind sensor data dependencies, temporal degradation progression and early warning indicator relationships remain underdeveloped. This paper presents a new causal discovery framework to facilitate prescriptive maintenance. It applies: (1) Transfer Entropy to identify features exhibiting temporal causal influence on degradation, (2) Dynamic Time Warping-based clustering to form “symptom groups” representing shared degradation behaviors, and (3) a deep learning architecture revealing intra-cluster causal relationships and temporal delays. Case studies demonstrate the framework's utility in transforming multimodal truck sensor data into explainable causal knowledge. Results reveal failure-prone interconnections on individual trucks and support fleet-wide diagnostics that identify key triggers and their evolving roles across degradation stages. Refined through expert calibration, this work delivers a prescriptive decision-support foundation to enhance equipment safety and promote operational sustainability.

Keywords

equipment health monitoring, prescriptive maintenance, causal discovery, transfer entropy, dynamic time warping, deep learning, asset reliability

Background

Equipment maintenance is an integral component of industrial operations. It is recognized as a cooperative partner and a profit contributor in organizational planning,¹ given its direct impact on operational safety, asset longevity, and cost control. Over the decades, maintenance strategies have expanded and diversified, transitioning from corrective maintenance, where interventions occur only after failures, to preventive maintenance, which follows servicing schedules typically determined by equipment manufacturers, industry standards and engineering experience. The advent of the Internet of Things (IoT), artificial intelligence (AI), and big data analytics has further propelled this evolution, leading to the rise of predictive maintenance. This strategy leverages data modeling and analytics to perform proactive health condition diagnosis and forecast potential failures, mitigating operational risk and improving operational efficiency.^2,3 However, its scope remains focused primarily on predicting when failures will occur. This narrow focus, although valuable, limits its ability to provide insight into failure mechanisms and to translate raw maintenance data into actionable maintenance solutions. To address this gap, prescriptive maintenance has emerged as the most advanced paradigm.^4,5 It is an analytical framework that integrates sensor diagnostics, machine learning algorithms, and decision support systems. Beyond failure prognosis, its core capability lies in generating highly granular, context-aware maintenance directives that consider both equipment-level needs and broader operational constraints.

A building block of any prescriptive maintenance framework is a systematic approach to maintenance knowledge extraction; indeed, prescriptive maintenance is also referred to as knowledge-based maintenance.^6–8 Knowledge discovery synthesizes diverse data sources (sensor readings, past failure and repair records, contextual factors surrounding the equipment, maintenance histories, operational parameters, and domain expertise) to establish a comprehensive understanding of equipment behavioral patterns. Because engineering systems require quick and efficient repair or replacement to minimize production losses, lower-level knowledge discovery (i.e. at the part or component level rather than only the system level) is particularly important. The resulting knowledge base enables not only pattern identification and anomaly detection but, crucially, reveals the underlying mechanisms of degradation and deterioration across pieces of equipment. This knowledge-driven characteristic rationalizes the maintenance activities implemented, as it ensures that the proposed recommendations are technically sound and operationally justifiable. From an organizational management perspective, the capacity of prescriptive maintenance to explain its directives through rich knowledge enhances stakeholder trust and enables continuous refinement of maintenance strategies, ultimately supporting sustained asset reliability.

Causality constitutes an important form of maintenance knowledge that maps directly to the physical mechanisms of failure propagation. It describes a directional relationship in which one state (the cause) exerts a deterministic or probabilistic influence on another state (the effect), with this influence persisting under intervention or manipulation.⁹ Since industrial equipment sensor data are typically time series characterized by continuous temporal measurements and complex mechanical interactions, the “temporal precedence” principle,^9,10 that is, that the cause must precede its effect in time, should be used to describe causality in the present context. Granger¹¹ introduced a widely accepted definition of causality for time series data based on predictability: a variable $V$ is said to Granger-cause another variable $V^{'}$ if knowledge of the past values of $V$ yields a statistically significant improvement in the prediction of future values of $V^{'}$ beyond what can be achieved using only the past values of $V^{'}$ itself. Causal discovery is the identification and quantification of causality from observational data,^12,13 centering on learning the presence, strength, and direction of causal influences among variables. Causal relationships in complex equipment systems are rarely known a priori. They often remain implicit in traditional maintenance approaches, and their acquisition typically relies on engineering expertise and manual analysis. In modern equipment settings, this reliance proves time-intensive, and the results may be incomplete or inaccurate. Automated data-driven causal discovery from continuous sensor data streams presents an intelligent alternative that empowers prescriptive maintenance.
Through computational causal discovery algorithms and temporal dependency analysis, the discovered relationships identify root causes such as specific component interactions, environmental triggers, and operational stress patterns. They reveal not just “what” is failing (symptoms) but also allow engineers to grasp “why” it is failing (the mechanisms behind degradation). This enables the development of targeted maintenance strategies that optimize resource allocation and prevent cascading system failures before they manifest. As noted by Rotari and Kulahci,¹⁴ moving from predictive to prescriptive models necessitates causal relationship discovery that can guide specific operational interventions, rather than relying solely on observed associations.
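The predictability-based Granger definition given earlier in this section can be illustrated with a minimal linear test. The Python sketch below is not part of the proposed framework: it assumes a linear autoregressive model, an arbitrary lag order `lags`, and hypothetical helper names (`lagged_design`, `granger_f_stat`). It fits the future of $V^{'}$ on its own past (restricted model) and on the pasts of both $V^{'}$ and $V$ (full model) by ordinary least squares, then compares the residual sums of squares with an F-statistic; a large value suggests that the past of $V$ improves the prediction.

```python
import numpy as np


def lagged_design(series_list, lags):
    """Regressor matrix: an intercept plus `lags` past values of every series in series_list."""
    n = len(series_list[0])
    columns = [np.ones(n - lags)]
    for s in series_list:
        for k in range(1, lags + 1):
            columns.append(s[lags - k:n - k])      # value k steps before each predicted time
    return np.column_stack(columns)


def granger_f_stat(source, target, lags=2):
    """F-statistic for the hypothesis that the past of `source` improves prediction of `target`."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    future = target[lags:]                          # target values to be predicted

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, future, rcond=None)
        residuals = future - design @ coef
        return float(residuals @ residuals)

    rss_restricted = rss(lagged_design([target], lags))   # past of target only
    full = lagged_design([target, source], lags)          # past of target and source
    rss_full = rss(full)
    dof = len(future) - full.shape[1]
    return ((rss_restricted - rss_full) / lags) / (rss_full / dof)


# Synthetic example: v_prime is driven by the previous value of v.
rng = np.random.default_rng(0)
v = rng.normal(size=500)
v_prime = np.zeros(500)
for t in range(1, 500):
    v_prime[t] = 0.5 * v_prime[t - 1] + 0.8 * v[t - 1] + 0.1 * rng.normal()
print(granger_f_stat(v, v_prime), granger_f_stat(v_prime, v))
```

In this synthetic case the statistic for $V \to V^{'}$ is large while the reverse direction stays small. A complete analysis would also convert the statistic into a p-value and check stationarity; the linearity of this test is exactly the restriction that motivates TE later in the paper.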

Although causal discovery analysis is present in various domains such as healthcare,¹⁵ mineral economics,¹⁶ and transportation,¹⁷ it remains a new area in the industrial maintenance field.^18,19 Notably, studies on sensor data-driven causal discovery approaches are limited in scope.^20–22 On the specific subject of equipment health monitoring and maintenance, research on causal discovery is comparatively thin, as many existing studies prioritize failure prediction over explainable causality for maintenance guidance. Pang and Lodewijks²³ introduce a Bayesian inference-based causal modeling method combined with fuzzy logic to diagnose failures and support maintenance decision-making for a conveyor belt braking system. Zhou et al.²⁴ introduce CausalKG, a knowledge graph integrating spot-inspection data and causal relationship knowledge, and develop the CausalKG-ALBERT model to enhance root cause analysis of equipment failures. Vanderschueren et al.²⁵ propose a causal inference-based approach to optimize preventive maintenance frequency by learning the effect of maintenance on overhaul and failure rates and then deriving individualized, cost-effective maintenance schedules. Nadim et al.²⁶ combine machine learning, process mining, and human expertise to generate multi-level causal models for diagnosing abnormal events on real-world mill machine data. Gui et al.²⁷ develop a causal-factors-aware attention network to improve equipment fault prediction by incorporating causal contribution weights of features and time-aware attention mechanisms. However, the notion of causality there is primarily associated with feature contribution to prediction rather than with uncovering degradation mechanisms. In general, existing methodologies face several limitations. Many approaches are tailored to specific domains or rely heavily on predefined structural models, which constrains their generalizability and adaptability to diverse, data-driven maintenance contexts.
Their reliance on static causal structures, event data and rule-based frameworks limits their capacity to capture the dynamic, time-varying causal relationships in high-dimensional sensor data that are critical for understanding progressive system degradation. Furthermore, methods that integrate probabilistic updates or intervention-based frameworks often focus on high-level causal reasoning without adequately modeling granular, temporal interactions between variables, thereby overlooking subtle degradation patterns. Some approaches also require strong prior assumptions or domain expertise before model-generated causal relationships can be obtained, which can introduce bias and reduce robustness when applied to complex, multi-component systems with limited prior knowledge. Another critical problem is the lack of explicit mechanisms to compute causal delays between cause and effect. Computing these delays is vital because it quantifies the temporal propagation of causal dependencies among degradation parameters, which benefits early detection of degradation patterns and, in turn, helps optimize maintenance schedules and minimize downtime. With respect to prescriptive maintenance, related studies so far are mainly concerned with formulating theoretical structures and designing holistic analytical models.^6,28–30 Systematic causal discovery approaches for temporal sensor data in prescriptive maintenance contexts remain underexplored.

This paper proposes a causal discovery framework to support sensor-based prescriptive maintenance of equipment. The goal is to reveal the underlying degradation mechanisms from the available data and to build an evidence-based maintenance knowledge base that facilitates maintenance decision-making. In essence, the framework integrates three complementary analytical techniques: (1) Transfer Entropy (TE),³¹ (2) a clustering algorithm driven by Dynamic Time Warping (DTW),^32,33 and (3) the Temporal Causal Discovery Framework (TCDF),³⁴ a deep learning causal discovery architecture based on convolutional neural networks with an attention mechanism. Each method has shown individual promise across various domains. TE has proven effective for causal detection in industrial maintenance settings,^35–37 cryptocurrency forecasting,³⁸ and neuroscience,³⁹ among others. DTW, originally developed to handle temporal distortion in speech recognition, has expanded to image analysis,⁴⁰ seismology,⁴¹ human gait capture,⁴² and maintenance-related studies.^43–45 TCDF excels at temporal causal discovery in complex multivariate time series, finding applications both in maintenance^46–48 and in other domains.^49,50 However, effective prescriptive maintenance knowledge discovery presents multifaceted analytical requirements that individual methods would struggle to fulfill. Modern equipment generates high-dimensional sensor data characterized by complex temporal dependencies, nonlinear interactions, and asynchronous degradation patterns across assemblies, components, and parts. Prescriptive maintenance demands systematic extraction of actionable causal knowledge: identifying which sensor parameters causally influence degradation, understanding failure patterns and their organization, and quantifying specific directional causal relationships among these subsystems to enable informed, targeted, and proactive interventions.
Deploying these techniques in isolation runs into inherent limitations when addressing these requirements. TE, while powerful for detecting information transfer, may identify relationships that lack practical relevance to failure dynamics when applied to unfiltered high-dimensional sensor streams, and it provides limited insight into the temporal pattern structures within the identified causal features. DTW effectively captures temporal similarity but fundamentally lacks a causal detection mechanism. TCDF alone offers advanced causal discovery capabilities with delay quantification but could face computational and interpretability challenges when processing large-scale, heterogeneous sensor datasets without strategic feature reduction and organization. This methodological fragmentation leaves a critical gap: no systematic knowledge discovery approach exists that addresses causal feature identification with dimensionality reduction, temporal pattern organization, and granular causal relationship extraction within a single coherent workflow tailored to prescriptive maintenance requirements.

To tackle these challenges, this paper proposes a new modular approach that integrates the three techniques synergistically. Specifically, the contributions are as follows:

A multi-stage causality mining framework that unifies TE, DTW, and TCDF within a single methodological workflow, where each technique builds complementarily upon the preceding analysis, thereby transforming fragmented techniques into a scalable methodology;

A dual-scale causal investigation strategy that employs TE for causal feature identification with respect to a system health index computed via a proposed approach, and TCDF for local causality analysis, capturing complex nonlinear temporal dynamics among sensor measurements and computing the causal delay between cause and effect, thereby supporting proactive maintenance planning;

A DTW-based clustering algorithm serving as a strategic intermediary that enables a systematic, hierarchical transition from macro- to micro-scale causal analysis while accommodating heterogeneous sensor response patterns, thereby structuring high-dimensional data into interpretable groups for targeted causal discovery; and

An application to real-world heavy truck sensor data, the SCANIA Component X dataset,⁵¹ representing a new case study focused on causal discovery tasks, with results validating the framework's applicability and enabling the development of interpretable, degradation class-specific causal knowledge for trucks that systematically transforms multi-dimensional sensor data into structured, actionable decision support based on extracted causal features, cluster plots, causal network graphs, and patterns.

The remainder of the paper is organized as follows. The second section provides detailed descriptions of the methodologies. The third section presents the case study along with results. The fourth section discusses the interpretation of the results and their practical implications. The final section concludes the paper and suggests avenues for future research.

Methodology

The developed causal discovery framework operates on structured, equal-sized, multi-feature datasets collected from equipment monitoring systems. Mathematically, the dataset $X$ can be denoted as in equation (1):

X = \{X_{1}, X_{2}, X_{3}, \dots, X_{u}, y\} \in R^{m \times (u + 1)},

(1)

where $X_{i}$, $i \in \{1, 2, 3, \dots, u\}$, denotes the records of one of the $u$ parameters (i.e. features) of the equipment system monitored by sensors, and $m$ is the number of records of each parameter. $y$ is the target variable, which often reflects the health state variation or overall degradation of the equipment. An overview of the framework architecture is depicted in Figure 1. The framework processes data through three complementary modules: (1) a causal feature discovery module, (2) a causal feature clustering module, and (3) an intra-cluster deep causality learning module. They are introduced in the following sections.

Figure 1.

Diagrammatic illustration of the proposed causal discovery framework.

Identification of causal features for equipment health via transfer entropy

The first step selects features causally linked to system health degradation. Because structured equipment failure datasets often contain numerous parameters, isolating the features that directly influence system behavior is crucial for prescriptive actions, and this step therefore also performs dimensionality reduction to enhance computational efficiency. In general, two strategies exist to reach this goal: transformation-based techniques and feature selection techniques. The former, such as Principal Component Analysis,⁵² Independent Component Analysis,⁵³ and Non-Negative Matrix Factorization,⁵⁴ reduce computational burden through lower-dimensional representations but typically create new composite features through mathematical transformations of the original variables. This limits the interpretability of individual parameter contributions, a consideration particularly relevant in maintenance applications where tracing system behavior back to specific components is crucial. This research therefore reduces the number of original features directly, by selection. However, common methods in this category, such as Pearson correlation,⁵⁵ Spearman rank correlation,⁵⁶ or Mutual Information,^57,58 are primarily designed to capture statistical associations. Pearson correlation measures linear relationships between variables, Spearman rank correlation captures monotonic relationships, and Mutual Information detects both linear and nonlinear dependencies. These measures are, however, symmetric between feature and target and ignore temporal ordering, so they do not account for the directionality or temporal dynamics required to infer causal relationships between the features and the target variable.

This research applies TE to identify causal features. Introduced by Schreiber,³¹ TE builds on Shannon entropy⁵⁸ to quantify directional information transfer between data series. It evaluates the additional information gained about the target's future by considering the source's past, effectively isolating the source's distinct causal contribution from the target's inherent temporal dynamics. This property allows TE to characterize the causal influence from source to target. TE is a non-parametric approach that makes no distributional assumptions, enabling analysis of complex system dynamics without predefined constraints. Unlike correlation-based methods, or Granger causality,¹¹ which in its standard form assumes static linear relationships and requires parametric models, TE quantifies directional information flow and captures nonlinear, temporal dependencies, providing robust identification of influential features in high-dimensional equipment systems.

Mathematically, define $X$ as a discrete random variable with possible outcomes $\{x_{1}, x_{2}, x_{3}, \dots, x_{m}\}$ and corresponding probabilities $\{p (x_{1}), p (x_{2}), p (x_{3}), \dots, p (x_{m})\}$. Shannon entropy is a measure of the uncertainty or unpredictability associated with the outcomes of $X$. It measures the average amount of information required to determine the outcome of $X$ considering all possible outcomes and their probabilities. A low entropy indicates that $X$ is highly predictable, and vice versa. Mathematically, it can be formulated in equation (2):

H (X) = - \sum_{i = 1}^{m} p (x_{i}) \log_{2} p (x_{i}) .

(2)
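Equation (2) can be evaluated directly once sensor readings have been discretized. The Python sketch below uses plug-in (empirical frequency) probabilities; the function name `shannon_entropy` is hypothetical, and the discretization of continuous sensor data is assumed to have been done beforehand.

```python
from collections import Counter

import numpy as np


def shannon_entropy(values):
    """Plug-in Shannon entropy, in bits, of a sequence of discrete outcomes (equation (2))."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    probs = counts / counts.sum()                  # empirical p(x_i)
    # -sum p log2 p written as sum p log2(1/p), which avoids returning -0.0
    return float(np.sum(probs * np.log2(1.0 / probs)))


print(shannon_entropy([0, 1, 0, 1]))   # 1.0: a balanced binary series
print(shannon_entropy([3, 3, 3, 3]))   # 0.0: a constant, fully predictable series
```

The two calls illustrate the two extremes described above: a constant series carries no uncertainty, whereas a balanced binary series requires one bit per observation.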

In the context of causal feature discovery, let $X_{i} = \{x_{i, 1}, x_{i, 2}, x_{i, 3}, \dots, x_{i, m}\}$ be a series in which each $x_{i, r}$, $r \in \{1, 2, 3, \dots, m\}$, denotes an individual sensor reading. Let $y$ represent the target variable, expressed as $y = \{y_{1}, y_{2}, y_{3}, \dots, y_{m}\}$, with $j \in \{1, 2, 3, \dots, m\}$ indexing each $y_{j}$. TE assumes that both the source and target series follow finite-order Markov processes. As such, $y_{j + 1}$, the next state of $y$, depends only on its past $q$ states $\{y_{j}, y_{j - 1}, y_{j - 2}, \dots, y_{j - q + 1}\}$, denoted ${y_{j}}^{(q)}$. Similarly, the influence of $X_{i}$ on $y$ is modeled by the embedding of the past $p$ samples $\{x_{i, r}, x_{i, r - 1}, x_{i, r - 2}, \dots, x_{i, r - p + 1}\}$, denoted ${x_{i, r}}^{(p)}$. Here, $p$ and $q$ denote the orders of the two underlying Markov processes. Considering the past observations of both $X_{i}$ and $y$ helps capture the temporal dependencies between the two variables. Define $H_{1}$ as the conditional entropy reflecting the information required to encode $y_{j + 1}$ given the pasts of both $X_{i}$ and $y$, with orders $p$ and $q$ respectively. Similarly, define $H_{2}$ as the conditional entropy reflecting the information required to encode $y_{j + 1}$ while assuming independence from $X_{i}$. The two conditional entropies are formulated in equations (3) and (4), respectively:

\begin{matrix} H_{1} = - \sum_{y_{j + 1}, {y_{j}}^{(q)}, {x_{i, r}}^{(p)}} p (y_{j + 1}, {y_{j}}^{(q)}, {x_{i, r}}^{(p)}) \\ \log_{2} p (y_{j + 1} | {y_{j}}^{(q)}, {x_{i, r}}^{(p)}), \end{matrix}

(3)

\begin{matrix} H_{2} = - \sum_{y_{j + 1}, {y_{j}}^{(q)}, {x_{i, r}}^{(p)}} p (y_{j + 1}, {y_{j}}^{(q)}, {x_{i, r}}^{(p)}) \\ \log_{2} p (y_{j + 1} | {y_{j}}^{(q)}) . \end{matrix}

(4)
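Equations (3) and (4) are conditional entropies, so they can be estimated as differences of joint entropies: $H_{1} = H(y_{j + 1}, {y_{j}}^{(q)}, {x_{i, r}}^{(p)}) - H({y_{j}}^{(q)}, {x_{i, r}}^{(p)})$ and $H_{2} = H(y_{j + 1}, {y_{j}}^{(q)}) - H({y_{j}}^{(q)})$ (the sum over ${x_{i, r}}^{(p)}$ in equation (4) simply marginalizes it out). The Python sketch below assumes already discretized series, plug-in probabilities, and source and target embeddings aligned at the same time index ($r = j$); these choices and all function names are assumptions for illustration only.

```python
from collections import Counter

import numpy as np


def _entropy(rows):
    """Plug-in joint entropy (bits) of a list of hashable tuples."""
    counts = np.array(list(Counter(rows).values()), dtype=float)
    probs = counts / counts.sum()
    return float(np.sum(probs * np.log2(1.0 / probs)))


def embed(x, y, p=1, q=1):
    """Aligned samples (y_{j+1}, y_j^(q), x_j^(p)) for every j with a complete history."""
    start = max(p, q) - 1
    future, y_past, x_past = [], [], []
    for j in range(start, len(y) - 1):
        future.append(y[j + 1])
        y_past.append(tuple(y[j - k] for k in range(q)))   # y_j, y_{j-1}, ..., y_{j-q+1}
        x_past.append(tuple(x[j - k] for k in range(p)))   # x_j, x_{j-1}, ..., x_{j-p+1}
    return future, y_past, x_past


def conditional_entropies(x, y, p=1, q=1):
    """(H1, H2) of equations (3) and (4) for a discrete source x and target y."""
    f, yp, xp = embed(x, y, p, q)
    h1 = _entropy(list(zip(f, yp, xp))) - _entropy(list(zip(yp, xp)))  # H(y_{j+1} | y^(q), x^(p))
    h2 = _entropy(list(zip(f, yp))) - _entropy(yp)                       # H(y_{j+1} | y^(q))
    return h1, h2
```

For instance, if $y$ is a one-step delayed copy of a random binary source $x$, $H_{1}$ is zero (the source's last value fully determines the next target value), while $H_{2}$ is close to 1 bit because the target's own past carries no information about its next value.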

Then, TE from $X_{i}$ to $y$ can be formulated in equation (5) which is the result of $H_{2} - H_{1}$ :

\begin{matrix} T E_{X_{i} \to y} = \sum_{y_{j + 1}, {y_{j}}^{(q)}, {x_{i, r}}^{(p)}} p (y_{j + 1}, {y_{j}}^{(q)}, {x_{i, r}}^{(p)}) \\ \log_{2} (\frac{p (y_{j + 1} | {y_{j}}^{(q)}, {x_{i, r}}^{(p)})}{p (y_{j + 1} | {y_{j}}^{(q)})}) . \end{matrix}

(5)

The expanded form of equation (5) can be written as equation (6):

\begin{matrix} T E_{X_{i} \to y} = \sum_{y_{j + 1}, {y_{j}}^{(q)}, {x_{i, r}}^{(p)}} p (y_{j + 1}, {y_{j}}^{(q)}, {x_{i, r}}^{(p)}) \\ \log_{2} (\frac{p (y_{j + 1}, {y_{j}}^{(q)}, {x_{i, r}}^{(p)}) p ({y_{j}}^{(q)})}{p (y_{j + 1}, {y_{j}}^{(q)}) p ({y_{j}}^{(q)}, {x_{i, r}}^{(p)})}) . \end{matrix}

(6)

In effect, $T E_{X_{i} \to y}$ measures the reduction in uncertainty about future values of $y$ obtained by using the past values of both $X_{i}$ and $y$, compared with using the past values of $y$ alone. $T E_{X_{i} \to y} > 0$ indicates that the feature $X_{i}$ is causally linked to $y$. A higher TE value indicates that the feature provides more information about the future state of the target beyond what can be predicted from the target's own past, suggesting a stronger causal relationship from the feature to the target. In this way, TE facilitates the determination of a subset of causal features with respect to the given target variable. Note that a TE of zero indicates that the target series, conditioned on its past, is independent of the past of the source series.⁵⁹
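Equation (6) and the selection rule above can be combined into a small feature-screening routine. The sketch below is a plug-in (histogram) estimator written for illustration; it assumes equal-frequency binning into `bins` levels, embedding orders $p = q = 1$, source and target aligned at the same time index, and a hypothetical selection threshold of 0.05 bits. The paper's actual estimator and selection criterion are not specified in this section. Because plug-in estimates are biased upward for finite samples, a strictly positive estimate alone does not establish causality; a threshold or a significance test against shuffled surrogate series is commonly used instead.

```python
from collections import Counter

import numpy as np


def discretize(series, bins=4):
    """Equal-frequency bin labels for a real-valued series (assumed preprocessing step)."""
    edges = np.quantile(series, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(series, edges)


def transfer_entropy(x, y, p=1, q=1):
    """Plug-in estimate of TE_{x -> y} in bits, following equation (6); x and y are discrete."""
    start = max(p, q) - 1
    triples = [(y[j + 1],
                tuple(y[j - k] for k in range(q)),
                tuple(x[j - k] for k in range(p)))
               for j in range(start, len(y) - 1)]
    n = len(triples)
    c_fyx = Counter(triples)
    c_fy = Counter((f, yp) for f, yp, _ in triples)
    c_yx = Counter((yp, xp) for _, yp, xp in triples)
    c_y = Counter(yp for _, yp, _ in triples)
    te = 0.0
    for (f, yp, xp), c in c_fyx.items():
        # p(f,y,x) * log2[p(f,y,x) p(y) / (p(f,y) p(y,x))]; the 1/n factors cancel in the ratio
        te += (c / n) * np.log2(c * c_y[yp] / (c_fy[(f, yp)] * c_yx[(yp, xp)]))
    return float(te)


def select_causal_features(features, target, threshold=0.05, bins=4):
    """Return {name: TE} for features whose TE towards the target exceeds the threshold."""
    y = discretize(target, bins)
    scores = {name: transfer_entropy(discretize(col, bins), y) for name, col in features.items()}
    return {name: te for name, te in scores.items() if te > threshold}
```

Here `features` is assumed to be a dictionary mapping feature names to equal-length NumPy arrays, and `target` a health-index series of the same length; all names are hypothetical.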

Clustering causal features based on dynamic time warping

While TE reduces dimensionality and uncovers initial causality with respect to system degradation, a core challenge remains: sensor measurements can respond asynchronously to system-wide failures. Sensor signals exhibit high variability due to diverse operational dynamics (components operating under varying demands, conditions, loads, or environments), which complicates data comparison and interpretation. Additionally, signals are often misaligned due to measurement errors, sensor failures, or data processing inconsistencies, which can result in partial signal loss, unsynchronized timestamps, or irregular patterns. Maintenance engineers typically diagnose faults by examining individual sensor outputs or following standardized troubleshooting protocols, but this strategy is labor-intensive and may fail to extract the critical causal chains explaining system degradation, as these relationships remain hidden in noisy data streams.

To address these problems and improve diagnosis, this paper introduces a clustering module following causal feature selection and proposes an algorithm empowered by DTW^32,33 that clusters causal features based on their temporal patterns. Since sensor data exhibit temporality and continuity, they contain trends and numerical patterns that emerge during degradation. This paper argues that features with similar trends and patterns constitute a “symptom group” that conveys partial but unique and augmented diagnostic information linked to degradation. Each cluster is composed of signals presenting similar temporal behaviors and collectively responding to system degradation. In practical terms, a cluster can correspond to a group of performance parameters from the same system whose time series show shared trends and patterns from task initiation to repair intervention. Alternatively, a cluster can include sensor series from different assemblies, subassemblies or parts that are physically distinct yet exhibit similar responses to a system-wide anomaly. For instance, data transmitted from the brakes, the radiator, and the cooling fan of a truck may display comparable patterns during an engine overheating event even though these parts are functionally separate. DTW-based clustering is chosen over conventional methods (e.g. k-means, hierarchical clustering) specifically for its capacity to handle temporal misalignment and varying degradation rates across sensors. Standard distance metrics, such as the Euclidean distance, compare points at identical time indices and therefore fail to capture similarity between temporally distorted signals, whereas DTW's elastic matching aligns degradation patterns regardless of temporal shifts. This clustering step assists maintenance teams in systematically exploring degradation-related causal relationships embedded within multisource noisy data through intra- and cross-cluster sensor examination. In what follows, the DTW method and the clustering algorithm are presented.

Dynamic time warping

DTW determines the similarity between two data series, which are usually asynchronous, by finding a globally optimal alignment, also known as an optimal warping path. Along with this path, DTW returns the minimum cumulative cost of warping one series to align with the other, referred to as the DTW distance. This distance captures the similarity of the two series while accommodating potential shifts and distortions in time. The optimal warping path and the corresponding distance are obtained together: the path indicates how the two sequences are aligned, while the distance reflects their degree of similarity.

While DTW can accommodate series of different lengths, for simplicity, this paper considers series of equal lengths. As such, define in equation (7) the collection of causal features $X_{cf}$ determined by TE:

X_{cf} = \{X_{1}, X_{2}, X_{3}, \dots, X_{n}\} \in R^{m \times n},

(7)

where $X_{i} = \{x_{i 1}, x_{i 2}, x_{i 3}, \dots, x_{im}\}$ and $X_{j} = \{x_{j 1}, x_{j 2}, x_{j 3}, \dots, x_{jm}\}$ denote any two series of the same length $m$, with $X_{i}, X_{j} \in X_{cf}$ for $i, j \in \{1, 2, 3, \dots, n\}$ and $i \neq j$. $n$ is the total number of causal features. $x_{ir}$ and $x_{js}$, for $r, s \in \{1, 2, 3, \dots, m\}$, represent individual sensor readings of $X_{i}$ and $X_{j}$, respectively. A distance measure $dist (\cdot, \cdot)$ must be defined to quantify the dissimilarity between any two points $x_{ir}$ and $x_{js}$. A larger distance indicates a higher cost and thus a lower similarity, and vice versa. A distance matrix $M \in R^{m \times m}$ is then constructed, with each entry $M (r, s) = dist (x_{ir}, x_{js})$ being the pairwise distance between the corresponding points of $X_{i}$ and $X_{j}$.

The next step is to align the two series by computing a warping path that quantifies their similarity. Denote a warping path as a sequence $T = (t_{1}, t_{2}, t_{3}, \dots, t_{L})$, where each element $t_{l} = (r_{l}, s_{l})$, $l \in \{1, 2, 3, \dots, L\}$, is an index pair that aligns the sensor reading $x_{ir_{l}}$ of $X_{i}$ with the reading $x_{js_{l}}$ of $X_{j}$. The warping path must satisfy the following constraints:

Boundary Constraint. It requires that the warping path starts at the initial point of both series and ends at the final point of both series. The constraint can be expressed as equation (8):

t_{1} = (1, 1) \land t_{L} = (m, m) .

(8)

Monotonicity Constraint. It requires that the warping path must progress through time without reverting to previous points. The constraint can be expressed as equation (9):

r_{l} \leq r_{l + 1} \land s_{l} \leq s_{l + 1} \quad \forall l \in \{1, 2, 3, \dots, L - 1\} .

(9)

Step Size Constraint. It requires that the warping path maintain a smooth transition between points in the series without abrupt jumps in the indices: each index advances by at most one per step, and at least one index advances, so that no aligned pair is repeated. The constraint can be expressed as equation (10):

(r_{l + 1} - r_{l}, \; s_{l + 1} - s_{l}) \in \{(1, 0), (0, 1), (1, 1)\} \quad \forall l \in \{1, 2, 3, \dots, L - 1\} .

(10)
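The three constraints can be checked mechanically. The hypothetical helper below takes a candidate path as a list of 1-based index pairs and tests equations (8) and (9) together with a step-size rule in which each index advances by at most one and at least one index advances, which is consistent with the three predecessor cells used in the dynamic programming recursion of this section.

```python
def is_valid_warping_path(path, m):
    """Check a path of 1-based (r, s) pairs on an m x m grid against equations (8)-(10)."""
    if not path or path[0] != (1, 1) or path[-1] != (m, m):
        return False                                   # boundary constraint, eq. (8)
    for (r0, s0), (r1, s1) in zip(path, path[1:]):
        step_r, step_s = r1 - r0, s1 - s0
        if step_r < 0 or step_s < 0:
            return False                               # monotonicity constraint, eq. (9)
        if (step_r, step_s) not in {(1, 0), (0, 1), (1, 1)}:
            return False                               # step size: move by at most one, never stand still
    return True


print(is_valid_warping_path([(1, 1), (1, 2), (2, 3), (3, 3)], 3))  # True
print(is_valid_warping_path([(1, 1), (3, 3)], 3))                  # False: index jump of two
```

The monotonicity test is implied by the step-size rule; it is kept separate only to mirror the order of the constraints in the text.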

The total distance of $T$ , defined as $D_{T} (X_{i}, X_{j})$ , is equal to the sum of the distance measures for each aligned pair of the points along the path. It can be mathematically formulated as equation (11):

D_{T} (X_{i}, X_{j}) = \sum_{l = 1}^{L} dist (x_{{ir}_{l}}, x_{{js}_{l}}) .

(11)

Here, $L$ represents the length of the warping path (i.e. the number of alignment pairs in $T$ ), which can differ from $m$ due to the one-to-many alignments permitted by DTW to handle temporal distortions. $x_{{ir}_{l}} \in X_{i}$ and $x_{{js}_{l}} \in X_{j}$ denote the specific elements aligned at position $l$ in the warping path $T$ , with $r_{l}$ and $s_{l}$ being the respective position indices in sequences $X_{i}$ and $X_{j}$ . The objective now is to find the minimum cumulative distance $D_{T^{*}} (X_{i}, X_{j})$ that corresponds to the optimal warping path $T^{*}$ . This minimum distance can be formulated as equation (12):

D_{T^{*}} (X_{i}, X_{j}) = \min_{T} \{D_{T} (X_{i}, X_{j})\} .

(12)
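For very short series, equation (12) can be verified by brute force: enumerate every admissible warping path, evaluate $D_{T}$ with equation (11), and take the minimum. The sketch assumes the absolute difference as $dist$ and uses hypothetical function names. The number of paths grows exponentially (it follows the central Delannoy numbers: 13 paths for $m = 3$ and 63 for $m = 4$), which is why a dynamic programming approach is needed in practice.

```python
def path_cost(xi, xj, path):
    """D_T of equation (11): sum of |x_ir - x_js| over the 1-based index pairs of path T."""
    return sum(abs(xi[r - 1] - xj[s - 1]) for r, s in path)


def all_warping_paths(m):
    """Yield every warping path from (1, 1) to (m, m); exponential, tiny examples only."""
    def extend(path):
        r, s = path[-1]
        if (r, s) == (m, m):
            yield list(path)
            return
        for dr, ds in ((1, 0), (0, 1), (1, 1)):
            if r + dr <= m and s + ds <= m:
                yield from extend(path + [(r + dr, s + ds)])
    yield from extend([(1, 1)])


def brute_force_dtw(xi, xj):
    """Equation (12): minimum of D_T over all admissible warping paths T."""
    return min(path_cost(xi, xj, p) for p in all_warping_paths(len(xi)))


xi, xj = [0, 1, 2, 2], [0, 0, 1, 2]
print(brute_force_dtw(xi, xj))                               # 0
print(path_cost(xi, xj, [(1, 1), (2, 2), (3, 3), (4, 4)]))   # 2: the purely diagonal path
```

The example shows the benefit of warping: the diagonal, one-to-one alignment costs 2, whereas an elastic path absorbs the one-step lag at zero cost.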

The minimum distance is computed with dynamic programming: a cumulative cost matrix $M_{c} \in R^{m \times m}$ is filled recursively, and the optimal alignment path is then recovered from it. The matrix entry $M_{c} (r, s)$ denotes the minimum cumulative distance to align the first $r$ elements of $X_{i}$ with the first $s$ elements of $X_{j}$. The initial conditions are set as follows:

M_{c} (1, 1) = dist (x_{i 1}, x_{j 1}),

(13)

M_{c} (r, 1) = M_{c} (r - 1, 1) + dist (x_{ir}, x_{j 1}) \quad \forall r \in \{2, 3, 4, \dots, m\},

(14)

M_{c} (1, s) = M_{c} (1, s - 1) + dist (x_{i 1}, x_{js}) \quad \forall s \in \{2, 3, 4, \dots, m\} .

(15)

$M_{c} (1, 1)$ in equation (13) corresponds to aligning the first elements $x_{i 1}$ and $x_{j 1}$ of both series. In equation (14), $M_{c} (r, 1)$ represents the minimum cumulative cost to align the first $r$ elements of $X_{i}$ with only the first element $x_{j 1}$ of $X_{j}$. Equation (15) is defined analogously, with the roles of the two series exchanged. Thereafter, for $r, s \in \{2, 3, \dots, m\}$, the cumulative cost $M_{c} (r, s)$ is computed recursively as in equation (16):

M_{c} (r, s) = dist (x_{ir}, x_{js}) + \min \{M_{c} (r - 1, s - 1), M_{c} (r - 1, s), M_{c} (r, s - 1)\} .

(16)

Effectively, the cumulative cost at each cell is obtained by adding the current distance between $x_{ir}$ and $x_{js}$ to the minimum of the cumulative costs of its three predecessor cells. The optimal DTW distance between $X_{i}$ and $X_{j}$ is the final entry $M_{c} (m, m)$, and the optimal warping path is traced back from this entry by repeatedly moving to the predecessor cell with the minimum cumulative cost.
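Equations (13) to (16) and the trace-back step translate directly into code. The sketch below is a plain quadratic-time Python/NumPy version; it assumes the absolute difference as $dist$ (the text leaves the measure open), returns 1-based index pairs to match the notation, and uses the hypothetical function name `dtw`. When predecessor cells tie, it picks the one with the smallest indices, so other equally optimal paths may exist. Although the text assumes equal lengths, the code also accepts series of different lengths.

```python
import numpy as np


def dtw(xi, xj):
    """DTW distance and optimal warping path via equations (13)-(16); dist assumed |a - b|."""
    xi = np.asarray(xi, dtype=float)
    xj = np.asarray(xj, dtype=float)
    m, n = len(xi), len(xj)
    dist = np.abs(xi[:, None] - xj[None, :])
    mc = np.full((m, n), np.inf)
    mc[0, 0] = dist[0, 0]                                # equation (13)
    for r in range(1, m):
        mc[r, 0] = mc[r - 1, 0] + dist[r, 0]             # equation (14)
    for s in range(1, n):
        mc[0, s] = mc[0, s - 1] + dist[0, s]             # equation (15)
    for r in range(1, m):
        for s in range(1, n):                            # equation (16)
            mc[r, s] = dist[r, s] + min(mc[r - 1, s - 1], mc[r - 1, s], mc[r, s - 1])
    # Trace back from the last cell, always moving to the cheapest predecessor.
    r, s = m - 1, n - 1
    path = [(r + 1, s + 1)]
    while (r, s) != (0, 0):
        if r == 0:
            s -= 1
        elif s == 0:
            r -= 1
        else:
            _, r, s = min((mc[r - 1, s - 1], r - 1, s - 1),
                          (mc[r - 1, s], r - 1, s),
                          (mc[r, s - 1], r, s - 1))
        path.append((r + 1, s + 1))
    return float(mc[-1, -1]), path[::-1]


distance, path = dtw([0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0])
print(distance)  # 0.0: the one-step shift of the pulse is absorbed by the warping
print(path)
```

The shifted-pulse example reflects the motivation given earlier: a point-by-point comparison would register a mismatch at two time steps, while DTW aligns the two pulses exactly.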

The proposed clustering algorithm

The following introduces the proposed clustering algorithm:

DTW distance matrix establishment

Once the DTW distances between all pairs of causal feature series are obtained, they form a distance matrix $ds \in \mathbb{R}^{n \times n}$ that reflects the pairwise temporal similarities of all $n$ causal features.

Cluster formation through optimal number of neighbors

The algorithm then clusters the causal features using a neighbor-based approach. In particular, it determines the optimal number of neighbors automatically using the silhouette score.⁶⁰ The silhouette score measures cluster quality by assessing how similar an element is to its own cluster compared with the other clusters. For a given series $v$, the silhouette score $s(v)$ is defined as equation (17):

s (v) = \frac{f (v) - e (v)}{\max (e (v), f (v))},

(17)

where $e (v)$ is the average distance from $v$ to all other series $w$ in the same cluster $C$ as defined in equation (18), and $f (v)$ is the minimum average distance from $v$ to all series $w$ in any neighboring cluster $C^{'}$ where $C^{'} \neq C$ , as defined in equations (19) and (20):

e(v) = \frac{1}{|C| - 1} \sum_{w \in C,\, w \neq v} \mathrm{dist}_{DTW}(v, w),

(18)

f(v)_{C'} = \frac{1}{|C'|} \sum_{w \in C'} \mathrm{dist}_{DTW}(v, w), \quad \text{for each } C' \neq C,

(19)

f(v) = \min_{C' \neq C} f(v)_{C'}.

(20)
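As a brief worked illustration of equations (17) to (20), using hypothetical distances rather than values from the case study: suppose series $v$ shares cluster $C$ with two other series at DTW distances 1.0 and 3.0, and the two neighboring clusters have average distances 5.0 and 7.0 from $v$. Then:

e(v) = \frac{1.0 + 3.0}{3 - 1} = 2.0, \qquad f(v) = \min\{5.0, 7.0\} = 5.0,

s(v) = \frac{5.0 - 2.0}{\max(2.0, 5.0)} = 0.6.

A score close to 1 means that $v$ lies much closer to its own cluster than to any neighboring cluster, whereas a negative score would indicate that it is closer to a neighboring cluster.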

The algorithm begins by initializing the best silhouette score to −1 and the optimal number of neighbors to 1. The maximum number of neighbors to test, $\mathit{max\_neighbors}$, is set by the user. For each number of neighbors $q \in \{1, \dots, \mathit{max\_neighbors}\}$:

The algorithm performs an initial clustering for the current $q$, in which each feature is grouped with its $q$ nearest neighbor(s) according to $ds$.

A merging process refines the initial clusters to eliminate redundancies. This step employs a transitivity-based approach, wherein clusters exhibiting overlapping members are combined to generate coherent groupings.

The quality of the merged clusters is evaluated by the average silhouette score via the function $\mathit{EvaluateClusteringQuality}(ds, \mathit{merged\_clusters})$, with the DTW distance $\mathrm{dist}_{DTW}$ used in the silhouette computation.

If only a single cluster is formed during any iteration, the algorithm terminates early by returning the current optimal configuration. Otherwise, if the calculated average silhouette score exceeds the current optimal score, the algorithm updates the clustering configuration accordingly. This process continues iteratively, and after all values of $q$ are evaluated, the algorithm returns the optimal configuration based on the highest silhouette score.
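The neighbor search, transitive merging, and silhouette-based selection described above can be sketched as follows. This is a hypothetical reconstruction for illustration, not the authors' code from Figure 2: it assumes that the initial clustering groups each feature with its $q$ nearest neighbors in $ds$, implements the transitivity-based merge with a union-find structure, and assigns $s(v) = 0$ to members of singleton clusters, a common convention that equations (17) to (20) leave undefined. All function names are illustrative.

```python
import numpy as np


def average_silhouette(ds, labels):
    """Mean of s(v) from equations (17)-(20) on a precomputed DTW distance matrix."""
    labels = np.asarray(labels)
    scores = []
    for v in range(len(labels)):
        own = labels == labels[v]
        if own.sum() == 1:                       # singleton cluster: s(v) = 0 by convention
            scores.append(0.0)
            continue
        e = ds[v, own].sum() / (own.sum() - 1)   # ds[v, v] = 0, so it adds nothing
        f = min(ds[v, labels == c].mean() for c in np.unique(labels) if c != labels[v])
        denom = max(e, f)
        scores.append(0.0 if denom == 0 else (f - e) / denom)
    return float(np.mean(scores))


def neighbor_clusters(ds, q):
    """Group every feature with its q nearest neighbors, then merge overlapping
    groups transitively (implemented with a union-find structure)."""
    n = len(ds)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]        # path halving
            a = parent[a]
        return a

    for v in range(n):
        nearest = [int(w) for w in np.argsort(ds[v], kind="stable") if w != v][:q]
        for w in nearest:
            parent[find(v)] = find(w)
    _, labels = np.unique([find(v) for v in range(n)], return_inverse=True)
    return labels.ravel()


def select_clustering(ds, max_neighbors=5):
    """Try q = 1..max_neighbors and keep the clustering with the best average silhouette."""
    ds = np.asarray(ds, dtype=float)
    best_score, best_q, best_labels = -1.0, 1, None
    for q in range(1, max_neighbors + 1):
        labels = neighbor_clusters(ds, q)
        if labels.max() == 0:                    # a single cluster: stop early
            if best_labels is None:
                return q, labels, float("nan")   # silhouette undefined for one cluster
            break
        score = average_silhouette(ds, labels)
        if score > best_score:
            best_score, best_q, best_labels = score, q, labels
    return best_q, best_labels, best_score
```

Returning NaN when the first iteration already yields a single cluster mirrors the NaN silhouette scores reported for single-cluster trucks in Table 3. The default `max_neighbors=5` matches the setting used in the case studies.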

The pseudocode in Figure 2 summarizes the algorithm. This algorithm is motivated by the clustering algorithm proposed in Mitici and De Pater,⁴⁴ but the present approach introduces several distinctions. The previous work clustered health indicators for degradation modeling and Remaining Useful Life (RUL) estimation, applying a graph-theoretic approach to normalized DTW distances with a statically assigned number of neighbors. In this research, the algorithm is designed to group sensor measurements for a sound causality discovery process. It introduces an iterative search in which the range of neighbor numbers to test is set by the user, and the optimal number of neighbors is then determined automatically from a quantitative clustering quality measure.

Figure 2.

Pseudocode of the proposed clustering algorithm.

Intra-cluster deep causality learning via attention-based convolutional neural networks

Once the causal feature clusters are obtained, the next step extracts actionable insights to guide maintenance activities and improve health monitoring. This research argues that the features within each cluster are inherently linked through causal relationships, because they respond causally to the same evolution of system failure. These relationships can be obscured in raw sensor data by the large number of features, data noise, and non-linear dependencies. Their complexity also varies: causative interactions can be binary, ternary, or longer causality chains. A causality chain involves multiple sensor series and explains how a change propagates over time, with each effect following its cause in sequence, mirroring the sequential nature of failure propagation. Quantifying these temporal delays is critical for prescriptive maintenance, as delay values indicate the intervention window available between detecting a causal trigger and observing its downstream effects. For example, haul truck tire pressure abnormalities precede increases in braking system pressure, followed by brake pad deviations as the pads wear rapidly under increased demand. This simple example suggests a causative chain with time delays, in which the nature and timing of sensor responses to the fault mechanism(s) can vary. As a further illustration, in a hypothetical symptom group cluster identified in an excavator hydraulic system, a drop in hydraulic pressure precedes a reduction in actuator velocity, which in turn may lead to an increase in motor current as the system compensates for the loss in efficiency. Discovering such cascading mechanisms and the temporal order among features is critical for understanding how factors influence one another and for revealing the event sequences that contribute to equipment degradation.

Figure 3 depicts an illustrative graph of the causal relationships among four hypothetical sensor parameters. Each vertex (P1, P2, P3, and P4) represents a causal feature. Arrowheads indicate the direction of causal influence, and edge values indicate the time delay between cause and effect, with zero denoting an instantaneous effect. For example, P1 causally affects P2 with a delay of three timesteps. Not all features within the same cluster contribute equally to failure progression. Some are primary drivers that directly influence system behavior, while others are less significant and merely amplify effects. The same distinction can exist across clusters: although clusters are formed on temporal similarity, features from different clusters may share functional similarities or monitoring categories. Effective causality analysis should therefore identify, from cross-cluster interactions, the dominant or recurring factors most critical for maintenance. This improves the explainability of the extracted causal features and ensures that the relevant elements are emphasized, so that maintenance teams can concentrate their efforts on critical aspects more efficiently.

Figure 3.

An example of causal relationship graph.

To address the above challenges, this research applies the attention-based convolutional neural networks of the Temporal Causal Discovery Framework (TCDF) introduced by Nauta et al.³⁴ This approach is selected for its maintenance-specific advantages. Constraint-based methods (e.g. the PC algorithm⁶¹) can struggle with high-dimensional time series and cannot compute delays. Classical statistical approaches, whether they require pre-specified causal structures such as Structural Equation Modeling⁶² or assume linear functional forms such as Granger causality, impose parametric assumptions incompatible with the complex, non-linear, and exploratory nature of machinery degradation dynamics. In contrast, the deep learning architecture of TCDF learns complex non-linear patterns from sensor data automatically, without parametric constraints. Its attention mechanism quantifies each feature's causal contribution, while its causal validation step distinguishes true causality from spurious associations. Furthermore, its temporal delay computation provides the intervention timing windows essential for proactive maintenance. TCDF is a composite architecture of convolutional neural networks (CNNs)⁶³ that can extract causal relationships from multivariate data. A CNN is a deep learning model built on convolutional layers, which apply filters (i.e. kernels) to detect temporal patterns and dependencies in the data, capturing both local and global behaviors over time. This structure allows efficient processing of multi-dimensional data and identification of pertinent patterns. Because CNNs learn features automatically without manual extraction, they are well suited to detecting subtle temporal relationships. CNNs also benefit from parameter sharing, which reduces the computational load when handling complex data streams. The following provides an overview of the core TCDF mechanism in the context of this research; readers are encouraged to consult Nauta et al.³⁴ for more details.

TCDF analyzes the potential causal relationships among multiple sensor features and determines the time delays between causes and effects, if any. For each CNN in TCDF, the input consists of all causal features of one cluster. Let $P = \{X_{1}, X_{2}, \dots, X_{h}\} \in \mathbb{R}^{m \times h}$ represent a cluster of size $h$ containing causal features returned by the developed clustering method. $X_{i} = \{x_{i1}, x_{i2}, \dots, x_{im}\}$ for $i \in \{1, 2, \dots, h\}$ represents one causal feature, whereas $X_{j} = \{x_{j1}, x_{j2}, \dots, x_{jm}\}$ for $j \in \{1, 2, \dots, h\}$ and $j \neq i$ is another feature within the same cluster. The execution of TCDF mainly comprises the following steps:

Feature prediction: The first step predicts each value of $X_{i}$ from its own past values together with the values of the remaining features in $P$. TCDF employs $h$ independent CNNs, each dedicated to one feature in the cluster: $\mathrm{CNN}_{i}$, $i \in \{1, 2, \dots, h\}$, is responsible for predicting the values of $X_{i}$. $\mathrm{CNN}_{i}$ has $h$ channels, one for each causal feature in the cluster, so its input includes all features of the cluster, $X_{i}$ itself included. For example, when predicting the value $x_{i\tau}$ at timestep $\tau \in \{1, 2, \dots, m\}$, $\mathrm{CNN}_{i}$ applies a kernel of size $\sigma$ to $X_{j}$ in channel $j$, such that the kernel weights form an element-wise product with the sequence $\{x_{j(\tau - \sigma + 1)}, x_{j(\tau - \sigma + 2)}, \dots, x_{j\tau}\}$. The same holds for the other channels except channel $i$, where the sequence $\{x_{i(\tau - \sigma)}, \dots, x_{i(\tau - 1)}\}$ excludes the present value to avoid data leakage. Each channel within $\mathrm{CNN}_{i}$ uses its own kernel for its respective feature. Furthermore, TCDF applies left zero padding to compensate for the lack of past values when predicting the first elements of a sequence. The final prediction for $X_{i}$ is obtained by merging the results across all channels, and training minimizes the loss $L$ between the true and predicted values. TCDF applies a PReLU activation function after each convolutional layer, which can improve training performance. The kernel size $\sigma$, the dilation coefficient, and the number of hidden layers are user-defined parameters.
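To make the input construction concrete, the sketch below builds the $h$ input channels of $\mathrm{CNN}_{i}$ and applies one depthwise causal convolution with left zero padding in NumPy. It assumes, for illustration, that the exclusion of the present value of $X_{i}$ is realized by shifting channel $i$ one step to the right with a zero fill, so that a kernel of size $\sigma$ sees $x_{i(\tau - \sigma)}, \dots, x_{i(\tau - 1)}$. Training, the channel-merging step, and the attention scores are omitted, and the names `tcdf_inputs` and `depthwise_causal_conv` are hypothetical rather than part of the TCDF code base.

```python
import numpy as np


def tcdf_inputs(P, i):
    """Input channels for CNN_i. P has shape (m, h): m timesteps, h features.

    Channel i is shifted right by one step (zero-filled), so the window ending at tau
    covers x_{i(tau - sigma)}, ..., x_{i(tau - 1)} and never the present value.
    """
    X = np.array(P, dtype=float)             # copy, so P is left untouched
    X[:, i] = np.concatenate(([0.0], X[:-1, i]))
    return X


def depthwise_causal_conv(X, kernels, dilation=1):
    """One depthwise causal convolution with left zero padding.

    kernels has shape (h, sigma), one kernel per channel; kernels[:, -1] multiplies
    the value at tau, kernels[:, -2] the value at tau - dilation, and so on.
    """
    m, h = X.shape
    sigma = kernels.shape[1]
    pad = (sigma - 1) * dilation
    Xp = np.vstack([np.zeros((pad, h)), X])  # left zero padding
    out = np.zeros((m, h))
    for k in range(sigma):
        start = k * dilation
        out += kernels[:, k] * Xp[start:start + m, :]
    return out


def prelu(z, a=0.25):
    """PReLU activation; the slope a is learnable in practice and fixed here for illustration."""
    return np.where(z > 0, z, a * z)
```

Because the padding is only on the left, the output at timestep $\tau$ depends on inputs up to $\tau$ and never on future values, which is what makes the convolution causal.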

Attention-driven potential causal feature determination: Standard CNNs are limited for causal discovery because they treat all input features uniformly and cannot selectively focus on relevant signals. They also rely on local receptive fields, which restricts their ability to capture long-range dependencies between features over time. The TCDF architecture addresses these limitations by adopting an attention mechanism, combined with the dilated convolutions noted above. Each $\mathrm{CNN}_{i}$ is assigned a vector $\mu$ of attention scores, where:

\mu = \{\mu_{i,j} \mid j \in \{1, 2, \dots, h\}\}.

(21)

Here, $\mu_{i,j}$ quantifies the attention $\mathrm{CNN}_{i}$ allocates to $X_{j}$ when predicting $X_{i}$. The case $j = i$ is possible since TCDF allows self-attention. Intuitively, a higher value of $\mu_{i,j}$ implies that $X_{j}$ is more likely to be a cause of $X_{i}$. The scores are initialized to 1 before training. During training, they are multiplied element-wise with the input features and updated through backpropagation until the predefined number of training epochs is reached. Finally, a semi-binarization function called HardSoftmax³⁴ identifies the group of potential causes of $X_{i}$ as the features whose attention scores exceed a computed threshold.
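The attention gating and the selection of potential causes can be illustrated with the sketch below. Only the gating (an element-wise product with scores initialized to 1) follows directly from the description above; the selection rule is a simplified, assumed stand-in for HardSoftmax that keeps the channels above the largest gap among the sorted scores, considering only scores at or above their initial value of 1. Readers should consult Nauta et al.³⁴ for the exact rule.

```python
import numpy as np


def apply_attention(P, mu):
    """Gate each of the h input channels by its attention score (broadcast over timesteps)."""
    return np.asarray(P, dtype=float) * np.asarray(mu, dtype=float)


def potential_causes(mu, floor=1.0):
    """Simplified gap threshold (an assumption, not TCDF's exact HardSoftmax rule).

    Scores are sorted in descending order; among consecutive pairs whose upper score is
    at least `floor`, the largest gap sets the threshold, and every channel above it is kept.
    """
    mu = np.asarray(mu, dtype=float)
    order = np.argsort(-mu, kind="stable")
    s = mu[order]
    if len(s) == 1:
        return [int(order[0])] if s[0] >= floor else []
    gaps = [k for k in range(len(s) - 1) if s[k] >= floor]
    if not gaps:
        return []
    cut = max(gaps, key=lambda k: s[k] - s[k + 1])
    return sorted(int(c) for c in order[:cut + 1])
```

For scores `[2.5, 1.1, 0.9, 2.4]`, the largest gap lies between 2.4 and 1.1, so channels 0 and 3 are returned as potential causes.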

Causal feature validation: To determine genuine causes, TCDF shuffles each potential cause $X_{j}$ found in the previous step. The shuffled series, together with the unaltered remaining series, forms a new input to the trained $\mathrm{CNN}_{i}$ for predicting $X_{i}$. Define $L_{init}$ as the training loss at the first epoch on the original dataset, $L_{fin}$ as the final loss on the original dataset, and $L_{fin}^{'}$ as the final loss on the intervened dataset. If $X_{j}$ is a genuine cause, $L_{fin}^{'}$ is expected to be greater than $L_{fin}$ after shuffling. The validation uses a scalar $\eta$ ($0 \leq \eta \leq 1$) and distinguishes two scenarios:

1) If $(L_{init} - L_{fin}^{'}) \leq \eta \cdot (L_{init} - L_{fin})$, permuting the order of $X_{j}$ significantly disrupts prediction performance.³⁴ Consequently, $X_{j}$ is considered a true cause of $X_{i}$.

2) If $(L_{init} - L_{fin}^{'}) > \eta \cdot (L_{init} - L_{fin})$, permuting the order of $X_{j}$ does not significantly worsen prediction performance.³⁴ Consequently, $X_{j}$ is unlikely to be a cause of $X_{i}$.
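The validation rule in scenarios 1 and 2 reduces to a single comparison, shown below together with the shuffling intervention. How $L_{fin}^{'}$ is obtained from the trained $\mathrm{CNN}_{i}$ is left to the caller, and the function names are illustrative.

```python
import numpy as np


def is_true_cause(l_init, l_fin, l_fin_shuffled, eta=0.8):
    """Scenario 1: the loss reduction that survives shuffling X_j is at most
    eta times the reduction achieved on the original data."""
    return (l_init - l_fin_shuffled) <= eta * (l_init - l_fin)


def shuffle_feature(P, j, rng):
    """Intervened input: randomly permute the time order of column j only."""
    Q = np.array(P, dtype=float)
    Q[:, j] = rng.permutation(Q[:, j])
    return Q
```

With the recommended $\eta = 0.8$, a candidate whose shuffled input retains at most 80% of the original loss reduction is accepted as a true cause.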

Temporal delay identification: The temporal delay $d_{j, i}$ represents the number of timesteps after which $X_{i}$ is causally affected by $X_{j}$ . TCDF detects temporal delay between $X_{i}$ and $X_{j}$ based on kernel weights. In essence, the kernel weights serve as an indicator of the strength of connection between neurons in adjacent layers. A higher weight represents a stronger temporal dependency between the past value and the current output. As such, TCDF computes the temporal delay by identifying the path with the highest weights through which the input propagates its influence on the output. The delay is then calculated as the difference between the time index of the output node and the input node along that path, which effectively represents the time lag in the causal relationship.
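The delay computation can be sketched as follows, under stated assumptions: each layer $l$ is taken to use dilation $c^{l}$ for a dilation coefficient $c$, the position of the largest absolute kernel weight in each layer is read as that layer's lag, and the lags are summed along the path. This simplifies the path search described above and does not reproduce every detail of the reference implementation.

```python
import numpy as np


def delay_from_kernels(kernels, dilation_c=2):
    """Estimate the delay d_{j,i} from the kernels that channel j passes through.

    kernels[l] is the length-sigma kernel of layer l; its last position weights the
    current step. Per layer, the largest absolute weight at position p gives a lag of
    (sigma - 1 - p) steps, scaled by the assumed dilation dilation_c ** l.
    """
    total = 0
    for layer, w in enumerate(kernels):
        w = np.abs(np.asarray(w, dtype=float))
        lag_steps = len(w) - 1 - int(np.argmax(w))
        total += lag_steps * dilation_c ** layer
    return total
```

With the case-study settings (kernel size 2, dilation coefficient 2), each layer contributes either 0 or $2^{l}$ timesteps to the estimated delay.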

Temporal causal graph generation: Finally, TCDF outputs a graph of the extracted causal relationships among features, annotated with the cause-effect delays.

Figure 4 illustrates the causal discovery workflow of TCDF using “TCDF 1” from Figure 1 as an example.

Figure 4.

Mechanism by which TCDF achieves deep causal discovery (adapted from Nauta et al.³⁴).

Case studies

This paper applies the proposed causality discovery methodology to the publicly available SCANIA Component X dataset⁵¹ to validate its effectiveness in a real-world context.

Introduction to dataset

SCANIA Component X is a large-scale, multivariate dataset recently published in the field of maintenance prognosis. It comprises a range of data types that provide a comprehensive view of the health, performance, and maintenance history of an engine component, all collected from SCANIA heavy trucks operating under varying conditions. In essence, the dataset covers three perspectives:

Truck operational records: Temporal sensor measurements of real-time conditions across each truck’s available operational lifespan, reflecting the evolution of component health.

Truck repair records: For each truck, the operational duration of Component X is provided, together with a binary label indicating whether the component received a repair (1) or not (0) at the end of the lifecycle.

Truck specifications: Technical configurations of each truck across eight categories.

For proprietary reasons, some data handling measures have been applied to the original dataset. For example, sensor feature names in the operational records are fully anonymized, each truck is assigned a unique ID instead of its actual identity number, and the data have been scaled and transformed.

Study design and multi-stage data process

General preparation

The case studies aim to discover the features causally influencing Component X degradation in each investigated truck, identify symptom group clusters, and explore the key factors and causal relationships that reflect how these features interact and contribute to the degradation process. This requires run-to-failure data, that is, temporal records capturing the equipment’s full operational lifecycle. Among the available training, validation, and test datasets, the training set is selected because it includes uninterrupted operational records for each truck and is the only set with failure time records. To align with the study objectives, only operational sensor records from trucks with recorded failures (repair indicator = 1) are retained, resulting in 105,852 temporal sensor records from 2272 failed trucks. Component X is a complex system described by 105 anonymized features: eight numeric counters providing cumulative condition measurements, and 97 features derived from six histogram variables that aggregate sensor readings into discrete bins based on predefined operational conditions or measurement ranges. Each bin feature captures the measurement frequency at each timestep, offering an integrated view of operational patterns over time. Preprocessing continues with low-variance feature removal, followed by NaN imputation using linear interpolation, which preserves degradation trends with minimal distortion. Low-entropy features are then removed, with entropy calculated by equation (2). While low-variance removal targets minimal temporal variation, low-entropy removal targets insufficient diversity in the value distribution. Together, these strategies retain only informative, dynamic, degradation-related features; the low-entropy threshold is set at an entropy value of 10 based on Figure 5.

Figure 5.

Feature entropy distribution.
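The three preprocessing filters can be sketched in NumPy as below. Equation (2) is not reproduced in this section, so the sketch assumes it is the Shannon entropy of the empirical value distribution in bits; the variance threshold `var_eps` is a hypothetical placeholder, while `entropy_min = 10.0` follows the threshold stated above. Note that `np.interp` holds the nearest observed value for NaNs at the edges of a series.

```python
import numpy as np


def shannon_entropy_bits(x):
    """Shannon entropy (in bits) of the empirical distribution of values in x."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def interpolate_nans(col):
    """Linear interpolation over NaNs; leading/trailing NaNs take the nearest observed value."""
    idx = np.arange(col.size)
    ok = ~np.isnan(col)
    if not ok.any():
        return np.zeros_like(col)
    return np.interp(idx, idx[ok], col[ok])


def preprocess(X, var_eps=1e-8, entropy_min=10.0):
    """X: (timesteps, features) array with possible NaNs.

    Returns the retained data and the original indices of the retained columns.
    """
    X = np.asarray(X, dtype=float)
    keep = [c for c in range(X.shape[1]) if np.nanvar(X[:, c]) > var_eps]  # low-variance removal
    if not keep:
        return X[:, :0], []
    X = np.column_stack([interpolate_nans(X[:, c]) for c in keep])         # NaN imputation
    kept = [k for k in range(X.shape[1])
            if shannon_entropy_bits(X[:, k]) >= entropy_min]               # low-entropy removal
    return X[:, kept], [keep[k] for k in kept]
```

The order of the steps follows the text: variance is checked on the raw data (ignoring NaNs), entropy on the imputed data.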

A key task is to categorize the trucks in the training dataset into distinct classes based on the temporal proximity of the last operational readout to the failure event. Kharazian et al.⁶⁴ introduce such a classification scheme for the validation set, dividing the failed trucks into five categories defined by time windows leading up to the failure: (1) more than 48 timesteps, (2) 48 to 24 timesteps, (3) 24 to 12 timesteps, (4) 12 to 6 timesteps, and (5) 6 to 0 timesteps before the failure. These windows correspond to class labels 0 to 4, respectively. Adopting the same scheme, the difference between the operational duration and the last recorded timestep is computed for each truck in the training set, and the truck is assigned to the corresponding class. The distribution of severity classes for the 2272 trucks is plotted in Figure 6. For the case studies, Class 0 trucks are excluded because this class is the smallest and has the lowest severity. From the remaining classes, 4 trucks from Class 1, 7 trucks from Class 2, 10 trucks from Class 3, and 23 trucks from Class 4 are randomly selected. More trucks are selected from Classes 3 and 4 than from Classes 1 and 2 to reflect the greater criticality of the later classes due to their proximity to failure. This selection approach prioritizes methodological clarity and practicality while ensuring sufficient representation of critical degradation-related cases. Table 1 lists the IDs of the selected trucks. The causal discovery framework is applied to the trucks within each class independently, allowing a detailed examination of the degradation patterns and causal relationships specific to each subset.

Figure 6.

Class distribution of trucks in the training set.

Table 1.

Trucks selected for the case studies.

Class	Truck ID
1	20967, 14470, 1732, 14370
2	1998, 15525, 4686, 3420, 6302, 18173, 1872
3	370, 14639, 3039, 23507, 3880, 1472, 16327, 10413, 15334, 721
4	5865, 7684, 22350, 12862, 15119, 14746, 17107, 18564, 21007, 4426, 7552, 9241, 9639, 9997, 27801, 31303, 33056, 1659, 6973, 3929, 25117, 26703, 30745
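The class assignment described above can be written as a small lookup. Which window owns the shared boundaries (48, 24, 12, and 6 timesteps) is not specified, so the sketch assumes that each boundary belongs to the more severe window; `operational_time` and `last_timestep` are hypothetical parameter names.

```python
def severity_class(time_to_failure):
    """Map the number of timesteps between the last readout and the failure to class 0-4.

    Assumption: each shared boundary (48, 24, 12, 6) belongs to the more severe window.
    """
    if time_to_failure > 48:
        return 0          # more than 48 timesteps before failure
    if time_to_failure > 24:
        return 1          # 48 to 24
    if time_to_failure > 12:
        return 2          # 24 to 12
    if time_to_failure > 6:
        return 3          # 12 to 6
    return 4              # 6 to 0


def assign_class(operational_time, last_timestep):
    """Hypothetical helper: class from a truck's operational duration and last recorded timestep."""
    return severity_class(operational_time - last_timestep)
```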

Truck-wise health quantification

Because the raw data are noisy and highly variable, a Health Indicator (HI) can be used to quantify the degree of wear and tear of the system within a given timeframe. This research constructs a virtual HI by fusing multi-sensor signals.⁶⁵ First, Root Mean Square (RMS) values are computed for each feature series. RMS is a commonly adopted basis for building HIs.^66,67 It quantifies the average energy content of a signal.⁶⁸ RMS is relatively insensitive to sudden signal fluctuations and increases as failure progresses,⁶⁹ which makes it appropriate for capturing the long-term degradation trend. For a discrete signal, RMS is defined in equation (22) as⁶⁹:

RMS = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} ({γ_{i}}^{2})},

(22)

where $γ_{i}$ denotes the value of the $i$-th data point and $N$ denotes the number of data points. This research computes RMS over each series with a rolling time window. Since the HI serves as the target variable modeling the degradation of Component X for each individual truck, a timestep-wise representation of condition variations is required to capture continuous local changes in the signal. For demonstration purposes, a window size of 3 is used. RMS values are therefore computed from the third data point onward, with the initial points zero-filled, and are then Min-Max normalized to ensure consistent scaling across features.
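The rolling RMS and Min-Max normalization can be sketched as follows, assuming a trailing window that ends at the current timestep and normalization applied after zero-filling.

```python
import numpy as np


def rolling_rms(x, window=3):
    """Equation (22) over a trailing window ending at each timestep; the first
    window - 1 points, which lack a full window, are zero-filled."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for t in range(window - 1, len(x)):
        out[t] = np.sqrt(np.mean(x[t - window + 1:t + 1] ** 2))
    return out


def min_max(x):
    """Min-Max normalization to [0, 1]; a constant series maps to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span
```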

To construct the final HI, an Exponentially Weighted Moving Average (EWMA) is applied. EWMA assigns higher weights to recent RMS values while retaining historical information through exponentially decaying weights, enabling the HI to capture both short-term variations and long-term degradation trends. A smoothing factor $ω$ controls the rate of weight decay. The resulting feature-level HIs filter out transient noise in the raw RMS data and accentuate the degradation trend. They are then fused into a time-dependent system-level HI that considers the contributions of all features, where higher values indicate closer proximity to failure. The HI of the $i$-th feature at timestep $t$ is formulated in equation (23), and the fused HI at $t$ in equation (24):

HI_{t,i} = \frac{x_{t,i} + (1 - \omega) x_{t-1,i} + (1 - \omega)^{2} x_{t-2,i} + \dots + (1 - \omega)^{t} x_{0,i}}{1 + (1 - \omega) + (1 - \omega)^{2} + \dots + (1 - \omega)^{t}},

(23)

HI_{t} = \sum_{i=1}^{N} HI_{t,i},

(24)

where $x_{t,i}$ represents the normalized RMS value of feature $X_{i}$ at timestep $t$, and $N$ is the number of features. HI effectiveness is evaluated using two standard metrics: monotonicity, which measures unidirectional degradation progression and ranges from 0 to 1, and trendability, which measures correlation with time and ranges from −1 to 1; values closer to 1 (in absolute value for trendability) indicate stronger performance.^65,70 The two metrics are formulated in equations (25) and (26), respectively, where $m$ is the length of the HI sequence, $t_{j}$ is the time value at step $j$ since the start of operation, $HI_{t_{j}}$ is the HI value at timestep $t_{j}$, $\frac{dHI_{t_{j}}}{dt_{j}}$ denotes the first difference between successive HI values, and $\mathbb{1}$ is the indicator function, which equals 1 if its condition is true and 0 otherwise.

\mathrm{monotonicity}(HI) = \frac{1}{m - 1} \left| \sum_{j=1}^{m-1} \mathbb{1}\left(\frac{dHI_{t_{j}}}{dt_{j}} > 0\right) - \sum_{j=1}^{m-1} \mathbb{1}\left(\frac{dHI_{t_{j}}}{dt_{j}} < 0\right) \right|

(25)

\mathrm{trendability}(HI, \mathrm{time}) = \frac{m \sum_{j=1}^{m} t_{j} HI_{t_{j}} - \left(\sum_{j=1}^{m} t_{j}\right)\left(\sum_{j=1}^{m} HI_{t_{j}}\right)}{\sqrt{\left[m \sum_{j=1}^{m} t_{j}^{2} - \left(\sum_{j=1}^{m} t_{j}\right)^{2}\right]\left[m \sum_{j=1}^{m} HI_{t_{j}}^{2} - \left(\sum_{j=1}^{m} HI_{t_{j}}\right)^{2}\right]}}

(26)
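Equations (23) to (26) can be sketched in NumPy as below. The EWMA is computed with a running numerator and denominator, which reproduces equation (23) exactly (the first value of each series plays the role of $x_{0,i}$), and trendability is computed as the Pearson correlation between HI and time, which is what equation (26) expresses. Function names are illustrative, and the default $ω = 0.3$ is the value used in the case study.

```python
import numpy as np


def ewma_hi(x, omega=0.3):
    """Feature-level HI of equation (23): normalized EWMA with weights (1 - omega) ** k."""
    x = np.asarray(x, dtype=float)
    hi = np.empty_like(x)
    num = den = 0.0
    for t, value in enumerate(x):
        num = value + (1 - omega) * num   # x_t + (1-w) x_{t-1} + (1-w)^2 x_{t-2} + ...
        den = 1.0 + (1 - omega) * den     # 1 + (1-w) + (1-w)^2 + ...
        hi[t] = num / den
    return hi


def fused_hi(rms_matrix, omega=0.3):
    """System-level HI of equation (24): sum of feature-level HIs (columns are features)."""
    cols = np.asarray(rms_matrix, dtype=float).T
    return np.sum([ewma_hi(col, omega) for col in cols], axis=0)


def monotonicity(hi):
    """Equation (25): |#positive steps - #negative steps| / (m - 1)."""
    d = np.diff(np.asarray(hi, dtype=float))
    return abs(int((d > 0).sum()) - int((d < 0).sum())) / len(d)


def trendability(hi, t=None):
    """Equation (26): Pearson correlation between HI and time."""
    hi = np.asarray(hi, dtype=float)
    t = np.arange(len(hi), dtype=float) if t is None else np.asarray(t, dtype=float)
    return float(np.corrcoef(t, hi)[0, 1])
```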

Figure 7 illustrates HI versus time-before-failure (TbF) for trucks 14470 and 21007 as examples, with $ω$ set to 0.3; this value is kept constant for all other trucks throughout the case study. Both trucks show clear degradation trajectories.

Figure 7.

HI versus TbF for truck 14470 and truck 21007.

Table 2.

Monotonicity and trendability of the two trucks.

	Monotonicity	Trendability
Truck 14470	0.96	0.98
Truck 21007	0.98	0.99

Table 2 lists the monotonicity and trendability values for the two trucks, calculated against normal time progression rather than TbF. The values, all close to 1, confirm the viability of the proposed HI calculation strategy for these representative trucks.

Stationarity test

TE estimation traditionally requires stationary data,³¹ that is, time series whose statistical properties do not change over time. Since the SCANIA Component X dataset exhibits non-stationary characteristics, with observable rising trends approaching failure, a systematic workflow is designed in this study to transform the data into stationary form. Three complementary tests are involved: the Augmented Dickey-Fuller test (ADF, null hypothesis: non-stationary),⁷¹ the Kwiatkowski-Phillips-Schmidt-Shin test (KPSS, null hypothesis: stationary),⁷² and Levene’s test (detecting variance instability).⁷³ A series is classified as stationary only when it passes both the ADF test (p < 0.05) and the KPSS test (p > 0.05), which primarily address trends and autocorrelation. For non-stationary series, Levene’s test (p < 0.05 indicating unstable variance) provides an additional diagnosis of variance-driven non-stationarity.⁷⁴ Series with unstable variance undergo a logarithmic transformation, followed by up to second-order differencing if non-stationarity persists, whereas series with stable variance proceed directly to up to second-order differencing. Series that remain non-stationary after these transformations are removed to maintain data quality. Resulting NaN values are zero-filled. Finally, the stationary dataset undergoes Min-Max normalization followed by 10× scaling across all columns (including HI), enhancing magnitude representation and interpretability for subsequent analyses.
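The stationarity workflow can be sketched as below. The statistical tests are kept behind two predicates so that the decision logic runs on its own; the helpers wrapping statsmodels’ `adfuller` and `kpss` and SciPy’s `levene` are included but import those libraries lazily. Two details are assumptions, since the text does not specify them: Levene’s test compares four equal-length segments of the series, and the logarithm is applied as `log1p` after shifting the series to be non-negative.

```python
import numpy as np


def adf_kpss_stationary(x, alpha=0.05):
    """Stationary only if ADF rejects its null (p < alpha) and KPSS does not (p > alpha).
    Requires statsmodels; imported lazily so the rest of the sketch runs without it."""
    from statsmodels.tsa.stattools import adfuller, kpss
    return adfuller(x)[1] < alpha and kpss(x, regression="c", nlags="auto")[1] > alpha


def levene_unstable(x, n_segments=4, alpha=0.05):
    """Levene's test across equal-length segments of x (the segmentation is an assumption).
    Requires SciPy; imported lazily."""
    from scipy.stats import levene
    return levene(*np.array_split(np.asarray(x, dtype=float), n_segments)).pvalue < alpha


def stationarize(x, is_stationary, unstable_variance):
    """Return (series, differencing order), or (None, None) if the series should be dropped."""
    x = np.asarray(x, dtype=float)
    if is_stationary(x):
        return x, 0
    if unstable_variance(x):
        x = np.log1p(x - x.min())            # shift to >= 0 before the log (assumption)
        if is_stationary(x):
            return x, 0
    y = x
    for order in (1, 2):                     # up to second-order differencing
        y = np.diff(y)
        if is_stationary(y):
            # zero-fill the leading positions lost to differencing to keep the length
            return np.concatenate((np.zeros(order), y)), order
    return None, None


def scale_0_10(x):
    """Min-Max normalization followed by 10x scaling."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else 10 * (x - x.min()) / span
```

Calling `stationarize(series, adf_kpss_stationary, levene_unstable)` applies the full workflow to one series; a `(None, None)` result marks a series to be removed.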

Framework implementation and results presentation

The preprocessed operational data of each selected truck are passed to the developed causality discovery framework. In the first module, TE is calculated as the average of local TE values using the Python library PyInform.⁷⁵ This TE implementation embeds the past states of the target series (the HI series in this research) over a user-specified number of history lags, whereas only a lag of 1 is embedded for each source series.³¹ Because the dataset is anonymized and no additional domain-specific insights are available to guide the selection of optimal lag values, the case studies examine target history lengths between 2 and 5 for each truck for methodological demonstration purposes. Determined through empirical testing, this range retains approximately 10%–30% of the features in each truck’s initial dataset, which is generally considered a reasonable balance between dimensionality reduction and preservation of meaningful information.
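The quantity computed in the first module (average TE with target history $k$ and source history 1, in bits) can be estimated with a plug-in sketch such as the one below. It is not PyInform’s implementation; like PyInform, it assumes the series have already been discretized into integer states, and the discretization used in the case studies is not described in this section.

```python
import math
from collections import Counter


def transfer_entropy(source, target, k=2):
    """Plug-in TE (bits) from source to target, averaged over time, with target
    history length k and source history length 1. Inputs must be discrete states."""
    joint, hist_src, hist_next, hist = Counter(), Counter(), Counter(), Counter()
    for t in range(k - 1, len(target) - 1):
        h = tuple(target[t - k + 1:t + 1])     # target history y_t^(k)
        s, y_next = source[t], target[t + 1]
        joint[(y_next, h, s)] += 1
        hist_src[(h, s)] += 1
        hist_next[(y_next, h)] += 1
        hist[h] += 1
    total = sum(joint.values())
    te = 0.0
    for (y_next, h, s), c in joint.items():
        # p(y+, h, s) * log2[ p(y+ | h, s) / p(y+ | h) ], written with raw counts
        te += (c / total) * math.log2((c * hist[h]) / (hist_src[(h, s)] * hist_next[(y_next, h)]))
    return te
```

A constant source carries no information about the target, so its TE is exactly zero; a source whose current value is copied into the next target value yields a TE close to the target’s conditional entropy.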

After TE is computed for each feature with respect to the target series, a significance test is required to eliminate bias and spurious relations⁵⁹ and thereby determine genuine causal features. The null hypothesis states that there is no causal relationship between the source and the target. The significance test follows the procedure described in Lizier et al.⁷⁶ For each feature, surrogate datasets are created through random permutation (1,000 times in this research) to simulate the null hypothesis; this approach is valid given the source history embedding of 1.⁵⁹ TE is then computed for each surrogate dataset, establishing a reference distribution under the null hypothesis, and the p-value is derived by comparing the observed TE against this surrogate distribution. Features with p-values below the chosen significance level (0.1 in this research) are considered statistically significant causal features.
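The permutation-based significance test can be sketched generically: the TE estimator is passed in as `te_fn`, so any TE implementation can be plugged in. The p-value here is the fraction of surrogates whose TE is at least the observed TE; some authors add one to the numerator and denominator, and the exact convention used in the study is not stated.

```python
import numpy as np


def surrogate_p_value(source, target, te_fn, n_surrogates=1000, rng=None):
    """Permutation test: shuffling the source destroys its temporal relation to the
    target; the p-value is the fraction of surrogate TEs >= the observed TE."""
    rng = np.random.default_rng() if rng is None else rng
    source = np.asarray(source)
    observed = te_fn(source, target)
    null = np.array([te_fn(rng.permutation(source), target) for _ in range(n_surrogates)])
    return float(np.mean(null >= observed))
```

With the study’s settings, a feature would be retained when `surrogate_p_value(feature, hi, te_fn, n_surrogates=1000) < 0.1`.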

The obtained causal features are then clustered with the proposed algorithm. In the case study, DTW computations are performed with the DTAIDistance library,⁷⁷ which uses the squared Euclidean distance by default. In the clustering stage, the maximum number of neighbors to test is set to 5. Figure 8 shows four examples of the algorithm’s output, visualizing one causal feature cluster from one truck in each class.

Figure 8.

DTW-aligned causal feature Cluster 1 of truck 14470 from Class 1, Cluster 6 of truck 3420 from Class 2, Cluster 1 of truck 16327 from Class 3, and Cluster 1 of truck 17107 from Class 4.

During the intra-cluster causality discovery phase, TCDF is trained for 1,000 epochs on 70% of each truck’s dataset, with the remaining 30% used for prediction evaluation. All experiments use the Adam optimizer with a learning rate of 0.01. The kernel size and the dilation coefficient are both set to 2. Each model has a single hidden layer, which improves prediction performance compared with models without a hidden layer while limiting the risk of overfitting associated with deeper architectures. The mean absolute scaled error (MASE), averaged over all feature series, and its standard deviation (SD) are adopted as the evaluation metrics for model prediction. When validating significant causal features, the study follows the recommended $η$ value of 0.8.³⁴ Given the exploratory nature of the study, these parameters stem from empirical testing rather than in-depth tuning. Figures 9 to 12 present four causal graphs illustrating the identified causal interactions within four randomly selected clusters from Classes 1 to 4, respectively. Note that the depicted edge lengths should not be construed as measures of delay.

Figure 9.

Causal relationship graph of Cluster 2 of truck 14370.

Figure 10.

Causal relationship graph of Cluster 1 of truck 3420.

Figure 11.

Causal relationship graph of Cluster 1 of truck 16327.

Figure 12.

Causal relationship graph of Cluster 1 of truck 5865.
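MASE, used above to evaluate TCDF predictions, scales the test-set mean absolute error by the in-sample mean absolute error of a one-step naive forecast. The sketch below assumes the scale is computed on the 70% training portion of each series and reports the population SD (`ddof=0`); neither choice is specified above, and the function names are illustrative.

```python
import numpy as np


def mase(y_true, y_pred, y_train):
    """Test-set MAE divided by the MAE of a one-step naive forecast on the training part."""
    scale = np.mean(np.abs(np.diff(np.asarray(y_train, dtype=float))))
    err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return float(np.mean(err) / scale)


def cluster_mase(trues, preds, trains):
    """Mean and SD of MASE over the feature series of one cluster (one entry per feature)."""
    scores = [mase(t, p, tr) for t, p, tr in zip(trues, preds, trains)]
    return float(np.mean(scores)), float(np.std(scores))
```

A MASE below 1 means the model beats the naive forecast on average, which helps in reading the per-cluster values reported in Tables 3 to 6.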

Tables 3 to 6 present the complete results of applying the three modules of the proposed framework to all investigated trucks, one table per class (Table 3 for Class 1, Table 4 for Class 2, and so on). In addition, the case studies propose two metrics to evaluate the influence of each feature series on the causal network obtained for each class. The metric in equation (27) evaluates how often a feature series acts as a cause among all discovered causal relationships, whereas the metric in equation (28) quantifies the overall involvement of a feature series in the causal network by counting its appearances both as a cause and as an effect. For each class, the distributions of feature series contributions under the two metrics are plotted in Figures 13 to 16, respectively. Summaries highlighting the key discoveries for each class are also provided, including influential feature series, recurring causal patterns, and notable trends derived from the causal analysis.

\text{Cause Contribution Rate} = \frac{\text{occurrences of a feature series as a cause}}{\text{total number of causal relations}}

(27)

\text{Cause-Effect Contribution Rate} = \frac{\text{occurrences of a feature series as a cause or an effect}}{\text{total number of causes and effects}}

(28)
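Both metrics can be computed directly from the list of discovered relationships pooled over a class. The following is a minimal sketch under two assumptions: (i) features are named `<series>_<index>` as in this section (e.g. 397_8 belongs to the 397 series), and (ii) the denominator of equation (28), the total number of causes and effects, equals twice the number of relationships, since every relationship contributes one cause and one effect. The function names are hypothetical.

```python
from collections import Counter


def series_of(feature):
    """Map a feature name such as '397_8' to its series '397' (assumed naming convention)."""
    return feature.split("_", 1)[0]


def contribution_rates(relations):
    """Equations (27) and (28) per feature series.

    relations: list of (cause_feature, effect_feature) pairs pooled over one class.
    Returns (cause_rate, cause_effect_rate), each a dict mapping series -> rate.
    """
    if not relations:
        return {}, {}
    n_relations = len(relations)
    as_cause = Counter(series_of(cause) for cause, _ in relations)
    as_cause_or_effect = Counter()
    for cause, effect in relations:
        as_cause_or_effect[series_of(cause)] += 1
        as_cause_or_effect[series_of(effect)] += 1
    cause_rate = {s: c / n_relations for s, c in as_cause.items()}
    # Each relation has one cause and one effect, so eq. (28) divides by 2 * n_relations.
    cause_effect_rate = {s: c / (2 * n_relations) for s, c in as_cause_or_effect.items()}
    return cause_rate, cause_effect_rate


# Toy input: four truck 1732 relations quoted in the Class 1 discussion (not the full class).
toy = [("459_15", "459_16"), ("459_16", "459_15"), ("459_17", "459_16"), ("459_15", "397_30")]
cause_rate, cause_effect_rate = contribution_rates(toy)
# cause_rate == {'459': 1.0}; cause_effect_rate == {'459': 0.875, '397': 0.125}
```

Because the toy list is only a fragment of Class 1, its rates are not the class-level figures reported in the key discoveries.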

Table 3.

Causal feature identification, clustering, and intra-cluster analysis results for Class 1 trucks.

Truck ID	HI historical lag	Number of causal features found	Number of clusters found	Averaged silhouette score	Number of discovered intra-cluster causal relationships	Averaged MASE and SD
1732	4	5	1	NaN	Cluster 1: 5	Cluster 1: 9.61/17.74
14370	3	24	2	0.52	Cluster 1: 13 Cluster 2: 6	Cluster 1: 0.32/0.19 Cluster 2: 0.33/0.21
20967	2	5	1	NaN	Cluster 1: 1	Cluster 1: 1.07/1.27
14470	1	3	1	NaN	Cluster 1: 2	Cluster 1: 0.76/0.01

Table 4.

Causal feature identification, clustering, and intra-cluster analysis results for Class 2 trucks.

Truck ID	HI historical lag	Number of causal features found	Number of clusters found	Averaged silhouette score	Number of discovered intra-cluster causal relationships	Averaged MASE and SD
1872	4	17	6	0.08	Cluster 1: 3 Cluster 2: 3 Cluster 3: 4 Cluster 4: 2 Cluster 5: 2 Cluster 6: 3	Cluster 1: 0.16/0.03 Cluster 2: 0.07/0.03 Cluster 3: 0.09/0.07 Cluster 4: 0.25/0.01 Cluster 5: 0.14/0.01 Cluster 6: 0.49/0.19
1998	4	10	2	0.68	Cluster 1: 0 Cluster 2: 5	Cluster 1: 0.54/0.17 Cluster 2: 1.00/0.82
3420	4	18	2	0.58	Cluster 1: 8 Cluster 2: 7	Cluster 1: 0.42/0.31 Cluster 2: 0.36/0.10
4686	4	6	2	0.58	Cluster 1: 4 Cluster 2: 0	Cluster 1: 0.66/0.62 Cluster 2: 0.59/0.21
6302	3	14	5	0.15	Cluster 1: 4 Cluster 2: 2 Cluster 3: 6 Cluster 4: 1 Cluster 5: 0	Cluster 1: 0.44/0.17 Cluster 2: 0.44/0.09 Cluster 3: 4.31/6.33 Cluster 4: 0.21/0.03 Cluster 5: 0.90/0.36
15525	3	6	2	0.34	Cluster 1: 7 Cluster 2: 2	Cluster 1: 0.51/0.31 Cluster 2: 1.68/0.77
18173	5	15	2	0.68	Cluster 1: 9 Cluster 2: 0	Cluster 1: 0.37/0.07 Cluster 2: 0.42/0.56

Table 5.

Causal feature identification, clustering, and intra-cluster analysis results for Class 3 trucks.

Truck ID	HI historical lag	Number of causal features found	Number of clusters found	Averaged silhouette score	Number of discovered intra-cluster causal relationships	Averaged MASE and SD
370	3	12	4	0.13	Cluster 1: 1 Cluster 2: 6 Cluster 3: 3 Cluster 4: 3	Cluster 1: 2.64/1.48 Cluster 2: 0.62/0.26 Cluster 3: 0.51/0.32 Cluster 4: 0.45/0.07
721	5	14	2	0.40	Cluster 1: 4 Cluster 2: 8	Cluster 1: 0.20/0.08 Cluster 2: 0.25/0.11
1472	2	6	1	NaN	Cluster 1: 6	Cluster 1: 0.78/0.79
3039	4	10	4	0.19	Cluster 1: 2 Cluster 2: 2 Cluster 3: 4 Cluster 4: 0	Cluster 1: 0.26/0.23 Cluster 2: 0.07/0.02 Cluster 3: 4.25/6.71 Cluster 4: 0.23/0.04
3880	3	18	2	0.74	Cluster 1: 13 Cluster 2: 0	Cluster 1: 0.25/0.21 Cluster 2: 0.06/0.02
10413	2	7	2	0.48	Cluster 1: 3 Cluster 2: 3	Cluster 1: 0.26/0.07 Cluster 2: 2.61/2.09
14639	2	10	5	0.45	Cluster 1: 2 Cluster 2: 2 Cluster 3: 2 Cluster 4: 0 Cluster 5: 2	Cluster 1: 0.18/0.00 Cluster 2: 0.40/0.01 Cluster 3: 1.30/0.53 Cluster 4: 0.32/0.41 Cluster 5: 6.24/1.04
15334	4	9	3	0.14	Cluster 1: 4 Cluster 2: 3 Cluster 3: 0	Cluster 1: 0.15/0.03 Cluster 2: 0.80/0.97 Cluster 3: 1.67/1.58
16327	4	18	2	0.20	Cluster 1: 9 Cluster 2: 6	Cluster 1: 0.36/0.32 Cluster 2: 0.38/0.15
23507	2	15	2	0.95	Cluster 1: 10 Cluster 2: 0	Cluster 1: 0.43/0.30 Cluster 2: 0.36/0.07

Table 6.

Causal feature identification, clustering, and intra-cluster analysis results for Class 4 trucks.

Truck ID	HI historical lag	Number of causal features found	Number of clusters found	Averaged silhouette score	Number of discovered intra-cluster causal relationships	Averaged MASE and SD
1659	4	8	2	0.88	Cluster 1: 0 Cluster 2: 6	Cluster 1: 0.79/0.24 Cluster 2: 0.82/0.41
3929	3	17	6	0.13	Cluster 1: 2 Cluster 2: 6 Cluster 3: 3 Cluster 4: 2 Cluster 5: 2 Cluster 6: 0	Cluster 1: 0.18/0.06 Cluster 2: 0.64/0.28 Cluster 3: 0.42/0.19 Cluster 4: 0.30/0.00 Cluster 5: 0.15/0.00 Cluster 6: 1.38/0.00
4426	5	17	7	0.19	Cluster 1: 4 Cluster 2: 2 Cluster 3: 2 Cluster 4: 5 Cluster 5: 2 Cluster 6: 2 Cluster 7: 0	Cluster 1: 0.37/0.20 Cluster 2: 0.28/0.03 Cluster 3: 0.22/0.03 Cluster 4: 0.47/0.29 Cluster 5: 0.21/0.00 Cluster 6: 0.09/0.01 Cluster 7: 0.44/0.18
5865	3	19	3	0.31	Cluster 1: 13 Cluster 2: 2 Cluster 3: 1	Cluster 1: 0.59/0.85 Cluster 2: 0.17/0.04 Cluster 3: 0.47/0.24
6973	4	10	3	0.45	Cluster 1: 5 Cluster 2: 0 Cluster 3: 4	Cluster 1: 0.21/0.01 Cluster 2: 0.39/0.01 Cluster 3: 2.28/1.96
7552	3	10	3	0.42	Cluster 1: 0 Cluster 2: 1 Cluster 3: 0	Cluster 1: 2.16/1.59 Cluster 2: 0.98/0.45 Cluster 3: 0.37/0.10
7684	4	9	4	0.73	Cluster 1: 2 Cluster 2: 0 Cluster 3: 0 Cluster 4: 2	Cluster 1: 0.11/0.00 Cluster 2: 0.28/0.09 Cluster 3: 0.37/0.10 Cluster 4: 5.75/0.94
9241	3	9	2	0.48	Cluster 1: 7 Cluster 2: 0	Cluster 1: 0.64/0.16 Cluster 2: 0.27/0.10
9639	5	23	2	0.63	Cluster 1: 5 Cluster 2: 6	Cluster 1: 0.44/0.08 Cluster 2: 0.19/0.22
9997	3	15	2	0.54	Cluster 1: 9 Cluster 2: 1	Cluster 1: 0.17/0.09 Cluster 2: 0.70/0.13
12862	2	9	2	0.74	Cluster 1: 8 Cluster 2: 0	Cluster 1: 0.70/0.41 Cluster 2: 0.72/0.14
14746	3	9	3	0.19	Cluster 1: 5 Cluster 2: 2 Cluster 3: 2	Cluster 1: 0.18/0.18 Cluster 2: 0.41/0.06 Cluster 3: 0.50/0.05
15119	5	13	2	0.49	Cluster 1: 1 Cluster 2: 1	Cluster 1: 0.26/0.12 Cluster 2: 0.20/0.06
17107	4	12	4	0.59	Cluster 1: 1 Cluster 2: 0 Cluster 3: 3 Cluster 4: 0	Cluster 1: 0.14/0.00 Cluster 2: 0.17/0.01 Cluster 3: 0.49/0.11 Cluster 4: 0.30/0.09
18564	3	11	3	0.12	Cluster 1: 3 Cluster 2: 4 Cluster 3: 4	Cluster 1: 0.19/0.09 Cluster 2: 0.53/0.26 Cluster 3: 0.54/0.14
21007	5	14	5	0.14	Cluster 1: 5 Cluster 2: 2 Cluster 3: 3 Cluster 4: 3 Cluster 5: 3	Cluster 1: 1.23/0.80 Cluster 2: 0.20/0.01 Cluster 3: 0.28/0.15 Cluster 4: 0.22/0.03 Cluster 5: 0.18/0.02
22350	4	9	4	0.10	Cluster 1: 4 Cluster 2: 2 Cluster 3: 2 Cluster 4: 2	Cluster 1: 0.18/0.05 Cluster 2: 0.18/0.00 Cluster 3: 0.21/0.02 Cluster 4: 0.40/0.00
25117	3	18	7	0.40	Cluster 1: 2 Cluster 2: 2 Cluster 3: 2 Cluster 4: 2 Cluster 5: 2 Cluster 6: 6 Cluster 7: 2	Cluster 1: 0.36/0.08 Cluster 2: 1.29/0.59 Cluster 3: 0.15/0.01 Cluster 4: 0.48/0.09 Cluster 5: 0.10/0.00 Cluster 6: 0.20/0.12 Cluster 7: 0.13/0.01
26703	3	8	3	0.43	Cluster 1: 2 Cluster 2: 2 Cluster 3: 5	Cluster 1: 0.10/0.02 Cluster 2: 0.12/0.00 Cluster 3: 0.28/0.12
27801	3	9	3	0.14	Cluster 1: 4 Cluster 2: 1 Cluster 3: 2	Cluster 1: 0.22/0.04 Cluster 2: 0.18/0.04 Cluster 3: 0.11/0.00
30745	3	6	3	0.61	Cluster 1: 2 Cluster 2: 2 Cluster 3: 1	Cluster 1: 0.18/0.01 Cluster 2: 0.35/0.02 Cluster 3: 1.37/0.67
31303	5	10	2	0.38	Cluster 1: 14 Cluster 2: 3	Cluster 1: 1.98/1.40 Cluster 2: 0.52/0.20
33056	4	8	2	0.81	Cluster 1: 8 Cluster 2: 0	Cluster 1: 1.00/1.59 Cluster 2: 0.29/0.10

Figure 13.

Contribution of feature series as causes, and total contribution of feature series as causes and effects, for selected Class 1 trucks.

Figure 14.

Contribution of feature series as causes, and total contribution of feature series as causes and effects, for selected Class 2 trucks.

Figure 15.

Contribution of feature series as causes, and total contribution of feature series as causes and effects, for selected Class 3 trucks.

Figure 16.

Contribution of feature series as causes, and total contribution of feature series as causes and effects, for selected Class 4 trucks.

Class 1 key discoveries: The 397 series has the highest cause-effect contribution in the causal network, at approximately 44%. As effects, 397 series features appear in almost all discovered clusters. The series is also a strong causal driver, acting as the cause in 48.15% of all causal relationships. Features in the 397 series interact predominantly with other features of the same series, accounting for 25.93% of all causal relationships, followed by interactions with features from the 167 and 459 series, each contributing 7.41%. For example, in the first cluster of truck 14370, 397_8 and 397_26 both cause 397_14 with no delay, whereas 397_31 causes 167_6 with a one-step delay and is causally linked to 459_11 with a two-step delay. In truck 20967, 397_9 causes 397_10 with no delay.
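The Class 1 percentages can be traced back to Table 3. Assuming the denominator of equation (27) is the total number of intra-cluster causal relationships summed over the Class 1 trucks, the reported rates correspond to whole-number counts of relationships:

```latex
N_{\text{Class 1}} = \underbrace{5}_{1732} + \underbrace{(13 + 6)}_{14370} + \underbrace{1}_{20967} + \underbrace{2}_{14470} = 27,
\qquad
\frac{13}{27} \approx 48.15\%, \quad \frac{8}{27} \approx 29.63\%, \quad \frac{7}{27} \approx 25.93\%, \quad \frac{2}{27} \approx 7.41\%.
```

Read this way, the 397 series is the cause in 13 of the 27 relationships and the 459 series in 8, with 7 relationships inside the 397 series and 2 linking it with each of the 167 and 459 series. These counts are inferred from the percentages rather than reported directly.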

The 459 series is the second key factor in the extracted causal relationships. Acting as the cause in 29.63% of the relationships, its features exert equal causal influence on the 397 series and within their own series, each accounting for around 11% of all causal relationships. In truck 1732, causal interactions with no delay are found among 459_15, 459_16, and 459_17 (459_15 causes 459_16 and vice versa, and 459_17 causes 459_16); in the same truck, 459_15 also causes 397_30 with a one-timestep delay. Additionally, 459_11 causes 291_0 with no delay in the first cluster of truck 14370, and 459_0 causes 100_0 and 397_27 with no delay in its second cluster. These findings illustrate the dual role of the 459 series in driving both intra-series and inter-series interactions across clusters.

The remaining series play smaller roles: the 291 series has a cause-effect contribution rate of 11.11%, and each of the others less than 4%. The 167 and 666 series appear only as effects in the causal relationships extracted for this class.

Class 2 key discoveries: Causal relationships involving the 397 series are the most frequently identified. The series plays causal and effectual roles in over 50% of the identified relations, with approximately 21% being inter-series relations. For example, in the second cluster of truck 1998, 397_34 causes 397_28 with a two-step delay, while 397_14 causes 397_27 with no delay in truck 6302, and 397_8 causes 397_27 with no delay in truck 18173. In truck 15525, the 397 series interacts primarily within itself. Furthermore, the causality between the 397 and 459 series contributes the second-largest share (6.94%) of its overall relations. Additionally, 158_9 causes 397_28 with no delay in truck 1998, and 397_2 causes 171_0 with no delay in truck 3420.

A large portion of the remaining causal relationships involve the 459, 272, and 167 series, each with a cause-effect contribution rate of approximately 11% of all discovered relations. Notably, compared with Class 1, the cause-effect contribution rates of the 272 and 167 series each increase by over 7 percentage points in Class 2. Their interactions consist mainly of causalities within each series and between the two series. Conversely, the cause-effect contribution rate of the 459 series decreases by nearly 16 percentage points. The 459 series primarily exhibits local causality rather than system-wide interaction. Its interactions with the 397 series include 459_5 causing 397_13 in truck 1872 and 459_5 causing 397_24 in truck 4686. Additionally, 459_16 shows mutual causality with 397_21 in truck 3420, and the 459 series also interacts with the 291 series in trucks 1872 and 4686.

Class 2 introduces three new feature series (171, 837, and 427), with cause and cause-effect contribution rates of approximately 2.78%, 1.39%, and 1.39%, respectively, in the overall causal relationships. These series may offer additional insights into the dynamics of Component X in this class.

Class 3 key discoveries: The results reveal two exactly recurring causal relationships, both with no delay: (1) 167_2 causes 272_1, and (2) 272_1 causes 167_2. This symmetrical pair of interactions is observed in the third cluster of truck 370 and the second cluster of truck 721.

Among the key feature series, the 167 series shows increasing impact on the extracted causal relationships: its cause contribution rate rises to 15.31% and its cause-effect contribution rate to 14.80%. Notably, 167 series features often act as causal intermediaries that link multiple features in various contexts. For example, in the second cluster of truck 721, 167_2 causes both 272_1 and 272_0 without delay, while 272_1 in turn causes 167_2 without delay. In the first cluster of truck 16327, 167_4 causes 167_3 with no delay and 167_3 causes 158_9 with a one-step delay, forming a causal chain. This relaying role is not universal, however: in some cases 167 series features act as localized or terminal nodes, such as the bidirectional causality between 167_1 and 272_0 in the first cluster of truck 3880, or the unidirectional causality from 397_35 to 167_8 with a three-step delay in the fifth cluster of truck 14639.

The 291 series plays an increasingly noticeable role in Class 3 causal relationships. Its cause-effect contribution rate increases by nearly 10 percentage points, and its role as a cause strengthens further, with its cause contribution rate rising from around 5.56% in Class 2 to 16.33% in Class 3. Examples include 291_0 causing 100_0 in the second cluster of truck 370 and 291_10 causing 171_0 in truck 23507. Mutual causality loops involving the 291 series are also observed in trucks 1472, 3880, and 14639.

The 397 series declines in influence compared with Class 2: its cause contribution rate drops to around 28.57% and its cause-effect contribution rate to around 29.08%. Most 397 series features act as causes in the extracted relationships. For example, in the first cluster of truck 721, 397_20 causes 158_5 with no delay. In truck 3880, 397_15 causes 459_16 with a three-step delay and 837_0 with no delay. Some intra-series relations are also observed, such as the mutual causality between 397_21 and 397_15 in truck 3880, along with additional interactions in truck 16327.

The 459 series shows a slight decrease in its cause-effect contribution rate, from 11.81% in Class 2 to 10.20% in Class 3. Its influence is, however, substantially lower than in Class 1, where its cause-effect contribution rate was nearly 28% and its cause contribution rate nearly 30%. In contrast, the contribution of the 158 series remains stable, at around 9.69% of overall causal relationships (cause-effect) and around 9.18% as causes, little changed from its cause-effect contribution rate of 9.72% in Class 2. Compared with Class 1, the 158 series plays a more critical role in the causal network, with both its cause and cause-effect contribution rates nearly 6 percentage points higher. Features from the 158 series appear in multiple trucks; for example, 158_5 causally influences both 397_20 and 397_22 in truck 721, and 158_9 causally influences 397_0 and 291_4 in truck 16327.

Compared with Class 2, two new feature series appear in the discovered relations: the 309 and 835 series. These series play a minor role, with both their cause and cause-effect contribution rates at approximately 1.02% of the total relations.

Class 4 key discoveries: The results include multiple causal relationships that recur across the involved trucks. While these relations are consistent in cause and effect, some differ slightly in delay step, as outlined in Table 7.

Table 7.

Recurring causal relationships in Class 4 trucks.

Causal relationship	Found in	Delay step
459_11 causes 837_0	truck 3223 (cluster 3), truck 22350 (cluster 4)	0
397_35 causes 397_23	truck 30745 (cluster 3), truck 31303 (cluster 1)	2
397_13 causes 397_7	truck 26703 (cluster 3), truck 22350 (cluster 3)	0
158_4 causes 397_20	truck 3929 (cluster 1), truck 18564 (cluster 2)	0
397_28 causes 459_17	truck 12862 (cluster 1), truck 31303 (cluster 1)	2
158_7 causes 397_26	truck 5865 (cluster 1), truck 25117 (cluster 7)	0
427_0 causes 171_0	truck 25117 (cluster 5), truck 14746 (cluster 1)	0
397_34 causes 397_28	truck 1659 (cluster 2), truck 17107 (cluster 3)	2, 0
158_8 causes 397_2	truck 27801 (cluster 1), truck 4426 (cluster 2)	2, 0
397_20 causes 158_4	truck 3929 (cluster 1), truck 18564 (cluster 2)	0
459_7 causes 459_8	truck 25117 (cluster 6), truck 18564 (cluster 1)	0
167_0 causes 167_0	truck 7684 (cluster 4), truck 6973 (cluster 3)	1, 2
397_7 causes 397_13	truck 22350 (cluster 3), truck 26703 (cluster 3)	0
666_0 causes 158_9	truck 5865 (cluster 2), truck 9997 (cluster 1)	0
397_2 causes 158_8	truck 27801 (cluster 1), truck 4426 (cluster 2)	0
397_28 causes 397_34	truck 17107 (cluster 3), truck 1659 (cluster 2)	0, 2

Among the key feature series, the 397 series once again plays a dominant role. Its cause-effect contribution rate reaches 43.41%, markedly higher than in Class 3. In particular, the feature 397_28 appears most often, in 17 of the discovered causal relationships. In general, the 397 series interacts principally with the 459 and 167 series; for example, 397_22 causes 167_8 with a three-timestep delay in the first cluster of truck 5865, 397_24 causes 167_2 with a one-timestep delay in the second cluster of truck 9639, and 397_27 causes 459_2 with no delay in the third cluster of truck 18564. Intra-series causalities are also observed.

The 158 series gains unique prominence in Class 4. Its cause and cause-effect contribution rates increase to 14.09% and 12.50%, respectively, surpassing those in the other three classes. Its features mainly drive features from the 397 series, the 167 series, and the 158 series itself, which together account for 25 of the 220 reported causal relationships. Their role as effects is also captured, particularly in relations involving features from the 666, 459, and 272 series. Notably, 158 series features often act as bidirectional influencers, such as 158_1, which shows mutual causality with both 167_3 and 397_7 in truck 4426, and 158_8, which shows mutual causality with 459_10 in truck 26703.

The 167 series shows a decrease in its cause-effect contribution rate in Class 4 compared with Class 3, but its role as a connector in fault propagation is more pronounced, since its features are more embedded in extensive causal networks. For instance, in the first cluster of truck 4426, 167_6 acts as a coordinating node that causes 158_9 and 272_5, and bidirectional interactions are also observed between 167_6 and 272_5. In the second cluster of truck 9639, 397_25, 397_24, and 291_6 cause 167_2 with delays of two, one, and zero steps, respectively, whereas 167_2 causes 397_24 with a two-step delay. In the second cluster of truck 31303, 167_2 and 272_1 share bidirectional causal links, and 167_2 also causes 272_0, indicating that its influence extends across related features.

The 291 series shows a sharp decline, with a cause-effect contribution rate of only 5.23%, a clear reduction from its rates in Class 1 (11.11%), Class 2 (8.33%), and Class 3 (17.35%). While the series continues to act as a local driver, its interactions with other series are less frequent, with exceptions such as 158_9 causing 291_10 with no delay in truck 9997 and 100_0 causing 291_4 with a two-step delay in truck 12862. In contrast, Class 4 sees a resurgence in the influence of the 459 series. Its cause and cause-effect contribution rates rise to 18.18% and 17.05%, respectively, regaining its status as the second most critical feature series. Most 459 series features continue to exhibit intra-series causalities; typical examples are the sixth cluster of truck 25117 and the first cluster of truck 33056 (its only cluster with discovered relationships), where all discovered causal relationships involve 459 series features. Inter-series interactions involving the 459 series are less common. Examples include 459_11 causing 837_0 with no delay in the second cluster of truck 3929, 167_0 causing 459_17 with a one-step delay, and 397_12 causing 459_14 with a three-timestep delay in the first cluster of truck 9639.

In addition, the case studies identify causal relationships that recur in more than one class. These are reported in Table 8.

Table 8.

Recurring causal relationships existing in more than one class.

Causal relationship	Found in	Class	Delay step
397_21 causes 397_15	Truck 3420 (cluster 2), truck 3880 (cluster 1)	2, 3	0
397_15 causes 459_16	Truck 3420 (cluster 2), truck 3880 (cluster 1)	2, 3	0, 3
397_34 causes 397_28	Truck 1998 (cluster 2), truck 17107 (cluster 3), truck 1659 (cluster 2)	2, 4, 4	2, 0, 2
397_28 causes 397_34	Truck 1998 (cluster 2), truck 17107 (cluster 3), truck 1659 (cluster 2)	2, 4, 4	0, 0, 2
167_6 causes 272_5	Truck 18173 (cluster 1), truck 4426 (cluster 1)	2, 4	0
272_5 causes 167_6	Truck 18173 (cluster 1), truck 4426 (cluster 1)	2, 4	0
397_27 causes 459_2	Truck 18173 (cluster 1), truck 18564 (cluster 3)	2, 4	2, 0
167_2 causes 272_1	Truck 370 (cluster 3), truck 721 (cluster 2), truck 31303 (cluster 2)	3, 3, 4	0
158_7 causes 397_26	Truck 15334 (cluster 1), truck 5865 (cluster 1), truck 25117 (cluster 7)	3, 4, 4	0
459_7 causes 459_5	Truck 16327 (cluster 1), truck 25117 (cluster 6)	3, 4	0
158_1 causes 397_0	Truck 16327 (cluster 1), truck 9997 (cluster 6)	3, 4	0
171_0 causes 427_0	Truck 23507 (cluster 1), truck 22350 (cluster 5)	3, 4	0
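Tables 7 and 8 amount to grouping the relationships discovered in all trucks by their (cause, effect) pair and keeping the pairs found in more than one truck (Table 7) or in more than one class (Table 8). A minimal sketch, assuming each discovery is stored as a record with the hypothetical fields shown below; the three records are relationships reported in this section.

```python
from collections import defaultdict


def recurring_relationships(discoveries, min_trucks=2):
    """Group relationships by (cause, effect); keep pairs found in at least `min_trucks` trucks."""
    groups = defaultdict(list)
    for record in discoveries:
        groups[(record["cause"], record["effect"])].append(record)
    recurring = {}
    for pair, hits in groups.items():
        if len({h["truck"] for h in hits}) >= min_trucks:
            recurring[pair] = {
                "found_in": [(h["truck"], h["cluster"]) for h in hits],
                "classes": sorted({h["cls"] for h in hits}),
                "delays": [h["delay"] for h in hits],  # may differ between trucks
            }
    return recurring


# Hypothetical record layout; values taken from relationships reported in this section.
records = [
    {"cause": "167_6", "effect": "272_5", "truck": 18173, "cluster": 1, "cls": 2, "delay": 0},
    {"cause": "167_6", "effect": "272_5", "truck": 4426, "cluster": 1, "cls": 4, "delay": 0},
    {"cause": "397_9", "effect": "397_10", "truck": 20967, "cluster": 1, "cls": 1, "delay": 0},
]
recurring = recurring_relationships(records)
# Table 8-style filter: keep only pairs that span more than one class.
cross_class = {pair: info for pair, info in recurring.items() if len(info["classes"]) > 1}
```

Keeping the per-truck delays as a list, rather than collapsing them, preserves the delay differences that Tables 7 and 8 report for the same cause-effect pair.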

To summarize, the 397, 459, 291, 167, and 158 series appear to be the principal players in the causal relationships extracted for the selected trucks across the four classes, and they may serve as warning indicators for maintenance personnel. The 397 series leads overall in cause and cause-effect contributions, spanning both intra- and inter-series interactions, despite a diminished presence in Class 3. The 459 series is also involved in intra- and inter-series interactions; its influence declines over the first three classes before it regains its status as the secondary contributor to the causal relationships in Class 4. The 291 series shifts from a moderate contributor in Classes 1–3 to a more limited participant in Class 4. Its surging cause and cause-effect contribution rates and active involvement in causal chains in Class 3, however, may mark it as a signature series for that class. The 167 series transitions from a minor contributor in the early classes to a crucial connector in the later classes, where its role as a bridge enables causal pathways. Lastly, the 158 series shows rising causal contributions across the four classes, peaking in Class 4, so it may act as an identifier of the final-stage degradation dynamics of Component X. For the given tasks and data, maintenance teams may prioritize these five series and their associated recurring patterns for further diagnosis, while not overlooking the potential effect of less influential series (e.g. the 171 and 837 series) on the failure of Component X.

Result discussion and implication

Results from the TE application demonstrate the method’s capability for both dimensionality reduction and causal feature identification, while also showing variability in lag value selection and in the number of features identified. Higher lag values generally identify more causal features (evident in Classes 1–2), although Classes 3 and 4 show more complex patterns. For instance, in Class 3, truck 3880 with lag 3 identifies 18 features, more than truck 721 with lag 5, which identifies 14. Similarly, in Class 4, truck 9639 with lag 5 identifies 23 features, yet trucks 25117 and 5865 with lag 3 identify comparable numbers (18 and 19). This suggests diminishing returns beyond certain thresholds. Higher lags capture deeper dependencies but risk introducing noise or overlooking recent critical patterns, while lower values preserve short-term patterns but sacrifice historical information. The same logic applies to the number of causal features: comprehensive sets provide richer diagnostic signals but may introduce redundancy, whereas targeted subsets enable focused interventions but limit explanatory power. Although determining the ideal lag value and number of causal features is not the aim of this research, this variability implies that no universal parameterization exists. Effective TE-based causal feature selection requires adaptation to the specific operational needs, maintenance records, and desired analytical granularity of each system, even within homogeneous fleets. For prescriptive maintenance, this requirement suggests that effective strategies should be tailored to both equipment characteristics and organizational constraints: fewer features may enable rapid-response protocols, while comprehensive sets support detailed analysis of degradation mechanisms for strategic planning.
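To make the role of the lag concrete, the sketch below is a plug-in (histogram) estimate of transfer entropy from a source series X (e.g. a sensor feature) to a target series Y (e.g. the health indicator), T(X→Y) = Σ p(y_t, y_past, x_past) log2[p(y_t | y_past, x_past) / p(y_t | y_past)], where y_past and x_past are the `lag` most recent values of each series. It illustrates the quantity under stated assumptions (equal-width binning, the same history length for both series); it is not the TE estimator used in the case studies, which this section does not specify, and the function names are hypothetical. The number of possible joint states grows as `bins ** (2 * lag + 1)`, so for a fixed series length the estimate becomes sparser and more biased as the lag grows, one concrete way in which higher lags can introduce noise.

```python
import math
from collections import Counter

import numpy as np


def discretize(x, bins):
    """Equal-width binning of a 1-D series into integer symbols 0 .. bins-1."""
    x = np.asarray(x, dtype=float)
    interior_edges = np.linspace(x.min(), x.max(), bins + 1)[1:-1]
    return np.digitize(x, interior_edges).tolist()


def transfer_entropy(source, target, lag=1, bins=4):
    """Plug-in estimate, in bits, of TE(source -> target) using `lag` past values of both series."""
    xs, ys = discretize(source, bins), discretize(target, bins)
    joint = Counter()  # counts of (y_t, y_{t-lag..t-1}, x_{t-lag..t-1})
    for t in range(lag, len(ys)):
        joint[(ys[t], tuple(ys[t - lag:t]), tuple(xs[t - lag:t]))] += 1
    total = sum(joint.values())

    hist_both, next_and_own, own = Counter(), Counter(), Counter()
    for (y_next, y_hist, x_hist), count in joint.items():
        hist_both[(y_hist, x_hist)] += count
        next_and_own[(y_next, y_hist)] += count
        own[y_hist] += count

    te = 0.0
    for (y_next, y_hist, x_hist), count in joint.items():
        p_with_source = count / hist_both[(y_hist, x_hist)]  # p(y_t | own past, source past)
        p_without_source = next_and_own[(y_next, y_hist)] / own[y_hist]  # p(y_t | own past)
        te += (count / total) * math.log2(p_with_source / p_without_source)
    return te


# Synthetic check: y follows x with a one-step lag, so information should flow x -> y only.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = np.roll(x, 1) + 0.1 * rng.normal(size=2000)
te_xy = transfer_entropy(x, y, lag=1)
te_yx = transfer_entropy(y, x, lag=1)
```

The asymmetry between `te_xy` and `te_yx` is what makes TE usable as a directional screening criterion; the small positive value in the reverse direction is estimation bias rather than causal influence.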

The clustering performance shows both strengths and limitations of the DTW-based algorithm. The algorithm manages to isolate symptom groups with shared temporal patterns, correctly aligning abnormal oscillations (e.g. sharp fluctuations in the final 50–100 time units before failure in trucks 3420, 16327, and 17107) that potentially signify subsystem responses to degradation. It remains functional despite noise and signal variability. However, limitations appear in cases such as trucks 14470 and 17107, where signals cluster imperfectly (e.g. feature 272_4) or exhibit weak synchronization, reflecting the realities of sensor data: inherent response differences, operating condition variations, and measurement errors (e.g. sensor malfunctions). Nonetheless, maintenance applications prioritize overall trend identification over perfect alignment, and the algorithm robustly captures the broader trends essential for degradation characterization and subsequent causality analysis. Numerically, silhouette scores quantify clustering quality, with strategic implications for maintenance planning. Higher scores (e.g. 0.88 for truck 1659) indicate cohesive, well-separated clusters, ideal for identifying clear subsystems or fault mechanisms. Lower scores (e.g. 0.08 for truck 1872) reflect weaker cohesion or overlap, possibly caused by data variability or poor data quality. Cluster counts ranging from one to seven reveal varying relationship complexity. This diversity translates directly into strategic differentiation: in practice, fewer clusters with high scores suggest densely correlated sensor behaviors, pointing to simpler diagnostic processes on smaller subsystems, while more clusters with low scores indicate diverse sensor responses that require additional scrutiny to identify failure patterns. Although not all clusters are perfectly defined and cluster numbers may not align directly with physical components, this does not undermine their utility. This intermediary step provides maintenance personnel with a new computational perspective, augmenting domain knowledge by revealing hidden signal relationships that support systematic causal discovery and improved diagnostics.
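The two ingredients of this step, DTW distances between feature series and the silhouette score computed on them, can be sketched as follows. This is the textbook DTW recurrence with an absolute-difference local cost and the standard silhouette definition on a precomputed distance matrix; it is not the authors’ clustering implementation, whose details are not given in this section, and the label-assignment step itself is omitted. The function names are hypothetical. Returning NaN for a single cluster mirrors the NaN entries in Tables 3 to 6, where only one cluster was found.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference local cost."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


def silhouette_from_distances(dist, labels):
    """Mean silhouette score over a precomputed distance matrix.

    Returns NaN when there is a single cluster; members of singleton clusters score 0.
    """
    dist, labels = np.asarray(dist, dtype=float), np.asarray(labels)
    clusters = set(labels.tolist())
    if len(clusters) < 2:
        return float("nan")
    scores = []
    for i, own in enumerate(labels.tolist()):
        same = labels == own
        same[i] = False
        if not same.any():
            scores.append(0.0)
            continue
        a = dist[i, same].mean()  # mean distance to the rest of its own cluster
        b = min(dist[i, labels == other].mean() for other in clusters if other != own)
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return float(np.mean(scores))


# Two phase-shifted sine waves and a ramp: DTW should place the sines close together.
t = np.linspace(0, 2 * np.pi, 60)
series = [np.sin(t), np.sin(t - 0.5), np.linspace(-1, 1, 60)]
D = np.array([[dtw_distance(p, q) for q in series] for p in series])
score = silhouette_from_distances(D, [0, 0, 1])
```

Because DTW warps the time axis, the phase shift between the two sine waves costs little, which is the property that lets such clustering align oscillations occurring at slightly different times before failure.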

TCDF’s causal discovery shows strong prediction performance, with low MASE and SD values across most clusters, supporting the validity of the identified relationships. Exceptions with elevated values (e.g. the first cluster of truck 1732 in Class 1) indicate instances where noise or unpredictability limits the learning of temporal dependencies. The varying number of causal relationships within clusters provides prescriptive insights: high numbers (e.g. 13 in the first cluster of truck 3880) represent complex cascading mechanisms requiring careful maintenance prioritization. Conversely, fewer relationships may indicate isolated or straightforward propagation pathways, which are simpler to diagnose but may not capture complete causal flows. Importantly, not every TE-identified feature participates in intra-cluster causality, which demonstrates TCDF’s complementary power to refine the feature set down to truly relevant features. This interpretable prediction-validation mechanism aligns with the view of Pashami et al.⁷⁸ that explainability within predictive models fosters confidence in prescriptive maintenance solutions.
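For reference, here is a minimal sketch of MASE under its common definition, in which out-of-sample absolute errors are scaled by the in-sample mean absolute error of a one-step naive forecast. Whether the case studies use exactly this scaling, and a population rather than sample SD across feature series, is not stated in this section, so both are assumptions, as are the function names. Under this definition, values below 1 mean the model beats the naive baseline, so an average MASE of 9.61, as reported for the first cluster of truck 1732, would indicate errors far larger than that baseline.

```python
import numpy as np


def mase(actual, forecast, train):
    """MASE: out-of-sample MAE divided by the in-sample MAE of the one-step naive forecast."""
    actual, forecast, train = (np.asarray(v, dtype=float) for v in (actual, forecast, train))
    naive_scale = np.mean(np.abs(np.diff(train)))  # mean |y_t - y_{t-1}| on the training part
    return float(np.mean(np.abs(actual - forecast)) / naive_scale)


def cluster_mase_summary(per_series):
    """Mean and (population) SD of MASE across the feature series of one cluster."""
    scores = [mase(actual, forecast, train) for actual, forecast, train in per_series]
    return float(np.mean(scores)), float(np.std(scores))


train = [1.0, 2.0, 3.0, 4.0, 5.0]  # naive in-sample MAE = 1.0
mean_mase, sd_mase = cluster_mase_summary([
    ([6.0, 7.0], [6.5, 7.5], train),  # MASE 0.5: half the naive in-sample error
    ([6.0, 7.0], [5.0, 6.0], train),  # MASE 1.0: as large as the naive in-sample error
])
```

Scaling by a naive baseline makes MASE comparable across feature series with different units, which is why it can be averaged over all series in a cluster as done in Tables 3 to 6.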

Interpreting causal relationships is crucial for realizing prescriptive maintenance. These relationships form the knowledge base that explains how failures form and propagate, showing how sensor behaviors transmit across subsystems with observable delays and supporting proactive, targeted maintenance strategies. Three relationship types emerge as pivotal. Unidirectional relations reveal sequential fault propagation and clear causal chains, useful for identifying primary triggers and intervention points. Bidirectional relations represent mutual causality, indicating interdependence between sensor-monitored components. They can result from genuine feedback loops or from shared but hidden confounders, which calls for careful examination of possible shared systemic influences. Self-causal relationships signify autoregressive tendencies or persistent fault states, useful for prioritizing inspection, recalibration, or early warning systems. Recurring relationships across multiple units represent diagnostic commonalities that, despite low recurrence frequencies (mostly two trucks, at most three), effectively highlight problematic sources and local causal chains across machines with varying service conditions, benefiting fleet health management and parts inventory optimization.
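The three relationship types can be read off a discovered edge list mechanically. A minimal sketch with a hypothetical helper, assuming edges are stored as (cause, effect, delay) triples; in practice it would be applied per cluster, and the edges below are pooled from two trucks only for brevity. The `bidirectional` label is purely structural: as noted above, it cannot distinguish a genuine feedback loop from a hidden confounder.

```python
def classify_relationships(edges):
    """Label (cause, effect, delay) edges as self-causal, bidirectional or unidirectional."""
    pairs = {(cause, effect) for cause, effect, _ in edges}
    labels = {}
    for cause, effect, delay in edges:
        if cause == effect:
            kind = "self-causal"
        elif (effect, cause) in pairs:
            kind = "bidirectional"  # the reverse edge was also discovered
        else:
            kind = "unidirectional"
        labels[(cause, effect, delay)] = kind
    return labels


edges = [
    ("167_2", "272_1", 0), ("272_1", "167_2", 0), ("167_2", "272_0", 0),  # truck 721, cluster 2
    ("167_0", "167_0", 1),  # truck 7684, cluster 4 (Table 7)
]
labels = classify_relationships(edges)
```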

Temporal delays provide physically interpretable intervention timing for maintenance planning. Depending on the systems and sensor types involved, delay values can be interpreted as mechanical response times, wear progression speeds, or cascading failure sequences. Non-zero delays (e.g. 397_28 causes 459_17 with a two-timestep lag, as shown in Table 7) could suggest a measurable lag in how changes in one subsystem manifest in another, providing windows for inspection or corrective action before downstream effects cause further damage. Zero-delay relationships (e.g. 167_2 causes 272_1 in Table 8) could indicate tightly coupled components requiring simultaneous monitoring; because such causality propagates faster than the measurement frequency can resolve, they would necessitate real-time response protocols or increased sampling rates for early detection. The delay information also informs maintenance resource allocation: longer delays permit scheduled interventions during planned downtime, whereas shorter delays require rapid-response capabilities. However, misinterpreting delays can lead to unnecessary resource consumption (intervening too early) or untimely intervention (too late), emphasizing the need for context-driven, human-supervised maintenance rather than exclusive reliance on computation. For effective prescriptive maintenance, these data-driven delay patterns should be interpreted alongside domain expertise, system dynamics, and historical maintenance records to determine whether they align with known mechanical response times, buffering effects, or cascading failure mechanisms in the equipment under study.

The multi-stage architecture delivers comprehensive diagnostics through interdependent module functionality. The case studies demonstrate that this layered approach reveals both fleet-wide commonalities and individual nuances that would otherwise remain obscured under isolated single-method approaches. However, the framework’s utility in real-world applications depends on dataset dimensionality and problem complexity. While high-dimensional systems could benefit from the complete workflow’s systematic feature reduction and structured causality discovery, strategic module deployment becomes essential when sensor parameters are limited. For instance, practitioners may choose to bypass TE and proceed directly to clustering and causality analysis when domain knowledge already identifies the causally relevant features. The decision is ultimately constrained by organizational maintenance objectives, available computational resources, and the criticality of equipment-specific failure mechanisms. Regardless of dimensionality, the framework’s prescriptive value lies in systematically characterizing temporal relationships to augment human expertise, ensuring that maintenance interventions integrate computational insights with domain knowledge for optimized timing and resource allocation.

The current implementation establishes proof-of-concept viability for the proposed approach within the scope of the provided data and tasks. Its success with confidential data highlights the framework’s adaptability and suggests promise for broader use. At the same time, parameter optimization and the interpretation of the resulting knowledge may vary when the framework is applied to systems with different characteristics, such as higher dimensionality, different sampling frequencies, or more complex interaction patterns among components. Validation on a wider scale across diverse equipment types and operational environments would therefore strengthen generalizability claims and enable systematic benchmarking against alternative causal discovery methods under controlled conditions, allowing continuous modular refinement. The confidential treatment of the data in this study also limits direct domain validation, exemplifying a common constraint in industrial research where ground-truth causal relationships are rarely documented. Consequently, interpreting the discovered relationships relies on domain-agnostic validation approaches such as those employed in this study. Future developments could benefit from systematic parameter selection protocols and standardized evaluation frameworks for maintenance-oriented causal discovery tasks. Additionally, as noted by Chatterjee and Dethlefs,⁴⁶ integrating human-in-the-loop mechanisms would enhance deployment confidence. Together, these considerations outline pathways for evolving the current analytical foundation into a comprehensive prescriptive maintenance solution supporting real-time monitoring and automated decision-making.

Conclusion

This paper proposes a causality discovery framework as a novel approach to understanding and managing equipment health, as tracked by sensor data, in maintenance applications. The framework establishes a foundation for prescriptive equipment maintenance in industrial IoT environments through three interrelated modules. The first module uses TE to identify features that causally influence degradation processes. The second module employs a DTW-driven algorithm that partitions the causal feature space into clusters based on the similarity of their temporal responses to system degradation. The third module applies the TCDF deep learning network to identify causal relationships and estimate delay times through attention mechanisms and causal validation. Case studies applying the framework to real-world SCANIA truck data demonstrate its functionality and performance on individual trucks and yield maintenance insights with potential for generalization. The results show that the framework can discover causality at varying levels of granularity and has potential for fleet-wide application. The paper emphasizes that the framework requires expert discretion in parameter design and result interpretation. Equipment-specific and organizational factors must be taken into account to enhance credibility, avoid oversimplification, and minimize misinterpretation of outcomes. From a practical perspective, systematic causality discovery enables proactive, evidence-based, targeted interventions that enhance equipment safety and reliability, thereby increasing asset availability and preventing catastrophic failures.
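The first module screens features by their transfer entropy (TE) toward degradation. TE from a source X to a target Y measures how much knowing x_t reduces uncertainty about y_{t+1} beyond what y_t already provides. The sketch below is a minimal plug-in (counting) estimator. It assumes quantile binning into four symbols and a history length of one. The reference list includes PyInform,⁷⁵ and the study's actual estimator and settings may differ. The names `discretize` and `transfer_entropy` are hypothetical.

```python
import numpy as np
from collections import Counter

def discretize(series, n_bins=4):
    """Map a real-valued series to integer symbols 0..n_bins-1 using quantile (equal-frequency) bins."""
    series = np.asarray(series, dtype=float)
    edges = np.quantile(series, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])  # interior bin edges
    return np.digitize(series, edges)

def transfer_entropy(source, target, n_bins=4):
    """Plug-in estimate, in bits, of transfer entropy from source to target (history length 1)."""
    x = discretize(source, n_bins)
    y = discretize(target, n_bins)
    y_next, y_now, x_now = y[1:], y[:-1], x[:-1]
    n = len(y_next)
    joint = Counter(zip(y_next, y_now, x_now))  # counts of (y_{t+1}, y_t, x_t)
    yx = Counter(zip(y_now, x_now))              # counts of (y_t, x_t)
    yy = Counter(zip(y_next, y_now))             # counts of (y_{t+1}, y_t)
    y_marg = Counter(y_now)                      # counts of y_t
    te = 0.0
    for (yn, yc, xc), count in joint.items():
        p_full = count / yx[(yc, xc)]            # p(y_{t+1} | y_t, x_t)
        p_self = yy[(yn, yc)] / y_marg[yc]       # p(y_{t+1} | y_t)
        te += (count / n) * np.log2(p_full / p_self)
    return float(te)

# Usage: y is driven by the previous value of x, so information flows x -> y.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = np.roll(x, 1) + 0.5 * rng.normal(size=5000)
print(transfer_entropy(x, y) > transfer_entropy(y, x))  # True
```

A plug-in estimate is biased upward, so a positive value alone does not establish influence. In practice the estimate is usually compared against surrogates, such as TE computed on shuffled copies of the source, before a feature is retained.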

Future work is expected to address the following aspects. User interactivity can be enhanced by enabling domain experts to customize DTW clustering beyond reliance on the silhouette score, allowing maintenance specialists to adjust configurations according to domain knowledge and diagnostic requirements. Real-time capability should be pursued through adaptive systems that leverage historical causal patterns while continuously updating the causal feature space, symptom groupings, and delay estimates as new data become available. This would enable immediate anomaly detection and a shift from retrospective analysis to proactive monitoring. In addition, maintenance decision-making could be automated by translating discovered causal relationships and temporal delays into actionable recommendations through intelligent advisory systems. Such systems would provide guidance on optimal replacement timing, resource allocation priorities, and targeted repair strategies. This would extend the current analytical framework into a comprehensive prescriptive maintenance solution ready for industrial deployment.
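The first direction above proposes letting experts steer DTW clustering instead of relying only on the silhouette score.⁶⁰ One way to do this is an optional override of the number of groups. A minimal sketch follows, under several assumptions. DTW uses an absolute-difference local cost with an optional Sakoe-Chiba band.³³ Clustering uses average-linkage hierarchical clustering on the precomputed DTW matrix. This hierarchical step is a stand-in, not the paper's neighbor-based cluster formation. The names `dtw_distance`, `cluster_features` and the `n_clusters` parameter are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def dtw_distance(a, b, window=None):
    """DTW distance with absolute-difference local cost and an optional Sakoe-Chiba band."""
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def dtw_matrix(series, window=None):
    """Symmetric matrix of pairwise DTW distances between feature series."""
    k = len(series)
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            D[i, j] = D[j, i] = dtw_distance(series[i], series[j], window)
    return D

def silhouette(D, labels):
    """Mean silhouette coefficient from a precomputed distance matrix (singletons score 0)."""
    labels = np.asarray(labels)
    scores = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            scores.append(0.0)
            continue
        a = D[i, same].mean()  # mean distance to the rest of its own cluster
        b = min(D[i, labels == c].mean() for c in set(labels.tolist()) if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

def cluster_features(series, n_clusters=None, max_clusters=6, window=None):
    """Group feature series into symptom groups.

    With n_clusters=None the silhouette-maximizing number of groups is chosen;
    otherwise the expert-specified n_clusters is used directly.
    """
    D = dtw_matrix(series, window)
    Z = linkage(squareform(D, checks=False), method="average")
    if n_clusters is not None:
        candidates = [n_clusters]
    else:
        candidates = range(2, min(max_clusters, len(series) - 1) + 1)
    best_labels, best_score = None, -np.inf
    for k in candidates:
        labels = fcluster(Z, t=k, criterion="maxclust")
        score = silhouette(D, labels) if len(set(labels.tolist())) > 1 else float("nan")
        if n_clusters is not None or score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score

# Usage on toy series: three drift-like trends and three phase-shifted oscillations.
t = np.linspace(0, 1, 50)
series = [t, t ** 1.2, np.clip(t - 0.1, 0, None),
          np.sin(6 * t), np.sin(6 * t + 0.3), np.sin(6 * t + 0.6)]
auto_labels, auto_score = cluster_features(series)                   # silhouette-driven
expert_labels, expert_score = cluster_features(series, n_clusters=3)  # expert override
print(auto_labels, auto_score)
print(expert_labels, expert_score)
```

With `n_clusters=None`, the silhouette criterion decides. Passing an integer lets a specialist impose a grouping that reflects known subsystem boundaries, at the cost of a possibly lower silhouette.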

Footnotes

ORCID iDs

Zhixuan Shao

Mustafa Kumral

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is funded by the Natural Sciences and Engineering Research Council of Canada (NSERC; No: NSERC RGPIN-2019-04763). The authors are grateful for this support.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Pintelon, Parodi-Herz. Maintenance: an evolutionary perspective. In: Kobbacy KAH, Prabhakar Murthy (eds) Complex system maintenance handbook. Springer, 2008, pp.21–48.

2. Dash, Prakash, Samantaray AK. Failure prognosis of the components with unlike degradation trends: a data-driven approach. Proc Inst Mech Eng Part O: J Risk Reliab 2023; 237: 1132–1149.

3. Wang, Chen. Machine cross-domain remaining useful life prediction via contrastive adversarial variational recurrent method. Proc Inst Mech Eng Part O: J Risk Reliab 2025; 239: 703–719.

4. Lemoine, Castanier. How to use prescriptive maintenance to construct robust master production schedules. In: Proceedings of the 31st European safety and reliability conference (ESREL 2021), Angers, France, 2021, pp.3280–3286. Research Publishing.

5. Padovano, Longo, Nicoletti, et al. A prescriptive maintenance system for intelligent production planning and control in a smart cyber-physical production line. Proc CIRP 2021; 104: 1819–1824.

6. Ansari, Glawar, Nemeth. PriMa: a prescriptive maintenance model for cyber-physical production systems. Int J Comput Integr Manuf 2019; 32: 482–503.

7. Giacotto, Marques, Martinetti. Prescriptive maintenance: a comprehensive review of current research and future directions. J Qual Maint Eng 2025; 31: 129–173.

8. Nemeth, Ansari, Sihn, et al. PriMa-X: a reference model for realizing prescriptive maintenance and assessing its maturity enhanced by machine learning. Proc CIRP 2018; 72: 1039–1044.

9. Pearl. Causality. 2nd ed. Cambridge University Press, 2009.

10. Eichler. Causal inference in time series analysis. In: Berzuini, Dawid, Bernardinelli (eds) Causality: statistical perspectives and applications. 2012, pp.327–354.

11. Granger CW. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 1969; 37: 424–438.

12. Huber. An introduction to causal discovery. Swiss J Econ Stat 2024; 160: 14.

13. Vuković, Thalmann. Causal discovery in manufacturing: a structured literature review. J Manuf Mater Process 2022; 6: 10.

14. Rotari, Kulahci. Correlation to causality. Qual Eng 2025; 37: 162–172.

15. Bronstein, Meyer-Kalos, Vinogradov, et al. Causal discovery analysis: a promising tool for precision medicine. Psychiatr Ann 2024; 54: e119–e124.

16. Magzumov, Kumral. Cointegration and causality testing in time series for multivariate analysis through minerals industry case studies. Miner Econ 2025; 38: 21–35.

17. Markham, Wang, et al. Data-driven causal behaviour modelling from trajectory data: a case for fare incentives in public transport. J Public Transp 2025; 27: 100114.

18. Nagahata, Saitoh, Hiraoka. Application of causality model to propose maintenance action of parts. In: Kishita, Matsumoto, Inoue, et al. (eds) EcoDesign and sustainability I: products, services, and business models. Springer, 2021, pp.325–338.

19. Smith JQ. Causal chain event graphs for remedial maintenance. Risk Anal 2024; 45: 896–909.

20. Chen H-S, Yan, Zhang, et al. Root cause diagnosis of process faults using conditional Granger causality analysis and maximum spanning tree. IFAC-PapersOnLine 2018; 51: 381–386.

21. Kliangkhlao, Haruehansapong, Yeranee, et al. Causal artificial intelligence–driven approach for HVAC preventive maintenance explanation. IEEE Access 2024; 12: 121064–121076.

22. Mariani, Martinelli, Morandi, et al. Towards intelligent monitoring and control of industrial Internet of Things deployments with causality-aware digital twins. In: 2025 21st international conference on distributed computing in smart systems and the Internet of Things (DCOSS-IoT), Tuscany, Italy, 9–11 June 2025, pp.544–551. New York: IEEE.

23. Pang, Lodewijks. Large-scale conveyor belt system maintenance decision-making by using fuzzy causal modeling. In: Proceedings 2005 IEEE intelligent transportation systems, Vienna, Austria, 13–16 September 2005, pp.563–567. New York: IEEE.

24. Zhou, et al. Leveraging on causal knowledge for enhancing the root cause analysis of equipment spot inspection failures. Adv Eng Inform 2022; 54: 101799.

25. Vanderschueren, Boute, Verdonck, et al. Optimizing the preventive maintenance frequency with causal machine learning. Int J Prod Econ 2023; 258: 108798.

26. Nadim, Ragab, Ouali M-S. Data-driven dynamic causality analysis of industrial systems using interpretable machine learning and process mining. J Intell Manuf 2023; 34: 57–83.

27. Gui, Lin, et al. CaFANet: causal-factors-aware attention networks for equipment fault prediction in the Internet of Things. Sensors 2023; 23: 7040.

28. Choubey, Benton, Johnsten. Prescriptive equipment maintenance: a framework. In: 2019 IEEE international conference on big data, Los Angeles, CA, 9–12 December 2019, pp.4366–4374. New York: IEEE.

29. Strack, Frank, Stich, et al. Prescriptive maintenance for onshore wind turbines. In: Proceedings of the conference on production systems and logistics: CPSL 2021, 10–1 August 2021, pp.489–498. Publish-Ing.

30. Weller, Migenda, Kühn, et al. Prescriptive analytics data canvas: strategic planning for prescriptive analytics in smart factories. In: Proceedings of the conference on production systems and logistics, CPSL 2024, Honolulu, HI, 9–12 July 2024, pp.292–302. Hannover: Publish-Ing.

31. Schreiber. Measuring information transfer. Phys Rev Lett 2000; 85: 461–464.

32. Berndt, Clifford. Using dynamic time warping to find patterns in time series. In: Proceedings of the 3rd international conference on knowledge discovery and data mining, Seattle, WA, 31 July–1 August 1994, pp.359–370. Washington, DC: AAAI Press.

33. Sakoe, Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Acoust Speech Signal Process 1978; 26: 43–49.

34. Nauta, Bucur, Seifert. Causal discovery with attention-based convolutional neural networks. Mach Learn Knowl Extr 2019; 1: 19.

35. Cao, Yang, et al. Data-driven dynamic inferential sensors based on causality analysis. Control Eng Pract 2020; 104: 104626.

36. Shi, et al. The causality analysis of incipient fault in industrial processes using dynamic data stream transfer entropy. J Process Control 2023; 128: 103022.

37. Sauter, Boukhobza, Aubrun. Data-based fault diagnosis using causality graph models derived from transfer entropy computation. IFAC-PapersOnLine 2023; 56: 2921–2926.

38. Barak, Parvini. Transfer-entropy-based dynamic feature selection for evaluating Bitcoin price drivers. J Futures Mark 2023; 43: 1695–1726.

39. Gao, Wang, Potter, et al. Single-trial EEG emotion recognition using Granger causality/transfer entropy analysis. J Neurosci Methods 2020; 346: 108904.

40. Petitjean, Ketterlin, Gançarski. A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognit 2011; 44: 678–693.

41. Kumar, Legendre, Zhao, et al. Dynamic time warping as an alternative to windowed cross correlation in seismological applications. Seismol Res Lett 2022; 93: 1909–1921.

42. Sempena, Maulidevi, Aryan PR. Human action recognition using dynamic time warping. In: Proceedings of the 2011 international conference on electrical engineering and informatics, Bandung, Indonesia, 17–19 July 2011, pp.1–5. New York: IEEE.

43. Kim, Choi J-H. Prediction of remaining useful life by data augmentation technique based on dynamic time warping. Mech Syst Signal Process 2020; 136: 106486.

44. Mitici, De Pater. Online model-based remaining-useful-life prognostics for aircraft cooling units using time-warping degradation clustering. Aerospace 2021; 8: 168.

45. Jiang. Fault diagnosis of mine shaft guide rails using vibration signal analysis based on dynamic time warping. Symmetry 2018; 10: 500.

46. Chatterjee, Dethlefs. Temporal causal inference in wind turbine SCADA data using deep learning for explainable AI. J Phys Conf Ser 2020; 1618: 022022.

47. Chen, Zhao. Multi-lag and multi-type temporal causality inference and analysis for industrial process fault diagnosis. Control Eng Pract 2022; 124: 105174.

48. Shavit, Davidovits, Kushnirsky, et al. Temporal causality-based feature selection for fault prediction in rotorcraft flight controls. IFAC-PapersOnLine 2022; 55: 235–239.

49. Xie, et al. Large-scale chemical process causal discovery from big data with transformer-based deep learning. Process Saf Environ Prot 2023; 173: 163–177.

50. Llanos, Kristjanpoller, Michell, et al. Causal treatment effects in time series: CO₂ emissions and energy consumption effect on GDP. Energy 2022; 249: 123625.

51. Lindgren, Steinert, Andersson Reyna, et al. SCANIA Component X dataset: a real-world multivariate time series dataset for predictive maintenance, https://researchdata.se/sv/catalogue/dataset/2024-34/3 (2025, accessed June 2025).

52. Hotelling. Analysis of a complex of statistical variables into principal components. J Educ Psychol 1933; 24: 417.

53. Comon. Independent component analysis, a new concept? Signal Process 1994; 36: 287–314.

54. Lee, Seung. Algorithms for non-negative matrix factorization. In: Proceedings of the 14th international conference on neural information processing systems, Denver, CO, 27 November–2 December 2000, pp.535–541. Cambridge, MA: MIT Press.

55. Pearson. VII. Mathematical contributions to the theory of evolution. III. Regression, heredity, and panmixia. Philos Trans R Soc A 1896; 187: 253–318.

56. Spearman. The proof and measurement of association between two things. Am J Psychol 1904; 15: 72–101.

57. Kreer. A question of terminology. IRE Trans Inf Theory 1957; 3: 208.

58. Shannon CE. A mathematical theory of communication. Bell Syst Tech J 1948; 27: 379–423.

59. Bossomaier, Barnett, Harré, et al. Transfer entropy. Springer, 2016, p.90.

60. Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 1987; 20: 53–65.

61. Spirtes, Glymour, Scheines. Causation, prediction, and search. 2nd ed. MIT Press, 2000, p.84.

62. Pearl. Graphs, causality, and structural equation models. Soc Methods Res 1998; 27: 226–284.

63. LeCun, Bottou, Bengio, et al. Gradient-based learning applied to document recognition. Proc IEEE 1998; 86: 2278–2324.

64. Kharazian, Lindgren, Magnússon, et al. SCANIA Component X dataset: a real-world multivariate time series dataset for predictive maintenance. Sci Data 2025; 12: 493.

65. Lei, Guo, et al. Machinery health prognostics: a systematic review from data acquisition to RUL prediction. Mech Syst Signal Process 2018; 104: 799–834.

66. Klausen, Huynh, Robbersmyr KG. RMS based health indicators for remaining useful lifetime estimation of bearings. MIC J Mod Identif Control 2022; 43: 21–38.

67. Pei, Gao. Remaining useful life prediction of machinery based on performance evaluation and online cross-domain health indicator under unknown working conditions. J Manuf Syst 2024; 75: 213–227.

68. Zhu, Nostrand, Spiegel, et al. Survey of condition indicators for condition monitoring systems. In: Proceedings of the annual conference of the PHM society, Fort Worth, TX, 29 September–2 October 2014. New York: IEEE.

69. Večeř, Kreidl, Šmíd. Condition indicators for gearbox condition monitoring systems. Acta Polytech 2005; 45: 35–43.

70. Zhang. Degradation feature selection for remaining useful life prediction of rolling element bearings. Qual Reliab Eng Int 2016; 32: 547–554.

71. Dickey, Fuller WA. Distribution of the estimators for autoregressive time series with a unit root. J Am Stat Assoc 1979; 74: 427–431.

72. Kwiatkowski, Phillips, Schmidt, et al. Testing the null hypothesis of stationarity against the alternative of a unit root: how sure are we that economic time series have a unit root? J Econom 1992; 54: 159–178.

73. Levene. Robust tests for equality of variances. In: Olkin (ed.) Contributions to probability and statistics: essays in honor of Harold Hotelling. Stanford University Press, 1960, pp.278–292.

74. Bawdekar, Prusty, Bingi. Sensitivity analysis of stationarity tests' outcome to time series facets and test parameters. Math Probl Eng 2022; 2022: 2402989.

75. Moore. PyInform: a Python library for information-theoretic measures of time series data (Version 0.2.0), https://elife-asu.github.io/PyInform/index.html# (2019, accessed December 2024).

76. Lizier, Heinzle, Horstmann, et al. Multivariate information-theoretic measures reveal directed information structure and task relevant changes in fMRI connectivity. J Comput Neurosci 2011; 30: 85–107.

77. Meert, Hendrickx, Van Craenendonck, et al. DTAIDistance, https://github.com/wannesm/dtaidistance (2020, accessed December 2024).

78. Pashami, Nowaczyk, Fan, et al. Explainable predictive maintenance. arXiv preprint arXiv:2306.05120, 2023.

A causal knowledge discovery framework as an enabler for equipment prescriptive maintenance
