Abstract

Objective

This study introduces LiteFallNet, a lightweight and interpretable deep learning model for real-time fall detection using only inertial sensor data. It aims to overcome key limitations in current systems, including high computational demands, latency, and privacy concerns, while delivering accurate and reliable performance.

Methods

LiteFallNet integrates a Gated Recurrent Unit (GRU) layer, a Temporal Convolutional Network (TCN) block, depthwise separable convolutions, and a Squeeze-and-Excitation (SE) block to efficiently extract temporal features from tri-axial accelerometer, gyroscope, and magnetometer signals. The model was trained and evaluated on the FallAllD and the UMAFall datasets. To enhance transparency, one-dimensional gradient-weighted class activation mapping (1D Grad-CAM) and local interpretable model-agnostic explanations (LIME) were used to interpret how the model made its predictions.

Results

The model on the FallAllD dataset achieved an accuracy of 97.81%, a recall of 98.55%, and an F1-score of 97.88%, with an area under the receiver operating characteristic curve of 99.33%. With a size of just 0.312 MB and an inference time of 7.07 ms, LiteFallNet combines strong performance with efficiency. These attributes make it highly suitable for deployment in real-time, resource-constrained environments.

Conclusion

LiteFallNet offers a privacy-preserving and real-time solution for fall detection. Its accuracy, transparency, and lightweight design make it suitable for smart homes, eldercare facilities, and wearable health technologies.

Keywords

Fall detection, lightweight, privacy-preserving, deep learning, model explainability

Introduction

Falls are a significant concern in public health, particularly among older adults and individuals with limited mobility.^1,2 They represent one of the leading causes of injury-related morbidity and mortality in older adults, with far-reaching physical, psychological, and economic consequences.^1–4 Approximately 37.3 million falls annually require medical attention, many of which result in life-altering injuries such as hip fractures and traumatic brain injuries.^5,6

In older adults, even minor falls can be catastrophic. Approximately 30% to 50% of falls result in minor injuries, and around 10% cause serious harm.^5,7 Notably, about 1% of falls among the elderly lead to hip fractures, which are strongly associated with postfall complications, increased dependency, and mortality.^7,8 Beyond physical harm, falls also impose psychological burdens such as fear of falling again,⁹ which often leads to reduced activity levels, social isolation, and diminished quality of life.¹⁰ Additionally, prolonged isolation can contribute to depression and anxiety, further discouraging activity.¹¹ This fear-driven inactivity contributes to a cycle of muscular atrophy, impaired balance, and greater fall risk.^12,13 Economically, fall-related injuries place a significant strain on healthcare systems due to emergency care, hospitalizations, rehabilitation, and long-term support,^4,14 especially for individuals with chronic conditions like diabetes and arthritis.^15,16

The interplay of physical decline, psychological distress, and financial hardship creates a vicious cycle of fear-induced inactivity and restriction of access to essential healthcare remedies.^17,18 These challenges underscore the critical need for efficient fall detection systems that provide timely alerts and interventions to reduce injuries and enhance quality of life.

Timely and accurate fall detection systems are crucial to reducing medical complications associated with falls and facilitating emergency response. Early systems relied on wearable push-button alerts, which were ineffective if the user was unconscious or unable to activate the device.¹⁹ This limitation led to the development of threshold-based algorithms that use motion parameters like acceleration and angular velocity. While improvements in smartphone-based systems have enhanced signal capture, especially when devices are worn on the hip,²⁰ threshold-based models struggle to adapt to varied user behavior, sensor positions, and environmental conditions,^21,22 often leading to high false alarm rates or missed falls.

Recently, machine learning and deep learning have opened the door to more reliable detection methods. Classical ML techniques like Support Vector Machines (SVM), Random Forests, and K-Nearest Neighbors have shown better performance by learning from labeled datasets.^23,24 However, their performance relies heavily on dataset diversity, and they can be computationally inefficient, especially on edge devices.^22,25

In contrast, deep learning architectures such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and their hybrids have shown strong potential in modeling complex spatiotemporal patterns in raw sensor data.^26,27 These models offer higher accuracy and adaptability^21,28 but often have increased computational requirements. Vision-based deep learning models, while accurate, introduce privacy risks due to continuous video monitoring, even when techniques like skeletonization are used.^21,29–31 This constant surveillance can feel intrusive, especially in sensitive environments like bedrooms, bathrooms, or elderly care facilities. In addition, vision-based fall detection systems often rely on cloud computing, resulting in latency and security concerns.³² Some state-of-the-art models require millions of parameters and specialized hardware, making them impractical for real-time use in resource-constrained settings.

High-performing video-based models often come with significant computational demands. Many of them require complex deep learning architectures, such as hybrid CNN-RNN models, which are not optimized for edge deployment. For example, Dutt et al.'s CNN-based video fall detection model achieved a high accuracy of 98% but required real-time processing at 60 frames per second on a GPU, making it unsuitable for deployment on low-power edge devices.³³ Similarly, state-of-the-art models such as SDES-YOLO by Huang et al. and LFD-YOLO by Wang et al., despite their strong detection performance, require 2.9 million and 5.67 million parameters, respectively.^34,35

Sensor-based fall detection using Inertial Measurement Units (IMUs) offers a more privacy-preserving and efficient alternative. IMUs record only motion-related data, such as acceleration, angular velocity, and magnetic field strength, without capturing identifiable personal information.³⁶ Moreover, these systems can operate entirely on-device, enabling real-time responsiveness without cloud connectivity. This local processing also enhances user privacy by avoiding external data transmission and allows low latency for timely emergency responses.³⁷ However, existing sensor-based deep learning models still face tradeoffs between accuracy, model complexity, and computational generalizability. For instance, CNNs are good at capturing local spatial features but less effective at modeling temporal dependencies. RNNs like Long Short-Term Memory models (LSTMs) and GRUs excel in temporal modeling but are computationally intensive.^33,38

To address these limitations, this study introduces LiteFallNet, a novel lightweight deep learning model designed for real-time fall detection using only IMU sensor data. LiteFallNet addresses four significant challenges: (1) reducing computational complexity to enable real-time processing on edge devices; (2) eliminating privacy concerns associated with visual data collection; (3) maintaining high classification accuracy and model interpretability; and (4) maintaining a low latency to support real-time responsiveness.

LiteFallNet combines Gated Recurrent Units (GRUs) for short-term temporal modeling, a Temporal Convolutional Network (TCN) block for capturing long-range dependencies, depthwise separable convolutions to minimize parameter count, and Squeeze-and-Excitation (SE) blocks for channel-wise feature recalibration. To promote interpretability, we used one-dimensional gradient-weighted class activation mapping (1D Grad-CAM) and local interpretable model-agnostic explanations (LIME) to visualize sensor contributions during classification.

This article presents a comprehensive evaluation of LiteFallNet, including training on the FallAllD dataset, architectural robustness testing on the UMAFall dataset, and ablation studies to quantify the contribution of each architectural component. We demonstrate that LiteFallNet achieves high accuracy with low latency and memory footprint, making it suitable for deployment in smart homes, eldercare facilities, and wearable devices.

Methodology

LiteFallNet is a lightweight deep learning model capable of real-time fall detection, developed and evaluated on two public inertial sensor datasets: FallAllD and UMAFall. FallAllD served as the primary dataset for model training and validation, while UMAFall was used to further examine the architectural robustness of LiteFallNet across different activity profiles. The model captures temporal and spatial patterns from tri-axial accelerometer, gyroscope, and magnetometer signals. The development pipeline included four stages: dataset acquisition, data preprocessing, model design, and evaluation, as outlined in Figure 1.

Figure 1. Model pipeline.

Study design

This retrospective study was conducted between January and March 2025 at the Kwame Nkrumah University of Science and Technology, Ghana, to develop and evaluate the performance of LiteFallNet using publicly available datasets. Model development and training were performed in Python 3.10 with the TensorFlow 2.11 framework. All experiments were executed on a local machine (Intel Core i7, 16 GB RAM) and a cloud environment (Kaggle GPU: T4 x2, 29 GB RAM).

Data description

FallAllD dataset

The FallAllD dataset, obtained from the IEEE DataPort website,³⁹ was the primary dataset. It contains 6605 labeled instances across 23 activities, including diverse fall types and activities of daily living (ADLs), collected from 15 participants using wearable sensors at the neck, wrist, and waist. Table 1 summarizes the activity instances, showing that FallAllD is unbalanced.

Table 1. Number of instances per activity.

Activity	Number of instances
Activities of daily living (ADL)	4883
Fall	1722

Despite the relatively small sample size of 15 participants, the dataset captures a wide range of activities, sensor placements, and environmental settings (both indoor and outdoor). This level of diversity reduces the risk of overfitting and promotes the development of models that generalize well across different real-world scenarios. Table 2 provides a summary of the FallAllD dataset characteristics.

Table 2. Summary description of the FallAllD dataset.³⁹

Category	Description
Dataset name	FallAllD
Data format	Comma-separated values (CSV) files, MATLAB structure (.mat), HDF (.h5), and pickle (.pkl) formats
File duration	Each file contains 20 seconds of data. Files are centered around the transition moment for falls and transient ADLs, while cyclic ADLs span the entire file
Sensor devices	3-Axial accelerometer, 3-axial gyroscope, 3-axial magnetometer, and barometer
Sensors used	Neck-worn, wrist-worn, and waist-worn devices with three identical data loggers
Number of activities	35 types of falls, 44 types of activities of daily living (ADLs)
Sensor ranges	Accelerometer: ±8g, gyroscope: ±2000 dps, magnetometer: ±4 Gauss
Sensor sampling rates	Accelerometer: 238 Hz, gyroscope: 238 Hz, magnetometer: 80 Hz, barometer: 10 Hz

UMAFall dataset

The UMAFall⁴⁰ dataset is a publicly available benchmark developed by the University of Málaga for human activity recognition and fall detection research. It comprises inertial data collected from 17 subjects (11 males and 6 females) aged between 18 and 60 years, who performed 11 types of activities, including 5 types of falls and 6 ADLs. A total of 728 fall instances and 2184 ADL instances were recorded using a waist-worn Shimmer3 device, which collected synchronized signals from a tri-axial accelerometer, gyroscope, and magnetometer. All signals were sampled at 20 Hz, providing a reliable temporal resolution for detecting dynamic motion patterns.

UMAFall was selected as a secondary dataset to test the architectural robustness of the model due to its high-quality annotations, consistent sampling rate, and sensor configuration, which align with the modalities used in the FallAllD dataset. Its inclusion in this study enabled a robust evaluation of LiteFallNet's reproducibility and architectural stability.

Data preprocessing

In the FallAllD dataset, tri-axial accelerometer, gyroscope, and magnetometer recordings were used for analysis, while barometric pressure signals were excluded because they do not capture motion-related information relevant to fall detection and may introduce inconsistencies in temporal alignment across modalities. No corrupted or incomplete recordings were identified during exploratory data analysis. Magnetometer recordings were upsampled from 80 Hz to 238 Hz using linear interpolation to match the sampling rate of the accelerometer and gyroscope data. Each instance was standardized to 4760 time-steps (20 seconds at 238 Hz) by zero-padding shorter sequences and truncating sequences longer than 4760 time-steps.

All signal values were scaled by a factor of 1/10000 to normalize their wide dynamic range and stabilize the training process. Final inputs, stored as NumPy arrays, had a shape of (N, 4760, 9), where N is the number of instances, and 9 refers to tri-axial signals from the three sensor types. Class labels were then binarized: ADLs were labeled as 0, and falls as 1.
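The preprocessing steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names are hypothetical, the position of the zero-padding (at the end of the sequence) is an assumption, and random arrays stand in for real FallAllD recordings.

```python
import numpy as np

ACC_RATE_HZ = 238   # accelerometer and gyroscope sampling rate (from the text)
MAG_RATE_HZ = 80    # magnetometer sampling rate (from the text)
WINDOW_S = 20       # duration of each file in seconds
TARGET_LEN = ACC_RATE_HZ * WINDOW_S  # 4760 time-steps


def upsample_linear(signal, src_rate, dst_rate, duration_s):
    """Linearly interpolate a (n_samples, n_axes) signal onto a dst_rate time grid."""
    src_t = np.arange(signal.shape[0]) / src_rate
    dst_t = np.arange(int(dst_rate * duration_s)) / dst_rate
    columns = [np.interp(dst_t, src_t, signal[:, a]) for a in range(signal.shape[1])]
    return np.stack(columns, axis=1)


def pad_or_truncate(signal, target_len=TARGET_LEN):
    """Truncate long sequences; zero-pad short ones (padding at the end is assumed)."""
    if signal.shape[0] >= target_len:
        return signal[:target_len]
    pad = np.zeros((target_len - signal.shape[0], signal.shape[1]))
    return np.concatenate([signal, pad], axis=0)


def preprocess_instance(acc, gyr, mag, scale=1.0 / 10000.0):
    """Build one (4760, 9) input from accelerometer, gyroscope, and magnetometer arrays."""
    mag_up = upsample_linear(mag, MAG_RATE_HZ, ACC_RATE_HZ, WINDOW_S)
    parts = [pad_or_truncate(np.asarray(p, dtype=float)) for p in (acc, gyr, mag_up)]
    return np.concatenate(parts, axis=1) * scale


rng = np.random.default_rng(0)
acc = rng.normal(size=(4760, 3))
gyr = rng.normal(size=(4700, 3))  # deliberately short, to exercise zero-padding
mag = rng.normal(size=(1600, 3))  # 80 Hz for 20 s
x = preprocess_instance(acc, gyr, mag)
print(x.shape)
```

Stacking N such arrays along a new leading axis would give the (N, 4760, 9) input described above.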

For the UMAFall dataset, all available inertial signals (accelerometer, gyroscope, and magnetometer) were retained, after which the data were normalized and the input sequence length was set to 400 time-steps, reflecting a 20 Hz sampling rate over a 20-second window.

Data augmentation and balancing

A data-level augmentation was used to address the class imbalance in the FallAllD dataset. Gaussian jittering,⁴¹ a time-series augmentation technique, was applied to the original fall samples by adding Gaussian noise (σ = 0.01). Each original fall instance was used to generate two augmented versions, effectively tripling the number of fall instances. After this augmentation, the final class distribution was nearly balanced, with 5166 fall samples and 4883 ADL samples.

Gaussian jittering (σ = 0.01) was also applied to rectify the class imbalance in the UMAFall dataset, obtaining a nearly balanced dataset of 2184 fall samples and 1733 ADL samples.
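Gaussian jittering as described here can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the authors' implementation: the helper names are hypothetical, the toy array shapes are arbitrary, and whether the noise was added before or after amplitude scaling is not specified in the text.

```python
import numpy as np


def jitter(x, sigma=0.01, rng=None):
    """Return a copy of x with zero-mean Gaussian noise of standard deviation sigma."""
    rng = np.random.default_rng() if rng is None else rng
    return x + rng.normal(0.0, sigma, size=x.shape)


def augment_falls(X, y, n_copies=2, sigma=0.01, seed=0):
    """Append n_copies jittered versions of every fall instance (label 1)."""
    rng = np.random.default_rng(seed)
    falls = X[y == 1]
    extra = [jitter(falls, sigma, rng) for _ in range(n_copies)]
    X_aug = np.concatenate([X] + extra, axis=0)
    y_aug = np.concatenate([y] + [np.ones(len(falls), dtype=y.dtype)] * n_copies)
    return X_aug, y_aug


# Toy data: 7 ADL and 3 fall instances with 50 time-steps and 9 channels.
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 50, 9))
y = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])
X_aug, y_aug = augment_falls(X, y)
print(X_aug.shape, int(y_aug.sum()))
```

With two jittered copies per original, the 1722 FallAllD fall instances become 3 × 1722 = 5166, which matches the count reported above.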

Datasets partitioning

The two augmented datasets were each partitioned independently into 60% training, 20% validation, and 20% test sets. A stratified splitting strategy was employed at the instance level to preserve the original class distribution (fall vs. ADL) across all subsets. Notably, the test set remained completely blind until the final evaluation after model development.
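A stratified 60/20/20 split at the instance level can be sketched as follows. The text does not state which library or random seed was used, so this NumPy version, including the function name and the rounding rule, is an assumption; exact partition sizes may differ by a sample or two depending on how the fractions are rounded.

```python
import numpy as np


def stratified_split(y, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return train, validation, and test index arrays that preserve class ratios."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        train.append(idx[:n_train])
        val.append(idx[n_train:n_train + n_val])
        test.append(idx[n_train + n_val:])  # remainder goes to the test set
    return tuple(np.concatenate(parts) for parts in (train, val, test))


# Labels with the augmented FallAllD class counts reported above.
y = np.array([1] * 5166 + [0] * 4883)
train_idx, val_idx, test_idx = stratified_split(y)
for name, idx in (("train", train_idx), ("val", val_idx), ("test", test_idx)):
    print(name, len(idx), round(float(y[idx].mean()), 4))
```

Splitting each class separately and then merging the pieces keeps the fall-to-ADL ratio essentially identical across the three partitions.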

Table 3 presents the number of samples in each partition of the augmented FallAllD dataset.

Table 3. Number of samples for training, validation, and test datasets.

Dataset	Samples
Training set	6029
Validation set	2010
Testing set	2010
Total	10,049

Model architecture

Model overview

LiteFallNet is a modular, multistage architecture consisting of:

Temporal Feature Extraction: Batch normalization, a Gated Recurrent Unit (GRU), and a Temporal Convolutional Network (TCN) layer to capture short-term and long-range temporal dependencies.

Feature Enhancement: A SE block for channel recalibration, depthwise separable convolutions, and max pooling for compact and informative representations.

Pooling Aggregation: Global average and max pooling are applied along the temporal dimension and concatenated to form a fixed-length embedding.

Classification: A fully connected layer followed by a sigmoid activation for binary fall or ADL prediction.

Figure 2 presents the model architecture of LiteFallNet, including its detailed layers and components.

Figure 2. Model architecture of LiteFallNet.

Feature extraction

The model input is a time-series signal $X \in R^{T \times F}$, where T denotes the number of time-steps and F = 9 represents the tri-axial readings from the accelerometer, gyroscope, and magnetometer. T is defined as None at the input layer to accommodate variable-length inputs during training and inference, allowing flexibility based on the preprocessed data. The input X is first passed through a batch normalization (BN) layer to stabilize the learning process and mitigate internal covariate shift. BN is applied independently to each feature channel as:

$$BN(x_{i}) = \gamma \cdot \frac{x_{i} - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \beta \tag{1}$$

where $\mu$ and $\sigma^{2}$ are the mean and variance computed across the batch for each feature i; $\gamma$ and $\beta$ are trainable scale and shift parameters, respectively; and $\epsilon$ is a small constant to ensure numerical stability.

In LiteFallNet, the learnable parameters $\gamma$ and $\beta$ adjust the normalized activations to task-specific requirements. Sensor channels that carry critical signals, such as sharp accelerometer spikes during a fall, can be amplified by $\gamma$, while $\beta$ re-centers their activations when needed. Thus, BN stabilizes training without reducing the model's representational capacity. The resulting output is denoted $X_{BN} \in R^{T \times F}$.
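As a concrete illustration of Equation (1), the following NumPy sketch normalizes a toy batch channel by channel. It assumes that the statistics are pooled over the batch and time axes for each of the F = 9 channels and that ϵ = 1e-3; neither detail is stated in the text, and the function name is hypothetical.

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-3):
    """Apply Equation (1) to x of shape (batch, T, F), one statistic per channel."""
    # Assumption: mean and variance are pooled over the batch and time axes.
    mu = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta


rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=5.0, size=(32, 100, 9))  # toy batch with F = 9
gamma = np.ones(9)   # trainable scale, shown at its usual initial value
beta = np.zeros(9)   # trainable shift, shown at its usual initial value
x_bn = batch_norm(x, gamma, beta)
print(x_bn.mean(axis=(0, 1)).round(6))
print(x_bn.std(axis=(0, 1)).round(3))
```

With γ = 1 and β = 0 each channel ends up with approximately zero mean and unit variance; during training, γ and β would move away from these values to rescale and re-center individual channels.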

Following batch normalization, the normalized signal $X_{BN} \in R^{T \times F}$ is passed through a lightweight GRU layer comprising 16 memory cells. The GRU is selected for its efficiency in modeling sequential dependencies while mitigating vanishing gradient issues in traditional Recurrent Neural Networks.

At each time step t, the GRU receives as input the current feature vector $x_{t} \in R^{F}$ (a row from $X_{BN}$) and the hidden state from the previous step $h_{t-1} \in R^{D}$, where D = 16. These are combined to compute a new hidden state $h_{t}$, which also serves as the output for that time step. The hidden state update is defined as:

$$h_{t} = (1 - z_{t}) \odot h_{t-1} + z_{t} \odot \tilde{h}_{t} \tag{2}$$

where $\tilde{h}_{t}$ is the candidate activation, $z_{t}$ the update gate, and $\odot$ element-wise multiplication.

The GRU employs two gating mechanisms, the update gate and the reset gate, that regulate how past information and new input are combined. The first gating mechanism, the update gate, determines how much of the previous hidden state should be preserved versus replaced by the candidate state:

$$z_{t} = \sigma(W_{z} x_{t} + U_{z} h_{t-1}) \tag{3}$$

where $W_{z} \in R^{D \times F}$ and $U_{z} \in R^{D \times D}$ are learnable weight matrices applied to the current input $x_{t}$ and the previous hidden state $h_{t-1}$, respectively. A large $z_{t}$ value favors incorporating more of the candidate $\tilde{h}_{t}$, while a small value prioritizes retaining information from the previous hidden state.

The second gating mechanism, the reset gate $r_{t}$, controls how much of the past hidden state is ignored when computing the candidate state. The reset gate is defined as:

$$r_{t} = \sigma(W_{r} x_{t} + U_{r} h_{t-1}) \tag{4}$$

with $W_{r} \in R^{D \times F}$ and $U_{r} \in R^{D \times D}$. By modulating the contribution of $h_{t-1}$, the reset gate allows the GRU to selectively 'forget' irrelevant history when necessary.

The candidate hidden state $\tilde{h}_{t}$ integrates the current input $x_{t}$ and the reset-modulated past state as:

$$\tilde{h}_{t} = \tanh(W_{h} x_{t} + U_{h} (r_{t} \odot h_{t-1})) \tag{5}$$

where $W_{h} \in R^{D \times F}$ and $U_{h} \in R^{D \times D}$ are learnable parameters. The hyperbolic tangent bounds the activation values within [−1, 1], ensuring stable gradient flow and smooth transitions between states.

Together, the update and reset gates allow the GRU to strike a balance between retaining past information and incorporating new input. When the update gate $z_{t}$ takes values close to 1, the hidden state is strongly influenced by the candidate state $\tilde{h}_{t}$; conversely, when $z_{t}$ is close to 0, the GRU relies more heavily on the previous hidden state $h_{t-1}$. This dynamic control enables the network to model rapid changes and longer-term trends in the sequence. This property is essential in distinguishing abrupt fall events from the smoother, more gradual patterns of daily activities.
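The GRU recurrence in Equations (2) to (5) can be written out directly. The NumPy sketch below is illustrative only: it follows the equations as printed (no bias terms), uses hypothetical parameter names in a dictionary, and initializes the weights randomly rather than from a trained model.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x_t, h_prev, p):
    """One GRU update following Equations (2)-(5); p holds the six weight matrices."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev)            # update gate, Eq. (3)
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev)            # reset gate, Eq. (4)
    h_cand = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev))  # candidate, Eq. (5)
    return (1.0 - z) * h_prev + z * h_cand                    # new state, Eq. (2)


def gru_forward(X, p, D=16):
    """Run the GRU over X of shape (T, F) and return all hidden states, shape (T, D)."""
    h = np.zeros(D)
    states = []
    for x_t in X:
        h = gru_step(x_t, h, p)
        states.append(h)
    return np.stack(states)


F, D, T = 9, 16, 50
rng = np.random.default_rng(0)
params = {name: rng.normal(scale=0.1, size=(D, F if name[0] == "W" else D))
          for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
H = gru_forward(rng.normal(size=(T, F)), params)
print(H.shape)
```

Because each new state is a convex combination of the previous state and a tanh output, every hidden value stays within [−1, 1] when the initial state is zero.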

After temporal encoding by the GRU, the resulting sequence, $X_{GRU} \in R^{T \times 16}$, is passed into a TCN block. The TCN is designed to model temporal dependencies at multiple scales through stacked dilated causal convolutions. The structure allows the model to efficiently capture both short-term patterns (e.g. quick body movements) and longer-term dependencies (e.g. gradual shifts in posture), both of which are essential in distinguishing fall events from daily activities.

The TCN employed in LiteFallNet uses 1D convolutional layers with 32 filters, a kernel size of 3, and a dilation rate of 2. These causal convolutions ensure that the output at time t depends only on the current and past time-steps t' ≤ t. By using dilation, the receptive field of the convolutional filters expands without requiring additional layers or parameters, allowing the model to "look back" farther in time while remaining computationally efficient. Mathematically, the dilated causal convolution at time t for a given output channel is expressed as:

$$X_{TCN}(t) = \sum_{i=0}^{k-1} W_{i} \cdot X_{GRU}(t - d \cdot i) + b \tag{6}$$

where $W_{i} \in R^{D \times C}$ are the convolutional filter weights, C is the number of output channels, $b \in R^{C}$ is the bias vector, k is the kernel size, d is the dilation rate, and the summation aggregates contributions from multiple receptive fields. The output of this block, $X_{TCN} \in R^{T \times 32}$, captures temporal patterns across a wider range of past time steps than standard convolutions, since the dilation factor expands the receptive field without increasing the number of layers or parameters. The TCN output is passed through a Batch Normalization layer to stabilize the output further and improve convergence. Batch Normalization standardizes each channel's activations across the batch and applies a learned affine transformation to maintain representational flexibility. The result is a normalized feature map $X_{TCN}' \in R^{T \times 32}$, which is passed through a LeakyReLU activation function defined as:

$$\mathrm{LeakyReLU}(x) = f(x) = \begin{cases} x, & \text{if } x > 0 \\ \alpha x, & \text{otherwise} \end{cases} \tag{7}$$

where α refers to the leakage rate, set at 0.2. This leakage rate of 0.2 enables the network to maintain gradient flow for inputs that would otherwise be zeroed out by a standard ReLU, preserving relevant input features and ultimately leading to better accuracy, reduced overfitting, and faster convergence. The activated output, denoted $X_{act} \in R^{T \times 32}$, represents a nonlinear transformation of the temporally encoded features, making them more expressive for subsequent processing.
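The dilated causal convolution of Equation (6) and the LeakyReLU of Equation (7) can be sketched in NumPy as follows. The sketch assumes left zero-padding to keep the output length equal to T, uses random weights, and omits the intermediate batch normalization for brevity; the function names are hypothetical.

```python
import numpy as np


def dilated_causal_conv1d(X, W, b, dilation=2):
    """Equation (6): X is (T, D), W is (k, D, C), b is (C,); returns (T, C)."""
    k, T = W.shape[0], X.shape[0]
    pad = (k - 1) * dilation
    # Left zero-padding so that no output depends on future time-steps.
    Xp = np.concatenate([np.zeros((pad, X.shape[1])), X], axis=0)
    out = np.tile(b.astype(float), (T, 1))
    for i in range(k):
        start = pad - dilation * i          # row t of this slice is X(t - d * i)
        out += Xp[start:start + T] @ W[i]
    return out


def leaky_relu(x, alpha=0.2):
    """Equation (7) with the leakage rate of 0.2 quoted in the text."""
    return np.where(x > 0, x, alpha * x)


rng = np.random.default_rng(0)
T, D, C, k, d = 100, 16, 32, 3, 2
X_gru = rng.normal(size=(T, D))
W = rng.normal(scale=0.1, size=(k, D, C))
b = np.zeros(C)
X_tcn = dilated_causal_conv1d(X_gru, W, b, dilation=d)
X_act = leaky_relu(X_tcn)

receptive_field = 1 + (k - 1) * d  # time-steps seen by one output of a single layer
print(X_act.shape, receptive_field)
```

For a single layer with k = 3 and d = 2, each output sees the inputs at t, t − 2, and t − 4, so the receptive field spans 5 time-steps while using only three filter taps.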

The final output of the feature extraction block is given by:

$$X_{feat} = \phi(BN(TCN(GRU(BN(X))))) \tag{8}$$

where $\phi(\cdot)$ represents the LeakyReLU activation function. The resulting tensor $X_{feat} \in R^{T \times 32}$ serves as the input to the subsequent enhancement block. Through this block, LiteFallNet effectively extends its temporal receptive field, enabling robust detection of both sudden falls and gradual temporal dynamics characteristic of daily living activities.

Feature enhancement

This stage is designed to refine and selectively amplify informative features by integrating an SE mechanism, a depthwise separable convolutional layer, and a max pooling operation. The enhancement process begins with the SE module, which recalibrates channel-wise features based on global temporal context. To achieve this, a global average pooling operation is applied across the temporal axis of each feature map. For a given feature channel $f \in \{1, 2, \dots, F\}$, where F here denotes the number of channels in $X_{feat}$, this operation computes a scalar descriptor:

$$s_{f} = \frac{1}{T} \sum_{t=1}^{T} X_{feat}(t, f) \tag{9}$$

The channel descriptor vector $s \in R^{F}$ encodes the average activation of each feature channel across time. This vector is then passed through a two-layer bottleneck network designed to model nonlinear interchannel dependencies. First, the dimensionality is reduced by a factor r = 16 using a 1D convolutional layer with a weight matrix $W_{1} \in R^{\frac{F}{r} \times F}$, followed by a ReLU activation. Then, the reduced representation is projected back to the original channel dimension using a second 1D convolutional layer with a weight matrix $W_{2} \in R^{F \times \frac{F}{r}}$, followed by a sigmoid activation. The operation is expressed as:

$$z = \sigma(W_{2} \cdot \delta(W_{1} \cdot s)) \tag{10}$$

where $\delta(\cdot)$ is the ReLU activation function and $\sigma(\cdot)$ the sigmoid activation. The resulting vector $z \in R^{F}$ contains channel-wise recalibration weights. These weights are then broadcast along the temporal axis and applied to the original feature map via element-wise multiplication:

$$X_{se}(t, f) = z_{f} \cdot X_{feat}(t, f) \tag{11}$$

This results in a recalibrated tensor $X_{se} \in R^{T \times F}$ where informative channels are enhanced and less relevant ones are suppressed.
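Equations (9) to (11) amount to a squeeze, a two-layer bottleneck, and a channel-wise rescaling. The NumPy sketch below assumes 32 channels (the width of the feature-extraction output) and r = 16, so the bottleneck has 2 units; the weights are random and the function name is hypothetical.

```python
import numpy as np


def se_block(X, W1, W2):
    """Squeeze-and-excitation over X of shape (T, F); returns (X_se, z)."""
    s = X.mean(axis=0)                          # squeeze, Eq. (9): shape (F,)
    hidden = np.maximum(0.0, W1 @ s)            # ReLU bottleneck: shape (F / r,)
    z = 1.0 / (1.0 + np.exp(-(W2 @ hidden)))    # sigmoid weights, Eq. (10): (F,)
    return X * z, z                             # broadcast over time, Eq. (11)


rng = np.random.default_rng(0)
T, F, r = 100, 32, 16
X_feat = rng.normal(size=(T, F))
W1 = rng.normal(scale=0.5, size=(F // r, F))
W2 = rng.normal(scale=0.5, size=(F, F // r))
X_se, z = se_block(X_feat, W1, W2)
print(X_se.shape, z.shape)
```

Each entry of z lies strictly between 0 and 1, so a channel is attenuated in proportion to how uninformative the bottleneck judges it to be, while the temporal shape of the signal within the channel is unchanged.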

The output tensor $X_{se}$ is passed through a depthwise separable convolutional layer to capture further localized temporal features and interchannel relationships with minimal computational burden. The operation consists of two stages. First, each input channel is convolved separately with its own 1D filter (kernel size of 3, 64 filters), capturing local temporal dependencies within that channel and producing an intermediate tensor $X_{dw} \in R^{T \times F}$. Second, a 1×1 (pointwise) convolution is applied across the depthwise outputs, enabling cross-channel feature integration while maintaining the same shape and enriching interchannel semantics. The output $X_{pw} \in R^{T \times F}$ is passed to the LeakyReLU activation function, introducing nonlinearity and yielding the output $X_{act} \in R^{T \times F}$. This nonlinearity supports feature retention and gradient flow, which is essential for detecting critical yet subtle motion patterns when differentiating ADLs from fall events.
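The parameter saving from the depthwise separable factorization can be checked with a short calculation. The sketch assumes 32 input channels (the width of the preceding feature map), 64 output filters, a kernel size of 3, and one bias per output filter; the function names are hypothetical and the totals are not taken from the authors' layer summary.

```python
def conv1d_params(k, c_in, c_out, bias=True):
    """Weights in a standard 1D convolution: every filter spans all input channels."""
    return k * c_in * c_out + (c_out if bias else 0)


def separable_conv1d_params(k, c_in, c_out, bias=True):
    """Depthwise stage (one length-k filter per channel) plus 1x1 pointwise stage."""
    depthwise = k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise + (c_out if bias else 0)


standard = conv1d_params(3, 32, 64)             # 3*32*64 + 64 = 6208
separable = separable_conv1d_params(3, 32, 64)  # 3*32 + 32*64 + 64 = 2208
print(standard, separable, round(standard / separable, 2))
```

Under these assumptions the separable layer needs roughly a third of the parameters of a standard convolution with the same kernel size and channel counts.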

The final stage in the enhancement block is temporal downsampling via max pooling, which reduces the temporal resolution from T to T' while preserving dominant local patterns. A max pooling operation with a kernel size and stride of 2 is applied to $X_{act}$. The operation is given by:

$$X_{enh} = \mathrm{maxpool}(X_{act}) \tag{12}$$

$$X_{enh} \in R^{T' \times F}, \quad T' < T \tag{13}$$

where $T'$ is the number of time-steps after pooling.

The final output, $X_{enh}$ , summarizes both local and longer-range temporal dynamics.

Pooling aggregation block

An additional batch normalization layer is applied to the output of the enhancement block to prevent scale shifts across the channels, ensuring stable and balanced inputs for the subsequent pooling operations. This stabilized representation is then passed to the pooling aggregation stage, which is designed to generate a compact, fixed-length embedding of the temporal sequence. LiteFallNet employs a dual pooling strategy that combines global max pooling and global average pooling along the temporal axis to ensure that peak activations and overall trends are conserved, offering a rich and discriminative representation of temporal dynamics.

For a given feature $f \in \{1, 2, \dots, F\}$, global max pooling selects the most dominant activation across all time steps, emphasizing strong local responses that may correspond to sudden or discriminative events such as falls:

$$\max_{f} = \max_{t \in [1, T']} X_{enh}(t, f) \tag{14}$$

This operation results in a vector $\max \in R^{F}$, where each element represents the strongest temporal response of a particular channel.

Subsequently, global average pooling summarizes the overall activity in each feature channel by taking the mean activation across time, thus capturing long-term trends or smoother activity patterns. The average for feature f is given by:

$$avg_{f} = \frac{1}{T'} \sum_{t=1}^{T'} X_{enh}(t, f) \tag{15}$$

The operation yields a vector $avg \in R^{F}$, which reflects general channel-level behavior across the entire sequence. The two resulting vectors, $\max$ and $avg$, are then concatenated through the transformation:

$$Z = \mathrm{Concat}(\max, avg) \in R^{2F} \tag{16}$$

By combining both the most salient activations (from max pooling) and the overall trends (from average pooling), the resulting embedding Z preserves both short bursts of activity and longer-duration context. This dual perspective is critical in fall detection, where fall events often appear as abrupt spikes superimposed on more gradual activity patterns. The rich embedding Z is then passed into the classifier block, where it is further processed for final binary classification between fall and ADL instances.
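The temporal max pooling of Equation (12) and the dual global pooling of Equations (14) to (16) can be sketched together. The NumPy example below assumes 64 channels entering the pooling stage and skips the intermediate batch normalization; the function names and the random input are illustrative only.

```python
import numpy as np


def max_pool1d(X, size=2):
    """Non-overlapping max pooling along time (kernel size = stride = size)."""
    T_out = X.shape[0] // size
    return X[:T_out * size].reshape(T_out, size, X.shape[1]).max(axis=1)


def dual_global_pool(X):
    """Concatenate per-channel global max and global average over time, Eq. (16)."""
    return np.concatenate([X.max(axis=0), X.mean(axis=0)])


rng = np.random.default_rng(0)
X_act = rng.normal(size=(400, 64))   # toy activations: T = 400, 64 channels (assumed)
X_enh = max_pool1d(X_act)            # T' = 200
Z = dual_global_pool(X_enh)
print(X_enh.shape, Z.shape)

# A sequence of a different length yields an embedding of the same size.
Z_other = dual_global_pool(max_pool1d(rng.normal(size=(4760, 64))))
print(Z_other.shape)
```

Because both global pooling operations collapse the time axis, the embedding length depends only on the channel count, which is what allows the input length T to be left unspecified at the input layer.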

Classifier block

The classifier block takes the information-rich embedding $Z \in R^{2F}$ produced by the pooling aggregation stage and maps it to the final binary decision (fall vs. ADL). This is achieved through a fully connected layer followed by a sigmoid output unit.

First, Z is passed through a dense (fully connected) layer with 64 neurons. This layer learns nonlinear interactions among the pooled features, thereby projecting the input Z into a higher-level latent space. The transformation is mathematically defined as:

$$h = \mathrm{LeakyReLU}(W_{1} Z + b_{1}), \quad h \in R^{64} \tag{17}$$

where $W_{1} \in R^{64 \times 2F}$ is the learnable weight matrix of the dense layer, and $b_{1} \in R^{64}$ the bias vector of the dense layer. The result is passed to the LeakyReLU activation function to ensure that gradient flow is preserved even for negative inputs. The resulting activation vector $h \in R^{64}$ represents abstracted high-level features crucial for the final prediction. Next, h is fed into an output layer containing a single neuron with a sigmoid activation, which maps the transformed features to a scalar probability score. This final operation is expressed as:

$$\hat{y} = \sigma(W_{2} h + b_{2}), \quad \hat{y} \in (0, 1) \tag{18}$$

where $W_{2} \in R^{1 \times 64}$ is the weight matrix of the output layer, $b_{2} \in R$ the corresponding bias term, and $\sigma$ the sigmoid activation function, which squeezes the output into the range (0, 1). The resulting scalar value $\hat{y}$ represents the probability of the input sequence corresponding to a fall, with values close to 1 indicating a fall event and values close to 0 indicating an ADL. LiteFallNet achieves robust binary classification while maintaining computational efficiency by combining dense nonlinear transformation with sigmoid output. Refer to Supplemental Table 1 for the architectural layer summary.
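Equations (17) and (18) describe a small two-layer head, which the following NumPy sketch reproduces with random weights. The embedding length of 128, the LeakyReLU slope of 0.2 in this layer, and the 0.5 decision threshold are assumptions made for illustration; the function name is hypothetical.

```python
import numpy as np


def classify(Z, W1, b1, W2, b2, alpha=0.2):
    """Dense layer with LeakyReLU (Eq. 17) followed by a sigmoid unit (Eq. 18)."""
    a = W1 @ Z + b1
    h = np.where(a > 0, a, alpha * a)          # h has 64 entries
    logit = (W2 @ h + b2).item()               # single output neuron
    return 1.0 / (1.0 + np.exp(-logit))


rng = np.random.default_rng(0)
Z = rng.normal(size=128)                       # assumed embedding length 2F = 128
W1 = rng.normal(scale=0.1, size=(64, 128))
b1 = np.zeros(64)
W2 = rng.normal(scale=0.1, size=(1, 64))
b2 = 0.0
y_hat = classify(Z, W1, b1, W2, b2)
label = int(y_hat >= 0.5)                      # assumed threshold: 1 = fall, 0 = ADL
print(round(y_hat, 4), label)
```

In deployment the threshold could be tuned away from 0.5 to trade missed falls against false alarms; the text does not state which value was used.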

Model training

The model was trained over 30 epochs using the Adam optimizer with a learning rate of 0.0001 and a batch size of 32. Adam was chosen for its adaptive learning capabilities and reliable convergence, particularly in deep learning tasks. A batch size of 32 provided a good balance between computational efficiency and stable gradient updates. The binary cross-entropy loss function was employed for the binary classification task, as it effectively captures differences in predicted probabilities and yields well-calibrated confidence scores.
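For reference, the binary cross-entropy loss mentioned above can be written in a few lines of NumPy. The clipping constant and the function name are assumptions introduced for numerical safety and illustration; the example values are made up.

```python
import numpy as np


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between labels in {0, 1} and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))


confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])
print(round(confident_right, 4), round(confident_wrong, 4))
```

The loss grows sharply when the model is confident and wrong, which is what pushes the sigmoid output toward probabilities that match the observed labels.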

Model performance evaluation

We conducted independent training and evaluation on FallAllD and UMAFall. In the first phase, the model was trained and tested exclusively on the FallAllD dataset. In the second phase, the same architecture was retrained from scratch and evaluated on the UMAFall dataset under similar preprocessing, class balance, and temporal windowing settings. This dual-phase evaluation allowed us to examine whether LiteFallNet retains strong performance across datasets with comparable activity categories and sensor configurations, and thus to assess its reproducibility and robustness to dataset-specific variations.

Model performance was evaluated on both datasets using accuracy, precision, recall (sensitivity), specificity, F1-score, area under the receiver operating characteristic curve (AUC), and inference latency. Additionally, a confusion matrix was constructed for each experiment, detailing the distribution of true and false positives and negatives to visualize class-wise prediction behavior. Inference time was measured on a laptop (Intel Core i7, 16 GB RAM) using a batch of test samples.

Ablation studies

An ablation study on the FallAllD dataset investigated the contribution of each architectural component in LiteFallNet by systematically removing or modifying individual modules within the network. Four ablated variants were created by changing one key component at a time: removing the GRU layer, removing the TCN block, removing the SE mechanism, and replacing the depthwise separable convolutional layers with standard Conv1D layers. The complete LiteFallNet architecture served as the baseline, giving five configurations in total.

Each model variant was trained and evaluated under identical conditions: the same training, validation, and test splits, optimizer configuration, learning rate, batch size, number of epochs, and evaluation metrics. Each configuration was run three times with different random seeds to reduce the influence of stochastic effects. Each evaluation metric was then averaged across the three runs, and paired t-tests were conducted to assess the statistical significance of performance differences between each ablated variant and the complete LiteFallNet model.
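As a rough illustration of the paired comparison, the sketch below computes a paired t-statistic for three matched runs using only the standard library. With three runs the test has two degrees of freedom, for which the Student t distribution has a closed-form CDF, so no statistics package is needed here. The per-seed accuracies are hypothetical and the function names are illustrative.

```python
import math
from statistics import mean, stdev

def two_sided_p_df2(t_stat):
    # Student t CDF with 2 degrees of freedom: F(t) = 1/2 + t / (2 * sqrt(2 + t^2))
    return 1.0 - abs(t_stat) / math.sqrt(2.0 + t_stat ** 2)

def paired_t_test_three_runs(baseline, variant):
    """Paired t-test for exactly three paired runs (2 degrees of freedom)."""
    if len(baseline) != 3 or len(variant) != 3:
        raise ValueError("this closed form only covers three paired runs")
    diffs = [b - v for b, v in zip(baseline, variant)]
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return t_stat, two_sided_p_df2(t_stat)

# Hypothetical per-seed accuracies (%), not values from the paper
baseline_acc = [97.9, 97.6, 97.8]
ablated_acc = [96.2, 95.7, 96.1]

t_stat, p_value = paired_t_test_three_runs(baseline_acc, ablated_acc)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

For any number of paired runs, `scipy.stats.ttest_rel` performs the same test. As a consistency check, the closed form maps t = 7.22 to p ≈ 0.0186, t = 12.05 to p ≈ 0.0068, and t = 7.84 to p ≈ 0.0159.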

Explainability

To enhance the interpretability of LiteFallNet's fall and ADL classifications, we employed both LIME⁴² and 1D Grad-CAM⁴³ to provide local and temporal explanations. These methods brought transparency to the model's decision-making process by highlighting which features and time intervals most influenced each prediction.

LIME was used to generate local, post hoc explanations for individual predictions. Given LiteFallNet's multivariate input time-series data, each input window was flattened into a 1D vector. LIME created perturbed instances by selectively masking or altering short temporal patches within specific sensor axes to simulate minor motion variations. A custom wrapper function then reshaped these perturbed inputs into their original format before being passed to the model for inference. The LIME explainer was trained using the flattened training set, with each time-step and channel combination represented as a unique feature. The L2 norm was computed across all channels to facilitate visualization, producing a univariate motion signal. Feature importance weights were aggregated across sensor channels and overlaid as spans on the L2 signal to accentuate critical regions contributing to each classification.
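The visualization step described above (flatten, reshape, L2 norm, channel-wise aggregation) can be sketched as follows. Several details are assumptions made for the example: nine sensor channels, a 400-step window, time-major flattening, and summation as the aggregation rule. The weights are random placeholders standing in for a real LIME explanation, and the helper names are hypothetical.

```python
import numpy as np

def motion_intensity(window):
    """L2 norm across sensor channels at each time-step: (T, C) -> (T,)."""
    return np.linalg.norm(window, axis=1)

def aggregate_weights(flat_weights, n_steps, n_channels):
    """Reshape per-feature weights to (T, C) and sum over channels."""
    return flat_weights.reshape(n_steps, n_channels).sum(axis=1)

rng = np.random.default_rng(0)
T, C = 400, 9                        # assumed window: 400 steps, 9 channels
window = rng.normal(size=(T, C))
flat = window.reshape(-1)            # what the explainer sees: T * C features

# Placeholder weights standing in for a LIME explanation of this window
flat_weights = rng.normal(scale=0.01, size=T * C)

signal = motion_intensity(flat.reshape(T, C))    # wrapper-style reshape back
per_step = aggregate_weights(flat_weights, T, C)
top_steps = np.argsort(np.abs(per_step))[-5:]    # most influential time-steps
print(signal.shape, per_step.shape, sorted(top_steps.tolist()))
```

Positive entries of `per_step` would be drawn as spans supporting the fall class and negative entries as spans against it.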

1D Grad-CAM was applied to identify salient temporal regions that influenced model outputs. Gradients were computed with respect to the final convolutional layer, where channel-wise gradients were globally averaged and weighted against the corresponding activation maps to construct a temporal importance map. This map was passed through a ReLU activation function, normalized, and upsampled to match the original input length of the data (4760 time-steps for FallAllD and 400 time-steps for UMAFall). The resulting heatmap was overlaid on the mean signal across all sensor channels, visually emphasizing the time intervals most influential in LiteFallNet's prediction process.
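The Grad-CAM steps above can be sketched in NumPy, given activations and gradients that have already been extracted from the final convolutional layer. The array shapes, the max-based normalization, and the linear interpolation used for upsampling are assumptions for illustration; the inputs are random placeholders instead of real model tensors.

```python
import numpy as np

def grad_cam_1d(activations, gradients, input_length):
    """activations, gradients: (T', K) arrays from the last conv layer."""
    weights = gradients.mean(axis=0)              # global average over time -> (K,)
    cam = np.maximum(activations @ weights, 0.0)  # weighted channel sum, then ReLU
    if cam.max() > 0:
        cam = cam / cam.max()                     # normalize to [0, 1]
    # Upsample to the input length by linear interpolation
    src = np.linspace(0.0, 1.0, num=cam.shape[0])
    dst = np.linspace(0.0, 1.0, num=input_length)
    return np.interp(dst, src, cam)

rng = np.random.default_rng(1)
acts = rng.random((50, 32))         # placeholder feature maps: 50 steps, 32 channels
grads = rng.normal(size=(50, 32))   # placeholder gradients of the fall score
heatmap = grad_cam_1d(acts, grads, input_length=400)
print(heatmap.shape, float(heatmap.min()), float(heatmap.max()))
```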

Statistical analysis

Bootstrapping (1000 iterations) was used to compute 95% confidence intervals (CIs) for accuracy, precision, recall, F1-score, and AUC on the test set. This nonparametric approach allowed for robust estimation of metric variability without assuming normality, making it well-suited for evaluating model performance across resampled test distributions. Additionally, paired t-tests were conducted to establish the statistical significance of the ablation studies. All computations were performed using NumPy (v1.26.4), scikit-learn (v1.2.2), and SciPy (scipy.stats module, v1.15.2), with fixed random seeds to ensure reproducibility.
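A bootstrap interval of this kind can be sketched as below. The percentile method is an assumption (the text does not say which bootstrap variant was used), the labels and predictions are synthetic, and `bootstrap_ci` is a hypothetical helper name.

```python
import numpy as np

def bootstrap_ci(y_true, y_pred, metric, n_iter=1000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for a metric computed on resampled test sets."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    scores = np.empty(n_iter)
    for i in range(n_iter):
        idx = rng.integers(0, n, size=n)    # sample test indices with replacement
        scores[i] = metric(y_true[idx], y_pred[idx])
    lower, upper = np.percentile(scores, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return scores.mean(), lower, upper

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

# Synthetic labels and predictions, for illustration only
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=2000)
flip = rng.random(2000) < 0.03              # corrupt roughly 3% of predictions
y_pred = np.where(flip, 1 - y_true, y_true)

mean_acc, lo, hi = bootstrap_ci(y_true, y_pred, accuracy)
print(f"accuracy = {mean_acc:.4f} (95% CI: {lo:.4f}-{hi:.4f})")
```

Passing a different `metric` callable (for example a recall or F1 function) reuses the same resampling loop.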

Results

This section presents the results of our evaluation of LiteFallNet, including classification performance, explainability insights, efficiency metrics, and architectural robustness testing across datasets. The model was first tested on the FallAllD dataset, followed by further training and testing on the UMAFall dataset to assess LiteFallNet's robustness.

Performance of model on FallAllD dataset

LiteFallNet achieved strong classification performance on the FallAllD test set when evaluated using the performance metrics of accuracy, precision, recall (sensitivity), F1-score, specificity, and AUC. Table 4 presents the performance of LiteFallNet on the test set of the FallAllD dataset.

Table 4.

Performance metrics of LiteFallNet on FallAllD dataset.

Proposed model	Accuracy	Precision	Recall (sensitivity)	F1-score	Specificity	AUC
LiteFallNet	97.81%	97.23%	98.54%	97.89%	97.03%	99.33%

AUC: area under the receiver operating characteristic curve.

To assess the stability of these results, we conducted a 1000-sample bootstrap resampling procedure. The resulting 95% confidence intervals (CIs) for the key metrics were narrow, indicating that performance variations were minimal. Specifically, the model achieved a mean test accuracy of 97.81% (95% CI: 97.16%–98.41%), precision of 97.24% (95% CI: 96.25%–98.21%), recall of 98.54% (95% CI: 97.75%–99.22%), F1-score of 97.89% (95% CI: 97.25%–98.53%), specificity of 97.03% (95% CI: 95.91%–98.04%), and AUC of 99.33% (95% CI: 99.01%–99.68%). These results indicate that LiteFallNet's performance is stable across resampled test sets.

Training and validation curves showed early convergence, with accuracy stabilizing around 98% and a consistent decline in loss. Precision and recall remained closely aligned and stabilized at 97.23% and 98.54%, respectively. The AUC scores consistently reached 99%, indicating strong differentiation between fall and ADL instances. Figure 3 provides a detailed visualization of these performance trends.

Figure 3.

Graphs showing the performance metrics over epochs.

The confusion matrix is shown in Figure 4. LiteFallNet correctly identified 1018 of the 1033 fall instances in the test set (15 false negatives, 1.45%) and 948 of the 977 ADL instances (29 false positives, 2.97%). These low error rates demonstrate the model's reliability in distinguishing between falls and ADLs.

Figure 4.

Confusion matrix of LiteFallNet on the test set.
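The counts in the confusion matrix (1018 of 1033 falls and 948 of 977 ADLs correctly classified) are enough to recompute the headline metrics. The short script below does so with the standard definitions; the variable names are illustrative.

```python
# Counts taken from the confusion matrix described in the text
tp, fn = 1018, 1033 - 1018      # falls: correctly detected / missed
tn, fp = 948, 977 - 948         # ADLs: correctly rejected / false alarms

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)         # sensitivity
specificity = tn / (tn + fp)
f1 = 2 * precision * recall / (precision + recall)

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("specificity", specificity), ("f1", f1)]:
    print(f"{name}: {100 * value:.2f}%")
```

Run as written, this gives 97.81% accuracy, 97.23% precision, 98.55% recall, 97.03% specificity, and an F1-score of 97.88%. Where these differ from Table 4, the difference is 0.01 percentage points, which rounding conventions can account for.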

Model efficiency

LiteFallNet demonstrated high efficiency in both memory usage and processing speed, with 17,751 trainable parameters, a model size of 0.312 MB (TensorFlow format, including architecture metadata and training configuration), 69.34 KB of memory use at inference, a computational cost of 71 KFLOPs, and an inference time of 7.07 ms per sample. These results confirm LiteFallNet's suitability for deployment on real-time, resource-constrained edge devices such as wearables or embedded systems.
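The memory figure can be related to the parameter count with simple arithmetic. The sketch below assumes that weights are stored as 32-bit floats and that 1 KB means 1024 bytes; whether the reported value was derived this way is not stated in the text.

```python
n_params = 17_751            # trainable parameters reported for LiteFallNet
bytes_per_param = 4          # assumption: weights stored as 32-bit floats

param_bytes = n_params * bytes_per_param
param_kib = param_bytes / 1024

print(f"{param_bytes} bytes = {param_kib:.2f} KB")
```

Under these assumptions the parameters occupy 71,004 bytes, or about 69.34 KB, which matches the reported inference memory.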

Ablation studies

We conducted an ablation study to evaluate each architectural component's contribution. Four ablated variants were created by removing or replacing key modules and compared against the complete model. Table 5 summarizes the average test performance of the complete model and its ablated variants across three random seeds on the FallAllD dataset, with loss values rounded to four decimal places.

Table 5.

Ablation study of LiteFallNet components.

Model variant	Average accuracy (%)	Average precision (%)	Average recall (%)	Average F1-score (%)	Average loss	Total number of parameters
Without GRU	97.16	97.14	97.35	97.24	0.1158	15,559
Without TCN	96.00	95.99	96.26	96.12	0.1206	11,094
Without SE block	96.90	96.32	97.71	97.00	0.1189	17,589
Conv1D instead of SeparableConv1D	97.44	96.91	98.16	97.53	0.1073	21,751
Baseline (LiteFallNet)	97.78	96.75	99.00	97.86	0.0927	17,751

GRU: Gated Recurrent Unit; SE: Squeeze-and-Excitation; TCN: Temporal Convolutional Network.

When the GRU layer was removed, a slight reduction in recall and F1-score was observed. The TCN block was the most critical component: its removal caused the largest performance drop, lowering recall by 2.74 percentage points and the F1-score by 1.74 percentage points relative to the baseline. A paired-sample t-test across the three random seeds showed that the declines were statistically significant (p < 0.05) for accuracy (t = 7.22, p = 0.0186), recall (t = 12.05, p = 0.0068), and F1-score (t = 7.84, p = 0.0159), confirming the TCN's critical role in modeling long-range dependencies. Removing the SE block caused moderate declines in recall and F1-score, while replacing the SeparableConv1D layer with a standard Conv1D introduced slight degradation in the F1-score and a small increase in test loss. The degradations from removing the GRU layer, removing the SE block, and replacing SeparableConv1D were not statistically significant (p > 0.05).

A table of t-statistics and p-values for all ablations is provided in Supplemental Table 2. These results reflect the additive contribution of each component to LiteFallNet's performance and computational behavior.
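Ablation effects can be stated either as absolute drops (percentage points) or as drops relative to the baseline score, and the two differ slightly. The sketch below computes both from the accuracy, recall, and F1 values in Table 5; the helper name `drops` is illustrative.

```python
# (accuracy, recall, F1) in %, copied from Table 5
baseline = {"accuracy": 97.78, "recall": 99.00, "f1": 97.86}
variants = {
    "Without GRU": {"accuracy": 97.16, "recall": 97.35, "f1": 97.24},
    "Without TCN": {"accuracy": 96.00, "recall": 96.26, "f1": 96.12},
    "Without SE block": {"accuracy": 96.90, "recall": 97.71, "f1": 97.00},
    "Conv1D instead of SeparableConv1D": {"accuracy": 97.44, "recall": 98.16, "f1": 97.53},
}

def drops(variant):
    """Absolute drop (percentage points) and relative drop (%) per metric."""
    out = {}
    for metric, base in baseline.items():
        absolute = base - variant[metric]
        out[metric] = (round(absolute, 2), round(100 * absolute / base, 2))
    return out

for name, scores in variants.items():
    print(name, drops(scores))
```

For the variant without the TCN block, recall falls by 2.74 percentage points in absolute terms, which is about 2.77% of the baseline recall, so it helps to state which convention is meant when quoting a drop.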

Architectural robustness testing on UMAFall dataset

To assess the robustness of LiteFallNet, it was retrained and evaluated on the UMAFall dataset. The model maintained excellent performance despite differences in sampling rates and activity sets. Table 6 presents the test performance metrics.

Table 6.

Test performance of LiteFallNet on the UMAFall dataset.

Metric	Accuracy (%)	Precision (%)	Recall (%)	F1-score (%)	Specificity (%)	AUC (%)	Loss
Value	98.85	98.86	99.08	98.97	98.56	99.91	0.0381

AUC: area under the receiver operating characteristic curve.

The confusion matrix in Figure 5 further confirmed the model's discriminative strength, with only 8 misclassifications out of 784 test samples. The training and validation curves for loss, accuracy, precision, recall, and F1-score are shown in Figure 6. These curves reflect consistent convergence and minimal overfitting.

Figure 5.

Confusion matrix of LiteFallNet on the UMAFall test set.

Figure 6.

Graphs showing LiteFallNet's performance on the UMAFall dataset.

These results demonstrate LiteFallNet's ability to generalize effectively across heterogeneous fall detection datasets with varying sampling rates and activity profiles. Its consistent high performance underlines its robustness and practical applicability in real-world sensor-based fall detection scenarios.

Explainability

This study used Grad-CAM 1D and LIME to gain insights into LiteFallNet's decision logic across FallAllD and UMAFall datasets.

Grad-CAM 1D visualizations showed strong activation around abrupt motion spikes in fall signals (Figure 7b) and smooth distributions in ADL cases (Figure 7a), confirming that the model focuses on meaningful patterns. In both cases, red overlays indicate the most influential temporal segments contributing to the prediction, but their patterns and alignment with the signal characteristics differed between the two classes.

Figure 7.

Grad-CAM 1D visualizations of LiteFallNet prediction on FallAllD dataset. (a) ADL sample showing diffuse importance across periodic signals. (b) Fall sample with focused attention on abrupt signal changes. ADL: activities of daily living.

The LIME-generated explanations are presented in Figure 8, where Figure 8a corresponds to a fall instance and Figure 8b to an ADL instance. LIME provided localized insights by identifying time-steps with high positive (green) or negative (red) contributions. Fall samples consistently displayed green highlights near signal peaks, while ADL samples exhibited red segments across steady regions. The overlaid bands are mapped onto the univariate motion-intensity signal derived from the original sensor data.

Figure 8.

LIME explanations of LiteFallNet predictions on FallAllD dataset. (a) Fall sample with localized contributions around motion spikes. (b) ADL sample showing broader, dispersed contributions. ADL: activities of daily living; LIME: local interpretable model-agnostic explanations.

When applied to the UMAFall dataset, Grad-CAM 1D yielded activation patterns that were qualitatively consistent with those observed on the FallAllD dataset (Figure 9a and b). In fall instances, the model consistently exhibited strong activations around rapid signal transitions—particularly near sharp peaks in acceleration or angular velocity—indicating its sensitivity to abrupt, high-intensity motion typical of fall events. Conversely, ADL instances showed more diffuse and low-magnitude activations, primarily concentrated around stable signal regions, aligning with the smoother nature of routine activities.

Figure 9.

Grad-CAM 1D visualizations of LiteFallNet prediction on UMAFall dataset. (a) ADL sample showing diffuse importance across periodic signals. (b) Fall sample with focused attention on abrupt signal changes. ADL: activities of daily living.

LIME explanations demonstrated a similarly stable pattern across datasets. As illustrated in Figure 10a and b, time-steps corresponding to high-motion bursts in fall sequences were consistently highlighted with strong positive contributions (green), whereas steady or low-motion segments in ADL sequences were marked with negative contributions (red).

Figure 10.

LIME explanations of LiteFallNet predictions on UMAFall dataset. (a) Fall sample with localized contributions around motion spikes. (b) ADL sample showing broader, dispersed contributions. ADL: activities of daily living; LIME: local interpretable model-agnostic explanations.

These explainability tools confirm that LiteFallNet's decisions are grounded in intuitive, clinically relevant signal patterns.

Discussion

This study introduced LiteFallNet, a lightweight and interpretable deep learning model for real-time fall detection using only inertial measurement unit (IMU) data. LiteFallNet addresses three critical challenges in fall detection: computational inefficiency, privacy concerns, and latency in real-time systems. The model demonstrated strong performance across multiple datasets while maintaining a compact footprint, making it suitable for edge deployment.

LiteFallNet's architecture was designed to balance temporal modeling capacity with computational efficiency, leveraging the complementary strengths of GRUs, TCNs, SE blocks, and depthwise separable convolutions. First, the GRU captures short-term temporal dynamics with less computational overhead than LSTMs, allowing it to recognize rapid or transient fall indicators. Next, TCNs, using dilated and causal convolutions, model long-range temporal dependencies crucial for detecting gradual or multiphase falls, such as slips, loss of balance, and descent, in the time-series input signal. The SE blocks recalibrate channel-wise feature importance, enhancing signal discrimination and noise suppression. Finally, the depthwise separable convolutions reduce the model's parameter count and computational load without compromising classification accuracy.
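The parameter saving from depthwise separable convolutions follows from how the filter is factorized. The sketch below compares the two parameter-count formulas for a single 1D layer. The layer sizes are hypothetical and chosen only for illustration, and a depth multiplier of 1 is assumed.

```python
def conv1d_params(kernel_size, in_channels, out_channels, bias=True):
    """Standard 1D convolution: one k x C_in filter per output channel."""
    return kernel_size * in_channels * out_channels + (out_channels if bias else 0)

def separable_conv1d_params(kernel_size, in_channels, out_channels, bias=True):
    """Depthwise (k weights per input channel) followed by pointwise (1 x 1)."""
    depthwise = kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise + (out_channels if bias else 0)

# Hypothetical layer sizes, chosen only for illustration
k, c_in, c_out = 3, 32, 64
standard = conv1d_params(k, c_in, c_out)
separable = separable_conv1d_params(k, c_in, c_out)
print(standard, separable)
```

For these sizes the standard layer needs 6208 parameters and the separable layer 2208, and the gap widens as the kernel size grows.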

Ablation studies empirically confirmed the value of each component (Table 5). Removing the TCN block caused the largest degradation, with relative reductions of 1.81% in accuracy, 2.77% in recall, and 1.78% in F1-score compared with the baseline. A paired t-test confirmed that these reductions were statistically significant for accuracy (p = 0.0186), recall (p = 0.0068), and F1-score (p = 0.0159), emphasizing the importance of TCNs in capturing the extended sequential features critical to fall detection and to the overall performance of LiteFallNet (see the Supplemental Material for more details). In comparison, removing the GRU layer led to relative drops of 1.66% in recall and 0.63% in F1-score. Although not statistically significant (p > 0.05), these effects are consistent with the GRU's short-term temporal memory helping to detect the quick transitions or subtle movements typical of falls, especially in borderline cases. Similarly, eliminating the SE block yielded a 0.88% relative decrease in F1-score, consistent with adaptive feature recalibration refining signal discrimination and contributing to LiteFallNet's precision and overall classification quality. Replacing SeparableConv1D with standard Conv1D incurred only a marginal 0.34% relative decline in F1-score but increased the parameter count from 17,751 to 21,751 (Table 5), confirming that depthwise separable convolutions are key to LiteFallNet's efficiency-performance tradeoff (high performance with a low parameter count). Although not all individual effects reached statistical significance at the 0.05 level, together they suggest a meaningful contribution to the model's overall performance. While the TCN block emerged as the most critical component, the other architectural elements each contributed complementary functions. Their integration enables LiteFallNet to balance high classification performance with computational efficiency, attaining an overall accuracy of 97.81%, a precision of 97.23%, a recall of 98.55%, an F1-score of 97.88%, and an AUC of 99.33%.

The comparative analysis presented in Table 7 demonstrates the competitiveness of LiteFallNet against several state-of-the-art fall detection models trained on IMU-based datasets. Among models trained on the FallAllD dataset, LiteFallNet outperforms the Multilayer Mobile Edge Computing with Knowledge Distillation (MECKD)⁴⁴ model, which reported an accuracy of approximately 93.89% and an F1-score of 92.99% despite relying on a heavyweight architecture with approximately 14.64 million parameters. Compared to the MECKD model, LiteFallNet demonstrates a more favorable balance between performance and computational efficiency. Other models achieved comparable performance only on specific metrics. For example, the LSTM-based Convolutional Variant Autoencoder (CVAE)⁴⁵ recorded high precision (97.61%) but fell short in accuracy (93.42%) and recall (91.96%). These discrepancies suggest that the model was cautious in predicting falls, reducing false positives at the cost of missing many true fall events. Such behavior is often linked to the tendency of models to focus on the majority class, making them less responsive to rare but critical events like falls. Missing falls could lead to delayed interventions or undetected incidents in clinical settings, potentially putting individuals at risk. Similarly, the Coarse-Fine CNN-GRU⁴⁶ ensemble model achieved a relatively high accuracy of 97.95% but a low recall of 92.54% and a precision of 96.13%. This gap of about 3.6 percentage points between precision and recall suggests the model prioritizes minimizing false positives over capturing true falls. Such a prioritization bias mainly occurs when models are trained on imbalanced datasets without architectural compensation. In contrast, the performance metrics of LiteFallNet are well balanced, with 97.81% accuracy, 98.54% recall, and 97.89% F1-score, while maintaining a compact model size (0.312 MB) and an inference time of 7.07 ms.

Table 7.

Comparison of LiteFallNet with state-of-the-art models.

Model	Dataset	Accuracy (%)	Precision (%)	Sensitivity / recall (%)	F1-score (%)	Specificity (%)	Parameters
SeqTCN⁴⁷	UMAFall	92.00	84.00	85.00	85.00	—	—
LSTM-based CVAE⁴⁵	FallAllD	93.42	97.61	91.96	94.70	—	—
Coarse-Fine CNN-GRU⁴⁶	FallAllD	97.95	96.13	92.54	94.26	—	—
PreFallKD³⁸	KFall	98.05	90.62	94.79	92.66	98.53	59,557
MECKD⁴⁴	FallAllD	93.89	—	—	92.99	—	∼14.64 million
1D-FCN⁵¹	UP-Fall	99.52	98.14	98.70	98.38	99.65	59,170
DSCS⁵²	SisFall	99.32	98.58	99.15	—	—	—
DSCS⁵²	MobiFall	99.65	98.39	100.00	—	—	—
LiteFallNet (proposed)	FallAllD	97.81	97.23	98.54	97.89	97.03	17,751 (0.312 MB)
LiteFallNet (proposed)	UMAFall	98.85	98.86	99.08	98.97	98.56	17,751 (0.312 MB)

CNN: Convolutional Neural Networks; CVAE: LSTM-Based Convolutional Variant Autoencoder; DSCS: Dual-Stream Convolutional Neural Network Self-attention model; GRU: Gated Recurrent Unit; LSTM: Long Short-Term Memory model; MECKD: Multilayer Mobile Edge Computing with Knowledge Distillation.

SeqTCN,⁴⁷ trained on the UMAFall dataset, achieved an accuracy of 92.00% and a recall of 85.00%, indicating limited sensitivity in detecting actual fall events. While the model may appear reasonably accurate overall, its low recall suggests a high rate of missed falls, which significantly undermines its reliability in clinical settings where identifying true fall incidents is paramount. In contrast, LiteFallNet performed strongly when retrained on the same dataset (see Table 6), achieving 98.85% accuracy, 99.08% recall, and 99.91% AUC, considerably higher than SeqTCN's performance. These results showcase LiteFallNet's adaptability to new datasets without overfitting or losing performance.

PreFallKD,³⁸ which was trained on the KFall dataset, achieved a slightly higher accuracy (98.05%) than LiteFallNet trained on the FallAllD dataset (97.81%). However, its precision (90.62%) and recall (94.79%) were lower, indicating that PreFallKD could be more prone to false alarms and missed falls. Compared to the PreFallKD model, LiteFallNet demonstrates a more balanced performance across all metrics, which enhances its robustness and reliability. Furthermore, LiteFallNet uses only 17,751 parameters, substantially fewer than PreFallKD's 59,557, underscoring its computational efficiency and suitability for deployment on edge devices with limited processing resources.

As Table 7 shows, LiteFallNet holds a competitive position even when compared to high-performing models trained on other datasets such as UP-Fall,⁴⁸ SisFall,⁴⁹ and MobiFall.⁵⁰ The 1D-FCN⁵¹ model, trained on the UP-Fall dataset, reported superior accuracy (99.52%) and specificity (99.65%) compared to LiteFallNet's performance on FallAllD (97.81% accuracy and 97.03% specificity). However, it is worth noting that the UP-Fall dataset lacks magnetometer data, which LiteFallNet utilizes to improve spatial awareness and detect orientation-based fall dynamics, further enhancing its real-world utility. When trained and tested on the UMAFall dataset, LiteFallNet performed better on three evaluation metrics, recording 99.08% recall, 98.86% precision, and a 98.97% F1-score against 1D-FCN's respective scores of 98.70%, 98.14%, and 98.38%. These results suggest that LiteFallNet offers a better balance between sensitivity and reliability, especially in fall-critical contexts where false negatives and false positives could have detrimental effects. LiteFallNet's compactness and efficiency stem from its architecture, which combines a GRU layer, TCNs, an SE block, and depthwise separable convolutions to effectively extract temporal and spatial features. These components enable high performance with just 17,751 parameters, making LiteFallNet suited for deployment on resource-constrained edge devices. In contrast, the 1-Dimensional Fully Connected Network (1D-FCN) lacks temporal and attention mechanisms, relying on stacked convolutions and requiring 59,170 parameters to achieve comparable or lower results. While 1D-FCN achieves a slightly faster inference time (5 ms vs. 7.07 ms), LiteFallNet compensates by offering stronger overall performance with a much smaller parameter budget. It provides a more favorable balance between architectural efficiency, computational speed, and predictive accuracy, underscoring its suitability for real-world deployment.

Among the models compared, only the 1D-FCN study reported inference time. Therefore, we emphasized parameter count alongside the main performance metrics to demonstrate LiteFallNet's efficiency.

Similarly, the Dual-Stream Convolutional Neural Network Self-attention (DSCS)⁵² model, evaluated on SisFall and MobiFall, reported high accuracy (99.32% and 99.65%, respectively) and perfect recall (100%) on MobiFall. However, LiteFallNet's precision on the UMAFall dataset (98.86%) exceeded the precision DSCS reported on both SisFall (98.58%) and MobiFall (98.39%). This higher precision indicates LiteFallNet's stronger ability to avoid false alarms, which is crucial in clinical or home settings. Additionally, while DSCS is described as lightweight and embeddable, the paper provides no explicit parameter count or model size, making it challenging to compare deployment feasibility objectively. Conversely, LiteFallNet has just 17,751 parameters and a compact size of 0.312 MB, making it scalable across diverse sensor-rich deployment scenarios.

Explainability remains a critical factor in clinical AI applications. This study used Grad-CAM 1D and LIME to unpack the model's decision-making process. Grad-CAM 1D visualizations revealed the temporal regions most influential in the model's decisions. In fall samples, the attention represented by red overlays was sharply concentrated around abrupt spikes in the mean sensor signal, aligning with the sudden, high-magnitude changes typically observed during real fall events. This focused activation suggests that the model has effectively internalized physiologically meaningful cues associated with fall dynamics. Conversely, in ADL sequences, the attention was more diffusely distributed or centered on smoother signal segments, indicating that the model recognized the steady and continuous nature typical of nonfall activities. These visual patterns indicate that LiteFallNet is not reacting randomly to signal changes but is attending to consistent features that reflect how human movements unfold over time. The alignment between the model's attention and the expected behavioral signatures of falls versus ADLs strengthens confidence in its reasoning process and bolsters its acceptability for deployment in real-world, safety-critical environments.

LIME explanations reinforced LiteFallNet's interpretability by identifying the specific time-steps that contributed most to the model's predictions. For fall sequences, LIME consistently highlighted clusters of green bars representing strong positive contributions, particularly around segments with high signal intensity, short bursts, or sharp spikes. These patterns indicate that the model associated abrupt transitions with fall events and based its predictions on those localized signal features. In contrast, ADL samples exhibited predominantly red bars scattered throughout the signal, signifying negative contributions to the fall class and reflecting the model's recognition of smooth, nontransitional movements as evidence against a fall. The distinct contrast in LIME outputs, with green dominating in falls and red in ADLs, mirrors the Grad-CAM results and underscores the model's ability to differentiate falls from ADLs over time. The LIME outputs support the conclusion that LiteFallNet's predictions are grounded in behaviorally and physiologically meaningful features.

The architectural design of LiteFallNet reflects a deliberate response to the operational demands of real-world fall detection, particularly in settings characterized by privacy sensitivity, constrained connectivity, and limited computational resources. The model overcomes the infrastructural and ethical concerns associated with vision-based systems by leveraging inertial sensor data and supporting rapid on-device inference. Its lightweight architecture and efficient temporal data handling enable continuous monitoring on embedded devices, such as wearables and ambient IoT sensors, without cloud connectivity or high-bandwidth networks. These strengths make LiteFallNet a practical solution for real-world deployment in environments where autonomy, energy efficiency, and real-time responsiveness are non-negotiable, such as smart homes, eldercare settings, and mobile health applications.

In addition to being technically feasible, LiteFallNet can scale across heterogeneous deployment environments. Its modular design supports integration into existing edge-based health monitoring infrastructures, particularly in resource-limited settings where computational and financial constraints often preclude using more complex models. The alignment between the model's internal representations and domain-relevant motion patterns further supports its clinical transparency and regulatory plausibility. By coupling high model fidelity with architectural efficiency, LiteFallNet can potentially serve as a deployable system that addresses the increasing need for reliable, cost-effective fall detection in aging populations across diverse care settings.

Despite its strengths, LiteFallNet has some limitations. Both datasets used in this study were collected under semicontrolled conditions and involved a relatively small number of participants. They nonetheless provided thousands of labeled instances across diverse activities, sensor placements, and environments, which reduces the risk of overfitting and improves the model's robustness. Although LiteFallNet performed consistently across the two distinct datasets, the small number of participants limits the generalizability of our findings. Future studies using larger and more heterogeneous participant populations are therefore needed to confirm external validity. Larger-scale, real-world datasets from hospitals, nursing homes, or community settings would offer a more rigorous test of the model's robustness. Future work could also explore multisensor fusion strategies, such as combining IMUs from different body parts or integrating environmental sensors, to improve detection under more complex conditions. Investigating domain adaptation techniques could also help the model generalize across diverse user populations and sensor configurations.

Conclusion

LiteFallNet is a compact, interpretable deep learning model optimized for fall detection using only inertial sensor data. It delivers both speed and accuracy. It works in real time, protects user privacy, and requires very little computing power, making it ideal for wearable and smart home systems.

The model performed consistently across two public datasets and demonstrated strong robustness and low latency. It also offers transparent decision making, helping users and clinicians trust how and why it works. With its balance of performance, simplicity, and explainability, LiteFallNet is a strong candidate for real-world deployment in next-generation digital health solutions.

Supplemental Material

Supplemental material, sj-docx-1-dhj-10.1177_20552076251386698 for LiteFallNet: A lightweight deep learning model for efficient real-time fall detection by Emmanuel Owusu, Isaac Acquah, Michael Asiedu Asare and Benjamin Appiah Yeboah in DIGITAL HEALTH

Footnotes

Acknowledgments

The authors acknowledge the use of Grammarly software to assist with language editing. All scientific content, analysis, and interpretations were performed by the authors, who carefully reviewed and edited the text and take full responsibility for the final version of this manuscript.

ORCID iDs

Emmanuel Owusu

Isaac Acquah

Michael Asiedu Asare

Benjamin Appiah Yeboah

Ethical approval

The datasets used in this study were obtained from publicly available Kaggle repositories that had been ethically sourced and verified by expert clinicians. Although these datasets were de-identified and openly accessible, we obtained ethical approval from the Committee on Human Research, Publication, and Ethics of Kwame Nkrumah University of Science and Technology (CHRPE/AP/372/25) to ensure compliance with institutional standards and to reinforce our commitment to the responsible use of human-related data.

Author contribution

EO, IA, MAA, and BY were involved in conceptualization, formal analysis, and investigation; EO and MAA in methodology; EO, IA, and MAA in writing (original draft preparation); IA and BY in writing (review and editing); and IA in supervision.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability

The FallAllD dataset used in this research to train and test the developed model is publicly available at https://ieee-dataport.org/open-access/fallalld-comprehensive-dataset-human-falls-and-activities-daily-living. The UMAFall dataset, also used to train and test the model, is publicly available through the repository listed in reference 40. We do not own these data.

Guarantor

IA.

Informed consent

Not applicable.

Peer review

This article was peer-reviewed by external and independent referees. The journal's editorial team managed the review process, and all reviewer comments were addressed by the authors before final acceptance.

Supplemental material

Supplemental material for this article is available online.

References

1. Kimura, Ruller, Frank, et al. Incidence morbidity and mortality from falls in skilled nursing facilities: a systematic review and meta-analysis. J Am Med Dir Assoc 2023; 24: 1690–1699.e6.

2. Vaishya, Vaish. Falls in older adults are serious. Indian J Orthop 2020; 54: 69–74.

3. Alamgir, Muazzam, Nasrullah. Unintentional falls mortality among elderly in the United States: time for action. Injury 2012; 43: 2065–2071.

4. James, Lucchesi, Bisignano, et al. The global burden of falls: global, regional and national estimates of morbidity and mortality from the Global Burden of Disease Study 2017. Inj Prev 2020; 26: i3–i11.

5. World Health Organization. Falls. WHO; 2024. Available from: https://www.who.int/news-room/fact-sheets/detail/falls (accessed 24 January 2025).

6. Centers for Disease Control and Prevention. Facts about Falls. Older Adult Fall Prevention. CDC; https://www.cdc.gov/falls/data-research/facts-stats/index.html (2024, accessed 1 February 2025).

7. Bradley. Falls in older adults. Mt Sinai J Med 2011; 78: 590–595.

8. Ilic, Ristic, Stojadinovic, et al. Epidemiology of hip fractures due to falls. Medicina (B Aires) 2023; 59: 1528.

9. Tamulaityte-Morozoviene, Dadoniene, Stukas, et al. Fear of falling as a psychological consequence of fall in older adults. Eur J Public Health 2024; 34: ckae144.2240.

10. Sapmaz, Mujdeci. The effect of fear of falling on balance and dual task performance in the elderly. Exp Gerontol 2021; 147: 111250.

11. Dayyani, Chu, Abedi, et al. Social isolation in older adults post hip surgery is correlated with mobility and physiological indicators. Innov Aging 2023; 7: 876–877.

12. Zhao, Chai, Gao, et al. Physical mobility, social isolation and cognitive function: are there really gender differences? Am J Geriatr Psychiatry 2023; 31: 726–736.

13. Yokoyama, Furuhashi, Yamamoto, et al. An examination of the potential benefits of expert guided physical activity for supporting recovery from extreme social withdrawal: two case reports focused on the treatment of Hikikomori. Front Psychiatry 2023; 14: 1084384.

14. Tran, et al. Economic burden and financial vulnerability of injuries among the elderly in Vietnam. Sci Rep 2023; 13: 19254.

15. Florence, Bergen, Atherly, et al. Medical costs of fatal and nonfatal falls in older adults. J Am Geriatr Soc 2018; 66: 693–698.

16. Alekna, Stukas, Tamulaitytė-Morozovienė, et al. Self-reported consequences and healthcare costs of falls among elderly women. Medicina (B Aires) 2015; 51: 57–62.

17. Henning-Smith. Quality of life and psychological distress among older adults: the role of living arrangements. J Appl Gerontol 2016; 35: 39–61.

18. Biosca, Bellazzecca, Donaldson, et al. Living on low-incomes with multiple long-term health conditions: a new method to explore the complex interaction between finance and health. PLoS ONE 2024; 19: e0305827.

19. Abraham, Nameer, Tom, et al. Pro-Safe: An IoT based Smart Application for Emergency Help. In: 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), pp. 593–597. doi: 10.1109/ICICICT46008.2019.8993220.

20. Fudickar, Lindemann, Schnor. Threshold-based fall detection on smart phones. In: Proceedings of the International Conference on Health Informatics, Angers, France: SCITEPRESS; 2014. pp. 303–309. doi: 10.5220/0004795803030309.

21. Igual, Medrano, Plaza. Challenges, issues and trends in fall detection systems. Biomed Eng OnLine 2013; 12: 66.

22. Santoyo-Ramón, Casilari-Pérez, Cano-García. A study on the impact of the users’ characteristics on the performance of wearable fall detection systems. Sci Rep 2021; 11: 23011.

23. Tran, Dao SVT. A feature selection approach for fall detection using various machine learning classifiers. IEEE Access 2021; 9: 115895–115908.

24. Khawandi, Ballit, Daya. Applying machine learning algorithm in fall detection monitoring system. In: 2013 5th International Conference on Computational Intelligence and Communication Networks (CICN), Mathura, India: IEEE; 2013. pp. 247–250. doi: 10.1109/CICN.2013.59.

25. Usmani, Saboor, Haris, et al. Latest research trends in fall detection and prevention using machine learning: a systematic review. Sensors 2021; 21: 5134.

26. De Oliveira, da S Colombo, Nunes DJV. Machine learning applied to fall detection in the elderly. In: Proceedings of the 20th Brazilian Symposium on Information Systems, New York, NY, USA: Association for Computing Machinery, pp. 1–9. doi: 10.1145/3658271.3658330.

27. El Attaoui, Largo, Kaissari, et al. Machine learning-based edge-computing on a multi-level architecture of WSN and IoT for real-time fall detection. IET Wirel Sens Syst 2020; 10: 320–332.

28. Ordoñez Nuñez, Celeste Ghizoni Teive, Rafael Garcia Ramirez. A robotics-based machine learning approach for fall detection of people. In: Habib (ed.) Cognitive robotics and adaptive behaviors. Rijeka: IntechOpen, 2022 Oct 6, pp. 1–14. doi: 10.5772/intechopen.106799.

29. Purwar, Chawla. A systematic review on fall detection systems for elderly healthcare. Multimed Tools Appl 2024; 83: 43277–43302.

30. Alam, Sufian, Dutta, et al. Real-time human fall detection using a lightweight pose estimation technique. In: Dasgupta, Mukhopadhyay, Mandal, et al. (eds) Computational intelligence in communications and business analytics. CICBA 2023. Communications in Computer and Information Science, vol 1956. Cham: Springer, 2023 Nov 20, pp. 30–40. doi: 10.1007/978-3-031-48879-5_3.

31. Shin, Miah ASM, Egawa, et al. Fall recognition using a three stream spatio temporal GCN model with adaptive feature aggregation. Sci Rep 2025; 15: 10635.

32. Nahian, Ghosh, Banna, et al. Towards an accelerometer-based elderly fall detection system using cross-disciplinary time series features. IEEE Access 2021; 9: 39413–39431.

33. Dutt, Gupta, Goodwin, et al. An interpretable modular deep learning framework for video-based fall detection. Appl Sci 2024; 14: 4722.

34. Huang, Yuan, et al. SDES-YOLO: a high-precision and lightweight model for fall detection in complex environments. Sci Rep 2025; 15: 2026.

35. Wang, Chen, et al. LFD-YOLO: a lightweight fall detection network with enhanced feature extraction and fusion. Sci Rep 2025; 15: 5069.

36. Jain, Semwal. A novel feature extraction method for preimpact fall detection system using deep learning and wearable sensors. IEEE Sens J 2022; 22: 22943–22951.

37. Mujirishvili, Maidhof, Florez-Revuelta, et al. Acceptance and privacy perceptions toward video-based active and assisted living technologies: scoping review. J Med Internet Res 2023; 25: e45297.

38. Chi T-H, Liu K-C, Hsieh C-Y, et al. PreFallKD: pre-impact fall detection via CNN-ViT knowledge distillation. In: ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5. doi: 10.1109/ICASSP49357.2023.10094979.

39. Saleh. FallAllD: A comprehensive dataset of human falls and activities of daily living, https://ieee-dataport.org/open-access/fallalld-comprehensive-dataset-human-falls-and-activities-daily-living (2020, accessed 30 January 2025).

40. Casilari, Santoyo-Ramón. UMAFall: Fall Detection Dataset (Universidad de Malaga). UMAFall Fall Detect Dataset Univ Malaga, https://figshare.com/articles/dataset/UMA_ADL_FALL_Dataset_zip/4214283/7.

41. Iwana, Uchida. An empirical survey of data augmentation for time series classification with neural networks. PLoS ONE 2021; 16: e0254841.

42. Battah, AL-Saedi. Anomaly detection in network traffic using 1D CNNs: Insights from explainable AI techniques. Baghdad, Iraq, p. 020008. doi: 10.1063/5.0265846.

43. Shi G-Y, H-P, Luo S-H, et al. 1D gradient-weighted class activation mapping, visualizing decision process of convolutional neural network-based models in spectroscopy analysis. Anal Chem 2023; 95: 9959–9966.

44. Mao W-L, Wang C-C, Chou P-H, et al. MECKD: deep learning-based fall detection in multilayer mobile edge computing with knowledge distillation. IEEE Sens J 2024; 24: 42195–42209.

45. M-K, Han, Hwang. Fall detection of the elderly using denoising LSTM-based convolutional variant autoencoder. IEEE Sens J 2024; 24: 18556–18567.

46. Liu C-P, J-H, Chu E-P, et al. Deep learning-based fall detection algorithm using ensemble model of coarse-fine CNN and GRU networks. In: 2023 IEEE International Symposium on Medical Measurements and Applications (MeMeA), Jeju, Korea, Republic of: IEEE, pp. 1–5. doi: 10.1109/MeMeA57477.2023.10171944.

47. Huang, et al. Physics Sensor based Deep Learning Fall Detection System. arXiv preprint arXiv:2403.06994. doi: 10.48550/arXiv.2403.06994.

48. Martínez-Villaseñor, Ponce, Brieva, et al. UP-fall detection dataset: a multimodal approach. Sensors 2019; 19: 1988.

49. Sucerquia, López, Vargas-Bonilla. SisFall: a fall and movement dataset. Sensors 2017; 17: 98.

50. Vavoulas, Pediaditis, Spanakis, et al. The MobiFall dataset: An initial evaluation of fall detection algorithms using smartphones. In: 13th IEEE International Conference on BioInformatics and BioEngineering, pp. 1–4. doi: 10.1109/BIBE.2013.6701629.

51. Mahfouz, Fawzi. Motion sensor-based fall detection using 1D-FCN. In: 2024 6th Novel Intelligent and Leading Emerging Sciences Conference (NILES), pp. 483–488. doi: 10.1109/NILES63360.2024.10753201.

52. Zhang, Liu, et al. An effective deep learning framework for fall detection: model development and study design. J Med Internet Res 2024; 26: e56750.
