Abstract
Background
Infectious diseases, particularly COVID-19, continue to be a significant global health issue. Although many countries have reduced or stopped large-scale testing measures, the detection of such diseases remains a priority.
Objective
This study aims to develop a novel, lightweight deep neural network for efficient, accurate, and cost-effective detection of COVID-19 using nasal breathing audio data collected via smartphones.
Methodology
Nasal breathing audio was collected from 128 participants (67 patients diagnosed with the Omicron variant and 61 healthy controls). Mel-Frequency Cepstral Coefficients, a widely used feature in speech and sound analysis, were employed to extract important characteristics from the audio signals. Additional feature selection was performed using random forest (RF), and principal component analysis (PCA) was used for dimensionality reduction. A Dense-ReLU-Dropout model was trained with K-fold cross-validation (K = 3), and performance metrics such as accuracy, precision, recall, and F1-score were used to evaluate the model.
Results
The proposed model achieved 97% accuracy in detecting COVID-19 from nasal breathing sounds, outperforming state-of-the-art methods such as those by Lella and Alphonse and Abayomi-Alli et al. Our Dense-ReLU-Dropout model, using RF and PCA for feature selection, achieved high accuracy with greater computational efficiency than existing methods that require more complex models or larger datasets.
Conclusion
The findings suggest that the proposed method holds significant potential for clinical implementation, advancing smartphone-based diagnostics in infectious diseases. The Dense-ReLU-Dropout model, combined with innovative feature processing techniques, offers a promising approach for efficient and accurate COVID-19 detection, showcasing the capabilities of mobile device-based diagnostics.
Introduction
The global outbreak of COVID-19 in 2019 has posed unprecedented challenges to public health systems, with emerging variants such as Omicron exacerbating the crisis due to heightened transmissibility and severe respiratory complications. 1 Despite advancements in diagnostic tools like RT-PCR and rapid antigen tests, widespread implementation remains hindered by cost, accessibility, and logistical barriers, particularly in resource-limited regions. 2 This gap underscores the urgent need for non-invasive, scalable, and cost-effective diagnostic alternatives to curb transmission and enable timely interventions.
Recent advancements in digital health technologies have highlighted the potential of acoustic analysis for disease detection. Respiratory infections, including COVID-19, often alter vocal fold dynamics, breathing patterns, and sound production, offering a unique opportunity to leverage audio signals as diagnostic biomarkers. 3 Speech and cough analysis have been widely explored, with studies demonstrating the feasibility of machine learning models in detecting COVID-19 through vocalizations (e.g. vowel articulation, cough sounds). For instance, deep learning models analyzing cough recordings have achieved validation accuracies of 67–83%, 4 while vocal fold vibration analysis via vowel vocalization has yielded ∼80% accuracy. 5 However, these approaches often require complex speech tasks or extensive computational resources, limiting their practicality for real-world deployment. 6
A critical gap in existing research lies in the underutilization of nasal breathing sounds—a passive, non-invasive signal that directly reflects upper respiratory tract physiology. Unlike speech or cough sounds, nasal breathing is effortless, making it ideal for rapid screening in diverse populations, including asymptomatic individuals. 7 While studies have explored oral breathing and cough acoustics for COVID-19 detection,8–10 nasal breathing remains underexamined despite its clinical relevance. Existing methodologies also face challenges such as low accuracy (<85% in some studies 11 ), reliance on high-dimensional datasets, and insufficient feature optimization, which can introduce noise and overfitting. 12
To address these limitations, this study proposes a lightweight deep neural network (DNN) framework optimized for nasal breathing sound analysis. By integrating Mel-Frequency Cepstral Coefficients (MFCCs) with advanced feature selection techniques—random forest (RF) and principal component analysis (PCA)—we aim to reduce computational complexity while enhancing diagnostic accuracy.13,14 Our approach leverages smartphone-recorded nasal breathing sounds from 128 participants (67 Omicron patients and 61 healthy controls), focusing on key acoustic features such as fundamental frequency, sound pressure level (SPL), and MFCCs.15,16 Through systematic dimensionality reduction and threefold cross-validation, we evaluate the robustness of our model against state-of-the-art methods, demonstrating its potential for scalable, real-world clinical applications.
The primary contributions of this work are threefold:
The remainder of this paper is organized as follows: Section Materials and methods details the dataset collection, preprocessing, feature extraction, the lightweight network architectures, and the experimental setup. Feature selection results are reported in Section Statistical results, and classification results and comparative analyses are presented in Section Experiment results, followed by a discussion of clinical implications and limitations in Section Discussion. Finally, Section Conclusion concludes the study and outlines future research directions.
Materials and methods
Experiment design
Figure 1 shows our research methodology, which is divided into four stages: data collection, feature extraction, model training, and prediction.

The general flow of the study consists of four parts: data collection, feature extraction, model training, and prediction.
Dataset
Data collection
The prospective observational study was conducted over 1 year, from March 2021 to February 2022, at Shanghai Sixth People's Hospital and Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University. Nasal breathing sounds were collected from 128 participants (67 COVID-19 positive, 61 healthy controls) using a smartphone application in a controlled clinical environment. All participants provided written informed consent, and the study protocol was approved by the institutional Ethics Committee of Shanghai Sixth People's Hospital (Approval No. 2022-KY-050(K)). These real-time respiratory samples were then analyzed for acoustic feature extraction and disease detection.
Participants were included if they met the following criteria:
Aged 18–65 years; Diagnosed with COVID-19 (Omicron variant), confirmed by PCR or rapid antigen test; Able to provide informed consent and participate in this study; Presenting symptoms of COVID-19, such as fever, cough, or difficulty breathing, at the time of data collection.
Participants were excluded if they met any of the following criteria:
History of severe respiratory conditions such as asthma, chronic obstructive pulmonary disease (COPD), or pneumonia; Pregnancy or breastfeeding; Any known neurological disorders or hearing impairments that could affect the ability to participate in the study; Use of medications that could affect respiratory function (e.g. sedatives or narcotics).
The dataset comprises standard audio recordings of nasal breathing sounds from 67 patients diagnosed with neoplastic disease, collected from the Shanghai Sixth People's Hospital. Each recording has an average duration of approximately six seconds. For the control group, nasal breath sounds were collected from 61 adults who tested negative for neoplastic pneumonia using the same recording procedure. Written consent was obtained from all participants, and the study was reviewed and approved by an ethics committee. Unlike other nasal breathing datasets, such as the COVID-19 dataset from Sonde Health (2020), in which no category contains more than 25 individuals, our dataset provides crucial insights and supports research into the association between neoplastic disease and COVID-19 through nasal breathing sound characteristics. Additionally, it aids in the development of methods for early COVID-19 detection (see Table 1 for details).
Number of patients enrolled in our study.
Data preprocessing
The raw nasal breathing audio signals collected via smartphones are inherently one-dimensional (1D) time-series data, representing variations in SPLs over time. However, convolutional neural networks (CNNs) are traditionally designed to process two-dimensional (2D) data, such as images, where spatial hierarchies and local patterns are critical for feature extraction. To leverage the powerful pattern recognition capabilities of CNNs, we transformed the 1D audio signals into a 2D format.
This approach allows the CNN to capture local features in the data, which are essential for distinguishing COVID-19-related acoustic features from those of healthy controls. By reshaping the 1D audio data into 2D formats, we enable the CNN to apply convolutional filters across both time and frequency dimensions, enhancing its ability to detect subtle patterns associated with respiratory abnormalities.
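As an illustration of this step, the sketch below converts a smartphone recording into a 2D MFCC matrix using librosa; the sampling rate, number of coefficients, and added channel axis are illustrative assumptions rather than the study's recorded settings.

```python
import librosa
import numpy as np

def audio_to_2d(path, sr=16000, n_mfcc=13):
    """Load a 1D nasal-breathing recording and turn it into a 2D
    time-frequency matrix that a CNN can process."""
    signal, sr = librosa.load(path, sr=sr)                        # 1D series of sound pressure values
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)   # shape: (n_mfcc, n_frames)
    return mfcc[np.newaxis, :, :]                                 # add channel axis: (1, n_mfcc, n_frames)
```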
Statistical analysis
Feature selection
This study employed nine key acoustic features to provide a comprehensive analysis of the audio signal and achieve accurate results. These characteristics include voiced and unvoiced sounds, effective speech segments, fundamental frequency (F0), log energy, short-term energy, zero crossing rate, SPL, and MFCCs (see Figure 2), with detailed explanations below:17,18

Mel-scale frequency cepstral coefficients map for nasal breath sound in adults tested for COVID-19 (left: positive; right: negative).
Each acoustic feature provides a unique and reliable perspective on voice signal analysis. In this study, these features were used for audio data analysis, offering comprehensive acoustic insights crucial for subsequent model training. The analysis revealed significant variations in the acoustic properties of nasal breathing between individuals with different COVID-19 test results.
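For illustration, a minimal sketch of how several of the frame-level features above (F0, zero crossing rate, short-term energy, and a relative SPL estimate) might be computed with librosa and NumPy is given below; the pitch range and frame settings are assumptions, not the study's exact configuration.

```python
import librosa
import numpy as np

def frame_level_features(signal, sr):
    """Compute a few of the per-frame acoustic features discussed above."""
    f0, voiced_flag, _ = librosa.pyin(signal, fmin=50, fmax=4000, sr=sr)  # fundamental frequency per frame
    zcr = librosa.feature.zero_crossing_rate(signal)[0]                   # zero crossing rate per frame
    frames = librosa.util.frame(signal, frame_length=2048, hop_length=512)
    energy = np.sum(frames ** 2, axis=0)                                  # short-term energy per frame
    rms = np.sqrt(np.mean(frames ** 2, axis=0))
    spl = 20 * np.log10(rms + 1e-12)                                      # relative sound pressure level (dB)
    return f0, voiced_flag, zcr, energy, spl
```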
Figure 2 demonstrates that the MFCC plots of COVID-19 positive patients exhibit significantly darker tones, indicating a lower frequency component in their nasal breathing sounds compared to healthy individuals. This variation may be linked to the impact of COVID-19 infection on the sound produced.
Figure 3 shows that the fundamental frequency (F0) curves of COVID-19 patients are significantly lower than those of healthy individuals, suggesting that their nasal breathing sounds are relatively low pitched. The stable frequency observed on the right-hand side, with no lower frequency in that period, supports the notion that 2 kHz can be regarded as the fundamental frequency.

Fundamental frequency curve for nasal breath sound in adults tested for COVID-19 (left: positive; right: negative).
We extract statistical information such as extreme values, mean values, standard deviations, peak values, and skewness from a range of acoustic features. These metrics collectively provide a comprehensive quantification of the key characteristics of each feature. This process offers an analysis that is more profound and informative than that provided by a 2D image. The detailed statistical metrics selected can be seen in Table 2:
Selected statistical metrics in this study.
The extreme value outlines the sound feature's upper and lower boundaries. The mean value illustrates its central tendency. The standard deviation is indicative of the feature's variability. The peak value points to the maximum sound intensity. The skewness measures the asymmetry of the distribution. The process of condensing data from 2D images into these vital statistical metrics helps minimize redundant information. This step contributes significantly to making the model more streamlined.
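A minimal sketch of this condensation step, assuming NumPy and SciPy, is shown below; the helper name and the removal of undefined (NaN) frames are illustrative choices.

```python
import numpy as np
from scipy.stats import skew

def summarize(feature_curve):
    """Condense a per-frame feature curve into the scalar statistics used above."""
    x = np.asarray(feature_curve, dtype=float)
    x = x[~np.isnan(x)]                       # e.g. F0 is undefined in unvoiced frames
    return {
        "min": x.min(), "max": x.max(),       # extreme values (upper/lower boundaries)
        "mean": x.mean(),                     # central tendency
        "std": x.std(),                       # variability
        "peak": np.abs(x).max(),              # maximum intensity
        "skewness": skew(x),                  # asymmetry of the distribution
    }
```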
Feature dimension reduction
In this subsection, we introduce the methods applied for feature dimension reduction. The larger the model input, the more redundant information it involves.19–22
In traditional studies of COVID-19 detection utilizing acoustic features, the challenge of low accuracy often arises. To counter this, we explore critical comparisons of these acoustic features and employ a strategy designed to reduce model complexity, enhancing operational efficiency. As described above, condensing data from 2D images into these vital statistical metrics helps minimize redundant information and contributes significantly to making the model more streamlined.
In recent years, dimension reduction techniques have evolved significantly, with several new methods offering promising improvements over traditional approaches. In this study, we employed PCA and RF for feature selection and dimensionality reduction. While newer methods such as t-Distributed Stochastic Neighbor Embedding (t-SNE) and autoencoders have gained popularity in the field, recent studies have still demonstrated the effectiveness of PCA in acoustic and sound-based analysis. For example, Wang et al. 23 show that PCA remains a reliable method for extracting relevant features from high-dimensional audio data, especially when computational resources are constrained. Similarly, Chen et al. 24 found that RF-based feature selection, when combined with PCA, continues to yield high performance in diagnostic tasks using sound data, making these traditional techniques a strong choice for our study.
To maintain model efficiency, we adopted strategies including RF25,26 and PCA27–29 for ranking and prioritizing the extracted features.
RF ranks and prioritizes features based on their importance scores, where the importance score for a feature $f_j$ can be written as the mean decrease in impurity over all trees:

$$I(f_j) = \frac{1}{T} \sum_{t=1}^{T} \sum_{n \in N_t(f_j)} \Delta i(n),$$

where $T$ is the number of trees, $N_t(f_j)$ is the set of nodes in tree $t$ that split on feature $f_j$, and $\Delta i(n)$ is the impurity decrease produced by the split at node $n$.
PCA reduces dimensionality by transforming the original features into a new set of uncorrelated features (principal components) that maximize variance. The principal components are given by:

$$Z = XW,$$

where $X$ is the standardized feature matrix and the columns of $W$ are the eigenvectors of the covariance matrix of $X$, ordered by decreasing eigenvalue.
Beyond the implementation of RF and PCA, we explored the correlation analysis between input features and labels. We selected the top eight features exhibiting the highest correlation coefficients to form a new subset of features. The correlation coefficient $r$ between a feature $x$ and the label $y$ is the Pearson correlation:

$$r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}}.$$
This approach ensures that the chosen attributes have a significant relationship with the target label, enhancing our model's diagnostic accuracy.
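The sketch below shows how these three selection routes could be combined with scikit-learn, assuming the statistical features are held in a pandas DataFrame X with a binary label vector y; the feature counts (23, 27, and 8) follow the text above, while the estimator settings are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA

def select_features(X: pd.DataFrame, y, n_rf=23, n_pca=27, n_corr=8):
    """Return the RF-ranked subset, the PCA projection, and the top correlated features."""
    # 1) Random-forest importance ranking: keep the n_rf highest-scoring features
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    rf_idx = np.argsort(rf.feature_importances_)[::-1][:n_rf]

    # 2) PCA: project onto the n_pca components capturing the most variance
    X_pca = PCA(n_components=n_pca).fit_transform(X)

    # 3) Correlation analysis: keep the n_corr features most correlated with the label
    corr = X.apply(lambda col: abs(np.corrcoef(col, y)[0, 1]))
    top_corr = corr.sort_values(ascending=False).index[:n_corr]

    return X.iloc[:, rf_idx], X_pca, X[top_corr]
```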
Network structure
To achieve a high-accuracy lightweight model suitable for rapid disease diagnosis on mobile devices, we developed architectures based on CNNs and DNNs. The input data specifications for both architectures are determined by the chosen feature extraction method, with all features used as input, as illustrated in Figure 4.

Network architecture of our study.
The CNN architecture consists of a single convolutional layer with a kernel size of
The DNN architecture includes four fully connected layers. Nonlinear transformations between the layers are performed using the ReLU activation function, and dropout rates of 0.25 are applied between the second, third, and fourth fully connected layers to reduce overfitting. The final classification results are obtained using a sigmoid activation function in the last linear layer.
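A minimal PyTorch sketch of this Dense-ReLU-Dropout pattern is given below; the four fully connected layers, ReLU activations, 0.25 dropout between the later layers, and sigmoid output follow the description above, while the hidden-layer widths are illustrative assumptions since they are not listed here.

```python
import torch
import torch.nn as nn

class DenseReLUDropout(nn.Module):
    """Four fully connected layers with ReLU, 0.25 dropout between the
    later layers, and a sigmoid output for binary classification."""
    def __init__(self, n_features, hidden=(128, 64, 32)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden[0]), nn.ReLU(),
            nn.Linear(hidden[0], hidden[1]), nn.ReLU(), nn.Dropout(0.25),
            nn.Linear(hidden[1], hidden[2]), nn.ReLU(), nn.Dropout(0.25),
            nn.Linear(hidden[2], 1), nn.Sigmoid(),   # probability of COVID-19 positive
        )

    def forward(self, x):
        return self.net(x)
```

Under these assumed widths the network has on the order of ten thousand parameters, which is consistent with the lightweight, on-device goal described above.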
Evaluation metrics
During this process, we did not use an additional split to create a separate validation set. Instead, one of the three parts of the dataset was designated as the validation set in each fold. This ensured that every data point was used both for training and validation, and helped provide a more robust evaluation of the model's performance while avoiding any data leakage.
Before each fold in the threefold cross-validation process, we first randomly shuffle the dataset to prevent potential distribution bias. For each fold, we ensure that there is no overlap between the training and test sets at the patient ID level to avoid data leakage. Additionally, PCA dimensionality reduction and RF feature selection are performed separately within each training set of each fold, preventing test set information from leaking into the feature engineering process and ensuring the independence of model evaluation.
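A sketch of such a leakage-free protocol with scikit-learn is shown below, assuming X and y are NumPy arrays; GroupKFold keeps all recordings from one patient in a single fold, and PCA is fitted on the training portion of each fold only. The build_model callable is a hypothetical stand-in for any scikit-learn-style classifier, and the per-fold shuffling described above is omitted for brevity.

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, f1_score

def cross_validate(X, y, patient_ids, build_model, n_splits=3, n_components=27):
    """Threefold cross-validation with patient-level splits and per-fold PCA."""
    scores = []
    for train_idx, test_idx in GroupKFold(n_splits=n_splits).split(X, y, groups=patient_ids):
        pca = PCA(n_components=n_components).fit(X[train_idx])       # fit on training fold only
        model = build_model().fit(pca.transform(X[train_idx]), y[train_idx])
        pred = model.predict(pca.transform(X[test_idx]))
        scores.append((accuracy_score(y[test_idx], pred), f1_score(y[test_idx], pred)))
    return np.mean(scores, axis=0)                                    # mean accuracy and F1 across folds
```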
Statistical results
Feature selection results
Using the selected statistical metrics and the feature selection techniques described above, Table 3 lists the features selected by RF (23 features) and PCA (27 features), respectively.
Results of feature extraction of random forest and principal component analysis.
MFCC: Mel-Frequency Cepstral Coefficient; PCA: principal component analysis.
The top eight correlated features and their corresponding correlation coefficients can be seen in Figure 5. In the following study, we utilize these features to train the lightweight deep learning-based model. The reduced dimensions of the features can be seen in Table 4.

Heatmap of top eight correlated coefficients from correlation analysis.
Feature dimension reduction results. The values represent the dimension of features after reduction.
MFCC: Mel-Frequency Cepstral Coefficient.
By leveraging these highly correlated features, we construct more robust models for effective COVID-19 detection. This strategy underscores the importance of the feature selection process in building precise and efficient diagnostic models.
Despite the rise of newer dimension reduction methods, we opted for PCA and RF because of their simplicity, efficiency, and strong performance in our dataset, which consists of nasal breath sound recordings. Moreover, the trade-off between newer techniques and the computational cost was a critical factor in our choice. As demonstrated in recent literature, PCA remains effective in preserving the most important features while significantly reducing dimensionality, thereby improving model training efficiency without sacrificing accuracy.
Experiment results
The experimental results (Tables 5 and 6) demonstrate that the DNN model achieved the highest accuracy of 97.67% on the full feature dataset. In comparison, the initial accuracy of the CNN model on the same dataset was 76%. Notably, the accuracy of the CNN model improved by nearly 10% after applying feature filtering techniques using RF and PCA. However, when the CNN model was applied to a dataset refined by correlation selection, its performance dropped by approximately 16% compared to the full feature dataset.
ROC accuracy for different models with different feature selection methods and results.
RF: random forest; PCA: principal component analysis; DNN: deep neural network; CNN: convolutional neural network.
F1 score for different models with different feature selection methods and results.
RF: random forest; PCA: principal component analysis; DNN: deep neural network; CNN: convolutional neural network.
In contrast, the DNN model's accuracy only slightly decreased by about 1% despite the reduction of nearly 30 features through RF and PCA feature filtering. The corresponding F1 score reduction was about 5%. Even with nearly 50 features eliminated from the relevance-selected dataset, the DNN model still achieved an accuracy of 88% and an F1 score of 86%. This indicates that the DNN model maintained high performance across different feature subsets, demonstrating greater robustness and adaptability in managing acoustic features for COVID-19 diagnosis.
It is important to note that training models on the full feature dataset can introduce noise. This noise increases the likelihood of overfitting, where the model performs well on the training data but poorly on unseen data. The feature filtering techniques, such as RF and PCA, help mitigate this issue by removing irrelevant features, thus enhancing the model's generalizability and reducing overfitting risks. The experimental results confirm that the DNN model is more resilient to feature subset variations, maintaining high effectiveness and reliability for COVID-19 diagnosis.
Note: Due to computational constraints and dataset limitations, visualizations of the cross-validation and performance metrics were not included in this manuscript. However, we recognize the importance of such visualizations for enhancing the clarity of model performance and plan to incorporate them in future versions of this work.
Discussion
In this study, we introduced a lightweight classification model for COVID-19 detection using nasal breathing sounds, aiming to contribute to the growing field of non-invasive diagnostics. The key accomplishments of our study, its novel contributions, and its limitations are discussed below.
Our research successfully developed a highly accurate and efficient method for detecting COVID-19 through nasal breathing sounds recorded on smartphones. The classification model achieved an impressive accuracy rate of 97% and an F1 score of 98% when utilizing the full feature set. By employing DNN, our model consistently outperformed CNN, particularly with datasets containing comprehensive audio features. These results suggest the robustness and effectiveness of DNN in handling complex audio signal features, making it a viable tool for disease detection in real-world settings.
The inclusion of patients with neoplastic disease in this study introduces potential biases, as their nasal breathing sound characteristics may differ from those of the general population, particularly in individuals without underlying health conditions. This could affect the model's ability to generalize across diverse populations. In future work, we plan to explore this potential bias more thoroughly and consider methods to mitigate its impact, such as testing the model on separate datasets or applying stratified sampling techniques.
A key innovation of our study lies in the use and processing of nasal breathing sounds as input data. Unlike general breath sounds, nasal breathing sounds are more reflective of the nasal cavity, nasopharynx, and upper airway characteristics, which are crucial in detecting COVID-19-related respiratory symptoms. This makes nasal breathing sounds a more targeted and informative source of data compared to broader, less specific breath sounds. Furthermore, our approach to feature selection focused on using nine key acoustic features, with a particular emphasis on voiced sounds, which are essential for capturing the dynamics of respiratory health and disease-related changes.
The method of dimensionality reduction using RF and PCA enhanced the representativeness of the audio features, thus improving model performance. Although we employed PCA and RF for feature dimensionality reduction and selection, dataset bias may still exist, such as the potential omission of certain high-dimensional features. Future research could further optimize this process by incorporating additional data augmentation strategies or integrating multiple dimensionality reduction techniques, such as t-SNE or Autoencoder.
The reduction in feature set from 57 to 23 and ultimately to 8 features resulted in the maintenance of high accuracy and F1 scores, demonstrating the effectiveness of both feature selection and dimensionality reduction strategies. These results align with recent studies on acoustic signal analysis, which highlight the importance of selecting relevant features and reducing dimensionality for improving the efficiency of machine learning models in health diagnostics.30–32
We observed that dimensionality reduction improved the performance of the CNN, enhancing its ability to classify COVID-19 positive and negative samples. However, for the DNN, performance was actually best when the full feature set was used without any dimensionality reduction. This suggests that while dimensionality reduction is useful for some models, it may not be beneficial for others, such as the DNN, where reducing the number of features resulted in a slight drop in accuracy.
Despite the promising results, our study has certain limitations that warrant further investigation. Firstly, while our study incorporated basic noise reduction techniques, more sophisticated methods are needed to address the noise interference commonly present in acoustic recordings, particularly in uncontrolled environments. Secondly, although our dataset contains 128 samples, expanding it to include more diverse and temporally varied samples would significantly improve the model's generalizability and robustness across different populations, environments, and COVID-19 variants. Thirdly, while we focused primarily on feature extraction and classification, future research should explore other machine learning models, ensemble methods, and hybrid architectures to enhance reliability and accuracy further. Finally, the real-world clinical application of our model requires validation through prospective studies to ensure its feasibility and utility in front-line healthcare settings, where the model could aid in rapid COVID-19 detection and assist healthcare professionals in decision-making.
Beyond COVID-19, the potential of our methodology extends to diagnosing other respiratory diseases such as influenza, bronchitis, and COPD. Similar to COVID-19, these diseases can cause characteristic changes in nasal breathing sounds and vocal fold vibrations, which can be captured and analyzed using similar acoustic features and machine learning models. Notably, prior studies 24 have demonstrated the utility of sound-based diagnostic methods in identifying anomalies related to respiratory diseases, showing how such models can provide early diagnosis and continuous monitoring in real-world applications. Future research should investigate the applicability of our approach across a broader spectrum of respiratory conditions, enhancing early detection, patient management, and the use of non-invasive, cost-effective methods in diverse healthcare settings. Integrating our approach with wearable devices or smartphone applications can enable continuous, real-time monitoring, significantly improving healthcare accessibility, especially in low-resource settings.
Conclusion
Our study highlights the potential of nasal breathing sounds as a reliable, non-invasive diagnostic tool for detecting diseases like COVID-19 using a novel lightweight Dense-ReLU-Dropout model. By integrating advanced feature selection techniques such as RF and PCA, we achieved 97% accuracy and a 98% F1 score, demonstrating the feasibility of smartphone-based rapid disease detection. This approach has broader applicability to other respiratory illnesses and mobile health technologies, offering a cost-effective solution for real-time diagnostics. Future work should focus on expanding the dataset through multi-center collaborations to enhance generalizability, validating the model across diverse clinical settings, and addressing challenges such as noise interference in real-world environments to ensure robust performance and seamless clinical integration.
Footnotes
Acknowledgements
Thanks to Hailin Ma, Han Lu, Jiayi Guo, and Rui Su for their support during this study. The authors wish to thank the anonymous referees for their thoughtful comments, which helped in the improvement of the presentation.
ORCID iDs
Ethical considerations
This study was approved by the Ethics Committee of Shanghai Sixth People's Hospital Affiliated to Shanghai Jiaotong University School of Medicine (Approval Number: 2022-KY-050(K)) with the informed consent of all participants.
Author contributions
JS contributed to methodology, experiment, software, and writing—review and editing. PL and RL contributed to software, visualization, writing—original draft, review and editing. SL contributed to software and writing—review. LS contributed to software, validation, data curation, and project management. LG contributed to conceptualization, formal analysis, data curation, and visualization. ZT, RS, and ZY contributed to project administration and writing—review and editing. JC, LF, and YJ contributed to inspiration, conceptualization, writing—review and editing, and supervision.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Clinical Research Plan of SHDC, Scientific Development funds for Local Region from the Chinese Government in 2023, Shanghai Acupuncture and Moxibustion Clinical Medical Research Center, Science and Technology Commission of Shanghai Municipality, the Key Research & Development Project of Zhejiang Province, Jilin Province science and technology development plan project, National Natural Science Foundation of China, Shanghai Association of Traditional Chinese Medicine Program, Shanghai Committee of Science and Technology, China, and the 2022 "Chunhui Plan" cooperative scientific research project of the Ministry of Education (grant numbers SHDC2020CR4057, 16CR2026B, XZ202301YD0032C, 20MC1920500, 22DZ2229004, 22JC1403603, 21Y11902500, 2024C03240, 20230204094YY, 82151318, 82004446, 2023-HPZY-04, 23Y11920900D, 22Y11923000, 20Y21901100).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
The code and data used in this study are available upon reasonable request. Interested researchers may contact the corresponding author to obtain access.
