Abstract
Background
COVID-19 can cause long-term symptoms in patients after they overcome the disease. Because the disease mainly damages the respiratory system, these symptoms are often related to breathing problems that can be caused by an affected diaphragm. Diaphragmatic function can be assessed with imaging modalities such as computerized tomography or chest X-ray. However, this assessment requires manual visual inspection by expert clinicians. Moreover, during the pandemic, clinicians were asked to prioritize the use of portable devices to reduce the risk of cross-contamination. The images captured by these devices, however, are of lower quality.
Objectives
The automatic quantification of the diaphragmatic function can help to determine the damage caused by COVID-19 in each patient and to assess their evolution during the recovery period, a task that can also be complemented with lung segmentation.
Methods
We propose a novel, fully automatic multi-task methodology to simultaneously localize the position of the hemidiaphragms and segment the lung boundaries with a convolutional architecture, using portable chest X-ray images of COVID-19 patients. To that end, the hemidiaphragms’ landmarks are located by adapting the heatmap regression paradigm.
Results
The methodology is exhaustively validated with four analyses, achieving an 82.31%
Conclusions
The results demonstrate that the model is able to perform both tasks simultaneously, being a helpful tool for clinicians despite the lower quality of the portable chest X-ray images.
Introduction
COVID-19, caused by severe acute respiratory syndrome coronavirus 2, is a multi-organ infectious disease that can affect many parts of the body but mainly affects the lungs and their surroundings. 1 COVID-19 is an acute disease that can improve or worsen very quickly, and patients can still experience symptoms once they test negative. Those symptoms can last for several days or weeks, as happens with other pathologies such as the common flu, but they can also last indefinitely. When this happens, the patients are diagnosed with persistent post-COVID-19 syndrome (PPCS). 2 In some cases, this syndrome can severely affect the patients’ quality of life. In particular, given that COVID-19 can affect the surroundings of the lungs, it could damage the diaphragm, an important muscle involved in the breathing process. 3 A dysfunctional diaphragm (i.e. a weak or paralyzed diaphragm) can have several root causes apart from PPCS, such as stroke, nervous system diseases (for instance, multiple sclerosis or amyotrophic lateral sclerosis) or problems that affect the phrenic nerve. 4 Given the great importance of the diaphragm in the breathing process, a dysfunction can cause shortness of breath, sleeping disorders and fatigue, among other symptoms. Diaphragm dysfunction has been thoroughly studied in other common pathologies such as chronic obstructive pulmonary disease.5–7 This dysfunction can be assessed by determining the gap between both sides of the diaphragm, often known as hemidiaphragms. This distance can be useful to quantify the breathing capacity of the patient at a given timepoint and to understand its evolution over time. This process can be performed with imaging modalities such as lung ultrasound (LU), 8 chest X-ray 9 or computerized tomography (CT). 10 In particular, chest CT images provide a three-dimensional capture of the explored area with great resolution and level of detail.
However, this imaging modality is more expensive and more difficult to perform. Given that the hardest peaks of the pandemic saturated the healthcare services, the preferred solution was chest X-ray imaging, as it is easier to manage when dealing with a large number of patients in a short amount of time. In addition to these difficulties, during the pandemic it was necessary to prioritize the use of portable chest X-ray devices over fixed machinery. 11 This prioritization was motivated by the fact that portable chest X-ray devices are easier to decontaminate, a critical element to prevent the risk of cross-contamination. Moreover, many critical patients had to remain in bed due to their condition and could not be moved to the radiology room. Portable devices solve this issue because they can be moved to and used wherever the patient is located. Despite all these advantages, the main issue with this kind of device is the low quality and level of detail of the captured images. In this context, computer-aided diagnosis (CAD) systems can be studied as an option to help clinicians quantify the diaphragmatic function of a given patient.
Apart from the assessment of the diaphragmatic function, the segmentation of several structures of interest in the lungs using imaging modalities such as chest X-ray is also a relevant task. The main aim of this task is to automatically obtain the region of interest (ROI) of the image, removing information that could introduce noise into the CAD system. This can help to improve the performance of other tasks such as COVID-19 screening and classification in chest X-ray images, a field that has seen a great number of contributions since the pandemic began (for example,12–19). Several works have been proposed in this context. For example, Aslan 20 proposes a method to diagnose COVID-19 in this kind of imaging modality using a DeepLabV3+ architecture for lung segmentation as part of its pipeline. Other works use the U-Net architecture to perform the lung segmentation, as is the case of Rahman et al. 21 and Vidal et al. 22 In the latter case, the authors use a U-Net model that was pre-trained on a dataset of brain MRI images to segment the lungs in portable chest X-ray images. On the other hand, Alam et al. 23 proposed a modified U-Net architecture that replaces skip connections with bidirectional convolutional long short-term memory modules to perform the lung segmentation task. Regarding the diaphragmatic function in particular, the segmentation of the lungs can help to determine the boundaries and the position of this structure. Therefore, simultaneously performing the localization of the hemidiaphragms’ landmarks and the lung segmentation can help the model to do both tasks more accurately. In this sense, multi-task learning is a framework commonly used in the state-of-the-art that exploits the advantages of training with two or more simultaneous tasks. 24 Given the impact that COVID-19 has had in recent years, many works have explored different problems by proposing multi-task paradigms using chest X-ray images.
For example, Park et al. 25 develop a methodology composed of a shared backbone based on a transformer encoder architecture and two different heads that perform COVID-19 classification and severity assessment simultaneously. Malhotra et al. 26 proposed a model called COMiT-Net. This model has a multi-task structure that simultaneously detects whether an image presents COVID-19 affectation and shows the symptomatic regions with a semantic segmentation. Moreover, the application of multi-task learning has also been explored with datasets of CT images. For example, Polat 27 proposes the use of DeepLabV3+ to segment COVID-19 lesions. The purpose of using multi-task learning is to simultaneously segment those lesions at several levels of detail, ranging from a binary segmentation (distinguishing between lesion and no-lesion) to a more detailed semantic segmentation (distinguishing between different types of lesions, such as consolidation or pleural effusion).
The characterization of the position and the movement of the diaphragm is an important task to assess the diaphragmatic function. The contributions in this specific scope can mainly be found for LU or CT images, but there is also some literature in the field of chest X-ray. For example, Heidari et al. 28 proposed several preprocessing strategies to improve the performance of a convolutional neural network (CNN) trained to detect COVID-19 in chest X-ray. As part of these preprocessing strategies, the authors include a diaphragm removal using a plane threshold. Overall, it can be concluded that none of the works of the state-of-the-art have proposed a methodology to localize the hemidiaphragms’ landmarks with the potential to determine the gap between both structures of interest or related biomarkers. In this regard, it is worth noting that the localization of landmark points is a critical task in many computer vision problems, as is the case with face landmark detection. 29 This also applies to many biomedical imaging problems and modalities. 30 One straightforward approach for landmark detection using deep learning is an end-to-end paradigm that is fed with the input image and returns the coordinates of the detected points. However, this type of paradigm loses part of the strengths of convolutional network architectures, such as local connectivity or weight sharing. To exploit the full capability of this kind of architecture, many biomedical imaging works have used the so-called heatmap regression as part of their pipeline. 31 For example, Silva et al. 32 proposed an automatic pipeline composed of different steps to assess the severity of pectus excavatum in CT images. The main target of this methodology is to perform several measurements on relevant slices, given different landmarks. In particular, the authors use heatmap regression to detect those landmarks. Kirnbauer et al. 33 develop a methodology to detect periapical lesions in cone-beam CT images. For this aim, it is necessary to obtain the coordinates of certain objects, such as the teeth. In this scenario, the heatmap regression method is used to predict the coordinates. In the scope of retinographic imaging, several works have proposed heatmap regression for tasks such as fovea localization or the localization of the center of the optic disc. This is the case of Meyer et al., 34 Hervella et al., 35 Al-Bander et al. 36 and Marin et al. 37
To the best of our knowledge, none of the current state-of-the-art methods address the challenges of localizing the hemidiaphragms’ landmarks in portable chest X-ray images of COVID-19 patients. Portable chest X-ray imaging presents unique challenges compared to fixed imaging, including lower image quality and the potential for patient movement during the imaging process. Additionally, existing methods do not offer a solution for simultaneously localizing the hemidiaphragms’ landmarks and segmenting lungs in other pathologies.
To fill this gap in the literature, we propose a novel fully automatic deep learning methodology that employs a heatmap regression paradigm to simultaneously localize the hemidiaphragms’ landmarks and segment the lungs in chest X-ray images. The generator architecture of our method is based on a fully convolutional network that consists of an encoder and a decoder. The encoder extracts features from the input image, while the decoder generates the output heatmap for the localization of the hemidiaphragms’ landmarks and the binary mask for lung segmentation. We introduce an ensemble loss function that combines the dice loss for lung segmentation and the mean squared error (MSE) loss for localization of the hemidiaphragms’ landmarks to facilitate the learning of the generator.
To validate the feasibility and potential of our approach, we conduct an exhaustive study that includes four different analyses. Through these analyses, we demonstrate the effectiveness and potential of our proposed methodology in addressing the challenges of multi-task localization of the hemidiaphragms’ landmarks and the precise lung segmentation in portable chest X-ray images of COVID-19 patients.
Analysis I: Ablation study to find the most appropriate value of saturation distance for the heatmap regression. This analysis explores the impact of saturation distance on the performance of the model, as this parameter is crucial in heatmap regression. The study involves training and testing the model with different saturation distances and comparing the results to determine the optimal value that produces the best localization of the hemidiaphragms’ landmarks. In this analysis, we also include a statistical test to support the given discussions.
Analysis II: Study of the optimal balance between both tasks with regard to the training process. In this analysis, we investigate the best approach to balance the training process of the two tasks: hemidiaphragms’ landmarks localization and precise lung segmentation. The study aims to determine the optimal ratio of weights assigned to each task during the training process, which results in the best performance of the overall system.
Analysis III: Comparison of the performance obtained by each task separately with the performance achieved when carrying out both tasks simultaneously. This analysis is important to demonstrate the added value of the proposed complete system, as the simultaneous execution of both tasks enables a more efficient and accurate localization of the hemidiaphragms’ landmarks and precise segmentation of the lungs.
Analysis IV: Qualitative discussion of the outputs returned by the system. The goal is to determine the robustness of the model in various scenarios, such as the presence of abnormalities in the chest X-ray images, variations in patient positioning, and the impact of the use of portable chest X-ray images. This analysis also allows for the identification of potential areas for improvement and future work. Moreover, in this analysis, we have also included the study of the activation maps provided by the model using the GradCAM algorithm.
The rest of the article is structured as follows. Firstly, the “Materials and methods” section describes the dataset (“CHUAC dataset” subsection), the overall steps of the methodology (“Methodology” subsection) and the details of the training process (“Network architecture and training details” subsection). After that, the “Results and discussion” section details the results obtained from the experiments, with their corresponding discussion. Finally, the “Conclusions” section discusses the main conclusions drawn from this work and some possible lines of future work.
Materials and methods
In this section, we detail the aspects of the used dataset in the “CHUAC Dataset” subsection and the used software and hardware resources in the “Software and hardware resources” subsection. Moreover, the description of the methodology is shown in the “Methodology” subsection and the details of the training process are described in the “Network architecture and training details” subsection, with special focus on those aspects that differentiate each task and the particular needs for the multi-task paradigm. Finally, the used evaluation metrics are explained in the “Evaluation metrics” subsection.
CHUAC dataset
In this work, we have used a dataset of portable chest X-ray images provided by the Complexo Hospitalario Universitario de A Coruña (CHUAC), specifically designed for the purposes of this work. The dataset is composed exclusively of COVID-19 patients, with a total of 673 images that were captured during the first peak of the pandemic in 2020. The dataset was manually labeled, including the manual segmentation of both lungs and the position of the 2 hemidiaphragms’ landmarks, for a total of 1346 labeled lungs. The images were obtained with 2 different portable machines, whose models are Agfa dr100E and Optima Rx200. Due to the previously mentioned risk of cross-contamination, the captures were performed in isolated medical wings specifically intended to treat COVID-19 patients. The subjects were imaged in supine position with an anterior-posterior projection. To do so, the device has a flexible arm with the X-ray tube that can be placed over the patient. Then, a recorder plate placed under the patient is responsible for obtaining the capture. The resolution of the images ranges from
The current study was approved by the corresponding ethics committee with the code 2020-007. In order to comply with the ethics requirements, all patient data were properly anonymized before being sent to any external collaborator. Moreover, all the images were securely stored in private servers that restricted access to the members of the project. All the processes were performed following a protocol agreement with the hospital board. It is important to note that all the cases were visually inspected by the CHUAC staff to find evidence of COVID-19 affectation. This inspection was corroborated with an RT-PCR test. Some representative examples of the dataset can be seen in Figure 1.

Examples of the CHUAC dataset. First column: original images. Second column: ground truth of the lung regions. Third column: ground truth of the location of the hemidiaphragms’ points.
The used dataset represents a Western countries’ population, more precisely, a subset of the Spanish population located at Galicia. The studied cohort includes a set of patients with a mean age of
To end this section, it is worth noting that portable chest X-ray images have a lower quality and level of detail and usually contain notable artifacts. We nevertheless developed this methodology on such a dataset because previous state-of-the-art works have demonstrated a great capability to handle this kind of input despite these issues.41,42
Software and hardware resources
The methodology presented herein was implemented in Python 3 (Version 3.8.10). The implementation relies on several libraries that are described in Table 1. The main framework chosen for this work was torch alongside torchvision, which make it possible to train and validate computer vision systems based on deep learning models. Both libraries were configured with CUDA support, which speeds up the training and inference processes through hardware acceleration. Moreover, some functionalities were taken from other computer vision and imaging libraries: opencv and scikit-image. Similarly, scikit-learn, a machine learning library, was used to obtain the evaluation metrics. Furthermore, pandas was used to work with CSV files and numpy to work with arrays in Python. The characteristics of the hardware are specified in Table 2. In particular, the experimentation of this work was performed using an NVIDIA Tesla A100 with 2 GPUs of 80 GB each and the driver Version 460.106.00.
Software libraries and versions used to implement the methodology presented in this work.
Hardware resources that were used to execute the implementation of the methodology.
Methodology
In this work, we propose the novel multi-task paradigm that is depicted in Figure 2. This paradigm simultaneously performs task I of heatmap regression (from which the landmarks of the hemidiaphragms are later localized) and task II of precise lung segmentation. This methodology requires adapting the network architecture and defining a loss function for each task. All these aspects are detailed in this section.

Description of the different tasks performed in the methodology, being heatmap regression task I (that is then used for the detection of the points of the hemidiaphragms) and precise lung segmentation task II.
Heatmap regression (localization of the landmarks of the hemidiaphragms). Given the coordinates of an arbitrary landmark, the heatmap is computed as follows. First, the distance between the coordinates of each pixel in the image and the coordinates of the target landmark is computed. Several metrics can be used for this purpose, such as the Minkowski, 43 Mahalanobis 44 or cosine 45 distances. In particular, for the methodology proposed herein, we have adapted the paradigm proposed by Hervella et al., 35 as we considered it the most closely related to our proposal. This paradigm uses the Euclidean distance, a commonly employed approach in regression problems and a recognized method in the state-of-the-art both for medical imaging and other domains46–48. Consequently, the heatmap is obtained by computing the Euclidean distance between each point of the image and the target. The Euclidean distance is calculated as expressed in equation (1):
Nevertheless, the issue of using the expression of the Euclidean distance is that it can give an excessive importance to the distant pixels, an aspect that could lead the model to have a worse performance. To avoid this effect, an exponential decay is applied. In this way, the closest pixels will be given a great importance, while this value will saturate for the most distant pixels. Then, the heatmap will be calculated following the formula stated in equation (2):

Examples of how the saturation distance
Finally, the model will be trained comparing the ground truth with the predicted output using the expression of the MSE loss. This predicted output is an image with the same resolution as the input, given the architectural design that is being used, where the intensity value of each pixel represents its probability of being the actual target point.
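The construction described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the decay `exp(-d / sigma)` is an assumed stand-in for the expression of equation (2), the names `make_heatmap`, `sigma`, `mse_loss` and `heatmap_to_landmark` are hypothetical, and taking the brightest pixel as the predicted landmark is an assumed decoding rule rather than the one confirmed by this work.

```python
import numpy as np


def make_heatmap(shape, landmark, sigma):
    """Build a target heatmap for one landmark.

    shape: (height, width) of the image.
    landmark: (row, col) coordinates of the target point.
    sigma: hypothetical saturation distance in pixels (assumed decay scale).
    """
    rows, cols = np.indices(shape)
    # Euclidean distance from every pixel to the landmark.
    dist = np.sqrt((rows - landmark[0]) ** 2 + (cols - landmark[1]) ** 2)
    # Assumed exponential decay: 1 at the landmark, tending to 0 far away.
    return np.exp(-dist / sigma)


def mse_loss(pred, target):
    """Mean squared error between two heatmaps of equal shape."""
    return float(np.mean((pred - target) ** 2))


def heatmap_to_landmark(heatmap):
    """Return the (row, col) of the maximum response (assumed decoding rule)."""
    r, c = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(r), int(c)


target = make_heatmap((64, 48), landmark=(40, 10), sigma=8.0)
shifted = make_heatmap((64, 48), landmark=(42, 12), sigma=8.0)
print(heatmap_to_landmark(target))     # (40, 10)
print(mse_loss(target, target))        # 0.0
print(mse_loss(shifted, target) > 0)   # True
```

In this sketch, a smaller `sigma` concentrates the response around the landmark, while a larger one spreads it over more distant pixels, which is the trade-off that the saturation distance controls.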
Therefore, denoting the loss of the obtained heatmap regression to localize the hemidiaphragms’ landmarks as
Lung segmentation. The segmentation loss, denoted as
For the multi-task learning, it is necessary to define a joint expression that merges the losses of the two proposed tasks. The main aim of this expression is to balance the importance that is given to each task. This balance becomes more important due to the fact that the loss of each task can range in a different magnitude. For that reason, two weight values must be defined,
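As a minimal sketch of how such a joint expression could look, the following code combines an MSE term for the heatmap regression with a Dice term for the segmentation through a weighted sum. The weighted-sum form, the soft Dice formulation and the names `w_heatmap` and `w_seg` are assumptions made for illustration only.

```python
import numpy as np


def mse_loss(pred, target):
    """Mean squared error between two arrays of equal shape."""
    return float(np.mean((pred - target) ** 2))


def dice_loss(pred_mask, true_mask, eps=1e-7):
    """Soft Dice loss, 1 - 2|A.B| / (|A| + |B|), for values in [0, 1]."""
    intersection = np.sum(pred_mask * true_mask)
    total = np.sum(pred_mask) + np.sum(true_mask)
    return float(1.0 - (2.0 * intersection + eps) / (total + eps))


def joint_loss(pred_heatmap, true_heatmap, pred_mask, true_mask,
               w_heatmap, w_seg):
    """Assumed weighted sum of the two task losses."""
    return (w_heatmap * mse_loss(pred_heatmap, true_heatmap)
            + w_seg * dice_loss(pred_mask, true_mask))


true_mask = np.zeros((4, 4))
true_mask[1:3, 1:3] = 1.0
pred_heatmap = np.zeros((4, 4))
true_heatmap = np.full((4, 4), 0.5)

# Perfect segmentation, imperfect heatmap: only the MSE term contributes.
print(joint_loss(pred_heatmap, true_heatmap, true_mask, true_mask,
                 w_heatmap=0.5, w_seg=0.5))
```

Because the two terms can differ by orders of magnitude, the weights act both as a priority between tasks and as a rescaling of their losses.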
Network architecture and training details
Regarding the network architecture, we adapt the original U-Net structure 49 given its suitability for medical imaging tasks. This architecture is detailed in Figure 4. The U-Net has an encoder–decoder structure. The encoder part has four downsampling blocks (blocks 1–4 in the diagram), while the decoder is composed of four upsampling blocks (blocks 6–9 in the diagram). The aim of block 5 is to join both parts. Each downsampling block has two convolutional layers (with a kernel of size

Diagram of the U-Net architecture, adapted for the multi-task paradigm proposed in this work. This architecture is composed of 10 different parts, with four downsampling blocks (encoder), four upsampling blocks (decoder), a block that joins the encoder with the decoder and a final 10th block with a head for each performed task.
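As a rough aid for reading the diagram, the following sketch traces how the spatial size of a feature map changes through four downsampling and four upsampling blocks. It assumes that each downsampling step halves the resolution, that each upsampling step doubles it, and that the convolutions are padded so that they preserve the size. None of these details are confirmed here, and `trace_unet_sizes` is a hypothetical helper.

```python
def trace_unet_sizes(height, width, depth=4):
    """List the (height, width) after each block of an assumed U-Net.

    The first entry is the input size, followed by `depth` halvings
    (encoder) and `depth` doublings (decoder).
    """
    sizes = [(height, width)]
    h, w = height, width
    for _ in range(depth):
        h, w = h // 2, w // 2   # assumed 2x downsampling
        sizes.append((h, w))
    for _ in range(depth):
        h, w = h * 2, w * 2     # assumed 2x upsampling
        sizes.append((h, w))
    return sizes


sizes = trace_unet_sizes(512, 512)
print(sizes[4])    # (32, 32), the bottleneck
print(sizes[-1])   # (512, 512), same as the input
print(trace_unet_sizes(500, 500)[-1])  # (496, 496)
```

Under these assumptions, the output matches the input resolution only when both dimensions are divisible by 16, which is one reason why inputs to this kind of architecture are commonly resized or padded beforehand.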
The pipeline of the training process for each task was inspired by previous similar works50–52. In particular, training was performed for 200 epochs, with a learning rate of
Finally, in order to provide explainability in our study, we have also included a qualitative evaluation of the activation maps of the model. More precisely, we have considered the gradient-weighted class activation mapping algorithm (Grad-CAM or GradCAM). 54 Furthermore, it is important to note that, in this study, we report the activation map obtained at the output of the fourth encoder block.
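The core computation of Grad-CAM can be sketched as follows, assuming that the feature maps of the chosen layer and the gradients of a scalar score with respect to them have already been extracted (for instance, with framework hooks). How that scalar score is derived from the heatmap or segmentation output is not specified here, so the inputs in this example are synthetic and the function name `grad_cam` is hypothetical.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Compute a Grad-CAM map from precomputed tensors.

    activations, gradients: arrays of shape (channels, height, width).
    """
    # Channel weights: global average of the gradients over the spatial axes.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of the feature maps followed by ReLU.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


activations = np.zeros((2, 3, 3))
activations[0, 1, 1] = 2.0   # channel with positive influence
activations[1, 0, 0] = 4.0   # channel with negative influence
gradients = np.stack([np.ones((3, 3)), -np.ones((3, 3))])

cam = grad_cam(activations, gradients)
print(cam[1, 1], cam[0, 0])  # 1.0 0.0
```

The resulting low-resolution map is typically upsampled to the input size and overlaid on the image as a color map.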
Evaluation metrics
To analyze the capabilities of the trained models, we use the metrics that are usually considered in the state-of-the-art. Given that the nature of the two performed tasks is different, the evaluation metrics are specific to each case. With regard to task I, the localization of the landmarks of the hemidiaphragms is evaluated using an approach similar to the one proposed by Marin et al. 37 In particular, the accuracy is measured as the number of predicted points whose distance to the ground truth falls below a threshold, divided by the total number of points. To make the analysis more exhaustive, several progressively stricter thresholds are considered.
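The thresholded accuracy described above can be illustrated with the short sketch below. The function name `localization_accuracy`, the example coordinates and the reference distance `R = 10` pixels are hypothetical. The thresholds R, R/2, R/5 and R/10 follow those reported in the results, and a strict inequality is assumed for "below".

```python
import numpy as np


def localization_accuracy(pred_points, true_points, threshold):
    """Fraction of predictions closer than `threshold` to the ground truth."""
    pred = np.asarray(pred_points, dtype=float)
    true = np.asarray(true_points, dtype=float)
    # Euclidean distance error of each predicted landmark.
    errors = np.linalg.norm(pred - true, axis=1)
    return float(np.mean(errors < threshold))


pred = [(10, 10), (20, 24), (30, 30), (43, 44)]
true = [(10, 10), (20, 20), (30, 31), (40, 40)]   # errors: 0, 4, 1, 5
R = 10.0  # hypothetical reference distance in pixels

for name, thr in [("R", R), ("R/2", R / 2), ("R/5", R / 5), ("R/10", R / 10)]:
    print(name, localization_accuracy(pred, true, thr))
# R 1.0
# R/2 0.75
# R/5 0.5
# R/10 0.25
```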
In the case of lung segmentation, we considered the same metrics as Vidal et al. 22 : area under the ROC curve (AUC-ROC), accuracy, precision, recall (heavily used metrics in previous biomedical studies55,56), Dice coefficient, and Jaccard index. Denoting TP as the true positives, TN as the true negatives, FP as the false positives, FN as the false negatives,
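For reference, the count-based metrics listed above follow their standard definitions, which can be written as a small helper. The function name `segmentation_metrics` and the example counts are hypothetical. AUC-ROC is omitted because it requires the predicted scores rather than the four counts, and the sketch does not guard against empty denominators.

```python
def segmentation_metrics(tp, tn, fp, fn):
    """Standard pixel-wise metrics from the confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),
        "jaccard": tp / (tp + fp + fn),
    }


metrics = segmentation_metrics(tp=80, tn=100, fp=10, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The Dice coefficient and the Jaccard index are monotonically related through Dice = 2J / (1 + J), so both rank segmentations identically even though their values differ.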
Results and discussion
In this section, we present the obtained results of the different designed experimentation. In particular, we performed four different analyses. The first analysis aims to find the most satisfactory saturation distance value (
Analysis I: Ablation study to find the most satisfactory saturation distance value
In this analysis, we study the impact of the saturation distance (
Results of the analysis I, showing the performance of the localization of the hemidiaphragms’ landmarks given the saturation distance
From
The model keeps improving for
Overall, the results presented in this analysis demonstrate that the saturation distance
Apart from the presented results, this analysis also includes a statistical test. To that end, we have used the Wilcoxon signed-rank test. This non-parametric test for paired samples assesses whether two series of data differ significantly: if the p-value is greater than 0.05, the null hypothesis of no difference cannot be rejected, whereas if the p-value is smaller than or equal to 0.05, the null hypothesis is rejected and the difference is considered statistically significant. To evaluate the ablation study presented in this analysis, the experiments have been grouped by their value of
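To make the mechanics of the test concrete, the following standard-library sketch computes the signed-rank statistic and a two-sided p-value with the normal approximation. It is not the implementation used in this work: in practice a library routine such as `scipy.stats.wilcoxon` would normally be preferred, and the error values below are invented for illustration.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (statistic, p_value). Zero differences are discarded and tied
    absolute differences receive their average rank. No tie or continuity
    correction is applied, so p-values are approximate for small samples.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of equal absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    statistic = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (statistic - mean) / std
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return statistic, p_value


# Hypothetical per-image distance errors for two configurations.
errors_a = [10, 12, 14, 16, 18, 20]
errors_b = [9, 10, 11, 12, 13, 14]
stat, p = wilcoxon_signed_rank(errors_a, errors_b)
print(stat, p < 0.05)  # 0 True
```

Because the test is paired, both series must contain the errors of the same images in the same order.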
Results obtained from performing the Wilcoxon signed-rank test comparing the distance errors among the different
For each point in the matrix, the symbol
Analysis II: Analysis of the balance between tasks in the multi-task paradigm
Given that the performed tasks are closely related but different, it is critical to balance the contribution that each one makes to the loss. Setting the weights of the losses is also important because each loss can have a different magnitude. For that reason, we have designed an exhaustive analysis to determine the most balanced configuration. In total, we have performed eight different experiments, each one with a particular balance of the loss components. In particular, each experiment progressively gives more weight to lung segmentation while giving less importance to the heatmap regression. Denoting
Firstly, Table 5 shows the results of the localization of the hemidiaphragms’ landmarks given the different configurations of loss weights. There, it can be seen that, for the two least restrictive thresholds (R and R/2), the mean combined accuracy is at least 99.03% in all cases. It is interesting to remark this aspect in the case of
Results of the analysis II regarding the localization of the hemidiaphragms’ landmarks in the multi-task paradigm, showing the performance of this task given different combinations of loss weights. The highest performing configuration for each threshold is highlighted in bold.
With regard to the most restrictive thresholds (R/10 and R/5) the differences in accuracy are more noticeable. The combined mean accuracy improves from 60.82% to 76.72% with
With regard to the task of precise lung segmentation, the global results of this ablation study in terms of Dice score can be seen in Table 6. Overall, it can be seen that all configurations are appropriate for the lung segmentation task, given that the lowest value of Dice is 0.9665
Results of the analysis II regarding the task of precise lung segmentation given the loss weights of each task in the multi-task paradigm. The results of the highest performing configuration are highlighted in bold.
Analysis III: Comparison between the performance of the tasks separately and simultaneously
For this third analysis, we compare the performance of the tasks (hemidiaphragms’ landmarks localization and precise lung segmentation) when they are performed individually and when they are complemented reciprocally. To do so, we consider the best performing configuration for the heatmap regression task,

Training and validation losses evolution for the multi-task paradigm. The losses are shown with a logarithmic scale to improve their visualization. (a) Evolution of the joint loss, (b) evolution of the loss for the heatmap regression task, and (c) evolution of the loss for the segmentation task.
On the other hand, the evolution of the losses for the heatmap regression when it is performed individually can be seen in Figure 6. This evolution shows an improvement of the training and validation losses until around epoch 75, where the validation loss starts to stabilize. Moreover, the comparison of the performance obtained when the task is carried out individually and when it is complemented with the precise lung segmentation is shown in Table 7. There, it can be seen that the accuracy values are very similar. In particular, when the tasks complement each other, the performance is slightly worse in terms of combined mean accuracy for the thresholds R and R/10. However, there is a slight performance improvement for the thresholds R/2 and R/5.

Evolution of the loss during the learning process for the heatmap regression given a saturation distance of
Comparison of the results obtained when the localization of the hemidiaphragms’ landmarks is performed individually and when it is complemented with the precise lung segmentation given the most appropriate configuration of the balance between tasks. The highest performances for each threshold are highlighted in bold.
With regard to the precise lung segmentation, the evolution of the training and validation losses can be seen in Figure 7. There, it can be seen that the validation loss keeps improving until it stabilizes around epoch 50. On the other hand, the comparison of the performance when the task is conducted individually and when it is complemented with the heatmap regression is shown in Table 8. From there, it can be concluded that the multi-task paradigm obtains a competitive performance, with close values in all metrics and a slight improvement in terms of AUC, with a rise from 97.66%

Evolution of the training and validation losses for the task of precise lung segmentation. The values are shown in logarithmic scale to improve their visualization.
Comparison of the results obtained in the task of precise lung segmentation when it is performed individually and when it is complemented with the heatmap regression, using the most suitable configuration of the balance between tasks. The best performing results are highlighted in bold.
Analysis IV: Qualitative analysis of the obtained results
In this last analysis, we study the obtained results from a qualitative point of view. For this specific analysis, we have selected the best model that was chosen in analysis III given its suitability for the localization of the hemidiaphragms’ landmarks:

Examples of the multi-task paradigm proposed in this work. Each column represents a particular example. First row: input images that are fed to the model. Second row: precise lung segmentation results. Third row: results of the heatmap regression. Fourth row: simultaneous segmentation and the results of the localization of the hemidiaphragms’ landmarks. The blue point denotes the ground truth and the red point denotes the point predicted by the model.
Regarding the evaluation of the obtained activation maps, some representative examples can be seen in Figure 9. There, it is shown that the model gives a greater activation to the regions of the image located around the lower part of the lungs, with a noticeable (but much lower) level of activation in the contours of the lungs. This reveals the main priorities of the model, which gives more importance to locating the points of the hemidiaphragms than to the contour of the lungs (very relevant for the lung segmentation).

Representative activation maps of input chest X-ray images using GradCAM, where the redder tones represent the activations with a higher intensity and the darker blue tones represent the activations with a smaller intensity.
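As an illustration of how such activation maps are typically computed, the following is a minimal NumPy sketch of the Grad-CAM weighting step. It assumes that the feature maps of a chosen convolutional layer and the gradients of the target output with respect to them have already been extracted from the network; the function name `grad_cam` and the array layout are assumptions made for this example and do not reproduce the implementation used in this work.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Combine feature maps into a single class activation map.

    activations: array of shape (C, H, W), feature maps of one layer.
    gradients:   array of shape (C, H, W), gradients of the target
                 output with respect to those feature maps.
    Returns an (H, W) map scaled to [0, 1].
    """
    # Channel weights: global average of the gradients over space.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)
    # Keep only the regions with a positive influence on the target.
    cam = np.maximum(cam, 0.0)
    # Normalize so that the strongest activation equals 1.
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The resulting map is usually resized to the input resolution and overlaid on the chest X-ray image with a color map, where the redder tones correspond to values close to 1.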
Comparison with the state-of-the-art
Regarding the comparison with other works, some important challenges must be clarified. Our contribution provides a unique, fully automatic multi-task approach that simultaneously identifies the location of the hemidiaphragms and achieves a precise lung segmentation. However, despite performing both tasks, the primary innovation of our work resides in the detection of the hemidiaphragms’ landmarks, while the lung segmentation serves as an auxiliary task. Given the pioneering nature of our work, a direct comparison with existing methods is difficult, primarily because of the lack of datasets with manual ground truth labels that fit the specific tasks performed in this work. Nevertheless, a comparison with previous lung segmentation methods can be performed. In particular, our work achieves a Dice score of 0.9688 (complementing the task of lung segmentation with the heatmap regression), while the work of Vidal et al. 22 , which uses portable chest X-ray images from a dataset similar to the one considered in this work, reports a Dice score of 0.9447 using. Moreover, other works with different public datasets of images obtained with fixed chest X-ray devices achieve a Dice score of 0.9421 21 and around 0.9500 23 . The comparison indicates that our results are consistent with the metrics reported by established research, and even slightly higher than those of other similar approaches. Once again, it is necessary to point out that this comparison has been made under different conditions, with different datasets and experimental schemes and, despite trying to make the fairest comparison possible, it must be taken cautiously. On the other hand, regarding the localization of the hemidiaphragms’ landmarks, the unique premise of our research means that there are no directly comparable studies in the existing literature.
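For reference, the Dice score used in this comparison can be computed from a predicted and a ground truth binary mask as sketched below. This is a minimal example under the standard definition of the metric (twice the overlap divided by the sum of both mask sizes); the function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions of this sketch.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    # Number of pixels labeled as lung in both masks.
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    # Assumed convention: two empty masks agree perfectly.
    if total == 0:
        return 1.0
    return float(2.0 * intersection / total)
```

A value of 1 indicates a perfect overlap and a value of 0 indicates no overlap at all.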
Limitations of the study
This study has several limitations. Firstly, the dataset used is representative of a very particular demographic group. Small adaptations of the methodology may therefore be necessary if another dataset has characteristics very different from those presented in our study. Secondly, regarding the clinical relevance, although this automatic computational methodology was developed together with the clinical professionals to whom it is directed, there are still some points that must be analyzed. This is necessary to ensure that the methodology can properly replace, in daily practice, the manual, tedious, time-consuming and error-prone process that the clinicians currently follow. In the presented work, aspects like the user-friendliness and other elements that could be a barrier for adoption are left undiscussed. Moreover, the potential final clinical applications, like the diaphragmatic function quantification or the extraction of clinically relevant biomarkers from the lungs, are left unexplored in this work. Finally, although the exploration of different error metrics for the localization of the hemidiaphragm points is interesting, this work has only evaluated the performance obtained with the Euclidean distance, which was taken as the naive approach.
Conclusions
In this work, we have proposed a novel fully automatic methodology to simultaneously predict the location of representative hemidiaphragms’ landmarks and precisely segment the lungs in portable chest X-ray images from COVID-19 patients following a multi-task paradigm. The prediction of the location of the hemidiaphragms’ landmarks relies on heatmap regression, a method that predicts the likelihood of an arbitrary pixel of the image being the actual target point. The precise lung segmentation was performed in an end-to-end fashion. For the aims of this work, the U-Net architecture was adapted to include two output heads, one corresponding to heatmap regression and another corresponding to precise lung segmentation. To study the suitability of this method, four different analyses were performed. The first analysis studied the most appropriate saturation distance value for the heatmap regression (which directly corresponds to the heatmap extension). The second analysis studied the most appropriate balance between the two tasks within the multi-task paradigm. The third analysis compared the results obtained in each task individually (i.e. heatmap regression and lung segmentation independently, without the additional contribution of the other task) with the scenario in which the tasks complement each other. Finally, in the fourth analysis, we studied the outputs of the model from a qualitative point of view. The results obtained in this work demonstrate the feasibility of localizing the landmarks of the hemidiaphragms and the regions of interest of the lungs in chest X-ray images, which can improve the performance of other tasks, such as automatic screening.
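To make the idea of heatmap regression more concrete, the following sketch builds a target heatmap around a landmark and recovers the landmark from a heatmap by taking its maximum. The linear decay that reaches zero at the saturation distance is an assumption made purely for illustration, as are the function names `make_heatmap` and `decode_landmark`; the exact heatmap profile used in this work is not reproduced here.

```python
import numpy as np

def make_heatmap(shape, landmark, saturation_distance):
    """Target heatmap with value 1 at the landmark (row, col).

    Assumed profile: the value decays linearly with the Euclidean
    distance to the landmark and is 0 from saturation_distance on.
    """
    rows, cols = np.indices(shape)
    dist = np.hypot(rows - landmark[0], cols - landmark[1])
    return np.clip(1.0 - dist / saturation_distance, 0.0, 1.0)

def decode_landmark(heatmap):
    """Return the (row, col) of the pixel with the highest likelihood."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(row), int(col)
```

Under this assumption, a larger saturation distance produces a wider heatmap, which gives the network more non-zero pixels to learn from at the cost of a less sharply defined peak.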
It is remarkable that this high performance has been obtained despite feeding the system with portable chest X-ray images, which provide a lower quality and level of detail than those obtained with fixed machinery and which present a great variability in the position of the patients, given that they can be placed less precisely. To the best of our knowledge, this is the first work that simultaneously performs both the localization of the hemidiaphragms’ landmarks and the precise lung segmentation using a CNN architecture.
The proposed methodology has great potential in the clinical context, as it could help to perform relevant analyses in the field of COVID-19 and other pulmonary pathologies, given the importance of evaluating the diaphragm and other relevant parts of the lung anatomy, such as the parenchymal tissue, to measure the extent of the damage in some pulmonary diseases. With this methodology, the clinicians could be assisted rapidly and accurately when dealing with large patient populations. In particular, the methodology could be used, among other tasks, to quantify the diaphragmatic function and to extract relevant biomarkers indicative of pathological scenarios. Along the same lines, the results presented herein, despite being mainly intended for post-COVID-19 studies, could also be taken as a reference to perform similar studies with other pathologies or even with different medical imaging modalities and devices. Another possible area of future work is the evaluation of the performance using error metrics other than the Euclidean distance, such as the Minkowski, Mahalanobis, or cosine distances.
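As a reference for the metrics mentioned above, the following sketch computes the four distances between a predicted and a ground truth landmark given as coordinate vectors. The function names are chosen for this example, and the covariance matrix required by the Mahalanobis distance is assumed to be supplied by the user (for instance, estimated from the annotated landmark positions); none of this has been evaluated in this work.

```python
import numpy as np

def euclidean(p, q):
    """Straight-line distance between two points."""
    return float(np.linalg.norm(p - q))

def minkowski(p, q, order):
    """Minkowski distance; order=1 is Manhattan, order=2 is Euclidean."""
    return float(np.sum(np.abs(p - q) ** order) ** (1.0 / order))

def mahalanobis(p, q, cov):
    """Distance scaled by a covariance matrix supplied by the user."""
    diff = p - q
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

def cosine_distance(p, q):
    """One minus the cosine of the angle between two non-zero vectors."""
    return float(1.0 - (p @ q) / (np.linalg.norm(p) * np.linalg.norm(q)))
```

Note that the cosine distance compares the direction of the two coordinate vectors with respect to the origin and not their offset in pixels, so its interpretation differs from that of the other three metrics.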
Footnotes
Acknowledgements
The authors express their gratitude to all the institutions that have provided support for this endeavor and are detailed in the funding section.
Contributorship
DIM was involved in methodology, software, validation, writing–original draft, and visualization. JM contributed to methodology, validation, writing–review and editing, visualization, and supervision. SA was involved in review and editing. JJ took part in review and editing, and supervision. JN contributed to conceptualization, validation, writing–review and editing, and supervision. MO took part in conceptualization, supervision, project administration, and funding acquisition.
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article. JJ reports fees from Boehringer Ingelheim, Roche, NHSX, Takeda, and GlaxoSmithKline unrelated to the submitted work. JJ was supported by the Wellcome Trust [209553/Z/17/Z] and by the NIHR UCLH Biomedical Research Centre, UK.
Ethics approval and consent to participate
The presented study was approved by the local ethics committee of the “Sistema Público de Saúde de Galicia” (approval number 2020-007). Informed consent to participate was obtained from all participants.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Ministerio de Ciencia e Innovación, Government of Spain through the research project with [grant numbers PID2019-108435RB-I00, TED2021-131201B-I00, and PDC2022-133132-I00]; Consellería de Educación, Universidade, e Formación Profesional, Xunta de Galicia, Grupos de Referencia Competitiva, [grant number ED431C 2020/24], predoctoral grant [grant number ED481A 2021/196]; CITIC, Centro de Investigación de Galicia [grant number ED431G 2019/01], receives financial support from Consellería de Educación, Universidade e Formación Profesional, Xunta de Galicia, through the ERDF (80%) and Secretaría Xeral de Universidades (20%). This research was funded in whole or in part by the Wellcome Trust [209553/Z/17/Z]. For the purpose of open access, the author has applied a CC-BY public copyright licence to any author accepted manuscript version arising from this submission.
Guarantor
MO
