Abstract
Idiopathic pulmonary fibrosis (IPF) is a chronic, progressive, and heterogeneous interstitial lung disease with a median survival of 2–5 years. Although newly published guidelines have improved diagnosis, predicting the prognosis of IPF remains a challenge. Several recent studies have attempted to build prognostic models by extracting predictive variables from pulmonary function data, basic information, or chest computed tomography (CT), and by combining CT-derived parameters with clinical characteristics. Artificial intelligence (AI) algorithms, including principal component analysis, support vector machine, random survival forest, and convolutional neural network, can be applied to each step of IPF prognostic model construction, namely region of interest extraction, image feature selection, clinical feature selection, and model construction. Compared with human visual assessment, AI algorithms extract deep features more efficiently and with lower inter-observer variation. This review therefore provides a comprehensive evaluation of CT in IPF prognostic models and discusses the role of AI in constructing them. Potential improvements of AI in CT assessment, including time-series CT analysis, optimization of AI algorithms, utilization of multi-modal data, and discovery of new biomarkers through unsupervised algorithms, could make prognostic assessment of IPF patients more accurate and convenient. This review describes the status quo and future direction of AI applications in CT analysis for prognostic models of IPF.
Take home message
This review summarizes the applications of CT and AI algorithms in prognostic models of IPF and the procedures of model construction. It outlines the current limitations and prospects of AI-aided models and helps clinicians understand AI algorithms and apply them more widely in clinical work.
Introduction
Idiopathic pulmonary fibrosis (IPF) is the most common fibrotic interstitial lung disease (ILD) and has chronic, progressive, and heterogeneous features. While some IPF patients deteriorate steadily over the course of the disease, others remain relatively stable. 1 The incidence of IPF ranges from 2 to 30 per 100,000 people and the prevalence ranges from 10 to 60 cases per 100,000 people. 2 These figures correspond to approximately 130,000 patients in America, 640,000 patients in East Asia, and roughly 3,000,000 worldwide.3–5 As the disease progresses, patients experience dyspnea and hypoxemia, with poor quality of life and a high risk of complications such as acute exacerbations and pulmonary hypertension. Generally, the median survival time of IPF ranges from 2 to 5 years. 1 Factors related to a poor prognosis include older age, male sex, increased dyspnea, and worse physiological abnormalities.
Because recognition of this disease is limited, early diagnosis and prevention remain crucial. IPF is usually diagnosed by identifying a radiological or histological pattern of usual interstitial pneumonia (UIP) without clear evidence of an alternative cause. 6 A primary challenge for clinicians in diagnosis remains the exclusion of other known causes, such as connective tissue disease-related ILD (CTD-ILD) and interstitial pneumonia with autoimmune features. 7
The IPF guideline published in 2022 suggested that lung biopsy is not recommended for patients whose high-resolution computed tomography (HRCT) shows a UIP or probable UIP pattern, 8 indicating the great value of HRCT for IPF diagnosis. In addition, a series of studies demonstrated the value of chest computed tomography (CT) in IPF prognosis.9–12 This review provides a comprehensive evaluation of CT in IPF prognostic models and discusses the role of artificial intelligence (AI) in constructing prognostic models.
Evaluation of CT in IPF prognostic model
As CT plays a vital role in predicting IPF prognosis, recent studies have tried to integrate it into models to improve their performance and accuracy. The clinical, radiological, and physiological (CRP) model by King et al. 13 combined extracted CT features with clinical features and pulmonary function indices. Oda et al. 14 quantified the extent of fibrosis on HRCT and proposed a ΔHRCT scoring system that measured the change in the HRCT fibrosis score (FS) from baseline to follow-up. They assessed HRCT at baseline and again after 6 and 12 months. Patients whose HRCT scores increased more showed worse outcomes than those with relatively stable scores. Inspired by the Gender-Age-Physiology (GAP) model, Ley et al. 15 replaced the carbon monoxide diffusing capacity (DLCO) in the GAP model with the CT-derived FS and proposed the CT-GAP model, which performed similarly to the GAP model, suggesting that FS could substitute for DLCO when the latter is unavailable.
In addition to traditional visual features, other radiographic characteristics, such as the ratio of the diameter of the pulmonary artery to that of the aorta (PA:A), have been suggested as potential indicators of outcome in IPF patients. 16 Yagi et al. 17 showed that PA:A and mean pulmonary artery pressure (mPAP) could indicate a worse prognosis, with an area under the curve (AUC) of 0.75. Jacob et al. 12 proposed that vessel-related structures (VRSs) measured automatically on CT could predict IPF prognosis and help researchers reduce sample sizes in IPF drug trials. Moreover, Nakagawa et al. 11 used the quantitative CT-derived honeycombing area, a vital feature of IPF, to predict mortality. Loeh et al. 10 measured densitometry in CT assessments and found that densitometry-derived parameters were linked to patients’ pulmonary function and mortality. In addition, serial CT allows clinicians to monitor and quantify disease progression: Jacob et al. 9 calculated the annualized change in VRS parameters, showed that it was weakly associated with forced vital capacity (FVC), and considered it a strong predictor of IPF prognosis.
Therefore, CT images are crucial in assessing and predicting IPF prognosis. Various visual features observed in CT scans provide valuable insights into disease severity and serve as risk factors. Honeycombing and reticular opacity are the two most critical features of IPF, as both indicate the extent and severity of fibrotic lung involvement. Other features, such as traction bronchiectasis, emphysema, and pulmonary artery hypertension, also contribute to the overall risk in IPF patients.
Visual assessment may not objectively and accurately calculate the area of specific patterns. Variation among observers impairs objectivity and introduces potential interference into the assessments. 18 Thus, AI algorithms play a crucial role in automated identification, partitioning, and computer-assisted diagnosis in IPF prognostic models.
Applications of AI algorithms in the procedures of IPF prognostic model construction
To construct a prognostic model for IPF, the first step is to acquire a comprehensive dataset. A high-quality dataset requires ample data, minimal noise, careful data cleaning, and accurate labels derived from representative samples. 19 The dataset is then partitioned into training and testing sets. The subsequent steps involve the evaluation of CT, which primarily includes image segmentation, feature extraction, and analysis. For further analysis, it is common to segment the region of interest (ROI) that is most closely associated with the prognosis, such as the fibrosis area in IPF. Automated and semi-automated segmentation methods, such as U-net, nnU-net, and deeplab, can help delineate the ROI effectively.20–23 These models are convolutional neural networks (CNNs), which use convolution to extract relevant features by applying small filters to local regions.
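The idea of applying a small filter to local regions can be illustrated with a minimal NumPy sketch. It is not the implementation of any of the segmentation networks named above; the function name `conv2d_valid`, the toy image, and the hand-written edge filter are assumptions introduced purely for illustration. In a real CNN the filter weights are learned from data rather than set by hand.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a small filter over a 2D image (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped,
    so this is strictly a cross-correlation.
    """
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value depends only on a local kh x kw patch
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Toy 5x5 "slice" with a vertical edge between columns 1 and 2 (illustrative only)
toy_slice = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# Hypothetical 3x3 filter that responds to left-to-right intensity increases
edge_filter = np.array([[-1, 0, 1]] * 3, dtype=float)
feature_map = conv2d_valid(toy_slice, edge_filter)
```

The resulting feature map is large where the filter overlaps the edge and zero in the uniform region, which is the sense in which a convolutional layer "extracts" a local feature.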
Radiomics is a method that obtains many quantitative features from an image, enabling the automated classification of medical images into predefined categories. 24 The radiomic features, different from clinical indicators, can uncover deeper patterns and characteristics of images and reduce the need for unnecessary biopsies.
As the number of features extracted through radiomics may be enormous, feature engineering and dimensionality reduction must be applied. Standard dimensionality reduction algorithms include isometric mapping, principal component analysis, and locally linear embedding. These algorithms project data onto a lower-dimensional space while retaining as much information (such as variance, local structure, and geodesic distances) as possible. In addition, commonly used clinical features, including pulmonary function, 25 routine blood tests, blood gas analysis, and tumor markers, can be selected according to clinical expertise or machine learning algorithms. Depending on the characteristics of the data, other algorithms, such as k-means and support vector machine (SVM), may be used to select predictive features for prognosis. SVM separates different classes of data by constructing a hyperplane in a high-dimensional or infinite-dimensional space. k-means clustering is a widely used unsupervised machine learning algorithm that aims to partition a dataset into k distinct clusters. 26
Feature selection and model development are then performed and evaluated in the training set. Available methods include traditional approaches such as Cox regression and Kaplan-Meier analysis, as well as newer AI algorithms such as random forest (RF), random survival forest (RSF), gradient boosting decision tree (GBDT), and artificial neural network (ANN). Different algorithms identify important features in different ways: p values and hazard ratios in Cox regression, the splitting of features at nodes in RF and RSF, feature contribution scores (the frequency or depth at which a feature is used in building trees) in GBDTs, and weight adjustments (based on backpropagation and gradient descent) in ANNs. RSF combines the concepts of RF with survival analysis, making it suitable for constructing prognostic models from the time-to-event data commonly encountered in clinical practice (Figure 1). 27
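As a hedged illustration of dimensionality reduction, the following sketch implements principal component analysis with a singular value decomposition in NumPy. The function name `pca_reduce` and the synthetic "patients by features" matrix are assumptions made for this example; they do not come from any study discussed here.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project the rows of `features` onto the top principal components."""
    X = np.asarray(features, dtype=float)
    X_centered = X - X.mean(axis=0)          # PCA requires mean-centered columns
    # Rows of Vt are the principal directions, ordered by explained variance
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    scores = X_centered @ components.T       # low-dimensional representation
    explained = (S ** 2) / np.sum(S ** 2)    # fraction of variance per component
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
# Synthetic stand-in: 40 patients x 10 features driven by 2 latent factors
latent = rng.normal(size=(40, 2))
mixing = rng.normal(size=(2, 10))
toy_features = latent @ mixing + 0.01 * rng.normal(size=(40, 10))
scores, explained_ratio = pca_reduce(toy_features, n_components=2)
```

Because the synthetic features are generated from two latent factors, two components retain almost all of the variance; real radiomic feature sets rarely compress this cleanly, so the number of components is usually chosen from the explained-variance profile.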
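To make the notion of time-to-event data concrete, the sketch below computes a Kaplan-Meier survival estimate by hand in NumPy. It is a minimal illustration under stated assumptions: the function name `kaplan_meier` and the follow-up times are hypothetical, and censored patients are treated as still at risk at their censoring time, which is the usual convention.

```python
import numpy as np

def kaplan_meier(times, events):
    """Return distinct event times and the survival estimate S(t) at each.

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if the patient was censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)                    # still under observation at t
        deaths = np.sum((times == t) & (events == 1))   # events occurring exactly at t
        s *= 1.0 - deaths / at_risk                     # product-limit update
        survival.append(s)
    return event_times, np.array(survival)

# Hypothetical follow-up in months; 0 marks a censored patient
toy_times = [6, 12, 12, 20, 30, 36]
toy_events = [1, 1, 0, 1, 0, 1]
km_times, km_surv = kaplan_meier(toy_times, toy_events)
```

Censored patients contribute to the at-risk count until they leave the study but never trigger a drop in the curve, which is the property that distinguishes survival methods from ordinary classification.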

Figure 1. The applications of AI in the prognostic model of IPF. The relationship between AI, machine learning, and deep learning is illustrated with examples of different algorithms. The figure also shows the four procedures in the prognostic model of IPF and representative AI algorithms for each: region of interest extraction, image feature selection, clinical feature selection, and model construction.
Radiomics is widely applied in prognostic models of IPF and in the analysis of CT images. Radiomics describes the geometric properties of fibrotic regions, first-order statistical features, and higher-order texture features. Budzikowski et al. 28 used differences in radiomic features from lung regions in CT scans of IPF patients to explore correlations between genetic variations and patient survival. Yang et al. 29 demonstrated that radiomic features extracted from pretreatment HRCT scans could forecast how patients with IPF would respond to antifibrotic treatment. Liang et al. 30 demonstrated that a CT-based radiomics model is capable of predicting lung cancer development in IPF patients. Refaee et al. 31 developed a model that classifies IPF versus non-IPF ILDs based on handcrafted radiomics and deep learning (DL).
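The first-order statistical features mentioned above are summary statistics of the voxel intensities inside an ROI. The sketch below shows one possible way to compute a few of them with NumPy; the function name, the choice of 16 histogram bins, and the tiny toy image are assumptions for illustration and do not reproduce the feature definitions of any specific radiomics package or cited study.

```python
import numpy as np

def first_order_features(image, mask, n_bins=16):
    """First-order statistics of the intensities inside a binary ROI mask."""
    voxels = np.asarray(image, dtype=float)[np.asarray(mask, dtype=bool)]
    mean = voxels.mean()
    std = voxels.std()
    centered = voxels - mean
    # Standardized third and fourth moments (0 if the ROI is constant)
    skewness = np.mean(centered ** 3) / std ** 3 if std > 0 else 0.0
    kurtosis = np.mean(centered ** 4) / std ** 4 if std > 0 else 0.0
    # Shannon entropy of the discretized intensity histogram
    counts, _ = np.histogram(voxels, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    entropy = float(-np.sum(p * np.log2(p)))
    return {"mean": mean, "std": std, "skewness": skewness,
            "kurtosis": kurtosis, "entropy": entropy}

# Toy 2x2 "CT slice" in Hounsfield units and a hypothetical ROI of three voxels
toy_image = np.array([[-900.0, -800.0], [-700.0, -600.0]])
toy_mask = np.array([[1, 1], [1, 0]])
roi_features = first_order_features(toy_image, toy_mask)
```

Shape features and higher-order texture features (for example, those derived from co-occurrence matrices) require more machinery and are not shown here.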
AI-aided CT evaluation in IPF prognostic model
Compared with non-AI approaches, AI offers a promising alternative to human readers, providing more objective and standardized assessments by precisely quantifying specific patterns and capturing deep features. AI also offers the potential for higher efficiency and scalability. For data collection, AI models make it possible to gather high-throughput, multi-modal data and to select highly predictive factors automatically.
Computer-aided lung informatics for pathology evaluation and ratings (CALIPER), a quantitative CT analysis tool developed at Mayo Clinic Rochester, combines quantitative CT analysis with AI algorithms. It can predict the prognosis of IPF patients more objectively and reproducibly than visual assessments. 32 However, integrating the composite physiologic index into the CALIPER-derived model did not improve performance, which demonstrates the value of HRCT in predicting prognosis. 33 Additionally, Romei et al. 34 applied CALIPER to evaluate radiologic progression in IPF and found a clear correlation between CALIPER-derived parameters and FVC changes. Beyond CALIPER, Ash et al. 35 developed a method based on the histogram of lung density and the distance to the pleural surface. Later, Bak et al. 36 applied a texture-based automatic system to extract features from initial radiographs and proposed a formula that used an automatically derived FS and emphysema index to assess the extent of fibrotic and emphysematous areas, with a sensitivity of 0.71.
In addition to fully automated methods, there are also methods that rely on pre-established training sets. Lee et al. 37 established a CT quantitation system based on six specific patterns pre-marked by radiologists. Shi et al. 38 constructed a model including FS, interval changes of FS, age, and desaturation, reaching a C-index of 0.768. They chose wrapper methods to select features and combined quantum particle swarm optimization (QPSO) as the optimizer with RF as the classifier to form the QPSO-RF algorithm. Unlike previous research, the features extracted by the QPSO-RF algorithm may not correspond to specific image patterns. Other researchers adopted the QPSO-RF algorithm to classify ROIs as progressive or non-progressive, with a sensitivity of 0.68 and a specificity of 0.65 in original images. 39 Recently, Wu et al. 25 designed a lung segmentation network to assess the percentage of the whole lung occupied by honeycombing and proposed a computed tomography pulmonary function model to predict 3-year survival in IPF. In addition, research based on computer-aided diagnosis (CADx) systems 40 and data-driven textural analysis (DTA) 41 has also shown satisfactory results in the training set, and these tools may become available in the future. AI algorithms for model construction, radiograph assessment, and data collection show great potential when used jointly with quantitative CT analysis.
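The C-index reported for survival models can be read as the fraction of comparable patient pairs in which the patient who died earlier was assigned the higher predicted risk. The following is a minimal sketch of Harrell's concordance index under simplifying assumptions: the function name and toy data are hypothetical, pairs with tied follow-up times are ignored, and this is not the evaluation code of any cited study.

```python
import numpy as np

def harrell_c_index(times, events, risk_scores):
    """Concordance between predicted risk and observed survival order."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    risk = np.asarray(risk_scores, dtype=float)
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a pair is usable only if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0      # earlier death had the higher risk score
                elif risk[i] == risk[j]:
                    concordant += 0.5      # tied predictions count as half
    return concordant / comparable

# Hypothetical follow-up (months), event indicators, and model risk scores
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.6, 0.1]
c_index = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so a C-index of 0.768 indicates that roughly three out of four comparable pairs are ranked correctly.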
Comparison of AI and non-AI applications in the IPF prognostic model
The detailed descriptions and performance metrics (AUC, specificity, sensitivity) of both non-AI and AI prognostic models (Tables 1 and 2) show that the AUC of these models varies widely. Sensitivity is the proportion of true positives among all actual positives. Specificity is the proportion of true negatives among all actual negatives. The AUC represents the area under the receiver operating characteristic curve and measures the model’s ability to distinguish between classes.
Table 1. Non-AI-aided assessments of IPF in different studies.
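Using the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), these metrics can be computed as in the following NumPy sketch. The labels, scores, and the 0.5 decision threshold are invented for illustration; the AUC is computed through its rank interpretation, that is, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity from binary labels and binary predictions."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)     # positives correctly flagged
    fn = np.sum(y_true & ~y_pred)    # positives missed
    tn = np.sum(~y_true & ~y_pred)   # negatives correctly cleared
    fp = np.sum(~y_true & y_pred)    # negatives wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true]
    neg = scores[~y_true]
    diff = pos[:, None] - neg[None, :]   # all positive-negative score differences
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return wins / (len(pos) * len(neg))

# Hypothetical outcome labels (1 = event) and model scores
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_pred = [s >= 0.5 for s in toy_scores]   # assumed decision threshold of 0.5
sens, spec = sensitivity_specificity(toy_labels, toy_pred)
auc = roc_auc(toy_labels, toy_scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes performance across all thresholds, which is one reason the three metrics are reported together.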
AIC, Akaike information criterion; AEx-IPF, acute exacerbation of idiopathic pulmonary fibrosis; AUC, area under the curve; CRP, clinical-radiographic-physiologic; CT, computed tomography; DLCO, carbon monoxide diffusing capacity; FVC, forced vital capacity; GAP, Gender-Age-Physiology; HRCT, high-resolution computed tomography; IPF, idiopathic pulmonary fibrosis; ROC, receiver operating characteristic.
: The training set and test set are not specified.
Table 2. AI-aided assessments of IPF in different studies.
AUC, area under the curve; CADx, computer-aided diagnosis; CALIPER, computer-aided lung informatics for pathology evaluation and ratings; CPI, composite physiologic index; CT, computed tomography; CTPF, computed tomography pulmonary function; DTA, data-driven texture analysis; FSN, fibrosis segmentation network; FVC, forced vital capacity; GAP, Gender-Age-Physiology; HAA, high attenuation area; ILD%, interstitial lung disease%; IPF, idiopathic pulmonary fibrosis; PFT, pulmonary function test; PVRS%, pulmonary vessel-related structures%; QCT, quantitative computed tomography; QPSO-RF, quantum particle swarm optimization-random forest; SMOTE, synthetic minority over-sampling technique; SVM, support vector machine.
: The training set and test set are not specified.
Though we cannot conclude that AI algorithms significantly improve the performance of IPF prognostic models, they effectively improve the analysis procedure, enhance clinical efficiency, and are available for large-scale population research. Non-AI models mainly rely on traditional statistical methods, known clinical features, and direct analysis of radiological features. In contrast, AI models use machine learning algorithms to construct prognostic models based on massive clinical features, deep features, and radiomics after segmentation from HRCT. DL algorithms enable a comprehensive understanding of HRCT images and maximize the utilization of features, including texture analysis, pattern recognition, and volumetric analysis (Figure 2).

Figure 2. Comparison of AI and non-AI techniques in IPF prognostic model. Based on deep learning neural networks, AI technology can realize the automatic processing of data collection and CT assessment in the process of model construction. In contrast, non-AI technologies require human assistance and rely on human experience for evaluation. Also, AI-aided techniques may extract more detailed and deeper features for analysis and construct more effective prognostic models.
As mentioned above, recent studies attempt to combine CT images with other features, utilizing newly developed AI algorithms to extract features from CT.25,33,38,39 Now, with the help of AI, it is possible to quantify the lesion in CT with less time and better accuracy. 43 There are a few standard methods available such as density histogram,44,45 adaptive multiple features method (AMFM), CALIPER, 46 quantitative lung fibrosis (QLF), 40 functional respiratory imaging (FRI), 47 and DTA. 41 Density histogram and QLF may measure the extent of lung fibrosis through the Hounsfield unit scale. AMFM quantifies lung parenchymal patterns on CT and analyses lung texture. FRI may enable regional quantification of the lung, and DTA assesses the severity and progression of the disease.
In summary, with AI algorithms being widely applied in the prognosis of IPF and other diseases, we can fully utilize CT data, achieve objective quantification, improve evaluation efficiency, and obtain new features from large-scale data.
Limitations of AI algorithms in the research of IPF prognostic model
Research on IPF prognosis is limited in the scalability, robustness, and application range of its algorithms and models by the scarcity of datasets (Figure 3). IPF is a relatively rare disease, with a low incidence rate, difficult diagnosis, and limited clinical awareness. Thus, limited data sources and single-center research obstruct the development of a universal and reliable prognostic model of IPF. Correspondingly, the limited data resources have given rise to a wide variety of models, as listed in the tables. Larger, more diverse datasets from multiple centers are urgently needed to train and validate AI models. Additionally, overfitting is a concern: a model may demonstrate high accuracy on the training dataset but fail to generalize to other datasets, which limits its robustness and applicability.48,49

Figure 3. The limitations and prospects of AI in the IPF prognostic model. Despite limitations such as limited data sources, a variety of models, overfitting, and lack of interpretability, AI models have many potential applications in IPF prognosis, including time-series models, multi-modal data, interpretability, and the ability to find new features. Limitations and prospects can be transformed into each other under certain conditions.
The interpretability of the AI model presents a significant challenge in terms of widespread application. In the medical field, there is still a lack of consensus regarding the definition of interpretability and standardized evaluation. AI may prioritize features based solely on accuracy in prognosis without considering their medical significance. As a result, the “black box” problem hinders its full integration into IPF clinical practice. 50
In machine learning, model training depends on labels obtained by medical experts. However, errors or oversights in the labeling or analysis process can amplify the mistakes made by AI models. Additionally, there are still concerns about legal responsibility and ethical issues. Though the AI model may make decisions based on a black box with relatively stable results, we should be cautious about its applications in the real world and be aware of ethics and algorithm discrimination.
Prospects of AI algorithms in IPF prognostic model
Though recent studies have made progress in the quantification of CT, the potential of CT in predicting outcomes of IPF patients clearly needs further exploration (Figure 3). Multi-modal data, including CT, pulmonary function, tumor markers, and blood gas analysis, are increasingly crucial to evaluating IPF patients. Beyond adding new clinical features, a large amount of information hidden in CT remains to be explored. One potential direction is to use several CT scans acquired over a period, rather than baseline CT alone, to estimate the progression of IPF and improve the predictions of prognostic models. By analyzing time-series CT images, models can better evaluate the progression of IPF, dynamically guide treatment options, and predict patients’ outcomes. Another direction is to explore new CT patterns and features through newly developed machine learning algorithms to predict patients’ outcomes. Aided by AI, CT analysis extends from visual patterns such as honeycombing and reticular opacity to high-dimensional features. 26 Though features extracted by AI may be difficult for human readers to explain, the algorithm can automatically apply these features to mark and calculate the lesion area. In addition, this approach can eliminate the inter-observer variation of CT evaluation and improve the accuracy of prognostic models. Interpretability approaches for DL on IPF images mainly include visualization of the lesion area and semantic description. Interpretable deep semantic CNNs produced interpretable lung cancer predictions and obtained significantly better results than the common 3D CNN method. 51 Successes in other fields, such as multi-domain networks in pathology, 52 attention models, 53 and the Grad-CAM (Gradient-weighted Class Activation Mapping) model, 54 encourage the application of more of these technologies to the IPF field.
Techniques such as activation maps (the activation values of convolutional layers) and attention mechanisms (in which higher numerical values indicate greater relevance) can show which regions the AI systems above highlight, enabling clinicians to understand which image areas are crucial for the model’s decision-making and to cross-validate their own interpretations.
AI-aided clinical practice in IPF can be anticipated. Automatic analysis throughout long-term management, accurate data-based prediction, and full utilization of CT series may lower the management burden of IPF patients and allow clinicians to treat these patients effectively. Guided by AI assessments, clinicians may intervene in the progression of IPF with measures such as combined antifibrotic medicines, pulmonary rehabilitation, and lung transplantation.
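The core arithmetic of Grad-CAM, one of the activation-map techniques mentioned above, can be sketched in a few lines of NumPy. The sketch assumes that the activations of one convolutional layer and the gradients of the target output with respect to those activations have already been obtained from a trained network through a deep learning framework; the arrays below are toy values, and the function name is hypothetical.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Combine channel activations into a heatmap, Grad-CAM style.

    activations, gradients : arrays of shape (K, H, W) for one convolutional
    layer, where gradients are d(target score)/d(activations).
    """
    A = np.asarray(activations, dtype=float)
    G = np.asarray(gradients, dtype=float)
    weights = G.mean(axis=(1, 2))            # one importance weight per channel
    cam = np.tensordot(weights, A, axes=1)   # weighted sum over channels -> (H, W)
    cam = np.maximum(cam, 0.0)               # keep only positive evidence (ReLU)
    if cam.max() > 0:
        cam = cam / cam.max()                # scale to [0, 1] for display
    return cam

# Toy example with two 2x2 channels (values are illustrative only)
toy_activations = np.array([[[1.0, 0.0], [0.0, 0.0]],
                            [[0.0, 0.0], [0.0, 2.0]]])
toy_gradients = np.array([np.ones((2, 2)), -np.ones((2, 2))])
heatmap = grad_cam(toy_activations, toy_gradients)
```

In this toy case only the first channel has a positive weight, so the heatmap highlights the top-left location; in practice the map is upsampled to the CT slice size and overlaid on the image for review by a clinician.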
Conclusion
The application of CT to prognostic models of IPF has significantly improved model performance. AI optimizes the process of data extraction, model construction, and performance. Multi-modal data and AI algorithms will be applied to prognostic models of IPF in the future, leading to accurate, stable, and generalizable models for better clinical management of IPF patients.
