Abstract
Introduction
RBF neural networks are widely used in gastric carcinoma prognostic models, but they face challenges including difficulty in determining the Gaussian radial basis function parameters of the hidden layer and the diversity/ambiguity of factors affecting gastric carcinoma prognosis. The cloud model, a key tool in uncertainty theory, is adept at handling fuzziness and randomness of complex medical data by quantifying uncertainty. This study integrates the cloud model with RBF neural networks to address the aforementioned limitations.
Methods
The study included 11,474 gastric carcinoma patients from the SEER database and 769 from the Linzhou Centre for Disease Control and Prevention database. A new model combining a cloud model with RBF neural networks was used, where high-dimensional cloud transformations identified RBF hidden layer neurons to optimize the network structure.
Results
Comparison with conventional methods showed that the new model predicted overall survival (OS) with a C-index of 0.715. This value is not only significantly higher than that of clinical standard TNM staging (0.591) but also outperforms machine learning methods including random forest (0.614) and traditional RBF neural networks (0.632). It achieves excellent prognostic accuracy meeting the clinical criterion of good discriminative ability, even relying solely on simple clinical factors, which enhances its clinical applicability.
Conclusion
The model is a new and effective prognostic model that provides better and more accurate prognostic assessment for gastric carcinoma patients.
Introduction
Gastric carcinoma is the fifth most common cancer worldwide and has the fifth highest mortality rate; its main histological types are adenocarcinoma, signet-ring cell carcinoma, adenosquamous carcinoma, medullary carcinoma, and undifferentiated cell carcinoma.1,2 Most new cases occur in less developed regions. East Asia bears the heaviest burden of gastric carcinoma globally, and the situation is most serious in China, where gastric carcinoma ranks third in mortality from malignant tumors. Gastric carcinoma is characterized by high aggressiveness and poor prognosis. Despite comprehensive treatment with surgery, radiotherapy, and chemotherapy, the 5-year survival rate of patients is still less than 10%. The incidence of gastric carcinoma is significantly associated with the consumption of pickled and smoked foods, moldy foods, and excessive salt. In addition, living in volcanic rock areas or on high-peat or nitrate-rich soils, an imbalance in the ratio of trace elements, and chemical pollution can directly or indirectly increase the risk of developing gastric carcinoma. Researchers worldwide have studied prognostic models and algorithms in depth to improve the accuracy of gastric carcinoma prognosis and have proposed many practical methods.3–5 Among them, the most widely used are artificial neural networks (ANNs), random forests (RFs), and support vector machines (SVMs). Compared with RFs and SVMs, radial-basis-function (RBF) neural networks have a very strong nonlinear mapping capability.6,7 They acquire this capability through self-learning and store it in a distributed manner in the connection weights of the network. Therefore, RBF neural networks are well suited to the prognosis of gastric carcinoma patients.
However, when training RBF networks it is very difficult to determine the number of hidden-layer nodes as well as the center and width of the hidden-layer basis functions. In addition, the factors affecting the prognosis of gastric carcinoma are usually diverse, complex, and uncertain; known risk factors for gastric carcinoma include smoking and alcohol consumption, while fruit and vegetable intake is likely to be protective.8–10 After a careful study of the cloud model, we found that the normal cloud is very similar to the Gaussian RBF of the RBF network. We hypothesize that, by using the normal cloud in place of the Gaussian RBF, the problem of determining the center and width of the RBF network can be converted into the problem of determining the parameters of the normal cloud, and that fuzziness and randomness can thereby be combined to improve the RBF neural network algorithm.
In this study, we explore the risk factors related to patient prognosis based on the Surveillance, Epidemiology and End Results (SEER) database. We perform a cloud transformation on each influencing factor, apply the inverse cloud generator algorithm to find the corresponding normal clouds, and calculate the cloud parameters after the factors are combined. We then use the high-dimensional cloud algorithm to find the number of hidden-layer nodes and their specific parameters, which determines the network structure. Finally, we use the new network to predict the prognosis of gastric carcinoma patients, thus providing help for clinical practice.
Cloud Theory
Prediction should follow the laws that govern how things develop, and the prediction process is often accompanied by considerable uncertainty. As research on uncertainty has deepened, randomness and fuzziness have come to be regarded as its most basic forms, and only uncertainty itself is certain. Because probability theory and fuzzy mathematics alone could not resolve this uncertainty, Professor Li Deyi of the Chinese Academy of Engineering proposed the concept of the cloud, which links fuzziness and randomness. Cloud models have since been applied to natural language processing (NLP), enterprise evaluation, decision analysis, image processing, power system load forecasting, and medical forecasting.
Gastric carcinoma prognostic data inherently exhibits dual uncertainties: randomness arising from inter-individual differences (e.g., genetic heterogeneity, variable treatment responses) is quantified by the cloud model’s entropy (En) and hyper-entropy (He), while fuzziness associated with ambiguous clinical indicator boundaries (e.g., borderline tumor differentiation) is effectively captured via its dynamic integration of fuzziness and randomness, which represents a distinct advantage compared with traditional models. Unlike general uncertainty characterized by fixed rules, medical uncertainty lacks regularity due to unmeasurable confounding factors. By synergizing its core parameters, the cloud model simultaneously accommodates vague clinical indicators and random prognostic outcomes, rendering it more adaptable to the inherent complexity of gastric carcinoma prognosis than single-dimensional models.
Cloud Models and Generators
A cloud is a model of uncertainty transformation between a qualitative concept expressed in terms of linguistic values and its quantitative representation. The cloud model is represented by three numerical features: expectation, Ex; entropy, En; and hyper-entropy, He.11 The model brings vagueness and randomness together in a single framework. Among the aforementioned features, the expectation Ex is the point that best represents the qualitative concept; the entropy En is the uncertainty measure of the qualitative concept, determined by both the vagueness and randomness of the concept, reflecting the degree of dispersion and range of values taken by the cloud drops; the hyper-entropy He is the uncertainty measure of the entropy, determined by both the vagueness and randomness of the entropy. The normal cloud model is the most basic cloud model and its expectation curve is a normal characteristic curve with a cloud distribution curve of
The software or hardware that generates the clouds is called a cloud generator. There are forward and inverse cloud generators as well as X-conditional and Y-conditional cloud generators,12 of which the most important are the forward and inverse cloud generators. The forward cloud generator generates cloud drops from the three numerical features of the cloud model, while the inverse cloud generator converts a certain number of exact values into the appropriate qualitative linguistic values. It is based on the expectation of each dimension of the sample. It is based on the variance of each dimension of the sample. The above steps are repeated until the required number of cloud drops have been generated.
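The forward generator described above can be sketched as follows. This is a minimal illustration that assumes the commonly used one-dimensional normal cloud procedure (draw En' from N(En, He^2), draw x from N(Ex, En'^2), then compute the certainty degree); the function name `forward_cloud` and the parameter values are hypothetical and are not taken from the study.

```python
import math
import random


def forward_cloud(ex, en, he, n_drops, seed=None):
    """Generate n_drops (x, certainty) pairs from a 1-D normal cloud (Ex, En, He).

    Assumed procedure (standard normal cloud generator, not the authors' code).
    """
    rng = random.Random(seed)
    drops = []
    while len(drops) < n_drops:
        # Step 1: perturb the entropy, En' ~ N(En, He^2).
        en_prime = rng.gauss(en, he)
        if en_prime == 0.0:
            continue  # skip a degenerate draw to avoid division by zero
        # Step 2: draw the drop position, x ~ N(Ex, En'^2).
        x = rng.gauss(ex, abs(en_prime))
        # Step 3: certainty degree of x with respect to the concept.
        mu = math.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))
        drops.append((x, mu))
    return drops


# Hypothetical parameters for a factor normalized to [0, 1].
drops = forward_cloud(ex=0.5, en=0.1, he=0.01, n_drops=1000, seed=0)
```

A larger He makes the drops scatter more widely around the expectation curve, which is how the generator expresses uncertainty about the entropy itself.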
Cloud Transformation
Cloud transformation refers to the mathematical transformation of any irregular data distribution into a superposition of several different clouds according to some principle, ie, it proceeds in the following steps. Obtain the data distribution function. Find the location of the wave peak of the data distribution function. Subtract the cloud fitted at that peak from the data distribution function. Finally, express the data distribution as a superposition of multiple clouds.
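In symbols, the superposition can be written in the form below. This is a sketch under assumptions: the symbols g (data distribution function), c_i (amplitude of the i-th cloud), m (number of clouds), and ε (tolerated residual) are introduced here for illustration and do not appear in the surviving text.

```latex
g(x) \approx \sum_{i=1}^{m} c_i \exp\!\left(-\frac{(x - Ex_i)^2}{2\,En_i^2}\right),
\qquad
\max_x \left| g(x) - \sum_{i=1}^{m} c_i \exp\!\left(-\frac{(x - Ex_i)^2}{2\,En_i^2}\right) \right| < \varepsilon
```

Each peak of g contributes one normal cloud with expectation Ex_i, and the peak search stops once the residual falls below the chosen tolerance.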
The inverse cloud generator is the reverse of the forward cloud generator: it converts quantitative data into qualitative results. Taking cloud drops that follow the normal distribution as samples, it converts the quantitative data into the numerical features (expectation, Ex; entropy, En; hyper-entropy, He) that together express all the characteristics of the cloud. The algorithm is as follows.
Calculate the sample mean of the data set to obtain the expectation. The entropy is then derived from the sample mean.
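The steps above can be sketched in code. The estimator below is an assumption: it uses the widely cited inverse generator without certainty degrees (Ex from the sample mean, En from the first-order absolute central moment, He from the gap between sample variance and En^2), which may differ in detail from the variant used in the study. The function name `inverse_cloud` and the synthetic data are hypothetical.

```python
import math
import random
import statistics


def inverse_cloud(samples):
    """Estimate (Ex, En, He) from numeric samples (assumed standard estimator)."""
    data = list(samples)
    # Expectation: the sample mean.
    ex = statistics.fmean(data)
    # Entropy: sqrt(pi/2) times the mean absolute deviation from Ex.
    en = math.sqrt(math.pi / 2.0) * statistics.fmean([abs(x - ex) for x in data])
    # Hyper-entropy: sqrt(S^2 - En^2), clipped at zero when the estimate is negative.
    s2 = statistics.variance(data, xbar=ex)
    he = math.sqrt(max(s2 - en ** 2, 0.0))
    return ex, en, he


# Synthetic cloud drops with Ex = 10, En = 2, He = 0.2 (illustrative values only).
rng = random.Random(1)
samples = [rng.gauss(10.0, abs(rng.gauss(2.0, 0.2))) for _ in range(5000)]
ex, en, he = inverse_cloud(samples)
```

The hyper-entropy estimate is the least stable of the three because it depends on a small difference between two noisy quantities, so it usually needs many samples.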
RBF Neural Networks
RBF Neural Network Structure and Mapping Relationships
Building on the use of RBFs in multivariate interpolation, Broomhead and Lowe applied an RBF to ANN design and constructed an RBF neural network,5 which is a feedforward ANN consisting of an input layer, a hidden layer, and an output layer. Figure 1 depicts the structure of a three-layer RBF neural network with

Figure 1. RBF neural network structure.
In Figure 1,
The mapping relationship and basic working principle of the RBF neural network consist of two parts.17 In the first part, the original data are passed linearly through the input layer to the hidden layer, and after a nonlinear transformation the input data are mapped directly to the hidden-layer space, and the output of the ith hidden-layer unit in Figure 1 is
In the second part, the data are passed to the output layer and, after a linear transformation, the final output of the RBF network is obtained. Letting
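The two-part mapping can be illustrated with a short sketch. It assumes the usual Gaussian basis function exp(-||x - c_i||^2 / (2 σ_i^2)) for the hidden layer and a weighted sum for the output; the names `rbf_hidden`, `rbf_output`, `centers`, `widths`, and `weights` are hypothetical and the numbers are for illustration only.

```python
import math


def rbf_hidden(x, centers, widths):
    """Part 1: nonlinear map from an input vector to hidden-layer activations."""
    activations = []
    for center, sigma in zip(centers, widths):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        activations.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    return activations


def rbf_output(hidden, weights, bias=0.0):
    """Part 2: linear combination of the hidden-layer activations."""
    return bias + sum(w * h for w, h in zip(weights, hidden))


# Two hidden units in a 2-D input space (illustrative values).
centers = [[0.0, 0.0], [1.0, 0.0]]
widths = [1.0, 1.0]
hidden = rbf_hidden([0.0, 0.0], centers, widths)
y = rbf_output(hidden, weights=[0.5, 2.0])
```

A unit responds most strongly when the input coincides with its center, and the width controls how quickly that response decays with distance.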
Training Algorithm for RBF Neural Networks
Current algorithms applied to the training and optimization of RBF neural networks,8 such as genetic algorithms, particle swarm optimization algorithms, and methods based on k-means clustering, all revolve around determining the weights, centers, and widths of the network. However, all these methods have drawbacks to varying degrees, and none takes into account the uncertainty of the samples fed to the network. The traditional RBF neural network toolbox uses the gradient-descent method to determine the center and width of the hidden-layer basis functions, and then uses the orthogonal least-squares algorithm to determine the connection weights of the network, so the focus of the training and optimization of RBF neural networks is the determination of the center and width of the basis functions.
The descriptions of the cloud model and the RBF neural network above show that the two are similar. The cloud model is defined by three parameters (Ex, En, He), while the training of an RBF neural network also requires three parameters to be determined (center, width, connection weight). The expectation Ex of the cloud model can therefore replace the center and the entropy En can replace the width, after which the least-squares method is used to calculate the connection weights, thus combining the two. The fusion model not only introduces the fuzziness and randomness of the cloud model into the new algorithm, but also retains the learning ability, nonlinear mapping ability, and topological relationship of the RBF network.
There is often more than one factor affecting the forecast. When there are several factors, a high-dimensional cloud must be built to solve the problem. First, after the n influencing factors are standardized, their distribution curves are obtained by cloud transformation, and the normal clouds of the n influencing factors are then generated by the cloud generator algorithm. The number of clouds of each influencing factor determines the number of hidden-layer nodes. Each cloud node of the hidden layer is then triggered according to the RBF neural network training method, and the high-dimensional cloud model is built according to the n-dimensional normal cloud algorithm. Triggering the same node with the same sample several times yields different membership degrees, and these random results follow a normal distribution. The mean of the different membership degrees is then taken as the expectation. All the samples are used to trigger all the nodes, which gives the output matrix of the hidden layer. The weight matrix is calculated by the least-squares method, which completes the training of the cloud-model-optimized RBF neural network.
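The training procedure above can be sketched as follows. This is an illustration under stated assumptions, not the authors' MATLAB implementation: each node is a list of per-dimension (Ex, En, He) triples, the membership of a sample is averaged over repeated random triggers, and the weights are obtained from the normal equations with a tiny ridge term added for numerical stability (the ridge term is an addition made here). All function names and values are hypothetical.

```python
import math
import random


def node_membership(x, node, rng, repeats=20):
    """Mean certainty degree of sample x for one n-dimensional cloud node."""
    total = 0.0
    for _ in range(repeats):
        exponent = 0.0
        for xj, (ex, en, he) in zip(x, node):
            # En' ~ N(En, He^2); fall back to En if the draw is exactly zero.
            en_prime = rng.gauss(en, he) or en
            exponent += (xj - ex) ** 2 / (2.0 * en_prime ** 2)
        total += math.exp(-exponent)
    return total / repeats


def solve_linear(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def train_cloud_rbf(samples, targets, nodes, seed=0, ridge=1e-8):
    """Return the hidden-layer output matrix H and least-squares weights W."""
    rng = random.Random(seed)
    h = [[node_membership(x, node, rng) for node in nodes] for x in samples]
    k = len(nodes)
    # Normal equations: (H^T H + ridge * I) w = H^T t.
    hth = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    htt = [sum(row[i] * t for row, t in zip(h, targets)) for i in range(k)]
    return h, solve_linear(hth, htt)


# Toy 1-D example with two cloud nodes (illustrative values only).
toy_nodes = [[(0.2, 0.15, 0.01)], [(0.8, 0.15, 0.01)]]
toy_samples = [[i / 10.0] for i in range(11)]
toy_targets = [s[0] for s in toy_samples]
H, W = train_cloud_rbf(toy_samples, toy_targets, toy_nodes)
```

Because Ex and En fix the hidden layer in advance, only the output weights are fitted, which is what reduces training to a linear least-squares problem.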
Prognosis Prediction of Gastric Carcinoma Patients Based on Cloud Model
Cloud-Model Optimization of RBF Neural Network Parameters
The similarities between cloud theory and RBF neural networks, described above, can be seen in the fact that three parameters (
Analysis of Sample Sources and Prognostic Risk Factors
We downloaded data on 76,862 patients with primary gastric carcinoma diagnosed from 1973 to 2015 from the U.S. National Institutes of Health's SEER database (http://seer.cancer.gov/) via SEER*Stat software (v8.3.6, https://seer.cancer.gov/seerstat/). The exclusion criteria were as follows: (i) missing basic information, such as age at diagnosis, gender, race, TNM stage, or tumor size; (ii) pathological type other than adenocarcinoma or squamous carcinoma. We extracted and analyzed variables including race, age, gender, tumor location, degree of differentiation, tumor-node-metastasis (TNM) stage, histological grade, histological type, tumor size, survival status, and survival time. All patients were staged for T, N, and M according to the seventh edition of the American Joint Committee on Cancer (AJCC) staging protocol for gastric carcinoma. After strict screening according to these criteria, 11,474 patients with gastric carcinoma remained. In this study, the SEER data (11,474 cases) were used as training data and the Linzhou Centre for Disease Control and Prevention gastric carcinoma data (769 cases) were used as validation data.
A univariate Cox analysis was first performed on the gastric carcinoma patients in the training data, after which the significant risk factors from the univariate analysis were entered into a Cox proportional hazards model for multivariate analysis to obtain independent prognostic factors for gastric carcinoma, with P < 0.01 considered statistically significant. These calculations mainly used the Hmisc, survival, rms, and ComplexHeatmap packages in RStudio (version 1.1.463). The reporting of this study conforms to the TRIPOD guidelines.18
A univariate Cox analysis of each clinical factor showed that age at diagnosis (P < 0.001), race (P < 0.001), histological grade (P < 0.001), primary tumor (P < 0.001), regional lymph nodes (P < 0.001), and distant metastases (P < 0.001) were associated with patient survival. Gender (P = 0.081) was not significantly associated with patient prognosis. The most appropriate cutoff value for age at diagnosis was calculated to be 60 years. The details of these results are shown in Table 1.
Table 1. Univariate and Multivariate Cox Analyses of Factors Affecting Gastric Carcinoma Prognosis.
Factors with P < 0.01 in the univariate analysis were selected for multivariate Cox analysis, which showed that age at diagnosis (P < 0.001), histological grade (P < 0.001), primary tumor (P < 0.001), regional lymph nodes (P < 0.001), distant metastasis (P < 0.001), and race (P < 0.001) were independent factors affecting the prognosis of patients with gastric carcinoma. The details of these results are shown in Table 1.
These six independent prognostic factors were used to construct a prognostic model for gastric carcinoma.
Prognostic Model for Gastric Carcinoma Based on New Model
The risk factor data obtained above were normalized to the range [0,1] using the Min-Max normalization method.19,20 These normalized data were then used as the feature vector, i.e., the factors affecting the prognosis of patients with gastric carcinoma. OS values were predicted with the model on the MATLAB (MathWorks, USA) platform. Age, race, histological grade, primary tumor (T stage), regional lymph nodes (N stage), and distant metastases (M stage) were used as input data. A three-layer RBF neural network was constructed, in which the number of hidden-layer nodes was determined as follows.
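As a brief illustration of the Min-Max step, the sketch below rescales one feature column to [0,1]. The function name `min_max_normalize` and the example ages are hypothetical; mapping a constant column to zeros is a choice made here to avoid division by zero.

```python
def min_max_normalize(column):
    """Rescale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


# Hypothetical ages at diagnosis, for illustration only.
ages = [45, 60, 72, 81]
scaled_ages = min_max_normalize(ages)
```

Applying the same minimum and maximum learned from the training data to the validation data keeps the two sets on a common scale.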
The method described above was used to perform a maximum-value-method cloud transformation of the influencing factor tumor grade. First, the data distribution curve of the tumor grade was obtained, as shown in Figure 2. Then, following the cloud generator algorithm, three normal cloud maps were obtained, as shown in Figure 3. The same cloud transformation was performed for age, race, histological grade, primary tumor (T), regional lymph nodes (N), and distant metastases (M), and the combined cloud numbers and cloud parameters are shown in Table 2. A high-dimensional cloud model of age, race, histological grade, primary tumor (T), regional lymph nodes (N), and distant metastases (M) was constructed according to high-dimensional cloud theory. The specific method follows the n-dimensional normal cloud algorithm described in Section 1.1 above, in which a total of 4*2*4*4*2*2 = 512 high-dimensional clouds are constructed, ie, the number of neurons in the hidden layer is 512. Training of the RBF neural network was then completed as follows. Each cloud node of the hidden layer was triggered with a training sample; the same node was triggered with the same sample several times, yielding several different membership degrees, and their mean was taken as the expectation. The 512 nodes were triggered with 3751 samples to obtain a 3751*512 output matrix H of the hidden layer. The network output was a 1*3751 matrix T, and the least-squares method was then used to determine the weight matrix W in

Figure 2. Distribution of tumor grade data.

Figure 3. Merged cloud maps.
Table 2. Combined Parameters of Cloud Transformation for Each Influencing Factor.
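The hidden-layer size follows from combining one cloud per factor. The sketch below assumes the per-factor cloud counts 4, 2, 4, 4, 2, 2 quoted in the text and enumerates every combination; which factor owns which count is not recoverable here, so the factors are left unlabeled, and the variable names are hypothetical.

```python
import itertools
import math

# Number of merged clouds per influencing factor, in the order quoted in the text.
clouds_per_factor = [4, 2, 4, 4, 2, 2]

# One hidden node per combination of one cloud from each factor.
n_hidden = math.prod(clouds_per_factor)
node_index_tuples = list(itertools.product(*(range(k) for k in clouds_per_factor)))
```

Here `n_hidden` evaluates to 512, which agrees with the 3751*512 hidden-layer output matrix H described in the text; each tuple in `node_index_tuples` selects the (Ex, En, He) triple of one cloud per factor to form one six-dimensional node.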
To verify the effectiveness of the cloud model RBF neural network algorithm,21 we further evaluated the predictive power and accuracy of the model by calculating the linear trend
Evaluation of Prognostic Power and Accuracy.
We found that the likelihood ratio
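For readers unfamiliar with the C-index used to compare the models, the sketch below shows one common way to compute it (Harrell's concordance index for right-censored data). It is an illustration under assumptions rather than the evaluation code used in the study; the function name `concordance_index` and the toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter follow-up time than subject j. Ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier event in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: survival time in months, event flag (1 = death), predicted risk.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.6, 0.4, 0.1]
c_index = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported values of 0.715 and 0.591 should be read.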
Conclusions
The parameters of RBF neural networks are difficult to determine due to the uncertainty of most of the factors affecting the prognosis of patients with gastric carcinoma. In this study, we present an improved RBF neural network algorithm with normal cloud parameters. The proposed algorithm converts the optimization problem of RBF neural network parameters into a problem of determining normal cloud parameters, which endows the improved cloud-model RBF neural network with the cloud model's unified treatment of fuzziness and randomness. Simulation experiments on gastric carcinoma patient data from the SEER and Linzhou Centre for Disease Control and Prevention databases show that the cloud-model-optimized RBF neural network algorithm has a higher predictive power than other models. It achieves the highest likelihood ratio χ2 (31.37) and C-index (0.715), and the lowest AIC value (17321.22), indicating better discriminative ability, predictive power, and model fitting. By effectively addressing the inherent uncertainties of clinical prognostic data, this optimized algorithm offers a more reliable reference for clinical prognostic assessment of gastric carcinoma. In previous studies, cloud models have been applied to power-system load forecasting and temperature prediction, but this is the first time, to our knowledge, that cloud models have been applied to medicine. The results of this study suggest that the algorithm has a promising future in the prognosis of patients with gastric carcinoma. It should be noted that this study has certain limitations. All data used in this research were derived from the Surveillance, Epidemiology, and End Results (SEER) database and the Linzhou Centre for Disease Control and Prevention database. These datasets may have inherent biases, such as limited demographic representativeness and potential underreporting of certain clinical variables, which could affect the generalizability of the proposed model.
In summary, we developed and validated a new cloud-model-optimized RBF neural network model to predict survival in patients with gastric carcinoma. The model is easy to use and has significant predictive advantages over other models. Moreover, the model is more advantageous when dealing with uncertainty. In future work, we will collect more comprehensive data to further improve the accuracy of the model and enhance its clinical applicability.
Footnotes
Abbreviations
Ethical Disclosure
The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study involves the secondary analysis of de-identified, publicly available clinical data on gastric carcinoma, with no direct interaction with human participants, collection of original patient samples, or access to identifiable personal health information. In accordance with the ethical guidelines for retrospective studies using anonymized public datasets and relevant regulations, formal ethical review and informed consent are not required for this research, as the data have been stripped of all identifiers and pose no risk to patient privacy or autonomy. The data analysis was conducted strictly in compliance with the data usage policies of the source database to ensure the protection of patient information.
Consent for Publication
Not applicable
Authors' Contributions
LK and G-SG designed this study and wrote the manuscript, and J-YL, W-YX, J-PP, LD, ZYL and CZ prepared the figures. S-XY, HTT, MK and W-XL performed the statistical analyses. All authors read and gave final approval of the manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Henan Provincial Medical Science and Technology Research Project, China (Nos. RKX202402030 and ZLKFJJ20230509).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
