Abstract
Rainfall (especially moderate to heavy rain) changes sea surface roughness and thereby alters Global Navigation Satellite System Reflectometry (GNSS-R) signal characteristics. This study uses L1-level GNSS-R data from the Fengyun-3E (FY-3E) Global Navigation Satellite System Occultation Sounder II (GNOS-II) to address the classification and regression of moderate to heavy rain (rainfall intensity greater than 0.4 mm/h). For the classification task, three multi-parameter classification models were developed from GPS/BDS feature data and environmental characteristic data. On a 2-month validation dataset, the logistic regression (LR) model performed best, achieving an accuracy of 0.97 and a precision of 0.80. For the regression task, the Proportional-Integral-Derivative-based Search Algorithm (PSA)-Fusion-Convolutional Neural Network (CNN)-Transformer-RainNet (PFCT-RainNet) sea surface rainfall regression inversion model is proposed. This model fuses Delay Doppler Map (DDM) images with multiple feature parameters for multi-modal training and uses the PSA for automatic hyperparameter optimization. On the 2-month validation dataset, PFCT-RainNet achieved a coefficient of determination (R²) of 0.50, a root mean square error (RMSE) of 0.59 mm/h, and a mean absolute error (MAE) of 0.39 mm/h, demonstrating its effectiveness in retrieving moderate to heavy rain intensity under low wind speed (WS) and low significant wave height (SWH) conditions.
Introduction
Research on precipitation is a significant aspect of meteorological studies and has practical value for disaster prevention and mitigation. Common methods for obtaining precipitation information include rain gages and meteorological radars. 1 However, the sparse distribution and discontinuous coverage of rain gages and radars limit their ability to monitor rainfall over large areas at high spatial density. 2 Dense networks of terrestrial GNSS stations can accurately retrieve spatial grids of precipitable water vapor (PWV) with high temporal and spatial resolution over large land areas, but this approach remains restricted to precipitation observations over land.3,4
Spaceborne Global Navigation Satellite System Reflectometry (GNSS-R) technology has achieved widespread application in ocean remote sensing due to its advantages of short revisit cycles, low cost, and high spatiotemporal resolution.5,6 Its applications encompass sea surface wind field detection,7,8 significant wave height measurement,9,10 sea level monitoring,11,12 etc. Rainfall-induced changes in sea surface roughness affect GNSS-R signals through two mechanisms: (1) raindrop impacts generating ring waves and turbulence,6,7 and (2) alterations in air-sea momentum transfer that suppress wind-wave development. 13 Early studies14,15 first validated the feasibility of GNSS-R for detecting precipitation under low wind conditions (WS < 5 m/s), while Asgarimehr et al. 16 further quantified the interference effect of rainfall attenuation on GNSS-R wind speed retrieval, revealing significantly enhanced signal attenuation when rainfall intensity exceeds 5 mm/h.
Based on these mechanisms, Bu and Yu 17 proposed a threshold method using probability density functions (PDFs) of GNSS-R observations, achieving rainfall detection and intensity inversion through normalized central delay waveform (NCDW) summation. This method attained a 75% probability of detection (POD) and a model accuracy of 3.74 mm/h under low wind conditions. Subsequent research 18 optimized combined models, reducing the root mean square error (RMSE) of rainfall intensity inversion to 3.17 mm/h (correlation coefficient R = 0.79). However, traditional threshold methods are sensitive to geometric parameters and sea state fluctuations, particularly in low wave height scenarios (SWH < 2 m), where DDM (Delay-Doppler Map) noise easily obscures microscale roughness signals.18,19 To address this, Bu et al. 20 incorporated CYGNSS DDM data to construct multi-parameter machine learning models using support vector machines (SVM), random forests (RF), and convolutional neural networks (CNN), improving rainfall detection precision, recall, and F1-score to 78.5%, 83.8%, and 78.1%, respectively.
Innovations in machine learning have opened new pathways for GNSS-R rainfall inversion. Xiao et al. 21 developed a deep learning model that successfully decoupled the combined effects of rainfall and wind speed on DDM features, reducing wind speed retrieval error by 32%. This signal separation technique provides theoretical support for extracting rainfall intensity from time-delay waveform distortions. 17 Concurrently, multi-parameter joint inversion has emerged as a research trend. Bu et al. 13 achieved three-parameter joint inversion (rainfall intensity, wind speed, and wave height) in China’s coastal regions using CYGNSS data, validating the feasibility of jointly modeling multiple physical quantities. Furthermore, advances have been made in GNSS-R applications for extreme rainfall events. Wei et al. 19 assessed the flood inundation extent induced by Guangdong rainstorms through CYGNSS signal anomaly detection (F1 = 83.8%), while Rajabi et al. 22 demonstrated a strong correlation (R² = 0.79) between GNSS-R coherence time degradation and heavy rainfall intensity.
In summary, traditional methods such as threshold-based techniques (e.g. probability density function analysis) face critical challenges:
Manual feature engineering: Reliance on handcrafted features (e.g. normalized center delay waveforms) restricts their ability to capture nonlinear relationships between GNSS-R signals and rainfall intensity.
Poor noise robustness: Fixed thresholds are sensitive to observational noise in DDMs, which is exacerbated under dynamic sea-state conditions.
Narrow applicability: These methods lack systematic validation across heterogeneous datasets (e.g. multi-constellation GNSS-R data) and diverse marine environments.
Machine learning-based approaches have demonstrated promising potential for rainfall detection and intensity retrieval using onboard GNSS-R technology. However, current research predominantly focuses on model accuracy analysis, with limited exploration of generalization capabilities—particularly under low WS (<5 m/s) and low SWH (<2 m) conditions where rainfall-induced roughness dominates over wind-driven effects.
To address these limitations, this paper proposes PFCT-RainNet and PFCAL-RainNet, novel deep learning frameworks tailored for rainfall intensity inversion under low WS/SWH conditions. Our solutions integrate three key advancements:
Multi-modal fusion architecture: A hybrid CNN-Transformer network extracts spatial patterns from DDM images and temporal dependencies from GNSS-R feature parameters. Cross-modal attention mechanisms dynamically fuse heterogeneous data (GPS/BDS signals) to enhance feature representation.
BM3D denoising: A block-matching and 3D collaborative filtering algorithm suppresses DDM noise while preserving micro-scale roughness variations caused by rainfall.
PID search optimization algorithm (PSA): An automated hyperparameter tuning algorithm replaces manual optimization, improving convergence efficiency by 40% in complex models through proportional-integral-derivative feedback control.
Dataset description
The data used in this study can be categorized into three main types: the FY-3E GNOS-II L1-level dataset, the ERA5 reanalysis dataset, and the IMERG-F precipitation dataset.
FY-3E GNOS-II L1 data
Launched on July 5, 2021, FY-3E is China’s second-generation polar-orbiting meteorological satellite and the world’s first civil dawn-dusk orbit meteorological satellite. 23 The Global Navigation Satellite System Occultation Sounder II (GNOS-II) aboard FY-3E is a GNSS remote sensing instrument that uniquely combines GNSS radio occultation (GNSS RO) and GNSS reflectometry (GNSS-R). A notable feature of GNOS-II is its ability to receive reflected signals from multiple GNSS constellations, including GPS (Global Positioning System), BDS (BeiDou Navigation Satellite System), and Galileo. GNOS-II L1 data provide normalized bistatic radar cross-section values at the specular reflection points, calculated from the delay-Doppler correlated power waveforms, along with related auxiliary information. The product primarily includes 122 × 20 non-uniform delay-Doppler correlated power waveforms, GNSS-R event times and locations, the IDs, positions, and velocities of the transmitting GNSS satellites, and the position of the FY-3E satellite.
ERA5 reanalysis data
ERA5 is the fifth generation of global atmospheric reanalysis data, 24 integrating model data with observations from around the world to create a consistent global dataset that provides an optimal estimate of atmospheric states. ERA5 offers a wealth of estimates for various atmospheric, land, and ocean climate variables on an hourly basis.
In this study, the convective rain rate (CRR) from ERA5 is used as the reference (ground truth) for model training and validation, while sea surface WS, the wave height of the first swell partition, mean sea level pressure (MSLL), and sea surface temperature (SST) are used as environmental parameters in the model to mitigate the impact of environmental factors. The spatial resolution is 0.25° × 0.25°, and the WS is given at 10 m above the sea surface and is derived from both its zonal (longitude-direction) and meridional (latitude-direction) components.
Due to the differing spatial and temporal resolutions between FY-3E and ERA5, this study employs bilinear interpolation for spatial matching and linear interpolation for temporal matching to achieve a synchronized spatiotemporal alignment between FY-3E and ERA5 data.
IMERG-F rainfall data
GPM is an international satellite mission conducted in collaboration between NASA (National Aeronautics and Space Administration) and JAXA (Japan Aerospace Exploration Agency). 25 This initiative utilizes a combination of multiple sensors, satellites, and algorithms, along with rain gage data, to provide higher-accuracy precipitation data. GPM can deliver global rainfall and snowfall products based on microwave observations every 3 h, as well as half-hourly products combining microwave and infrared data, extending its coverage to the polar regions.
IMERG is the algorithm that generates the GPM Level 3 products, integrating data from all passive microwave instruments in the GPM constellation to provide rainfall estimates. This study uses the IMERG-F half-hourly precipitation product, which has a temporal resolution of 30 min and a spatial resolution of 0.1° × 0.1°, covering latitudes from 60° S to 60° N.
To further validate the generalization performance of the sea surface rainfall retrieval results, IMERG-F rainfall data and FY-3E GNOS-II L1 data are spatially and temporally matched using bilinear interpolation and linear interpolation methods, respectively. This allows for a comprehensive verification of the model’s predicted rainfall values.
Principle and process of GNSS-R sea surface rainfall retrieval
Principle of inversion of sea surface rainfall
GNSS-R technology utilizes the bistatic radar transmission equation to analyze variations in parameters such as intensity, frequency, phase, and polarization direction between reflected signals and direct GNSS signals. Using these scattering characteristics, it is possible to retrieve properties of the reflecting surface, such as roughness, reflectivity, and dielectric constant, thereby determining the nature and state of the reflecting surface. 26
The scattering theory can be described using the Z-V model established by Zavorotny and Voronovich. In 2016, Gleason et al. 27 made certain modifications to this expression, as shown in equation (1):
Note that equation (1) does not account for rainfall; it considers only the influence of WS. In reality, both sea surface wind and precipitation affect the power of the reflected signal. Depending on WS, ocean wave spectra can be broadly divided into two electromagnetic scattering regimes. The first is the near-specular reflection regime, where WS is typically below 5 m/s and the surface Rayleigh parameter is much less than 1. The second is the diffuse scattering regime, where WS is at least 5 m/s and the surface Rayleigh parameter is larger, so that the geometric optics approximation applies.
Under low WS, surface roughness caused by raindrop impacts dominates, whereas under high WS, wind-generated surface roughness becomes more significant. The effect of raindrops on the sea surface is modeled as a first-order superposition of the ring-wave spectrum and the wind-driven Elfouhaily spectrum. The influence of raindrops on the water surface can be described using a log-Gaussian model of the rainfall spectrum 6:
where S_K(K) represents the ring-wave spectrum.
Based on the above analysis, when WS is below 5 m/s, rainfall-induced changes in sea surface roughness dominate, which allows rainfall to be retrieved under low WS conditions. Under low WS, the impact of SWH on rainfall retrieval must also be considered; based on multiple statistical experiments, DDM data with SWH above 2 m are excluded in this study.
The impact of rainfall on GNSS-R characteristics
Based on the principles above, changes in sea surface roughness caused by rainfall directly alter GNSS-R signal characteristics. The average DDM (DDMA) reflects the signal intensity around the specular reflection point (SP), 28 while the leading edge slope (LES) indicates the degree of signal scattering in that vicinity.
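The two features can be made concrete with a minimal NumPy sketch. It assumes the DDM is stored as a (delay, Doppler) array (122 × 20 for FY-3E), that DDMA is the mean power in a small window centred on the SP, and that LES is the least-squares slope of the Doppler-integrated delay waveform over the few delay bins ending at the SP. The window sizes (`half_delay`, `half_doppler`, `n_lead`) and the function names are hypothetical choices for illustration, not the FY-3E product definitions.

```python
import numpy as np


def ddma(ddm, sp_delay, sp_doppler, half_delay=1, half_doppler=2):
    """Mean DDM power in a window centred on the SP (window size is an assumption)."""
    d0, d1 = max(sp_delay - half_delay, 0), sp_delay + half_delay + 1
    f0, f1 = max(sp_doppler - half_doppler, 0), sp_doppler + half_doppler + 1
    return float(ddm[d0:d1, f0:f1].mean())


def les(ddm, sp_delay, sp_doppler, n_lead=3, delay_step=1.0, half_doppler=2):
    """Least-squares slope of the Doppler-integrated delay waveform on its leading edge."""
    f0 = max(sp_doppler - half_doppler, 0)
    waveform = ddm[:, f0:sp_doppler + half_doppler + 1].sum(axis=1)  # integrate over Doppler
    start = max(sp_delay - n_lead + 1, 0)
    seg = waveform[start:sp_delay + 1]           # delay bins up to and including the SP
    x = np.arange(seg.size) * delay_step
    slope, _ = np.polyfit(x, seg, 1)             # first-order fit: slope, intercept
    return float(slope)


# Synthetic DDM: triangular delay response peaking at bin 60, Gaussian in Doppler around bin 10.
delay = np.arange(122)
doppler = np.arange(20)
ddm = np.outer(np.clip(1 - np.abs(delay - 60) / 10, 0, None),
               np.exp(-((doppler - 10) / 3.0) ** 2))
print(f"DDMA = {ddma(ddm, 60, 10):.4f}, LES = {les(ddm, 60, 10):.4f}")
```

Because DDMA is a plain average, a uniform drop in reflected power (as observed under heavy rain) lowers it proportionally, while the LES responds to how the power is distributed along the delay axis.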
To qualitatively analyze the impact of rainfall on DDM features, samples with nearly identical SWH and WS were compared. Figure 1 displays the DDM images and corresponding DDMA values for (a) no rainfall, (b) moderate rainfall, and (c) heavy rainfall, with WS of approximately 2 m/s and SWH of approximately 1 m. As Figure 1 shows, the DDM shapes are nearly identical because the WS values are approximately equal, indicating that rainfall has minimal impact on the shape of the DDM. However, the DDMA decreases as rainfall increases: it is lower for moderate rain (0.4 mm/h) than for no rain (0 mm/h), and lowest for heavy rain (7.4 mm/h). This indicates that under low WS and low SWH conditions, rainfall changes sea surface roughness, thereby reducing GNSS-R signal energy and lowering DDMA values.

FY-3E DDMs under different rainfall conditions: (a) Rainless (CRR = 0 mm/h, WS = 2.0 m/s, SWH = 1.0 m, DDMA = 7,044,979), (b) moderate rainfall (CRR = 0.4 mm/h, WS = 1.9 m/s, SWH = 1.0 m, DDMA = 5,362,010), and (c) heavy rainfall (CRR = 7.4 mm/h, WS = 2.1 m/s, SWH = 0.9 m, DDMA = 3,173,988).
Data from July 2022 to May 2023 (WS less than 5 m/s, SWH under 2 m, and rainfall greater than 0 mm/h) were used to further analyze the impact of rainfall on DDMA and LES characteristics. Figures 2 and 3 show scatter plots of normalized DDMA and LES values against true rainfall values, respectively. As the true rainfall values increase, the overall DDMA values exhibit a declining trend, while the LES values show a slight upward trend. This observation aligns with the principles of sea surface rainfall retrieval using GNSS-R.

Scatter relationship between DDMA (normalized) and rain (The red line represents the cubic fitting curve).

Scatter relationship between LES (normalized) and rain (The red line represents the cubic fitting curve).
When the true rainfall values are below 0.4 mm/h, the correlation coefficients of DDMA and LES with rainfall are 0.06 and 0.03, respectively. Once the true rainfall values exceed approximately 0.4 mm/h, the correlation coefficients for DDMA and LES increase to 0.12 and 0.08, respectively, indicating that moderate to heavy rainfall (≥0.4 mm/h) is more strongly correlated with GNSS-R parameters. This study therefore classifies moderate to heavy rainfall using a threshold of 0.4 mm/h and performs rainfall regression retrieval for such events.
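The regime-dependent correlation analysis and the 0.4 mm/h labelling rule can be sketched as follows. This is a hypothetical illustration on synthetic data (the feature is constructed to respond to rain only above the threshold), the function names are not from the study, and the labels use a strict `> 0.4` test to match the abstract's definition of moderate to heavy rain.

```python
import numpy as np

RAIN_THRESHOLD = 0.4  # mm/h, moderate-to-heavy rain threshold used in the study


def rain_labels(rain, threshold=RAIN_THRESHOLD):
    """Binary classification labels: 1 = moderate to heavy rain (> threshold)."""
    return (np.asarray(rain, dtype=float) > threshold).astype(int)


def regime_correlations(feature, rain, threshold=RAIN_THRESHOLD):
    """Pearson r between a GNSS-R feature and rain rate, below and above the threshold."""
    feature = np.asarray(feature, dtype=float)
    rain = np.asarray(rain, dtype=float)
    high = rain_labels(rain, threshold).astype(bool)
    r_low = np.corrcoef(feature[~high], rain[~high])[0, 1]
    r_high = np.corrcoef(feature[high], rain[high])[0, 1]
    return r_low, r_high


# Synthetic example: the feature decreases with rain only above the threshold.
rng = np.random.default_rng(0)
rain = rng.uniform(0.0, 8.0, 2000)
feature = np.where(rain > RAIN_THRESHOLD, 1.0 - 0.1 * rain, 1.0) + rng.normal(0.0, 0.05, rain.size)
r_low, r_high = regime_correlations(feature, rain)
print(f"r(below) = {r_low:.2f}, r(above) = {r_high:.2f}")
```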
Data preprocessing
The preprocessing of FY-3E GNOS-II L1 data is crucial to ensure the quality and reliability of the input features for the classification and regression models. The preprocessing steps are as follows:
Data filtering
Wind Speed (WS) and Significant Wave Height (SWH) thresholds
Only data with WS < 5 m/s and SWH < 2 m are retained, as these conditions are optimal for detecting rainfall-induced changes in sea surface roughness.
Signal-to-Noise Ratio (SNR)
Data with SNR > 0 are selected to ensure the quality of the GNSS-R signals.
Rainfall intensity
Data with rainfall intensity greater than 0 mm/h are included to focus on rainfall detection and inversion.
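As a minimal sketch, the screening rules above reduce to a single boolean mask over the matched samples. The function and argument names are hypothetical, and SNR is assumed to be expressed in dB so that the `SNR > 0` rule is meaningful.

```python
import numpy as np


def screening_mask(ws, swh, snr, rain, ws_max=5.0, swh_max=2.0):
    """True for samples kept: WS < 5 m/s, SWH < 2 m, SNR > 0 and rain > 0 mm/h."""
    ws, swh, snr, rain = (np.asarray(a, dtype=float) for a in (ws, swh, snr, rain))
    return (ws < ws_max) & (swh < swh_max) & (snr > 0.0) & (rain > 0.0)


mask = screening_mask(
    ws=[2.0, 6.0, 3.0, 4.9, 1.5],
    swh=[1.0, 1.0, 2.5, 1.9, 0.8],
    snr=[3.2, 4.1, 5.0, 1.0, -0.5],
    rain=[0.4, 2.0, 1.0, 7.4, 3.0],
)
print(mask)  # only samples 0 and 3 pass every rule
```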
Data cleaning
Missing values
Any records with missing or invalid values in key parameters (e.g. DDMA, LES, SNR) are removed.
Outlier removal
Outliers in the GNSS-R signal parameters (e.g. abnormally high or low DDMA values) are identified and excluded using the Interquartile Range (IQR) method.
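The cleaning steps can be sketched with NumPy as below. The IQR multiplier `k = 1.5` is the conventional Tukey fence and is an assumption, since the text does not state which multiplier was used; the hypothetical `clean_features` function applies the fences column by column and drops a row if any of its columns falls outside.

```python
import numpy as np


def clean_features(features, k=1.5):
    """Drop rows with missing/invalid values, then rows outside the IQR fences."""
    x = np.asarray(features, dtype=float)
    x = x[np.all(np.isfinite(x), axis=1)]            # missing or invalid (NaN/inf) values
    q1, q3 = np.percentile(x, [25, 75], axis=0)      # per-column quartiles
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    keep = np.all((x >= lower) & (x <= upper), axis=1)
    return x[keep]


# Columns: DDMA, LES, SNR (hypothetical values); row 5 has a missing DDMA, row 6 an outlier.
rows = np.array([
    [1.00, 0.10, 3.0],
    [1.10, 0.12, 2.8],
    [0.90, 0.11, 3.1],
    [1.05, 0.09, 2.9],
    [np.nan, 0.10, 3.0],
    [50.0, 0.10, 3.0],
])
cleaned = clean_features(rows)
print(cleaned.shape)
```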
Normalization
Feature scaling
All numerical features (e.g. DDMA, LES, SNR) are normalized to a range of [0, 1] using Min-Max scaling to ensure consistent input scales for the machine learning models.
Image normalization
DDM images are normalized to have zero mean and unit variance to improve the convergence of the CNN-based models.
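A minimal sketch of both normalizations follows. Reusing the training-set minimum and maximum on validation data is an assumption (it is the usual way to avoid leakage), as is standardizing each DDM image with its own mean and standard deviation rather than with dataset-wide statistics; the text specifies neither choice.

```python
import numpy as np


def minmax_scale(x, x_min=None, x_max=None):
    """Scale each column to [0, 1]; pass training-set min/max when scaling validation data."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0) if x_min is None else np.asarray(x_min, dtype=float)
    x_max = x.max(axis=0) if x_max is None else np.asarray(x_max, dtype=float)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # guard against constant columns
    return (x - x_min) / span


def standardize_ddm(ddm, eps=1e-12):
    """Zero-mean, unit-variance normalization of one DDM image."""
    ddm = np.asarray(ddm, dtype=float)
    return (ddm - ddm.mean()) / (ddm.std() + eps)


scaled = minmax_scale([[0, 10], [5, 20], [10, 30]])
z = standardize_ddm(np.random.default_rng(0).normal(5.0, 2.0, size=(122, 20)))
print(scaled)
print(round(float(z.mean()), 6), round(float(z.std()), 6))
```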
Data augmentation
Image denoising
The BM3D (Block-Matching and 3D filtering) algorithm is applied to DDM images to reduce noise and enhance the quality of the input data for the regression models. This step is particularly important for improving the accuracy of rainfall intensity retrieval.
Spatial and temporal interpolation
To address gaps in the data, bilinear interpolation is used for spatial matching, as illustrated in Figure 4 and detailed in equations (3)–(5). Linear interpolation is applied for temporal alignment between the FY-3E GNOS-II L1 data and the ERA5/IMERG-F datasets, as described in equation (6).

Bilinear interpolation schematic diagram.
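Equations (3)–(6) are not reproduced in this text, so the following is a generic sketch of the two interpolation steps rather than the study's exact formulas: bilinear interpolation of a gridded field (ERA5 at 0.25° or IMERG-F at 0.1°) to the SP location, followed by linear interpolation between the two fields that bracket the DDM time. It assumes ascending latitude and longitude axes (ERA5 latitudes are stored north to south and would need flipping first); the function names and grid values are hypothetical.

```python
import numpy as np


def bilinear(grid, lats, lons, lat, lon):
    """Bilinear interpolation of grid[lat_index, lon_index] at (lat, lon); axes must be ascending."""
    i = int(np.clip(np.searchsorted(lats, lat) - 1, 0, len(lats) - 2))
    j = int(np.clip(np.searchsorted(lons, lon) - 1, 0, len(lons) - 2))
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])   # fractional position in latitude
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])   # fractional position in longitude
    south = (1 - tx) * grid[i, j] + tx * grid[i, j + 1]
    north = (1 - tx) * grid[i + 1, j] + tx * grid[i + 1, j + 1]
    return float((1 - ty) * south + ty * north)


def linear_in_time(t, t0, t1, v0, v1):
    """Linear interpolation between values valid at times t0 and t1 (same units as t)."""
    w = (t - t0) / (t1 - t0)
    return (1 - w) * v0 + w * v1


# Hypothetical 0.25-degree grids valid at t0 = 0 h and t1 = 1 h (e.g. hourly ERA5 fields).
lats = np.array([20.0, 20.25, 20.5])
lons = np.array([110.0, 110.25, 110.5])
lat2d, lon2d = np.meshgrid(lats, lons, indexing="ij")
field_t0 = 2.0 * lat2d + 3.0 * lon2d
field_t1 = field_t0 + 1.0

# Specular point at (20.1 N, 110.3 E), observed 15 min after t0.
v0 = bilinear(field_t0, lats, lons, 20.1, 110.3)
v1 = bilinear(field_t1, lats, lons, 20.1, 110.3)
value = linear_in_time(0.25, 0.0, 1.0, v0, v1)
print(round(value, 6))
```

Interpolating each bracketing field in space first and then in time gives the same result as the reverse order, because both steps are linear along independent axes.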
Feature selection
Relevant Features: Based on prior studies and correlation analysis, 32 feature parameters (as listed in Table 1) are selected for the classification and regression tasks. These include signal parameters (e.g. DDMA, LES), system parameters (e.g. satellite velocity, antenna gain), and environmental factors (e.g. WS, SWH).
Feature parameters of the model.
Dimensionality reduction
Principal Component Analysis (PCA) is optionally applied to reduce the dimensionality of the feature space, although this step is not used in the final models due to the relatively low number of features.
Dataset splitting
Training and validation sets
The preprocessed data is split into training (70%) and validation (30%) sets. The training set is used for model development, while the validation set is used to evaluate the generalization performance of the models.
Temporal consistency
The validation set is specifically chosen from the period of June to July 2023 to ensure temporal consistency and avoid data leakage between training and validation.
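Holding out whole calendar months, rather than sampling the validation set at random, can be sketched as below. The June to July 2023 window comes from the text; the function name and the half-open end date (`2023-08-01`, exclusive) are assumptions.

```python
import numpy as np


def temporal_split(times, val_start="2023-06-01", val_end="2023-08-01"):
    """Boolean (train, validation) masks; validation covers [val_start, val_end)."""
    t = np.asarray(times, dtype="datetime64[s]")
    val = (t >= np.datetime64(val_start)) & (t < np.datetime64(val_end))
    return ~val, val


times = ["2022-07-03T10:00", "2023-05-31T23:30", "2023-06-01T00:10", "2023-07-31T12:00"]
train, val = temporal_split(times)
print(train, val)
```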
By following these preprocessing steps, the data is prepared for effective model training and validation, ensuring that the input features are clean, normalized, and representative of the target conditions (low WS and low SWH).
Retrieval process of sea surface rainfall
Figure 5 outlines the rainfall retrieval process used in this study, which consists of three main components: data matching, preprocessing, and model training and validation. This study focuses on detecting moderate to heavy rainfall and retrieving the corresponding rainfall intensity under low WS (<5 m/s) and low SWH (<2 m) conditions. Threshold screening for WS and SWH was performed during data matching. In the preprocessing stage, strict quality control and screening were applied, using criteria such as an SNR greater than 0 and retaining only sea surface DDMs, to enhance the effectiveness of the feature parameters in the model. Model training comprised a classification model for moderate to heavy rainfall based on GNSS-R feature parameters and a multimodal sea surface rainfall regression model based on the fusion of heterogeneous GNSS-R data. To improve reliability, all experimental results were averaged over three independent runs. Finally, the classification results were validated against ERA5 data, while the regression retrieval results were validated against both ERA5 and IMERG-F data.

Rainfall inversion process diagram.
Feature parameters
In previous studies, DDM observables related to sea surface roughness have been proposed and widely applied in fields such as sea surface WS retrieval and SWH estimation.29,30 This study continues to use DDM signal parameters associated with GNSS-R signal strength, including the signal-to-noise ratio (SNR), normalized bistatic radar cross section (NBRCS), and LES at the SP (No. 10, No. 26, No. 7), DDMA (No. 25), and SNR at the peak point (No. 11). The code delay and Doppler frequency at the SP (No. 12, No. 13) and at the peak point (No. 14, No. 15) are also used. These signal parameters are provided by FY-3E. SP_LES (No. 27) and SP_TES (No. 28), which represent the LES and the trailing edge slope (TES) at the SP, are calculated from the FY-3E DDMs.
Additionally, DDM-related system parameters are included, such as the X, Y, and Z velocity components of the GNSS transmitter (No. 1–3) and the receiver (No. 4–6), the incidence angle (No. 8), the antenna gain (No. 9), and the GNSS PRN (No. 21). To account for spatial and temporal effects, the SP position (No. 19, No. 20) and the DDM time (No. 17, No. 18) are also included, as are parameters related to the attitude of the FY-3E satellite (No. 29, No. 31) and the orbit attitude (No. 30, No. 32). These system parameters are provided directly by FY-3E.
Moreover, four environmental factors related to rainfall conditions or sea surface roughness are added: sea surface WS (No. 16), SWH (No. 22), MSLL (No. 23), and SST (No. 24).
In total, 32 feature parameters are considered, as outlined in Table 1. Among these, system and signal parameters are sourced from the FY-3E satellite, while environmental feature parameters are obtained from ERA5. The contribution of each parameter will be discussed in Chapter 5.
Classification model for moderate to heavy rainfall
Based on the characteristics of satellite GNSS-R data, this study employs three typical models, Random Forest (RF), Decision Tree (DT), and Logistic Regression (LR), to classify moderate to heavy rainfall (greater than 0.4 mm/h). The classification models use only the 32 one-dimensional feature parameters listed in Table 1 as input. Figure 6 presents the three classification models.

Three moderate to heavy rainfall classification models.
RF classification model
Random Forest (RF) is based on the Bagging ensemble learning theory and the concept of random subspace, comprising numerous decision trees. This approach addresses the weakness of a single decision tree’s generalization ability, maintaining good accuracy even with significant feature loss, and significantly reducing the risk of overfitting. However, RF has higher computational and storage costs due to the need to train multiple decision trees, resulting in increased model complexity and reduced interpretability compared to a single decision tree.
Through multiple random small-sample experiments, the RF classification model used in this study was configured with 100 decision trees, Gini impurity as the splitting criterion, no limit on the maximum depth, and a random seed of 42.
DT classification model
Decision Tree (DT) is a tree-structured model used for classification and regression tasks. Each node represents a feature, each branch signifies a split based on a feature value, and each leaf node corresponds to a category or a regression value. The advantages of decision trees include their ease of understanding and interpretation, as well as their capability to handle both numerical and categorical data without requiring feature scaling. However, they can easily overfit when the tree depth is large, leading to potentially unstable results for certain datasets.
Through multiple random small-sample experiments, the DT classification model used in this study was configured with Gini impurity as the splitting criterion, no limit on the maximum depth, a minimum of 2 samples required to split an internal (non-leaf) node, a minimum of 1 sample per leaf node, and no limit on the maximum number of leaf nodes.
LR classification model
Logistic Regression (LR) is a generalized linear model primarily used for binary classification. It builds on linear regression by applying the sigmoid function, which maps the output to the range between 0 and 1.
Through multiple random small sample experiments, it was determined that the LR classification model used in this study employs L2 regularization, sets the random seed to 42, and establishes the maximum number of iterations at 1000.
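The three configurations map directly onto scikit-learn estimators. The sketch below assumes scikit-learn (the text does not name the library) and leaves every unstated hyperparameter at its default; in particular, L2 is the default penalty of `LogisticRegression`, so it is not passed explicitly. The synthetic data merely stand in for the 32 features of Table 1 and the 0.4 mm/h labels; the resulting scores say nothing about the study's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score
from sklearn.tree import DecisionTreeClassifier

# Stated hyperparameters; anything not mentioned in the text is left at the sklearn default.
MODELS = {
    "RF": RandomForestClassifier(n_estimators=100, criterion="gini", max_depth=None, random_state=42),
    "DT": DecisionTreeClassifier(criterion="gini", max_depth=None, min_samples_split=2,
                                 min_samples_leaf=1, max_leaf_nodes=None),
    "LR": LogisticRegression(max_iter=1000, random_state=42),  # L2 penalty is the default
}


def evaluate(x_train, y_train, x_val, y_val):
    """Fit each model and return {name: (accuracy, precision)} on the validation set."""
    scores = {}
    for name, model in MODELS.items():
        model.fit(x_train, y_train)
        pred = model.predict(x_val)
        scores[name] = (accuracy_score(y_val, pred),
                        precision_score(y_val, pred, zero_division=0))
    return scores


# Synthetic stand-in: 32 features, label 1 ("moderate to heavy rain") for about a quarter of samples.
rng = np.random.default_rng(42)
x = rng.normal(size=(1000, 32))
y = (x[:, 0] - 0.5 * x[:, 1] > 0.8).astype(int)
scores = evaluate(x[:700], y[:700], x[700:], y[700:])
for name, (acc, prec) in scores.items():
    print(f"{name}: accuracy={acc:.2f}, precision={prec:.2f}")
```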
Regression model of moderate to heavy rainfall with image fusion
To further enhance generalization, this study proposes two PSA-optimized regression models that fuse DDM images for the moderate to heavy rainfall regression experiments. The inputs to the regression models combine two-dimensional DDM image data with the 32 one-dimensional feature parameters listed in Table 1.
DDM denoising
Because DDM images from satellite observations often contain noise, the BM3D (Block-Matching and 3D filtering) algorithm 31 is used to denoise them during preprocessing. BM3D is a classical image denoising algorithm that groups similar image blocks by block matching and filters each group collaboratively in a three-dimensional transform domain. Applying BM3D to the DDM data aims to reduce interference from noise and clutter, thereby improving the accuracy and reliability of rainfall detection and intensity retrieval.
The BM3D image denoising algorithm consists of two main stages:
Step 1: Basic Estimation
(1) Block matching identifies segments similar to a reference block, referred to as similar blocks. They are organized into a three-dimensional array known as a three-dimensional group based on their similarity.
(2) Collaborative hard-thresholding is applied to each group in a three-dimensional transform domain to attenuate noise, and a three-dimensional inverse transform then yields estimates of the individual two-dimensional blocks.
(3) A weighted average of the multiple estimates for each similar block is computed, resulting in the basic estimate of the image.
Step 2: Final Estimation (Using Basic Estimation as Input)
(1) Block matching is again conducted within the basic estimate to identify the positions of similar blocks, leading to the creation of two sets of three-dimensional images: one from the noisy input and the other from the basic estimate.
(2) Collaborative Wiener filtering is applied to both sets via three-dimensional transformations. The group from the basic estimate acts as the energy spectrum of the true signal, which is used to perform the filtering on the noisy image. An inverse transformation returns the processed data to its original positions, yielding the final estimate.
(3) A weighted average of the pixel estimates is calculated to produce the final image estimate.
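To make Step 1 concrete, here is a simplified NumPy sketch of the basic estimate: block matching within a local search window, a separable 3D DCT with hard thresholding at `lam * sigma`, and weighted aggregation. It is an illustration only, not the reference BM3D implementation: Step 2 (Wiener filtering) is omitted, the noise level `sigma` is assumed known, and the block size, search radius, group size, and threshold factor 2.7 are illustrative defaults.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix; rows are basis vectors, so the inverse is the transpose."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0] /= np.sqrt(2.0)
    return c


def bm3d_basic(img, sigma, block=4, step=2, search=8, n_match=8, lam=2.7):
    """Simplified BM3D Step 1 (basic estimate) for a 2-D image with known noise std sigma."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    # Reference positions; the last row/column positions are added so every pixel is covered.
    rows = sorted(set(range(0, h - block + 1, step)) | {h - block})
    cols = sorted(set(range(0, w - block + 1, step)) | {w - block})
    cb = dct_matrix(block)
    num = np.zeros_like(img)
    den = np.zeros_like(img)
    for r in rows:
        for c in cols:
            ref = img[r:r + block, c:c + block]
            # (1) Block matching: the n_match most similar blocks (L2 distance) in the window.
            cands = []
            for rr in range(max(r - search, 0), min(r + search, h - block) + 1):
                for cc in range(max(c - search, 0), min(c + search, w - block) + 1):
                    d = np.sum((img[rr:rr + block, cc:cc + block] - ref) ** 2)
                    cands.append((d, rr, cc))
            cands.sort(key=lambda t: t[0])
            pos = [(rr, cc) for _, rr, cc in cands[:n_match]]
            group = np.stack([img[rr:rr + block, cc:cc + block] for rr, cc in pos])
            # (2) 3-D transform (2-D DCT per block, 1-D DCT across the group), hard threshold, inverse.
            cg = dct_matrix(len(pos))
            coef = np.einsum("gk,kij->gij", cg, cb @ group @ cb.T)
            coef[np.abs(coef) < lam * sigma] = 0.0
            weight = 1.0 / max(np.count_nonzero(coef), 1)
            est = cb.T @ np.einsum("kg,kij->gij", cg, coef) @ cb
            # (3) Aggregation: weighted average of all block estimates covering each pixel.
            for (rr, cc), blk in zip(pos, est):
                num[rr:rr + block, cc:cc + block] += weight * blk
                den[rr:rr + block, cc:cc + block] += weight
    return num / den


# Piecewise-constant test image with additive Gaussian noise (sigma = 0.1).
rng = np.random.default_rng(0)
clean = np.full((32, 32), 0.2)
clean[8:24, 8:24] = 0.8
noisy = clean + rng.normal(0.0, 0.1, clean.shape)
denoised = bm3d_basic(noisy, sigma=0.1)
print(np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))
```

Groups with fewer surviving coefficients are treated as less noisy and receive larger aggregation weights. A production pipeline would more likely call an established BM3D implementation than this loop-based sketch.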
Figure 7 compares the DDM before and after BM3D denoising, showing notably improved smoothness and a significant reduction of noise around the peak, which results in a clearer and more refined image.

DDM comparison before and after denoising: (a) original DDM and (b) DDM after BM3D denoise.
PSA
PSA 32 is employed to determine the model’s hyperparameters, such as the learning rate, the number of hidden layer nodes, and the regularization coefficient. Given the complexity of the proposed model and its numerous hyperparameters, manual tuning is both time-consuming and prone to suboptimal solutions. PSA explores the parameter space by adjusting the parameters automatically, avoiding the cumbersome process of manual tuning. Its feedback mechanism adjusts the parameter settings dynamically, enabling faster identification of good parameter combinations and thereby enhancing the model’s performance and stability. This approach not only improves the efficiency of parameter optimization but also boosts the model’s generalization capability across different datasets.
The PID control consists of three components:
Proportional (P): Adjustments are made based on the magnitude of the current error; larger errors lead to greater adjustments.
Integral (I): The error is accumulated to eliminate long-term deviations in the system.
Derivative (D): Adjustments are based on the rate of change of the error, predicting future trends and modifying the rate of control adjustments.
In the PSA, these three components are utilized to regulate the optimization parameters during the search process, enabling the algorithm to effectively locate optimal solutions within a vast search space. Figure 8 illustrates the adjustment process of incremental PID control.

Adjustment process of incremental PID control.
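The incremental (velocity) form of PID referred to in Figure 8 computes a change in the control output rather than the output itself. The sketch below is the textbook incremental PID, du_t = Kp*(e_t - e_{t-1}) + Ki*e_t + Kd*(e_t - 2*e_{t-1} + e_{t-2}); it is not the PSA update rule, and the gains and the first-order test plant are hypothetical.

```python
class IncrementalPID:
    """Textbook incremental PID: step() returns du_t, the change in the control signal."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.e1 = 0.0  # error at t-1
        self.e2 = 0.0  # error at t-2

    def step(self, error):
        du = (self.kp * (error - self.e1)                       # proportional: change in error
              + self.ki * error                                 # integral: current error
              + self.kd * (error - 2.0 * self.e1 + self.e2))    # derivative: second difference
        self.e2, self.e1 = self.e1, error
        return du


# Hypothetical first-order plant y[t+1] = 0.9*y[t] + 0.1*u[t], driven to a setpoint of 1.0.
pid = IncrementalPID(kp=0.5, ki=0.2, kd=0.05)
y, u, setpoint = 0.0, 0.0, 1.0
for _ in range(200):
    u += pid.step(setpoint - y)  # accumulate increments into the control signal
    y = 0.9 * y + 0.1 * u
print(round(y, 4))
```

Because only increments are produced, the integral action lives in the accumulated `u`, which is what removes the steady-state error in this example.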
In this study, the PSA population size is set to 30, with a maximum of 1000 iterations. Figures 9–12 compare the results of PSA and the classical Particle Swarm Optimization (PSO) algorithm 33 on four benchmark test functions from CEC 2005. PSO, a widely used optimization algorithm inspired by the social behavior of bird flocks and fish schools, serves as the baseline. The comparison shows that PSA converges better than PSO: it reaches better solutions in fewer iterations. This indicates that, on complex optimization problems, PSA has stronger global search capability and higher convergence efficiency, allowing it to explore the search space more effectively and find solutions closer to the optimum.

Comparison of F1 standard test functions.

Comparison of F5 standard test functions.

Comparison of F10 standard test functions.

Comparison of F12 standard test functions.
The PSA implements hyperparameter optimization through a feedback control mechanism inspired by PID controllers. As shown in Figure 8, the algorithm dynamically adjusts hyperparameters (e.g. the learning rate α_t and the regularization coefficient λ_t) at iteration t through the following incremental update rule:
Where
Where
PSA effectively addresses common limitations of other hyperparameter optimization methods. Unlike grid search, PSA avoids the curse of dimensionality, where search time grows exponentially with the number of parameters. Compared to Bayesian optimization, PSA better handles noise in the objective function of GNSS-R data, mitigating optimization bias caused by noise interference. Additionally, unlike random search, PSA incorporates a PID feedback mechanism, providing directional guidance and significantly improving exploration efficiency. Thus, PSA demonstrates superior robustness and efficiency in hyperparameter optimization.
The algorithm for optimizing the hyperparameters of the neural network model with the PSA proceeds as follows:
1. Initialize the parameters of the PSA controller.
2. Adjust the hyperparameters based on the current model performance (e.g. the loss on the validation set).
3. Train the model and evaluate its performance.
4. Adjust the PSA controller parameters based on the feedback.
5. Repeat steps 2–4 until the optimal hyperparameters are found.
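The loop above can be illustrated with a minimal, self-contained sketch for a single hyperparameter, x = log10(learning rate). It assumes the textbook incremental (velocity-form) PID law rather than the exact PSA update rule, together with a hypothetical error signal e_t = −dL/dx, the negative slope of the validation loss L, estimated by central finite differences. The function toy_validation_loss is a hypothetical stand-in for training the network and measuring validation loss (its minimum is placed at a learning rate of 0.001 purely for illustration), and the gains kp, ki, kd are illustrative. Step 4 (adapting the controller gains) is omitted for brevity.

```python
def toy_validation_loss(log_lr):
    """Hypothetical stand-in for 'train the model, return validation loss'.

    The minimum at log10(lr) = -3 (lr = 0.001) is placed there for illustration only.
    """
    return (log_lr + 3.0) ** 2 + 0.1


def pid_tune(loss_fn, x0, kp=0.1, ki=0.2, kd=0.05, h=1e-3, iters=60):
    """Tune one scalar hyperparameter x with an incremental (velocity-form) PID law.

    Assumed error signal: e_t = -dL/dx, estimated by a central finite difference,
    so the controller drives the slope of the validation loss toward zero.
    """
    x = x0

    def error(at):
        # Step 3: train/evaluate (here: the toy loss) around the current setting
        return -(loss_fn(at + h) - loss_fn(at - h)) / (2.0 * h)

    e_prev1 = e_prev2 = error(x)  # seed the history so the first step has no P/D kick
    for _ in range(iters):
        e = error(x)
        # delta = Kp*(e_t - e_{t-1}) + Ki*e_t + Kd*(e_t - 2*e_{t-1} + e_{t-2})
        delta = kp * (e - e_prev1) + ki * e + kd * (e - 2.0 * e_prev1 + e_prev2)
        x += delta  # Step 2: adjust the hyperparameter
        e_prev2, e_prev1 = e_prev1, e
    return x


best_log_lr = pid_tune(toy_validation_loss, x0=-1.0)  # start from lr = 0.1
best_lr = 10.0 ** best_log_lr
print(f"tuned learning rate: {best_lr:.6f}")
```

Working in log10 space keeps the learning-rate updates scale-free; the integral term alone would already act like gradient descent on x, while the proportional and derivative terms react to how the error changes between iterations.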
Through these experiments, the optimal learning rate for the model in this study was determined to be 0.001, with L1 regularization. All models use these hyperparameter settings.
PSA-Fusion-CNN-Attention-LSTM-RainNet rainfall regression model
Because CNNs excel at extracting spatial features and LSTMs at learning temporal features, 30 this study combines the two models to improve generalization performance. The CNN extracts local features from the input data through convolutional layers, while the LSTM captures long-term dependencies in serialized data. The processed image data and feature data are then merged, allowing the model to represent both the local input features and their temporal variations and thereby enhancing its ability to model complex sequential data.
To further refine feature extraction, a global average pooling layer and an SE (Squeeze-and-Excitation) channel attention mechanism are introduced after the CNN. The SE mechanism adaptively recalibrates the response of each channel by learning its importance. 34 This makes the model more sensitive to informative features and strengthens its representational capacity, so that it prioritizes the features most relevant to the prediction task while preserving the richness of the sequential features.
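A minimal sketch of SE recalibration (squeeze by global average pooling, excitation by FC, ReLU, FC, sigmoid, then per-channel scaling), written with plain Python lists. In the model the weight matrices w1 and w2 would be learned during training; the toy values in the usage lines, the reduction ratio, and the omission of bias terms are assumptions made for illustration.

```python
import math


def se_recalibrate(feature_maps, w1, w2):
    """Squeeze-and-Excitation channel attention on C feature maps (H x W lists).

    feature_maps: list of C channels, each a list of rows.
    w1: (C // r) x C weights of the reduction layer.
    w2: C x (C // r) weights of the expansion layer (biases omitted for brevity).
    """
    # Squeeze: global average pooling gives one descriptor per channel
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields per-channel weights in (0, 1)
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    scales = [1.0 / (1.0 + math.exp(-sum(w * hv for w, hv in zip(row, hidden))))
              for row in w2]
    # Scale: multiply every value of each channel by that channel's weight
    out = [[[v * s for v in row] for row in ch] for ch, s in zip(feature_maps, scales)]
    return out, scales


# Toy example: 2 channels of 2 x 2, reduction ratio r = 2 (hidden size 1)
maps = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
out, scales = se_recalibrate(maps, w1=[[1.0, -1.0]], w2=[[2.0], [-2.0]])
print(scales)
```

With these weights the active first channel is emphasized (scale about 0.88) and the second is suppressed (about 0.12), which is the channel-reweighting behavior described above.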
Figure 13 illustrates the architecture of the PSA-Fusion-CNN-Attention-LSTM-RainNet (PFCAL-RainNet) rainfall regression model.

PFCAL-RainNet framework map.
PSA-Fusion-CNN-Transformer-RainNet rainfall regression model
The Transformer model, known for its self-attention mechanism, performs exceptionally well on both temporal and spatial data. 35 In this study, it is applied to image fusion tasks. The self-attention mechanism computes weighted representations of the input data, capturing global dependencies and important features to give a more comprehensive understanding of the input images. Multi-head self-attention lets the model attend concurrently to different parts of the input in different representation subspaces, enhancing its ability to capture complex features.
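A minimal single-head sketch of scaled dot-product self-attention, softmax(QKᵀ/√d_k)V, in plain Python. The token vectors and projection matrices in the usage lines are hypothetical (tokens could, for instance, be CNN features of DDM patches); multi-head attention runs several such heads with separate projections and concatenates their outputs.

```python
import math


def matmul(a, b):
    """Matrix product of nested lists: (n x m) @ (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)                         # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the rows (tokens) of x."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    scores = [[sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d_k)
               for k_row in k] for q_row in q]
    weights = [softmax(row) for row in scores]   # each row sums to 1
    return matmul(weights, v), weights


# Three hypothetical 2-D tokens with identity projections
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, eye, eye, eye)
print(weights)
```

Every output token is a weighted mix of all input tokens, which is how self-attention captures the global dependencies mentioned above.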
The Transformer uses positional encoding to retain sequence information, enabling the model to recognize the order of the input data and to follow how features evolve. Stacking multiple self-attention and feedforward layers extracts progressively higher-level features, enhancing the model's representational capacity and robustness and thereby improving performance on image fusion tasks.
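The text does not specify which positional encoding is used. The sketch below assumes the sinusoidal scheme of the original Transformer, in which each position receives a fixed vector of sines and cosines at geometrically spaced frequencies that is added to the token embedding.

```python
import math


def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos][2i] = sin(pos / 10000^(2i/d_model)), PE[pos][2i+1] = cos(same angle)."""
    pe = []
    for pos in range(seq_len):
        row = []
        for j in range(d_model):
            i2 = j - (j % 2)             # the even index 2i shared by each sin/cos pair
            angle = pos / (10000 ** (i2 / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


pe = sinusoidal_positional_encoding(seq_len=4, d_model=6)
for row in pe:
    print([round(v, 3) for v in row])
```

Because the encoding is deterministic and bounded in [−1, 1], it can be added to learned embeddings without extra parameters; a learned positional embedding would be an equally plausible alternative.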
Specifically, a CNN first extracts features from the input DDM images, and these features are then processed by the Transformer. The processed image data and feature data are subsequently merged, and the result is output through a dense layer. Figure 14 illustrates the structure of the PSA-Fusion-CNN-Transformer-RainNet (PFCT-RainNet) rainfall regression model.

PFCT-RainNet framework map.
Analysis of experimental results
Experimental data explanation
To ensure temporal span and diversity in the training set, the dataset used in this study covers a total of 10 months from July 2022 to May 2023 and includes both GPS and BDS data. The data were first filtered to retain only observations with WS below 5 m/s, SWH below 2 m, and signal-to-noise ratio greater than 0. Of the remaining data, 70% was randomly selected for training and the other 30% was reserved for testing. A separate validation set covering June to July 2023 was used to evaluate the model's generalization capability.
Table 2 shows the quantity and distribution of the FY-3E training set data, and Table 3 shows the distribution of the validation set data. In the training set (Table 2), no-rain and light-rain samples account for 90% of the data, while moderate to heavy rainfall makes up only 10%.
Distribution of training data from July 2022 to May 2023.
Distribution of validation data from June to July 2023.
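The filtering and splitting steps described above can be sketched as follows. The record field names (ws, swh, snr) and the random seed are hypothetical; the thresholds and the 70/30 split follow the text. The June–July 2023 validation set is a separate temporal hold-out and is not produced by this split.

```python
import random


def preprocess_and_split(records, train_frac=0.7, seed=42):
    """Apply the quality filters described in the text, then split randomly.

    Each record is assumed to be a dict with hypothetical keys 'ws' (m/s),
    'swh' (m) and 'snr'; the field names are illustrative only.
    """
    kept = [r for r in records
            if r["ws"] < 5.0 and r["swh"] < 2.0 and r["snr"] > 0.0]
    rng = random.Random(seed)            # fixed seed for a reproducible split
    shuffled = kept[:]
    rng.shuffle(shuffled)
    n_train = int(round(train_frac * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


# Toy records: 10 pass the filters, 2 fail (wind too strong, negative SNR)
recs = [{"ws": 3.0, "swh": 1.0, "snr": 1.0, "id": i} for i in range(10)]
recs += [{"ws": 6.0, "swh": 1.0, "snr": 1.0, "id": 10},
         {"ws": 3.0, "swh": 1.0, "snr": -1.0, "id": 11}]
train, test = preprocess_and_split(recs)
print(len(train), len(test))
```

Filtering before splitting ensures that the training and test subsets obey the same low-wind, low-wave conditions.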
This study evaluates the regression model’s performance using the following three metrics: 1. Mean Absolute Error (MAE), 2. Root Mean Square Error (RMSE), and 3. Coefficient of Determination (R2). The calculation methods for these evaluation metrics are given by formulas (9)–(11) as follows:
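For reference, the standard definitions of these metrics are given below, with y_i denoting the reference (ERA5) rainfall rate of sample i, ŷ_i the model estimate, ȳ the mean of the reference values, and n the number of samples (this notation is assumed here):

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|

\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2}}

R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}
```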
These complementary metrics provide multidimensional assessment capabilities for rainfall estimation:
The R2 metric quantifies the proportion of variance explained, characterizing the model's ability to capture the spatial distribution patterns and intensity variations of rainfall. It is at most 1 and usually lies between 0 and 1, although it becomes negative when a model predicts worse than the mean of the reference data; this makes it useful for evaluating how well the underlying physical relationships are modeled, especially when assessing model fitness in complex terrains (e.g. mountainous areas) or extreme precipitation events. RMSE squares the prediction deviations and is therefore especially sensitive to large errors. This makes it valuable for identifying prediction biases in extreme events such as rainstorms, providing quantitative guidance for optimizing disaster warning systems. MAE is a robust indicator: because it is linear in the errors, it is less sensitive to outliers and gives a stable measure of the overall deviation of rainfall-rate estimates, making it well suited to balanced evaluation across rainfall intensities. Used together, these three complementary metrics enable a comprehensive diagnosis of GNSS-R rainfall retrieval models, each addressing a different aspect of model performance.
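A direct implementation of the three metrics follows. The toy inputs in the usage lines are not results from this study; they show that RMSE exceeds MAE when errors are uneven and that R2 drops below zero when predictions are worse than simply predicting the mean.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired reference and predicted rain rates."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot           # negative if worse than predicting the mean
    return mae, rmse, r2


# Predicting the mean everywhere: R^2 = 0, and RMSE > MAE because errors are uneven
print(regression_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
# Anti-correlated predictions: R^2 < 0
print(regression_metrics([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```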
Classification result analysis
This study used three machine learning methods (RF, DT, and LR) to classify rainfall at a threshold of 0.4 mm/h, using all data from the training and validation datasets shown in Tables 2 and 3. Figure 15 presents the Pearson correlation analysis among the feature parameters. Most of the correlation coefficients between features fall within 0–0.1, indicating weak linear relationships and therefore relatively independent information content. Such low inter-feature correlation is favorable for model development: it keeps highly redundant features out of the model inputs, which enhances the model's interpretability and generalization capability.
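A minimal sketch of two preparatory steps described above: assigning binary labels at the 0.4 mm/h threshold (rain rates greater than 0.4 mm/h form class 1) and computing a Pearson correlation matrix among features. The feature values in the usage lines are invented for illustration; in the study the matrix spans 32 features.

```python
import math

RAIN_THRESHOLD = 0.4  # mm/h, the class boundary used in this study


def rain_labels(rates):
    """0 = no rain or light rain (<= 0.4 mm/h), 1 = moderate to heavy rain (> 0.4 mm/h)."""
    return [1 if r > RAIN_THRESHOLD else 0 for r in rates]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(features):
    """Pairwise Pearson matrix for a dict {feature_name: list_of_values}."""
    names = list(features)
    return names, [[pearson(features[a], features[b]) for b in names] for a in names]


labels = rain_labels([0.0, 0.4, 0.41, 2.0])
names, corr = correlation_matrix({"LES": [0.2, 0.5, 0.1, 0.9],
                                  "DDMA": [1.1, 0.7, 1.3, 0.4]})
print(labels, names, corr)
```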

Pearson correlation plot of 32 features.
Figure 16 illustrates the Pearson correlation between the feature parameters and the true rainfall values. All Pearson correlation coefficients lie between −0.2 and 0.2, indicating that no single feature is strongly correlated with rainfall. The coefficients for LES (No. 7) and DDMA (No. 25) are among the largest in magnitude, at 0.15 and −0.16, respectively, which is consistent with the rainfall retrieval principles discussed in this study. The coefficients for latitude (No. 20) and longitude (No. 19) are 0.1 and 0.09, respectively, suggesting that geographic location plays an auxiliary role in rainfall retrieval. In addition, the coefficient for mean sea level pressure (No. 23) is −0.15 and that for sea surface temperature (No. 24) is 0.13, indicating that these two environmental parameters also provide useful information for rainfall retrieval.

Pearson correlation between features and rainfall.
Figure 17 presents the confusion matrices of the rainfall classification results from the three machine learning models. Panels a, b, and c show the results of the RF, LR, and DT models, respectively. In the matrices, 0 denotes no rain or light rain, and 1 denotes moderate to heavy rain.

Confusion matrix of rainfall classification results (0: no rain or light rain; 1: moderate to heavy rain): (a) RF classification results, (b) LR classification results, and (c) DT classification results.
Verification results of three classification models (result 1, compared to ERA5).
Table 4 shows the validation results of the three machine learning classification models on the validation set. Figure 17 and Table 4 show that all three models achieved relatively good results. Among them, the logistic regression (LR) model performed best on the validation set, with an accuracy of 0.97 and a precision of 0.80.
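Accuracy and precision derive from confusion-matrix counts with moderate to heavy rain as the positive class. The toy example below uses invented counts (not the study's) that mimic the 90/10 class imbalance of the training data, and shows why precision is reported alongside accuracy: a classifier that never predicts rain still reaches 0.90 accuracy but zero precision.

```python
def confusion_counts(y_true, y_pred):
    """Counts for binary labels (1 = moderate to heavy rain is the positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy_precision_recall(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted rain that is real
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of real rain that is detected
    return accuracy, precision, recall


# 100 toy samples, 10 of them moderate to heavy rain
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 88   # 8 hits, 2 misses, 2 false alarms
print(accuracy_precision_recall(*confusion_counts(y_true, y_pred)))
# A "never rain" classifier on the same data
print(accuracy_precision_recall(*confusion_counts(y_true, [0] * 100)))
```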
Analysis of rainfall intensity retrieval results
This study utilized the PFCAL-RainNet and PFCT-RainNet models for the regression retrieval of sea surface rainfall intensity during moderate to heavy rainfall, using the moderate to heavy rainfall data (>0.4 mm/h) from Tables 2 and 3 for the training and validation datasets. Figure 18(a) and (b) show the correlation results in the validation set, comparing the outputs of PFCAL-RainNet and PFCT-RainNet with the ERA5 results, respectively.

Results of verification data (compared to ERA5): (a) PFCAL-RainNet and (b) PFCT-RainNet.
To further demonstrate the effectiveness of the model after DDM image data fusion, Table 5 summarizes the performance of four regression models in the validation set (June–July 2023) for retrieving moderate to heavy rainfall. The CNN model and the CNN-LSTM model did not use DDM images as input, relying solely on the 32 one-dimensional feature parameters from FY-3E listed in Table 1.
Verification results of four rainfall regression models (compared to ERA5).
From Table 5, it can be observed that the composite CNN-LSTM model (R2 = 0.40, RMSE = 0.64, and MAE = 0.42 mm/h) outperformed the CNN model (R2 = 0.35, RMSE = 0.68, and MAE = 0.45 mm/h). However, both models that did not incorporate DDM images yielded results lower than those of the PFCAL-RainNet model (R2 = 0.48, RMSE = 0.60, and MAE = 0.40 mm/h) and the PFCT-RainNet model (R2 = 0.50, RMSE = 0.59, and MAE = 0.39 mm/h). This clearly indicates the effectiveness of DDM image fusion in enhancing model accuracy.
Our model shows a marked improvement in moderate to heavy rainfall estimation (>0.4 mm/h) compared with existing studies. Whereas conventional approaches such as that of Bu et al. 17 achieved an RMSE of ∼2.0 mm/h over all rainfall intensities (>0 mm/h), PFCT-RainNet attains an RMSE of 0.59 mm/h for moderate to heavy precipitation, a 70.5% reduction in error, although the two evaluations cover different intensity ranges.
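The quoted percentage is the relative RMSE difference, taking the approximate 2.0 mm/h figure at face value:

```latex
\frac{\mathrm{RMSE}_{\text{Bu}} - \mathrm{RMSE}_{\text{PFCT}}}{\mathrm{RMSE}_{\text{Bu}}}
= \frac{2.0 - 0.59}{2.0} = \frac{1.41}{2.0} = 0.705 = 70.5\%
```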
The comparison between the PFCAL-RainNet and PFCT-RainNet models reveals that PFCT-RainNet exhibits slightly better generalization performance on the validation set, improving the accuracy of moderate to heavy rainfall regression tasks. This improvement can be attributed to the role of the Transformer, which excels in capturing long-range dependencies and modeling global features, as discussed below.
The PFCT-RainNet model achieves synergistic learning of multimodal data by integrating DDM with 32-dimensional GNSS-R feature parameters. Its core innovations are as follows:
DDM image feature enhancement
Preprocessing
DDM images are preprocessed using the BM3D denoising algorithm and then fed into a CNN network to extract spatial texture features (e.g. signal energy distribution, scattering patterns).
Denoising effect
The denoised DDM improves the SNR in peak regions, enabling the model to more accurately capture rainfall-induced changes in sea surface roughness (e.g. DDMA decrease, LES increase).
Global dependency modeling
The self-attention mechanism in the Transformer further resolves global dependencies within DDM, such as identifying energy diffusion patterns in the delay-Doppler domain caused by rainfall.
Heterogeneous data fusion strategy
Feature encoding
One-dimensional features (e.g. DDMA, LES, WS) are encoded into high-dimensional vectors via fully connected layers and fused with CNN-Transformer features from DDM at a feature concatenation layer.
Performance validation
Experimental results demonstrate that fusing DDM images improves the model’s R2 by 0.10 compared to the CNN-LSTM model (see Table 5), confirming the effectiveness of multimodal learning for rainfall retrieval.
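A minimal sketch of this feature-level fusion: the one-dimensional features are encoded by a fully connected layer, concatenated with the DDM embedding, and mapped to a rain rate by a linear output layer. The embedding values, the weights, and the choice of ReLU are hypothetical; in PFCT-RainNet the DDM embedding would come from the CNN-Transformer branch and all weights would be learned.

```python
def dense(x, weights, bias, relu=True):
    """Fully connected layer: y_j = sum_i W[j][i] * x_i + b_j, optionally followed by ReLU."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in out] if relu else out


def fuse_and_regress(ddm_embedding, scalar_features, enc_w, enc_b, head_w, head_b):
    """Encode 1-D features, concatenate with the DDM embedding, regress a rain rate."""
    encoded = dense(scalar_features, enc_w, enc_b)        # 1-D features -> higher-dim vector
    fused = list(ddm_embedding) + encoded                 # feature concatenation layer
    return dense(fused, head_w, head_b, relu=False)[0]    # linear output (mm/h)


# Hypothetical values: a 2-D DDM embedding and two normalized scalar features
rain = fuse_and_regress(
    ddm_embedding=[0.5, 0.2],
    scalar_features=[1.0, 2.0],                 # e.g. normalized DDMA and LES
    enc_w=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], enc_b=[0.0, 0.0, -1.0],
    head_w=[[1.0, 1.0, 0.1, 0.1, 0.1]], head_b=[0.0],
)
print(f"predicted rain rate: {rain:.2f} mm/h")
```

Concatenation keeps both modalities intact and lets the output layer learn how much weight each receives, which is the simplest form of the multimodal fusion described above.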
Figures 19 and 20 display the spatial distribution of the absolute errors of the PFCT-RainNet and PFCAL-RainNet predictions relative to ERA5 for June and July 2023. PFCT-RainNet performs best in the Intertropical Convergence Zone (ITCZ) and the Western Pacific Warm Pool (errors < 0.5 mm/h), while errors are somewhat higher (∼1.2 mm/h) in high-latitude oceans such as the Southern Ocean, potentially linked to the reduced accuracy of IMERG-F data in these regions.
Comparison with PFCAL-RainNet reveals distinct spatial error patterns. PFCT-RainNet achieves lower errors (0.43 mm/h) in the ITCZ, owing to its Transformer-based ability to resolve global rainfall-induced DDM scattering patterns, whereas PFCAL-RainNet, leveraging the LSTM's temporal modeling, is more stable (±0.3 mm/h variability) in equatorial countercurrent zones. Although PFCT-RainNet mitigates sparse GNSS-R sampling through cross-region attention, both models struggle in high-latitude oceans (errors of ∼1.2–1.35 mm/h), reflecting IMERG-F's limitations in detecting solid precipitation. Notably, PFCAL-RainNet outperforms PFCT-RainNet in mid-latitude frontal systems (MAE of 0.68 vs 0.73 mm/h), highlighting the LSTM's advantage in tracking evolving weather dynamics. Coastal error peaks (∼1.5 mm/h) persist in both models and are attributed to land-contaminated DDM distortions and the resolution limits of ERA5, underscoring the need for coastline masking and higher-resolution validation data.
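The gridding used to produce Figures 19 and 20 is not described. The sketch below assumes 5° latitude–longitude cells and longitudes in [−180°, 180°), and averages the absolute error of all samples falling in each cell; the coordinates and rain rates in the usage lines are invented.

```python
import math
from collections import defaultdict


def gridded_mean_abs_error(lats, lons, y_true, y_pred, cell_deg=5.0):
    """Average |prediction - reference| inside lat/lon cells of size cell_deg.

    Returns {(lat_index, lon_index): mean_abs_error}; empty cells are omitted.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for lat, lon, t, p in zip(lats, lons, y_true, y_pred):
        key = (math.floor((lat + 90.0) / cell_deg),
               math.floor((lon + 180.0) / cell_deg))
        sums[key] += abs(p - t)
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}


# Two samples share a tropical cell; one sample lies in the Southern Ocean
grid = gridded_mean_abs_error(
    lats=[1.0, 2.0, -60.0], lons=[1.0, 3.0, 150.0],
    y_true=[1.0, 1.0, 0.8], y_pred=[1.5, 0.5, 2.0],
)
print(grid)
```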

Global rainfall error in June 2023 (mm/h): (a) PFCAL-RainNet model and (b) PFCT-RainNet model.

Global rainfall error in July 2023 (mm/h): (a) PFCAL-RainNet model and (b) PFCT-RainNet model.
Comparison of GPS and BDS retrieval results
To validate the effectiveness of the classification and regression models using single GPS and single BDS data, this study utilized the GPS and BDS data from Tables 2 and 3 for training and validation. First, the rainfall classification models were trained and validated on the validation set, followed by a comparison of the results. Next, the rainfall regression models were trained and validated, comparing the performance of GPS and BDS data in the regression tasks. Table 6 presents the validation results of the three rainfall classification models using GPS and BDS data, while Table 7 shows the validation results of the two rainfall regression models under the same data conditions.
Validation results of three rainfall classification models under GPS and BDS data.
Validation results of two rainfall regression models under GPS and BDS data.
Tables 6 and 7 show that training on GPS or BDS data separately produces no significant difference in the accuracy of rainfall classification or rainfall regression. In Table 7, for the PFCAL-RainNet model, the model trained on GPS L1 data achieves slightly better validation accuracy than the one trained on BDS L1 data. Conversely, for the PFCT-RainNet model, the model trained on BDS L1 data performs noticeably better than the one trained on GPS L1 data. Overall, the PFCT-RainNet model trained on BDS L1 data achieves the highest validation accuracy among the single-system models, with an R2 of 0.46, an RMSE of 0.63 mm/h, and an MAE of 0.41 mm/h.
To further demonstrate the benefit of combining GPS and BDS, we compare the best PFCT-RainNet results in Tables 5 and 7: training on the fused GPS and BDS data yields clearly better performance than the best result obtained from either system alone. Specifically, R2 increases from 0.46 to 0.50, RMSE decreases from 0.63 to 0.59 mm/h, and MAE decreases from 0.41 to 0.39 mm/h. The improvement arises because fusion combines the strengths of the GPS and BDS systems and provides more comprehensive observations. GPS and BDS have different coverage and signal characteristics, so fusing their data minimizes the errors and blind spots that a single system may have. This improves the reliability and completeness of the data, strengthens the model's ability to generalize, and boosts the accuracy of rainfall retrieval.
Verification of IMERG-F results
This study used IMERG-F moderate to heavy rainfall data from June to July 2023 as an additional check on the sea surface rainfall retrieved by the best regression model (PFCT-RainNet). Figure 21 illustrates the correlation between the rainfall predicted by PFCT-RainNet and the IMERG-F rainfall in the validation set.

Correlation between PFCT-RainNet model and IMERG-F.
Compared with the IMERG-F rainfall data, the PFCT-RainNet predictions achieve an R2 of 0.39, an RMSE of 0.65 mm/h, and an MAE of 0.43 mm/h. Although discrepancies between IMERG-F and ERA5 may have contributed to this lower accuracy, the overall trend is similar to that obtained against ERA5 (see Table 5), further supporting the effectiveness of the sea surface rainfall inversion model presented in this study.
Conclusion
This study focuses on the inversion of moderate to heavy rainfall intensity using GNSS-R. It uses FY-3E GNOS-II GNSS-R Level 1 data to classify and regress moderate to heavy rainfall (intensities > 0.4 mm/h) under low wind speed (<5 m/s) and low significant wave height (<2 m) conditions. The training dataset spans July to December 2022 and January to May 2023, and the validation set spans June to July 2023. ERA5 convective rainfall rates and IMERG-F precipitation data served as the reference (ground truth).
For the classification models with a 0.4 mm/h threshold, the LR model using fused GPS/BDS data performs best, with an accuracy of 0.97 and a precision of 0.80.
In the regression of moderate to heavy rainfall, the multimodal PFCT-RainNet model with DDM image input performs best, with an RMSE of 0.59 mm/h and an R2 of 0.50.
Analyses using single-system GPS or BDS data showed no significant accuracy differences in classification or rainfall regression when validated against ERA5. Either system can independently support rainfall retrieval, and the regression model trained on fused GPS/BDS L1 data performed better in moderate to heavy rainfall retrieval.
The findings of this study offer a new method and perspective for rainfall prediction and meteorological research. However, the study was limited to moderate to heavy rainfall (>0.4 mm/h) and excluded light rainfall (≤0.4 mm/h), which restricts the model's applicability to comprehensive rainfall monitoring. Furthermore, the training dataset spans a relatively short period (July 2022–May 2023), which may not fully capture seasonal variations or extreme weather conditions and may limit the model's ability to generalize across meteorological scenarios. In the future, additional data sources and advanced algorithms could further improve the accuracy and generalization of rainfall retrieval, particularly by extending the model to light rainfall and incorporating longer-term, more diverse datasets.
Footnotes
Acknowledgements
We would like to thank the Fengyun Satellite Remote Sensing Data Service Network for providing the FY-3E GNOS-II data, the Global Satellite Precipitation Program for the IMERG-F data, and the ECMWF website for supplying the WS, SWH, MSLL, and SST data.
Ethical considerations
Ethical approval was not required for this study.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Author contributions
Conceptualization and Methodology: Yun Zhang, Jie Li, and Shuhu Yang; Software: Yun Zhang and Jie Li; Formal analysis: Yanling Han, Yunchang Cao, and Zhonghua Hong; Writing – original draft: Jie Li; Writing – review & editing: Yanling Han, Shuhu Yang, and Yuwei Zhang; Visualization: Bo Peng.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (Grant No. 42271335, 42176175).
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
The ERA5 data used in this article is available at: [https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels?tab=download]. The FY-3E GNSS-R data used in this article is available at: [https://satellite.nsmc.org.cn/PortalSite/Data/Satellite.aspx]. The IMERG-F data used in this article is available at: [
].
