Abstract
Satellite remote sensing technology has proven effective in monitoring various environmental parameters, but its efficiency in assessing shallow lakes has been limited. This study applies state-of-the-art machine learning and deep learning algorithms, supported by classical statistical methods, to remote sensing data in order to estimate chlorophyll-a (Chl-a) concentrations. Focusing on a shallow coastal lagoon, the Mar Menor, this work statistically analyzes the behavior of daily Sentinel 3 data and compares Machine Learning and Deep Learning techniques to improve the efficiency and accuracy of the data provided by this satellite. Convolutional Neural Networks (CNNs) stand out as a robust choice, delivering excellent results even in the presence of anomalous events. Our findings demonstrate that a CNN-based approach applied directly to satellite data yields promising results for monitoring shallow lakes, offering enhanced efficiency and robustness. This research contributes to optimizing remote sensing data to produce a continuous flow of information for monitoring shallow aquatic ecosystems, with potential applications in environmental management and conservation.
Keywords
Introduction
Satellite Remote Sensing (SRS) is rapidly emerging as a dominant technology for monitoring diverse natural environments [24,30]. The variety of sensors on board satellites allows different physical Earth processes to be captured and recorded. These sensors produce high-resolution images and/or datasets, characterized by precise geometric accuracy and detailed radiometric information, which allow the analysis of biogeophysical parameters [42,45,49] and provide a wide range of products depicting land, oceans, and beyond [26]. Technological advancements have brought superior spatial and temporal resolutions, offering new service opportunities such as ESA’s Sentinels [32] and NPP VIIRS [27] products. Although SRS provides worldwide information, it must still be adapted to regional or local characteristics, as discrepancies arise between satellite-derived metrics and actual surface parameters [48].
This situation highlights concerns about accuracy. While in situ samples are considered the gold standard due to their lower measurement errors, they are difficult to obtain, especially in marine environments. The main disadvantages of in situ sampling include significant time and financial costs for frequent data collection, the need for specialized personnel, and the restriction to measuring specific parameters. Additionally, the variety of measurement methods used by different organizations can result in inconsistencies. This is compounded by a significant lack of spatial and temporal uniformity in data collection [35]. Despite its potential loss of precision compared to in situ approaches, remote sensing counterbalances many limitations of field methods. Its cost-effectiveness and the regularity of its observations make it indispensable for longitudinal water quality surveillance.

Taking advantage of the potential of SRS data, we focus on the Mar Menor lagoon, which faces significant water quality issues from various sources [40]. Situated in Murcia (southeastern Spain), the Mar Menor is the largest coastal lagoon on the Iberian Peninsula and ranks among the largest in Europe, spanning an area of 135 km². It is relatively shallow, with an average depth of 3.6 m and a maximum of 7 m. The lagoon is separated from the Mediterranean Sea by La Manga, a 22 km sandy barrier interspersed with several gullies. These gullies give the lagoon its semi-confined nature and its distinctive temperature and salinity characteristics (see Figure 1). Beyond its environmental significance, the Mar Menor plays a crucial role in Murcia’s economy: its unique climatic conditions and rich natural resources draw tourists, recreational users, and fishermen alike. Furthermore, the Mar Menor basin, known as Campo de Cartagena (CC), spans over 1,200 km². This extensive plain is crossed by ephemeral streams that collect the region’s infrequent yet intense rainfall [41]. Historically, the Mar Menor’s pristine, transparent waters reflected its resistance to eutrophication. Over the last decade, however, the lagoon has shifted towards eutrophic conditions [7], mainly due to increasing anthropogenic pressures such as agriculture and tourism, which have led to several ecological crises and brought it to the brink of collapse. This shift stems largely from changes in agricultural practices in the CC, including the advent of intensive irrigation. Consequently, the lagoon has received an influx of nutrients, particularly nitrates and other fertilizers, leading to increased pollution and eutrophication [15,25]. The extreme eutrophication event of 2016 is a testament to this degradation, marking a notable decline in water transparency and quality [34].
Subsequent events, such as the spike in Chl-a levels after a storm in 2019, further exposed the lagoon’s vulnerability [39]. In light of these challenges, proactive intervention is imperative. Chl-a concentration is therefore a pivotal indicator of the eutrophication status of the Mar Menor ecosystem, which motivates the development of a satellite-based monitoring system that provides lagoon health surveillance as continuously as possible.
Numerous studies have explored the potential of SRS-based monitoring systems for Chl-a concentrations, particularly in shallow water environments, which nonetheless present specific challenges. Although significant works such as [17,31] demonstrate a correlation between Chl-a and certain spectral bands (blue, green, red, and near-infrared), this methodology does not adapt well to shallow waters. Moreover, this technique commonly relies on the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument from the Terra and Aqua missions, which are now considered outdated. Other studies have enhanced satellite data by applying machine learning techniques [1,6,28,47]. These studies validate and highlight the use of satellite data in various water environments, yet they do not propose specific models to estimate Chl-a. For instance, [1] developed algorithms for Sentinel 3 atmospheric correction and highlighted the suitability of Sentinel 3 for analyzing water quality in shallow lakes. Similarly, [6] proposes an interesting, albeit somewhat imprecise, approach using machine learning with Landsat 8 data.
Furthermore, owing to the recent notoriety of the Mar Menor, studies of this environment such as [5,11,19,20] provide insightful approaches using the Landsat 8/9 and Sentinel 2 missions. [5,11] validate the use of remote sensing in the Mar Menor, while [19,20] offer algorithms for estimating Chl-a, among other variables, employing Landsat 8/9, Sentinel 2, or a combination thereof. Although [20] achieved notable results, it is essential to recognize that their analysis was based on a dataset interpolated to bridge data gaps. These gaps were primarily due to the four-day revisit time of Sentinel 2, compounded by the presence of invalid images and the need to synchronize with in-situ sampling dates. Using Landsat extends this interval even further. Although Landsat 8/9, Sentinel 2, or their combination are frequently used to study small, and consequently shallow, lakes owing to their high resolution, they suffer from extended periods without data, an issue that can be effectively addressed by the twin Sentinel 3 satellites, which provide daily data. The Mar Menor, despite its shallowness, is expansive. Studies conducted in environments similar to ours, such as the research on the shallow western part of Lake Erie [37], demonstrate that Sentinel 3 data are a promising candidate for analyzing cyanobacterial blooms related to Chl-a. This concept is further reinforced by [29]. However, their analysis, which uses Sentinel 3 to monitor small inland lakes, also encounters several challenges, including substantial errors in the derived remote sensing reflectances and pigment concentrations. Such modelling complexities are crucial when considering the unique ecological context of the Mar Menor.
The challenges are multifaceted, including a lack of high-quality in-situ data, spatial disparities between in-situ measurements and remotely sensed pixels, oversight of the intrinsic heterogeneity of terrestrial and marine surfaces contributing nutrients to the lagoon, and theoretical gaps relating to scale discrepancies during validation. Furthermore, there is a prevailing assumption regarding the unfettered reliability of data from satellite systems, often overlooking rigorous validation processes and comprehensive coverage spanning the spectrum of available products.
Consequently, Sentinel 3 emerges as a remarkable option for consistent and dependable monitoring of Chl-a levels in the unique physical environment of the Mar Menor and as a pivotal element of research on its aquatic health. Its data prove indispensable in the Mar Menor, offering comprehensive large-scale insights that support informed decision-making underpinned by robust Chl-a forecasting capabilities.
Our proposed monitoring system extends research [16] that uses Sentinel 3 and its OLCI instrument for Chl-a estimation based on daily data. This study highlights the necessity of evaluating the precision of in-situ data against data derived from satellites, focusing on several in-situ measurement points (ISMPs). Considering the variable and unpredictable nature of these data, the system aims to derive a Chl-a metric from satellite data with minimal error. This approach enables stakeholders to monitor water quality changes effectively and to intervene in a timely manner to prevent critical degradation. The proposed Chl-a monitoring mechanism includes a range of models to enhance the accuracy of data obtained from the Copernicus ocean monitoring framework, with a primary focus on Chl-a datasets from the twin Sentinel 3 satellites, which provide daily information. Our study begins with the development and evaluation of a classical statistical model tailored to the unique environmental conditions of the Mar Menor and its inherent data variability, which is influenced by ecosystem-specific factors such as bottom vegetation density [18]. We conduct an extensive classical statistical analysis to understand these system dynamics, which forms the basis for further data interpretation and modeling. In the initial stages, our analysis examines autocorrelation in the predictor variables, identifies relationships between data points over specific time intervals, and eliminates variables with low tolerance. We use the Durbin–Watson test [9] to detect autocorrelation in the residuals of our regression analysis. To evaluate the effectiveness of the classical model, we compare it with machine learning and deep learning approaches, determining its predictive capacity and identifying the most suitable model for our monitoring system.
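The Durbin–Watson statistic is the sum of squared differences between successive regression residuals divided by the residual sum of squares. The following minimal sketch in plain Python assumes the residuals are ordered in time (for example, by sampling date at one ISMP and depth). The function name is hypothetical, and this is an illustration of the statistic, not necessarily the authors’ implementation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a time-ordered sequence of residuals.

    DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2
    Values near 2 suggest no first-order autocorrelation; values towards 0
    suggest positive and values towards 4 negative autocorrelation.
    """
    e = list(residuals)
    if len(e) < 2:
        raise ValueError("need at least two residuals")
    ss = sum(x * x for x in e)
    if ss == 0:
        raise ValueError("residuals are all zero")
    diff_ss = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return diff_ss / ss


# Alternating signs give 12 / 4 = 3.0 (negative autocorrelation);
# identical residuals give 0.0 (strong positive autocorrelation).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))
```

In practice, statsmodels provides an equivalent function, `statsmodels.stats.stattools.durbin_watson`.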
We also consider modifying some constraints of the classical model, focusing on incorporating more selective variables to reduce noise and enhance predictive accuracy. By integrating machine learning, deep learning, and classical statistical methods, we aim to develop an advanced model that includes both established and novel measurement points. This comprehensive approach leads to the creation of a predictive framework specifically designed for the monitoring system we envision.
The primary contributions of this study include the following:
SRS data analysis using classical statistics, discerning behaviors within such ecosystems and pruning non-essential data and variables for an optimized predictive model.
A comparison of machine learning (ML) and deep learning (DL) models with classical statistical approaches for estimating chlorophyll-a (Chl-a) concentration in the Mar Menor. This comparison is based on comprehensive datasets, further refined using insights from the Durbin–Watson test.
Exploration of alternative variable selection methodologies, particularly by relaxing the constraints imposed by the Durbin–Watson test, leading to revamped ML and DL models based on this refined dataset.
Establishment of a comprehensive model for the Mar Menor to estimate Chl-a, which facilitates continuous monitoring of lagoon health, improves the ability to pinpoint peak Chl-a concentrations, and will provide long time series for forecasting.
The remainder of this article is organized as follows: Section 3 details the materials and methods implemented in this study. In Section 4, we present the quantitative findings derived from all SRS-based products tailored for monitoring the Mar Menor coastal lagoon. Conclusions and potential directions for future research are outlined in Section 6.
This section comprehensively details the materials and methods employed in our research. It first describes the dataset utilized in this study, encompassing both in-situ and Satellite Remote Sensing (SRS) data. This is followed by a brief description of the various models implemented in this research.
In situ data
The data used in this study, provided by the Regional Government of Murcia (CARM), are essential for analyzing remote sensing data. They date from August 2016, the onset of the Mar Menor’s degradation, which prompted CARM to initiate regular monitoring. The compiled dataset includes almost weekly measurements taken at various depths at twelve points in the Mar Menor, designated as in situ measurement points (ISMPs). These ISMPs represent the lagoon’s heterogeneity, covering diverse attributes such as depth and proximity to critical landmarks like the shoreline and wadis. Table 1 details the data provided by CARM, including variables such as chlorophyll-a (Chl-a), turbidity, chromophoric dissolved organic matter (CDOM), oxygen, salinity, and pH. Given its significant correlation with anoxia episodes, our study focuses primarily on Chl-a.
Overview of the in situ monitoring data by CARM.
Statistical description of the Chl-a data sourced from CARM.
The dataset is pre-filtered, which obviates the need for additional data cleaning or outlier identification in our study and lends the data a higher degree of reliability. In our methodology, we categorize data by depth to improve analytical precision, so our data interpretations and model fits are specific to each depth interval. For instance, measurements recorded at depths from 0 to 1 meter are classified as ‘depth 0’, those from 1 to 2 meters as ‘depth 1’, and so forth. Table 2 offers an in-depth exploration of the Chl-a data characteristics, including statistical metrics such as count, mean, standard deviation, and distribution percentiles.
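The depth classes described above amount to truncating each depth to its integer part. This minimal sketch assumes half-open intervals (a reading at exactly 1 m goes to ‘depth 1’) and records given as hypothetical `(depth_m, chl_a)` pairs; the function names are illustrative.

```python
import math


def depth_class(depth_m):
    """Map a depth in metres to an integer class: [0, 1) -> 0, [1, 2) -> 1, ..."""
    if depth_m < 0:
        raise ValueError("depth must be non-negative")
    return math.floor(depth_m)


def group_by_depth(records):
    """Group (depth_m, chl_a) records into {'depth k': [chl_a, ...]}."""
    groups = {}
    for depth_m, chl_a in records:
        key = f"depth {depth_class(depth_m)}"
        groups.setdefault(key, []).append(chl_a)
    return groups


# Hypothetical records, for illustration only.
print(group_by_depth([(0.5, 1.2), (0.9, 1.4), (1.5, 2.0)]))
```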
SRS has risen in prominence as a near real-time (NRT) monitoring tool, addressing both natural and societal challenges. Its significance spans regional to global scales, supporting several global initiatives such as the Sendai Framework, Paris Agreement, and Sustainable Development Goals [21,36,46].
This paper uses publicly available SRS data from the European Copernicus Marine Service (CMS). The CMS offers extensive information on oceanic conditions at both global and regional scales. Among the various products provided by Copernicus, our study specifically concentrates on data from the Sentinel 3 A and B satellites. Both satellites carry the Ocean and Land Colour Instrument (OLCI), which provides data across 21 spectral bands with a maximum spatial resolution of 300 meters. The twin-satellite configuration of Sentinel 3 A and B enables daily revisits of the same region at approximately the same hour. These characteristics make Sentinel 3 exceptionally well suited for daily water monitoring, as they obviate the need to calibrate images acquired at different times while providing sufficient spatial resolution. CMS processes OLCI data to offer a range of products tailored to diverse research objectives. This study uses the Level-2 Water Full Resolution (OL_2_WFR) products, which provide surface directional reflectances corrected for atmospheric effects and sun specular reflection, and include two chlorophyll-a (Chl-a) concentration products in milligrams per cubic meter (mg/m³), computed with the OC4Me and Neural Network algorithms. The Level 2 (L2) products also include other parameters derived from these spectral bands, such as Total Suspended Matter concentration (TSM) and the Diffuse Attenuation Coefficient for down-welling irradiance (KD490). While these parameters offer insights into water quality aspects like turbidity or transparency, they are beyond the initial scope of this research. For Chl-a concentration, our study uses the estimate from the Inverse Radiative Transfer Model-Neural Network (IRTM-NN) because of its ability to handle negative reflectance values, which are a challenge for conventional algorithms.
The IRTM-NN, pre-trained by CMS, uses neural networks for efficient computation and outputs various water-inherent optical properties. For a comprehensive understanding, readers are directed to [38]. Furthermore, the OL_2_WFR ancillary information includes flag data indicating the nature of each pixel, categorizing them as land, water, snow, cloud, invalid, and others. For this study, only pixels labeled as ‘water’ were considered, with invalid entries being disregarded.
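The pixel screening described above can be expressed as a bitmask test. This sketch uses hypothetical bit values for the water and invalid flags, because the real positions must be read from the product’s flag metadata. Only the logic (keep pixels flagged as water, discard those flagged as invalid) follows the text.

```python
# HYPOTHETICAL bit assignments for illustration only; the actual OL_2_WFR
# layout should be read from the flag variable's CF attributes
# (flag_masks / flag_meanings) in the downloaded netCDF file.
WATER = 1 << 0
INVALID = 1 << 1


def is_valid_water(flag):
    """True if the pixel is flagged as water and not flagged as invalid."""
    return bool(flag & WATER) and not (flag & INVALID)


def valid_pixels(values, flags):
    """Keep only the values whose pixel passes the water/invalid check."""
    return [v for v, f in zip(values, flags) if is_valid_water(f)]


# Hypothetical Chl-a values and flags for four pixels.
print(valid_pixels([1.0, 2.0, 3.0, 4.0], [WATER, WATER | INVALID, 0, WATER]))
```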
It is important to highlight that CMS publishes products according to their processing timeliness. We therefore prioritize Non-Time Critical (NTC) files, which are released 24 to 48 hours after satellite data acquisition and offer a more refined and accurate dataset than Near Real-Time (NRT) products. An operational monitoring system can start with NRT data and be updated once the NTC dataset becomes available. The CREODIAS platform offers both a web interface and an Application Programming Interface (API) for accessing and downloading Sentinel 3A/B products, as well as data from other satellite sources [33]. Using its API, we acquired netCDF4 files for the specified ISMPs and study dates, from which we extracted the parameters mentioned previously: Chl-a concentrations and reflectances.
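The NRT-then-NTC update strategy can be sketched as a per-date merge in which NTC values overwrite provisional NRT ones. The function name, the date-keyed dictionaries, and the source tags are illustrative assumptions, not part of any CMS or CREODIAS interface.

```python
def merge_products(nrt, ntc):
    """Merge per-date values, letting NTC entries supersede NRT ones.

    nrt, ntc: dicts mapping an acquisition date (ISO string) to a value.
    Returns {date: (value, source)} ordered by date.
    """
    merged = {date: (value, "NRT") for date, value in nrt.items()}
    # NTC data arrive later and are more refined, so they overwrite NRT.
    merged.update({date: (value, "NTC") for date, value in ntc.items()})
    return dict(sorted(merged.items()))


# Hypothetical values: the NTC product for 2020-01-01 has become available.
print(merge_products({"2020-01-01": 1.0, "2020-01-02": 2.0}, {"2020-01-01": 1.1}))
```

Keeping the source tag makes it easy to re-process only the dates that are still provisional.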
Unlike deeper waters, shallow waters pose multifaceted and complex challenges for SRS, including water clarity, bottom reflectance variability, and the intricate interplay of light within these environments. The reflectance measured over shallow waters combines contributions from the water column and the seabed. Moreover, the optical properties of shallow waters are often influenced by higher concentrations of suspended sediments, organic matter, and other particulates. These elements scatter and absorb light differently than clear open-ocean water, skewing the measurements when they are interpreted by algorithms designed primarily for deeper waters. This blending complicates the spectral signature, making it difficult to discern and isolate particular water quality parameters. In addition, water quality parameters in shallow waters can change rapidly, potentially outpacing the satellite’s revisit rate, so that short-term but significant events may be missed.
There is also the challenge of spatial resolution. While a 300-meter resolution might be appropriate for vast open oceans, it may fail to capture the fine-scale variability present in smaller, shallow water bodies, where features like seagrass beds, coral reefs, or algal mats can drastically change over short distances. Given these challenges, it is evident that while satellites like Sentinel 3 are invaluable for broad-scale, open-ocean observations, they may fall short in delivering precise data for shallow waters. This emphasizes the imperative need to integrate satellite observations with in-situ measurements and other data sources, harnessing complementary strengths to ensure a comprehensive and robust monitoring system.
An initial analysis indicates a moderate correlation between Sentinel 3 Chl-a values and CARM in-situ measurements (see Section 4). The quality of this correlation varies by ISMP, suggesting that S3 data might not be uniformly reliable across all points. Moreover, the Sentinel 3 dataset lacked identifiable Chl-a peaks, underscoring potential limitations in pinpointing the algal blooms that were evident during the study period.
Working datasets
By grouping the previously described data, we generate distinct datasets for each ISMP, consisting of paired Sentinel 3 A/B data and corresponding in-situ measurements organized by date. These datasets contain dimensionless reflectances at each ISMP location as input variables and in-situ Chl-a concentrations in mg/m³ as the target. Table 3 summarizes the number of instances in each dataset. Based on findings from [16], we excluded ISMPs 1, 5, and 9 from our study, primarily because of the shallowness of their waters and the lower quality of the data derived at these locations, which render them unsuitable for our monitoring framework.
Number of instances by ISMP and depth.
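The pairing of satellite and in-situ records into per-ISMP, per-depth datasets can be sketched as follows. The dictionary layouts (satellite reflectances keyed by `(ismp, date)`, in-situ Chl-a keyed by `(ismp, depth_class, date)`) and the function name are assumptions for illustration; only the exclusion of ISMPs 1, 5, and 9 comes from the text.

```python
EXCLUDED_ISMPS = {1, 5, 9}  # excluded following the findings of [16]


def build_datasets(satellite, in_situ):
    """Pair satellite reflectances with in-situ Chl-a on matching dates.

    satellite: {(ismp, date): [r1, ..., rn]}  (valid-pixel reflectances)
    in_situ:   {(ismp, depth_class, date): chl_a}
    Returns {(ismp, depth_class): [(date, reflectances, chl_a), ...]}.
    """
    datasets = {}
    for (ismp, depth, date), chl_a in sorted(in_situ.items()):
        if ismp in EXCLUDED_ISMPS:
            continue
        reflectances = satellite.get((ismp, date))
        if reflectances is None:  # no valid satellite observation that day
            continue
        datasets.setdefault((ismp, depth), []).append((date, reflectances, chl_a))
    return datasets
```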
In this article, we address Chl-a estimation within a regression analysis framework. Our primary objective is to establish a relationship between satellite-derived data and corresponding in-situ Chl-a measurements. Our exploration extends beyond traditional statistical regression methods to include modern Machine Learning (ML) models. This dual approach enables a comparative analysis of conventional statistical techniques and state-of-the-art ML methodologies, assessing their robustness and effectiveness. Focusing on the SRS data obtained from the CMS for the Mar Menor region, our goal is to determine which approach, traditional or contemporary, offers superior accuracy and predictive capability in Chl-a estimation. To achieve this, we have carefully curated a range of statistical and ML algorithms for thorough evaluation, together with the error metrics and algorithm parameters used to assess them.
Algorithms
KNN’s key advantage is its non-parametric approach, effectively handling complex, non-linear data relationships without presuming a specific data form. However, its need to retain the entire training dataset for predictions leads to high memory usage and slower performance with large datasets. The choice of distance metric and the k value are critical for its accuracy. Additionally, due to its reliance on distance calculations, feature scaling is essential for optimal performance, emphasizing the importance of preprocessing in KNN applications. This method balances flexibility in modeling with considerations for memory and computational efficiency [22].
Formally, for a single-layer MLP, the output for an input During training, MLP uses the backpropagation algorithm and optimization methods like stochastic gradient descent (SGD) to adjust weights and biases, aiming to minimize prediction errors as measured by loss functions, such as Mean Squared Error (MSE). MLPs are known for their ability to approximate any continuous function given adequate size and proper configuration, as per the universal approximation theorem. However, they risk overfitting, particularly with large networks or limited data. To counter this, regularization techniques like dropout or L2 regularization are used. The performance of MLPs is also heavily influenced by hyperparameters, including the number of layers, neurons, and learning rate, necessitating careful selection for optimal results. This approach balances MLPs’ adaptability with considerations for complexity and data adequacy [43].
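Because the KNN discussion stresses feature scaling, the sketch below fits min-max scaling on the training data only and then predicts each query as the mean target of its k nearest training rows by Euclidean distance. Min-max scaling, Euclidean distance, and uniform weighting are assumptions; the paper does not state which scaler or distance it used, and the function names are hypothetical.

```python
import math


def min_max_ranges(X):
    """Per-feature (min, max) computed on the training rows."""
    return [(min(col), max(col)) for col in zip(*X)]


def min_max_scale(X, ranges):
    """Rescale each feature to [0, 1] using ranges from the training set."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, ranges)]
        for row in X
    ]


def knn_regress(X_train, y_train, X_query, k=3):
    """Predict each query as the mean target of its k nearest training rows."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranges = min_max_ranges(X_train)  # fit the scaler on training data only
    train_scaled = min_max_scale(X_train, ranges)
    preds = []
    for q in min_max_scale(X_query, ranges):
        # Euclidean distance in the scaled feature space, nearest first
        neighbours = sorted(
            (math.dist(q, x), y) for x, y in zip(train_scaled, y_train)
        )[:k]
        preds.append(sum(y for _, y in neighbours) / len(neighbours))
    return preds
```

With scikit-learn, a comparable setup would chain `MinMaxScaler` and `KNeighborsRegressor` in a `Pipeline`, so that the scaler is refitted inside each training fold.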
The performance of the proposed models is assessed using several statistical metrics. These metrics are essential, as they provide a comprehensive understanding of the models’ estimation capabilities. The following is a description of each metric:
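As a reference for the error metrics, the sketch below computes MSE and MAE, both reported in the results tables, together with the coefficient of determination R², which is included here as an assumption because it is commonly reported alongside MSE and MAE. The function names are illustrative.

```python
def mse(y_true, y_pred):
    """Mean Squared Error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# One error of 1 mg/m^3 out of three samples: MSE = MAE = 1/3, R^2 = 0.5.
print(mse([1, 2, 3], [1, 2, 4]), mae([1, 2, 3], [1, 2, 4]), r2([1, 2, 3], [1, 2, 4]))
```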
The algorithmic parameters were selected through an iterative grid search using the datasets described in Section 3.3, complemented by Python tools such as GridSearchCV from the Scikit-learn library. The optimal configurations identified for each algorithm are as follows:
Parameters not described here are initialized with their default values, as documented for the Scikit-learn library and, for the CNN, for TensorFlow.
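An exhaustive grid search scores every combination of candidate parameter values and keeps the best one. The sketch below shows that loop in plain Python; the parameter grid and scoring function in the example are hypothetical and do not reproduce the ranges searched in this study.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination; a lower score is better."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # e.g. mean cross-validated MSE
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid and toy score, for illustration only.
grid = {"k": [1, 3, 5], "weights": ["uniform", "distance"]}
toy_score = lambda p: abs(p["k"] - 3) + (0.0 if p["weights"] == "distance" else 0.5)
print(grid_search(grid, toy_score))
```

Scikit-learn’s `GridSearchCV` wraps the same loop with built-in cross-validation, for example `GridSearchCV(estimator, param_grid, scoring="neg_mean_squared_error", cv=3)`; note that its scores are maximized, hence the negated MSE.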
It is important to recall that OLCI provides information in 21 spectral bands. As introduced earlier, however, in shallow lakes significant values may be observed in spectral bands not considered by the conventional Chl-a band relationships, owing to factors such as the influence of the seabed [29]. Consequently, in this section we present experiments that enable us to develop a model closely tailored to the Mar Menor, based on statistical adjustments.
Experimental set up
For Chl-a prediction, we followed a four-phase experimental procedure. To ensure the robustness of each phase, we consistently applied 3-fold cross-validation, using the KFold class from the Scikit-learn Python library.
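For reference, scikit-learn’s `KFold` without shuffling splits the samples into contiguous folds, with the first `n_samples % n_splits` folds one sample larger. The sketch below mirrors that default behaviour; whether the study enabled shuffling is not stated, so that detail is an assumption, and the function name is hypothetical.

```python
def kfold_indices(n_samples, n_splits=3):
    """Yield (train_idx, test_idx) for contiguous, unshuffled folds,
    mirroring scikit-learn KFold's default (shuffle=False) behaviour."""
    if n_splits < 2 or n_splits > n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# 7 samples into 3 folds: test folds [0, 1, 2], [3, 4], [5, 6].
for train, test in kfold_indices(7, 3):
    print(train, test)
```

When rows are ordered by date, unshuffled folds keep temporally adjacent samples together, which reduces, but does not eliminate, leakage between neighbouring dates.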
Initially, a linear regression analysis was conducted on the reflectance values, which served as input variables, to predict Chl-a concentrations observed at various depths for each ISMP. In this context, the dependent variable is the Chl-a concentration, while the reflectance values act as the independent variables. The model is formally represented as follows:
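In generic notation, assuming p reflectance inputs R_1 to R_p for a given ISMP and depth, a multiple linear regression of this kind can be written as:

```latex
\mathrm{Chl\text{-}a} = \beta_0 + \sum_{i=1}^{p} \beta_i R_i + \varepsilon
```

where β_0 is the intercept, β_i is the coefficient of reflectance R_i, and ε is the residual error. The exact set of bands entering the model is an assumption in this generic form.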
Through this analysis, our objective was to elucidate the linear relationships between reflectance values and Chl-a concentrations. Additionally, by scrutinizing the coefficients
Subsequently, the ML models previously described were employed to refine the LR model for Chl-a concentrations. This exploration was conducted in two stages: initially utilizing the full set of input variables, followed by an analysis using a subset of variables, selected based on insights from the earlier linear regression analysis. By comparing the performance metrics in these two scenarios, our goal was to unravel the complex interactions among reflectance values and determine the most effective combination of variables for accurate Chl-a estimation.
Finally, an OLS analysis was conducted to identify the most pertinent features for Chl-a estimation. Diverging from our initial LR method, this phase adopted a more lenient criterion for retaining correlated variables. This approach was instrumental in ensuring that no potentially predictive features were inadvertently omitted. Ultimately, a unified model was developed, incorporating data from multiple stations. This model is meticulously designed to precisely predict Chl-a levels at stations not included in the initial training set, thereby enhancing its applicability and robustness.
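The relaxed selection criterion amounts to raising the p-value threshold used to retain OLS coefficients. The sketch below assumes the p-values have already been computed (for example, from the `pvalues` attribute of a fitted statsmodels `OLS` result, excluding the constant term) and shows how the retained set grows as the threshold moves from a conventional 0.05 to a lenient 0.3. The band names and p-values in the example are hypothetical.

```python
def select_by_pvalue(pvalues, threshold):
    """Keep the input variables whose OLS p-value is below `threshold`.

    pvalues: {variable_name: p_value}
    """
    return sorted(name for name, p in pvalues.items() if p < threshold)


def sweep_thresholds(pvalues, thresholds):
    """Map each threshold to the variables it would retain."""
    return {t: select_by_pvalue(pvalues, t) for t in thresholds}


# Hypothetical p-values, for illustration only.
pv = {"Oa01": 0.01, "Oa02": 0.2, "Oa03": 0.6}
print(sweep_thresholds(pv, [0.05, 0.3, 1.0]))
```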
LR makes it possible to detect correlated variables that may adversely affect the predictive performance of the models. We use the
Results of the statistical technique showing the coefficients associated with each independent variable, as well as the p-value obtained for each of them.
In Table 4, several targeted variables exhibit high
After the exclusion of the specified variables, the linear regression is re-applied to ascertain that the remaining variables adequately represent the variance of the dependent variable. In this more streamlined analysis, the
Linear regression metrics (
To discern the tangible benefits of this process, we assess LR, ML and DL with and without variable selection. Table 6 illustrates that there are negligible differences in
The observed variability in results suggests that variable selection may not be critical, given that the techniques showing improved outcomes still do not surpass the best model, the CNN approach, whether it is applied to the full set of variables or only to the selected subset. Consequently, a more in-depth exploration of the correlation among variables is warranted, along with an effort to relax the criteria for variable selection. The overarching aim is to eliminate non-essential input variables judiciously while ensuring minimal loss of valuable information.
Performance of ML and DL techniques using either all reflectances or those curated by classical statistics, showcasing MSE, MAE, and
metrics across depths.
Reflecting on the outcomes of previous experiments, it becomes evident that the inherent robustness of machine learning (ML) and deep learning (DL) algorithms may diminish the efficacy of stringent variable elimination criteria. Nonetheless, some studies in the literature propose that the conventional
Upon closer examination of individual ISMP, an intriguing observation was made: certain reflectances, while deemed non-significant on an aggregate level, proved to be crucial for specific stations. Consequently, the elimination of these variables, based on the rigid criteria of classical statistical methods, raised the possibility of compromising the model’s overall efficacy. To investigate this phenomenon, models were constructed with systematic adjustments to the
CNN model’s predictive performance under varied p-value thresholds for feature selection.
Using the most suitable model, the CNN, and the subset of specified reflectances our aim is to develop a holistic predictive model,
Table 8 shows the
Therefore, the integrated model
Performance of the CNN-based predictive model, trained using selected reflectances (p-value < 0.3). Each ISMP’s results are based on training with all other ISMPs and testing on the ISMP in question.
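The evaluation protocol in the caption (train on all other ISMPs, test on the held-out one) is a leave-one-group-out scheme. A minimal, model-agnostic sketch follows, where `fit` and `evaluate` are hypothetical callables standing in for the CNN training and scoring steps, and the per-ISMP `(X, y)` layout is an assumption.

```python
def leave_one_station_out(datasets, fit, evaluate):
    """For each ISMP, train on all other ISMPs and score on the held-out one.

    datasets: {ismp: (X, y)}
    fit(X, y) -> model; evaluate(model, X, y) -> score
    """
    results = {}
    for held_out in sorted(datasets):
        X_train, y_train = [], []
        for ismp, (X, y) in datasets.items():
            if ismp != held_out:  # the held-out station never enters training
                X_train.extend(X)
                y_train.extend(y)
        model = fit(X_train, y_train)
        X_test, y_test = datasets[held_out]
        results[held_out] = evaluate(model, X_test, y_test)
    return results


# Toy usage: a "model" that predicts the training mean, scored by MAE.
mean_fit = lambda X, y: sum(y) / len(y)
mae_eval = lambda m, X, y: sum(abs(t - m) for t in y) / len(y)
toy = {2: ([[0.0]], [1.0]), 3: ([[0.0]], [3.0]), 4: ([[0.0]], [5.0])}
print(leave_one_station_out(toy, mean_fit, mae_eval))
```

scikit-learn offers the same split as `LeaveOneGroupOut`, with the ISMP identifier used as the group label.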
In the context of shallow lagoon environments such as the Mar Menor, satellite data require nuanced interpretation and application. The intrinsic characteristics of these environments, which combine factors such as water composition, proximity to the coast, and depth variability over limited areas, necessitate specialized approaches to fully leverage the insights derived from satellite data [2,13]. Although satellites offer a vast array of information, tailoring these data to specific ecosystems like the Mar Menor presents distinct challenges [18].
The statistical analysis of satellite reflectance data uncovers a range of intriguing patterns and complexities. Notably, while there are discernible relationships that enable characterization of the environment through reflectance variables, these variables often exhibit anomalous behaviors. On one hand, the pronounced autocorrelations among them imply that certain reflectances could be excluded from the models without substantially impacting the information content. On the other hand, a comparative analysis with models incorporating all variables suggests that each variable plays a role and contributes to the model’s accuracy. This dilemma underscores the complex nature of these relationships and the inherent challenges in determining their true significance.
To assess the importance of these variables within the models, comprehensive comparisons were made across various techniques, including a relaxation of the criteria for selecting input variables. This involved choosing variables that were not deemed statistically significant according to different
When deploying deep learning models, which are known for their ability to explore high-dimensional spaces and capture intricate patterns, including the entire set of reflectance variables proves necessary. Despite the capacity of deep learning architectures to learn subtle data relationships, omitting certain reflectance variables prevents the models from reaching their theoretical performance potential, as delineated in 3. This finding is consistent across modeling approaches, reinforcing the premise that each reflectance variable contributes to the accuracy and reliability of the modeling framework.
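The effect of omitting variables can be quantified with a drop-one-feature ablation: refit the model without each band in turn and record the increase in held-out error relative to the full model. The sketch below assumes a single random train/test split and uses least squares as a stand-in for the deep learning models; the function names and the 30% test fraction are hypothetical choices for illustration.

```python
import numpy as np


def holdout_rmse(X, y, train, test):
    """Fit least squares (with intercept) on train rows, return RMSE on test rows."""
    A_tr = np.column_stack([np.ones(train.sum()), X[train]])
    coef, *_ = np.linalg.lstsq(A_tr, y[train], rcond=None)
    A_te = np.column_stack([np.ones(test.sum()), X[test]])
    return float(np.sqrt(np.mean((A_te @ coef - y[test]) ** 2)))


def drop_one_ablation(X, y, names, test_fraction=0.3, seed=0):
    """Return the full-model RMSE and, per band, the RMSE increase when it is dropped."""
    rng = np.random.default_rng(seed)
    test = rng.random(len(y)) < test_fraction   # random boolean test mask
    train = ~test
    baseline = holdout_rmse(X, y, train, test)
    deltas = {}
    for k, name in enumerate(names):
        X_k = np.delete(X, k, axis=1)           # remove band k only
        deltas[name] = holdout_rmse(X_k, y, train, test) - baseline
    return baseline, deltas
```

A positive delta means that removing the band degrades held-out performance, which is the pattern described above for the reflectance variables.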
Adapting satellite data to distinct environments such as the Mar Menor therefore requires specialized methodologies. The complex interplay among reflectance variables challenges traditional approaches to variable selection, indicating that all variables are instrumental to the accuracy of the models. This study highlights the importance of carefully considering the subtleties of applying satellite data within complex ecosystems and the need for comprehensive modeling strategies.
Conclusions and future work
This article sheds light on the substantial potential and complexities of using the Sentinel 3 observation system for daily monitoring of shallow water environments. Focusing on the Mar Menor lagoon and its current state, this study concentrates on monitoring Chlorophyll-a concentration, a critical parameter for assessing water quality, especially in this sensitive environment. Unlike other studies of the Mar Menor, our research centers on the twin Sentinel 3 mission and its daily data production, which enables exhaustive analysis. The lower spatial resolution of Sentinel 3, compared with other satellites used in shallow waters, is not necessarily a limitation given the large area of this lagoon.
While classical statistical analyses might indicate redundancy in certain reflectance variables, our comparative evaluations across diverse modeling techniques consistently underscore the contribution of each variable to model accuracy. This observation highlights the multifaceted character of such data and the need to include all variables in complex environmental analyses, rather than relying solely on the classical relation between the blue and green bands. Our experiments demonstrate the aptitude of deep learning models, especially Convolutional Neural Networks (CNNs), for detecting and modeling complex data patterns. Nevertheless, these models, known for their proficiency with extensive and intricate datasets, also reinforce the need to include a comprehensive range of variables to achieve optimal performance.
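For reference, the classical blue-green relation mentioned above is commonly expressed as an empirical band-ratio polynomial, as in the OC family of ocean-colour algorithms. The generic form below is a sketch only: the blue bands $\lambda_{b,1},\dots,\lambda_{b,k}$, the green band $\lambda_g$, the polynomial order $n$, and the coefficients $a_i$ are sensor- and algorithm-specific and are not taken from this study.

```latex
\log_{10}\,\mathrm{Chl\text{-}a} = \sum_{i=0}^{n} a_i\,X^{i},
\qquad
X = \log_{10}\!\left(
  \frac{\max\{R_{rs}(\lambda_{b,1}),\dots,R_{rs}(\lambda_{b,k})\}}
       {R_{rs}(\lambda_{g})}
\right)
```

Such a relation uses only a few bands through a single ratio, whereas the models discussed here take the full set of reflectances as input.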
Despite the inaccuracies observed in OLCI Chl-a data, a strong correlation exists between OLCI reflectances and in-situ measurements provided by CARM. This finding underscores the efficacy of remote sensing, particularly in developing reliable methods to estimate Chl-a concentrations in the Mar Menor lagoon, a crucial aspect for ongoing monitoring and ecological assessments. This research demonstrates that, although individual algorithms exhibited variable performance across in-situ sampling points (ISMPs), the collective effort to formulate a comprehensive predictive model was successful. Among the techniques evaluated, the Convolutional Neural Network (CNN), employed in conjunction with selected reflectance variables, emerged as a particularly robust tool for estimating Chl-a in Mar Menor. This highlights its potential for broader application in similar environmental settings and for other water quality parameters.
The outcomes of this study open numerous possibilities for future research and development. Although the current study underscores the importance of incorporating all reflectance variables, subsequent research might explore advanced optimization methods, such as feature extraction or dimensionality reduction, to enhance the model’s precision. Additionally, more comprehensive datasets covering longer time periods could significantly improve model robustness and allow a more effective analysis of seasonal and annual variations. Integrating machine learning and deep learning could lead to hybrid models, potentially setting new standards for accuracy and reliability. The methodologies and results of this study are also applicable to similar aquatic environments worldwide, adapted to their specific regional conditions. Given the encouraging outcomes of the CNN predictive model, future initiatives could focus on real-time monitoring systems based on this model, enabling immediate data analysis and faster responses to environmental changes. The technique could also be applied to other water quality parameters, such as turbidity and Coloured Dissolved Organic Matter (CDOM), with the aim of creating continuous and extensive time series. Such long series would enable a detailed description and understanding of the relationships among these parameters in the Mar Menor and their impact on this aquatic environment, thereby facilitating the development of a reliable monitoring system.
Acknowledgements
This work has been supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101017861 as well as the Ramon y Cajal Grant RYC2018-025580-I, funded by MCIN/AEI/10.13039/501100011033, “FSE invest in your future” and “ERDF A way of making Europe”. It was also carried out with funding from the Ministry of Universities, the Recovery, Transformation, and Resilience Plan, and the European Union-NextGenerationEU.
Conflict of interest
The authors have no conflict of interest to report.
