Abstract
Lakes are essential to global freshwater systems, supporting ecosystem services and ecological processes, but they are increasingly impacted by climate change and human activities. This study examined the long-term dynamics of Lakes Abaya and Chamo in the Ethiopian Rift Valley using Landsat images, altimetry-derived lake level data, and a machine learning method within Google Earth Engine, specifically a Multi-Index-based Random Forest (MIRF) classifier. The MIRF classifier achieved high accuracy, ranging from 97.58% to 99.13%, in extracting lake surface water. Substantial fluctuations in lake areas and lake levels were observed: Lake Abaya’s area decreased from 2000 to 2005 at a rate of 6.67 km²/year, then expanded until 2022 at the same rate; Lake Chamo’s area decreased from 2000 to 2010 at a rate of 1.62 km²/year, then expanded until 2022 at a rate of 2.88 km²/year. Correlation analysis between lake areas and environmental factors, namely Rainfall (RF), Temperature (TEMP), Normalized Difference Vegetation Index (NDVI), Groundwater Storage (GWS), Terrestrial Water Storage (TWS), and Soil Moisture (SM), revealed important associations. For Lake Abaya, strong correlations were identified with NDVI and TWS, suggesting that vegetation cover and terrestrial water significantly influence its area changes. In contrast, for Lake Chamo, NDVI emerged as the key driver, indicating that vegetation dynamics play a crucial role in the lake’s fluctuations. Furthermore, higher-order polynomial regression models were developed to better capture the complex relationships between lake area and water levels for both lakes. Overall, this study integrates remote sensing, machine learning, and cloud computing, offering valuable insights into the lakes’ long-term characteristics and providing critical information for future water resource management strategies.
Keywords
Introduction
Lakes, as essential components of the Earth’s fresh surface water resources, play a crucial role in providing significant ecosystem services, including water supply, biodiversity support, and climate regulation (Huang et al., 2023; Yue et al., 2023). They also contribute to hydrological and ecological cycles, helping to maintain regional environmental balance (Yilmaz, 2023). However, due to ongoing adverse climate changes and substantial human interventions, the spatial and temporal distribution of lake water has undergone profound changes (Tercan & Atasever, 2021; Yue et al., 2023). This makes it imperative to conduct comprehensive studies on lake dynamics and their driving factors. The geography, geology, and hydrodynamics of lake basins significantly influence these changes. For example, sediment transport due to erosion-prone landscapes and changes in groundwater inflow/outflow due to geological characteristics are key elements that shape the hydrodynamics of lakes, ultimately affecting lake area and water volume. Understanding these processes is crucial for ensuring the sustainable management of lakes and the long-term resilience of their ecosystems.
Given the complex interplay between these factors, precise and continuous monitoring of spatial and temporal changes in surface water is essential (Abujayyab et al., 2021). Remote sensing has become a widely used tool for this purpose because it provides frequent, reliable, and easy-to-access data on lake water dynamics (Wei et al., 2020). Numerous remote sensing satellites, including Landsat (Lu et al., 2020; Yilmaz, 2023), Sentinel-2 (Yang et al., 2020), Sentinel-1 (Jiang et al., 2021; J. Wang et al., 2023), and combinations of optical sensors (Landsat or Sentinel-2) with Sentinel-1 (Dervisoglu, 2022; Tang et al., 2022), have been employed for lake water monitoring. Sentinel-1, a Synthetic Aperture Radar (SAR) satellite, is particularly advantageous due to its ability to penetrate cloud cover and provide high-resolution images of water surfaces under all weather conditions (Dong et al., 2021; Gómez Fernández et al., 2022; Jiang et al., 2021). While Sentinel-2 provides high-resolution optical imagery for cloud-free periods, both Sentinel-1 and Sentinel-2 are limited by their relatively short data records, as they have only been operational for the last decade. In contrast, Landsat satellites, with a history dating back to the 1970s and a 16-day revisit period, provide an excellent option for long-term monitoring of lake water dynamics due to their extensive time series and global coverage (X. Li et al., 2019).
Satellite images, owing to their large data size, pose challenges for conventional methods in handling and analyzing data for wide spatial extents and longer time series. To address these challenges, Google introduced a cloud computing platform, Google Earth Engine (GEE). GEE provides satellite images, reanalysis products, and robust computing infrastructure (R. G. Xu et al., 2019). Numerous studies have successfully applied GEE in surface water resources studies (Albarqouni et al., 2022; Arora et al., 2023; F. Chen et al., 2017; Jiang et al., 2021; Kandekar et al., 2021; K. Li et al., 2022; Sodhi et al., 2024; Tang et al., 2022; Xia et al., 2019; R. G. Xu et al., 2019). Examples include mapping glacial lakes (F. Chen et al., 2017), estimating lake surface water temperature (Albarqouni et al., 2022), and monitoring lake surface area dynamics (Jiang et al., 2021). GEE’s ability to integrate multi-source data and its capacity to process time series from multiple satellite sensors make it particularly useful for long-term monitoring of lake systems.
In extracting water surfaces from images, determining threshold values is a time-consuming and challenging task for long time series images, because inter-class resemblance between water and non-water classes can lead to misclassification (Dehkordi et al., 2022; Kebede et al., 2020). Many index-based studies, such as Dervisoglu (2022), resort to using a fixed threshold value. Alternatively, automatic thresholding algorithms, such as Otsu’s algorithm, are widely used to determine threshold values based on the histogram of grayscale or single-band images (Yilmaz, 2023). For instance, Jiang et al. (2021) applied Otsu’s algorithm to create an automatic water extraction framework for lake monitoring. Similarly, Asfaw et al. (2020) employed Otsu’s method to extract the lake water area of Lake Ziway, Ethiopia, from 2009 to 2018 using spectral indices from Landsat ETM+ and OLI images. Additionally, some studies have employed multiple indices in combination to reduce misclassifications, although this approach is prone to subjectivity (R. Wang et al., 2020; B. Zhao, 2019; Zhou et al., 2019). More recently, Machine Learning (ML) techniques have emerged as powerful tools for classification, with Random Forest (RF), Support Vector Machine (SVM), and Classification and Regression Tree (CART) being widely applied (Gómez Fernández et al., 2022; C. Wang et al., 2018). These ML methods are available as built-in functions within the GEE platform and provide higher accuracy than thresholding methods (Kandekar et al., 2021; Noi Phan et al., 2020).
Besides evaluating lake water surface dynamics, it is imperative to investigate the factors driving these changes. Factors such as climate change (Y. Li et al., 2021; VanDeWeghe et al., 2022), human interventions (Gebrehiwot et al., 2022), changes in land use/land cover, and extreme weather events (Havens et al., 2016) have been identified as influential elements in lake water dynamics. These factors can be broadly categorized into climatic, environmental, and anthropogenic aspects. Among the climatic factors, precipitation and temperature are particularly noteworthy in shaping changes in lake area and influencing hydrological dynamics across contributing basins (X. Li et al., 2019). Hydrological variables relevant to lake area changes, such as basin soil moisture, groundwater storage (Vasilevskiy et al., 2022), and terrestrial water storage (Khorrami et al., 2023; Richey et al., 2015), depend directly or indirectly on climate-related factors. Likewise, the vegetation status of contributing basins is an environmental factor that strongly influences lake water changes. A commonly adopted method for examining the vegetation status of a basin is to evaluate the time series of the Normalized Difference Vegetation Index.
This study aims to assess the time series lake area dynamics of Lakes Abaya and Chamo using Landsat images from 2000 to 2022. The study will employ a multi-index-based Random Forest (RF) classifier within the GEE cloud computing platform to analyze lake area changes and model the relationship between lake area and lake-level data. Furthermore, it will investigate the climatic and environmental factors influencing lake water fluctuations, contributing to better management and conservation of these vital water resources.
Study Area
The study area encompasses Lakes Abaya and Chamo and their wider basin, a region of ecological and hydrological significance located in the southern portion of the Rift Valley of Ethiopia. The Lake Abaya-Chamo basin is geographically bounded by longitudes 37°14′58.57″ to 38°38′38.02″ and latitudes 5°25′1.2″ to 8°6′37.68″ (Figure 1).

A map showing location and elevation of the Study area (Abaya-Chamo lake basin).
The basin exhibits a unique combination of geographical, climatic, and anthropogenic factors that influence the hydrology and ecological balance of the lakes. The region is also known for its varying tectonic activity (Bewketu, 2015; Daniel et al., 2019), and experiences a semi-arid to sub-humid climate, with distinct wet and dry seasons. Climatic variations, together with land use changes, have implications for the water levels, surface area, water quality, and overall sustainability of the lakes in the basin (Dervisoglu, 2022; Temesgen et al., 2022).
The hydro-ecological significance of the Abaya-Chamo Lake basin extends beyond its immediate vicinity. The lakes and their surrounding basin support a diverse array of flora and fauna. Additionally, the lakes play a crucial role in local agriculture, providing water for irrigation and influencing the microclimate of the surrounding areas.
Data and Data Sources
This study focused on long-term analysis of annual fluctuations in the surface area of Lake Abaya and Lake Chamo from 2000 to 2022 using Landsat images. It also aimed to elucidate the factors driving the fluctuations in lake area through a careful examination of satellite, reanalysis, and hydro-meteorological variables obtained from GEE and station-based observations. The following sections describe the data used in this study.
Landsat 5, Landsat 7 ETM+, and Landsat 8
Landsat 5, Landsat 7 ETM+, and Landsat 8 OLI imagery for the Lakes Abaya and Chamo region was obtained from the GEE cloud computing and archiving platform. These Landsat images have a spatial resolution of 30 m and a 16-day revisit time. About 372 images spanning 2000 to 2022 were selected using filtering criteria such as cloud cover percentage and region of interest.
Altimeter-derived lake level data
Lake level data for Abaya and Chamo lakes were collected from two online databases, as well as from the Ministry of Water Resources of Ethiopia, for validation purposes. The two databases used are the Global Reservoir and Lake Monitor (G-REALM) and the Database for Hydrological Time Series of Inland Waters (DAHITI). Both G-REALM and DAHITI provide satellite radar altimeter-derived surface water level data for lakes and reservoirs (K. Li et al., 2022; Schwatke et al., 2015), amalgamating data from various satellite altimetry missions to produce water level information. Specifically, for Lakes Abaya and Chamo, the missions used to generate the water level data include ENVISAT-2, ENVISAT-3, Cryosat-2, SARAL, Sentinel-3A, and Sentinel-3B.
For Lake Abaya, lake level data were obtained from G-REALM for the period from 2002 to 2010 and from DAHITI for the period from 2010 to 2022. For Lake Chamo, lake level data were obtained solely from the DAHITI database, spanning from 2002 to 2016. The databases can be accessed at G-REALM (https://ipad.fas.usda.gov/cropexplorer/global_reservoir/) and DAHITI (https://dahiti.dgfi.tum.de/en/virtual-stations/africa/; accessed on 16 March 2024).
Rainfall and temperature data
Rainfall and temperature data from meteorological stations in and around the Lake Abaya-Chamo basin were obtained from the Ethiopian Meteorological Institute (EMI). However, the station datasets contained missing values. To address these gaps, satellite and reanalysis data for precipitation and temperature were employed.
For the imputation of missing rainfall values, two reanalysis products were utilized: the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS; Funk et al., 2015) and the NOAA Climate Data Record (CDR) of Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN-CDR; Nguyen et al., 2018). CHIRPS and PERSIANN-CDR have spatial resolutions of 4.8 km and 24 km, respectively. These precipitation products, known for their reliability, were instrumental in filling the gaps in the station-based rainfall datasets.
Similarly, for filling missing temperature values, the ERA5_Land reanalysis data with a spatial resolution of 11.1 km was utilized. This dataset was accessed via the Climate Engine online platform, ensuring a comprehensive and consistent approach to addressing temperature gaps.
Normalized Difference Vegetation Index (NDVI)
To evaluate the vegetation status over time in the Lake Abaya-Chamo basin, the MOD13Q1 V061 dataset, derived from the MODIS Terra 250 m resolution satellite imagery, was utilized. The data was accessed via the GEE platform (Hackländer & Jung, 2021). MODIS offers advantages such as compositing techniques, including 16- or 8-day NDVI composites, which effectively mitigate cloud cover and atmospheric noise issues (Kandasamy et al., 2013). Its consistent and well-calibrated data has been widely used in studies to monitor vegetation trends (Running & Zhao, 2019). In contrast, the Landsat program has faced challenges with data continuity due to sensor transitions across different generations, requiring thorough calibration to address inconsistencies (Wulder et al., 2012).
Groundwater storage, terrestrial water storage, and soil moisture
Groundwater storage and terrestrial water storage data were obtained from the Global Land Data Assimilation System (GLDAS 2.2) Catchment Land Surface Model with Data Assimilation (CLSM-DA) through the Google Earth Engine (GEE) platform, with a spatial resolution of 27.8 km. GLDAS has shown promising capability in monitoring groundwater storage and terrestrial water storage in data-scarce regions (Arora et al., 2022; Berhanu et al., 2024). Soil moisture information was acquired from the ERA5_Land reanalysis dataset as soil water fraction at various depths, also accessed through GEE, with a spatial resolution of 11.1 km. The average soil moisture was calculated by averaging the soil water content across the different depth levels.
Methodology
Landsat image pre-processing
Prior to applying the selected classification algorithm, Landsat image collections were pre-processed using Google Earth Engine (GEE). This process included filtering images by cloud cover percentage (less than 20%), region of interest, and acquisition date. Images with cloud contamination were corrected using cloud masking techniques to improve data quality. Since the failure of the scan line corrector (SLC) in 2003, Landsat 7 ETM+ images have contained gaps, resulting in missing data for about 22% of each scene. To address this, both cloud-masked pixels and SLC-related gaps were filled using the focal mean function, and the filled gaps were blended with the original imagery to reduce the impact of the interpolation on the original image values.
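The gap-filling idea can be illustrated outside GEE with a minimal pure-Python sketch. It is not the GEE focal mean function itself: the function name `focal_mean_fill`, the square window, and the toy band values are assumptions chosen for illustration. Only missing cells are interpolated, and valid observations are kept unchanged, which mirrors the blending step described above.

```python
def focal_mean_fill(grid, radius=1):
    """Fill None cells with the mean of valid neighbours in a square window.

    Valid (non-None) cells are copied unchanged, so only gaps are interpolated.
    """
    rows, cols = len(grid), len(grid[0])
    filled = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is not None:
                continue  # keep original observations
            neighbours = [
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if grid[rr][cc] is not None
            ]
            if neighbours:  # leave the gap if no valid neighbour exists
                filled[r][c] = sum(neighbours) / len(neighbours)
    return filled


# Hypothetical reflectance band with one missing scan line (None = gap)
band = [
    [0.10, 0.12, 0.11],
    [None, None, None],
    [0.14, 0.16, 0.15],
]
filled_band = focal_mean_fill(band)
```

A larger `radius` fills wider gaps at the cost of smoother, less local estimates.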
To get one representative image for each year, median pixel values were calculated for each band, taking into account all available images for that year. Following this, spectral indices such as the Normalized Difference Water Index (NDWI), Modified Normalized Difference Water Index (MNDWI), Normalized Difference Vegetation Index (NDVI), Automatic Water Extraction Index (AWEI), and Enhanced Vegetation Index (EVI) were computed for the median image collections.
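As a minimal sketch of the compositing step, the per-year median can be shown for a single pixel and band. The function name `annual_median` and the sample values are hypothetical; in GEE the same reduction is applied to every pixel of every band.

```python
from statistics import median


def annual_median(observations):
    """Collapse (year, value) observations of one pixel into a per-year median."""
    by_year = {}
    for year, value in observations:
        if value is not None:  # skip masked (for example cloudy) observations
            by_year.setdefault(year, []).append(value)
    return {year: median(values) for year, values in sorted(by_year.items())}


# Hypothetical reflectance series; 0.31 mimics a residual cloud outlier
pixel_series = [
    (2000, 0.08), (2000, 0.31), (2000, 0.09),
    (2001, 0.10), (2001, None), (2001, 0.12),
]
composite = annual_median(pixel_series)
```

The median is used rather than the mean because it is less sensitive to outliers such as the unmasked cloud value in the example.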
To enhance water surface extraction accuracy, multiple spectral indices were combined to address regional environmental challenges. While NDWI and MNDWI are commonly used for identifying water bodies, MNDWI performs better in separating water from built-up areas (Tew et al., 2022; H. Xu, 2006). AWEI reduces misclassification due to shadows or terrain (Feyisa et al., 2014), and NDVI and EVI help remove vegetation interference (J. L. Chen et al., 2020). The combination of these indices improves classification accuracy, especially in environments with diverse land cover, where a single index may not be sufficient (Deng et al., 2019; Zhou et al., 2019). The classification model within GEE was therefore trained using all the calculated indices to effectively discriminate water surfaces from the background non-water features (Figure 2).

Flow chart to visualize the overall methodology.
Rainfall and temperature data preparation
To address the substantial number of missing values in both rainfall and temperature datasets from EMI, machine learning models, including Gradient Boosting Regression (GBR), were employed. The process began by filling rainfall gaps using the CHIRPS and PERSIANN-CDR reanalysis precipitation products as feature variables, with station rainfall records serving as the target variable. The trained GBR model was then used to predict missing rainfall data based on these reanalysis products.
For the temperature dataset, a similar approach was adopted using the ERA5_Land reanalysis data. In this case, ERA5_Land reanalysis temperature acted as the feature variable, and station temperature records were the target variable. The trained model predicted missing temperature values using ERA5_Land data.
In addition to GBR, other machine learning models, including Linear Regression (LR), K-Neighbors Regression (KNR), and Random Forest Regression (RFR), were tested to fill missing rainfall and temperature values. After comparing these models based on the mean squared error (MSE) from training and validation datasets, GBR was the best-performing model.
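The model-selection workflow described above (train on days with station records, compare candidates by validation MSE, then predict the gaps) can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: a constant-mean baseline and a one-feature least-squares line stand in for the LR, KNR, RFR, and GBR models, a single reanalysis series is used as the feature, and all values are hypothetical.

```python
def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """One-feature least-squares line (stand-in for the study's ML regressors)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical daily rainfall (mm): reanalysis feature and station target (None = gap)
reanalysis = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 3.0, 7.0]
station = [0.5, 2.4, 4.6, None, 8.3, 10.7, None, 7.4]

observed = [(x, y) for x, y in zip(reanalysis, station) if y is not None]
train, valid = observed[::2], observed[1::2]  # simple alternating split

candidates = {"mean": fit_mean, "linear": fit_line}
scores = {}
for name, fit in candidates.items():
    model = fit([x for x, _ in train], [y for _, y in train])
    scores[name] = mse(model, [x for x, _ in valid], [y for _, y in valid])
best_name = min(scores, key=scores.get)

# Refit the selected model on all observed pairs, then fill only the gaps
best_model = candidates[best_name]([x for x, _ in observed], [y for _, y in observed])
filled = [y if y is not None else best_model(x) for x, y in zip(reanalysis, station)]
```

Observed station values are never overwritten; only the missing days receive model predictions.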
This integrated method, which combines station data and reanalysis products, enhances the robustness of the climate data used in the study by leveraging the strengths of multiple data sources.
Once the station-based missing values were imputed, the lumped (single value) daily areal rainfall and temperature for the basin were calculated using the Thiessen polygon method. Like other spatial interpolation methods, Thiessen Polygon is commonly employed for estimating areal rainfall and temperature (Olawoyin & Acheampong, 2017). This method is often preferred for long-term time series due to its computational simplicity, and robustness in handling sparse and uneven station network datasets (Akgül & Aksu, 2021; J. L. Chen et al., 2024).
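In the Thiessen polygon method, the areal value is the mean of the station values weighted by the area of each station's polygon inside the basin. A minimal sketch, with hypothetical station names, rainfall values, and polygon areas:

```python
def thiessen_areal_mean(values, polygon_areas):
    """Area-weighted mean of station values; weights are Thiessen polygon areas."""
    total_area = sum(polygon_areas[s] for s in values)
    return sum(values[s] * polygon_areas[s] for s in values) / total_area


# Hypothetical daily rainfall (mm) and polygon areas within the basin (km²)
rain_mm = {"station_a": 10.0, "station_b": 4.0, "station_c": 0.0}
area_km2 = {"station_a": 500.0, "station_b": 300.0, "station_c": 200.0}

areal_rain = thiessen_areal_mean(rain_mm, area_km2)  # 6.2 mm
```

Because the polygon areas are fixed for a given station network, the weights are computed once and reused for every day in the time series.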
This spatial interpolation technique was executed using ArcGIS software, allowing for the estimation of areal values based on the available station (point) data. This integrated approach, combining machine learning for missing value imputation and spatial analysis for computing areal value, enhances the accuracy and completeness of the climate data for the Lake Abaya-Chamo basin.
Random Forest classifier
Within the GEE platform, several Machine Learning classifiers including SVM, CART, RF, and XGBoost are available. However, for this particular study, the Random Forest (RF) was selected due to its widespread application in image classification. RF has emerged as one of the most extensively utilized classifier algorithms in the field of remote sensing data, as underlined by numerous studies (Ghorbanian et al., 2021; Luo et al., 2021; Matarira et al., 2022; Noi Phan et al., 2020; Oliphant et al., 2019; Qu et al., 2022; Senanayake & Yeo, 2023; Zhang et al., 2021; F. Zhao et al., 2023).
The popularity of RF can be attributed to its effective management of outliers and noises in datasets, higher performance with high-dimensional and multisource datasets, higher accuracy compared to other classifiers such as SVM and CART (J. Xu, 2021), and enhanced processing speed achieved by selecting important variables (Noi Phan et al., 2020).
Lake water extraction using multiple-index based RF
Landsat 5 TM, Landsat 7 ETM+, and Landsat 8 images were used to separate lake water from the background non-water features. The lake water surface was extracted by training an RF classifier within GEE on multiple spectral indices. These comprised water and vegetation indices, namely NDWI, MNDWI, AWEI, NDVI, and EVI, which were calculated from the Landsat images by applying the following equations.
Where Blue, Green, Red, NIR, SWIR1, SWIR2 are Blue band, Green band, Red band, Near Infrared band, Shortwave Infrared1, and Shortwave Infrared2 bands of Landsat images respectively.
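As a minimal sketch, the five indices can be computed from band reflectances using their commonly published formulations. Which exact forms the study used is not stated here, so the choices below are assumptions: NDWI in the Green/NIR form, MNDWI with SWIR1, the shadow variant of AWEI (AWEI_sh), and the standard EVI coefficients. The reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    return (a - b) / (a + b)


def spectral_indices(blue, green, red, nir, swir1, swir2):
    """Indices from surface reflectance (0 to 1); formulations are assumed."""
    return {
        "NDWI": normalized_difference(green, nir),
        "MNDWI": normalized_difference(green, swir1),
        "NDVI": normalized_difference(nir, red),
        # Assumed AWEI_sh (shadow variant); AWEI_nsh would use other coefficients
        "AWEI": blue + 2.5 * green - 1.5 * (nir + swir1) - 0.25 * swir2,
        "EVI": 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0),
    }


# Hypothetical reflectances for an open-water pixel and a vegetated pixel
water = spectral_indices(blue=0.06, green=0.08, red=0.05, nir=0.02, swir1=0.01, swir2=0.01)
vegetation = spectral_indices(blue=0.03, green=0.06, red=0.04, nir=0.40, swir1=0.20, swir2=0.10)
```

For the water pixel the water indices (NDWI, MNDWI, AWEI) are positive and NDVI is negative, while the vegetated pixel shows the opposite pattern, which is the contrast the classifier exploits.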
Consequently, the five spectral indices, namely NDWI, MNDWI, NDVI, AWEI, and EVI, were employed as features to train the RF classifier. To train the RF classifier model, 67,100 training samples were utilized: 33,587 pixels from the water feature and 33,513 pixels from the non-water feature. The primary objective was to utilize this classifier to discriminate lake surface water from the surrounding non-water features from the images. The classification outcome provided pixels categorized as either water or non-water. Then, a selfMask function was applied to extract the water pixels, and the cumulative area of these pixels was computed using the pixelArea function within the GEE cloud computing platform. This process allowed for the precise identification and quantification of the spatial extent of water bodies in the Lake Abaya-Chamo basin.
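The area computation reduces to summing the area of all pixels classified as water. The sketch below assumes a constant nominal pixel size of 30 m and a small hypothetical classified grid; the pixelArea function in GEE instead supplies the area of each individual pixel.

```python
def water_area_km2(classified, pixel_size_m=30.0):
    """Total area of pixels labelled 1 (water), assuming square pixels."""
    water_pixels = sum(1 for row in classified for label in row if label == 1)
    return water_pixels * pixel_size_m ** 2 / 1e6  # m² to km²


# Hypothetical classification result: 1 = water, 0 = non-water
classified = [
    [0, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
]
area = water_area_km2(classified)  # 6 pixels x 900 m² = 0.0054 km²
```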
Accuracy assessment metrics
This section illustrates the effectiveness of the proposed classification method for extracting lake water surface. At one corner of Lake Abaya, a careful digitization of high resolution Google Earth Pro images was undertaken for the years 2000, 2010, 2015, and 2022 (Figure 3). These digitized polygons included both water and non-water areas, serving as validation samples. These digitized polygons were then uploaded to the Google Earth Engine (GEE) platform to extract validation pixels for water and non-water classes. Then, the classified image and the validation pixels were employed to assess the accuracy of the classification method through pixel-by-pixel assessment based on a confusion matrix.

Validation sample polygons and extracted lake water for selected years 2000, 2010, 2015, and 2022.
Commonly used metrics derived from the confusion matrix were employed to evaluate classification accuracy, specifically overall accuracy (OA), producer accuracy (PA), user accuracy (UA), and the kappa coefficient (K). These accuracy assessment metrics were computed automatically using built-in functions within the GEE platform.
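For a two-class (water/non-water) problem, these metrics follow directly from the four cells of the confusion matrix. The sketch below uses the standard definitions; the function name and the sample counts are hypothetical and are not the study's validation data.

```python
def accuracy_metrics(tp, fn, fp, tn):
    """Metrics for the water class from a 2x2 confusion matrix.

    tp: water mapped as water       fn: water mapped as non-water
    fp: non-water mapped as water   tn: non-water mapped as non-water
    """
    n = tp + fn + fp + tn
    oa = (tp + tn) / n
    pa = tp / (tp + fn)  # producer accuracy: share of reference water found
    ua = tp / (tp + fp)  # user accuracy: share of mapped water that is correct
    # Agreement expected by chance, from the row and column totals
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (oa - expected) / (1 - expected)
    return {"OA": oa, "PA": pa, "UA": ua, "K": kappa}


metrics = accuracy_metrics(tp=480, fn=20, fp=10, tn=490)
# OA = 0.97, PA = 0.96, UA = 480/490, K = 0.94
```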
Statistical analysis
To identify variables strongly correlated with annual area changes of Lake Abaya and Lake Chamo, linear correlation statistics were used. The Pearson Correlation Coefficient (PCC) has been described as the most appropriate measure for multiple linear regressions (MLRs; Ren et al., 2020). Oyebode (2019) also recommended choosing variables using PCC to achieve higher model accuracy. Therefore, the Pearson linear correlation coefficient was selected in this study to identify the variables most strongly associated with lake area dynamics (Dehghani et al., 2022).
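The Pearson coefficient is the covariance of two series divided by the product of their standard deviations. A minimal sketch with hypothetical annual series (the values are illustrative and are not the study's data):

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical annual values
lake_area = [1070.0, 1085.0, 1100.0, 1130.0, 1160.0]  # km²
tws = [-2.0, -1.0, 0.0, 2.0, 4.0]                      # rises with area
ndvi = [0.50, 0.48, 0.47, 0.43, 0.40]                  # falls as area rises

r_tws = pearson_r(lake_area, tws)
r_ndvi = pearson_r(lake_area, ndvi)
```

In this constructed example `r_tws` is +1 because the two series are exactly linearly related, while `r_ndvi` is strongly negative.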
Validation of altimeter-derived lake level data
The G-REALM and DAHITI lake level datasets for both Lake Abaya and Lake Chamo were validated using measured lake level data obtained from the Ministry of Water Resources of Ethiopia. However, measured lake level data were scarce for both lakes during the study period. For Lake Abaya, the available measured lake level data spanned from 2002 to 2015 and were used to validate the G-REALM and DAHITI altimetry lake level data. For Lake Chamo, the measured lake level data available for validation were more limited, covering only three years from 2005 to 2007.
Validation of rainfall data
To fill in missing rainfall values, the GBR model was employed using CHIRPS and PERSIANN-CDR reanalysis rainfall datasets. To evaluate the relationship between station rainfall measurements and these reanalysis products, as well as the accuracy of the imputation model, we performed cross-correlation of the station rainfall data post-imputation against CHIRPS and PERSIANN-CDR. Figures 4 and 5 present the cross-correlation graphs for Lake Abaya and Lake Chamo, respectively.

Cross-correlation of the station rainfall data post-imputation against CHIRPS and PERSIANN-CDR for Lake Abaya.

Cross-correlation of the station rainfall data post-imputation against CHIRPS and PERSIANN-CDR for Lake Chamo.
As shown in the figures above, the correlation between station rainfall and the reanalysis products is strong for both lakes. This indicates a good alignment between the datasets and demonstrates the robustness of the imputation model used.
Modeling lake area and lake level relationship
Modeling the relationship between lake area and lake level is essential for understanding the dynamics of lakes and their response to various environmental factors. To model this relationship for Lake Abaya and Lake Chamo, the time series of lake area extracted from Landsat images and the altimeter-derived lake level data were used. To understand the patterns and trends of their relationship, the time series of lake area and level were plotted and visually inspected. Based on the pattern of the data, a simple linear regression model and nonlinear polynomial regression models were evaluated, and the best-performing models were selected by comparing their coefficients of determination. The performance of the models for both lakes was evaluated using a cross-validation technique.
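The comparison of linear and polynomial models can be sketched in pure Python as follows. Several elements are assumptions made for illustration: the data are hypothetical and were constructed from a quadratic relation, lake level is expressed as an anomaly to keep the normal equations well conditioned, and leave-one-out is used as the cross-validation scheme because the study does not specify which scheme was applied.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via normal equations."""
    m = degree + 1
    # Augmented matrix [X^T X | X^T y]
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    for col in range(m):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for k in range(col, m + 1):
                a[r][k] -= factor * a[col][k]
    coeffs = [0.0] * m
    for i in reversed(range(m)):  # back substitution
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (a[i][m] - tail) / a[i][i]
    return coeffs


def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def r_squared(xs, ys, coeffs):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def loocv_rmse(xs, ys, degree):
    """Leave-one-out cross-validation: refit without each point, predict it."""
    errors = []
    for i in range(len(xs)):
        c = polyfit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], degree)
        errors.append((ys[i] - predict(c, xs[i])) ** 2)
    return (sum(errors) / len(errors)) ** 0.5


# Hypothetical data: level anomaly (m) and lake area (km²)
level = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
area = [1068.0, 1074.0, 1083.0, 1095.0, 1110.0, 1128.0, 1149.0, 1173.0]

lin = polyfit(level, area, 1)
quad = polyfit(level, area, 2)
r2_lin = r_squared(level, area, lin)
r2_quad = r_squared(level, area, quad)
```

Comparing `loocv_rmse(level, area, 1)` with `loocv_rmse(level, area, 2)` guards against choosing a higher-order model only because it fits the training data more closely.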
Results and Discussion
Accuracy assessment of the extracted lake water map
The lake water surface extracted from Landsat images was compared pixel by pixel with validation samples obtained from polygons digitized from high-resolution Google Earth Pro images. The validation was conducted for the years 2000, 2010, 2015, and 2022. This temporal coverage allows for an assessment of the method’s consistency over time. Table 1 presents the number of water and non-water samples together with the accuracy metrics. Overall Accuracy (OA) is the ratio of correct classifications to total samples, and the Kappa Coefficient (K) quantifies model performance relative to the agreement expected by chance. User Accuracy (UA) measures the proportion of pixels mapped as a class that truly belong to it, and is therefore reduced by false positives, while Producer Accuracy (PA) measures the proportion of reference pixels of a class that were correctly identified. The Overall Accuracy of the extraction method was high, ranging from 97.58% in 2000 to 99.13% in 2022, with Kappa Coefficients ranging from .95 in 2000 to .98 in 2022. This indicates a high proportion of correctly classified pixels.
Confusion Matrix and Accuracy Evaluation Metrics.
The results showed that the proposed lake water extraction method achieved a high degree of accuracy, suggesting that the extracted data and the applied classification method were of high quality. The small discrepancy between the classified image and the validation data further supported the effectiveness of the employed method. This accuracy assessment not only validated the reliability of the results but also implied that the method is easily applicable.
Generally, the effectiveness of the method and ease of use make it a promising tool for future research in similar contexts. Researchers may consider employing this method for further studies in different regions to explore its broader applicability and contribution to the understanding of various lake environments.
Area dynamics of Lake Abaya and Lake Chamo
The analysis of temporal evolution of the surface area of Lake Abaya and Lake Chamo from 2000 to 2022 revealed distinct phases of expansion and reduction. Figure 6 illustrates annual area fluctuations of both lakes, providing valuable understanding of the dynamic nature of these lakes, driven by a complex interaction of environmental and climatic factors.

Long-term annual lake area from 2000 to 2022: (a) annual area fluctuation plot of Lake Abaya and (b) annual area fluctuation plot of Lake Chamo.
Lake Abaya area dynamics
As shown in Figure 6(a), the area of Lake Abaya experienced a remarkable reduction between 2000 and 2005, declining from 1,094.56 km² to a minimum of 1,066.65 km². This decline could be attributed to various factors such as changes in precipitation, land-use patterns, or natural hydrological variability. The minimum observed area in 2005 represents a critical point in the dynamics of Lake Abaya. In the subsequent years, from 2005 to 2022, the lake expanded markedly, rising from 1,066.65 km² in 2005 to 1,198.35 km² by 2022. This increase amounted to 131.70 km², or a 12.35% expansion relative to the minimum observed lake area in 2005. The average rate of surface water area rise of Lake Abaya from 2005 to 2022 was 6.67 km² per year.
The significant expansion of Lake Abaya after 2005 could be linked to increased rainfall in the region and hydrological factors such as changes in groundwater flow and terrestrial water storage. Similar patterns have been reported in other studies of Rift Valley lakes, where lake expansion is often linked to changes in hydrological inputs and catchment-level processes (Gebeyehu et al., 2019).
Lake Chamo area dynamics
Likewise, the area of Lake Chamo, as shown in Figure 6(b), experienced a noticeable reduction from 311.92 km² in 2000 to 293.85 km² in 2010. However, the lake area increased substantially after 2010, reaching 336.04 km² by 2022. Notably, this expansion constitutes a 14.36% increase relative to the minimum observed lake area in 2010, with an average rate of rise of 2.88 km² per year.
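The net change and percentage expansion reported for both lakes follow directly from the minimum and 2022 areas given in the text. A short worked check (the function name `expansion` is only illustrative):

```python
def expansion(area_min_km2, area_end_km2):
    """Net area change (km²) and percentage change relative to the minimum."""
    net = area_end_km2 - area_min_km2
    return net, 100.0 * net / area_min_km2


abaya_net, abaya_pct = expansion(1066.65, 1198.35)  # 2005 minimum to 2022
chamo_net, chamo_pct = expansion(293.85, 336.04)    # 2010 minimum to 2022

print(round(abaya_net, 2), round(abaya_pct, 2))  # 131.7 12.35
print(round(chamo_net, 2), round(chamo_pct, 2))  # 42.19 14.36
```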
The fluctuations observed in Lake Chamo’s area align with the findings of other studies on lake dynamics in the region (Zekarias et al., 2021).
Factors influencing lake area changes
Understanding the factors influencing the changes in lake area is crucial for unraveling the complex causes of the expansion of both lakes. To investigate the driving factors, several key environmental variables, including RF, TEMP, NDVI, SM, GWS, and TWS, were analyzed. Figures 7 and 8 show the temporal trends of these variables for Lake Abaya and Lake Chamo, respectively.

Selected variables and their time series trend over the study period (Abaya).

Selected variables and their time series trend over the study period (Chamo).
The investigation of factors influencing Lake Abaya’s water dynamics involved analyzing trends in RF, TEMP, NDVI, SM, GWS, and TWS. Figure 7 illustrates the trends of these variables over the study period. Analysis of time series data for Lake Abaya and its contributing catchment revealed increasing trends in SM, GWS, and TWS (Figure 7(b), (d), and (f), respectively). These factors contribute to the expansion of the lake area by enhancing water availability. However, RF and NDVI demonstrated decreasing trends (Figure 7(a) and (e), respectively), suggesting that these factors may have reduced the lake's water availability. The increasing TEMP trend (Figure 7(c)) runs counter to the lake’s expansion, as rising TEMP typically increases evapotranspiration and reduces water availability. The observed lake area expansion therefore suggests that other factors played a dominant role in driving lake area changes, offsetting the impact of increased temperature.
In contrast, Lake Chamo showed increasing trends in RF, SM, GWS, and TWS (Figure 8(a), (b), (d), and (f), respectively), all of which positively affect the lake’s area expansion. TEMP and NDVI for Lake Chamo showed decreasing trends (Figure 8(c) and (e), respectively); the NDVI decline, similar to that for Lake Abaya, indicates vegetation degradation in the catchment. The increase in water availability, as indicated by rising SM, GWS, and TWS, likely contributed to the lake’s expansion, despite the degrading vegetation conditions.
Both Lake Abaya and Lake Chamo exhibited similar SM, GWS, TWS, and NDVI trends, indicating the potential influence of these variables on the increase in lake area. However, RF and TEMP exhibited contrasting trends between the two lakes: RF decreased for Lake Abaya and increased for Lake Chamo, whereas TEMP increased for Lake Abaya and decreased for Lake Chamo. These contrasting trends indicate the unique hydrological response of each lake to environmental changes, reflecting the complex dynamics taking place in their respective ecosystems.
The vegetation degradation observed in both lakes’ catchments, indicated by the declining NDVI trend, is consistent with findings from land use and land cover change studies in the region. Studies by Ayalew et al. (2022), Gebeyehu et al. (2019), Woldeyohannes et al. (2020), and Zekarias et al. (2021) reported significant deforestation, agricultural expansion, and urbanization in the basin, leading to increased sediment inflow into the lakes. This sedimentation likely contributed to the expansion of lake areas by displacing water and raising lake levels.
Correlation analysis
The relationship between lake area and the selected variables (RF, TEMP, NDVI, SM, GWS, and TWS) for the Lake Abaya and Lake Chamo regions was examined to understand the response of lake area to these variables (Abujayyab et al., 2021). The correlation matrices in Figures 9 and 10 provide insights into the strength and significance of the relationships between the selected variables and the area of each lake. Several of these variables showed statistically significant positive or negative relationships with lake area (p < α = .05).

Correlation matrix (correlation coefficients with corresponding p-values) of Lake Abaya.

Correlation matrix (correlation coefficients with corresponding p-values) of Lake Chamo.
Lake Abaya
Figure 9 provides the correlation coefficients and corresponding p-values of the selected variables influencing the area of Lake Abaya. Among the variables, TEMP, NDVI, and TWS emerge as significant factors driving changes in the lake’s area. The strongest correlation, with a coefficient of .567, is observed between the lake area and TEMP, indicating a positive relationship with high statistical significance (p < .05). However, interpreting this correlation requires caution, as the positive association appears counterintuitive. Typically, an increase in temperature is expected to enhance evapotranspiration, leading to a reduction in water levels and, consequently, a decrease in the lake’s surface area (Adrian et al., 2009). The positive correlation between temperature and lake area may be attributed to the complex interplay between various hydrological factors, such as groundwater inflows, which could offset the expected evapotranspiration losses. Therefore, in this case, the rise in temperature is unlikely to be the primary driver of Lake Abaya’s surface area expansion. A negative correlation, if statistically significant, would more logically explain the influence of temperature on lake area reduction. The observed positive correlation may point to other indirect or complex interactions between temperature and hydrological processes.
The second-strongest correlation, with a coefficient of −.539, is a negative one between the area of Lake Abaya and the NDVI of the contributing catchments. This inverse relationship (p < .05) indicates the essential role of vegetation cover in influencing the lake’s area dynamics. The decline in NDVI reflects the degradation of vegetation in the contributing catchments, which in turn leads to increased soil erosion and sediment inflow into the lake. Such sediment influx can fill the lake bed, displacing water and contributing to an apparent rise in the lake’s surface area. This process aligns with findings from other studies that emphasize the impact of land cover changes on sediment transport and its effect on lake morphology (Woldeyohannes et al., 2020).
TWS also exhibited a statistically significant positive correlation with the lake area, with a correlation coefficient of .531 (p < .05). This indicates the substantial role that changes in terrestrial water storage play in influencing the lake’s area. The rising trend of TWS in the Lake Abaya catchment during the study period suggests that the increase in terrestrial water contributed to the expansion of the lake’s surface. The positive correlation points to the interaction between terrestrial and lake water, where enhanced groundwater and soil moisture storage may have led to increased inflow, thus supporting the rise in lake area (Zekarias et al., 2021).
Overall, NDVI and TWS have the most significant and direct influences on the surface area dynamics of Lake Abaya, while the other variables (RF, GWS, and SM) exhibit weaker correlations, implying a more minor influence on the area dynamics.
Lake Chamo
The correlation between lake area and its influencing factors was analyzed based on the correlation matrix shown in Figure 10. A statistically significant inverse correlation (p < .05) was observed between Lake Chamo’s area and the NDVI of its contributing catchment, with a correlation coefficient of −.431. This reveals the critical role of vegetation cover in shaping the lake’s area dynamics. The negative correlation indicates that as NDVI decreases, reflecting vegetation degradation, the surface area of the lake increases. This finding is consistent with studies that underscore the impact of vegetation decline on lake ecosystems through its effects on erosion and sedimentation (Fernández et al., 2007; Woldeyohannes et al., 2020).
The decreasing trend in NDVI suggests vegetation degradation in the catchment areas surrounding Lake Chamo, similar to the dynamics observed for Lake Abaya. This degradation accelerates soil erosion, resulting in an increased inflow of sediment into the lake. As sediment accumulates on the lake floor, it gradually displaces water, contributing to the apparent expansion of Lake Chamo’s surface area. This process emphasizes the interconnectedness between land cover changes, erosion, sediment transport, and lake morphology. Getaneh et al. (2024) pointed out that continuous vegetation degradation and land use changes cause accelerated sedimentation rates, which can substantially alter the surface area and volume of lakes.
RF, TEMP, GWS, TWS, and SM exhibited statistically insignificant correlations with the lake’s surface area. Though these variables did not show a strong relationship in this analysis, it is important to recognize that they may still be relevant in specific contexts or when interacting with other factors not included in this study.
In comparison to Lake Abaya, where both NDVI and TWS were identified as major drivers of surface area dynamics, NDVI was the only statistically significant factor affecting Lake Chamo’s surface area. This difference may reflect the varying environmental and hydrological conditions between the two lakes, as well as the unique characteristics of their respective catchments.
Lake level validation
Before addressing the relationship between lake area and lake level, the accuracy of the altimeter-derived lake level data was assessed against measured lake levels. The coefficient of determination (R²) was .88 for Lake Abaya and .69 for Lake Chamo, indicating good agreement between the two datasets, as depicted in Figure 11(a) and (b), respectively.

Validation of lake level data: (a) altimeter lake level versus measured lake level and their coefficient of determination for Lake Abaya and (b) altimeter lake level versus measured lake level and their coefficient of determination for Lake Chamo.
Filling missing lake level values of Lake Chamo
To overcome the scarcity of lake level data for Lake Chamo, the lake level data of Lake Abaya were used to estimate the lake level of Lake Chamo through a linear regression model. The plot of the lake level of Abaya against that of Chamo from the original altimeter dataset, shown in Figure 12(a), yielded an R² value of .91, indicating a strong correlation between the two.

The relationship of lake level of Abaya and Chamo: (a) before filling the missing lake level values of Chamo and (b) after filling the missing lake level values of Chamo.
The missing data periods in the Lake Chamo dataset were then filled using the regression function. After the missing values were substituted with predicted values, the R² value increased to .98, as presented in Figure 12(b). A linear regression model was used to relate the levels of the two lakes because Lake Abaya and Lake Chamo share a common basin and have similar hydro-meteorological settings.
The relationship of lake area and lake level
The relationship between lake area and lake level is a fundamental aspect of understanding the dynamics of freshwater ecosystems. Figure 13 presents the time series and correlation plots of lake area and lake level for Lakes Abaya and Chamo.

(a) Time series of area and level of Lake Abaya, (b) correlation of area and level of Lake Abaya, (c) time series of area and level of Lake Chamo, and (d) correlation of area and level of Lake Chamo.
Figure 13(a) and (b) depict the time series and correlation plots of lake area and lake level for Lake Abaya, respectively. The graphs illustrate a strong correlation between the two variables, indicating that changes in lake area correspond closely with fluctuations in lake level. Specifically, the expansion in area and the rise in level of Lake Abaya exhibit a high coefficient of determination (R²) of .94. This strong correlation suggests that changes in lake level significantly influence the surface area of Lake Abaya, which could be attributed to factors such as precipitation and evaporation.
Similarly, Figure 13(c) and (d) show the time series and correlation plots of the area and level of Lake Chamo, respectively. Despite the scarcity of lake level data for Chamo, which cover only the period from 2002 to 2016, the correlation between lake area and lake level remains strong. The coefficient of determination (R²) for the relationship between lake level and lake area of Chamo is .80, indicating a substantial association between the two variables. This suggests that, despite data limitations, changes in lake level still have a notable impact on the surface area of Lake Chamo during the observed time period.
Overall, the findings underscore the importance of monitoring both lake area and lake level to gain insights into the dynamics of freshwater systems. The strong correlations observed in both Lake Abaya and Lake Chamo indicate the interdependence between these variables and emphasize the need for continued data collection and analysis to better understand and manage these valuable natural resources.
Modeling lake area and lake level relationship
Applying polynomial regression models to assess the relationship between lake area and lake level offers valuable insights into the dynamics of lake water. In this discussion, the implications of employing fourth-order and third-order polynomial regression models for Lake Abaya and Lake Chamo, respectively, are explored, based on the modeling and cross-validation results presented in Figure 14.

(a) Fourth order polynomial model results plot for Lake Abaya, (b) model cross validation for Lake Abaya, (c) third order polynomial model results plot for Lake Chamo, and (d) model cross validation for Lake Chamo.
For Lake Abaya (see Figure 14(a) and (b)), the fourth-order polynomial regression model produces compelling results. During the modeling phase, it demonstrates a high level of explanatory power, with an R² value of .96. This indicates that approximately 96% of the variability in lake level can be accounted for by the model. Furthermore, the cross-validation result shows a negligible decrease in performance, with an R² value of .95. These findings suggest that the fourth-order polynomial regression model accurately captures the nonlinear relationship between lake area and level in Lake Abaya, in terms of both modeling and predictive capability.
Similarly, for Lake Chamo (see Figure 14(c) and (d)), the third-order polynomial regression model provides valuable insights into the lake’s dynamics. Despite slightly lower R² values compared to Lake Abaya, the model demonstrates strong explanatory power during the modeling phase, with an R² value of .86. This indicates that approximately 86% of the variability in lake level can be explained by the model. The cross-validation result maintains a comparable level of performance, with an R² value of .84. These findings suggest that the third-order polynomial regression model effectively captures the relationship between area and level in Lake Chamo, allowing for reliable predictions of lake level based on changes in area.
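The modeling and cross-validation steps can be illustrated with a short sketch that fits a polynomial of lake level against lake area and scores it both in-sample and with k-fold cross-validation. Several choices here are assumptions made for the example: the use of NumPy, the five-fold split, the standardization of area before fitting, and the synthetic data. The cross-validation scheme used in the study is not described in this section.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_poly(area, level, degree):
    """Fit level = p(area); area is standardized for numerical stability."""
    mu, sigma = area.mean(), area.std()
    coeffs = np.polyfit((area - mu) / sigma, level, degree)
    return lambda a: np.polyval(coeffs, (np.asarray(a) - mu) / sigma)


def cross_validated_r2(area, level, degree, k=5, seed=0):
    """R² of out-of-fold predictions from k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(area))
    preds = np.empty(len(area))
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)  # every index not in this fold
        model = fit_poly(area[train], level[train], degree)
        preds[fold] = model(area[fold])
    return r_squared(level, preds)


# Synthetic area (km²) and level (m) pairs, for illustration only.
rng = np.random.default_rng(42)
area = np.linspace(1080.0, 1160.0, 40)
level = (1169.0 + 0.03 * (area - 1080.0) + 1e-6 * (area - 1120.0) ** 3
         + rng.normal(0.0, 0.05, size=area.size))

degree = 3
r2_fit = r_squared(level, fit_poly(area, level, degree)(area))
r2_cv = cross_validated_r2(area, level, degree)
print(f"in-sample R² = {r2_fit:.3f}, cross-validated R² = {r2_cv:.3f}")
```

A small gap between the in-sample and cross-validated R², as reported for both lakes, suggests that the chosen polynomial order is not overfitting the available observations.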
Conclusion
This study examined the dynamic nature of Lakes Abaya and Chamo in Ethiopia’s Rift Valley using Landsat images and altimeter-derived lake level data from 2000 to 2022 with Google Earth Engine. Significant expansion in the area and a rise in the level of the lakes were observed, highlighting the complex interaction of environmental and climatic factors.
The high accuracy achieved in lake water area extraction through the proposed MIRF classification method demonstrated the effectiveness of remote sensing techniques in monitoring lake dynamics. The overall accuracy ranged from 97.58% to 99.13%, while the Kappa Coefficient ranged from .95 to .98. Thus, this method offers a valuable tool for future studies aiming to understand and manage similar lake environments worldwide.
The results emphasized lake water surface area extraction using the MIRF classifier and the critical influence of factors such as soil moisture, groundwater storage, temperature, and vegetation status on the dynamics of these lakes. Notably, the strong correlation between lake area and NDVI stressed the ecological significance of maintaining catchment vegetation cover for both lakes. The study also elucidated the significant correlation between terrestrial water storage and lake area, indicating the interconnectedness of terrestrial and lake systems, especially in the case of Lake Abaya. Additionally, higher-order polynomial regression models representing lake area and lake level relationships were developed, yielding validation R² values of .95 and .84 for Lake Abaya and Lake Chamo, respectively.
Overall, this study contributes to the growing understanding of lake dynamics and underlines the importance of continued monitoring and research efforts in freshwater resources management. By integrating remote sensing, statistical analysis, and hydrological modeling, it enhances understanding of the complex interactions that shape lake ecosystems and informs evidence-based conservation strategies for the long-term safeguarding of these precious natural resources.
Footnotes
Acknowledgements
The author expresses sincere gratitude to the Ethiopian Meteorology Institute (EMI) and the Ministry of Water and Energy of Ethiopia (MoWE) for providing the essential weather and lake level datasets.
Author contributions
Bewketu Assefa Mulu: Conceptualization, data curation, methodology, software, analysis, investigation, visualization, supervision, writing original draft, writing review and editing; Fasikaw Atanaw Zimale: Conceptualization, methodology, supervision, editing; Mulugeta Genanu Kebede: Conceptualization, methodology, supervision, editing. All authors have read and agreed to this version of the manuscript.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
All the data used for this research work are available from the author upon request.
