Introduction
Macrophytes are visible aquatic plants that constitute an important environmental component; they include emergent, floating and submerged species (Chambers et al. 2008). Macrophytes shape the physical structure of the littoral zone, and their community composition, biomass and life forms change predictably with depth (Vadeboncoeur 2009). They respond to shifts in energy inflow, nutrient cycling and sedimentation (Bornette & Puijalon 2009). Monitoring macrophytes is also important given the widespread shift of freshwater bodies from stable clear-water ecosystems to degraded algae-dominated states (Zhang et al. 2016, Wang et al. 2023, Qiu et al. 2025). The macrophyte index of the European Water Framework Directive is a long-term indicator of trophic status (WFD 2000) and a crucial element in evaluating the ecological health of water bodies (Kumar et al. 2023).
Macrophyte monitoring has relied on field observations using the transect method (DES 2009, Ministry of Environment of Lithuania 2013) or on surveys of key points that assess the ecological condition of lakes from deviations in taxonomic composition (Clayton & Edwards 2006, Zviedre et al. 2013, 2015, 2016, Ministry of Environment of Lithuania 2021). These methods require substantial human and financial resources.
Detailed studies of specific lakes (Sinkevičienė 2007, Marčiulioniene et al. 2011, Ghirardi et al. 2022, Liang et al. 2022, Ozolins et al. 2023, Novković et al. 2024, Robran et al. 2024) or particular species (Salako et al. 2016, Abeysinghe et al. 2019, Tiškus et al. 2023) have explored macrophyte distributions. Among these, remote sensing techniques were applied by Salako et al. (2016), Abeysinghe et al. (2019), Ghirardi et al. (2022), Liang et al. (2022), Tiškus et al. (2023), Novković et al. (2024) and Robran et al. (2024). However, most remote sensing studies focus on single-date acquisitions or vegetation indices and overlook seasonal variations in reflectance.
Current challenges in macrophyte ecology require advanced monitoring approaches that can detect ecosystem changes in response to climate variability, anthropogenic pressure and pollution. Macrophytes are often used as bioindicators, especially for detecting heavy metals and other pollutants in aquatic environments (Farias et al. 2018). Automatic identification and classification algorithms based on remote sensing data can accelerate bioindication by enabling continuous, detailed monitoring of macrophyte dynamics over large geographical regions such as river basins.
To our knowledge, emergent, submerged and floating macrophytes in water bodies have not previously been identified simultaneously from seasonal reflectance data, apart from a vegetation index-based approach (Villa et al. 2015) whose complexity limits its practical applicability.
Recent reviews of lake remote sensing (Krtalić & Krtalić 2023, Batina & Krtalić 2024, Deng et al. 2024) have predominantly focused on parameters such as water quality, temperature or chlorophyll-a concentration and have rarely considered macrophytes as long-term indicators of ecosystem health and trophic status. Optical satellite reflectance data, more recently including Sentinel-2 data, have been used to classify macrophyte types in specific water bodies, employing spectral signatures (Albright & Ode 2011, Tiškus et al. 2023) and machine-learning methods (Piaser & Villa 2023). Owing to their 5-day revisit time, 10-m spatial resolution in four bands and free access, Sentinel-2 images have been useful for mapping regional water bodies and vegetation (Du et al. 2016).
Optical remote sensing has a critical drawback for vegetation mapping: it cannot detect vegetation under clouds. This matters less for persistent vegetation, for which a single cloud-free image per month is sufficient. Emergent, submerged and floating aquatic plants and the open water surface differ in reflectance throughout the year because of differences in vegetation phenological stage; this provides a basis for their identification.
Our objective was to develop and validate a straightforward, reproducible algorithm for identifying and classifying emergent, submerged and floating macrophytes in lakes using Sentinel-2 data. Designed for broad applicability across lakes within the Nemunas River Basin, the algorithm combines cluster analysis with machine learning to improve ecological monitoring. We hypothesized that seasonal changes in near-infrared reflectance patterns could be used to identify and distinguish macrophyte types, despite environmental factors.
Materials and methods
Study area
Two natural lakes in Lithuania were studied: Žuvintas and Salotė. The extensive overgrowth of aquatic plants in Lake Žuvintas (Northeastern Žuvintas Biosphere Reserve) creates a complex spatial distribution of vegetation, which makes the lake a demanding test case for the algorithm and for remote sensing-based vegetation identification. Lake Žuvintas has an area of 9.38 km² and a moderate ecological status (Sinkevičienė 2007, Paukštys 2011), and it is surrounded by wooded swamps, with its shores lined by marsh plants (Zviedre et al. 2015). Thirty-three macrophyte species have been identified in Lake Žuvintas, 19 of which are floating or submerged. The prevailing macrophyte species is Phragmites australis; others include Hydrilla verticillata, Potamogeton lucens, Potamogeton perfoliatus and Potamogeton compressus (Zviedre et al. 2015).
Lake Salotė lies 0.5 km south of Pilaitė (Vilnius); it is shallow and irregular in shape, with an area of 0.13 km² (Krevš & Kučinskienė 2009). The lake has several bays and a small island in the middle. The south-western shore and the ends of the bays are swampy, whereas the eastern shore is dry.
Algorithm development
The algorithm was developed in three stages: satellite data preprocessing, clustering to create labels, and building a machine-learning model, the last of which involved searching for, testing and implementing the algorithm (Fig. S1).
We used Harmonized Sentinel-2 MultiSpectral Instrument (MSI) data in the visible and near-infrared bands at 10-m resolution for Lake Žuvintas (ESA 2015). We employed Level-2A products, which are derived by applying scene classification and atmospheric correction to the Top-Of-Atmosphere (TOA) Level-1C orthoimage products (ESA 2021). This ensured that our analyses were based on satellite imagery with minimal atmospheric effects.
For Lake Žuvintas, all data were acquired on cloudless days during the snow-free period: in 2021 on 19 April, 11 May, 18 June, 10 July, 8 September, 26 September and 31 October (Fig. S2), and in 2022 on 22 March, 9 May, 25 June, 20 July, 24 August and 26 September. To apply the algorithm to Lake Salotė, we used reflectance data from 11 May and 17 August 2023. The data were resampled by Google Earth Engine and stored as Digital Numbers (DN) scaled by 10 000; for the 2022 data, the DN range was shifted by 1000 (GEE 2022).
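The scaling described above can be illustrated by converting stored Digital Numbers back to surface reflectance. This is a minimal Python sketch assuming a scale factor of 10 000 and, for scenes whose DN range is shifted, an additive offset of 1000 that is subtracted first; the function name `dn_to_reflectance` and its `offset` argument are illustrative and are not part of the Google Earth Engine API.

```python
import numpy as np

SCALE = 10_000  # stored DN = reflectance x 10 000 (assumed, per the text)

def dn_to_reflectance(dn, offset=0):
    """Convert Sentinel-2 L2A Digital Numbers to surface reflectance.

    offset: DN shift removed before scaling (assumed 1000 for shifted scenes, else 0).
    """
    dn = np.asarray(dn, dtype=float)
    return (dn - offset) / SCALE
```

For example, a stored DN of 630, or a shifted DN of 1630 passed with `offset=1000`, both map to a reflectance of 0.063, the May threshold used in the Results.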
The data were then cropped in Google Earth Engine using a polygon mask that encompassed the lake and its largest island. This mask covered an area of 9.95 km².
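Cropping with a polygon mask amounts to keeping the pixels whose centres fall inside the lake polygon; the mask area then follows from the pixel count, since each 10-m Sentinel-2 pixel covers 100 m². The sketch below uses an even-odd ray-casting test on pixel centres and assumes projected coordinates in metres (for example UTM) on a north-up grid. It illustrates the idea only and is not the Google Earth Engine clipping routine used in the study; all function names are hypothetical.

```python
import numpy as np

PIXEL_SIZE_M = 10.0  # Sentinel-2 10-m bands

def points_in_polygon(x, y, poly):
    """Even-odd ray casting: True where point (x, y) lies inside polygon `poly`."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    inside = np.zeros(x.shape, dtype=bool)
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        crosses = (y1 > y) != (y2 > y)
        # x where the edge meets that line; horizontal edges give inf/nan but never cross
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (x < x_cross)
    return inside

def lake_mask(poly, x_min, y_max, n_cols, n_rows):
    """Boolean raster mask: True where a 10-m pixel centre lies inside `poly`."""
    cols = x_min + (np.arange(n_cols) + 0.5) * PIXEL_SIZE_M
    rows = y_max - (np.arange(n_rows) + 0.5) * PIXEL_SIZE_M
    xx, yy = np.meshgrid(cols, rows)  # shape (n_rows, n_cols)
    return points_in_polygon(xx, yy, poly)

def mask_area_km2(mask):
    """Masked area: pixel count x 100 m^2, converted to km^2."""
    return mask.sum() * PIXEL_SIZE_M ** 2 / 1e6
```

At 100 m² per pixel, a 9.95 km² mask corresponds to roughly 99 500 pixels.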
We used the 2022 reflectance data from Lake Žuvintas and the 2023 reflectance data from Lake Salotė to assess the model’s performance on independent data, thereby testing its capacity to identify macrophytes in a different year and at a different site.
Because verified ground-based data were limited, we applied clustering to the Lake Žuvintas reflectance data to identify areas with similar seasonal reflectance dynamics, on the premise that only macrophytes and open water surfaces could be identified in the lake.
The gap statistic, computed on three samples of 1000 random pixels, suggested dividing Lake Žuvintas into at least two clusters (Fig. S3). We ultimately divided Lake Žuvintas into four clusters.
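For reference, the gap statistic compares the within-cluster dispersion of the data with its expectation under a null reference distribution, usually B datasets drawn uniformly over the range of the observed features; how the reference sets were generated in this study is not stated, so that choice is an assumption here. In its standard form, with clusters C_1, …, C_k of sizes n_r, pairwise distances d_{ii'} and W*_{kb} denoting W_k computed on the b-th reference dataset:

$$
W_k = \sum_{r=1}^{k} \frac{1}{2 n_r} \sum_{i,\,i' \in C_r} d_{ii'}, \qquad
\mathrm{Gap}(k) = \frac{1}{B} \sum_{b=1}^{B} \log W^{*}_{kb} - \log W_k,
$$

$$
\bar{\ell}_k = \frac{1}{B} \sum_{b=1}^{B} \log W^{*}_{kb}, \qquad
s_k = \sqrt{1 + \tfrac{1}{B}}\,\sqrt{\frac{1}{B} \sum_{b=1}^{B} \bigl(\log W^{*}_{kb} - \bar{\ell}_k\bigr)^{2}}.
$$

The usual selection rule takes the smallest k satisfying $\mathrm{Gap}(k) \ge \mathrm{Gap}(k+1) - s_{k+1}$.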
The CLARA algorithm (Clustering Large Applications; Gentle et al. 1991), which is particularly effective for large datasets (Gupta et al. 2019), was used with the Manhattan distance. CLARA extends the k-medoids method PAM (Partitioning Around Medoids) by clustering subsamples of the data, which reduces computing time and RAM requirements for datasets with numerous objects (Kassambara 2017).
Using the Manhattan distance and five sample sets, CLARA divided the area of interest into four distinct clusters. Each pixel was represented by 49 min–max normalized reflectance values (seven bands across seven dates) (Gopal et al. 2015).
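The CLARA procedure described above can be sketched as follows: min–max normalise each of the 49 features, run k-medoids with the Manhattan (L1) distance on several random subsamples, assign every pixel to its nearest medoid and keep the medoid set with the lowest total distance over the full dataset. This is a simplified Python re-implementation for illustration, not the software used in the study. In particular, it starts from a PAM-style greedy BUILD step but then uses a simple alternating medoid update instead of PAM's full swap search, and the subsample size and iteration limit are arbitrary choices.

```python
import numpy as np

def minmax_normalise(X):
    """Scale each feature column (one band on one date) to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns would divide by zero
    return (X - lo) / span

def manhattan(A, B):
    """Pairwise Manhattan (L1) distances between the rows of A and the rows of B."""
    return np.abs(A[:, None, :] - B[None, :, :]).sum(axis=2)

def kmedoids(X, k, n_iter=20):
    """k-medoids on a small sample: greedy BUILD start, then alternating updates."""
    D = manhattan(X, X)
    medoids = [int(D.sum(axis=1).argmin())]  # most central point first
    for _ in range(1, k):
        nearest = D[:, medoids].min(axis=1)  # current distance of each point to its medoid
        gain = np.maximum(nearest[None, :] - D, 0).sum(axis=1)  # cost reduction per candidate
        gain[medoids] = -1.0  # never pick the same medoid twice
        medoids.append(int(gain.argmax()))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:  # member with the smallest summed L1 distance to its cluster
                new[j] = members[D[np.ix_(members, members)].sum(axis=1).argmin()]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return X[medoids]

def clara(X, k, n_samples=5, sample_size=200, seed=0):
    """Cluster random subsamples; keep the medoids with the lowest L1 cost on all rows."""
    rng = np.random.default_rng(seed)
    best_cost, best = np.inf, None
    for _ in range(n_samples):
        idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
        medoids = kmedoids(X[idx], k)
        d = manhattan(X, medoids)
        cost = d.min(axis=1).sum()
        if cost < best_cost:
            best_cost, best = cost, (medoids, d.argmin(axis=1))
    return best  # (medoid feature vectors, cluster label of every row)
```

In the study's setting this would correspond to `clara(minmax_normalise(X), k=4, n_samples=5)` on a matrix with one row per pixel and 49 columns.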
We validated the clustering results by comparing them with satellite imagery from Google Maps and with the digital raster orthophoto map of Lithuania (ORT10LT, 1:10 000) from the National Land Service of the Ministry of Environment (LSIP).
The clustering results served as target labels for training a machine-learning model. The model was trained using the Recursive Partitioning and Regression Trees (RPART) method (Strobl et al. 2009), a classifier that is robust to high-dimensional data (Georganos et al. 2018). We applied repeated cross-validation on the training set to evaluate the model and estimate its behaviour on independent data: the training set was divided into 10 folds, and the cross-validation was repeated five times.
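Repeated k-fold cross-validation with 10 folds and five repeats can be sketched as drawing, for each repeat, a fresh random partition of the training indices into 10 folds, with each fold held out once. This minimal Python sketch produces only the index splits; fitting the classifier itself (an RPART tree in the study) is left out, and the function name is hypothetical.

```python
import numpy as np

def repeated_kfold(n, n_folds=10, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, held_out_idx) for repeated k-fold CV."""
    rng = np.random.default_rng(seed)
    for r in range(n_repeats):
        perm = rng.permutation(n)              # new shuffle for every repeat
        folds = np.array_split(perm, n_folds)  # near-equal fold sizes
        for f, held_out in enumerate(folds):
            train = np.concatenate([folds[g] for g in range(n_folds) if g != f])
            yield r, f, train, held_out
```

Averaging accuracy over the 50 held-out folds would give a cross-validated estimate of the kind used when tuning the tree's complexity parameter.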
Only 20.2% of the pixels were used for model building, with equal proportions of all vegetation types and open water. These pixels were split into training (70%) and test (30%) datasets, so the training set comprised about 14.14% of all pixels (0.70 × 20.2%).
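The sampling design can be sketched as drawing the same number of pixels from each class label and then splitting that sample 70/30 within each class, so that the training and test sets keep equal class proportions. In this Python sketch the per-class sample size is a free parameter, because the study reports proportions rather than counts; the function name is hypothetical.

```python
import numpy as np

def balanced_split(labels, n_per_class, train_frac=0.7, seed=0):
    """Sample n_per_class indices from every class, then split them train/test."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, test = [], []
    for cls in np.unique(labels):
        # choice without replacement already returns the indices in random order
        idx = rng.choice(np.flatnonzero(labels == cls), size=n_per_class, replace=False)
        n_train = int(round(train_frac * n_per_class))
        train.append(idx[:n_train])
        test.append(idx[n_train:])
    return np.concatenate(train), np.concatenate(test)
```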
Tuning the model with a complexity parameter (cp) of 0.40 improved identification accuracy. Although the lowest cp values corresponded to the highest accuracy (Fig. S4), such minimal values resulted in a cumbersome model with numerous variables, whereas cp values above 0.41 significantly reduced accuracy. The model was then applied independently to the macrophytes of Lake Salotė.
To validate the model application, we used images from a Phantom 4 unmanned aerial vehicle with a 12-megapixel camera, with a horizontal accuracy of ±0.3 m and a vertical accuracy of ±0.1 m. Images were acquired over Lake Salotė on 21 July 2023 and over Lake Žuvintas on 24 September 2023. We additionally used data from Lake Želva and Lake Želvykštis (3 July 2023) and from Lake Lielukas (2 September 2023). Several flight altitudes (10–50 m) were tested to determine which gave the best image quality for identifying aquatic vegetation. The images were processed and analysed in ArcGIS Pro 3.3.1. The validation dataset contains 344 points from five lakes, each representing a 10 m × 10 m (100 m²) surface corresponding to one Sentinel-2 pixel.
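Matching each drone validation point to the Sentinel-2 pixel that contains it reduces to dividing projected coordinates by the 10-m pixel size and rounding down. This Python sketch assumes point coordinates in metres in the same projected reference system as the imagery and a hypothetical north-up grid origin (`x_min`, `y_max`); it is not the ArcGIS Pro workflow used in the study.

```python
import numpy as np

PIXEL_SIZE_M = 10.0

def point_to_pixel(x, y, x_min, y_max, pixel=PIXEL_SIZE_M):
    """Return (row, col) of the north-up raster pixel containing each point."""
    col = np.floor((np.asarray(x, dtype=float) - x_min) / pixel).astype(int)
    row = np.floor((y_max - np.asarray(y, dtype=float)) / pixel).astype(int)
    return row, col
```

Points falling in the same pixel could then be merged so that each validation record stands for one 100 m² pixel.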
Results
Clusters of lake surface types
The first and second clusters predominantly consisted of emergent macrophytes, the third cluster was characterized by submerged and floating vegetation and the fourth cluster represented open water (Fig. 1). Seasonal variations in reflectance data from spring to autumn during the vegetation period enabled the differentiation of submerged and floating macrophytes into a distinct cluster (Fig. S5).

Figure 1. Lake Žuvintas surface types. The lake is divided into emergent macrophytes, submerged and floating macrophytes and open water surface groups. The surface area under open water is 42.1% of the area of interest, with submerged and floating macrophytes being distributed on 19.9% of the surface. Emergent macrophytes occupy the remaining 38.0%.
Drone data showed that the differences between the first and second clusters were attributable to varying moisture levels. The second cluster partially covered open water and contained less old vegetation than the first cluster (Fig. S6). We therefore merged the first and second clusters into a single emergent macrophyte class.
The seasonal dynamics of macrophyte and open water reflectance confirm that in spring and autumn the reflectance of submerged macrophytes is barely distinguishable from that of water across all spectral ranges, whereas emergent macrophytes are clearly distinguishable during these periods. Distinctiveness was highest in summer, when each cluster showed a distinct range of reflectance values in almost all bands (Fig. S7).
Macrophyte identification model
The model showed that a near-infrared reflectance (B8_05) of 0.063 in May serves as a threshold dividing the surface into two categories: emergent macrophytes, and open water together with submerged and floating macrophytes. Pixels with a reflectance in the 842-nm band (B8) below 0.063 in May corresponded to emergent macrophytes, which were identified with a probability of 0.96. The emergent macrophyte class constituted 33% of the training dataset (Fig. 2).

Figure 2. Macrophyte identification algorithm based on Lake Žuvintas reflectance data. Identifying emergent, floating and submerged macrophytes and open water surfaces requires near-infrared (842 nm) reflectance from May (B8_05) and July (B8_07). Each node shows a surface type label, the probability of identification and the proportion of the Lake Žuvintas dataset.
Among the remaining pixels, those with a July near-infrared reflectance (B8_07) equal to or greater than 0.066 were classified as open water with a probability of 0.93, and the rest were classified as floating and submerged macrophytes with a probability of 0.86. This step classified 34% of the training dataset as floating and submerged macrophytes and 32% as open water.
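The fitted tree reduces to two threshold tests, which can be written as a small function. This Python sketch transcribes the thresholds and branch directions exactly as stated in the text, with reflectance expressed as a fraction rather than scaled DN; the function, constant and label names are illustrative.

```python
import numpy as np

MAY_THRESHOLD = 0.063   # B8_05: near-infrared (842 nm) reflectance in May
JULY_THRESHOLD = 0.066  # B8_07: near-infrared (842 nm) reflectance in July

def classify_surface(b8_may, b8_july):
    """Apply the two-split tree to arrays of May and July B8 reflectance."""
    b8_may = np.asarray(b8_may, dtype=float)
    b8_july = np.asarray(b8_july, dtype=float)
    return np.where(
        b8_may < MAY_THRESHOLD,
        "emergent macrophytes",
        np.where(b8_july >= JULY_THRESHOLD, "open water", "submerged and floating macrophytes"),
    )
```

Because the rule needs only the B8 band on two dates, its input requirements are small: one cloud-free scene in May and one in July.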
The model achieved an overall accuracy of 92.21% when applied to the 2021 dataset. Notably, only 14.14% of the 2021 data was used for training, meaning the model had prior exposure to part of this dataset. When applied to previously unseen 2022 data, the model determined vegetation types with an accuracy of 87.93%, which is slightly lower than its performance on the full 2021 dataset.
A comparison of the two confusion matrices showed that the largest decrease was in the accuracy of the open water class, which dropped by 8.23 percentage points, primarily because its misclassification as submerged and floating macrophytes rose from 8.95% to 17.06%.
The accuracy of the submerged and floating macrophytes class also declined, by 0.45 percentage points, owing to slightly more misclassification as both emergent macrophytes and open water.
Lastly, the accuracy of the emergent macrophytes class decreased by 1.95 percentage points, mainly because of more misclassification as submerged and floating macrophytes (Table 1).
Table 1. Confusion matrices of applying the model to 2021 and 2022 Lake Žuvintas reflectance data. Correct classifications are indicated in bold

Em_m = emergent macrophytes; S&F_m = submerged and floating macrophytes.
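The overall and per-class accuracies compared above can be computed from a confusion matrix with reference classes in rows and predicted classes in columns: per-class accuracy is the diagonal entry divided by the row total, and the off-diagonal row shares are the misclassification rates. The row/column orientation is an assumption, and the matrices in the usage below are hypothetical toy values, not the Table 1 data.

```python
import numpy as np

def accuracy_summary(cm):
    """Overall accuracy (%) and row-normalised rates (% of reference pixels)."""
    cm = np.asarray(cm, dtype=float)
    overall = 100 * np.trace(cm) / cm.sum()
    row_pct = 100 * cm / cm.sum(axis=1, keepdims=True)  # diagonal = per-class accuracy
    return overall, row_pct

def change_in_pp(cm_a, cm_b):
    """Per-class accuracy change from matrix a to matrix b, in percentage points."""
    return np.diag(accuracy_summary(cm_b)[1]) - np.diag(accuracy_summary(cm_a)[1])
```

For example, with toy matrices `[[90, 5, 5], [10, 80, 10], [0, 10, 90]]` and `[[88, 7, 5], [10, 80, 10], [0, 20, 80]]`, `change_in_pp` gives changes of -2, 0 and -10 percentage points.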
Validation showed that the model correctly identified 81 pixels as emergent macrophytes, 184 pixels as submerged and floating macrophytes and 26 pixels as open water. However, some misclassifications occurred; for example, 21 submerged and floating macrophyte pixels were misidentified as emergent macrophytes, and 23 open water pixels were misidentified as submerged and floating macrophytes. Based on the resulting confusion matrix, the model achieved F1 scores of 0.85 for emergent macrophytes, 0.88 for submerged and floating macrophytes and 0.67 for open water. These results indicate high classification accuracy for aquatic vegetation, particularly for submerged and floating types, whereas the lower performance for the open water class probably reflects spectral overlap with algae or transitional vegetation zones.
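Each F1 score is the harmonic mean of a class's precision and recall, equivalently 2TP / (2TP + FP + FN). A minimal Python sketch, assuming a confusion matrix with reference classes in rows and predicted classes in columns; the full validation matrix is not reproduced in the text, so the usage relies on hypothetical counts.

```python
import numpy as np

def per_class_f1(cm):
    """Per-class precision, recall and F1 from a confusion matrix (rows = reference)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)  # column totals = pixels predicted as the class
    recall = tp / cm.sum(axis=1)     # row totals = reference pixels of the class
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With the toy matrix `[[9, 1], [3, 7]]`, the F1 scores are 9/11 ≈ 0.82 and 7/9 ≈ 0.78. A class that is never predicted would produce a division by zero, which a production version would need to handle.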
Applying the model to Lake Salotė reflectance data showed that the macrophyte detection algorithm identifies open water and two types of vegetation (Fig. S8). However, some errors were also revealed. For example, shallow water near the beach and the beach itself were identified as emergent macrophytes, and around the island some areas were identified as submerged macrophytes, although they are in fact the border zone between emergent macrophytes and open water.
Discussion
We provide a practical and interpretable approach to macrophyte identification and classification using the near-infrared band of Sentinel-2 in May and July. The method reduces computational complexity and simplifies the classification process while maintaining high accuracy. This study demonstrates that classification based on near-infrared values from two dates is sufficient for reliable macrophyte identification and classification.
An innovation of our method is using clustering results as ground truth for supervised learning, which reduces dependence on field surveys and enables efficient dataset generation. This approach streamlines the classification process and opens new possibilities for automating macrophyte monitoring in data-scarce regions. However, although our model provides a useful tool for large-scale macrophyte detection, it does not replace in situ observations. Macrophyte indices are typically taxonomic, whereas our model identifies the presence and types of macrophytes rather than species.
The reliance on optical satellite data introduced constraints related to spatial resolution and atmospheric conditions. Additionally, the algorithm cannot directly detect submerged macrophytes beneath the water surface, which is a significant limitation: deeper-growing species remain undetected unless they reach or float on the surface (Vahtmäe & Kutser 2007). This could lead to underestimation of macrophyte coverage and bias ecological assessments.
There was potential for misclassification in areas with mixed or transitional vegetation, where the spectral signatures of macrophytes may overlap with those of other land-cover types, such as algae or terrestrial vegetation. This issue is exacerbated in shallow or turbid waters, where the reflectance properties of the water column can interfere with the detection of aquatic vegetation. Furthermore, the algorithm’s performance may vary across different aquatic ecosystems, as local environmental conditions, such as water chemistry, sediment type and light availability, can influence the spectral characteristics of macrophytes. For instance, in highly turbid or eutrophic waters, the reflectance signals from macrophytes may be obscured, reducing classification accuracy.
The algorithm was also dependent on predefined lake boundaries, which do not reflect seasonal shoreline shifts and may lead to the omission or misclassification of vegetation in temporarily flooded or exposed areas.
Seasonal changes in near-infrared reflectance can nevertheless effectively distinguish between emergent, submerged and floating macrophyte types. We provide a novel approach to remote macrophyte assessment, a topic in which aquatic vegetation is often overlooked by lake remote sensing studies.
Conclusions
Seasonal reflectance values from Sentinel-2 data, specifically in the near-infrared band in May and July, distinguished emergent macrophytes from submerged and floating macrophytes with high accuracy. The low data requirements of this approach make it suitable for broader application in similar temperate lakes.
This approach could facilitate more comprehensive aquatic ecosystem monitoring, but future work should focus on improving detection in variable shoreline and turbid conditions and on testing the model across diverse lake systems.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/S0376892925100167.
Acknowledgements
We are grateful to the Associate Editor and anonymous reviewers for providing valuable comments.
Financial support
This research was supported by the Science Promotion Fund of Vilnius University (MSF-JM-06-2023).
Competing interests
The authors declare none.
Ethical standards
Not applicable.