Introduction
Vegetables are recognized as nutrient-dense foods, rich in essential vitamins, minerals, and antioxidants (Kumar et al. Reference Kumar, Kumar and Shekhar2020). Vegetables account for approximately 35% of per capita dietary intake in China, the world’s largest consumer of vegetables (Dong et al. Reference Dong, Gruda, Li, Cai, Zhang and Duan2022). Weeds pose a significant challenge by competing with vegetables for sunlight, water, and nutrients (Berge et al. Reference Berge, Aastveit and Fykse2008; Hamuda et al. Reference Hamuda, Glavin and Jones2016). Manual weeding, while effective, is both labor-intensive and time-consuming (Slaughter et al. Reference Slaughter, Giles and Downey2008). The development of automated weeding technologies offers a promising solution to these challenges (Memon et al. Reference Memon, Chen, Shen, Liang, Tang, Wang, Zhou and Memon2025).
Extensive research has been conducted on machine vision technologies for weed detection (Bakhshipour et al. Reference Bakhshipour, Jafari, Nassiri and Zare2017; Gerhards et al. Reference Gerhards, Andujar Sanchez, Hamouz, Peteinatos, Christensen and Fernandez-Quintanilla2022; Pantazi et al. Reference Pantazi, Moshou and Bravo2016; Perez et al. Reference Perez, Lopez, Benlloch and Christensen2020). These technologies typically classify weed and crop features into four categories: color, shape, texture, and spectra (Chen et al. Reference Chen, Liu, Han, Jin, Wang, Kong and Yu2024; Kong et al. Reference Kong, Li, Liu, Han, Jin, Chen and Yu2024). While these methods perform well under controlled conditions, their effectiveness often diminishes in field environments due to challenges such as leaf overlap and occlusion (Jin et al. Reference Jin, Sun, Che, Bagavathiannan, Yu and Chen2022c; Tao and Wei Reference Tao and Wei2024). Furthermore, vision-based approaches rely heavily on manually designed features, which introduces subjectivity and limits robustness, especially given the high similarity between weeds and crops (Hasan et al. Reference Hasan, Sohel, Diepeveen, Laga and Jones2021; Jin et al. Reference Jin, Liu, Yang, Xie, Bagavathiannan, Hong, Xu, Chen, Yu and Chen2023).
The rapid advancements in graphics processing units (GPUs) have significantly accelerated the evolution of deep learning (Jordan and Mitchell Reference Jordan and Mitchell2015; Mahesh Reference Mahesh2020). With powerful learning and generalization capabilities, deep learning has been widely adopted for image identification (LeCun et al. Reference LeCun, Bengio and Hinton2015; Pak and Kim Reference Pak and Kim2017), speech recognition (Zhang et al. Reference Zhang, Geiger, Pohjalainen, Mousa, Jin and Schuller2018), natural language processing (Otter et al. Reference Otter, Medina and Kalita2020), and autonomous driving (Grigorescu et al. Reference Grigorescu, Trasnea, Cocias and Macesanu2020). The capacity to process massive datasets and leverage high-performance computing makes deep learning particularly well suited for deciphering, measuring, and understanding data-intensive agricultural processes (Liakos et al. Reference Liakos, Busato, Moshou, Pearson and Bochtis2018). In agriculture, deep learning has been applied to a wide range of tasks, including yield prediction (Liu et al. Reference Liu, Abbas and Noor2021), disease detection (Chung et al. Reference Chung, Huang, Chen, Lai, Chen and Kuo2016), weed detection (Grinblat et al. Reference Grinblat, Uzal, Larese and Granitto2016), crop quality (Peng et al. Reference Peng, Li, Zhou and Shao2022), species recognition (Jin et al. Reference Jin, Bagavathiannan, McCullough, Chen and Yu2022b), and more (Pantazi et al. Reference Pantazi, Moshou and Bravo2016; Sengupta and Lee Reference Sengupta and Lee2014). These advancements highlight the transformative potential of deep learning in modern agriculture, offering innovative solutions to complex challenges.
Numerous studies have been conducted on the use of deep convolutional neural networks (DCNNs) for precise weed detection (Rai et al. Reference Rai, Zhang, Ram, Schumacher, Yellavajjala, Bajwa and Sun2023; Xu et al. Reference Xu, Shu, Xie, Song, Zhu, Cao and Ni2023). For instance, Modi et al. (Reference Modi, Kancheti, Subeesh, Raj, Singh, Chandel, Dhimate, Singh and Singh2023) trained six models with varying hyperparameters to identify weeds in actively growing sugarcane (Saccharum officinarum L.) crops. Among these, DarkNet53 outperformed the other models, achieving an F1 score greater than 99%. Dyrmann et al. (Reference Dyrmann, Karstoft and Midtiby2016) proposed a new network, which was trained and tested on images from various datasets under different lighting conditions and soil types. This network achieved an 82% accuracy rate in classifying 22 species of weeds. The capability of deep learning for precision weed detection in turf was first reported by Yu et al. (Reference Yu, Sharpe, Schumann and Boyd2019c). Three DCNNs were trained to detect broadleaf weeds in turfgrass, with VGGNet emerging as the best-performing model, achieving both an F1 score and overall accuracy exceeding 0.99, and a recall value of 1.00. A series of additional studies have further compared and analyzed weed detection using DCNNs from various perspectives (Jin et al. Reference Jin, Bagavathiannan, McCullough, Chen and Yu2022a, Reference Jin, Bagavathiannan, McCullough, Chen and Yu2022b; Yu et al. Reference Yu, Schumann, Sharpe, Li and Boyd2019a, Reference Yu, Schumann, Sharpe, Li and Boyd2020), consistently demonstrating the potential of DCNNs in precision weed detection.
Despite significant advancements in deep learning methods for weed detection, several challenges remain. Natural environments often contain diverse weed species, ecotypes, densities, and growth stages, making it difficult to establish comprehensive weed datasets (Pei et al. Reference Pei, Sun, Huang, Zhang, Sheng and Zhang2022; Zhuang et al. Reference Zhuang, Li, Bagavathiannan, Jin, Yang, Meng, Li, Li, Wang, Chen and Yu2022). Additionally, weeds exhibit distinct appearance characteristics at different growth stages and densities, even within the same field. Direct weed detection requires the collection of a massive number of weed images, which often results in reduced robustness and generalization capabilities in detection systems. To address these challenges, this research proposes a novel deep learning method for weed detection and mapping. Vegetables are first detected using an innovative network based on the YOLOv8 architecture, and the remaining green vegetation (weeds) is subsequently segmented using image processing techniques. The objectives of this research were to (1) evaluate the performance of the improved vegetable detection network (IVD), (2) segment weeds from the background images and establish a weed mapping system for precision weeding application, and (3) evaluate the effectiveness of path planning algorithms to guide the operation of weeding actuators.
Materials and Methods
Overview
This study focuses on developing and applying the IVD network based on the YOLOv8 architecture to detect bok choy [Brassica rapa ssp. chinensis (L.) Hanelt]. Bok choy is a fast-growing leafy vegetable that is widely cultivated in Asia, particularly in China. It is valued for its short growth cycle, high nutritional content, and significant contribution to local diets. Typically, bok choy reaches maturity within 25 to 35 d after planting. In this study, bok choy plants at the 2- to 4-true leaf stage, with an average height of approximately 5 to 10 cm, were selected for image acquisition. Once bok choy was accurately detected, the remaining green vegetation in the background was identified as weeds. Image processing techniques were then employed to segment weeds from the background, with area filtering applied to eliminate potential random noise. The original images were subsequently divided into grid cells, and cells containing weeds were labeled in red to create a weed mapping system. Finally, a path planning algorithm was implemented to guide the mechanical actuators along the most efficient and shortest path for operation. The entire procedure is illustrated in Figure 1.

Figure 1. The workflow illustrating the detection and mapping process for bok choy (Brassica rapa ssp. chinensis) using the improved vegetable detection (IVD) model. Target vegetables are first identified, and the remaining green vegetation is segmented as weeds through image processing and area filtering. The processed images are divided into grid cells, with weed-containing cells marked in red to generate a distribution map. A path planning algorithm is then applied to optimize the route for weed control operations.
Image Acquisition
The images of bok choy and weeds were captured from multiple vegetable fields located in Jiangning District (approximately 31.95°N, 118.90°E) and Qixia District (approximately 32.15°N, 118.95°E) of Nanjing, Jiangsu Province, China, during May and October 2022. These fields were selected to represent diverse planting conditions and growth stages. Images were taken by a digital camera (HV1300FC, DaHeng Image, Beijing, China) with an aspect ratio of 4:3 and a resolution of 1,792 × 1,344 pixels. The camera was positioned approximately 0.6 m above the ground, operating in automatic mode for focus, exposure, and white balance. To ensure the diversity of the training dataset, images were collected under various lighting conditions, including sunny, cloudy, and partly cloudy skies.
Training and Testing
A total of 1,500 images were annotated using the LabelImg (https://github.com/HumanSignal/labelImg) software. Rectangular bounding boxes were drawn around bok choy to generate corresponding XML label files for the dataset. The annotated images were then divided into training, validation, and testing datasets comprising 1,200 images (80%), 150 images (10%), and 150 images (10%), respectively.
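For reproducibility, the split can be expressed in a few lines of Python. The following is a minimal sketch, assuming a hypothetical dataset/images directory and an arbitrary random seed, that writes YOLO-style image lists for each split:

```python
# Minimal sketch of the 80/10/10 split described above. The image directory
# and random seed are hypothetical, not taken from this study.
import random
from pathlib import Path

random.seed(0)
images = sorted(Path("dataset/images").glob("*.jpg"))
random.shuffle(images)

n = len(images)
n_train, n_val = int(0.8 * n), int(0.1 * n)
splits = {
    "train": images[:n_train],
    "val": images[n_train:n_train + n_val],
    "test": images[n_train + n_val:],
}
for name, files in splits.items():
    # One image path per line; YOLO-style dataset configs accept such lists.
    Path(f"{name}.txt").write_text("\n".join(str(f) for f in files))
```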
Improved Vegetable Detector
The IVD network was developed by enhancing the YOLOv8 architecture. As a leading example of one-stage deep learning frameworks, YOLO architectures are widely used in real-time object detection due to their exceptional efficiency and precision (Terven et al. Reference Terven, Córdova-Esparza and Romero-González2023). YOLOv8 introduces significant advancements, making it versatile for instance segmentation, key point detection, object detection, and classification tasks (Kashyap Reference Kashyap2024).
In the YOLO architecture, the backbone is responsible for extracting key features from input images, while the neck aggregates and refines these features before passing them to the detection head (Deng et al. Reference Deng, Miao, Zhao, Yang, Gao, Zhai and Zhao2025). A slim-neck design further improves computational efficiency while preserving essential feature information. Optimizing these components is critical for enhancing both detection accuracy and speed, which are essential for real-time weed detection in agricultural environments.
Although YOLOv8 performs well in general object detection tasks, its feature extraction and detection speed require further optimization for bok choy detection, particularly to distinguish fine-grained features within cluttered field environments. To address these challenges, a novel vegetable detection network was developed with two key improvements:
1. In the neck, the original feature fusion layers were replaced with a slimmed neck (slim-neck) module.
2. In the backbone, the convolution to fully connected (C2f) modules were replaced with C2f-Faster-EMA modules, which integrate an attention mechanism and the Faster Block of FasterNet.
YOLOv8-C2f-Faster-EMA
The YOLOv8-C2f-Faster-EMA network is an enhancement of the YOLOv8 deep learning architecture (Zhu et al. Reference Zhu, Hu, Zheng, Zhou, Ge and Hong2024), and two principal improvements were introduced:
1. Efficient multiscale attention (EMA): this component integrates multiscale feature fusion and attention mechanisms to enhance the network’s identification capabilities.
2. Faster Block of FasterNet: the Faster Block, which employs partial convolution to reduce redundant computation, was integrated into the C2f modules of YOLOv8 to improve detection precision.
Figure 2 illustrates the optimized architecture of the network. In this research, the C2f-Faster-EMA module was adopted in the backbone of the original YOLOv8 network as a substitute for the C2f module. This enhanced architecture is referred to as YOLOv8-C2f-Faster-EMA.

Figure 2. The architecture of YOLOv8-C2f-Faster-EMA. The original convolution to fully connected (C2f) modules are replaced with C2f-Faster-EMA modules to improve feature extraction and computational efficiency. Additionally, in the backbone network, the bottleneck operators in the C2f modules at stages 3, 5, 7, and 9 were hierarchically substituted with the proposed C2f-Faster-EMA units to enhance feature extraction and information flow. SPPF in the model is the abbreviation of Spatial Pyramid Pooling Fast, which is a module used for pooling operations at different scales.
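To make the Faster Block concrete, the following PyTorch sketch shows its core partial convolution (PConv), which convolves only a fraction of the channels and passes the rest through unchanged. The 1/4 channel ratio, the two-layer pointwise expansion, and the residual connection follow the published FasterNet design rather than details reported here, and the EMA attention stage is omitted for brevity:

```python
# A minimal sketch of FasterNet's partial convolution and Faster Block,
# assuming the common FasterNet defaults; the EMA attention is omitted.
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Applies a 3x3 convolution to only a fraction of the channels,
    passing the rest through unchanged to cut redundant computation."""
    def __init__(self, channels: int, ratio: float = 0.25):
        super().__init__()
        self.conv_ch = int(channels * ratio)
        self.conv = nn.Conv2d(self.conv_ch, self.conv_ch, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        convolved, untouched = torch.split(
            x, [self.conv_ch, x.shape[1] - self.conv_ch], dim=1)
        return torch.cat([self.conv(convolved), untouched], dim=1)

class FasterBlock(nn.Module):
    """PConv followed by two pointwise convolutions, with a residual link."""
    def __init__(self, channels: int, expansion: int = 2):
        super().__init__()
        hidden = channels * expansion
        self.pconv = PConv(channels)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, hidden, 1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mlp(self.pconv(x))
```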
Slim-Neck
The neck of a network is typically positioned between the backbone and the head, enhancing the expressiveness of features and delivering richer feature information to the head for image classification and object detection. The slim-neck module redesigns the neck for greater efficiency. Depthwise separable convolutions (DSC) were introduced to alleviate the high computational cost of large-scale processing; however, they are less effective at feature extraction and fusion than standard convolutions (SC). Group shuffle convolution (GSConv) therefore fuses SC, DSC, and a shuffle strategy, using channel shuffling to transfuse information generated by the SC into the DSC output so that local features are exchanged uniformly across channels (Chollet Reference Chollet2017). Accordingly, the slimmed neck is recommended to be combined with a general backbone. The architecture of GSConv is illustrated in Figure 3.

Figure 3. Architecture of the group shuffle convolution (GSConv) module. The standard convolution operators in the neck module were systematically replaced with GSConv units, which are specifically designed to enhance cross-level feature fusion through a lightweight channel-spatial attention mechanism.
Although GSConv reduces the computational cost by 50% or more compared with SC, its learning ability alone remains limited. To further enhance performance, a single-stage aggregation module based on VoVNet (VoV-GSCSP) was used to replace the original neck modules, with the GS bottleneck, constructed from GSConv units, introduced into the design. The slimmed neck design significantly improves inference efficiency.
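The GSConv operation can be sketched compactly in PyTorch. In the version below, a standard convolution produces half of the output channels, a depthwise convolution refines them, and a channel shuffle interleaves the two halves; the 5 × 5 depthwise kernel and SiLU activations are assumptions based on common GSConv implementations, not details from this paper:

```python
# A minimal sketch of GSConv: SC branch, DSC branch, concatenation, and
# channel shuffle. Kernel sizes and activations are assumed defaults.
import torch
import torch.nn as nn

class GSConv(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 1, s: int = 1):
        super().__init__()
        c_half = c_out // 2
        self.sc = nn.Sequential(  # standard convolution branch
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.dsc = nn.Sequential(  # depthwise convolution branch
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y1 = self.sc(x)
        y2 = self.dsc(y1)
        y = torch.cat([y1, y2], dim=1)
        # Channel shuffle: interleave the SC and DSC channels.
        b, c, h, w = y.shape
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
```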
The IVD network was meticulously designed with an optimized neck and backbone, implementing a targeted design based on the primary YOLOv8 architecture. The whole flowchart of this architecture is presented in Figure 4.

Figure 4. Overall architecture of the improved vegetable detection (IVD) model. The group shuffle convolution (GSConv) units were introduced for slim-neck construction, and VoV-GSCSP modules (single-stage aggregation modules based on VoVNet) were integrated into the You-Only-Look-Once-v8 (YOLOv8) framework. During inference, multiscale feature maps undergo channel compression via GSConv, followed by bilinear upsampling and concatenation to establish cross-resolution connections. These features are further refined through secondary GSConv filtering and final consolidation via VoV-GSCSP fusion gates. In the backbone, computational redundancy is reduced by replacing the conventional bottlenecks in the convolution to fully connected (C2f) modules with Faster-EMA blocks, which apply the efficient multiscale attention (EMA) mechanism to enhance salient spatial-frequency feature extraction.
Experiment Setup
The training and testing platform was the PyTorch v. 1.8.1 deep learning environment (https://pytorch.org; Meta Platforms, Menlo Park, CA, USA) with an NVIDIA GeForce RTX 2080 Ti GPU. Transfer learning is commonly employed to apply knowledge gained from data in related fields to novel yet analogous problems in the present domain (Weiss et al. Reference Weiss, Khoshgoftaar and Wang2016). In this research, the IVD network was pretrained on ImageNet, a large-scale dataset with more than 14 million labeled images (Deng et al. Reference Deng, Dong, Socher, Li, Li and Fei-Fei2009). During training, all layers of the network were fine-tuned on the bok choy detection dataset without freezing any backbone or neck parameters, allowing full adaptation of feature representations to the target domain. The following hyperparameters were used, in accordance with YOLOv8 default settings: a batch size of 16, momentum of 0.937, an initial learning rate of 0.01, stochastic gradient descent (SGD) as the optimizer, a weight decay of 0.0005, and a training duration of 100 epochs.
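Under these settings, training could be launched through the Ultralytics YOLOv8 interface roughly as follows; the model and dataset configuration files named here are hypothetical placeholders:

```python
# A hedged sketch of the training configuration described above, using the
# Ultralytics YOLOv8 training interface.
from ultralytics import YOLO

model = YOLO("ivd-yolov8.yaml")  # hypothetical IVD architecture definition
model.train(
    data="bokchoy.yaml",   # hypothetical dataset config (train/val/test paths)
    epochs=100,
    batch=16,
    optimizer="SGD",
    lr0=0.01,              # initial learning rate
    momentum=0.937,
    weight_decay=0.0005,
)
```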
Evaluation Metrics
Accuracy and efficiency are crucial for real-time applications. This research employed precision, recall, mean average precision (mAP), and giga floating-point operations (GFLOPs) as metrics to evaluate the model’s performance.
The network’s training and testing results were organized into a binary confusion matrix with four outcomes: true positive (TP), false positive (FP), true negative (TN), and false negative (FN) (Baldi et al. Reference Baldi, Brunak, Chauvin, Andersen and Nielsen2000).
Precision represents the ratio of correctly predicted positive instances to the total number of instances predicted as positive by the model (Prati et al. Reference Prati, Batista and Monard2011; Sokolova and Lapalme Reference Sokolova and Lapalme2009). It was calculated as:

$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall represents the proportion of correctly predicted positive instances out of all actual positive instances (Grandini et al. Reference Grandini, Bagli and Visani2020). It was calculated as:

$$\text{Recall} = \frac{TP}{TP + FN}$$
Intersection over union (IoU) measures the ratio of the overlap between the predicted bounding box and the actual bounding box. A higher IoU indicates a more accurate prediction. It was calculated as:

$$\text{IoU} = \frac{\text{area}(B_{p} \cap B_{gt})}{\text{area}(B_{p} \cup B_{gt})}$$

where $B_{p}$ is the predicted bounding box and $B_{gt}$ is the ground-truth bounding box.
While precision and recall represent distinct evaluation criteria, average precision (AP) provides a comprehensive index that considers both metrics (Everingham et al. Reference Everingham, Eslami, Van Gool, Williams, Winn and Zisserman2015). It was calculated as:

$$\text{AP} = \int_{0}^{1} p(R)\,dR$$

where $p(R)$ is the precision-recall curve, with precision plotted on the vertical axis and recall on the horizontal axis. mAP, a commonly used metric in object detection, is the average AP value across all categories. It was calculated as:

$$\text{mAP} = \frac{1}{N}\sum_{i=1}^{N} \text{AP}_{i}$$

where $N$ is the number of categories.
The mAP50 and mAP50-95 values are commonly used as evaluation metrics for detection performance. mAP50 is the mAP value when the IoU threshold is set to 0.5, while mAP50-95 is the mean mAP over IoU thresholds ranging from 0.5 to 0.95. mAP50-95 is the stricter metric, as it averages performance across multiple IoU thresholds.
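The three box-level metrics above translate directly into code; the following Python helpers are a minimal sketch, assuming axis-aligned boxes in (x1, y1, x2, y2) pixel coordinates:

```python
# Minimal implementations of the precision, recall, and IoU definitions above.
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0

def iou(box_a, box_b) -> float:
    # Intersection rectangle of the two boxes
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```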
GFLOPs quantifies the amount of computation a model requires during inference. Smaller GFLOPs values indicate lower computational demands and faster inference. GFLOPs is a standard metric for evaluating the efficiency of YOLO networks.
Image Processing
Both vegetables and weeds are green, while the soil has a distinct color. Once the network detects the vegetables, the remaining pixels in the background are weeds, straw, or soil. Vegetable pixels are removed first, and the remaining green vegetation in the background is identified as weeds.
The excess green (ExG) index (Morid et al. Reference Morid, Borjali and Del Fiol2021), previously explored for weed identification (Jin et al. Reference Jin, Sun, Che, Bagavathiannan, Yu and Chen2022c; Sun et al. Reference Sun, Liu, Wang, Zhai and Yu2024), was optimized in this research to enhance weed segmentation performance. The modified ExG index is defined as:

$$\text{ExG} = 2g - r - b$$
To reduce sensitivity to varying illumination, the modified ExG index uses normalized RGB values:

$$r = \frac{R}{R+G+B}, \quad g = \frac{G}{R+G+B}, \quad b = \frac{B}{R+G+B}$$
The Otsu method (Otsu Reference Otsu1975) was applied to convert grayscale images into binary images. This was followed by area filtering to eliminate random noise in the background. As a result, weeds were effectively segmented from the original images.
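The full segmentation pipeline, from normalized ExG through Otsu thresholding to area filtering, can be sketched with OpenCV as follows; the input file name and the 200-pixel minimum component area are illustrative assumptions, not values from this study:

```python
# A hedged sketch of the segmentation pipeline: normalized ExG, Otsu
# thresholding, and connected-component area filtering.
import cv2
import numpy as np

img = cv2.imread("field.jpg").astype(np.float64)  # hypothetical input image
b, g, r = cv2.split(img)                          # OpenCV loads as BGR
total = r + g + b + 1e-6                          # avoid division by zero
r_n, g_n, b_n = r / total, g / total, b / total
exg = 2 * g_n - r_n - b_n                         # ExG on normalized RGB

gray = cv2.normalize(exg, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Area filtering: drop connected components smaller than a minimum size.
n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
mask = np.zeros_like(binary)
for i in range(1, n):                             # label 0 is the background
    if stats[i, cv2.CC_STAT_AREA] >= 200:         # illustrative threshold
        mask[labels == i] = 255
```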
Weed Mapping
A custom program was developed to divide the original images (1,792 × 1,344 pixels) into 48 equal grid cells measuring 224 × 224 pixels, arranged in 6 rows and 8 columns. Once the positions of the weeds were determined, the corresponding grid cell(s) were labeled as weeding area(s), and a weed map was generated.
For a weeding system equipped with a mechanical weeding machine, each grid cell represents a unit of weeding area, facilitating the integration of weed detection results with field application. In actual applications, the size of each grid cell should be equal to or slightly smaller than the mechanical actuator’s footprint. This configuration ensures that the mechanical actuators are directed only toward grid cells marked as weed infested, thereby achieving precise and efficient weeding.
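The grid division itself reduces to a few lines; the sketch below assumes the binary weed mask produced by the preceding segmentation step and flags any 224 × 224 cell that contains weed pixels:

```python
# Minimal sketch of the grid-based weed mapping: the 1,792 x 1,344 image is
# divided into 224 x 224 cells (6 rows x 8 columns), and any cell containing
# weed pixels in the binary mask is flagged as a weeding area.
import numpy as np

CELL = 224

def weed_map(mask: np.ndarray) -> np.ndarray:
    """mask: binary weed mask (1,344 x 1,792). Returns a 6 x 8 boolean grid."""
    rows, cols = mask.shape[0] // CELL, mask.shape[1] // CELL
    grid = np.zeros((rows, cols), dtype=bool)
    for i in range(rows):
        for j in range(cols):
            cell = mask[i * CELL:(i + 1) * CELL, j * CELL:(j + 1) * CELL]
            grid[i, j] = cell.any()  # flag cells with any weed pixels
    return grid
```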
Path Planning
Once the weed map was constructed, path planning algorithms were designed to guide the mechanical actuators across the grid cells along an optimal route for real-time weeding. The performances of three path planning algorithms were compared and analyzed: the Christofides algorithm (Papadimitriou and Vazirani Reference Papadimitriou and Vazirani1984), the Dijkstra algorithm (Xu et al. Reference Xu, Liu, Huang, Zhang and Luan2007), and dynamic programming (DP) (Bellman Reference Bellman1954).
1. The Christofides algorithm is an approximation algorithm for the traveling salesman problem on a metric space, that is, a graph whose distances are symmetric and satisfy the triangle inequality. It strikes a balance between solution quality and computation time.
2. The Dijkstra algorithm computes the shortest path between two points in a weighted graph, terminating once all reachable points have been visited (a minimal sketch on the weed-map grid follows this list).
3. DP solves multistage decision-making optimization problems by decomposing the overall problem into smaller subproblems and storing intermediate results to reduce computation cost (Bellman Reference Bellman1954).
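As referenced in item 2 above, the following is a hedged sketch of Dijkstra’s algorithm on the weed-map grid, computing the shortest route between two cells under 4-connected moves of unit cost; how successive weed cells are chained into a complete weeding tour is an implementation detail of the study not reproduced here:

```python
# Single-pair Dijkstra on the 6 x 8 weed-map grid; adjacent cells are
# connected with unit edge weight.
import heapq

def dijkstra(grid_shape, start, goal):
    rows, cols = grid_shape
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist.get(cell, float("inf")):
            continue                      # stale queue entry
        i, j = cell
        for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= ni < rows and 0 <= nj < cols:
                nd = d + 1                # unit cost between adjacent cells
                if nd < dist.get((ni, nj), float("inf")):
                    dist[(ni, nj)] = nd
                    prev[(ni, nj)] = cell
                    heapq.heappush(heap, (nd, (ni, nj)))
    # Reconstruct the path from goal back to start.
    path, cur = [goal], goal
    while cur != start:
        cur = prev[cur]
        path.append(cur)
    return path[::-1]

# Example: shortest route between two weed cells on the 6 x 8 grid.
print(dijkstra((6, 8), (0, 0), (5, 7)))
```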
For field application, the mechanical actuators are aligned with the grid cells and follow the optimal path determined by the selected path planning algorithm. To assess the performance of the path planning algorithms, execution time and the length of the planned path (measured in pixels) were analyzed and compared.
Results and Discussion
Vegetable Detection
An ablation experiment was conducted to validate the effectiveness of the IVD network. The C2f-Faster-EMA module, the slim-neck module, and the complete network were evaluated against the baseline YOLOv8 network. The results of the ablation experiment are summarized in Table 1. When only the C2f-Faster-EMA module was implemented at the backbone stage to replace the original C2f module, precision increased by 0.9%, and computational costs were significantly reduced: the number of parameters, GFLOPs, and model size decreased by 23.2%, 19.7%, and 22.2%, respectively. These results demonstrate that the C2f-Faster-EMA module substantially improved computational efficiency. However, there was a slight reduction in the mAP50-95 and recall values, which decreased by 0.5% and 1.9%, respectively. This reduction can be attributed to the simplified feature extraction inherent in the lightweight backbone design. Nevertheless, given the increased precision and substantial efficiency gains, this trade-off remains acceptable for real-time field applications with limited computational resources.
Table 1. Ablation study results evaluating the impact of C2f-Faster-EMA and Slim-neck modules on detection performance and model complexity.a

a mAP50, the mean average precision at an intersection-over-union (IoU) threshold of 0.5; mAP50-95, the mean average precision averaged over IoU thresholds from 0.5 to 0.95; GFLOPs, giga floating-point operations; YOLOv8, You-Only-Look-Once-v8; G, giga; M, megabyte.
The results showed that the slim-neck module, designed to achieve lightweight optimization while enhancing computational performance, also demonstrated reductions in parameters, GFLOPS, and model size. Notably, the mAP50-95 value was maintained, further validating the module’s efficiency. When both the C2f-Faster-EMA and slim-neck modules were integrated into the YOLOv8 network, a well-balanced outcome was achieved. The mAP50 value was preserved, while computational costs were effectively reduced, highlighting the synergy of these modules in improving performance.
Figure 5 illustrates the performance of the IVD model in vegetable detection under complex field conditions, including cluttered backgrounds, dense weed–vegetable overlap, and strong illumination. The model demonstrated accurate localization, high precision, and strong robustness across these challenging scenarios, confirming its suitability for real-world deployment. These qualitative results are complemented by the training performance shown in Figure 6, where the IVD model exhibits a steeper loss curve with faster convergence compared with YOLOv8, indicating more efficient optimization during training.

Figure 5. Detection results of the improved vegetable detection (IVD) model on vegetables under challenging conditions, including complex backgrounds and dense weed–vegetable clusters.

Figure 6. Training loss curve of the improved vegetable detection (IVD) model over 100 epochs. The IVD model exhibits a steeper loss curve with faster convergence compared with You-Only-Look-Once-v8 (YOLOv8), indicating more efficient optimization during training.
To clearly illustrate the processing results at each stage, the original images, along with those processed through DCNN detection, image processing, and weed mapping, are presented in Figure 7 for comparison. The images in the first row represent the original images, while those in the second row display the detection results from the IVD network, with each detected vegetable framed within a bounding box. Pixels within these bounding boxes represent vegetables and were removed, allowing the remaining green vegetation to be identified as weeds. The subsequent step involved segmentation, performed using image processing techniques, including the ExG index and the area filtering algorithm, to isolate weeds from the background. The third row in Figure 7 depicts the binary segmentation images produced in the preprocessing stage, while the fourth row displays the final segmentation results after vegetable removal and area filtering. Weeds within vegetable crops were thus indirectly identified through the integration of DCNNs and image processing methods.

Figure 7. Weed mapping workflow from original images to trajectory planning. The first row shows the original images of vegetable fields. The second row displays the detection results from the improved vegetable detection (IVD) network, with vegetables highlighted by bounding boxes. The third row presents binary segmentation images generated through excess green (ExG)-based vegetation enhancement followed by Otsu thresholding. The fourth row shows the results after vegetable removal and area filtering to isolate true weed regions. The fifth row displays the generated weeding trajectories used to guide precision weed control operations.
Weed Mapping
A precise weed map was established based on the weed detection results. The original images were divided into smaller, equally sized grid cells. Cells containing weeds were marked in red, representing the designated weeding areas, while the remaining grid cells were identified as requiring no weeding. The weed mapping results are displayed in the fifth row of Figure 7. With the weeding regions clearly highlighted, this approach enhances the feasibility of practical weeding applications.
Path Planning
The path planning strategy was executed based on the weed mapping results. Three path planning algorithms were designed and tested for comparison and analysis. The path planning results for the four sample images are shown in Figure 8, while the evaluation metrics for efficiency and effectiveness are presented in Table 2. The blue line in Figure 8 represents the weeding trajectory for a smart machine. The Dijkstra algorithm exhibited a significant advantage in computational efficiency in this experiment. For the four given images, the Dijkstra algorithm consistently produced the shortest path and required the least computation time for the weeding operation. In contrast, the Christofides algorithm performed poorly, with longer computation times and path lengths. Notably, for the third image (Figure 8C), the Christofides algorithm took 13 times longer to compute and required 216 more pixels for the weeding path compared with the Dijkstra algorithm. It is worth noting that DP showed inconsistent performance: while it required less time than the Christofides algorithm for the images in Figure 8A and 8B, it took relatively more time for the images in Figure 8C and 8D. In general, the Dijkstra algorithm performed exceptionally well in terms of both computational efficiency and optimal path planning.

Figure 8. Path planning results for precision weeding based on weed mapping. The blue lines represent the optimized weeding trajectories generated by different path planning algorithms (Christofides, Dijkstra, and dynamic programming [DP]) across four sample images. These results illustrate the application of trajectory optimization for efficient weed control operations.
Table 2. Performance comparison of three path planning algorithms on four sample weed maps, with execution time and shortest path length (in pixels) reported for each algorithm across four images labeled A, B, C, and D

a DP, dynamic programming.
Direct detection of different weed species, morphologies, densities, and growth stages is a challenging task, as it requires labeling a large volume of weed image data, which is both labor-intensive and time-consuming (Yu et al. Reference Yu, Sharpe, Schumann and Boyd2019c). Additionally, collecting and labeling weed datasets is tedious, and such datasets are often nontransferable across different crops. This study proposes an efficient deep learning network based on YOLOv8 trained to detect vegetables instead of weeds. By focusing on vegetables, the approach bypasses the complexities associated with managing diverse weed characteristics.
With rising living standards, there is increasing demand for green, organic vegetables, which are grown without the use of synthetic herbicides (Rahman et al. Reference Rahman, Mele, Lee and Islam2021; Reganold and Wachter Reference Reganold and Wachter2016). In this context, smart mechanical weeding machines equipped with accurate weed detection systems offer an ideal solution for performing weeding tasks in organic vegetable crops. Effective weed detection systems aim to eliminate weeds while avoiding damage to crops. The proposed method achieved this by accurately detecting vegetable crops and excluding them from the weeding process, ensuring precision in weed control.
The YOLO series of deep learning architectures is widely recognized for its efficiency in object detection and adaptability to diverse tasks (Badgujar et al. Reference Badgujar, Poulose and Gan2024). The IVD was developed based on the YOLOv8 architecture, with enhancements such as the C2f-Faster-EMA module in the backbone stage and an improved feature fusion with a slim-neck at the neck stage. Ablation experiment results showed reduced computation costs, with parameters reduced by 0.57 million, model size by 1.1 MB, and GFLOPs by 1.8 compared with the original YOLOv8 network. This optimization makes the network more lightweight while maintaining excellent detection precision, making it highly suitable for real-time weeding applications.
Some bounding boxes generated by the trained network were observed to partially or completely overlap, as illustrated in row 2 of Figure 7. This overlap often occurs when vegetables are closely spaced, potentially reducing the recall value. However, this issue has minimal impact on final weed detection, because the vegetables are accurately identified within the bounding boxes and excluded before weed segmentation through image processing methods.
The attention mechanism is commonly employed to enhance the processing of sequential data (Hassanin et al. Reference Hassanin, Anwar, Radwan, Khan and Mian2024). EMA, a novel and highly efficient attention mechanism, captures both channel and spatial information simultaneously, improving feature representation without increasing computational costs (Marsella and Gratch Reference Marsella and Gratch2009). FasterNet is recognized for its high processing speed, owing to its use of partial convolution to reduce redundant computations and memory access (Chen et al. Reference Chen, Kao, He, Zhuo, Wen, Lee and Chan2023). When EMA and the Faster Block of FasterNet are combined, overall efficiency is significantly boosted. This improvement was clearly demonstrated in the ablation experiment, where only the C2F-Faster-EMA module was integrated.
Extensive research has been conducted on detecting weeds across various crop categories, achieving outstanding detection accuracy and significantly advancing the development of precision agriculture (Peng et al. Reference Peng, Li, Zhou and Shao2022; Wang et al. Reference Wang, Zhang and Wei2019; Yu et al. Reference Yu, Sharpe, Schumann and Boyd2019b). To further utilize the detection results, weed mapping was constructed after detecting vegetables, followed by weed segmentation through image processing. The original images were systematically divided into grid cells, with only those containing weeds marked as weeding areas. The size of the grid cells can be tailored to the operational area of weeding actuators using weed mapping. This adaptability is crucial, as the size of weeding actuators can vary, thereby enhancing the applicability and efficiency of weeding applications.
Path planning algorithms were integrated with weed mapping to guide the mechanical actuators exclusively to the grid cells containing weeds. In this study, three path planning algorithms were evaluated, with the Dijkstra algorithm emerging as the most effective by balancing computational costs with the shortest path length. Interestingly, the performance of the DP algorithm varied across different images in terms of time consumption, likely due to its memory allocation requirements, which warrants further investigation. In contrast, the Christofides algorithm consistently generated longer paths and required more computation time than the other two algorithms. As a heuristic method based on the Hamiltonian circuit, the Christofides algorithm provides an approximate solution that, while not optimal, ensures that the loop length never exceeds 1.5 times the optimal length, even in the worst-case scenario.
In this study, path planning was creatively applied to vegetable weeding, enabling precise machine-guided weed control. These algorithms, based on weed mapping, can also be adapted for other precision weeding applications. For instance, a smart sprayer can be integrated with path planning algorithms to accurately and efficiently apply herbicides only to the grid cells containing weeds. Further investigation is required to assess the feasibility of integrating path planning and weed mapping for weed control in other cropping systems.
This research proposed an innovative system integrating weed detection, weed mapping, and path planning into a unified approach for precise weeding. Weed detection was performed indirectly by first identifying vegetables through the IVD, with the remaining green vegetation classified as weeds. The IVD demonstrated significant improvements in both precision and efficiency, achieving a 0.2% increase in mAP50 while reducing parameters, GFLOPs, and model size compared with the original YOLOv8 network. Weed mapping serves as a bridge between weed detection and precise weeding applications, effectively defining operational areas for targeted weed control. Among the three path planning algorithms evaluated, the Dijkstra algorithm emerged as the most efficient, offering the shortest weeding path with optimal computational efficiency. The proposed method provides a robust solution for precise weeding and introduces a novel approach with significant potential for broader applications in weed management.
Funding statement
This research is supported by the Weifang Science and Technology Development Plan Project (grant no. 2024ZJ1097), the Key R&D Program of Shandong Province, China (ZR202211070163), the Taishan Scholar Program, and the National Natural Science Foundation of China (grant no. 32072498).
Competing interests
The authors declare no conflicts of interest.