Introduction
Echinostoma caproni Richard, 1964 is a digenetic trematode of considerable experimental and epidemiological interest (Fried and Reddy 2000). In nature, E. caproni utilises a range of vertebrate and invertebrate hosts. Its natural definitive hosts include the bird Falco newtoni, while Biomphalaria pfeifferi serves as an intermediate snail host (Munoz-Antoli et al. 2013). Notably, several strains reported as Echinostoma revolutum in early studies have since been reclassified as E. caproni, highlighting the historical difficulty of accurate species identification within this group (Kristensen and Fried 1991).
The mitochondrial genome (mitogenome) provides valuable insights into the evolutionary history of parasitic flatworms owing to its relatively conserved gene content and high mutation rate. A partial mitogenome of E. caproni has been deposited in GenBank (Ec-p; accession No. AP017706); however, it lacks the transfer RNA (tRNA)-Ser2 gene and has an incomplete non-coding region (NCR). In the absence of a corresponding publication, its phylomitogenomic relationships have not been investigated. In the present study, we report the reconstructed and accurately annotated nearly complete mitogenome (Ec-c) of E. caproni.
Materials and methods
Genomic data retrieved from the NCBI database and preprocessing
Whole-genome sequencing (WGS) data of E. caproni were obtained from the ‘50 Helminth Genomes Project’ (International Helminth Genomes Consortium 2019). The Illumina paired-end reads (2 × 100 bp) were retrieved from the publicly available Sequence Read Archive (accession No. ERS055227); each dataset (27.42 GB) had originally been generated on the Illumina HiSeq 2000 platform. After adapter removal and quality trimming (Phred score cutoff of 33) with Trimmomatic v0.39 (Bolger et al. 2014), a total of 24 GB of clean data per paired-end read sample was obtained.
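To make the quality-trimming step concrete, the following is a minimal Python sketch of sliding-window trimming on a FASTQ quality string with the Phred+33 ASCII offset. It is not Trimmomatic's implementation, and the window size, the mean-quality threshold, and the function names are illustrative assumptions rather than the settings used in this study.

```python
def phred33_scores(quality: str) -> list[int]:
    """Decode a FASTQ quality string that uses the Phred+33 ASCII offset."""
    return [ord(ch) - 33 for ch in quality]


def sliding_window_trim(seq: str, quality: str,
                        window: int = 4, min_mean: float = 20.0) -> tuple[str, str]:
    """Cut the read at the start of the first window whose mean quality
    falls below min_mean (both values are hypothetical defaults)."""
    scores = phred33_scores(quality)
    cut = len(seq)  # keep the whole read unless a low-quality window is found
    for start in range(max(len(scores) - window + 1, 0)):
        chunk = scores[start:start + window]
        if sum(chunk) / window < min_mean:
            cut = start
            break
    return seq[:cut], quality[:cut]


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))
```

Reads shorter than the window are returned unchanged in this sketch; a production trimmer would also apply adapter clipping and a minimum-length filter.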
De novo assemblies and reconstruction of the mitogenome
We used the cleaned data to reconstruct the nearly complete mitogenome of E. caproni in three ways. First, the GetOrganelle toolkit v1.7.7.1 (Jin et al. 2020) was employed because a benchmarking report identified it as the best-performing assembler (Mahar et al. 2023). We customised the seed/label sequence databases using Trematoda RefSeq mitogenomes (Taxonomy ID: 6178). The resulting assembly was visualised and manually refined in Bandage v0.8.1 (Wick et al. 2015) to resolve topological ambiguities. Second, MitoZ v3.6 (Meng et al. 2019) was used as a fast, all-in-one pipeline. For parameter selection, SPAdes v4.2.0 (Prjibelski et al. 2020) was specified as the assembler, and a customised clade sequence set built from Platyhelminthes RefSeq mitogenomes (Taxonomy ID: 6157) was supplied as the clade parameter. Third, as suggested by a previous study (Palmisano et al. 2023), de novo mitogenome scaffolds were generated using MitoFinder v1.4.1 (Allio et al. 2020), with a reference set composed of Platyhelminthes RefSeq mitogenomes. The largest mitochondrial scaffold (14,028 bp) from MitoFinder was then used as the seed sequence for an assembly with NOVOPlasty v4.3.5 (Dierckxsens et al. 2017) at a k-mer length of 33. All assembly results were compared, and the final mitogenome assembly was selected from among them.
Functional annotation and visualisation
Mitogenome annotation, including the mitochondrial protein-coding genes (mPCGs) and functional RNAs such as tRNA and ribosomal RNA (rRNA), was performed using MITOS2 (Donath et al. 2019). The open reading frames (ORFs) of the mPCGs were manually confirmed using ORF finder (https://www.ncbi.nlm.nih.gov/orffinder/) with the invertebrate mitochondrial genetic code selected, and were identified by comparison with the reported mitogenomes of the family Echinostomatidae. tRNA genes undetected by MITOS2 were recovered through BLASTn searches against Echinostomatidae sequences in the Rfam v15 database (Ontiveros-Palacios et al. 2025). Repeat units (RUs) were predicted using Tandem Repeats Finder v4.09.1 (Benson 1999) with default parameters. All gene organisation features were visualised using Proksee v6.0.2 (Grant et al. 2023). The Dynamic Genomic Alignment server (DiGAlign) v2.0 (Nishimura et al. 2024) was employed with default parameters to compare synteny and to align genomic elements using BLASTn.
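The manual ORF confirmation described above can be illustrated with a small Python check. This is only a sketch under stated assumptions: the start and stop codon sets follow NCBI translation table 5 (invertebrate mitochondrial), in which TGA encodes tryptophan rather than a stop, and the function name `check_orf` is hypothetical. It does not reproduce ORF finder or MITOS2.

```python
# Codon sets assumed from NCBI translation table 5 (invertebrate mitochondrial).
START_CODONS = {"ATG", "ATA", "ATT", "ATC", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG"}  # TGA is read as Trp in this code, not as a stop


def check_orf(cds: str) -> list[str]:
    """Return the problems found in a candidate CDS; an empty list means
    the frame starts, ends, and reads through as expected."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[0] not in START_CODONS:
        problems.append("no accepted initiation codon at position 1")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no termination codon at the 3' end")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon")
    return problems


if __name__ == "__main__":
    print(check_orf("GTGTTTTGATAG"))  # GTG start, TGA read through, TAG stop
    print(check_orf("ATGTAAGGGTAG"))  # flags the in-frame TAA
```

A check of this kind only flags candidates for inspection; gene boundaries would still need to be confirmed against homologous Echinostomatidae sequences, as done in the study.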
Phylomitogenomic analysis
A total of 12 mPCGs, as well as the individual nad1 and cox1 genes, were retrieved for 13 Echinostomatidae taxa (including E. caproni) from NCBI GenBank (Supplementary Table S1). Amino acid sequences were obtained in FASTA format and concatenated for the mPCGs dataset. Each dataset (concatenated mPCGs, nad1, and cox1) was aligned at the amino acid level using MAFFT (Katoh and Standley 2013) with default parameters. Ambiguously aligned or poorly conserved regions were manually inspected and removed in MEGA11 prior to phylogenetic analysis. Phylogenetic trees were inferred under the maximum likelihood framework in MEGA11 (Tamura et al. 2021). For each dataset, the Le and Gascuel substitution model with a discrete Gamma distribution to account for among-site rate variation (five categories) was applied. The shape parameter (α) was estimated from the data. Initial trees for the heuristic search were generated by applying the Neighbour-Joining and BioNJ algorithms to a matrix of pairwise distances estimated using the JTT model, and the topology with the highest log likelihood was selected. Branch support was assessed by bootstrap analysis with 1,000 pseudoreplicates. Two Schistosoma species served as outgroups, namely, Schistosoma mansoni and S. japonicum.
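Two steps in this workflow, concatenating per-gene alignments and drawing bootstrap pseudoreplicates, can be sketched in a few lines of Python. The sketch assumes each gene alignment is held as a dictionary mapping taxon name to aligned sequence; the function names and the gap-padding of missing genes are illustrative choices, and MEGA11 performs these steps internally in its own way.

```python
import random


def concatenate(alignments: list[dict[str, str]]) -> dict[str, str]:
    """Join per-gene alignments (taxon -> aligned sequence) into one matrix.
    A taxon missing from a gene is padded with gaps for that gene."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    matrix = {taxon: "" for taxon in taxa}
    for aln in alignments:
        gene_length = len(next(iter(aln.values())))
        for taxon in taxa:
            matrix[taxon] += aln.get(taxon, "-" * gene_length)
    return matrix


def bootstrap_replicate(matrix: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement to build one pseudoreplicate."""
    n_sites = len(next(iter(matrix.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in columns) for taxon, seq in matrix.items()}


if __name__ == "__main__":
    gene1 = {"taxonA": "MK", "taxonB": "ML"}                # toy data, not real sequences
    gene2 = {"taxonA": "SW", "taxonB": "SF", "taxonC": "TW"}
    supermatrix = concatenate([gene1, gene2])
    print(supermatrix)
    print(bootstrap_replicate(supermatrix, random.Random(1)))
```

In a bootstrap analysis, a tree would be inferred from each pseudoreplicate, and the support for a branch is the fraction of replicate trees that contain it.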
Results and discussion
Improved assembly of the nearly complete mitogenome of E. caproni
Three E. caproni mitogenome assemblies were reconstructed from the WGS data using the three approaches described in the ‘Materials and methods’ section. The GetOrganelle toolkit (Jin et al. 2020), the MitoZ pipeline (Meng et al. 2019), and the combination of MitoFinder (Allio et al. 2020) and NOVOPlasty (Dierckxsens et al. 2017) produced assemblies of 14,549 bp, 14,771 bp, and 323 bp, respectively (Supplementary Table S2). Among these assemblies, only GetOrganelle created a circularised, long, and well-annotated mitogenome sequence, while the other assemblers produced short and linear mitogenome sequences. The assembled sequence was visually adjusted using Bandage v0.8.1 (Wick et al. 2015), with sequencing depth ranging from 7.5- to 95.4-fold (Supplementary Figure S1). The nearly complete annotated mitogenome sequence was deposited in the Third Party Annotation database of GenBank (accession No. BK071757).
Because the mitogenome was reconstructed from WGS data, a filtering step based on mitogenome seed sequences should precede assembly (Ye et al. 2014). For this reason, GetOrganelle, MitoZ, and NOVOPlasty were applied. Our results are consistent with a benchmarking study of multiple assemblers on the human mitogenome, which also found GetOrganelle to be among the top-performing assemblers (Mahar et al. 2023).
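The idea of seed-based filtering can be illustrated with a simplified Python sketch that keeps only reads sharing at least one k-mer with a seed sequence on either strand. This is a conceptual illustration, not the algorithm implemented in GetOrganelle, MitoZ, or NOVOPlasty; the function names are hypothetical, and the default k of 33 simply mirrors the k-mer length quoted for NOVOPlasty.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit_reads(reads: list[str], seed: str, k: int = 33) -> list[str]:
    """Keep reads that share at least one k-mer with the seed on either strand."""
    seed_kmers = kmers(seed, k) | kmers(reverse_complement(seed), k)
    return [read for read in reads if kmers(read, k) & seed_kmers]


if __name__ == "__main__":
    seed = "ACGTTGCAAGGCTA"  # toy seed, not a real mitochondrial sequence
    reads = ["TTGCAAGG", "AGCCTTGC", "GGGGGGGGGG"]
    # A small k is used here only because the toy sequences are short.
    print(recruit_reads(reads, seed, k=5))
```

Real tools typically iterate this recruitment, extending the bait set with newly recruited reads, so that regions absent from the seed (such as a divergent NCR) can still be recovered.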
Pairwise alignment of the nearly complete mitogenome (Ec-c) with the partial mitogenome (Ec-p) of E. caproni revealed that the two sequences share highly conserved genomic content, with over 99.6% nucleotide identity; however, tRNA-Ser2 and the RUs were absent from the partial mitogenome (Supplementary Figure S2). Unfortunately, the partial mitogenome deposited in GenBank has been mistakenly treated as a complete genome in several reported cases, for example as a reference for genome and tRNA annotation (Li et al. 2019), in phylogenomic inference (Qian et al. 2018), and in primer design (Ran et al. 2020). Therefore, the nearly complete mitogenome reported in this study is expected to serve as an important and reliable source of mitogenome data.
General characterisation of the nearly complete mitogenome
The two E. caproni mitogenomes, Ec-c and Ec-p, are displayed as circular maps showing 36 and 35 genes, respectively (Figure 1). The nearly complete mitogenome (Ec-c) includes 12 mPCGs (nad1–6, nad4L, cox1–3, cytb, atp6), 2 rRNA genes (rrnL, rrnS), and 22 tRNA genes, which is consistent with the mitogenomes of other Echinostoma species (Liu et al. 2016; Pham et al. 2024) (Table 1). In particular, Ec-c includes tRNA-Ser2 and 4 RUs. The total length of the mPCGs was 10,128 bp, or 69.61% of the entire mitogenome, compared with 10,143 bp (71.68%) in the partial mitogenome (Ec-p). Among the mPCGs, nad5 (1,566 bp) was the longest and nad4L (273 bp) the shortest. ATG and TAG were the most prevalent initiation and termination codons, respectively. cox1, nad1, and nad5 use the GTG initiation codon, as in other Echinostoma species (Fu et al. 2019; Li et al. 2019; Liu et al. 2016; Ran et al. 2020). The tRNA genes of Ec-c total 1,437 bp and range in length from 58 nucleotides (trnR) to 73 nucleotides (trnS2). In contrast, the tRNA genes of the partial mitogenome (Ec-p) total 1,358 bp; Ec-p lacks trnS2, and its shortest tRNA gene is trnE (55 nucleotides). The NCR (1,256 bp) of Ec-c contains a 715-bp segment composed of four 166-bp RUs and an additional 52-bp short repeat sequence, whereas the partial mitogenome (Ec-p) has a shorter NCR (687 bp) lacking any RUs (Table 1 and Supplementary Figure S3).
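The coding proportions quoted above can be checked with simple arithmetic. The Python sketch below assumes that the final Ec-c length equals the 14,549-bp GetOrganelle assembly reported earlier; the Ec-p length is not stated in this section, so the second function only computes the length implied by the reported percentage. Both function names are hypothetical.

```python
def percent_of_genome(feature_bp: int, genome_bp: int) -> float:
    """Share of the genome occupied by a feature, in percent (2 decimals)."""
    return round(100 * feature_bp / genome_bp, 2)


def implied_length(feature_bp: int, percent: float) -> int:
    """Genome length implied by a feature length and its reported share."""
    return round(feature_bp / (percent / 100))


if __name__ == "__main__":
    ec_c_length = 14_549  # assumption: length of the GetOrganelle assembly
    print(percent_of_genome(10_128, ec_c_length))  # 69.61, matching the text
    print(implied_length(10_143, 71.68))           # length implied for Ec-p
```

The first value reproduces the 69.61% reported for Ec-c, which supports the assumption that the deposited sequence is 14,549 bp long.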

Figure 1. Circular maps of the Echinostoma caproni mitogenomes (Egyptian isolate), showing annotated features of the nearly complete mitogenome (Ec-c) and the partial mitogenome (Ec-p). CDS, coding sequence; GC, guanine-cytosine; rRNA, ribosomal RNA; RU, repeat unit; tRNA, transfer RNA.
Table 1. Gene content, sequence length, and initiation/stop codons of the Echinostoma caproni mitogenomes: the partial mitogenome (Ec-p) and the nearly complete mitogenome (Ec-c)

a The partial mitogenome (Ec-p) starts at 423 bp, while the nearly complete mitogenome (Ec-c) begins at 1 bp.
b n.a., not available.
Comparative and phylogenetic mitogenome analyses
The 12 mPCGs of the mitogenomes of representative species in the family Echinostomatidae were compared at the nucleotide level (Supplementary Table S3). The sequence differences between the nearly complete mitogenome (Ec-c) of E. caproni and the other nine Echinostoma species ranged from 16.36% (E. miyagawai, Th) to 38.47% (E. hortense, Ch). Interestingly, E. hortense exhibited the greatest divergence across all individual genes. Large sequence differences were detected in cox3 (50.8%), cox2 (48.2%), nad2 (47.73%), and nad6 (47.61%). Although cox3 was identical in length between the Ec-c and Ec-p mitogenomes, a nucleotide sequence divergence of 0.16% was observed. The cox3 amino acid sequences of most Echinostoma species begin with ‘MS’, whereas that of the partial mitogenome (Ec-p) starts with ‘MI’. Phylogenetic relationships among Echinostoma species were well resolved with high nodal support (Supplementary Figure S4). The E. caproni mitogenomes are closely related to three E. miyagawai strains, while E. revolutum is quite distant. Phylogenetic topologies were almost the same across the concatenated 12 mPCGs, cox1, and nad1 datasets.
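Percentage differences of the kind reported above can be computed from aligned sequences as an uncorrected pairwise distance. The text does not state which distance measure was used, so the Python sketch below is an assumption: it counts mismatches over aligned positions and skips positions with a gap or an ambiguous base (N) in either sequence. The function name is hypothetical.

```python
def percent_difference(a: str, b: str) -> float:
    """Uncorrected pairwise difference (in percent) between two aligned
    sequences, ignoring positions with a gap or N in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100 * mismatches / compared


if __name__ == "__main__":
    # Toy aligned fragments, not real cox3 sequences.
    print(percent_difference("ATGAGT", "ATGATT"))
    print(percent_difference("AT-GA", "ATCGT"))
```

Applied gene by gene to the aligned mPCGs, a function like this would yield a table comparable in form to Supplementary Table S3, although the exact values depend on the alignment and on how gaps are treated.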
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0022149X25100886.
Data availability
The annotated mitogenome sequence presented in this study was deposited in the Third Party Annotation database of GenBank (accession No. BK071757).
Author contribution
KC and WGY designed the study and experiments and drafted the manuscript. KC analysed the data and performed data curation. MK and WGY reviewed the manuscript. WGY contributed to funding acquisition, supervision, and final approval of the manuscript.
Financial support
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (Ministry of Science and ICT) (grant numbers: RS-2022-NR070067 and RS-2024-00509361) (http://www.nrf.re.kr) and was partially supported by the Research Institute for Veterinary Science, Seoul National University. Additional support was provided by the New Faculty Startup Fund from Seoul National University.
Competing interests
The authors declare no competing interests.
Ethical standard
Not applicable.