Introduction

Tephritid fruit flies pose a major threat to agriculture worldwide. Around ten per cent of the 4000 species composing the Tephritidae family are considered a pest, including members of the genera Anastrepha, Bactrocera, Ceratitis, Rhagoletis and Zeugodacus [1]. To control their spread in agricultural areas, the sterile insect technique (SIT) has been the most extensively employed method and has successfully reduced crop production losses [2,3,4,5]. SIT involves the production and systematic area-wide release of sterile males. However, a significant concern in the mass production of males and their competitive ability is the presence of insect pathogens. These pathogens can lead to outbreaks in reared colonies, or can produce covert infections that have detrimental effect on the released males [6].

Insect specific viruses (ISVs) specifically infect insect hosts, mainly causing covert infections with negligible behavioural and physiological effects [7]. However, in some cases, they can result in overt infections that lead to the death of the insects [8]. Over the past decade, the increasing availability of high throughput RNA sequencing data has reshaped our understanding of the RNA virome in a range of species across kingdoms. This wealth of data has also unveiled numerous ISVs that infect true fruit fly species. In particular, two RNA viruses have been identified in Zeugodacus species, 36 in seven Bactrocera species and 13 in the Mediterranean fruit fly, Ceratitis capitata [9,10,11,12] (Table S1). These viruses can represent a threat to mass rearing facilities, but, more importantly, they can influence their host’s ecology by affecting fitness and physiology, as well the immune status [13, 14].

In insects, the cornerstone of viral immunity is represented by the small interfering RNA (siRNA) pathway [15]. However, in Aedes mosquitos, another class of small RNAs called PIWI-interacting RNAs (piRNAs) has been discovered to interfere with RNA virus replication [16,17,18]. This phenomenon has not been observed in D. melanogaster, where the piRNA pathway does not have clear antiviral role [19]. The extent of this mechanism in other dipteran species, such as true fruit flies, remains unknown. The piRNAs function by recognizing and degrading RNAs through nucleotide complementarity. The first described and primary role of piRNAs is to repress transposable elements (TEs) in germline cells [20, 21]. Within this system, some genomic island called piRNA clusters are enriched in TE sequences and undergo modulation following the mobilization of novel TEs, thereby serving as a heritable immune memory against TEs [20]. However, it has been recently described that during viral infection, fragments of viral sequences can integrate into the insect genomes creating endogenous viral elements (EVEs). The presence of EVEs has been reported for different insect genomes [22,23,24,25] and in Aedes mosquitoes, some of these EVEs produce piRNAs that reduce virus replication [18]. Viral integration is a process necessary to complete the replication of retroviruses [26]. However, integration of fragments from viral genomes can also occur for non-retroviral RNA viruses, which lack an RNA-dependant RNA polymerase in their genomes and do not go through a DNA stage during their cycle [22, 24, 25]. This step may be facilitated by the activity of a host RNA-dependent RNA polymerase [17]. If viral integration happens in the germline, the resulting non-retroviral EVE (nrEVE) will be vertically transmitted to the progeny remaining in the genome as a marker of past infections that occurred even millions of years ago [27].

This study focuses on the identification of nrEVEs within the large dipteran family of Tephritidae. Specifically, we analysed ten true fruit fly species that are of high economic interest. These species were selected as a comparative framework to gain insights into their nrEVE landscapes. Our results show that each of these species contains nrEVEs ranging from 1 to 8 in most species, except for Eutreta diana containing 22 nrEVEs. The majority of these nrEVEs derive from viruses of the Rhabdoviridae family, originate from viral regions that encoded for structural proteins and show low sequence similarity with the genomes of currently known circulating RNA viruses. Using RNA-seq data, we further analysed the transcriptional patterns of nrEVEs in a selected number of species to identify potential transcripts. Lastly, we investigated whether C. capitata nrEVEs produce piRNAs, to unravel the potential antiviral role of nrEVEs in these species.

Methods

Viral Protein Database and Tephritid Fruit Fly Genomes

To identify non-retroviral endogenous viral elements (nrEVEs) in the genomes of tephritid fruit flies, a viral database was created including 4787 protein sequences from viruses classified within the Riboviria realm, and infecting invertebrates (NCBI Virus, accessed in September 2021). In addition, protein sequences from RNA viruses recently described in arthropods were included in the database to increase the chances of finding divergent viral sequences [10, 28,29,30].

Tephritid fruit fly species were selected based on the availability of a reference genome, and their relevance as current or potential agricultural pest. Following these criteria, we investigated the presence of nrEVEs in the reference genomes of Ceratitis capitata, Eutreta diana, Ragholetis zephyria, Tephritis californica, Trupanea jonesi, and five species of Bactrocera: Bactrocera (Zeugodacus) cucurbitae, Bactrocera dorsalis, Bactrocera latifrons, Bactrocera oleae, and Bactrocera tryoni. Accession numbers and quality parameters of the reference genomes are displayed in Table S2.

Characterization of nrEVEs Repertoire

For the nrEVEs identification, each reference genome was annotated separately using a blastx search against the above-mentioned viral protein database. The E-value cut-off for blastx identification was set to 10-6, with the rest of the settings left as default. To verify the viral origin of the putative nrEVEs, each sequence was mapped back to the NCBI non redundant (nr) protein database using blastx search with an E-value cut-off of 10-4 [24]. nrEVEs sequences with high similarity to host insect proteins, retroviruses or transposable elements were discarded. For those putative nrEVEs mapping to viral sequences, the top viral hit was considered for taxonomic classification. nrEVEs names include a four letters abbreviation of the host name, the root of the viral family originating the nrEVE, and a number to differentiate the nrEVEs with the same viral origin present in each fruit fly species.

Nucleotide Similarity Between nrEVEs and Circulating Viruses

Blastn was used to identify the similarities between nrEVEs and RNA viruses infecting tephritid fruit flies at the nucleotide level. Fifty genomes of tephritid fruit fly RNA viruses were recovered from the NCBI nucleotide repository and recent publications [9,10,11,12] to create the database for the analysis (Table S1). The cut-off for blastn identification was set to 50% query cover and 50% nucleotide identities, with the remaining algorithm parameters maintained as default.

nrEVEs Distribution Along Viral Genomes

To identify whether some viral genomic regions are more prone to generate nrEVEs, nrEVEs were individually aligned to the genome of a reference virus, which was specific for each viral family. Blastx against the nr protein sequence database filtered by viruses (taxid: 10239) was employed for the identification of the reference viral genome, and ClustalW (v2.0) for the multiple alignment. The results of the alignments were visualized using BioEdit (Hall, T.A. et al., 1999) to assess the start and end positions of the nrEVEs. The distribution analysis was only performed for those viral families represented by more than ten nrEVEs: Partitiviridae and Rhabdoviridae.

Assessment of nrEVEs Transcription

First, the availability of RNA-seq data in the Sequence Read Archive (SRA) was determined for each of the tephritid fruit fly species. Four different RNA-seq experiments were selected for each of the six species with available datasets. Selection was based on sequencing technology (Illumina) and library construction strategy (paired-end reads). When possible, samples from different developmental stages (eggs, larvae, pupae, adult) were included. SRA accession numbers and characteristics are listed in the Table S3. The quality of the sequenced reads was analysed using fastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Raw datasets were then trimmed with Trimmomatic [31] to remove adaptors and low-quality sequences. The remaining reads were de novo assembled using Trinity-v2.9.0 [32]. The presence of nrEVEs in the assembled contigs was determined using blastn with default parameters and an E-value cut-off of 10-4.

Secondly, the 53 C. capitata RNA-seq datasets available at NCBI were investigated in silico for the presence of nrEVEs (Table S4). SRA reads were trimmed with Trimmomatic [31] and mapped against the nrEVEs described in C. capitata reference genome using Bowtie 2 v 2.3.5.1 [33]. nrEVEs transcription levels were determined using RSEM v 1.3.1 [34] with default parameters. The relative abundance of each nrEVE was calculated relative to the endogenous L23a gene of C. capitata and represented using the heatmap.2 function from the gplots v 3.1.1 package in R [35].

Small RNA Sequencing

Small RNAs were sequenced from ovaries and somatic tissues from two C. capitata strains (Control and Wild-F4). The control strain was established in 2001 using wild flies captured at experimental fields located in Moncada (Valencia, Spain), and has been reared under laboratory conditions for more than 100 generations. The Wild-F4 strain (W) derives from C. capitata pupae collected on infested figs fruits (Ficus carica) from commercial citrus orchards located in Alcira (Valencia, Spain) in August 2020. Differently from the control strain, Wild-F4 strain has been reared in laboratory conditions for only four generations. In both cases, laboratory conditions were 26 °C, 40–60 % humidity, and 14/10h light/dark cycles (Arouri et al., 2015).

Virgin adult females from each population were isolated after emergence and reared separately from the adult males with water and food provided ad libitum. Eight days after emergence virgin females were collected and dissected using entomological tweezers and pins to isolate the ovaries whereas the thorax and the head were used as somatic tissues. Immediately after dissection, the samples were preserved in RNA later (Sigma Aldrich R091, St. Louis, MO, USA). RNA extraction was performed with the TriPure isolation reagent (cat. No. 11667157001; Roche, Mannheim, Germany) using pools of ten heads/thoraxes and 20 ovaries. RNA integrity and quantity were measured using 1% agarose gel electrophoresis and spectrophotometry.

Small RNA library construction and sequencing were performed by Macrogen (Seoul, Korea). Libraries were obtained using the TruSeq Small RNA Library Prep Kit (Illumina, San Diego, CA, USA) and sequencing was run in the HiSeqX platform (Illumina). Paired end reads of 150 nt and 1 Gb of raw data were generated from each of the four libraries under analysis (NCBI; SAMN21882268, SAMN21882269, SAMN21882270, and SAMN21882271).

Characterization of nrEVE-Derived Small RNAs

Small RNA-Seq reads were bioinformatically treated to eliminate adapters, empty sequences, low complexity sequences, and reads with more than 20% low-quality [36]. After cleaning, reads were filtered by length using custom Perl scripts, and only the sequences between 18 and 32 nt were maintained. nrEVEs-derived sRNA sequences were identified by mapping the clean reads against the sequences of the nrEVEs described in medfly using Bowtie 2 v 2.3.5.1 [33]. Identical reads were merged to calculate the percentage of nrEVE-derived sRNAs from each sRNA library mapping to each nrEVE. The mapping position of the sRNA reads along the nrEVEs was visualized using an Integrative Genomics Viewer [37] and the length distribution, base composition, and strand distribution of the sRNAs were analysed using a custom Python script described by Lewis et al., 2018 (accessible on GitHub: https://github.com/SamuelHLewis/sRNAplot) [38].

Results

The Genomes of Tephritid Fruit Flies Harbor nrEVEs

A total of 64 nrEVEs were identified and distributed among the species as follows: Bactrocera (Zeugodacus) cucurbitae (n=1), Bactrocera dorsalis (n=3), Bactrocera latifrons (n=4), Bactrocera oleae (n=8), Bactrocera tryoni (n=8), Ceratitis capitata (n=4), Eutreta diana (n=22), Rhagoletis zephyria (n=8), Tephritis californica (n=5), and Trupanea jonesi (n=1) (Fig. 1). The number of annotated nrEVEs varied among the selected genomes, with E. diana having the highest number of nrEVEs (n=22), while some species had a single nrEVE (T. jonesi and B. cucurbitae) (Fig. 1). In some cases, nrEVEs were found to cluster together within the same genomic region (Table 1). For example, two chromosomes of B. oleae contained nrEVEs that were less than 5000 bp apart. Similarly, two scaffolds of R. zephrya contained two nrEVEs located within 1000 bp of each other. In the case of E. diana, three scaffolds contained nrEVEs that were less than 200 bp apart, with EVE_EdiaRhabdo_9 and EVE_EdiaRhabdo_10 being separated by only 60 bp. Despite their proximity, we considered these nrEVEs to be distinct entities based on their distribution across the sequence of their representative virus (Fig. 2D). Moreover, due to the high fragmentation of some reference genomes, it is possible that the nrEVEs clusters are more numerous than reported here.

Fig. 1
figure 1

Classification of nrEVEs from the reference genomes of ten tephritid fruit flies. The nrEVEs were assigned to a viral family according to the top hit of blastx searches against the non-redundant database on NCBI. Viral families are shown using different colours, as specified in the legend. A Bar graph shows the percentage of nrEVEs assigned to each viral family for each tephritid fruit fly. The total number (n) of nrEVEs detected in each tephritid fruit fly species is shown on the right side of the bar graph. B The pie chart shows the proportion of the nrEVEs described in tephritid fruit flies which derive from each viral family. The total number of nrEVEs (n) is indicated

Table 1 Characterization of the 64 nrEVEs retrieved in the genomes of tephritid fruit flies
Fig. 2
figure 2

Functions encode in the viral regions originating the nrEVEs. A Pie chart of the viral functions assigned to the tephritid nrEVEs. B Distribution of the viral functions within each viral family. The total number of nrEVEs per viral family is displayed on the right of the bar graph. C and D nrEVEs distribution across a representative viral genome. The genomic structure of a representative member of Partitiviridae (C) and Rhabdoviridae (D) is shown in the upper panel. Boxes represent the viral ORFs, and viral functions are indicated in capital letters: N (capsid), G (Glycoprotein), L (RNA-dependent RNA polymerase), P (uncharacterized). Best blastx hits shared by different nrEVEs are shown. nrEVEs are displayed according to their length and mapping position on the representative viral genome

The nrEVEs Have Diverse Viral Origins and Are Not Related to Circulating Viruses

The 64 nrEVEs identified in tephritid fruit flies derived from 11 different viral families, excluding five nrEVEs which were unclassifiable beyond the Riboviria realm (n=5; 8%) (Fig. 1). The majority of nrEVEs derived from negative ssRNA viruses (n=37; 58%) from the following families: Rhabdoviridae (n=21), Phasmaviridae (n=6), Orthomyxoviridae (n=3), Phenuiviridae (n=4), and Xinmoviridae (n=3) (Fig. 1). Among these, the Rhabdoviridae–derived sequences were the most abundant (n=21; 33%), and they were identified in eight out of the ten tephritid fruit fly species (Fig. 1). On the other hand, nrEVEs derived from positive ssRNA viruses were less common (n=8; 13%), and they were distributed across four families: Virgaviridae (n=4), Tombusviridae (n=2), Narnaviridae (n=1), and Tymoviridae (n=1) (Fig. 1). Finally, dsRNA viruses contributed to a total of 14 nrEVEs (22%), originating from two different families: Partitiviridae (n=11) and Totiviridae (n=3). Similar to Rhabdoviridae-derived nrEVEs, albeit in lesser abundance, Partitiviridae-derived nrEVEs were identified in eight out of ten tephritid fruit fly species (Fig. 1).

When comparing the nrEVEs with the 51 RNA viruses infecting tephritid fruit flies and described so far in the existing literature (Table S1), we found that only five nrEVEs exhibited certain similarity at the nucleotide level with currently known circulating viruses. However, the percentage of nucleotide identity in all cases was lower than 80% (Table 2). Furthermore, we observed that none of the tephritid species that hosted the queried nrEVEs matched with the known hosts of the best matching circulating virus obtained through blastn, except for nrEVE_CcapRhabdo_1 and its host Ceratitis capitata sigmavirus (CcaSV) infecting C. capitata. Nevertheless, the nucleotide similarity between them was only 66%.

Table 2 Nucleotide sequence similarity between nrEVEs and circulating viruses. Only results with more than 50% query cover (QC) and nucleotide identities (Identity) are included

Most of the nrEVEs found in the genomes of tephritid fruit flies originated from viral regions that encoded structural viral proteins (n=39; 61%), specifically the capsid protein (n=32; 50%) and the glycoprotein (n=7; 11%). nrEVEs derived from genes encoding non-structural proteins were three times less abundant (n=14; 22%), and they were mainly derived from the viral gene encoding the RNA-dependent RNA polymerase (n=13; 20%) (Table 1 and Fig. 2A).

Several nrEVEs distributed across the genomes of various fruit flies exhibited overlapping viral open reading frame (ORF) and shared the same best blastx hit (Fig. 2). For instance, seven Partitiviridae-derived nrEVEs distributed across five species had the Vera Virus as best blast hit and originated from the ORF that encodes the nucleoprotein (Fig. 2C). Similarly, five overlapping Rhabdoviridae-derived nrEVEs generated from the ORF encoding the nucleoprotein had the Shayang fly virus 2 as best blast hit and were present in B. tryoni, B. dorsalis, T. californica and R. zephyra (Fig. 2C). These findings raise the possibility that these overlapping nrEVEs are orthologues originated from a viral insertion that occurred in the ancestor of the existing lineages. However, it is important to note that the nucleotide identity of the overlapping regions is relatively limited (Figure S1). On the other hand, the majority of nrEVEs we identified in our study did neither shared the same best blastx hit nor originated from the same the viral region. This suggests that they are independent insertions that occurred in the existing lineages.

Additionally, we identified potentially paralog nrEVEs that may have originated by duplication events that occurred within the host. For example, in the genome of B. oleae, we retrieved three nrEVEs located in the same contig and derived from the transcript encoding the glycoprotein of Phasmaviridae. Of them, the longest nrEVE_Bole_Phasma_1 shared close to 100% homology at the nucleotide level with nrEVE_Bole_Phasma_2 in its 5’region, and with nrEVE_Bole_Phasma_3 in the 3’region (Figure S1).

Some nrEVEs Can Be Transcribed and Processed Through the piRNA Pathway

To assess the transcriptional activity of nrEVEs, we explored the presence of transcripts derived from the nrEVEs in publicly available transcriptomic datasets. Six out of the ten species under analysis had multiple public datasets: B. cucurbitae, B. dorsalis, B. latifrons, B. oleae, B. tryoni and C. capitata (for a total of 28 nrEVEs). From these, we randomly selected four datasets per species (Table S3). Our results revealed that ten out of the 28 nrEVEs analysed were actively transcribed (Table 1). They derived from Rhabdoviridae (n=6), Partitiviridae (n=2), Phenuiviridae (n=1) and Virgaviridae (n=1) families, and mainly represented structural viral proteins as the capsid protein (7/10) and the glycoprotein (1/10).

To determine whether the transcribed nrEVEs could encode for viral proteins, we investigated the presence of open reading frames (ORFs) within their sequences. Among the transcribed nrEVEs, nrEVE_CcapRhabdo_1, nrEVE_BdorRhabdo_1 and nrEVE_BdorRhabdo_2 contained ORFs of 117, 255 and 305 amino acids in length, respectively (Table 1). According to blastx alignments, these amino acidic sequences represent only a partial fragment of structural viral proteins.

Next, we focused on the four nrEVEs identified in C. capitata, a world-wide distributed and devastating agricultural pest with ample genomic resources available. First, the presence of the four medfly nrEVEs was confirmed in ten genomic datasets originating from six diverse medfly strains distributed worldwide (Table S5). To expand our analysis of transcriptional patterns, we included data from 53 medfly transcriptomes obtained from NCBI (Table S4). The results confirmed that nrEVE_CcapRhabdo_1 and nrEVE_CcapPartiti_1 were consistently transcribed. In contrast, nrEVE_CcapPhenui_1 transcription varied between samples (36/53), and transcripts derived from nrEVE_CcapRhabdo_2 were barely detected (7/53) (Figure S2).

Given that the production of piRNAs by nrEVEs in mosquito ovaries has been suggested to play a protective role against viral infections and decrease transovaric transmission of viruses [18], we aimed to determine whether C. capitata nrEVEs generate piRNAs. For this purpose, we sequenced small RNAs from ovaries and somatic tissues of two C. capitata strains (Control and Wild-F4). The results revealed that a fraction of the total sRNAs (< 0.5% of total sRNAs) mapped to two out of the four nrEVEs, namely nrEVE_CcapRhabdo_1 and nrEVE_CcapRhabdo_2 (Fig. 3). The mapped sRNAs exhibited a distinctive peak between 25 and 30 nt in length and were produced by only one of the strands of the nrEVE. Additionally, nrEVE-derived sRNAs had a bias of uracil as first nucleotide (Fig. 3), which is a common characteristic of piRNAs. Altogether, these observations strongly support the notion that the sRNAs mapping to the nrEVEs were indeed piRNAs.

Fig. 3
figure 3

sRNA profile of nrEVE_CcapRhabdo_1 and nrEVE_CcapRhabdo_2 in somatic tissues and ovaries of the control and wild-F4 medfly strains. sRNA reads were filtered by length (18 to 32 nt). Positive strand reads are shown above the solid horizontal line while negative-strand reads are shown below. The sRNA counts are shown in the vertical axis. Percentages indicate the number of unique reads mapping to each EVE sequence compared to the total unique sRNA reads obtained for each sRNA library. Colours indicate the first nucleotide of the sRNA read (red, U; green, A; blue, C; yellow, G)

piRNAs produced by nrEVE_CcapRhabdo_1 were detected in both C. capitata strains, with higher abundance in ovaries than somatic tissues, especially in the control strain. In this strain, the ovaries were also responsible for the production of few piRNAs that mapped against nrEVE_CcapRhabdo_2, which were not detected in Wild-F4 strain (Fig. 3).

We then mapped the in silico identified piRNAs back to the nrEVEs from which they originated, and no specific hotspots were found (Figure S3). Moreover, we mapped the piRNAs originated from nrEVE_CcapRhabdo_1 against Ceratitis capitata sigmavirus (CcaSV), which is the most similar known circulating virus to the nrEVE. Only three piRNAs exhibited sequence similarity to CcaSV, although it was limited to 22 or 24 identities in a 29 nt sequence, or 23 identities in a 28 nt sequence, respectively. These mismatches were further identified in the regions between nucleotides 2-8 and 14-22 of the piRNAs, which have been previously described as the essential seed for the proper alignment between the piRNAs and their targets [39]. Moreover, the abundance of these piRNAs in the ovaries of control strain was only between three and nine copies.

Discussion

In this study, we characterized the repertoire of nrEVEs integrated in the genomes of ten tephritid fruit flies with high impact on crop production. We retrieved an average of five nrEVEs in their genomes (range from 1 to 8), except for E. diana which presented a total of 22 nrEVEs. The high number of nrEVEs in E. diana did not correlate with the size of its reference genome, which was the second smallest among the species analysed (Table S2). Overall, the number of nrEVEs characterized in the genomes of tephritid fruit flies aligns with the number of nrEVEs reported in other dipteran families, with the notable exception of Aedes mosquitos. For instance, previous studies have found zero or one nrEVEs in the genome of the insect model Drosophila melanogaster [23, 40]. Other examples of the limited number of nrEVEs within the dipteran order are found for the biting midge Culicoides sonorensis, in which four nrEVEs were identified; the housefly Musca domestica, whose genome contains seven nrEVEs; and the sand fly Phlebotomus papatasi, which possesses a single nrEVE [23, 40]. In mosquitos, big differences have been revealed in the number of nrEVEs among Culicidae and Anopheles mosquitos, with an average of one and three nrEVEs per species; and Aedes mosquitos, in which more than 100 nrEVEs were initially identified [22, 41]. Subsequent studies have expanded the nrEVEs repertoire in Aedes mosquitoes to up to 200 [23,24,25, 42], reflecting that the quality of the reference genome assemblies, the selection of the viral query, and the filtering parameters are key for the identification of nrEVEs. In this vein, it is possible that the number of the nrEVEs retrieved in tephritid fruit flies will increase in the future as genome assembly quality improves and new viral species infecting insects will be discovered [29, 30, 28].

In concordance with previous studies, our findings support the notion that nrEVEs are predominantly derived from insertions of negative ssRNA viruses [23, 40]. Among these, Rhabdoviridae-derived nrEVEs were the most abundant, as previously observed in mosquitos [22]. Interestingly, this pattern extends beyond the phylum Arthropoda, since negative ssRNA viruses from the Bornaviridae and Filoviridae families represented the 72% of the nrEVEs discovered in the genomes of thirteen Australian marsupial species [43]. Our study also confirms the prevalence of structural gene-coding regions in nrEVEs, as previously reported in other insect species [22, 23]. In particular, 61% of the nrEVEs retrieved in tephritid fruit flies were derived from viral nucleocapsid and glycoprotein genes, suggesting that the transcripts encoding these proteins are more prone to insertion than other viral transcripts. Notably, this observation cannot be solely attributed to the expression levels of viral mRNAs since nucleoprotein genes are typically located at 5’end and transcription initiation occurs at the 3’end in negative single-stranded RNA (ssRNA) [44]. Instead, it is tempting to hypothesize that nucleocapsid and glycoprotein sequences were selectively maintained in true fruit fly genomes because they cause a benefit for the host fitness, although more research need to be done to test this hypothesis.

nrEVEs reflect the long-term and intimate relationships of viruses with their hosts, and their identification can shed light into past and present host distribution of viral genera and species. Our findings suggest two distinct scenarios for the origin of nrEVEs. Some nrEVEs may be orthologs, indicating they originated from viral insertions that occurred in the common ancestor of certain host lineages. On the contrary, other nrEVEs likely derive from integrations events occurred after speciation. To illustrate, we can date the origin of certain nrEVEs to the split of two species. For example, B. dorsalis, B. tryoni and B. latifrons diverged approximately 1.5 million of year ago [45], indicating that the non-ortholog nrEVEs identified in these three species represent relatively recent integrations. Interestingly, potential orthologous nrEVEs derived only from Partitiviridae and Rhabdoviridae families while known viruses capable of infecting multiple true fruit fly species mainly belong to the Dicistroviridae and Iflaviridae families. This lack of overlap between known viruses and nrEVEs points to a limited role of nrEVEs in providing an antiviral function. In line with this, we found no connections between known currently circulating viruses infecting a single tephritid species and the nrEVEs harboured in the correspondent genome. The only observed match was between the nrEVE_CcapRhabdo_1 and the Ceratitis capitata sigmavirus 1, which shared a 66% nucleotide-level identity. All these findings support the hypothesis that the nrEVEs present in true fruit fly genomes are likely viral remnants of ancient viruses that once infected tephritids but do not confer current protection against viral infection. We suggest that viruses responsible for the existing nrEVEs have undergone significant mutations over time or even disappeared entirely, leaving behind traces in the form of the nrEVEs. However, we cannot rule out the possibility that additional viruses currently infecting tephritid fruit flies remain undiscovered.

Numerous studies have highlighted the important roles that nrEVEs can play in host antiviral immunity, both in vertebrates and invertebrates [18, 46,47,48]. In tephritid, many of identified nrEVEs are not transcribed. However, a subset of them undergoes active transcription within tissues, resulting in the production of transcripts that may represent either mRNAs or long non-coding RNAs. In the case of C. capitata and Bactrocera species, ten out of 28 nrEVEs were found to be actively transcribed. Among these, three contained a putative ORF and are therefore presumed to encode a fragment of viral mRNAs with structural functions. According to previous studies, the synthesis of viral proteins mediated by the nrEVEs may have a role on nrEVE-derived immunity. For instance, it has been hypothesized that nrEVE-derived viral proteins may act as antibodies that maintain an immune memory against circulating viruses in mammals [47]. On the other hand, nrEVEs may participate on the immune response by producing regulatory noncoding RNAs. Small RNAs produced by nrEVEs have been shown to limit the cognate virus replication in Aedes mosquitoes [18]. Additionally, a recombinant Sindbis virus modified to contain a sequence complementary to an A. aegypti nrEVE failed to replicate in Aag2 cells [17]. Our results indicate that two C. capitata nrEVEs may be a source of piRNAs in ovaries, and one of them also in somatic tissues. Both piRNA-producing nrEVEs were derived from a rhabdovirus and one of them, nrEVE_CcapRhabdo_1, was consistently transcribed in different C. capitata populations. However, when comparing the piRNAs produced by these nrEVEs with the known rhabdovirus specific to this species, Ceratitis capitata sigmavirus [49], no matches were found despite nrEVE_CcapRhabdo_1 and Ceratitis capitata sigmavirus sharing 66% identity at the nucleotide level. This suggests that the piRNA pathway may not be involved in the immune defence against this particular virus.

In conclusion, our study confirms the integration of nrEVEs in the genomes of tephritid fruit fly species, as observed for other dipteran species, and unravels relatively recent insertion events occurred after the speciation of the extant tephritid lineages. The highly dynamic landscape of observed nrEVEs suggest that the nrEVEs may not be under selection. Despite some nrEVEs are transcriptionally active and produce piRNAs in C. capitata, the low sequence similarity observed between nrEVEs, piRNAs and known viruses indicates that nrEVEs from this species may not play a significant role in combating circulating viral infections. In the context of insect mass-rearing, an antiviral function of nrEVEs could have been exploited through a higher production of nrEVEs transcripts, although our results do not support this hypothesis in the case of true fruit flies. Overall, we provided valuable insights into the landscape of nrEVEs in true fruit flies, offering a comprehensive understanding of their presence and potential roles in fruit fly species.