Front Immunol. 2024 Mar 13:15:1350267.
doi: 10.3389/fimmu.2024.1350267. eCollection 2024.

Transcription termination and readthrough in African swine fever virus

Gwenny Cackett et al. Front Immunol.

Abstract

Introduction: African swine fever virus (ASFV) is a nucleocytoplasmic large DNA virus (NCLDV) that encodes its own host-like RNA polymerase (RNAP) and factors required to produce mature mRNA. The formation of accurate mRNA 3' ends by ASFV RNAP depends on transcription termination, likely enabled by a combination of sequence motifs and transcription factors, although these are poorly understood. The termination of any RNAP is rarely 100% efficient, and the transcriptional "readthrough" at terminators can generate long mRNAs which may interfere with the expression of downstream genes. ASFV transcriptome analyses reveal a landscape of heterogeneous mRNA 3' termini, likely a combination of bona fide termination sites and the result of mRNA degradation and processing. While short-read sequencing (SRS) like 3' RNA-seq indicates an accumulation of mRNA 3' ends at specific sites, it cannot reveal which promoters and transcription start sites (TSSs) directed their synthesis, i.e., it provides no information about the complete and unprocessed mRNAs at nucleotide resolution.

Methods: Here, we report a rigorous analysis of full-length ASFV transcripts using long-read sequencing (LRS). We systematically compared transcription termination sites predicted from SRS 3' RNA-seq with 3' ends mapped by LRS during early and late infection.

Results: Using in-vitro transcription assays, we show that recombinant ASFV RNAP terminates transcription at polyT stretches in the non-template strand, similar to the archaeal RNAP or eukaryotic RNAPIII, unaided by secondary RNA structures or predicted viral termination factors. Our results cement this T-rich motif (U-rich in the RNA) as a universal transcription termination signal in ASFV. Many genes share the usage of the same terminators, while genes can also use a range of terminators to generate transcript isoforms varying enormously in length. A key factor in the latter phenomenon is the highly abundant terminator readthrough we observed, which is more prevalent during late compared with early infection.

Discussion: This indicates that ASFV mRNAs under the control of late gene promoters utilize termination mechanisms and factors different from those used by mRNAs under early promoters and/or that cellular factors influence the viral transcriptome landscape differently during the late stages of infection.

Keywords: African swine fever virus (ASFV); Oxford Nanopore; RNA polymerase; long-read sequencing; transcription readthrough; transcription termination; transcriptomics.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Mapping of the long-read sequencing (LRS) reads to the BA71V genome. Visualized in R using the gggenes (65) package. For visualization purposes, the genome was split in half, and each panel shows the long-read sequencing reads aligned from 5 hpi and 16 hpi (indicated by the left axis). Arrows indicate BA71V ORFs oriented and colored according to their coding strand (red for plus, blue for minus).
Figure 2
Comparison between LRS reads and SRS 3′ RNA-seq-based annotations. (A) Simplified schematic representation of how reads were classified according to their 3′ end location relative to the 3′ RNA-seq annotated pTTS. (B) Distribution of 3′ end (red) locations relative to the annotated 3′ RNA-seq TTS, from the 20,189 of 41,265 ONT reads which matched both the 5′ end and the 3′ end (defined as within 100 bp of the TSS or TTS, respectively). (C) The locations of 3′ ends for the 21,076 reads which matched their 5′ end to the TSS location, but not the 3′ end with the pTTS location (bin width = 50 bp). (D) For each ASFV gene, the percentage of TSS-matched LRS reads that terminate prematurely, terminate correctly, or read through relative to the SRS pTTS. Pearson correlation coefficients and p-values for percentage termination type versus polyT length at the SRS pTTS are shown underneath.
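The read classification summarized in panel (A) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name, the inclusive 100 bp window, and the strand handling are assumptions made for illustration, based only on the caption's statement that a 3′ end within 100 bp of the pTTS counts as matching.

```python
from typing import Literal

Category = Literal["premature", "correct", "readthrough"]


def classify_three_prime_end(
    read_end: int, ptts: int, strand: str, window: int = 100
) -> Category:
    """Classify a read's 3' end relative to an annotated pTTS.

    Hypothetical helper: `window` (inclusive) is an assumption taken
    from the "within 100 bp" wording of the caption.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Signed offset along the direction of transcription:
    # positive means the read ends downstream of the pTTS.
    offset = read_end - ptts if strand == "+" else ptts - read_end
    if abs(offset) <= window:
        return "correct"
    return "readthrough" if offset > 0 else "premature"
```

On the minus strand, a larger genome coordinate lies upstream, which is why the offset is flipped before the comparison.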
Figure 3
The 3′ ends of reads: enriched motifs and gene examples. MEME motif searches were carried out on all reads whose 5′ ends matched the CAGE-seq data for an annotated gene. The 3′ end nt of each read plus 20 nt on either side were extracted for 3,823 and 9,216 total reads from 5 h and 16 h, respectively. (A, B) The first and second most significant motifs, respectively, detected from reads at 5 h. (A) This was found at 958 sites (E-value 3.2e−188) and (B) at 177 sites (E-value 1.1e−028). (C, D) The first (685 sites, E-value 8.3e−050) and second motifs (580 sites, E-value 2.4e−005) found at the 3′ ends of reads from 16 h, respectively. (E) Summary schematic of analysis and results from panels (A–D), i.e., after matching reads from 5 hpi and 16 hpi to their respective TSSs; the significant motifs found at 3′ ends of reads were mostly polyT (polyU) during early infection and an almost equal mix of polyT and polyA during late infection. (F) Full-length transcript landscape including, and downstream of, the gene B646L, representing non-discrete termination sites. The asterisk (*) indicates a polyT-rich region that could facilitate termination for either of the genes B385R and B646L but shows no clear enrichment of 3′ ends. (G) Full-length transcript landscape surrounding the gene CP312R, representing discrete termination sites. Reads are capped at 2,000 total reads for visualization. Total reads from 16 h are shown for the region of the BA71V genome indicated with the bottom scale for both (F, G). Blue (minus) and red (plus) indicate strandedness of ORFs, polyT stretches of ≥4, and reads.
Figure 4
(A) Schematic representing the categories of tandem and convergent gene layout, along with examples of LRS read mapping, colored according to termination type. (B) Summary Venn diagram representing the positioning of genes relative to one another across the BA71V genome. BEDTools was used to classify each of the 153 BA71V genes [as annotated in Cackett et al. (4)], according to the next genes up- and downstream. (C) Bar chart showing the proportion of reads showing each termination type versus their gene organization (tandem or convergent). Termination type is colored as before (red, amber, and green represent correct, premature, and readthrough, respectively). The bar height represents the percentage of termination type per gene layout type, annotated with the number of reads per termination type. These reads were extracted from the 41,265 which matched the 5′ ends from CAGE-seq; 24 reads were excluded due to no annotated gene downstream (at the genome termini). (D) Correlation matrix plot following a chi-squared test of independence on read frequency per termination type against each gene layout. Pearson’s chi-squared test of independence: χ² = 4214.7, p-value < 0.001. The scale indicates Pearson residuals, with navy indicating a strong positive association (e.g., between converging genes and reads prematurely terminating or between contiguous genes and readthrough) and white indicating a strong negative association (e.g., between converging genes and reads reading through or between contiguous genes and reads prematurely terminating). (E–H) Distribution of distances between the 3′ read ends from LRS versus the 3′ RNA-seq TTSs. Shown as histograms with a bin width of 150 nt for every graph. There are two examples each for early tandem and convergent genes (Y118L and CP312R) and late tandem and convergent genes (A224L and B646L). Color scheme as before: amber, red, and green represent premature, correct, and readthrough termination relative to the 3′ RNA-seq TTS (or ORF stop codon in the case of A104R, shown in blue).
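For readers unfamiliar with the statistics behind panel (D), the following minimal Python sketch computes Pearson's chi-squared statistic and the Pearson residuals, (observed − expected) / √expected, for a contingency table of gene layout versus termination type. The function name and the counts in the usage example are hypothetical and are not the study's data; the p-value is not computed here.

```python
from math import sqrt
from typing import List, Tuple


def chi_squared_with_residuals(
    table: List[List[float]],
) -> Tuple[float, int, List[List[float]]]:
    """Return (chi2, degrees of freedom, Pearson residuals) for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            res = (observed - expected) / sqrt(expected)
            chi2 += res * res  # chi2 is the sum of squared residuals
            res_row.append(res)
        residuals.append(res_row)
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof, residuals


# Hypothetical counts (NOT from the paper).
# Rows: tandem, convergent. Columns: premature, correct, readthrough.
counts = [[10, 20, 30], [30, 20, 10]]
chi2, dof, residuals = chi_squared_with_residuals(counts)
```

A positive residual marks a cell with more reads than expected under independence, and a negative residual marks fewer, which is what the color scale in such a matrix plot encodes.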
Figure 5
LRS TTSs versus LRS 3′ ends. (A) Summary schematic for how locations with an accumulation of 3′ transcript ends were extracted from LRS reads and used for peak calling to identify LRS TTSs. (B) For all the 41,265 reads whose 5′ ends matched the CAGE-seq TSSs, the distribution of 3′ end locations is shown relative to the newly LRS-defined pTTS, colored magenta if within 50 bp of this pTTS and green for all other TTSs. (C) The total 376 LRS-defined TTSs were classified into four groups according to their location and role relative to the gene from which the transcripts originated and its ORF: pTTS for the most-used TTS downstream of a gene’s ORF, npTTS for less used TTSs downstream of a gene, intra-ORF for TTSs within the originating gene’s ORF, and intergenic if the transcripts terminating at a TTS had no matching 5′ end to an annotated gene. (D) An example of TTS sharing between genes D79L and D339L, showing reads aligned in this region capped at 500 reads for visualization purposes. TTSs for both genes are annotated and their surrounding motifs are shown on the right.
Figure 6
LRS reads aligning between 23,550 and 38,000 on the BA71V genome at (A) 5 hpi and (B) 16 hpi. Novel LRS-annotated pTTSs and known ORFs are labeled, while all strands, ORFs, and TTSs are colored red or blue according to the strand (plus and minus, respectively). A528R, A506R, and A542R are also known as MGF505-7R, MGF505-8R, and MGF505-10R, respectively.
Figure 7
Significantly enriched DNA motifs detected via MEME, searching the 10 bp up- and downstream of the TTS, separated according to type (pTTS, npTTS, and intra-ORF), ordered according to abundance. (A, B) The only two significant motifs detected at 71 (E-value 3.2e−056) and 21 (E-value 3.5e−007) sites, respectively, from a total of 111 pTTSs. (C) The most significant motif detected from 179 npTTSs, which was found in 65 sites (E-value 8.9e−036). (D) The second most common motif detected among npTTSs was detected at 27 sites (E-value 9.9e−005). (E) This was the only significant motif found at 22 of the 87 intra-ORF TTSs (E-value 2.4e−002). WebLogo was used to create these motifs from the MEME fasta output. (F) The distances in nt from each of the 158 TTSs lacking any polyT to the next polyT downstream. One non-polyT TTS was omitted because it lies at the genome terminus and has no polyT downstream. (G) A summary of TTS types according to their classification as primary, non-primary, or intra-ORF, whether their sequence contains a polyT or not, and whether the TSS from which their reads predominantly originate was defined as an early or late gene TSS according to previous CAGE-seq data. (H) Correlation matrix plot following a chi-squared test of independence on the number of early and late gene terminators per motif category. Pearson’s chi-squared test of independence: χ² = 24.9, p-value < 0.001. The scale indicates Pearson residuals, with dark purple indicating a positive association and white indicating a negative association.
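The measurement in panel (F), the distance from a TTS to the next polyT downstream, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' method: the function name is hypothetical, a polyT is assumed to be a run of at least 4 T's (borrowing the ≥4 threshold mentioned in the Figure 3 caption), coordinates are 0-based, and only the plus strand is handled.

```python
import re
from typing import Optional


def distance_to_next_polyt(seq: str, tts: int, min_run: int = 4) -> Optional[int]:
    """Distance in nt from a TTS to the start of the next polyT downstream.

    `seq` is the plus-strand sequence, `tts` a 0-based index into it.
    Returns None when no run of >= `min_run` T's follows the TTS, for
    example when the TTS sits at the end of the sequence.
    """
    pattern = re.compile("T{%d,}" % min_run)
    # Start searching one position after the TTS itself.
    match = pattern.search(seq.upper(), tts + 1)
    return None if match is None else match.start() - tts
```

For a minus-strand TTS, one option would be to reverse-complement the region first and apply the same function.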
Figure 8
In-vitro transcription termination with recombinant core ASFV-RNAP. (A) Example of scaffold (native CP312R) with TTS motif identified from transcriptomic analysis. (B) Schematic of the step-by-step process for carrying out the transcription elongation assay. The main final products are the 32P-labeled RNA which had not been elongated, products of pausing or termination at terminators, and finally readthrough transcripts which are generated from RNAP reaching the end of the template strand. (C) Following the process in (B), denatured samples were run on an 11% TBE-polyacrylamide 7 M urea denaturing sequencing gel for a range of scaffolds. The sequences of template and non-template strands, as well as the lengths of transcribed products, are shown in Supplementary Table 5. The polyA and polyT transcripts were synthetic based on a previous work (69), while CP312R (polyT), CP530R (no motif), and D117L (polyA) were native ASFV terminators. (D) Transcripts from native CP312R polyT motif (9 nt), followed by CP312R 7T, 5T, and 3T as the same scaffolds with subsequent replacement of 2T with 2A in the sequence (see Supplementary Table 5). (E) Transcripts from native E184L polyT motif (9 nt), followed by E184L 6T, 5T, 4T, and 3T, as the same scaffolds with subsequent replacement of a T with an A in the sequence. (F) Schematic summary of how the CP312R scaffold sequence generates transcripts with specific lengths in the presence and absence of GTP in transcription reactions. (G) In-vitro reactions from CP312R 7T in (C) run on a TGX 4%–15% gel under native conditions in TG buffer. Lanes where GTP was omitted from the reactions are indicated, inducing a pausing prior to the terminator motif, wherein the sequence contains only 2 G’s.
Figure 9
Schematic summary of ASFV transcription termination and putative mechanisms of RNA 3′ end formation. (A) The mechanisms for “correct” termination (red highlight), premature termination (yellow highlight), and terminator readthrough (green highlight) are illustrated in boxes. During the early stages of infection (5 hpi), termination is dominated by concise mRNA 3′ end formation associated with strong polyU stretches at RNA 3′ ends (red nt). A more complex mRNA 3′ end landscape can be observed in late infection (16 hpi), alongside concise termination, abundant terminator readthrough (green), and premature 3′ end formation (yellow), many of which are not associated with polyT motifs. Transcripts appearing as prematurely terminated include mRNA 3′ ends generated by bona fide termination, head-on collisions of RNAPs which transcribe convergent gene pairs, or alternatively by mRNA degradation or processing. (B) ASFV particles include at least two termination factor candidates, Q706L and B962L, and the CE, important for termination in VACV (described in Table 1). (C) The ASFV genome encodes additional putative termination factors including A859L and QP509L, but their molecular mechanisms and exact roles during termination are still not well understood (96).

References

    1. Werner F, Grohmann D. Evolution of multisubunit RNA polymerases in the three domains of life. Nat Rev Microbiol (2011) 9:85–98. doi: 10.1038/nrmicro2507
    2. Salas ML, Kuznar J, Vinuela E. Polyadenylation, methylation, and capping of the RNA synthesized in vitro by African swine fever virus. Virology (1981) 113:484–91. doi: 10.1016/0042-6822(81)90176-8
    3. Alejo A, Matamoros T, Guerra M, Andres G. A proteomic atlas of the African swine fever virus particle. J Virol (2018) 92:JVI.01293–18. doi: 10.1128/JVI.01293-18
    4. Cackett G, Matelska D, Sýkora M, Portugal R, Malecki M, Bahler J, et al. The African swine fever virus transcriptome. J Virol (2020) 94:e00119-20. doi: 10.1128/JVI.00119-20
    5. Nemeroff ME, Barabino SM, Li Y, Keller W, Krug RM. Influenza virus NS1 protein interacts with the cellular 30 kDa subunit of CPSF and inhibits 3'end formation of cellular pre-mRNAs. Mol Cell (1998) 1:991–1000. doi: 10.1016/S1097-2765(00)80099-4
