Pervasive downstream RNA hairpins dynamically dictate start-codon selection

Plant growth, treatment with elf18 and transformation

Arabidopsis seedlings were grown on 1/2 Murashige and Skoog (MS) plates containing 0.8% agar and 1% sucrose, or in soil, at 22 °C under 12-h light/12-h dark cycles with 55% relative humidity. Unless specified otherwise, all Arabidopsis plants used in the experiments were in the Col-0 background. N. benthamiana plants were grown in soil under the same conditions as Arabidopsis for four to five weeks before experiments. For elf18 treatment, Arabidopsis seedlings were grown on plates for seven days, transferred to liquid 1/2 MS medium, grown for one more day and then treated with 10 μM elf18 or water for 1 h. Transgenic plants were generated by Agrobacterium-mediated transformation using the floral-dip method51.

Cell line

The HEK293FT cell line was purchased from the Duke Cell Culture Facility (Invitrogen, R700-07). All cells tested negative for mycoplasma contamination, and cell line identity was confirmed by STR authentication. Cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% heat-inactivated fetal bovine serum and 100 U ml−1 penicillin-streptomycin, and maintained at 37 °C in 5% CO2 and 95% air.

Plasmid construction

The backbone (pTC090-32) for the dual-luciferase constructs used for expression in plants was generated in a previous study22. The 5′ leader sequences of the tested transcripts were PCR-amplified from Col-0 cDNA, except for that of the TUB7 transcript, which was synthesized by IDT; the leaders were then inserted into the backbone through ligation-based reactions (NEB) or using the ClonExpress II One Step Cloning Kit (Vazyme). Site mutations and hairpin structures were introduced by primer-based PCR.

For in vitro transcription and expression in the mammalian cell line, the 5′ leader sequence of the ATF4 transcript was PCR-amplified from cDNA of the normal lung fibroblast cell line IMR90, and that of the BRCA1 transcript was PCR-amplified from genomic DNA of the human breast cancer cell line MCF7. All 5′ leader sequences were cloned into the plasmid backbone carrying the FLUC reporter by Gibson Assembly (NEB). Site mutations and hairpin structures were introduced by primer-based PCR.

To generate the plasmids for dex-inducible expression of RNA helicases, the CDSs of RH11 and RH37 were PCR-amplified from Col-0 cDNA and separately cloned into pBSDONR p1-p4. Each of these clones was then combined with the YFP tag, cloned in pBSDONR p4r-p2, to generate fusion constructs in the pBAV154 destination vector by multisite LR reaction (LR clonase II plus, Thermo Fisher Scientific). The CRISPR knock-out lines were generated using a highly efficient multiplex editing method48. In brief, to construct the shuttle vectors, four guide RNA sequences, TAAACCGCCCGTGAACCACG, TAGACTCCCCGAACTCCACG, TAGACTGTTCGTGAACCACG and TGGTCTTGACATTCCCCACG, were loaded into the pDEG332, pDEG333, pDEG335 and pDEG337 modules, respectively. These guide RNA sequences were then assembled into arrays in the recipient vector (pDGE666).

All of the primers and oligos used in this study are listed in Supplementary Table 1. All constructs were confirmed by Sanger sequencing before use.

Ribo-seq and RNA sequencing

Arabidopsis seedlings treated with elf18 or water as described above were collected, frozen in liquid nitrogen and ground using a Genogrinder (SPEX SamplePrep). Polysome profiling was performed as described previously20. In brief, the ground tissue was homogenized in polysome extraction buffer and centrifuged to remove cell debris. The supernatant was layered on top of a sucrose cushion, and the ribosome pellet was collected after ultracentrifugation. The pellet was washed with cold water and digested with RNase I (Ambion), and the reaction was quenched by adding SUPERaseIn (Invitrogen). Ribosome-bound RNA was purified, treated with PNK (NEB) and size-selected by gel extraction (Invitrogen). The recovered RNA was used for library preparation with the NEBNext Multiplex Small RNA Library Prep Kit with one modification: rRNA depletion was performed after reverse transcription. Specifically, the cDNA product was cleaned up with the Oligo Clean & Concentrator Kit (Zymo) and eluted with water. The eluate was mixed with 0.4 nmol of the biotinylated probes used in previous studies20,52 in saline-sodium citrate (SSC) solution, denatured at 100 °C for 90 s and then cooled gradually from 100 °C to 37 °C to allow annealing of the ribosomal DNA (rDNA) to the biotinylated oligos. The mixture was then incubated with 200 μg of pre-washed Dynabeads MyOne Streptavidin C1 beads (Invitrogen) for 15 min at 37 °C with constant shaking. The tube was placed on a magnetic rack for another 5 min, and the flow-through was collected and cleaned up using the Oligo Clean & Concentrator Kit (Zymo). This rDNA-depleted product was used as the template for PCR amplification and library preparation. Sample quality was assessed with the Agilent 2100 Bioanalyzer (Extended Data Fig. 1a).
RNA from the same lysate was isolated and used for library preparation with the KAPA Stranded mRNA-Seq Kit (Roche). The six Ribo-seq libraries (three mock and three elf18-treated) were pooled in equal amounts of DNA and sequenced on an Illumina NovaSeq (S2, full flow cell) with 50-bp paired-end reads. The six RNA sequencing (RNA-seq) libraries (three mock and three elf18-treated) were pooled in the same way and sequenced on an Illumina NovaSeq (S Prime, 1 lane) with 50-bp paired-end reads.

Ribo-seq and RNA-seq data processing

Ribo-seq reads were processed following the steps shown in Extended Data Fig. 2a. Specifically, raw reads were trimmed using Trim Galore v.0.6.6, a wrapper tool around Cutadapt53 and FastQC54. Trimmed reads of 24–35 nt (inclusive) were kept and mapped to the rRNA and tRNA library from the Arabidopsis TAIR 10 genome using Bowtie 2 v.2.4.2 (ref. 55). The unmapped reads were then aligned to the Arabidopsis TAIR 10 genome using STAR v.2.7.8a (ref. 56) with --outFilterMismatchNmax 3 --outFilterMultimapNmax 20 --outSAMmultNmax 1 --outMultimapperOrder Random. FastQC v.0.11.9 (ref. 54) and MultiQC v.1.9 (ref. 57) were used for quality control at each step. RNA-seq reads were trimmed and mapped using the same programs with default parameters.
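The 24–35-nt length window applied to trimmed Ribo-seq reads can be illustrated with a simple filter. This is a minimal sketch, not the authors' pipeline (which used Trim Galore and Cutadapt); it assumes uncompressed four-line FASTQ records, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# Footprint length window (inclusive) stated in the Methods.
MIN_LEN, MAX_LEN = 24, 35

FastqRecord = Tuple[str, str, str, str]  # (header, sequence, separator, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group FASTQ lines (e.g. an open file handle) into 4-line records.

    Assumes well-formed, uncompressed FASTQ; an incomplete trailing record is dropped.
    """
    stripped = (line.rstrip("\n") for line in lines)
    # Zipping the same iterator four times consumes it four lines at a time.
    for header, seq, sep, qual in zip(stripped, stripped, stripped, stripped):
        yield header, seq, sep, qual


def filter_by_length(records: Iterable[FastqRecord],
                     min_len: int = MIN_LEN,
                     max_len: int = MAX_LEN) -> Iterator[FastqRecord]:
    """Keep only reads whose length lies within [min_len, max_len]."""
    for record in records:
        if min_len <= len(record[1]) <= max_len:
            yield record
```

In use, `with open("sample_trimmed.fq") as fh: kept = list(filter_by_length(read_fastq(fh)))` would return the retained records; the file name is a placeholder.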

To assess data quality, we first determined the read length distribution (Extended Data Fig. 1b) and the reads per kilobase of transcript per million mapped reads (RPKM) for all transcripts in each replicate, for both RNA-seq- and Ribo-seq-mapped reads, using the featureCounts program58 in the Subread package v.2.0.3, and plotted the Pearson correlations between each pair of replicates (Extended Data Fig. 1c,d). We then determined the P-site offset near start and stop codons for Ribo-seq reads of 24 nt (24-mers) to 35 nt (35-mers) using Plastid v.0.6.1 (ref. 59; Extended Data Fig. 1e). Next, we assessed nucleotide periodicity over the 300 nt downstream of start codons by calculating the power spectral density (Extended Data Fig. 1f). In addition, we calculated the distribution of RNA-seq and Ribo-seq reads across the 5′ leader sequence, CDS and 3′ UTR of each transcript in mock- and elf18-treated samples (Extended Data Fig. 1g). A metaplot of the normalized distribution of Ribo-seq reads along a normalized transcript was generated using the computational genomics analysis toolkit (CGAT)60 (Extended Data Fig. 1h). Changes in translational efficiency were calculated using deltaTE61. GO enrichment was performed online using the Gene Ontology resource23,24,25, and the results were visualized using enrichplot62.
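The 3-nt periodicity check can be illustrated with a periodogram of a per-nucleotide P-site profile. This sketch assumes the P-site counts over the first 300 nt downstream of start codons have already been aggregated into one meta-profile, and it uses a naive discrete Fourier transform from the standard library; it is not necessarily the exact power-spectral-density estimator behind Extended Data Fig. 1f. Reads from translating ribosomes should place the dominant peak at a period of 3 nt.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(signal: Sequence[float]) -> List[float]:
    """One-sided periodogram |X_k|^2 / N for k = 0..N//2, after removing the mean."""
    n = len(signal)
    if n < 2:
        raise ValueError("signal must contain at least two positions")
    mean = sum(signal) / n
    centred = [x - mean for x in signal]  # drop the DC (k = 0) component
    power = []
    for k in range(n // 2 + 1):
        coef = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                   for t, x in enumerate(centred))
        power.append(abs(coef) ** 2 / n)
    return power


def dominant_period(signal: Sequence[float]) -> float:
    """Period (in nt) of the strongest non-zero frequency bin."""
    power = power_spectrum(signal)
    k_best = max(range(1, len(power)), key=power.__getitem__)
    return len(signal) / k_best
```

For a toy 300-nt profile that repeats 3, 1, 1, `dominant_period` returns 3.0. A naive DFT is O(N²), which is acceptable for a single 300-nt window.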

Identification of translating mAUGs and uAUGs

To identify transcripts with detectable translation initiation from mAUGs, we analysed 25,554 detected transcripts that had an exon RPKM ≥ 1 in all six RNA-seq samples and a CDS RPKM ≥ 1 in all six Ribo-seq samples (Extended Data Fig. 2a). We then counted the ribosome footprints spanning every mAUG for all 25,554 detected transcripts and normalized each count by total read count and transcript abundance. To set the background read count, we took the upper quartile (Q3) of the normalized read counts from the regions 50 nt upstream of the mAUGs of 5,482 transcripts that have 5′ leader sequences ≥ 100 nt without uAUGs (Extended Data Fig. 2b). Using the resulting background cut-off of 23.17, we retained transcripts with normalized read counts at the mAUG ≥ 23.17 and raw read counts at the mAUG ≥ 10 in all six Ribo-seq samples, which yielded 13,051 ‘expressed transcripts’ with detectable translation initiation from mAUGs (Extended Data Fig. 2a).
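The mAUG filtering rule reduces to two steps: derive a background cut-off as the upper quartile of normalized counts in upstream control regions, then keep transcripts that clear both that cut-off and a raw-count floor of 10 in every one of the six Ribo-seq samples. In this sketch the normalized counts are taken as inputs, because the exact normalization formula is not given here; the Q3 interpolation method (`method="inclusive"`, equivalent to R's default type 7) and all function names are assumptions. In the study, this procedure gave a cut-off of 23.17.

```python
from statistics import quantiles
from typing import Dict, List, Sequence


def background_cutoff(background_normalized: Sequence[float]) -> float:
    """Upper quartile (Q3) of normalized counts from upstream background regions.

    method="inclusive" matches R's default quantile (type 7); this is an assumption.
    """
    return quantiles(background_normalized, n=4, method="inclusive")[2]


def passes_in_all_samples(normalized: Sequence[float], raw: Sequence[int],
                          cutoff: float, min_raw: int = 10) -> bool:
    """True only if every sample clears both the normalized cut-off and the raw floor."""
    if len(normalized) != len(raw) or not normalized:
        raise ValueError("need one normalized and one raw count per sample")
    return all(n >= cutoff and r >= min_raw for n, r in zip(normalized, raw))


def expressed_transcripts(norm_at_maug: Dict[str, List[float]],
                          raw_at_maug: Dict[str, List[int]],
                          cutoff: float) -> List[str]:
    """Transcript IDs with detectable initiation at the mAUG in all samples."""
    return [tx for tx in norm_at_maug
            if passes_in_all_samples(norm_at_maug[tx], raw_at_maug[tx], cutoff)]
```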

To identify the uAUGs that can engage ribosomes and facilitate translation initiation, we applied the same counting and normalization steps to ribosome footprints spanning every uAUG in the 5′ leader sequences of the 13,051 expressed transcripts. uAUGs with normalized read counts ≥ 23.17 and raw read counts ≥ 10 in all three replicates of the mock condition and/or of the elf18 treatment were selected and termed ‘translating uAUGs’ (Extended Data Fig. 2a). In total, 5,626 translating uAUGs were identified in the 13,051 expressed transcripts; the remaining 7,968 uAUGs in these transcripts were designated ‘non-translating uAUGs’.
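For uAUGs the same thresholds apply, but the requirement is evaluated per condition: a uAUG counts as translating if it clears both thresholds in all three mock replicates, in all three elf18 replicates, or in both. A minimal sketch of that "and/or" rule follows; the function names are hypothetical and the normalized counts are assumed to be precomputed.

```python
from typing import Sequence

CUTOFF = 23.17  # background cut-off reported in the Methods
MIN_RAW = 10    # raw read-count floor reported in the Methods


def passes_condition(normalized: Sequence[float], raw: Sequence[int],
                     cutoff: float = CUTOFF, min_raw: int = MIN_RAW) -> bool:
    """True if every replicate of one condition clears both thresholds."""
    if len(normalized) != len(raw) or not normalized:
        raise ValueError("need one normalized and one raw count per replicate")
    return all(n >= cutoff and r >= min_raw for n, r in zip(normalized, raw))


def classify_uaug(mock_norm: Sequence[float], mock_raw: Sequence[int],
                  elf18_norm: Sequence[float], elf18_raw: Sequence[int]) -> str:
    """Label a uAUG 'translating' if it passes in mock and/or elf18 samples."""
    if passes_condition(mock_norm, mock_raw) or passes_condition(elf18_norm, elf18_raw):
        return "translating"
    return "non-translating"
```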

In vivo SHAPE-MaP in plants and in mammalian cells

The SHAPE reagent 2-methylnicotinic acid imidazolide (NAI) was synthesized as described previously63. For in vivo SHAPE-MaP in plants, Arabidopsis seedlings treated with elf18 or water, or tobacco leaves transiently expressing the dual-luciferase reporters, were collected and immediately immersed in fresh NAI solution (100 mM NAI) or in dimethyl sulfoxide (DMSO) solution as previously described64. To enhance NAI permeability, the immersed samples were vacuum-infiltrated and incubated at room temperature for 20 min. The reaction was quenched by adding DTT (dithiothreitol; Roche) to a final concentration of 0.5 M and incubating for 2 min. The tissue was then washed three times with water, frozen in liquid nitrogen, ground and used for total RNA isolation with the Direct-zol RNA Miniprep Plus Kit (Zymo).

For in vivo SHAPE-MaP in the human HEK293FT cell line, the culture medium was removed and the cells were washed once with cold 1× PBS, harvested and transferred to a 1.5-ml tube. Cells were immediately resuspended in 500 μl of fresh NAI solution (100 mM NAI) or 500 μl of DMSO solution and incubated at room temperature with gentle rotation for 5 min. The reaction was stopped by centrifuging the samples at 10,000g at 4 °C for 1 min and removing the supernatant. The pellet was immediately resuspended in Trizol (Invitrogen), and total RNA was isolated using the Direct-zol RNA Miniprep Plus Kit (Zymo).

The purified total RNA from plants or HEK293FT cells was treated with DNase by adding 2 μl Turbo DNase (2 U μl−1) and incubating at 37 °C for 30 min, followed by the addition of another 2 μl Turbo DNase (2 U μl−1) and incubation for a further 30 min. RNA was then purified with the RNA Clean & Concentrator Kit (Zymo). mRNA was enriched by two rounds of poly(A) selection using Oligo d(T)25 Magnetic Beads (NEB) and reverse transcribed (mRNA in 2.5 μl nuclease-free water, 1 μl 10 mM dNTP (NEB), 1 μl Random Primer 9 (NEB), 2 μl 5× First-Strand Buffer (Invitrogen), 0.5 μl 0.2 M DTT (Invitrogen), 0.5 μl TGIRT-III (InGex), 0.5 μl SUPERaseIn (Invitrogen) and 2 μl 5 M betaine solution (Sigma-Aldrich)). The cDNA product was cleaned up using the Oligo Clean & Concentrator Kit (Zymo), and library preparation was performed as described previously65 following the randomer library preparation workflow. Sample quality was assessed with the Agilent 2100 Bioanalyzer. For global SHAPE-MaP, libraries were pooled and sequenced on an Illumina NovaSeq (S4, full flow cell) with 150-bp paired-end reads. For targeted SHAPE-MaP, gene-specific PCR primers (Supplementary Table 1) were used for library preparation as described previously65 following the amplicon library preparation workflow.
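As an arithmetic check on the reverse-transcription mix, the listed volumes add up to a 10-μl reaction, which puts the First-Strand Buffer at 1×, dNTP at 1 mM (from the stated 10 mM stock), DTT at 10 mM and betaine at 1 M. The snippet below reproduces that C1V1 = C2V2 arithmetic; components whose stock concentrations are not stated in the text (random primer, enzyme, RNase inhibitor) are left without a final concentration.

```python
from typing import Dict, List, Optional, Tuple

# (component, volume in µl, stock concentration, unit); None = stock not stated in the text.
Component = Tuple[str, float, Optional[float], str]

RT_MIX: List[Component] = [
    ("mRNA in nuclease-free water", 2.5, None, ""),
    ("dNTP", 1.0, 10.0, "mM"),
    ("Random Primer 9", 1.0, None, ""),
    ("5x First-Strand Buffer", 2.0, 5.0, "x"),
    ("DTT", 0.5, 200.0, "mM"),       # 0.2 M stock
    ("TGIRT-III", 0.5, None, ""),
    ("SUPERaseIn", 0.5, None, ""),
    ("betaine", 2.0, 5000.0, "mM"),  # 5 M stock
]


def total_volume(mix: List[Component]) -> float:
    """Sum of all component volumes in µl."""
    return sum(volume for _, volume, _, _ in mix)


def final_concentrations(mix: List[Component]) -> Dict[str, Tuple[float, str]]:
    """C_final = C_stock * V_component / V_total for components with a known stock."""
    total = total_volume(mix)
    return {name: (stock * volume / total, unit)
            for name, volume, stock, unit in mix if stock is not None}
```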

In vitro SHAPE-MaP in plants

Mock-treated Arabidopsis seedlings were collected, frozen in liquid nitrogen, ground and used for total RNA isolation with the Direct-zol RNA Miniprep Plus Kit (Zymo). The purified RNA was DNase-treated, cleaned up and poly(A)-selected as described above. To probe RNA secondary structures in vitro, 500 ng of purified mRNA was mixed with NAI (100 mM) or DMSO in SHAPE reaction buffer (100 mM HEPES, 6 mM MgCl2 and 100 mM NaCl) and incubated at room temperature for 5 min. The reaction was quenched by purifying the RNA with the RNA Clean & Concentrator Kit (Zymo). The treated mRNA was then subjected to reverse transcription, library preparation and next-generation sequencing as described above.

SHAPE-MaP data processing

For global SHAPE-MaP data processing, raw reads were trimmed with Trim Galore v.0.6.6. The trimmed reads were first mapped to the rRNA and tRNA library from the Arabidopsis TAIR 10 genome using Bowtie 2 v.2.4.2 (ref. 55), and the unmapped reads were then aligned to the Arabidopsis TAIR 10 transcriptome using Bowtie 2 v.2.4.2. Mapped reads from all four replicates in each group were combined for the following analyses66,67: (1) parsing mutations using shapemapper_mutation_parser; (2) counting mutation events using shapemapper_mutation_counter; and (3) summarizing mutation events and calculating SHAPE reactivities. Unless specified otherwise, only nucleotides with ≥1,000 read coverage and SHAPE reactivities between 0 and 6 (inclusive) were used for subsequent analyses to ensure accurate structural prediction30. To examine the correlation between replicates, SHAPE reactivities for every transcript were calculated separately for each replicate, and the Pearson correlation coefficient for each transcript was determined in R v.4.1.0 using the Hmisc package.

For targeted SHAPE-MaP data processing, raw reads were processed using ShapeMapper 2 (ref. 66). To ensure adequate read coverage and completeness, a depth of more than 100,000 reads per nucleotide was achieved for more than 90% of the targeted regions. Delta SHAPE reactivity was calculated as the log2-transformed fold change (elf18/mock) in SHAPE reactivity at each nucleotide position, and these values were then smoothed over 10-nt sliding windows68. Notably, among the four nucleotides, the increase in mutation rate in NAI-modified samples was comparatively modest for adenines (Extended Data Fig. 3d), suggesting that adenine might be less sensitive than the other three nucleotides to NAI modification. However, this does not affect the conclusions of this study, which focus on the base-pairing status of regions rather than of individual nucleotides.
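The nucleotide filter and the delta-SHAPE calculation described above can be sketched as follows. Two details are assumptions not stated in the text: a small pseudocount (hypothetical value 1e-3) keeps the log2 ratio finite when a reactivity is 0, and each 10-nt window is averaged over whichever positions in it passed the filter. The window slides one nucleotide at a time and is reported from its first position.

```python
import math
from typing import List, Optional, Sequence

MIN_COVERAGE = 1000            # read-coverage floor from the Methods
REACTIVITY_RANGE = (0.0, 6.0)  # accepted SHAPE reactivity range from the Methods


def usable(reactivity: float, coverage: int) -> bool:
    """Coverage and range filter; NaN reactivities fail because NaN comparisons are False."""
    low, high = REACTIVITY_RANGE
    return coverage >= MIN_COVERAGE and low <= reactivity <= high


def delta_shape(mock: Sequence[float], mock_cov: Sequence[int],
                elf18: Sequence[float], elf18_cov: Sequence[int],
                pseudocount: float = 1e-3) -> List[Optional[float]]:
    """Per-nucleotide log2(elf18 / mock); None where either sample fails the filter."""
    deltas: List[Optional[float]] = []
    for m, m_cov, e, e_cov in zip(mock, mock_cov, elf18, elf18_cov):
        num, den = e + pseudocount, m + pseudocount
        if usable(m, m_cov) and usable(e, e_cov) and num > 0 and den > 0:
            deltas.append(math.log2(num / den))
        else:
            deltas.append(None)
    return deltas


def sliding_mean(values: Sequence[Optional[float]],
                 window: int = 10) -> List[Optional[float]]:
    """Mean of the usable values in each run of `window` consecutive positions."""
    smoothed: List[Optional[float]] = []
    for start in range(len(values) - window + 1):
        kept = [v for v in values[start:start + window] if v is not None]
        smoothed.append(sum(kept) / len(kept) if kept else None)
    return smoothed
```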

Training and validation of TISnet

To analyse the structural patterns in the regions downstream of initiating AUGs, we trained a deep neural network to predict translation initiation sites by adapting the PrismNet model69. The 101-nt regions downstream of mAUGs in transcripts with the top 40% translational efficiency (high likelihood of initiating translation) were used as positive samples, and the downstream regions of AUGs randomly selected from CDSs or 3′ UTRs (internal AUGs, unlikely to initiate translation) were used as negative samples. Both positive and negative samples were required to have SHAPE reactivity coverage of more than 25%. For the 101-nt downstream region of each AUG, we predicted RNA secondary structures using RNAfold70 with SHAPE reactivities incorporated as soft constraints through a pseudo-free-energy term under default parameters (slope m = 1.8, intercept b = −0.6)71. We then trained TISnet to classify initiating and non-initiating AUGs by integrating sequence and secondary structure information.
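The soft constraint with slope m and intercept b corresponds to the commonly used Deigan-style pseudo-free-energy term, ΔG_SHAPE(i) = m · ln(SHAPE(i) + 1) + b, which is added for nucleotide i when it is stacked in a helix; with m = 1.8 and b = −0.6 kcal mol−1, highly reactive (probably unpaired) nucleotides are penalized for pairing, whereas unreactive ones receive a small bonus. The sketch below only computes the per-nucleotide term and does not fold RNA; treating missing or negative reactivities as contributing nothing is an assumption.

```python
import math
from typing import List, Optional, Sequence

SLOPE_M = 1.8       # kcal/mol, default slope stated in the Methods
INTERCEPT_B = -0.6  # kcal/mol, default intercept stated in the Methods


def shape_pseudo_energy(reactivities: Sequence[Optional[float]],
                        m: float = SLOPE_M, b: float = INTERCEPT_B) -> List[float]:
    """Deigan-style term m*ln(r + 1) + b for each nucleotide.

    Missing (None) or negative reactivities are assumed to add no pseudo-energy.
    """
    energies: List[float] = []
    for r in reactivities:
        if r is None or r < 0:
            energies.append(0.0)
        else:
            energies.append(m * math.log(r + 1.0) + b)
    return energies
```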

More specifically, we labelled positive samples as 1 and negative samples as 0. We one-hot encoded each sequence (A, C, G, U; four dimensions) and encoded the secondary structure at each nucleotide as 0 or 1 (0 for nucleotides in double-stranded regions; 1 for nucleotides in single-stranded regions). The labels and encodings of the samples were used as input for the deep neural network. We then randomly split the positive and negative samples into a training set and a validation set at a ratio of 4:1, and used the two sets to train the network and to validate its prediction performance, respectively.
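The input encoding described above can be sketched as a per-nucleotide feature vector: four one-hot channels for A, C, G and U plus one structure channel (1 = single-stranded, 0 = paired) derived from an RNAfold dot-bracket string. The helper names, the fixed shuffle seed and the choice to split each class separately (so that both sets keep the same positive/negative ratio) are assumptions, not details given in the text.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

NUCLEOTIDES = "ACGU"
T = TypeVar("T")


def encode(sequence: str, dot_bracket: str) -> List[List[int]]:
    """Return one 5-element row per nucleotide: one-hot A/C/G/U + unpaired flag."""
    if len(sequence) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    rows = []
    for nt, symbol in zip(sequence.upper().replace("T", "U"), dot_bracket):
        one_hot = [1 if nt == base else 0 for base in NUCLEOTIDES]
        unpaired = 1 if symbol == "." else 0  # '(' or ')' marks a double-stranded base
        rows.append(one_hot + [unpaired])
    return rows


def split_4_to_1(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly split samples into training and validation sets at 4:1."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * 4 / 5)
    return shuffled[:n_train], shuffled[n_train:]
```

Under these assumptions, positives would be built as `[(encode(seq, db), 1) for seq, db in positives]` (negatives with label 0), `split_4_to_1` applied to each labelled list, and the resulting training and validation parts concatenated.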

Identification of structural elements

To find the sequence patterns of hairpin elements, we extracted hairpin elements with long stems (more than 15 base pairs) from the regions downstream of predicted initiating AUGs. We then calculated the k-mer (k = 3) frequencies of the loop sequences and the frequency of base pairs at each position of the stem (positions counted starting from the loop). We further identified conserved structural elements by clustering hairpin elements into classes on the basis of the sequence similarity between every pair of hairpin elements. Each pair of sequences was aligned with the Needleman–Wunsch algorithm, and sequence identity was defined as:

$${\rm{Sequence\; identity}}=\,\frac{{\rm{Number\; of\; aligned\; nucleotides}}}{{\rm{Number\; of\; aligned\; and\; unaligned\; nucleotides}}}$$
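A minimal Needleman–Wunsch implementation with the identity definition above might look like the following. The scoring scheme (match +1, mismatch −1, gap −2) is hypothetical, and the code reads "aligned nucleotides" as alignment columns in which both sequences carry the same nucleotide, counting every other column (mismatch or gap) as unaligned; the text does not pin down either choice.

```python
from typing import List, Tuple

# Hypothetical scoring scheme; the Methods do not state the scores used.
MATCH, MISMATCH, GAP = 1, -1, -2


def _pair_score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def needleman_wunsch(a: str, b: str) -> Tuple[str, str]:
    """Global alignment of a and b; returns the two gapped strings ('-' = gap)."""
    n, m = len(a), len(b)
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1]),
                              score[i - 1][j] + GAP,   # gap in b
                              score[i][j - 1] + GAP)   # gap in a
    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    aligned_a: List[str] = []
    aligned_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1]):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


def sequence_identity(a: str, b: str) -> float:
    """Identical aligned columns divided by all alignment columns (assumed reading)."""
    if not a and not b:
        return 1.0
    aligned_a, aligned_b = needleman_wunsch(a, b)
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return matches / len(aligned_a)
```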

We divided each hairpin element into a 5′ stem sequence (stem-1), a loop sequence and a 3′ stem sequence (stem-2) (Extended Data Fig. 6c), and used the average of the sequence identities of these three parts as the sequence similarity between two hairpin elements. We calculated this similarity for every pair of hairpin elements and clustered all hairpin elements in the regions downstream of predicted initiating AUGs by hierarchical clustering. For each class of hairpin elements, we performed multiple alignments of the stem sequences and the loop sequences and calculated the frequency of each nucleotide at each position to construct the position weight matrix (PWM) of the sequence motif. The secondary structures of the regions downstream of AUGs were visualized using VARNA72.
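The matrix-building step for one hairpin class can be sketched as column-wise nucleotide frequencies over a multiple alignment. Strictly this is a position frequency matrix; conversion to log-odds scores against a background distribution is left out, gaps are simply ignored within each column, and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence

NUCLEOTIDES = "ACGU"


def position_frequency_matrix(aligned: Sequence[str]) -> List[Dict[str, float]]:
    """Per-column nucleotide frequencies of equal-length aligned sequences ('-' = gap)."""
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    matrix = []
    for pos in range(length):
        column = [seq[pos] for seq in aligned if seq[pos] in NUCLEOTIDES]  # gaps skipped
        total = len(column)
        matrix.append({nt: (column.count(nt) / total if total else 0.0)
                       for nt in NUCLEOTIDES})
    return matrix


def consensus(matrix: List[Dict[str, float]]) -> str:
    """Most frequent nucleotide per column (ties resolved in A, C, G, U order)."""
    return "".join(max(NUCLEOTIDES, key=column.__getitem__) for column in matrix)
```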

5′ rapid amplification of cDNA ends

For the 5′ rapid amplification of cDNA ends (RACE) experiment on the RNA products from all the constructs expressed in plants, a FLUC-specific reverse transcription primer (Supplementary Table 1) and the Template Switching RT Enzyme Mix (NEB) were used during cDNA synthesis; this was followed by template switching using the Template Switching Oligo. PCR amplification of the 5′ region of transcripts was performed using Q5 Hot Start High-Fidelity Master Mix (2×) (NEB).

In vitro transcription

For in vitro transcription, the PCR product containing a T7 RNA polymerase promoter (GCTAATACGACTCACTATAGGG) was used to generate mRNA by using the mMESSAGE mMACHINE T7 ULTRA Transcription Kit (Ambion, AM1344) according to the manufacturer’s instructions. The mRNA product was purified using the MEGAclear Transcription Clean-Up Kit (Ambion, AM1908). To validate the quality of the mRNA product, samples were run on 1% denaturing agarose gel and stained with SYBR Gold (Invitrogen).

Dual-luciferase assay

The dual-luciferase assay for plant samples was performed as described previously20. In brief, an overnight culture of Agrobacterium strain GV3101 transformed with the dual-luciferase construct was collected, resuspended in infiltration buffer (10 mM MgCl2, 10 mM MES and 200 μM acetosyringone), adjusted to an optical density at 600 nm (OD600 nm) of 0.2 and incubated at room temperature for an additional 2 h before infiltration into N. benthamiana leaves for transient expression. After 24 h, leaf discs were collected, ground in liquid nitrogen and lysed with 1× passive lysis buffer (Promega). The lysate was centrifuged at 12,000g for 3 min, and 10 μl of the supernatant was used to measure FLUC and RLUC activities as previously described20. For the experiments with dex-induced expression, the Agrobacterium strain carrying the dual-luciferase construct and the strain carrying the dex-inducible RNA helicase construct were co-infiltrated into N. benthamiana leaves and incubated for 20 h. The leaves were then sprayed with 25 μM dex solution in water and incubated for another 4 h before sample collection.
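Adjusting the Agrobacterium suspension to OD600 = 0.2 is a C1V1 = C2V2 calculation. The sketch below assumes that OD600 is linear in cell density over the measured range (dense cultures should be read after dilution) and that cells from a known culture volume are pelleted and resuspended in infiltration buffer; the function name is hypothetical.

```python
def resuspension_volume(measured_od: float, culture_volume_ml: float,
                        target_od: float = 0.2) -> float:
    """Buffer volume (ml) that gives target_od after pelleting culture_volume_ml of cells.

    Uses OD1 * V1 = OD2 * V2, assuming OD600 is linear in cell density.
    """
    if measured_od <= 0 or culture_volume_ml <= 0 or target_od <= 0:
        raise ValueError("OD values and volume must be positive")
    return measured_od * culture_volume_ml / target_od
```

For example, cells from 2 ml of culture at OD600 = 1.5 would be resuspended in about 15 ml of infiltration buffer.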

The dual-luciferase assay in the human cell line was performed according to the manufacturer’s instructions (Promega). In brief, HEK293FT cells were seeded into 96-well plates and grown overnight to approximately 70% confluence at the time of transfection. Then, 100 ng of FLUC mRNAs and 100 ng of RLUC mRNAs were co-transfected into HEK293FT cells using 0.3 µl Lipofectamine MessengerMAX Transfection Reagent (Invitrogen) for each well. After a 5-h incubation, cells were collected and washed once with cold 1× PBS after the removal of the culture medium. Fifty microlitres of 1× passive lysis buffer (Promega) was used to extract the proteins according to standard procedures, and 10 µl lysate was used for measuring FLUC and RLUC activities as previously described20.
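The readout of both dual-luciferase assays is typically expressed as a FLUC/RLUC ratio per sample, with RLUC serving as the internal reference, and is often scaled to a control construct. The text defers the exact normalization to ref. 20, so the scaling to the mean ratio of a control construct below is an assumption, as are the function names.

```python
from statistics import mean
from typing import List, Sequence


def fluc_rluc_ratios(fluc: Sequence[float], rluc: Sequence[float]) -> List[float]:
    """Firefly/Renilla ratio per replicate; RLUC acts as the internal reference."""
    if len(fluc) != len(rluc) or not fluc:
        raise ValueError("need paired FLUC and RLUC readings")
    return [f / r for f, r in zip(fluc, rluc)]


def relative_to_control(sample_fluc: Sequence[float], sample_rluc: Sequence[float],
                        control_fluc: Sequence[float],
                        control_rluc: Sequence[float]) -> List[float]:
    """Scale each sample ratio by the mean ratio of a control construct (assumed scheme)."""
    control_mean = mean(fluc_rluc_ratios(control_fluc, control_rluc))
    return [ratio / control_mean for ratio in fluc_rluc_ratios(sample_fluc, sample_rluc)]
```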

Western blotting assay

To detect the dex-induced YFP-tagged proteins, blots were probed with anti-GFP primary antibody (Clontech, 632381, 1:5,000). To detect HA-tagged proteins, blots were probed with anti-HA HRP-conjugated antibody (Cell Signaling Tech, 2999, 1:3,000). To detect endogenous proteins, blots were probed with anti-ARF2 primary antibody (PhytoAB, PHY2435A, 1:2,000), anti-CH1 primary antibody (PhytoAB, PHY1909S, 1:2,000), anti-RBOHD primary antibody (Agrisera, AS15 2962, 1:2,000), anti-ICS1 primary antibody (Agrisera, AS16 4107, 1:2,000) or anti-β-tubulin primary antibody (Santa Cruz Biotech, sc-166729, 1:2,000). As secondary antibodies, anti-rabbit-HRP antibody (Cell Signaling Tech, 7074, 1:3,000) or anti-mouse-HRP antibody (Abcam, Ab97040, 1:10,000) was used.

Elf18-induced resistance to Psm ES4326

The elf18-induced resistance experiment was performed as previously described20. In brief, Arabidopsis plants were grown in soil for three to four weeks and infiltrated with 1 μM elf18 or mock treatment (water) one day before infection with Psm ES4326 (in 10 mM MgCl2 solution at OD600 nm = 0.001) in the same leaf. Bacterial growth was measured two days after infection.

Statistics and reproducibility

Unless specified otherwise, statistical tests were performed using GraphPad Prism v.8.0 or R v.4.1.0. The statistical methods and numbers of experimental replicates are indicated in the figure legends. Unless specified in the figures or legends, no adjustments were made for multiple comparisons. In the graphs (except for Fig. 3b,c), asterisks and lower-case letters indicate statistical significance based on P values (*P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001; NS, not significant). The numbers of data points for the analyses shown in Figs. 2c,d and 3g and Extended Data Fig. 4d are as follows: upstream, n = 50; downstream, n = 50. For Fig. 2e: m/iAUG, predicted non-initiating AUG, n = 7,083; predicted initiating AUG, n = 2,917; uAUG, predicted non-initiating AUG, n = 895; predicted initiating AUG, n = 933. For Fig. 2i, only transcripts with high expression levels (RPKM > 19) were used for the analysis: predicted non-initiating AUG, n = 450; predicted initiating AUG, n = 464. For Fig. 4e: WT, n = 50; rh37rh52, n = 50. For Extended Data Fig. 4c,e: in vivo, n = 50; in vitro, n = 50. Unless specified otherwise, experiments were repeated at least three times with similar results. Original gel images can be found in Supplementary Fig. 1.
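The asterisk convention above maps onto P-value thresholds as in the small helper below (hypothetical name). Because the cut-offs are strict inequalities, a P value exactly at a threshold falls into the less significant class.

```python
def significance_label(p: float) -> str:
    """Map a P value to the asterisk notation used in the figures."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P value must lie in [0, 1]")
    for threshold, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < threshold:
            return label
    return "NS"
```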

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
