Design and synthesis of reference plasmid
A reference construct was first designed, with the intention of optimising the production of RNA therapeutics for pre-clinical research. The coding sequence of eGFP30 was selected as a reporter in the coding region, as its protein product can be assayed simply through Flow cytometry and other fluorometric methods. Transcription is initiated using a modified T7 promoter (5’-TAATACGACTCACTATAAGG-3’), which is compatible with CleanCap AG reagents (TriLink BioTechnologies) for co-transcriptional capping during mRNA IVT, and the 5’ and 3’ UTR sequences were selected from Human alpha globin and AES-mtRNR1 respectively, due to their demonstrated effect on mRNA stability and expression31,32,33. A 126nt segmented poly(A) tail was selected, as it was predicted to reduce the occurrence of plasmid recombination in E. coli during plasmid propagation and cloning, without compromising mRNA half-life and efficiency of translation in mammalian cells17. Our construct was synthesised and cloned into a pUC57-Kan backbone (GenScript), which contains an origin of replication and a kanamycin resistance gene.
Plasmid DNA was transformed using heat shock into NEB Stable competent E. coli cells (NEB, C3040H). A single colony containing the plasmid was then amplified in LB media with 30 µg/mL Kanamycin. E. coli was harvested from 2500 mL of culture using centrifugation and lysed using alkaline lysis. The supercoiled plasmid DNA was then purified using multistep FPLC purification, first using anion exchange chromatography, and then using a desalting column, where the buffer was exchanged for TE. One mg of supercoiled plasmid DNA was linearised using Bsal-HFv2 restriction enzyme (NEB, R3733), generating a single fragment that terminates at the end of the poly(A) tail. This linearisation ensures run-off in vitro transcription, terminating immediately 3’ to the poly(A) tail. After linearisation, the plasmid DNA was purified further with hydrophobic interaction chromatography (HIC). The linearised plasmid DNA was diluted with 3 volumes of 4 M ammonium sulfate, then purified using HIC to remove other isoforms of plasmid DNA. The pooled fractions corresponding to linear plasmid DNA were buffer exchanged with TE Buffer on a desalting column to remove the ammonium sulfate. The purified linear plasmid DNA was concentrated using isopropanol precipitation to an approximate concentration of 500 ng/µL.
We evaluated the length and purity of our linearised plasmid template using a range of different analytical methods. First, the size distribution of the linearised plasmid was compared to supercoiled plasmid using Agarose gel electrophoresis. This permits the analysis of the proportion of linearised plasmids in the preparation. Second, the purity of the supercoiled and linearised plasmid DNA was analysed by HPLC using a CIMac™. The pDNA was analysed on a 0.3 mL (1.4 mm) diethyl-aminoethyl (DEAE) weak anion-exchange column (BIA Separations, Adjovščina, Slovenia) connected to a PATfix® analytical HPLC system (Sartorius, Goettingen, Germany). HPLC was performed using mobile phase Buffer A (0.1 M Tris, 0.3 M Guanidine-HCl, 1% Tween-20 (w/v), pH 8.0) and Buffer B (0.1 M Tris, 0.3 M Guanidine-HCl, 0.7 M NaCl, 1% Tween-20 (w/v), pH 8.0).
Oxford Nanopore plasmid sequencing
Oxford Nanopore sequencing was used to measure the accuracy and purity of plasmid template preparations. Ligation Sequencing libraries (SQK-LSK109) were prepared from linearised plasmid templates (described above) and were labelled with PCR-Free Native Barcodes (EXP-NBD104). Libraries were prepared according to manufacturer’s instructions (Oxford Nanopore Technologies), with the exception that 2 µg of template was used as an input rather than the recommended 1 µg. The resulting libraries were quantified on a Qubit instrument (Invitrogen) with the dsDNA HS kit and qualitative analysis of fragment length distribution was completed with D5000 ScreenTapes (Agilent Technologies, USA). The results of the quantitative and qualitative analyses were used to calculate the required library concentrations for pooling and loading. Barcoded libraries were sequenced on either R9.4.1 (FLO-MIN106D) or Flongle Flow Cells, with High Accuracy live base-calling enabled (Guppy v5.1.13 and MinKNOW Core 4.5.4). All nanopore reads with quality scores >9 were used to form a concatenated FASTQ file and proceeded to further analysis.
Bioinformatic analysis of Oxford Nanopore plasmid sequencing
First, barcodes were removed with Porechop v0.2.4 Oxford Nanopore pDNA sequencing data were then mapped and analysed using a custom pipeline. The quality-filtered, concatenated FASTQ reads were aligned to the plasmid reference via Minimap2 (Release 2.20-r1064) with -ax map-ont for Nanopore34. The resulting SAM alignment files were processed via SAMtools v1.15 to generate sorted and indexed BAM files, as well as various other mapping analysis files35. The generated BAM files were viewed and analysed utilising Integrative Genomics Viewer (IGV v2.12.3)36. Further run and sample quality statistics were acquired via NanoPlot v1.38.1 and pycoQC v v2.5.237,38.
The per-nucleotide error profile of mapped reads was determined relative to the reference index sequence using pysamstats v 1.1.2 with options –max-depth=300000000 –FASTA –type variation (https://github.com/alimanfoo/pysamstats). To perform simple error correction, the per-nucleotide error profile of the plasmid sequences was subtracted from the corresponding nucleotides within the cDNA/dRNA sequences. Plotting and statistical analyses were performed using Excel (v 16.67 for Mac) and GraphPad Prism (v 9.3.1 for Mac) software.
To investigate the content of the unmapped reads for possible contamination a BAM file was generated for unmapped reads via SAMtools (v1.15) utilising SAMtools view -S -b -f 4 and this was converted to a FASTA file via SAMtools FASTA. This FASTA file was aligned to an E.coli reference sequence utilising Minimap2 then sorted and indexed BAM files and alignment statistics were generated with SAMtools as described earlier. Sequence homology of the reads that did not align to the E.coli reference was investigated using the Basic Local Alignment Search Tool (BLAST) Nucleotide Collection nr/nt provided by the National Center for Biotechnology Information.
Variant calling and consensus sequence generation was completed using bcftools (v1.15)35 with the following set of commands comparing the BAM file to the reference and generating a VCF; bcftools mpileup -d 300000000 –no-BAQ –min-BQ 0 -Ou -f | bcftools call -c -M –ploidy 1 -Oz -o *.vcf.gz. The resulting VCF was indexed and normalised to the reference and a BCF generated. A consensus.fa sequence was generated after comparing the VCF to the reference sequence with command bcftools consensus and option -a – to replace positions absent from the VCF (zero coverage) with a character.
Illumina plasmid DNA sequencing
Plasmid DNA templates were next sequenced on an Illumina MiSeq instrument. Barcoded Illumina DNA PCR-free libraries were prepared from the same linearised plasmid template used for ONT sequencing, according to the manufacturer’s instructions (Illumina). The resulting libraries were sequenced on a MiSeq instrument at the Australian Genome Research Facility using v2 chemistry, set to 150 base paired end.
BCL files generated on the Illumina MiSeq were processed with Illumina DRAGEN BCL Convert 07.021.609.3.9.3 pipeline to generate FASTQ.gz files. The quality of the reads within these files were checked with FastQC (v0.11.9). Reads were then adaptor and quality trimmed utilising TrimGalore (v0.6.8dev) to create FASTQ.gz files with reads passing the Q20 threshold39. Adaptors were trimmed from the Illumina PCR-Free libraries using –stranded_illumina –paired. Additionally, T overhangs were removed from the Illumina stranded mRNA libraries using –clip_r1 1 –clip_r2 1. The resulting trimmed files were checked with FastQC and then aligned to an indexed plasmid reference file40 using BWA-MEM (bwa-0.7.17-r1188).
In vitro transcription of mRNA
Capped mRNA with modified nucleotides was produced by in vitro transcription (IVT) using T7 RNA polymerase following protocols described in Henderson et al.41 and according to the manufacturer’s instructions (NEB, E2080S). Briefly, 50 µg/mL purified linear plasmid DNA was used as template for an IVT reaction at 32 °C for 3 h with 16 µg/mL T7 RNA polymerase (NEB M0251), ribonucleotides (6 mM ATP, 5 mM CTP, 5 mM GTP; NEB, N0450), 5 mM N1-methylpseudouridine-5’-triphosphate (TriLink BioTechnologies, TRN108110), or 5 mM UTP for matched unmodified controls, transcription buffer (40 mM Tris·HCl pH 8.0, 16.5 mM magnesium acetate, 10 mM dithiothreitol (DTT), 20 mM spermidine, 0.002% (v/v) Triton X-100), 2 U/mL yeast Inorganic pyrophosphatase (NEB, M2403) and 1000 U/mL murine RNase inhibitor (NEB, M0314). Cap1 analogue was co-transcriptionally incorporated to the mRNA 5’ end by addition of 4 mM CleanCap AG reagents (TriLink, TRN711310) to the reaction. The mRNA IVT reaction was stopped by addition of 200 units DNaseI (NEB, M0303) per mL of IVT reaction and incubation at 37 °C for 15 min. The resulting mRNAs were purified using Monarch RNA cleanup kits (NEB, T2050) according to the manufacturer’s instructions with the final elution in distilled ultrapure water (ThermoFisher Scientific, 10977015).
We evaluated the yield, length and purity of the IVT mRNAs using a range of different analytical methods. mRNA was quantified by UV spectrophotometry analysis using a NanoPhotometer N120 (Implen) and the size distribution was evaluated using TapeStation electrophoresis with RNA ScreenTapes (Agilent Technologies, USA, 5067-5576).
Detection of dsRNA contaminants by dot blot analysis
The IVT mRNA samples were diluted in nuclease-free water to final concentrations of 200, 500, 1000 and 2000 ng/μL. From the diluted samples, 5 μL aliquots were loaded onto a positively charged nylon membrane (Roche, Basel, Switzerland) which resulted in a total loading amount of 1000, 2500, 5000 and 10,000 ng of IVT mRNA samples on each blot respectively. dsRNA was manufactured in-house42 as a positive control and nuclease-free water was used for the negative control. Samples were loaded onto a Bio-Dot® Microfiltration Apparatus (Bio-Rad, CA, USA) as per manufacturer’s instructions. The membrane was air dried, then blocked by immersion and incubated with blocking buffer containing 5% non-fat dry milk in TBS-T (50 mM Tris-HCl, 150 mM NaCl, 0.05% Tween (w/v)) at room temperature for 1 hour with agitation.
For the detection of dsRNA, the membrane was incubated overnight at 4 °C with two different dsRNA-specific murine monoclonal antibodies (mAb), which were derived from clones 3G1 and 2G4 (Mozzy Mabs, Brisbane, Australia). Both antibodies were incubated separately overnight at a 1:5 dilution, diluted in incubation buffer (1% Non-Fat Dry Milk in TBS-T). The membranes were then rinsed 3 times then washed 3 times for 15 min each wash with TBS-T. The membranes were then incubated with horseradish peroxidase (HRP)-conjugated goat anti-mouse immunoglobulin G (IgG) secondary antibody (Abclonal, MA, USA) at 1:5000 dilution, diluted in incubation buffer, for 1 h with agitation. The membranes were then rinsed 3 times then washed 3 times for 15 min each wash. Chemiluminescence detection was performed using Novex™ ECL chemiluminescent substrate kit (Invitrogen, MA, USA) and the signal intensities of the dots were visualised using the ChemiDoc MP Imaging System (Bio-Rad, CA, USA).
ONT cDNA-PCR sequencing
cDNA-PCR sequencing was used to determine the accuracy and purity of the IVT mRNAs. First, mRNA concentration was calculated using the Qubit RNA BR kit (ThermoFisher Scientific). mRNAs were diluted in nuclease free water to an appropriate concentration for library preparation (~1 ng/µL), and concentrations were confirmed using a Qubit RNA HS kit (ThermoFisher). Barcoded ONT cDNA-PCR libraries (SQK-PCB109 and SQK-PCS111) were prepared according to the manufacturer’s instructions (Oxford Nanopore Technologies), with the following exceptions. Evaporation during the cDNA synthesis step was assessed by measuring reaction volume, and tubes were topped up with nuclease-free water where appropriate. The cDNA was amplified for 14–16 cycles (recommendation is 14–18 cycles). Finally, libraries were eluted in 8 µL of Elution Buffer (rather than the recommended 12 µL volume) to boost the final concentrations of the libraries. This was necessary, as libraries prepared with templates containing modified bases appear to produce lower output libraries than those prepared with unmodified bases.
The resulting libraries were quantified via a Qubit instrument (Invitrogen) with the dsDNA HS kit (ThermoFisher Scientific) and qualitative analysis of fragment length distribution was conducted using D5000 ScreenTapes (Agilent Technologies, USA). The results of the quantitative and qualitative analysis were used to adjust library concentrations for pooling and loading. Up to 10 barcoded libraries were sequenced on each R9.4.1 (FLO-MIN106D) Flow Cell, with High Accuracy live base-calling enabled (Guppy v5.1.13 and MinKNOW Core 4.5.4). All nanopore reads with a quality score >9 were allocated as passed and proceeded to further analysis.
Bioinformatic analysis of ONT cDNA-PCR sequencing
cDNA-PCR data were analysed as described for ONT pDNA sequencing (see above). Additionally, the following analyses were conducted. To identify full-length FASTQ transcripts containing SSP and VNP primers in the correct orientation, the tool pychopper (v.2.5.0) was utilised (with default parameters). These adaptors were then trimmed and new FASTQ files were generated. Trimmed reads from rescued and full-length folders were merged and these reads were mapped as described above with Minimap2 to the plasmid reference. BAM files were also generated as described above and these were used for analysis of cDNA read lengths. SAMtools (v1.15) utilising SAMtools view -F 2048 was used to extract read length distributions of the primary aligned reads from the BAM files. Plotting and statistical analyses were performed using GraphPad Prism (v9.4.1 for Mac) software to create read length distribution profiles for each sample.
Next, we analysed the proportion of on-target reads from our cDNA-PCR dataset (i.e. reads that were correctly transcribed and are likely to be translated into a functional protein). First, BEDtools (v2.27.1) intersect was utilised to analyse the sorted BAM files and identify the proportions of on- and off-target reads. BED files were generated, indicating target coordinates that include the start of the Kozak sequence and end of the 3’ UTR. Reads were binned as on-target and a BAM file generated if they overlapped the start and stop coordinates all other reads were binned as off-target. Off-target reads were further filtered down depending on start or stop coordinate overlap indicating 3’ or 5’ degradation. The generated BAM files were viewed and analysed utilising Integrative Genomics Viewer (IGV v2.12.3).
Variant calling and consensus sequence generation of the mRNA encoding region was completed as described earlier except that it also utilises the option –targets to restrict the pileup to the start and stop coordinates specified above.
poly(A) tail length calculation using ONT cDNA-PCR sequencing
Poly(A) tail length was estimated from cDNA-PCR data (SQK PCS111) which anchors the oligo dT primer to the 3’ end, and using tailfindr (v1.3) using protocols described in Kraus et al.21. Briefly, unaligned FAST5 files are inputted, which are then segmented into the sequencing adaptor, splint adaptor, poly(A) tail and gene body. poly(A) tail length is then calculated based on estimates of pore translocation time. Poly(A) tail length was estimated from Fast5 files generated from the original cDNA-PCR kit (SQK PCB109) and the updated version (SQK PCS111). Here, Fast5 files from both kits were basecalled with High Accuracy utilising the following configuration dna_r9.4.1_450bps_hac.cfg (Guppy v5.1.13 and MinKNOW Core 4.5.4). Then, the find_tails function of tailfindr (v1.3) was run using default settings21. Fast5 files generated from the SQK PCB109 kit required specification of custom cDNA 5’ (TTTCTGTTGGTGCTGATATTGCT) and 3’ (ACTTGCCTGTCGCTCTATCTTC) primer details to enable estimation.
Illumina cDNA sequencing
Barcoded Illumina stranded mRNA libraries were prepared from the same IVT mRNA templates used for ONT sequencing, according to manufacturer’s instructions (Illumina). The resulting libraries were multiplexed with the plasmid DNA libraries and sequenced on a MiSeq instrument at the Australian Genome Research Facility using v2 chemistry, set to 150 base paired end. Trimming and mapping of our Illumina cDNA sequencing data followed the pipeline described for Illumina pDNA sequencing (see above).
The generation of mapping statistics, per-nucleotide error calculations, variant calling, consensus sequence generation and unmapped reads analyses were conducted as described above for ONT sequencing. For samples sequenced on the Illumina platform, on and off target reads were calculated via a different method due to the short length of the reads. SAMtools (v1.15) with the command SAMtools view (various coordinates) was utilised to extract reads down-stream and up-stream of the mRNA encoding region of the reference and generate corresponding BAM files. SAMtools flagstat was utilised to count the primary reads contained within these BAM files and this number was compared to the total primary reads to calculate on target read numbers.
To identify antisense reads we used SAMtools view to separate reads that originated from the forward (second in pair of forward strand, -b -f 128 -F 16 and first in pair of the reverse strand, -b -f 80) or reverse strand (second in pair if they map to the reverse strand, -b -f 144 and first in pair if they map to the forward strand, -b -f 64 -F 16). The reads originating from the reverse strand were designated as antisense reads.
Modified polyadenylation protocol
To detect fragments that lack a poly(A) tail in our synthetic RNAs, we enzymatically added poly(A) tails to our eGFP mRNA. This was prepared according to an Oxford Nanopore protocol, using E. coli Poly(A) Polymerase. Briefly, up to 10 µg of RNA was incubated with NEB E. coli Poly(A) Polymerase (M0276) and 1 mM ATP for 1.5 min. The reaction was stopped using EDTA (to a final concentration of 10 mM), making a final volume of 25 µL. The poly(A) tailed RNA was then cleaned up using 72 µL of Agencourt RNAClean XP beads (Beckman Coulter A63987) and was eluted in 12 µL of nuclease free water. The eluate was then quantified using Qubit BR or HS RNA kits (depending on the quantity of RNA that was inputted into the reaction). An appropriate quantity of RNA was then used as a template for SQK-PCB109 or SQK-RNA002 library prep. Libraries were prepared and sequenced as described above.
Direct RNA sequencing with Oxford Nanopore
Direct RNA sequencing was used to determine the accuracy and purity of the IVT mRNAs. Libraries were prepared using the ONT SQK-RNA002 kit according to the manufacturer’s instructions (Oxford Nanopore Technologies), with the following exceptions. First, 400 ng of template was used to prepare each library, rather than the 500 or 50 ng recommended in ONT direct RNA-sequencing protocols. We utilised the cDNA synthesis step, which improves the template stability. Additionally, we used Superscript IV, rather than the recommended Superscript III, due to its higher efficiency. Libraries were quantified using a Qubit RNA BR kit (ThermoFisher Scientific). Each library was sequenced on a separate R9.4.1 (FLO-MIN106D) Flow Cell for 72 h. Fast5 files were later basecalled with High Accuracy utilising the following configuration rna_r9.4.1_70bps_hac.cfg (Guppy v5.1.13). All nanopore reads with a quality score >9 were allocated as passed and proceeded to further analysis. Direct RNA sequencing data were analysed as for cDNA, with the exception of barcode detection using Pychopper, and on- and off target detection calculations. Moreover, poly(A) tail length calculations were conducted as described for SQK PCS111 libraries (above).
Next, poly(A) tail length was estimated from direct RNA sequencing data (SQK-RNA002). Poly(A) tail length was estimated using tailfindr (v1.3), using protocols described in Kraus et al.21. tailfindr (v1.3) utilises the Fast5 files to estimate tail length and this tool was initially run on the same direct RNA dataset as Nanopolish v0.13.3 poly(A). After comparison of the estimated poly(A) tail length, further samples were analysed. tailfindr was run with default settings, generating a.tsv file as output. This.tsv file was interrogated with a custom R script generating a density plot of the estimated poly(A) tail length for each read.
Statistics and reproducibility
Data depicted in this paper are predominantly a proof of concept, and range from n = 1 to n = 4. Details on experimental replication is included in individual results sections. No statistical method was used to predetermine sample size. No data were excluded from the analyses. The experiments were not randomised, as template quality is the most significant determinant of sequencing quality, rather than any alternative co-variant. The investigators were not blinded to allocation during experiments and outcome assessment, as knowledge of sample identity was unlikely to bias the interpretation of the results.
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.