Low mapping rate 3 - TSO concatemers

Compared to the two previous posts in the series, this post deals with something more technology specific.

Many biochemical reactions require a critical amount of material before they work at all. This is the main challenge in single cell RNA-sequencing: creating sufficient material for the next step in a protocol. The Smart-seq2 protocol makes use of Nextera, a kit for fragmenting DNA, adding adapters for amplification, and finally adding Illumina sequencing adapters. But for Nextera to work, a minimal amount of input DNA must be provided.

Once cDNA has been reverse transcribed from a cell's mRNA, it can be pre-amplified, provided it has PCR adapters at both ends. A particularly convenient way to add these adapters is through template switching PCR.

Template switching oligos

Here, when reverse transcription reaches the 5' end of the RNA, a CCC sequence is added to the end of the cDNA. This allows an oligo with GGG at its 3' end to bind to the end of the cDNA. The oligo allows the second strand of the cDNA to be generated, and at the same time serves as an adapter for PCR primers.

In the standard implementation of the Smart-seq2 protocol the template switching oligo (TSO) is AAGCAGTGGTATCAACGCAGAGTACATGGG.
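To make the mechanism concrete, here is a toy string model of template switching, with all sequences written 5' to 3'. It is a sketch under simplifying assumptions: exactly three untemplated Cs are added, pairing is perfect, the mRNA is given in the DNA alphabet, and the oligo-dT end of the cDNA is ignored. The function names are hypothetical.

```python
# Toy model of template switching; sequences are strings written 5' -> 3'.
# Assumptions: exactly three untemplated Cs, perfect pairing, mRNA given in
# the DNA alphabet (T for U), and the oligo-dT primer end is ignored.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
TSO = "AAGCAGTGGTATCAACGCAGAGTACATGGG"  # Smart-seq2 TSO


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def first_strand(mrna: str, tso: str = TSO) -> str:
    """First-strand cDNA after reverse transcription and template switching."""
    if not tso.endswith("GGG"):
        raise ValueError("TSO must end in GGG to pair with the CCC overhang")
    # Copying the mRNA yields its reverse complement; at the mRNA 5' end
    # three untemplated Cs are added.
    cdna = revcomp(mrna) + "CCC"
    # The Cs pair with the TSO's 3' GGG and the enzyme switches template,
    # copying the rest of the TSO onto the cDNA.
    return cdna + revcomp(tso[:-3])


def second_strand(first: str) -> str:
    """The complementary strand, primed from the TSO end."""
    return revcomp(first)
```

For any input, `second_strand(first_strand(mrna))` equals `TSO + mrna`. Because every template-switched product starts with the TSO, a primer matching the TSO sequence can amplify all of them.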

Sometimes these TSOs concatenate into longer DNA sequences, which get amplified along with the cDNA. If you investigate reads that do not map to the transcriptome or to rRNA, you will find a number of reads with the TSO repeated several times in tandem.

The TSO concatemers can be accounted for during quantification by including a FASTA record of a TSO concatemer in the reference, like this one:

>TSOconcatamer
AAGCAGTGGTATCAACGCAGAGTACATGGGAAGCAGTGGTATCAACGCAGAGTA
CACGGGAAGCAGTGGTATCAACGCAGAGTACATGG
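A record like this can also be generated programmatically. The sketch below uses a hypothetical `concatemer_record` function that writes exact tandem copies of the TSO, so its output will not match the record above character for character. The copy number of 3 and the line width of 60 are arbitrary choices.

```python
TSO = "AAGCAGTGGTATCAACGCAGAGTACATGGG"


def concatemer_record(seq: str, copies: int = 3, name: str = "TSOconcatamer",
                      width: int = 60) -> str:
    """FASTA record with `copies` exact tandem copies of `seq`, wrapped at `width`."""
    if copies < 1 or width < 1:
        raise ValueError("copies and width must be positive")
    body = seq * copies
    # Split the sequence into fixed-width lines, as is conventional in FASTA.
    lines = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([">" + name] + lines) + "\n"


if __name__ == "__main__":
    # Redirect the output to a file (e.g. tso.fa) to append to a reference.
    print(concatemer_record(TSO), end="")
```

Assuming the transcriptome is in `transcripts.fa` and the output above was saved as `tso.fa`, one way to include it is `cat transcripts.fa tso.fa > combined.fa` followed by `salmon index -t combined.fa -i combined_index`.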

Rerunning Salmon with the new reference, we can compare the mapping rates to those in the previous post:

 
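[Figure: mapping rates per cell with the TSO concatemer included in the reference, compared with the previous post]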
 

As we can see, the majority of cells get an increased mapping rate when the TSO concatemer is included, and many cells go from single-digit percentages to over 50%! These samples are likely wells with almost no cellular mRNA in them.
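Comparing the two runs sample by sample can be scripted. This sketch assumes each Salmon run wrote its usual `aux_info/meta_info.json` and that this file has a `percent_mapped` field. The helper names and the 10% / 50% thresholds for flagging "rescued" samples are illustrative choices, not part of the original analysis.

```python
import json
from pathlib import Path


def parse_mapping_rate(meta_json: str) -> float:
    """Mapping rate (in percent) from the text of a Salmon meta_info.json."""
    return float(json.loads(meta_json)["percent_mapped"])


def mapping_rate(quant_dir) -> float:
    """Mapping rate of one Salmon run, given its output directory."""
    meta = Path(quant_dir) / "aux_info" / "meta_info.json"
    return parse_mapping_rate(meta.read_text())


def compare_rates(before_dirs: dict, after_dirs: dict) -> dict:
    """Map sample -> (rate without TSO record, rate with TSO record)."""
    shared = before_dirs.keys() & after_dirs.keys()
    return {s: (mapping_rate(before_dirs[s]), mapping_rate(after_dirs[s]))
            for s in shared}


def rescued(rates: dict, low: float = 10.0, high: float = 50.0) -> list:
    """Samples that go from below `low`% to above `high`% mapping rate."""
    return sorted(s for s, (before, after) in rates.items()
                  if before < low and after > high)
```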

As before, we can visualize the relative contribution of fragments from the different sources (here I merged the expression of the rRNA genes into a single unit).

 
[Figure: relative contribution of fragments from the different sources in each sample]
 

We see that several of the plates have large amounts of TSO contamination, and that it varies more between samples than rRNA does. TSO concatemers also generally account for a larger share of fragments than rRNA, except in one of the plates.
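The per-sample fractions behind a plot like this can be computed from Salmon's `quant.sf` files, which are tab-separated with `Name`, `Length`, `EffectiveLength`, `TPM` and `NumReads` columns. In this sketch the set of rRNA transcript names must be supplied by you (an assumption, not derived here), and `TSOconcatamer` is the record name from the FASTA above. The hypothetical `spread` helper gives a simple way to check whether the TSO fraction varies more between samples than the rRNA fraction.

```python
import csv
import io
from statistics import mean, pstdev


def source_fractions(quant_sf_text: str, rrna_names: set,
                     tso_name: str = "TSOconcatamer") -> dict:
    """Fraction of assigned fragments per source, from a Salmon quant.sf table.

    All rRNA transcripts are merged into a single "rRNA" unit; everything
    that is neither rRNA nor the TSO record counts as "transcriptome".
    """
    counts = {"transcriptome": 0.0, "rRNA": 0.0, "TSO": 0.0}
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    for row in reader:
        n = float(row["NumReads"])
        if row["Name"] == tso_name:
            counts["TSO"] += n
        elif row["Name"] in rrna_names:
            counts["rRNA"] += n
        else:
            counts["transcriptome"] += n
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


def spread(fractions_per_sample: list, source: str) -> tuple:
    """Mean and population standard deviation of one source's fraction."""
    values = [f[source] for f in fractions_per_sample]
    return mean(values), pstdev(values)
```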

To quickly investigate different concatemers in the data, I added a little tool to our readquant collection which counts the occurrences of a given sequence in each fragment of a pair of FASTQ files, and tabulates the number of fragments by copy number:

$ concatamer_filter.py fastq/20003_3#57_1.fastq fastq/20003_3#57_2.fastq AAGCAGTGGTATCAACGCAGAGTACATGGG
Copies,Fragments
0,667147
1,32750
2,33590
3,84431
4,8723
5,8
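The idea behind a table like this can be sketched in a few lines. The following is a hypothetical re-implementation, not the actual `concatamer_filter.py`: it walks two paired FASTQ files in lockstep, counts non-overlapping occurrences of the sequence and its reverse complement in both mates, and tabulates fragments by copy number. Counting the reverse complement is an assumption, and when the mates overlap, a copy lying in the overlap is counted twice.

```python
import gzip
import sys
from collections import Counter

# Hypothetical sketch of the idea, not the readquant tool itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def read_sequences(path):
    """Yield the sequence line of each 4-line FASTQ record (plain or gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:
                yield line.strip()


def copies_in(seq, motif):
    """Non-overlapping occurrences of motif or its reverse complement in seq."""
    # Assumes motif is not its own reverse complement (true for the TSO);
    # otherwise each copy would be counted twice.
    return seq.count(motif) + seq.count(revcomp(motif))


def tabulate_pairs(pairs, motif):
    """Counter mapping copies per fragment -> number of fragments."""
    table = Counter()
    for read1, read2 in pairs:
        table[copies_in(read1, motif) + copies_in(read2, motif)] += 1
    return table


def tabulate(r1_path, r2_path, motif):
    """Assumes both mate files list the read pairs in the same order."""
    return tabulate_pairs(zip(read_sequences(r1_path), read_sequences(r2_path)), motif)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Arguments: R1 FASTQ, R2 FASTQ, sequence to count.
    counts = tabulate(*sys.argv[1:4])
    print("Copies,Fragments")
    for copies in sorted(counts):
        print(f"{copies},{counts[copies]}")
```

Unlike the output above, this only prints copy numbers that actually occur, so gaps in the sequence 0, 1, 2, ... are possible.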

A typical strategy when investigating low mapping rates is to BLAST unmapped reads. Often this will return hits to scaffolds of the common carp genome (Cyprinus carpio). In fact, if you simply BLAST a 3x concatemer of the TSO, it will match all over the carp genome with 100% identity.

 
[Screenshot: BLAST hits of a 3x TSO concatemer against the common carp genome]
 

Finally, I should mention that Smart-seq2 isn't the only protocol that makes use of template switching. It is also used in STRT-seq, in the different flavours of Drop-seq (e.g. Seq-Well, DroNc-seq, etc.), and in the very popular 10X Genomics Chromium single cell solution.
