Malaria is one of the world's deadliest diseases with 200 million cases in a year, leading to half a million deaths. It is caused by mosquito-borne Plasmodium parasites. During parasitic infections our adaptive immune system responds through a complex series of cellular interactions, precipitated by events at the molecular level. The dynamic ability to suppress and activate genes allow immune cells to maintain surprising plasticity. Switching the states of the cells in response to infection require a network of gene regulatory events to occur.
At the heart of the adaptive immune system are T helper cells. Under healthy conditions, naive T helper cells are quiescent and roam through the body awaiting signals. Once T helper cells receive activation signals they differentiate into different roles depending on the type of infection. In malaria, naive cells differentiate into two distinct populations with different roles: Th1 cells, which signal macrophages and killer T cells to attack infected cells. And Tfh cells, which activate survival programs and promote antibody affinity in B cells.
Differentiation into a role is orchestrated by a small number of transcription factors regulating expression of large sets of genes. Identifying transcription factors that direct the differentiation fate is of key importance to understand the immune system, and to what degree in vivo cell-extrinsic factors bias the cell fate has not been studied in detail.
Immunology tightly links cellular biology and molecular genomics. Recently single cell RNA sequencing has gained massive popularity to investigate cells with this joint perspective (1). This technology allows researchers to measure expression of all genes in thousands of single cells. Immune cells however have particularly low amounts of RNA and many important genes, such as transcription factors, are present at low levels. To ensure scRNA-seq would be useful for us we evaluated many different technologies with ground truth molecular input levels (2). To our surprise we found that expression could be quantified even from tens of copies of RNA molecules!
To chart the immune response to malaria we designed a large scRNA-seq time course experiment using a mouse model. We collected T helper cells, monocytes, and dendritic cells from mice infected by the malaria parasite Plasmodium chabaudi at five points over seven days (3).
We used Gaussian process methods to identify and investigate genes with expression dynamics over time. These are nonparametric machine learning models that can represent and identify any smooth dynamic function. (Later we extended this method to identify spatially variable genes in spatial tissue gene expression systems (4).)
Unfortunately five time points are not sufficient to make conclusions about the dynamics of gene expression. However, we could exploit the fact that the immune cells are not synchronized in their response. An extension of the Gaussian process model let us estimate latent per-cell “time points” which could be earlier or later than cell collection times. This Gaussian process latent variable model simultaneously model many genes as different smooth functions, and correct time values so genes are consistent between each other. Now we could investigate the temporally dynamic genes and looked at when they started, peaked, and stopped expression. This way we learn the order transcriptional programs like activation or proliferation.
The most crucial aspect of our experiment was that the T helper cells had two different fates. Meaning any gene’s dynamic expression was the result of either of two functions: a Th1 fate function, or a Tfh fate function. We needed a way to analyse gradual commitment to either fate, without knowing which cells were on which path.
We solved this problem using a mixture of overlapping Gaussian processes. A mixture model is a statistical way to formalize the cell fate assignment problem described above. Machine learning algorithms can be used to infer the most likely groupings of cells into the components of the mixture. In our case cells would be assigned to either a “Tfh component”, or a “Th1 component”, with ambiguous assignments for uncommitted cells.
After fitting the mixture model to assign cells to fates, we could investigate how each individual gene related to the fates. This was the final key to uncovering transcriptional programs for the fate choice. This recapitulated the known Tfh markers Bcl6 and Cxcr5, but in particular let us associate the maintenance of Tcf7 with the Tfh fate. The mixture model identified the Tcf7 antagonist Id2 as the main transcription factor associated with the Th1 fate, along with many known Th1 genes.
The peak times of dynamic genes could be linked to when cells started committing to fates. This identified receptors which were compatible with signaling molecules expressed by the monocytes we had collected in parallel. We confirmed that mice with reduced numbers of monocytes produced smaller fractions of Th1 cells. (Mice with reduced number of B cells on the other hand produce smaller fractions of Tfh cells).
It was possible that T helper cells driven to either fate by the communicating cells had already been primed to respond to only these cells. Using the T cell receptor sequence in the T helper cells as a natural clonal barcode, we found that both Th1 and Tfh cells shared sequences, and must have originated from common naive progenitors.
In summary our data indicate that in malaria activated T helper cells start expressing Tcf7. After proliferation rate acceleration the cells express receptors which receive signals from monocytes or B cells. Cells receiving B cell signals maintain Tcf7 expression and become Tfh cells, while cells receiving monocyte signals replace Tcf7 expression with Id2 expression and become Th1 cells.
Biology is entering a new era with the rich measurements from single cell RNA sequencing. It allows us to use machine learning to ask explicit questions about the data. The power is particularly clear from the joint cellular and molecular perspective. Going forward I hope for more developments that allow us to ask and answer questions from this perspective, and more opportunities for methods like the overlapping mixture of Gaussian processes that can link the two.
References and notes
- V. Svensson, R. Vento-Tormo, S. A. Teichmann, Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 13, 599–604 (2018).
- V. Svensson et al., Power analysis of single-cell RNA-sequencing experiments. Nat. Methods (2017), doi:10.1038/nmeth.4220.
- T. Lönnberg et al., Single-cell RNA-seq and computational analysis using temporal mixture modeling resolves TH1/TFH fate bifurcation in malaria. Science Immunology. 2, eaal2192 (2017).
- V. Svensson, S. A. Teichmann, O. Stegle, SpatialDE: identification of spatially variable genes. Nat. Methods. 15, 343–346 (2018).
- I. C. Macaulay et al., Single-Cell RNA-Sequencing Reveals a Continuous Spectrum of Differentiation in Hematopoietic Cells. Cell Rep. 14, 966–977 (2016).
This post was originally an essay I submitted for a competition on work done in the field of genomics during PhD. Thanks to Sarah Teichmann, Lynn Yi, and Eduardo Beltrame for feedback on the text.