Merging SCVI models
There are massive amounts of single-cell RNA-seq data, and more is being produced all the time. The single-cell genomics community is currently training large models with these data, which can be used for downstream analysis tasks. This requires large efforts in collecting, formatting, and storing the data.
Once data is collected for training the models, the training itself needs to happen as well, requiring compute resources that are hard to come by.
Meanwhile, researchers in specific biological fields who analyze the data for specific questions might fit smaller models for their purposes. The authors of scvi-tools recently created infrastructure to share pre-trained SCVI models on Hugging Face (Ergen et al. 2024).
Can we reduce training work by making use of multiple models trained individually on smaller datasets?
Model merging
When generative image models became popular a few years ago, many different models were trained and fine-tuned to be able to generate images in specific domains. Some models were good at generating landscapes, others were good at generating drawings.
What if you wanted a model that was good at generating drawings of landscapes?
Ideally you would get the combined training sets, and fine-tune a stable diffusion model with them. But this is resource intensive. As an alternative, people found that taking the two fine-tuned models and ‘merging’ them can make a new model that leverages the strengths of both models (Wortsman et al. 2022).
In the generative image model and large language model communities there are multiple platforms and packages for model merging. It is a popular way for hobbyists to customize models since it requires no training resources. One package, mergekit, designed for merging popular large language models, suggests in its documentation starting with linear merging and NuSLERP merging.
Linear merging simply means taking a weighted average of the weights of two trained models,

$$
\theta_{\text{merged}} = \lambda \theta_1 + (1 - \lambda) \theta_2.
$$
NuSLERP is a more complicated merge method, which normalizes the weights to lie on a unit sphere, then performs the interpolation in spherical geometry,

$$
\theta_{\text{merged}} = \frac{\sin((1 - \lambda)\, \Omega)}{\sin \Omega} \theta_1 + \frac{\sin(\lambda\, \Omega)}{\sin \Omega} \theta_2, \qquad \Omega = \arccos\!\left( \frac{\theta_1 \cdot \theta_2}{\lVert \theta_1 \rVert \, \lVert \theta_2 \rVert} \right).
$$
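To make this concrete, here is a minimal sketch of both merge operations over PyTorch state dicts, assuming the two models share an identical architecture. The `slerp_merge` function implements standard SLERP per tensor; mergekit's actual NuSLERP has more options, so treat this as an approximation of the idea rather than a port.

```python
import torch

def linear_merge(sd_a, sd_b, alpha=0.5):
    """Element-wise weighted average of two matching state dicts."""
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

def slerp_merge(sd_a, sd_b, t=0.5, eps=1e-8):
    """Spherical interpolation of each weight tensor, treated as a flat vector."""
    merged = {}
    for k in sd_a:
        a = sd_a[k].flatten().float()
        b = sd_b[k].flatten().float()
        # Angle between the weight vectors after projection onto the unit sphere.
        cos_omega = torch.clamp(
            torch.dot(a / (a.norm() + eps), b / (b.norm() + eps)), -1.0, 1.0
        )
        omega = torch.arccos(cos_omega)
        if omega < eps:
            # Nearly parallel vectors: SLERP is ill-conditioned, fall back to linear.
            out = (1 - t) * a + t * b
        else:
            out = (
                torch.sin((1 - t) * omega) * a + torch.sin(t * omega) * b
            ) / torch.sin(omega)
        # Note: non-float buffers (if any) are blended naively here; this is a sketch.
        merged[k] = out.reshape(sd_a[k].shape).to(sd_a[k].dtype)
    return merged
```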
It is hard to find proper evaluations of whether these merging strategies actually work; in the generative image model community, users typically just experiment iteratively until they get desirable images. If we want to apply these techniques to SCVI models, we can evaluate them objectively using the reconstruction error.
Testing model merging on SCVI models
To evaluate whether we can combine strengths of two independent SCVI models, we want to model two very distinct pieces of biology with them, and see if merging the models will provide a clear benefit.
Brain cells, developed from the ectoderm, and liver cells, developed from the endoderm, perform extremely different functions. The main cell types (neurons and glia in the brain, hepatocytes in the liver) use distinct and specific transcriptional regulation to arrive at their final cell states over development.
Su et al. (2021) collected 82,168 cells from mouse liver to study nonalcoholic fatty liver disease (NAFLD).
Hahn et al. (2023) collected 109,826 cells from mouse brain to study the effects of aging across the brain.
We can use these two datasets to test how well model merging works: train a model on each dataset, merge the models, and evaluate the merged model.
To ensure that both the liver model and the brain model learned the same amount of information from the data, the datasets were limited to the intersection of 21,576 measured genes, and both datasets were downsampled to 80,000 cells. For evaluation, 10,000 of those cells were held out as a test set from each dataset, leaving 70,000 cells each for training.
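In code, this preprocessing amounts to a gene intersection followed by random subsampling. A sketch with scanpy is below; the `.h5ad` file names are hypothetical placeholders.

```python
import numpy as np
import scanpy as sc

rng = np.random.default_rng(0)

liver = sc.read_h5ad("su_2021_liver.h5ad")   # hypothetical file names
brain = sc.read_h5ad("hahn_2023_brain.h5ad")

# Restrict both datasets to the shared set of measured genes.
shared_genes = liver.var_names.intersection(brain.var_names)
liver = liver[:, shared_genes].copy()
brain = brain[:, shared_genes].copy()

def downsample_and_split(adata, n_cells=80_000, n_test=10_000):
    """Downsample to n_cells, then hold out n_test of them as a test set."""
    cells = rng.choice(adata.n_obs, size=n_cells, replace=False)
    return adata[cells[n_test:]].copy(), adata[cells[:n_test]].copy()

liver_train, liver_test = downsample_and_split(liver)
brain_train, brain_test = downsample_and_split(brain)
```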
Both the liver model and the brain model were trained for 20 epochs on the 70,000 cells.
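With scvi-tools the per-organ training is a few lines, continuing the sketch above (any non-default model hyperparameters are omitted here):

```python
import scvi

# Register the raw counts with scvi-tools and fit one model per organ.
scvi.model.SCVI.setup_anndata(liver_train)
model_liver = scvi.model.SCVI(liver_train)
model_liver.train(max_epochs=20)

scvi.model.SCVI.setup_anndata(brain_train)
model_brain = scvi.model.SCVI(brain_train)
model_brain.train(max_epochs=20)
```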
An even weighting of 0.5 was used when merging the models, both with linear merging and NuSLERP merging.
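Using the `linear_merge` and `slerp_merge` sketches from above, merged models can be assembled by loading the blended weights into a fresh SCVI model. The `make_merged_model` helper and the `is_trained_` flag manipulation below are assumptions about how to bypass scvi-tools' training bookkeeping, not an official API.

```python
def make_merged_model(adata, state_dict):
    """Instantiate a fresh SCVI model and load blended weights into it."""
    model = scvi.model.SCVI(adata)
    model.module.load_state_dict(state_dict)
    model.is_trained_ = True  # assumed: mark as trained so evaluation methods run
    return model

sd_liver = model_liver.module.state_dict()
sd_brain = model_brain.module.state_dict()

model_linear = make_merged_model(liver_train, linear_merge(sd_liver, sd_brain, alpha=0.5))
model_slerp = make_merged_model(liver_train, slerp_merge(sd_liver, sd_brain, t=0.5))
```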
For comparison, an SCVI model was trained on the combined dataset of 140,000 cells. The fairest comparison allots this combined model the same fitting budget as each of the individual models: 10 epochs over 140,000 cells passes through as many cells as 20 epochs over 70,000, so it was only allowed to train for 10 epochs. For an additional comparison, a fourth SCVI model was allowed to train on the combined data for 20 epochs.
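The comparison models can be fit on the concatenated data in the same way (a sketch; `anndata.concat` keeps the already intersected genes):

```python
import anndata as ad

combined_train = ad.concat([liver_train, brain_train])
scvi.model.SCVI.setup_anndata(combined_train)

# Same per-model fitting budget: 10 epochs over 140,000 cells.
model_combined_10 = scvi.model.SCVI(combined_train)
model_combined_10.train(max_epochs=10)

# Double the budget: 20 epochs over the combined data.
model_combined_20 = scvi.model.SCVI(combined_train)
model_combined_20.train(max_epochs=20)
```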
After training the individual models and performing the linear and NuSLERP merges, the two held-out test sets of 10,000 cells each, from brain and liver, were pushed through the six different models to calculate reconstruction errors.
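The evaluation loops over the six models and the two test sets. This assumes scvi-tools' `get_reconstruction_error` method, which returns a dictionary containing the reconstruction loss:

```python
models = {
    "liver": model_liver,
    "brain": model_brain,
    "linear merge": model_linear,
    "NuSLERP merge": model_slerp,
    "combined (10 epochs)": model_combined_10,
    "combined (20 epochs)": model_combined_20,
}

for name, model in models.items():
    for organ, test_set in [("liver", liver_test), ("brain", brain_test)]:
        err = model.get_reconstruction_error(test_set)["reconstruction_loss"]
        print(f"{name} model, {organ} test set: {err:,.0f}")
```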
The results give us some interesting information.
Applying a model trained on one organ to another organ is a bad idea. The individual models are highly specific to the domain they were trained on.
Model merging substantially degrades performance on in-domain data, but substantially improves out-of-domain performance. The resulting reconstruction errors appear similar to what you would get if you evaluated both models on the test data and averaged the results (somewhat better for the brain test data, 16,000 instead of 22,000).
You obtain a far better, more general model by combining the data and then fitting a single model, given an equal fitting budget.
Even with double the fitting budget, the model fitted on the combined data is marginally worse on each organ than the model fitted on that organ's data alone. This was a surprise: the difference is very small, but it is still present. Since the fitting was not repeated, the difference might be explained by stochasticity.
Unfortunately, model merging does not appear to be a good alternative to collecting large amounts of data and fitting a model on all of it.
Notebooks with code for the analysis described here are available on GitHub at https://github.com/vals/Blog/tree/master/250520-model-merging
References
Ergen, Can, Valeh Valiollah Pour Amiri, Martin Kim, Aaron Streets, Adam Gayoso, and Nir Yosef. 2024. “Scvi-Hub: An Actionable Repository for Model-Driven Single Cell Analysis.” bioRxiv. https://doi.org/10.1101/2024.03.01.582887.
Hahn, Oliver, Aulden G. Foltz, Micaiah Atkins, Blen Kedir, Patricia Moran-Losada, Ian H. Guldner, Christy Munson, et al. 2023. “Atlas of the Aging Mouse Brain Reveals White Matter as Vulnerable Foci.” Cell 186 (19): 4117-4133.e22.
Su, Qi, Sun Y. Kim, Funmi Adewale, Ye Zhou, Christina Aldler, Min Ni, Yi Wei, et al. 2021. “Single-Cell RNA Transcriptome Landscape of Hepatocytes and Non-Parenchymal Cells in Healthy and NAFLD Mouse Liver.” iScience 24 (11): 103233.
Wortsman, Mitchell, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, et al. 2022. “Model Soups: Averaging Weights of Multiple Fine-Tuned Models Improves Accuracy without Increasing Inference Time.” arXiv [cs.LG]. http://arxiv.org/abs/2203.05482.