Cloud GPUs for SCVI
When I moved last year I decided to get rid of my Windows PC. Windows is in a poor state. Later this year, Microsoft is ending support for Windows 10, and my motherboard doesn't support the TPM hardware required to run Windows 11, so staying on Windows would mean a new motherboard, which also means a new processor and RAM. The latest generation of Intel CPUs was plagued by severe quality issues and did not have stellar performance. The CrowdStrike event made it clear that Windows as a whole is not a reliable platform. On top of all this, Windows 11 seems like a terrible environment even when it works as expected, with advertisements, pop-ups, surprise settings changes, and on and on.
Meanwhile, I have used Mac laptops and desktops for work for a long time. Simple tasks, third-party software like the Adobe suite, and external hardware like cameras and audio devices are all substantially easier to deal with than on Windows.
My main motivation for keeping Windows was GPU acceleration, which is incredible even with cheaper cards. For example, Lightroom uses GPU acceleration for image processing.
One of my hobbies is playing with machine learning models like SCVI, which are painfully slow to train without a GPU.
After Apple moved to M series processors, much of what I had used the GPU for became incredibly fast on Apple silicon. So I got rid of my Windows PC and eventually got a Mac desktop.
While hardware acceleration for machine learning models on the M series processors is improving, it is still very slow and unreliable compared to GPU acceleration.
To keep playing with machine learning models, I looked around for cloud alternatives. I tried a few services and read user testimonials of several others. For now I have settled on lightning.ai.
On lightning.ai you have four GPU options on the basic account: the L40S, L4, A10G, and T4 Nvidia GPUs. Since the main models I play with are related to scvi-tools, I wanted to compare how these GPUs perform relative to their typical prices (which vary daily, but some are consistently cheaper than others). SCVI-based models are quite small by modern standards, so higher specs such as more GPU memory are not necessarily useful for them.
To benchmark the GPUs, I took a dataset with about 200k cells and trained a standard SCVI model for 10 epochs, noting the training times. I also ran the same training on my Mac desktop for comparison.
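For reference, the benchmark amounts to something like the sketch below. The file name and the assumption that raw counts live in adata.X are placeholders, not my exact setup; the scvi-tools calls themselves are the standard ones.

```python
import time

import anndata
import scvi

# Hypothetical file name; any AnnData object with raw counts works the same way.
adata = anndata.read_h5ad("cells_200k.h5ad")

# Register the data with scvi-tools (assumes raw counts are stored in adata.X).
scvi.model.SCVI.setup_anndata(adata)
model = scvi.model.SCVI(adata)

# Train for a fixed 10 epochs and time it.
# accelerator="gpu" targets the CUDA device on lightning.ai;
# on an Apple silicon Mac, accelerator="mps" selects the Metal backend instead.
start = time.time()
model.train(max_epochs=10, accelerator="gpu")
print(f"10 epochs took {time.time() - start:.0f} seconds")
```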
All the GPU options are substantially faster than the M4 Pro in the Mac. For my purposes, the L4 GPU, which ran the 10 epochs of training in 2:30, is the optimal choice. Only the L40S (which has 48 GB of memory compared to the 24 GB in the L4) was faster, at 2:16 for 10 epochs.