<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[nxn]]></title><description><![CDATA[What do you mean 'heterogeneity'?]]></description><link>https://www.nxn.se</link><image><url>https://substackcdn.com/image/fetch/$s_!Yq7P!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6948d04d-c52b-49f7-ace9-686a067382a2_600x600.png</url><title>nxn</title><link>https://www.nxn.se</link></image><generator>Substack</generator><lastBuildDate>Fri, 01 May 2026 01:47:19 GMT</lastBuildDate><atom:link href="https://www.nxn.se/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Valentine Svensson]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[nxnse@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[nxnse@substack.com]]></itunes:email><itunes:name><![CDATA[Valentine Svensson]]></itunes:name></itunes:owner><itunes:author><![CDATA[Valentine Svensson]]></itunes:author><googleplay:owner><![CDATA[nxnse@substack.com]]></googleplay:owner><googleplay:email><![CDATA[nxnse@substack.com]]></googleplay:email><googleplay:author><![CDATA[Valentine Svensson]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Improving SCVI for low-count cells through self-supervised augmentation]]></title><description><![CDATA[To improve utility of cells with low UMI counts, I wanted to optimize KL weighting in SCVI models.]]></description><link>https://www.nxn.se/p/improving-scvi-for-low-count-cells</link><guid isPermaLink="false">https://www.nxn.se/p/improving-scvi-for-low-count-cells</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Mon, 02 Mar 
2026 05:54:35 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KtLS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c64b9e5-1c24-4ad1-829f-366ddfbbab2f_3107x1028.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>To improve utility of cells with low UMI counts, I wanted to optimize KL weighting in SCVI models. It turned out that the issue with analyzing lower-UMI cells with SCVI is not due to posterior collapse from the KL term, but instead reflects a learned behavior of the encoder in the SCVI model. By including a joint embedding cross correlation loss objective, the encoders achieve better performance for low-UMI cells. All the details are in the resulting paper, available at <a href="https://www.biorxiv.org/content/10.64898/2026.02.11.705441v1">https://www.biorxiv.org/content/10.64898/2026.02.11.705441v1</a>. This post summarizes the main results.</strong></p><p>In <a href="https://www.nxn.se/p/r81gsph58us02le8prfd35nrvfmsvc">an earlier post</a> I demonstrated that when a cell has low UMI counts, an <a href="https://scvi-tools.org">SCVI model</a> will place it in the center of the representation space. In that post, I used a small subset of genes and looked at how representations in a 2-dimensional SCVI model depended on UMI depth.</p><p>In that post, the low-count behavior could be explained by the balance between the negative binomial reconstruction likelihood and the KL divergence with the prior, which pulls representations to the center. When counts were low the KL term dominated.</p><p>Using the small set of 141 genes, you could see which total UMI counts you needed to beat the KL term and move cells away from the prior in the center of representation space. As a follow-up, I wanted to understand what kind of UMI library sizes you need to beat the prior in actual transcriptome-wide data that I&#8217;d use for analysis. 
This way I could see what the lower limits of useful total UMI counts are, and maybe I could put a lower weight on the KL term in the SCVI model to make it work better for cells with lower counts. This would enable analysis with cheaper and lower quality data.</p><p>It turns out the KL term is not responsible for the low-UMI clustering behavior! In fact, low-UMI cells don&#8217;t converge towards the center. Instead, the SCVI encoder defines a <em>bias point</em> f([0, ..., 0]), where a hypothetical cell with zero counts from all genes is placed. This bias point is implicitly learned by the SCVI encoder during training, and it can be as far away from the center of representation space as high-UMI cells.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KtLS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c64b9e5-1c24-4ad1-829f-366ddfbbab2f_3107x1028.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KtLS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c64b9e5-1c24-4ad1-829f-366ddfbbab2f_3107x1028.png 424w, https://substackcdn.com/image/fetch/$s_!KtLS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c64b9e5-1c24-4ad1-829f-366ddfbbab2f_3107x1028.png 848w, https://substackcdn.com/image/fetch/$s_!KtLS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c64b9e5-1c24-4ad1-829f-366ddfbbab2f_3107x1028.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KtLS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c64b9e5-1c24-4ad1-829f-366ddfbbab2f_3107x1028.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KtLS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c64b9e5-1c24-4ad1-829f-366ddfbbab2f_3107x1028.png" width="1456" height="482" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c64b9e5-1c24-4ad1-829f-366ddfbbab2f_3107x1028.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:482,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:435770,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/189621773?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c64b9e5-1c24-4ad1-829f-366ddfbbab2f_3107x1028.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KtLS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c64b9e5-1c24-4ad1-829f-366ddfbbab2f_3107x1028.png 424w, https://substackcdn.com/image/fetch/$s_!KtLS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c64b9e5-1c24-4ad1-829f-366ddfbbab2f_3107x1028.png 848w, https://substackcdn.com/image/fetch/$s_!KtLS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c64b9e5-1c24-4ad1-829f-366ddfbbab2f_3107x1028.png 
1272w, https://substackcdn.com/image/fetch/$s_!KtLS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c64b9e5-1c24-4ad1-829f-366ddfbbab2f_3107x1028.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>It turns out the default KL term is far too weak to exert meaningful influence on the representations. To force convergence towards the center of representation space it needs to be upweighted about 100X. 
At that point, you will also start observing posterior collapse, where the model fails to learn meaningful representations.</p><p>Lowering the KL weight, on the other hand, does <em>not</em> reduce convergence towards the bias point as total UMIs per cell decrease. So if we want to improve SCVI for cells with lower UMI counts, what can we do instead?</p><p>The way I investigated the behavior was by subsampling UMI counts of cells using binomial thinning. One idea was to include this thinning procedure during training, increasing the depth-diversity the model sees. This didn&#8217;t improve the encoder: it still showed similar or worse performance on downstream tasks.</p><p>Instead of just showing the model more diverse training data, I added a <em><a href="https://arxiv.org/abs/2103.03230">joint embedding loss</a></em>. By passing original counts y and thinned counts y* through the encoder to make representations z and z*, we can then use a loss that penalizes the model when it places z and z* far apart from each other. In particular, I used a cross correlation loss to simultaneously encourage close representations of z and z* and avoid posterior collapse.</p><p>This strategy makes the model learn encoders that slow down the convergence towards the bias point as UMIs per cell decrease. 
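</p><p>As a rough illustration of the two ingredients described above, binomial thinning and a cross correlation loss between z and z*, here is a minimal NumPy sketch. It is not the implementation from the paper or the SCVI branch: the function names, the toy Poisson counts, the fixed random linear map standing in for the encoder, and the off-diagonal weight are all assumptions made for illustration. The loss follows the general form of the Barlow Twins objective from the arXiv paper linked above.</p><pre><code>import numpy as np


def binomial_thin(counts, keep_prob, rng):
    """Keep each UMI independently with probability keep_prob."""
    return rng.binomial(counts, keep_prob)


def cross_correlation_loss(z, z_star, off_diag_weight=5e-3, eps=1e-8):
    """Barlow Twins-style loss for two (cells x latent_dim) embeddings."""
    n_cells = z.shape[0]
    # Standardize every latent dimension across the batch of cells.
    z_n = (z - z.mean(axis=0)) / (z.std(axis=0) + eps)
    z_star_n = (z_star - z_star.mean(axis=0)) / (z_star.std(axis=0) + eps)
    # Cross-correlation between latent dimensions of the two views.
    c = z_n.T @ z_star_n / n_cells
    diag = np.diag(c)
    # Matching dimensions should correlate (close to 1) ...
    on_diag = np.sum((diag - 1.0) ** 2)
    # ... while different dimensions should not (close to 0).
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return on_diag + off_diag_weight * off_diag


rng = np.random.default_rng(0)
y = rng.poisson(2.0, size=(256, 50))  # toy counts: 256 cells x 50 genes
y_star = binomial_thin(y, keep_prob=0.25, rng=rng)  # thinned view of y

# Stand-in encoder (an assumption, not the SCVI encoder):
# a fixed random linear map applied to log1p counts.
W = rng.normal(size=(50, 10))
z = np.log1p(y) @ W
z_star = np.log1p(y_star) @ W

loss = cross_correlation_loss(z, z_star)
print(float(loss))
</code></pre><p>In the setup described in this post, a term of this kind is trained together with the usual SCVI reconstruction and KL terms rather than on its own. 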
This directly leads to improved performance in terms of preserving cell type identity or experimental conditions in the learned representations.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!w2E_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba9a077-11ce-4639-8bca-3503384ae5e8_3028x1294.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!w2E_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba9a077-11ce-4639-8bca-3503384ae5e8_3028x1294.png 424w, https://substackcdn.com/image/fetch/$s_!w2E_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba9a077-11ce-4639-8bca-3503384ae5e8_3028x1294.png 848w, https://substackcdn.com/image/fetch/$s_!w2E_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba9a077-11ce-4639-8bca-3503384ae5e8_3028x1294.png 1272w, https://substackcdn.com/image/fetch/$s_!w2E_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba9a077-11ce-4639-8bca-3503384ae5e8_3028x1294.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!w2E_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba9a077-11ce-4639-8bca-3503384ae5e8_3028x1294.png" width="1456" height="622" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bba9a077-11ce-4639-8bca-3503384ae5e8_3028x1294.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:622,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:807630,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/189621773?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba9a077-11ce-4639-8bca-3503384ae5e8_3028x1294.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!w2E_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba9a077-11ce-4639-8bca-3503384ae5e8_3028x1294.png 424w, https://substackcdn.com/image/fetch/$s_!w2E_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba9a077-11ce-4639-8bca-3503384ae5e8_3028x1294.png 848w, https://substackcdn.com/image/fetch/$s_!w2E_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba9a077-11ce-4639-8bca-3503384ae5e8_3028x1294.png 1272w, https://substackcdn.com/image/fetch/$s_!w2E_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba9a077-11ce-4639-8bca-3503384ae5e8_3028x1294.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Looking at reconstruction performance, the inclusion of the joint embedding loss has minimal impact on the ability to normalize gene expression or simulate gene UMI counts.</p><p>With this additional loss, QC thresholds for data used in analysis can be lowered, and data from precious archival samples can be used more effectively.</p><p>I also investigated if this joint embedding loss is sufficient. Maybe we don&#8217;t need to perform reconstruction at all to learn useful biological cell representations?</p><p>It turns out including the reconstruction loss is crucial. 
Models trained with joint embedding loss alone fail completely on downstream tasks like classifying clusters or separating experimental conditions.</p><p>The full paper is available at <a href="https://www.biorxiv.org/content/10.64898/2026.02.11.705441v1">https://www.biorxiv.org/content/10.64898/2026.02.11.705441v1</a>. The models are implemented in the SCVI branch at <a href="https://github.com/vals/scVI/tree/scvi-joint-embedding">https://github.com/vals/scVI/tree/scvi-joint-embedding</a>, and code for analysis and figures is available at <a href="https://github.com/vals/scvi-joint-embedding-reproducibility">https://github.com/vals/scvi-joint-embedding-reproducibility</a>.</p><h2>From around the web</h2><ul><li><p><a href="https://arxiv.org/abs/0907.2478">Why we (usually) don&#8217;t have to worry about multiple comparisons</a></p></li><li><p><a href="https://arjunrajlab.substack.com/p/chance-favors-the-theoretically-prepared">Chance favors the (theoretically) prepared mind</a></p></li><li><p><a href="https://press.asimov.com/articles/adjuvants">The Origins of Adjuvants</a></p></li><li><p><a href="https://www.publicbooks.org/the-misuses-of-the-university/">The Misuses of the University</a></p></li><li><p><a href="https://www.cell.com/cell/abstract/S0092-8674(25)00746-9">Interferons in health and disease</a></p></li></ul><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.flickr.com/photos/val_s/55088160058/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2QE5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff382414b-64fd-4774-93a9-7ba8b4b6e668_799x533.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!2QE5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff382414b-64fd-4774-93a9-7ba8b4b6e668_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!2QE5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff382414b-64fd-4774-93a9-7ba8b4b6e668_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2QE5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff382414b-64fd-4774-93a9-7ba8b4b6e668_799x533.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2QE5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff382414b-64fd-4774-93a9-7ba8b4b6e668_799x533.jpeg" width="799" height="533" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f382414b-64fd-4774-93a9-7ba8b4b6e668_799x533.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:533,&quot;width&quot;:799,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:108840,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.flickr.com/photos/val_s/55088160058/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/189621773?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff382414b-64fd-4774-93a9-7ba8b4b6e668_799x533.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!2QE5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff382414b-64fd-4774-93a9-7ba8b4b6e668_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!2QE5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff382414b-64fd-4774-93a9-7ba8b4b6e668_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!2QE5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff382414b-64fd-4774-93a9-7ba8b4b6e668_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2QE5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff382414b-64fd-4774-93a9-7ba8b4b6e668_799x533.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Serverless GPUs for fast TSNE visualization]]></title><description><![CDATA[TSNE visualization is useful to explore data but is time consuming without access to an Nvidia GPU.]]></description><link>https://www.nxn.se/p/serverless-gpus-for-fast-tsne-visualization</link><guid isPermaLink="false">https://www.nxn.se/p/serverless-gpus-for-fast-tsne-visualization</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Wed, 21 Jan 2026 07:16:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!B3Q-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9207028e-7b62-4970-acba-e15c8018bfa9_1200x750.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>TSNE visualization is useful to explore data but is time consuming without access to an Nvidia GPU. By using a serverless application at a cloud GPU provider you can seamlessly speed up TSNE visualization by four times compared to a fast locally running implementation.</p><p>To get intuition about latent representations and embeddings when modeling data, it is often useful to produce 2-dimensional visualizations using <a href="https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding">TSNE</a>. These can be used to explore what aspects of data the models are learning.</p><p>Creating TSNE visualizations was initially very time consuming and only worked for small datasets. 
Over time, data structures, algorithmic innovations, and approximations have increased the scale of data you can investigate with TSNE in a shorter time.</p><p>Recently, <a href="https://rapids.ai/">RapidsAI</a> released an extremely fast <a href="https://docs.rapids.ai/api/cuml/stable/api/#tsne">TSNE implementation</a> for Nvidia GPUs. If you have access to a consumer-level Nvidia GPU, hundreds of thousands of data points can be visualized in a few seconds.</p><p>If you do not have access to an Nvidia GPU, you can instead use <a href="https://opentsne.readthedocs.io/en/stable/index.html">openTSNE</a>, a package with several highly optimized implementations of TSNE. These implementations are quite fast, but still <a href="https://www.nxn.se/p/an-attempt-at-speeding-up-tsne-using">an order of magnitude slower than the RapidsAI TSNE</a>.</p><p>This makes it attractive to rent a GPU instance from a cloud vendor for interactive work. When exploring models and data, being able to quickly iterate between options and visualize results is very valuable. However, moving your work to a cloud instance means a lot of overhead. You also need to pay for running the GPU instance even though the GPU itself is only being used in occasional bursts.</p><h2><strong>Serverless deployment of GPU accelerated TSNE</strong></h2><p>As an alternative to renting a GPU instance, we can use a &#8216;serverless&#8217; GPU provider just for the TSNE visualization, the one step that leverages a GPU.</p><p>The GPU provider <a href="https://modal.com/">Modal</a> lets you define small applications consisting of single functions that can be executed using their API. When you call the function, a GPU instance starts up in a few seconds, runs the function on the input, and sends back the result to the client. The instance then shuts down after a few minutes of inactivity. 
In this model you only pay for the few seconds that the function takes to run.</p><p>I created a small Modal service which performs TSNE using a serverless GPU with a local Python client library: <a href="https://github.com/vals/gpu-embedding-service">https://github.com/vals/gpu-embedding-service</a>. With this, running RapidsAI&#8217;s TSNE is just a matter of doing</p><pre><code><code>from gpu_embedding import gpu_tsne
coords = gpu_tsne(X)</code></code></pre><p>No local GPU necessary. The data in <code>X</code> is sent to the service, which takes about ten seconds to start if it is not already running, runs TSNE, then sends back the result as an array to the local client.</p><h2><strong>Serverless TSNE gives results four times faster than local</strong></h2><p>Even with the overhead of needing to start the service and transfer the data, the service is about four times faster than running the equivalent TSNE locally on my Mac using 12 cores, on 10-dimensional input embeddings.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B3Q-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9207028e-7b62-4970-acba-e15c8018bfa9_1200x750.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B3Q-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9207028e-7b62-4970-acba-e15c8018bfa9_1200x750.png 424w, https://substackcdn.com/image/fetch/$s_!B3Q-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9207028e-7b62-4970-acba-e15c8018bfa9_1200x750.png 848w, https://substackcdn.com/image/fetch/$s_!B3Q-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9207028e-7b62-4970-acba-e15c8018bfa9_1200x750.png 1272w, https://substackcdn.com/image/fetch/$s_!B3Q-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9207028e-7b62-4970-acba-e15c8018bfa9_1200x750.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!B3Q-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9207028e-7b62-4970-acba-e15c8018bfa9_1200x750.png" width="1200" height="750" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9207028e-7b62-4970-acba-e15c8018bfa9_1200x750.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:750,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:89358,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/185273269?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9207028e-7b62-4970-acba-e15c8018bfa9_1200x750.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B3Q-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9207028e-7b62-4970-acba-e15c8018bfa9_1200x750.png 424w, https://substackcdn.com/image/fetch/$s_!B3Q-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9207028e-7b62-4970-acba-e15c8018bfa9_1200x750.png 848w, https://substackcdn.com/image/fetch/$s_!B3Q-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9207028e-7b62-4970-acba-e15c8018bfa9_1200x750.png 1272w, https://substackcdn.com/image/fetch/$s_!B3Q-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9207028e-7b62-4970-acba-e15c8018bfa9_1200x750.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>As input data grows larger (either in number of observations or dimensionality), the data transfer overhead will increase too, leading to diminishing returns. For many high-dimensional embeddings, this strategy won&#8217;t necessarily be useful compared to running openTSNE locally. However, in many cases the overhead is worth it.</p><p>For this benchmark I used 10-dimensional SCVI embeddings of the Asian Immune Diversity Atlas with <a href="https://www.nxn.se/p/scvi-with-variational-batch-encoding">variational batch encodings</a>.</p><p>Do you ever need to run TSNE on a million observations? 
Not really. With a random sample of ~100,000 observations from the data you probably have enough points to notice outlier populations even when faceting to ~10 categories. More data points allow you to facet the data more, but there is also a point when it is hard to read and interpret too many facets.</p><p>Scripts for benchmarking and plotting are available on GitHub at <a href="https://github.com/vals/Blog/tree/master/260120-serverless-tsne">https://github.com/vals/Blog/tree/master/260120-serverless-tsne</a></p><h1><strong>From around the web</strong></h1><ul><li><p><a href="https://doi.org/10.1126/science.abi5200">The spectrum of inflammatory responses</a></p></li><li><p><a href="https://ekernf01.github.io/target_gene_shenanigans">We predict that the transcript abundance from the gene we are knocking down will (checks notes) decrease</a></p></li><li><p><a href="https://blog.jck.bio/p/lifes-most-important-problems">Life&#8217;s most important problems</a></p></li></ul><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.flickr.com/photos/val_s/55022840560/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h0cq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe35eb082-9fd5-4c88-953f-21f681257d12_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!h0cq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe35eb082-9fd5-4c88-953f-21f681257d12_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!h0cq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe35eb082-9fd5-4c88-953f-21f681257d12_799x533.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!h0cq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe35eb082-9fd5-4c88-953f-21f681257d12_799x533.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h0cq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe35eb082-9fd5-4c88-953f-21f681257d12_799x533.jpeg" width="799" height="533" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e35eb082-9fd5-4c88-953f-21f681257d12_799x533.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:533,&quot;width&quot;:799,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:110784,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.flickr.com/photos/val_s/55022840560/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/185273269?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe35eb082-9fd5-4c88-953f-21f681257d12_799x533.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!h0cq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe35eb082-9fd5-4c88-953f-21f681257d12_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!h0cq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe35eb082-9fd5-4c88-953f-21f681257d12_799x533.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!h0cq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe35eb082-9fd5-4c88-953f-21f681257d12_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!h0cq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe35eb082-9fd5-4c88-953f-21f681257d12_799x533.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Training SCVI - Metal acceleration]]></title><description><![CDATA[The single 
cell analysis package scvi-tools allows you to model data quickly and easily if you have access to a GPU.]]></description><link>https://www.nxn.se/p/training-scvi-metal-acceleration</link><guid isPermaLink="false">https://www.nxn.se/p/training-scvi-metal-acceleration</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Thu, 15 Jan 2026 08:09:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-nrL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d238591-5d0d-4448-b58e-b5294658131c_1200x450.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The single-cell analysis package <a href="https://github.com/scverse/scvi-tools">scvi-tools</a> allows you to model data quickly and easily if you have access to a GPU. On a typical dataset (~100,000 cells), fitting a useful model takes a few minutes. However, if you don&#8217;t have access to a GPU, fitting a model is substantially slower, changing from &#8220;it can finish while I check my emails&#8221; to &#8220;I&#8217;ll work on something else until it&#8217;s finished&#8221;.</p><p>About a year ago, <a href="https://docs.scvi-tools.org/en/latest/changelog.html#id31">scvi-tools added support</a> for Metal Performance Shaders (MPS), Apple&#8217;s GPU acceleration framework for M-series Mac processors. If you&#8217;re analyzing single-cell data on a modern Mac, installing scvi-tools with MPS support and passing <code>accelerator="mps"</code> when training substantially speeds up SCVI model fitting without needing to move your work to a Linux server.</p><p>After optimizing a couple of training parameters (batch size: 512, learning rate: 0.004), useful models on typical datasets can be trained with MPS acceleration in about five minutes.</p><h2><strong>Training SCVI on CPU vs MPS</strong></h2><p>To test the improvement in training time we use a dataset by <a href="https://doi.org/10.1016/j.jpha.2023.02.006">Chen et al. 
2023</a>, where the authors collected 90,852 cells from mouse spleens to study the immune response to surgically induced sepsis.</p><p>Since we are just interested in the training times, we can fit the model for just five epochs to quickly get timings, as well as getting data on initial training loss dynamics. (For analysis, I would recommend at least 25 epochs for a dataset with 100,000 cells).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-nrL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d238591-5d0d-4448-b58e-b5294658131c_1200x450.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-nrL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d238591-5d0d-4448-b58e-b5294658131c_1200x450.png 424w, https://substackcdn.com/image/fetch/$s_!-nrL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d238591-5d0d-4448-b58e-b5294658131c_1200x450.png 848w, https://substackcdn.com/image/fetch/$s_!-nrL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d238591-5d0d-4448-b58e-b5294658131c_1200x450.png 1272w, https://substackcdn.com/image/fetch/$s_!-nrL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d238591-5d0d-4448-b58e-b5294658131c_1200x450.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-nrL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d238591-5d0d-4448-b58e-b5294658131c_1200x450.png" width="1200" 
height="450" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0d238591-5d0d-4448-b58e-b5294658131c_1200x450.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:450,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:25227,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/184635168?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d238591-5d0d-4448-b58e-b5294658131c_1200x450.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-nrL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d238591-5d0d-4448-b58e-b5294658131c_1200x450.png 424w, https://substackcdn.com/image/fetch/$s_!-nrL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d238591-5d0d-4448-b58e-b5294658131c_1200x450.png 848w, https://substackcdn.com/image/fetch/$s_!-nrL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d238591-5d0d-4448-b58e-b5294658131c_1200x450.png 1272w, https://substackcdn.com/image/fetch/$s_!-nrL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d238591-5d0d-4448-b58e-b5294658131c_1200x450.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Training five epochs on the CPU takes 6m 55s, while using the MPS accelerator takes only 1m 40s. This alone makes training about four times faster, but we can optimize it a bit further.</p><h2><strong>Increase batch size for faster MPS training</strong></h2><p>On Macs with M-series processors, MPS uses the same unified RAM as the CPU, so the accelerator has a large amount of memory to work with (at least relative to consumer-level GPUs). We can therefore send more data to the accelerator at a time by using larger batches. 
The default batch size in the SCVI trainer is 128.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!anuN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a4099d-f671-498f-ac2c-d3e282848cc8_1200x750.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!anuN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a4099d-f671-498f-ac2c-d3e282848cc8_1200x750.png 424w, https://substackcdn.com/image/fetch/$s_!anuN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a4099d-f671-498f-ac2c-d3e282848cc8_1200x750.png 848w, https://substackcdn.com/image/fetch/$s_!anuN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a4099d-f671-498f-ac2c-d3e282848cc8_1200x750.png 1272w, https://substackcdn.com/image/fetch/$s_!anuN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a4099d-f671-498f-ac2c-d3e282848cc8_1200x750.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!anuN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a4099d-f671-498f-ac2c-d3e282848cc8_1200x750.png" width="1200" height="750" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/65a4099d-f671-498f-ac2c-d3e282848cc8_1200x750.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:750,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:40402,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/184635168?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a4099d-f671-498f-ac2c-d3e282848cc8_1200x750.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!anuN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a4099d-f671-498f-ac2c-d3e282848cc8_1200x750.png 424w, https://substackcdn.com/image/fetch/$s_!anuN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a4099d-f671-498f-ac2c-d3e282848cc8_1200x750.png 848w, https://substackcdn.com/image/fetch/$s_!anuN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a4099d-f671-498f-ac2c-d3e282848cc8_1200x750.png 1272w, https://substackcdn.com/image/fetch/$s_!anuN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a4099d-f671-498f-ac2c-d3e282848cc8_1200x750.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Increasing the batch size to 2,048 brings the training time for five epochs down to 1m 17s.</p><h2><strong>Recover loss performance by scaling learning rate</strong></h2><p>Increasing the batch size comes at a cost. The model is only updated once per batch, and larger batches mean fewer updates per epoch. Going from a batch size of 128 to 2,048, for example, means roughly 16 times fewer updates per epoch. As a consequence, the optimizer explores the parameter space more slowly, which leads to worse models after the same number of epochs. 
We also observe this over the five epochs in the benchmark training models.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KY3z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d42568-7f8e-475e-9c7a-b6aa50b56eaf_1500x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KY3z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d42568-7f8e-475e-9c7a-b6aa50b56eaf_1500x900.png 424w, https://substackcdn.com/image/fetch/$s_!KY3z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d42568-7f8e-475e-9c7a-b6aa50b56eaf_1500x900.png 848w, https://substackcdn.com/image/fetch/$s_!KY3z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d42568-7f8e-475e-9c7a-b6aa50b56eaf_1500x900.png 1272w, https://substackcdn.com/image/fetch/$s_!KY3z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d42568-7f8e-475e-9c7a-b6aa50b56eaf_1500x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KY3z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d42568-7f8e-475e-9c7a-b6aa50b56eaf_1500x900.png" width="1456" height="874" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/40d42568-7f8e-475e-9c7a-b6aa50b56eaf_1500x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:874,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:149845,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/184635168?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d42568-7f8e-475e-9c7a-b6aa50b56eaf_1500x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KY3z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d42568-7f8e-475e-9c7a-b6aa50b56eaf_1500x900.png 424w, https://substackcdn.com/image/fetch/$s_!KY3z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d42568-7f8e-475e-9c7a-b6aa50b56eaf_1500x900.png 848w, https://substackcdn.com/image/fetch/$s_!KY3z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d42568-7f8e-475e-9c7a-b6aa50b56eaf_1500x900.png 1272w, https://substackcdn.com/image/fetch/$s_!KY3z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d42568-7f8e-475e-9c7a-b6aa50b56eaf_1500x900.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>To adjust for this effect, we can scale the learning rate of the training optimizer relative to the batch size. This way the model takes larger steps when there are fewer updates.</p><p>A common and effective approach is simply to increase the learning rate in proportion to the batch size. 
Since the default batch size of 128 and default learning rate 0.001 in SCVI generally work very well for training, we can set a batch size scaled learning rate as <code>batch_size * 0.001 / 128</code>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LnSg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9feffb98-22e7-42b3-89dc-77429b293c7d_1500x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LnSg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9feffb98-22e7-42b3-89dc-77429b293c7d_1500x900.png 424w, https://substackcdn.com/image/fetch/$s_!LnSg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9feffb98-22e7-42b3-89dc-77429b293c7d_1500x900.png 848w, https://substackcdn.com/image/fetch/$s_!LnSg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9feffb98-22e7-42b3-89dc-77429b293c7d_1500x900.png 1272w, https://substackcdn.com/image/fetch/$s_!LnSg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9feffb98-22e7-42b3-89dc-77429b293c7d_1500x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LnSg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9feffb98-22e7-42b3-89dc-77429b293c7d_1500x900.png" width="1456" height="874" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9feffb98-22e7-42b3-89dc-77429b293c7d_1500x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:874,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:130214,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/184635168?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9feffb98-22e7-42b3-89dc-77429b293c7d_1500x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LnSg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9feffb98-22e7-42b3-89dc-77429b293c7d_1500x900.png 424w, https://substackcdn.com/image/fetch/$s_!LnSg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9feffb98-22e7-42b3-89dc-77429b293c7d_1500x900.png 848w, https://substackcdn.com/image/fetch/$s_!LnSg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9feffb98-22e7-42b3-89dc-77429b293c7d_1500x900.png 1272w, https://substackcdn.com/image/fetch/$s_!LnSg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9feffb98-22e7-42b3-89dc-77429b293c7d_1500x900.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Changing the learning rate does not affect the training time, but it changes the training loss dynamics over epochs. Proportional learning rate scaling does not recover the loss performance for the largest batch sizes, but we can identify a balance between training time and performance.</p><h2><strong>Conclusion</strong></h2><p>Based on the experiments, we get good training dynamics with a batch size of 512 when scaling the learning rate to 0.004. With these settings, training five epochs takes 1m 23s. For a dataset with ~100,000 cells you usually get a useful SCVI model at 20-25 epochs, which would take around five minutes to train.</p><pre><code>Batch Size    LR      Time     Val Loss
128           0.001   100.0s   4911
256           0.002   89.0s    4880
512           0.004   83.1s    4856
1024          0.008   80.5s    4925
2048          0.016   77.7s    4968
4096          0.032   79.2s    5228</code></pre><p>I tried using SCVI with MPS a bit less than a year ago, but back then I couldn&#8217;t get it to work. If I had to guess, I had probably messed up my PyTorch installation. This time I didn&#8217;t have any issues getting it working. You can install scvi-tools with MPS support using <code>pip install -U "scvi-tools[metal]"</code> (the quotes keep shells such as zsh from interpreting the square brackets).</p><p>Over the last year I have been using the free tier of <a href="http://lightning.ai/">LightningAI</a> to train SCVI models for hobby projects. I like it, and might still use it if I want to test something on larger data. But it will be nice to try simpler experiments locally.</p><p>Scripts for these benchmarks are available on GitHub: <a href="https://github.com/vals/Blog/tree/master/260114-scvi-metal">https://github.com/vals/Blog/tree/master/260114-scvi-metal</a></p><h1><strong>From around the web</strong></h1><ul><li><p><a href="https://www.nature.com/articles/s41586-025-09233-2">Selective remodelling of the adipose niche in obesity and weight loss</a></p></li><li><p><a href="https://blog.turbine.ai/p/assessing-a-virtual-cells-utility">Assessing a Virtual Cell&#8217;s utility</a></p></li><li><p><a href="https://www.themoviedb.org/movie/39538-contagion?language=en-US">Contagion (2011)</a></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.flickr.com/photos/val_s/55022840535/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rvo9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a5e6838-9c37-4d1e-85ec-fd00fd2286e9_799x533.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!rvo9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a5e6838-9c37-4d1e-85ec-fd00fd2286e9_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rvo9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a5e6838-9c37-4d1e-85ec-fd00fd2286e9_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rvo9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a5e6838-9c37-4d1e-85ec-fd00fd2286e9_799x533.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rvo9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a5e6838-9c37-4d1e-85ec-fd00fd2286e9_799x533.jpeg" width="799" height="533" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a5e6838-9c37-4d1e-85ec-fd00fd2286e9_799x533.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:533,&quot;width&quot;:799,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:197996,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.flickr.com/photos/val_s/55022840535/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/184635168?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a5e6838-9c37-4d1e-85ec-fd00fd2286e9_799x533.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!rvo9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a5e6838-9c37-4d1e-85ec-fd00fd2286e9_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rvo9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a5e6838-9c37-4d1e-85ec-fd00fd2286e9_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rvo9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a5e6838-9c37-4d1e-85ec-fd00fd2286e9_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rvo9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a5e6838-9c37-4d1e-85ec-fd00fd2286e9_799x533.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[SCVI - Estimating null expression levels]]></title><description><![CDATA[In this post I motivate why I think it would be useful to have clear thresholds for when a gene is not expressed, test a strategy to obtain such thresholds, and learn it doesn&#8217;t work.]]></description><link>https://www.nxn.se/p/scvi-estimating-null-expression-levels</link><guid isPermaLink="false">https://www.nxn.se/p/scvi-estimating-null-expression-levels</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Tue, 06 Jan 2026 07:54:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!2oyR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8dcd9a-4bf6-4a0b-8371-a42c172a2e89_2400x3600.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>In this post I motivate why I think it would be useful to have clear thresholds for when a gene is not expressed, test a strategy to obtain such thresholds, and learn it doesn&#8217;t work.</strong></p><p>In cell biology, we think of discrete populations of cells, that we say do or do not express some (ideally) specific marker.</p><p>When we say that immune cells express CD45, that means &#8220;more CD45 than we can see in other cells from the same context&#8221;. T cells are considered expressing CD3, which means &#8220;more CD3 than other immune cells&#8221;, and defining CD4+ T helper cells means &#8220;T cells that express more CD4 than other T cells&#8221;.</p><p>Mentally, we are always operating on a relative scale. 
We work at the very edge of what technologies can detect. To show that something exists, we compare it to a negative control where we know it doesn&#8217;t exist.</p><p>In single cell genomics, a gene being highly expressed in a cell can have two different meanings: the gene is highly expressed in that cell relative to other cells, or the gene is highly expressed relative to other genes in that cell.</p><p>This relative relation between gene expression levels in individual cells is explicitly modeled in <a href="https://scvi-tools.org/">SCVI</a>. The SCVI model takes observed UMI counts as input, encodes them to representation vectors z, then decodes them to relative abundance levels rho for each gene. A softmax operation at the end of the decoder ensures that the relative abundances sum to 1:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{aligned}\nw &amp;= f(z), \\\\\n\\rho_g &amp;= \\sigma(w)_g = \\frac{\\exp(w_g)}{\\sum_{i = 1}^G \\exp(w_i)}.\n\\end{aligned}&quot;,&quot;id&quot;:&quot;RXPJDLEALH&quot;}" data-component-name="LatexBlockToDOM"></div><p>During training, the relative abundances, rho, are combined with the observed library size, l, to quantify whether the predicted relative abundance is compatible with the observed UMI counts of a gene g in a cell:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{Reconstruction error } = \\sum_{c, g} \\text{NB}(y_{c, g} | \\rho_{c, g} \\cdot \\ell, \\phi_g).&quot;,&quot;id&quot;:&quot;CUTUIVBOBA&quot;}" data-component-name="LatexBlockToDOM"></div><p>The predicted rho values can be treated as normalized expression levels. Normalized, since they aim to remove the observed variation due to differences in library size. 
They also tend to have higher resolution than observed UMI counts, summarizing informative coexpression relationships between all the observed input genes through the encoder and decoder, resulting in continuous rather than discrete values.</p><p>These normalized expression values are great for exploring single cell RNA-sequencing data. Many cell types and cell states are defined by having &#8216;high&#8217; expression of some gene, then additionally &#8216;high&#8217; or &#8216;low&#8217; expression of some other gene.</p><p>As an example, we can use a recent dataset by <a href="https://doi.org/10.1016/j.isci.2025.114219">Michki et al. 2025</a>. In the paper, the authors investigate the effects of E. coli infection in juvenile and neonatal mouse lungs. They collected lungs at 48 hours and 96 hours after infecting the mice with E. coli, as well as negative control samples that can be used for comparison.</p><p>After fitting and saving an SCVI model, we can use the model to generate normalized expression levels rho for any genes we are interested in. If we want to study T cells, we might produce normalized expression levels for the genes Ptprc and Cd3d. Ptprc is the gene encoding CD45, a transmembrane receptor which enables fundamental external-to-internal protein-protein interactions required for almost all immune cells. Its presence is a classical global marker for immune cells. 
The gene Cd3d encodes the delta chain of the CD3 protein complex, which together with the T cell receptor complex enables antigen recognition by T cells.</p><p>To explore the lung cells from the experiment, we can plot the bivariate distribution of Ptprc and Cd3d expression.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2oyR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8dcd9a-4bf6-4a0b-8371-a42c172a2e89_2400x3600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2oyR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8dcd9a-4bf6-4a0b-8371-a42c172a2e89_2400x3600.png 424w, https://substackcdn.com/image/fetch/$s_!2oyR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8dcd9a-4bf6-4a0b-8371-a42c172a2e89_2400x3600.png 848w, https://substackcdn.com/image/fetch/$s_!2oyR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8dcd9a-4bf6-4a0b-8371-a42c172a2e89_2400x3600.png 1272w, https://substackcdn.com/image/fetch/$s_!2oyR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8dcd9a-4bf6-4a0b-8371-a42c172a2e89_2400x3600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2oyR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8dcd9a-4bf6-4a0b-8371-a42c172a2e89_2400x3600.png" width="1456" height="2184" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5e8dcd9a-4bf6-4a0b-8371-a42c172a2e89_2400x3600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2184,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1886144,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/183646189?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8dcd9a-4bf6-4a0b-8371-a42c172a2e89_2400x3600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2oyR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8dcd9a-4bf6-4a0b-8371-a42c172a2e89_2400x3600.png 424w, https://substackcdn.com/image/fetch/$s_!2oyR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8dcd9a-4bf6-4a0b-8371-a42c172a2e89_2400x3600.png 848w, https://substackcdn.com/image/fetch/$s_!2oyR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8dcd9a-4bf6-4a0b-8371-a42c172a2e89_2400x3600.png 1272w, https://substackcdn.com/image/fetch/$s_!2oyR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8dcd9a-4bf6-4a0b-8371-a42c172a2e89_2400x3600.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We can clearly see a population of cells with low Ptprc expression, and another population with high Ptprc expression. The latter is most likely a population of immune cells. We can also see that, within the Ptprc-high population, there is a subpopulation of cells with &#8216;high&#8217; Cd3d expression. These are likely T cells.</p><p>The plot is faceted over experimental conditions to highlight how consistent these &#8216;high&#8217; and &#8216;low&#8217; expressing populations are between samples. Nested within these highlighted experimental conditions are also neonatal and juvenile conditions, and biological replicates.</p><p>With the scRNA-seq data though, we can dive further into the Cd3d-high T cells. There are multiple subsets of T cells. 
This includes Th17 cells, an effector subset of CD4+ T helper cells which provides protection at barrier tissues (like in lungs) by secreting the cytokine IL-17, which in turn activates nearby cells to produce chemokines to recruit neutrophils.</p><p>Production of IL-17 is regulated by the transcription factor ROR&#947;t, which is encoded by the gene Rorc in mice. Expression of Rorc in T cells defines the ability to perform the IL-17 production function of Th17 cells. So we can look at the bivariate distribution of Cd3d expression and Rorc expression.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!v7tJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73d0ed23-7f65-4c4e-9e63-f3c1b4282f93_2400x3600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!v7tJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73d0ed23-7f65-4c4e-9e63-f3c1b4282f93_2400x3600.png 424w, https://substackcdn.com/image/fetch/$s_!v7tJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73d0ed23-7f65-4c4e-9e63-f3c1b4282f93_2400x3600.png 848w, https://substackcdn.com/image/fetch/$s_!v7tJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73d0ed23-7f65-4c4e-9e63-f3c1b4282f93_2400x3600.png 1272w, https://substackcdn.com/image/fetch/$s_!v7tJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73d0ed23-7f65-4c4e-9e63-f3c1b4282f93_2400x3600.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!v7tJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73d0ed23-7f65-4c4e-9e63-f3c1b4282f93_2400x3600.png" width="1456" height="2184" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/73d0ed23-7f65-4c4e-9e63-f3c1b4282f93_2400x3600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2184,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2070832,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/183646189?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73d0ed23-7f65-4c4e-9e63-f3c1b4282f93_2400x3600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!v7tJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73d0ed23-7f65-4c4e-9e63-f3c1b4282f93_2400x3600.png 424w, https://substackcdn.com/image/fetch/$s_!v7tJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73d0ed23-7f65-4c4e-9e63-f3c1b4282f93_2400x3600.png 848w, https://substackcdn.com/image/fetch/$s_!v7tJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73d0ed23-7f65-4c4e-9e63-f3c1b4282f93_2400x3600.png 1272w, https://substackcdn.com/image/fetch/$s_!v7tJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73d0ed23-7f65-4c4e-9e63-f3c1b4282f93_2400x3600.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Within the Cd3d-high population of cells, we can see a number of cells with &#8216;high&#8217; expression of Rorc, most likely indicating Th17 cells.</p><p>We can note from these examples that on the normalized relative abundance scale of gene expression, expression levels are never exactly 0. This is a fundamental property of the relative abundance scale that the rho values are defined on and that is used to model the observed counts: the softmax output is strictly positive for any finite decoder output.</p><p>The rho values can be interpreted as the probability of seeing a molecule from a given gene in a cell if you sampled a new molecule. 
In this setting, a rho value of 0 is a very strong statement: it means it is impossible to see molecules from that gene. Instead, the values can be very low, making it extremely unlikely we&#8217;d see a molecule, yet still possible. A rho value of 1e-9 means that if we sampled a billion molecules, we should expect to see only one molecule produced from the gene.</p><p>On the other hand, we can easily detect populations with &#8216;low&#8217; or &#8216;high&#8217; normalized expression in different genes. The subdivisions into subpopulations make sense on the relative scale defined by the full dataset, even though we don&#8217;t have a clear threshold in the form of &#8216;if a gene has normalized expression value above X in a cell, it expresses the gene&#8217;.</p><p>What if we could have the model infer and report at which level a gene is definitely expressed?</p><p>With this, we could automatically set thresholds for when genes are <em>actually</em> expressed, and distinguish cells where we definitely should not expect to see a molecule from a gene, ever.</p><p>To learn what rho values an SCVI model considers as &#8216;0&#8217;, we can fit the model on a modified dataset where we artificially add a gene called &#8216;NULL_CONTROL&#8217; that has observed 0 expression in all cells. Then we can decode the normalized expression rho for the &#8216;NULL_CONTROL&#8217; in every cell.</p><p>For any real gene, we can compare its expression level in a cell to the expression level of &#8216;NULL_CONTROL&#8217; in the same cell. 
If the gene expression is similar to the &#8216;NULL_CONTROL&#8217; expression, the model can&#8217;t distinguish the expression level from 0 given the data.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dOYn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa6d6af-f75a-42be-9b03-3873dbafe61a_2400x3600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dOYn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa6d6af-f75a-42be-9b03-3873dbafe61a_2400x3600.png 424w, https://substackcdn.com/image/fetch/$s_!dOYn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa6d6af-f75a-42be-9b03-3873dbafe61a_2400x3600.png 848w, https://substackcdn.com/image/fetch/$s_!dOYn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa6d6af-f75a-42be-9b03-3873dbafe61a_2400x3600.png 1272w, https://substackcdn.com/image/fetch/$s_!dOYn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa6d6af-f75a-42be-9b03-3873dbafe61a_2400x3600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dOYn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa6d6af-f75a-42be-9b03-3873dbafe61a_2400x3600.png" width="1456" height="2184" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3fa6d6af-f75a-42be-9b03-3873dbafe61a_2400x3600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2184,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1985016,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/183646189?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa6d6af-f75a-42be-9b03-3873dbafe61a_2400x3600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dOYn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa6d6af-f75a-42be-9b03-3873dbafe61a_2400x3600.png 424w, https://substackcdn.com/image/fetch/$s_!dOYn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa6d6af-f75a-42be-9b03-3873dbafe61a_2400x3600.png 848w, https://substackcdn.com/image/fetch/$s_!dOYn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa6d6af-f75a-42be-9b03-3873dbafe61a_2400x3600.png 1272w, https://substackcdn.com/image/fetch/$s_!dOYn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa6d6af-f75a-42be-9b03-3873dbafe61a_2400x3600.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>What we see though, is that the predicted expression levels from the artificial all-zero &#8216;NULL_CONTROL&#8217; are extremely low in all cells. In the plots, the dashed line is the identity line (y = x), and Rorc expression is higher than &#8216;NULL_CONTROL&#8217; expression in all cells.</p><p>Unfortunately, this means that this strategy won&#8217;t work to automatically define cells truly expressing a gene. The way the model sees it, you are more likely to see molecules from Rorc than from &#8216;NULL_CONTROL&#8217;. This is probably due to a very small number of UMIs being counted in a way that doesn&#8217;t respect the cell populations. This can happen for experimental measurement reasons or because of spurious read assignment. 
In the end, the data is informing the model that regardless of the cell, there is a small (yet nonzero) chance of observing Rorc molecules. In a way, it is positive that the model is respecting the uncertainty in the true data.</p><p>Yet, our lives would be easier if we had a clear threshold for defining, for example, CD4+ T cells, or other cell types with known markers. Empirically, I have found that a threshold of 1e-4 is a good first guess at separating &#8216;high&#8217; from &#8216;low&#8217; expressing cells (meaning, it often reproduces known cell type hierarchies). The scale of the rho values will depend on how many genes are in the dataset the model is fitted to, and more generally on the complexity of the full population of cells.</p><p>This was a pretty quick experiment of adding the artificial &#8216;NULL_CONTROL&#8217; before fitting an SCVI model. I felt the post needed to be quite long though, because I never see people describing these gene distribution explorations of scRNA-seq data, and outside the context of this relative-expression-based workflow it&#8217;s not so clear why it would be useful to obtain true 0 thresholds.</p><p>Code and notebook for this analysis are available on <a href="https://github.com/vals/Blog/tree/master/260105-null-expression">GitHub</a>.</p><h1><strong>From around the web</strong></h1><ul><li><p><a href="https://www.biorxiv.org/content/10.1101/2023.11.28.568839v2">Interpretable Inflammation Landscape of Circulating Immune cells</a></p></li><li><p><a href="https://giovannipalla.substack.com/p/virtual-cell-perturbation-metrics">Virtual Cell Perturbation Metrics Reloaded</a></p></li></ul><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.flickr.com/photos/val_s/55022840535/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!0QLo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5efdc8c2-b180-4f44-ae2f-e17e237a974b_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0QLo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5efdc8c2-b180-4f44-ae2f-e17e237a974b_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0QLo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5efdc8c2-b180-4f44-ae2f-e17e237a974b_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0QLo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5efdc8c2-b180-4f44-ae2f-e17e237a974b_799x533.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0QLo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5efdc8c2-b180-4f44-ae2f-e17e237a974b_799x533.jpeg" width="799" height="533" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5efdc8c2-b180-4f44-ae2f-e17e237a974b_799x533.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:533,&quot;width&quot;:799,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:197996,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.flickr.com/photos/val_s/55022840535/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/183646189?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5efdc8c2-b180-4f44-ae2f-e17e237a974b_799x533.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0QLo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5efdc8c2-b180-4f44-ae2f-e17e237a974b_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0QLo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5efdc8c2-b180-4f44-ae2f-e17e237a974b_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0QLo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5efdc8c2-b180-4f44-ae2f-e17e237a974b_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0QLo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5efdc8c2-b180-4f44-ae2f-e17e237a974b_799x533.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[Claude agent skill to infer experimental designs from .h5ad’s]]></title><description><![CDATA[Anthropic recently launched Claude for Life Sciences, which in practical terms means a marketplace of plugins, MCP&#8217;s, and agent skills focused on tasks you encounter in biomedical research.]]></description><link>https://www.nxn.se/p/claude-agent-skill-to-infer-experimental</link><guid isPermaLink="false">https://www.nxn.se/p/claude-agent-skill-to-infer-experimental</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Fri, 14 Nov 2025 05:56:37 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e286fa5b-625b-484b-ac1b-dcef5ac68e7e_566x381.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<p>Anthropic recently launched <a href="https://www.anthropic.com/news/claude-for-life-sciences">Claude for Life Sciences</a>, which in practical terms means a marketplace of plugins, MCP&#8217;s, and agent skills focused on <a href="https://github.com/anthropics/life-sciences/tree/main">tasks you encounter in biomedical research</a>.</p><p>Part of this is an agent skill to <a href="https://github.com/anthropics/life-sciences/blob/main/single-cell-rna-qc/SKILL.md">perform QC analysis for scRNA-seq data using scanpy</a>. Agent skills were also <a href="https://www.claude.com/blog/skills">introduced recently</a>. These are folders with instructions about when to use the skill, what to do when the skill gets triggered, and helper tools in the form of CLI scripts to perform the tasks needed.</p><p>When I want to explore some idea in scRNA-seq analysis, I usually start by finding a new dataset to work with. I get tired of using the same dataset all the time, and different experimental designs are good for trying different ideas.</p><p>Once I get an interesting new dataset, I usually try to figure out the design of the experiment based on how the original researchers structured the data. There are many pieces of information that often tell the story of what was done. The data might be in different files, where the file names are informative. The different barcodes can indicate which experimental conditions were performed first or last. Naming conventions in samples tell you what the original idea was, what might have been tacked on later, and which sensible comparisons the researchers didn&#8217;t report on for some reason. 
It is a puzzle to solve.</p><p>This can take some time, so I created a Claude agent skill to do this, available at <a href="https://github.com/vals/anndata-design-inspector">https://github.com/vals/anndata-design-inspector</a>.</p><p>The skill provides helper scripts that use <a href="https://support.hdfgroup.org/documentation/hdf5/latest/_view_tools_command.html">hdf5-tools</a> to explore column names and categories of a provided .h5ad file straight from the command line without needing to load the full dataset. If the tools aren&#8217;t available, the skill will install them.</p><p>The goal of the skill is to describe the experimental design of the dataset. I wanted this to be concise and easy to compare between datasets. There has been some work on this, <a href="https://arxiv.org/abs/1912.08567">for example using Hasse diagrams</a>. The currently available options are very flexible and complete, but unintuitive in the settings I usually encounter. To help with this, I created <a href="https://github.com/vals/edviz">a domain-specific language that defines a grammar for experimental designs</a>. The goal of the agent then becomes to produce a string in this grammar that describes the experiment. The package with the grammar definition also includes code for parsing a string in the grammar into an ASCII-art visualization of the structure of the experiment. This package is also installed by the skill to create visualizations.</p><pre><code><code>&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472; Design Structure &#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;                                                                &#9474;
&#9474; ProcessingBatch(6)&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9559;   &#9474;
&#9474;                                                            &#9553;   &#9474;
&#9474; Center(3)  &#8776;&#8776;&#8776;&#8776;  Protocol(2)                               &#9553;   &#9474;
&#9474;    &#8595;                &#8595;                                      &#9553;   &#9474;
&#9474; Patient([30 | 25 | 18])                                    &#9553;   &#9474;
&#9474;    &#8595;                                                       &#9553;   &#9474;
&#9474; Sample(2) &#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9565;   &#9474;
&#9474;    &#8595;                                                           &#9474;
&#9474; Cell(~5000)                                                    &#9474;
&#9474;    :                                                           &#9474;
&#9474; CellType(42)                                                   &#9474;
&#9474;                                                                &#9474;
&#9474;   Confounded: Center &#8776;&#8776; Protocol                               &#9474;
&#9474;   Batch: ProcessingBatch &#9552;&#9552; Sample                             &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
</code></code></pre><p>In addition to puzzling out the experimental design, the agent using the skill infers the research context of the experiment and provides some comments about the biological theory behind the experiment.</p><p>This all gets reported in a summarized &#8216;experiment card&#8217; markdown file following a standard structure. With these markdown files I can get an idea of what the experiment was at a glance. I put up a couple of examples of experiment cards, for the datasets <a href="https://gist.github.com/vals/d4dbcd64e1f0f376a28d5a938dedf20e">GSE166504</a> and <a href="https://gist.github.com/vals/04a9c6813199b1cb0588cd734ecfa48a">GSE290106</a>, as gists.</p><p>The visualizations simplify away many subtleties of the design, but are designed to highlight its important aspects, such as crossing and nesting at different levels, which helps you think about how to handle the different types of variation and confounding present in the data.</p><p>I&#8217;ve used it on ~20 different datasets, and it usually performs quite well. Sometimes it over-summarizes the experiment so you might miss some hierarchical structures, but it tends to give you a lot of information about what was going on in the experiment.</p><h1>From around the web</h1><ul><li><p><a href="https://www.nature.com/articles/s41592-025-02808-x">Deep generative modeling of sample-level heterogeneity in single-cell genomics</a></p></li><li><p><a href="https://grantland.com/features/the-strange-case-super-mario-bros-movie/">Hollywood Archaeology: The Super Mario Bros. 
Movie</a></p></li><li><p><a href="https://www.filfre.net/2024/10/the-truth-is-out-there-part-4-the-downside-of-belief/">The Truth Is Out There, Part 4: The Downside of Belief</a></p></li><li><p><a href="https://blog.genesmindsmachines.com/p/we-still-cant-predict-much-of-anything">We still can&#8217;t predict much of anything in biology</a></p></li><li><p><a href="https://www.writingruxandrabio.com/p/what-will-it-take-for-ai-to-change">What will it take for AI to change drug discovery?</a></p></li></ul><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.flickr.com/photos/val_s/54702071677/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gb0v!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fcd2080-36b0-46c8-a5a9-7fdb3497d441_799x432.jpeg 424w, https://substackcdn.com/image/fetch/$s_!gb0v!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fcd2080-36b0-46c8-a5a9-7fdb3497d441_799x432.jpeg 848w, https://substackcdn.com/image/fetch/$s_!gb0v!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fcd2080-36b0-46c8-a5a9-7fdb3497d441_799x432.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!gb0v!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fcd2080-36b0-46c8-a5a9-7fdb3497d441_799x432.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gb0v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fcd2080-36b0-46c8-a5a9-7fdb3497d441_799x432.jpeg" width="799" height="432" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8fcd2080-36b0-46c8-a5a9-7fdb3497d441_799x432.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:432,&quot;width&quot;:799,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:69693,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.flickr.com/photos/val_s/54702071677/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/178861597?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fcd2080-36b0-46c8-a5a9-7fdb3497d441_799x432.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gb0v!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fcd2080-36b0-46c8-a5a9-7fdb3497d441_799x432.jpeg 424w, https://substackcdn.com/image/fetch/$s_!gb0v!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fcd2080-36b0-46c8-a5a9-7fdb3497d441_799x432.jpeg 848w, https://substackcdn.com/image/fetch/$s_!gb0v!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fcd2080-36b0-46c8-a5a9-7fdb3497d441_799x432.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!gb0v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8fcd2080-36b0-46c8-a5a9-7fdb3497d441_799x432.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[Training SCVI — Differential expression over epochs]]></title><description><![CDATA[With a trained SCVI model you can perform differential expression analysis to learn which genes are increased and decreased in gene expression as cells change their states.]]></description><link>https://www.nxn.se/p/training-scvi-differential-expression</link><guid isPermaLink="false">https://www.nxn.se/p/training-scvi-differential-expression</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Tue, 14 Oct 2025 06:13:14 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!_V_T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefebeec-5730-4c76-9199-898e7c7a727c_1500x900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>With a trained <a href="https://docs.scvi-tools.org/en/stable/user_guide/models/scvi.html">SCVI model</a> you can perform <a href="https://www.nxn.se/p/vaes-are-explainable-differential-expression-in-scvi">differential expression analysis</a> to learn which genes are increased and decreased in gene expression as cells change their states.</p><p>Previously, we looked at <a href="https://www.nxn.se/p/training-scvi-posterior-predictive-distributions-over-epochs">how posterior predictive distributions change during model fitting</a>, where we saw that they stabilized after relatively few epochs.</p><p>Here, we want to see how differential expression results change as an SCVI model fits to a dataset.</p><p>As an example, we can use a recent dataset from <a href="https://www.nature.com/articles/s41590-024-01860-7">Li et al. 2024</a>. The paper describes identification of a population of stem-like T cells that are enriched in inflamed areas of the colon in patients with ulcerative colitis. The authors isolated T cells from colons of healthy donors and ulcerative colitis patients. From patients with active ulcerative colitis, they collected T cells from non-inflamed tissues and inflamed tissues from 10 patients. This is a nice experimental design that allows comparison of gene expression levels between the inflamed and non-inflamed tissues.</p><p>The authors annotated 11 different T cell subsets, but applying a filter to retain only T cell populations with at least 50 cells in both inflamed and non-inflamed tissues leaves eight subsets.</p><p>We fit the SCVI model to the full dataset, 15,411 cells with measurements of 14,203 genes. 
For the differential expression analysis we restrict the cells to the eight T cell subsets with sufficient cell numbers and the paired inflamed/non-inflamed tissues. We fit the SCVI model for a total of 100 epochs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_V_T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefebeec-5730-4c76-9199-898e7c7a727c_1500x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_V_T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefebeec-5730-4c76-9199-898e7c7a727c_1500x900.png 424w, https://substackcdn.com/image/fetch/$s_!_V_T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefebeec-5730-4c76-9199-898e7c7a727c_1500x900.png 848w, https://substackcdn.com/image/fetch/$s_!_V_T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefebeec-5730-4c76-9199-898e7c7a727c_1500x900.png 1272w, https://substackcdn.com/image/fetch/$s_!_V_T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefebeec-5730-4c76-9199-898e7c7a727c_1500x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_V_T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefebeec-5730-4c76-9199-898e7c7a727c_1500x900.png" width="1456" height="874" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cefebeec-5730-4c76-9199-898e7c7a727c_1500x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:874,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:56542,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/176114604?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefebeec-5730-4c76-9199-898e7c7a727c_1500x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_V_T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefebeec-5730-4c76-9199-898e7c7a727c_1500x900.png 424w, https://substackcdn.com/image/fetch/$s_!_V_T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefebeec-5730-4c76-9199-898e7c7a727c_1500x900.png 848w, https://substackcdn.com/image/fetch/$s_!_V_T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefebeec-5730-4c76-9199-898e7c7a727c_1500x900.png 1272w, https://substackcdn.com/image/fetch/$s_!_V_T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefebeec-5730-4c76-9199-898e7c7a727c_1500x900.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We can see in the loss curves that training set loss is still decreasing, but the validation set loss appears to have converged after about 50 epochs.</p><p>After each epoch, we stop training and perform differential expression analysis per cell type between inflamed and non-inflamed tissue. The results are stored, then visualized as volcano plots for each epoch. 
To see how the results change, we convert them to an animation that can be viewed below:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Hf79!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb61731cb-e79d-4417-a1f9-fbda00aaa2c3_800x686.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Hf79!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb61731cb-e79d-4417-a1f9-fbda00aaa2c3_800x686.gif 424w, https://substackcdn.com/image/fetch/$s_!Hf79!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb61731cb-e79d-4417-a1f9-fbda00aaa2c3_800x686.gif 848w, https://substackcdn.com/image/fetch/$s_!Hf79!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb61731cb-e79d-4417-a1f9-fbda00aaa2c3_800x686.gif 1272w, https://substackcdn.com/image/fetch/$s_!Hf79!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb61731cb-e79d-4417-a1f9-fbda00aaa2c3_800x686.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Hf79!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb61731cb-e79d-4417-a1f9-fbda00aaa2c3_800x686.gif" width="800" height="686" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b61731cb-e79d-4417-a1f9-fbda00aaa2c3_800x686.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:686,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:11657445,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/176114604?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb61731cb-e79d-4417-a1f9-fbda00aaa2c3_800x686.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Hf79!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb61731cb-e79d-4417-a1f9-fbda00aaa2c3_800x686.gif 424w, https://substackcdn.com/image/fetch/$s_!Hf79!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb61731cb-e79d-4417-a1f9-fbda00aaa2c3_800x686.gif 848w, https://substackcdn.com/image/fetch/$s_!Hf79!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb61731cb-e79d-4417-a1f9-fbda00aaa2c3_800x686.gif 1272w, https://substackcdn.com/image/fetch/$s_!Hf79!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb61731cb-e79d-4417-a1f9-fbda00aaa2c3_800x686.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Over the first 10 epochs, the results are changing dramatically. Following these epochs though, the results do not appear to change systematically. We do see that estimated fold changes and p-values vary between epochs. P-values change more than log2 fold changes between epochs; after convergence (epoch 50), log2 fold changes have a standard deviation of 0.023 on average between epochs, which is well below observable variation. 
In this case, no genes have p-values smaller than 0.05.</p><p>The log2 fold change estimation in the SCVI model stabilized faster than I had expected.</p><p>Scripts for performing the analysis and creating the visualizations are available <a href="https://github.com/vals/Blog/tree/master/251013-de-over-epochs">on GitHub</a>.</p><h2>From around the web</h2><ul><li><p><a href="https://www.nature.com/articles/s41590-025-02161-3">A single-cell spatial chart of the airway wall reveals proinflammatory cellular ecosystems and their interactions in health and asthma</a></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.flickr.com/photos/val_s/54703107633/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mz_D!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82031dbb-c721-48f4-82bb-25fff2d7e57a_798x432.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Mz_D!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82031dbb-c721-48f4-82bb-25fff2d7e57a_798x432.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Mz_D!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82031dbb-c721-48f4-82bb-25fff2d7e57a_798x432.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Mz_D!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82031dbb-c721-48f4-82bb-25fff2d7e57a_798x432.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Mz_D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82031dbb-c721-48f4-82bb-25fff2d7e57a_798x432.jpeg" width="798" height="432" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/82031dbb-c721-48f4-82bb-25fff2d7e57a_798x432.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:432,&quot;width&quot;:798,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:112093,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.flickr.com/photos/val_s/54703107633/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/176114604?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82031dbb-c721-48f4-82bb-25fff2d7e57a_798x432.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mz_D!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82031dbb-c721-48f4-82bb-25fff2d7e57a_798x432.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Mz_D!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82031dbb-c721-48f4-82bb-25fff2d7e57a_798x432.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Mz_D!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82031dbb-c721-48f4-82bb-25fff2d7e57a_798x432.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!Mz_D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82031dbb-c721-48f4-82bb-25fff2d7e57a_798x432.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[Causal temperatures]]></title><description><![CDATA[Time is special.]]></description><link>https://www.nxn.se/p/causal-temperatures</link><guid isPermaLink="false">https://www.nxn.se/p/causal-temperatures</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Sun, 05 Oct 2025 08:29:39 
GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!9del!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e9869c-4186-47e3-b272-028dbc2afcda_2412x1812.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Time is special. It only moves in one direction. Seeing one thing happen before another thing carries a lot of information. The second thing cannot have caused the first thing to happen.</p><p>For a long time, I have believed that the lack of high resolution temporal data of gene expression has been a bottleneck in learning accurate regulatory networks of transcriptional regulation (this was the basis for my <a href="https://www.youtube.com/watch?v=m2nPt1YhX3A">research program</a> before going to the therapeutics industry). Since technologies for these measurements don&#8217;t exist, I have kept an eye out for other, comparable, datasets to learn how analysis methods for temporal causal inference work.</p><p>A while ago I got a Home Assistant server and connected multiple thermometers throughout my non-air conditioned apartment to it. 
Sensors, like thermometers, are recorded over time in Home Assistant, allowing you to look up history at high resolution for long periods of time.</p><p>This seemed like an interesting opportunity to test out <em><a href="https://en.wikipedia.org/wiki/Granger_causality">Granger analysis</a></em>, a classical method to identify causality from time series data.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8TPl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff3d53a-0820-4503-8282-620a2a662462_1352x1154.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8TPl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff3d53a-0820-4503-8282-620a2a662462_1352x1154.png 424w, https://substackcdn.com/image/fetch/$s_!8TPl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff3d53a-0820-4503-8282-620a2a662462_1352x1154.png 848w, https://substackcdn.com/image/fetch/$s_!8TPl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff3d53a-0820-4503-8282-620a2a662462_1352x1154.png 1272w, https://substackcdn.com/image/fetch/$s_!8TPl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff3d53a-0820-4503-8282-620a2a662462_1352x1154.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8TPl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff3d53a-0820-4503-8282-620a2a662462_1352x1154.png" width="1352" height="1154" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ff3d53a-0820-4503-8282-620a2a662462_1352x1154.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1154,&quot;width&quot;:1352,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:415006,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/175325886?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff3d53a-0820-4503-8282-620a2a662462_1352x1154.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8TPl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff3d53a-0820-4503-8282-620a2a662462_1352x1154.png 424w, https://substackcdn.com/image/fetch/$s_!8TPl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff3d53a-0820-4503-8282-620a2a662462_1352x1154.png 848w, https://substackcdn.com/image/fetch/$s_!8TPl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff3d53a-0820-4503-8282-620a2a662462_1352x1154.png 1272w, https://substackcdn.com/image/fetch/$s_!8TPl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff3d53a-0820-4503-8282-620a2a662462_1352x1154.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Not having an HVAC system, I have a pretty good idea of how air flows between the rooms. This should determine heat exchange between them. 
Which rooms heat up first throughout the day and transport heat to the other rooms is not obvious though.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Wzk0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa80e6a95-0b61-43fa-bcf2-02d3209c30c0_915x940.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Wzk0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa80e6a95-0b61-43fa-bcf2-02d3209c30c0_915x940.png 424w, https://substackcdn.com/image/fetch/$s_!Wzk0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa80e6a95-0b61-43fa-bcf2-02d3209c30c0_915x940.png 848w, https://substackcdn.com/image/fetch/$s_!Wzk0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa80e6a95-0b61-43fa-bcf2-02d3209c30c0_915x940.png 1272w, https://substackcdn.com/image/fetch/$s_!Wzk0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa80e6a95-0b61-43fa-bcf2-02d3209c30c0_915x940.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wzk0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa80e6a95-0b61-43fa-bcf2-02d3209c30c0_915x940.png" width="534" height="548.5901639344262" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a80e6a95-0b61-43fa-bcf2-02d3209c30c0_915x940.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:940,&quot;width&quot;:915,&quot;resizeWidth&quot;:534,&quot;bytes&quot;:55223,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/175325886?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa80e6a95-0b61-43fa-bcf2-02d3209c30c0_915x940.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Wzk0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa80e6a95-0b61-43fa-bcf2-02d3209c30c0_915x940.png 424w, https://substackcdn.com/image/fetch/$s_!Wzk0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa80e6a95-0b61-43fa-bcf2-02d3209c30c0_915x940.png 848w, https://substackcdn.com/image/fetch/$s_!Wzk0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa80e6a95-0b61-43fa-bcf2-02d3209c30c0_915x940.png 1272w, https://substackcdn.com/image/fetch/$s_!Wzk0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa80e6a95-0b61-43fa-bcf2-02d3209c30c0_915x940.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" 
stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I downloaded the temperature sensor data from Home Assistant for the last few months, comprising 37,110 measurements from ten thermometers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1ORQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3353cf63-fa4f-49d9-a4b9-52e51eab5e8a_2412x1812.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1ORQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3353cf63-fa4f-49d9-a4b9-52e51eab5e8a_2412x1812.png 424w, 
https://substackcdn.com/image/fetch/$s_!1ORQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3353cf63-fa4f-49d9-a4b9-52e51eab5e8a_2412x1812.png 848w, https://substackcdn.com/image/fetch/$s_!1ORQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3353cf63-fa4f-49d9-a4b9-52e51eab5e8a_2412x1812.png 1272w, https://substackcdn.com/image/fetch/$s_!1ORQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3353cf63-fa4f-49d9-a4b9-52e51eab5e8a_2412x1812.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1ORQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3353cf63-fa4f-49d9-a4b9-52e51eab5e8a_2412x1812.png" width="1456" height="1094" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3353cf63-fa4f-49d9-a4b9-52e51eab5e8a_2412x1812.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1094,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:665321,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/175325886?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3353cf63-fa4f-49d9-a4b9-52e51eab5e8a_2412x1812.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!1ORQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3353cf63-fa4f-49d9-a4b9-52e51eab5e8a_2412x1812.png 424w, https://substackcdn.com/image/fetch/$s_!1ORQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3353cf63-fa4f-49d9-a4b9-52e51eab5e8a_2412x1812.png 848w, https://substackcdn.com/image/fetch/$s_!1ORQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3353cf63-fa4f-49d9-a4b9-52e51eab5e8a_2412x1812.png 1272w, https://substackcdn.com/image/fetch/$s_!1ORQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3353cf63-fa4f-49d9-a4b9-52e51eab5e8a_2412x1812.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Longer-term seasonal trends can be seen through correlated increases in average temperature across ambient air thermometers. When looking at the full dataset, it is hard to see daily fluctuations in temperature.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9del!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e9869c-4186-47e3-b272-028dbc2afcda_2412x1812.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9del!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e9869c-4186-47e3-b272-028dbc2afcda_2412x1812.png 424w, https://substackcdn.com/image/fetch/$s_!9del!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e9869c-4186-47e3-b272-028dbc2afcda_2412x1812.png 848w, https://substackcdn.com/image/fetch/$s_!9del!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e9869c-4186-47e3-b272-028dbc2afcda_2412x1812.png 1272w, https://substackcdn.com/image/fetch/$s_!9del!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e9869c-4186-47e3-b272-028dbc2afcda_2412x1812.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!9del!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e9869c-4186-47e3-b272-028dbc2afcda_2412x1812.png" width="1456" height="1094" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c0e9869c-4186-47e3-b272-028dbc2afcda_2412x1812.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1094,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:432753,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/175325886?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e9869c-4186-47e3-b272-028dbc2afcda_2412x1812.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9del!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e9869c-4186-47e3-b272-028dbc2afcda_2412x1812.png 424w, https://substackcdn.com/image/fetch/$s_!9del!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e9869c-4186-47e3-b272-028dbc2afcda_2412x1812.png 848w, https://substackcdn.com/image/fetch/$s_!9del!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e9869c-4186-47e3-b272-028dbc2afcda_2412x1812.png 1272w, https://substackcdn.com/image/fetch/$s_!9del!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e9869c-4186-47e3-b272-028dbc2afcda_2412x1812.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I have split the thermometers into three categories: environmental sensors, refrigeration, and NAS. Environmental sensors measure the temperature of the air in the rooms or closets. Refrigeration thermometers measure the temperatures in fridges; these should (ideally) not be affected by room temperatures, and can be thought of as negative controls for a causal analysis. NAS is the temperature of my NAS, which is hidden away in an IT closet. 
The temperature of the NAS is determined by the load as it is being used.</p><p>The oscillatory temperature changes in the fridges are due to the defrost cycle. Modern fridges and freezers vary the temperature to prevent frost buildup. This is why labs need to buy special, pricier -20&#176;C freezers that maintain temperature more consistently.</p><p>If you look closely at the environmental sensors, you might see that temperatures in some rooms peak before the temperature in other rooms. This property is what we&#8217;re hoping to use to learn directionality in heat transfer between rooms.</p><p>The classical method for this is Granger analysis. The idea is that you fit two models to your data: one simple autoregressive model,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y_t \\sim \\text{N}\\left(\\alpha_0 + \\sum_{i = 1}^p \\alpha_i \\cdot y_{t - i}, \\sigma \\right),&quot;,&quot;id&quot;:&quot;GWCGKLIPXC&quot;}" data-component-name="LatexBlockToDOM"></div><p>and one bivariate autoregressive model,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;y_t \\sim \\text{N}\\left(\\alpha_0 + \\sum_{i = 1}^p \\alpha_i \\cdot y_{t - i} + \\sum_{i = 1}^p \\beta_i \\cdot x_{t - i}, \\ \\sigma \\right),&quot;,&quot;id&quot;:&quot;IDKWUYLJWW&quot;}" data-component-name="LatexBlockToDOM"></div><p>where x are measurements from the potentially causal source and p is the number of lagged time points used as predictors (the lag order). The null hypothesis, that x does not cause y, means that all beta coefficients are 0. 
Or, in other words, predictions when including x-values do not improve upon using only y-values.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9J8y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F534ea4ba-f56d-45a2-a71f-6605ee451e53_2411x1812.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9J8y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F534ea4ba-f56d-45a2-a71f-6605ee451e53_2411x1812.png 424w, https://substackcdn.com/image/fetch/$s_!9J8y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F534ea4ba-f56d-45a2-a71f-6605ee451e53_2411x1812.png 848w, https://substackcdn.com/image/fetch/$s_!9J8y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F534ea4ba-f56d-45a2-a71f-6605ee451e53_2411x1812.png 1272w, https://substackcdn.com/image/fetch/$s_!9J8y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F534ea4ba-f56d-45a2-a71f-6605ee451e53_2411x1812.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9J8y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F534ea4ba-f56d-45a2-a71f-6605ee451e53_2411x1812.png" width="1456" height="1094" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/534ea4ba-f56d-45a2-a71f-6605ee451e53_2411x1812.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1094,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:358157,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/175325886?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F534ea4ba-f56d-45a2-a71f-6605ee451e53_2411x1812.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9J8y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F534ea4ba-f56d-45a2-a71f-6605ee451e53_2411x1812.png 424w, https://substackcdn.com/image/fetch/$s_!9J8y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F534ea4ba-f56d-45a2-a71f-6605ee451e53_2411x1812.png 848w, https://substackcdn.com/image/fetch/$s_!9J8y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F534ea4ba-f56d-45a2-a71f-6605ee451e53_2411x1812.png 1272w, https://substackcdn.com/image/fetch/$s_!9J8y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F534ea4ba-f56d-45a2-a71f-6605ee451e53_2411x1812.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The models can be compared using an F-statistic that compares the residual sum of squares for the two different models:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;F = \\frac{\\left(\\text{RSS}_\\text{restricted} - \\text{RSS}_\\text{unrestricted}\\right) / p}{\\text{RSS}_\\text{unrestricted}/(n - 2 \\cdot p - 1)}.&quot;,&quot;id&quot;:&quot;SAWIHILCKN&quot;}" data-component-name="LatexBlockToDOM"></div><p>The simple autoregressive model is <em>restricted</em> since it uses a subset of the predictors available for the bivariate autoregressive model which also includes potentially causal measurements.</p><p>In the example above, the top panel uses three previous time points of office temperatures to predict temperature at a given time point, while the bottom panel can also use temperatures from three prior 
time points from the living room. The qualitative difference in predictions is hard to see by eye in the figure, but the comparison has an F-statistic of 210, corresponding to an error reduction from an RMSE of 0.0970&#176;C to an RMSE of 0.0671&#176;C.</p><p>To build a causal network, we can fit these models pairwise to all ordered pairs of temperature sensors, meaning 10 * 9 = 90 possible pairs.</p><p>There are some additional implementation details for the analysis.</p><p>To avoid spurious correlations from seasonality, models are fitted to first differences of the temperatures:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\Delta y_t = y_t - y_{t-1}.&quot;,&quot;id&quot;:&quot;SLAGNRNUIL&quot;}" data-component-name="LatexBlockToDOM"></div><p>We also perform the analysis with multiple lag orders, between one and five hours at one-hour increments. That is, we fit one model using only one hour prior as a predictor, another model with one and two hours prior as predictors, etc., up to five hours. 
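</p><p>As a rough sketch of the procedure described so far (this is not the code used for the post), both models can be fitted by ordinary least squares on first differences, and the F-statistic computed for each lag order. The helper names <code>lagged</code> and <code>granger_f</code>, the synthetic series, and the use of NumPy are assumptions made for illustration; lags are counted in samples here rather than in hours.</p>

```python
import numpy as np


def lagged(series, p):
    """Stack lags 1..p of a 1-D series as columns, aligned with series[p:]."""
    n = len(series)
    return np.column_stack([series[p - i:n - i] for i in range(1, p + 1)])


def granger_f(x, y, p):
    """F-statistic for 'x helps predict y' at lag order p (hypothetical helper).

    Both series are first-differenced before fitting, as described in the text.
    """
    dx, dy = np.diff(x), np.diff(y)
    target = dy[p:]
    n = len(target)
    # Restricted model: intercept plus p lags of y only.
    restricted = np.hstack([np.ones((n, 1)), lagged(dy, p)])
    # Unrestricted model: additionally p lags of x.
    unrestricted = np.hstack([restricted, lagged(dx, p)])

    def rss(design):
        # Residual sum of squares of the least-squares fit.
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    return ((rss_r - rss_u) / p) / (rss_u / (n - 2 * p - 1))


# Synthetic stand-in data (not the apartment measurements):
# changes in y follow changes in x with a delay of one step.
rng = np.random.default_rng(0)
dx = rng.normal(size=2000)
dy = rng.normal(scale=0.5, size=2000)
dy[1:] += 0.5 * dx[:-1]
x, y = np.cumsum(dx), np.cumsum(dy)

for p in range(1, 6):
    f_xy = granger_f(x, y, p)
    f_yx = granger_f(y, x, p)
    print(f"lag order {p}: F(x to y) = {f_xy:.1f}, F(y to x) = {f_yx:.2f}")
```

<p>On this synthetic data, the x-to-y direction should give a far larger F-statistic than the reverse, which is the asymmetry the analysis relies on. For real data, a tested implementation such as <code>grangercausalitytests</code> in statsmodels is likely a better choice than this sketch.</p><p>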
The hourly intervals and the five-hour limit were chosen because it is plausible for temperatures to change at these time scales.</p><p>After fitting all the models and getting all the F-statistics, we can use them to visualize the causal graph of how temperature flows through the apartment based on the sensor data.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iGMG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa4b312-ae58-4cba-aa3f-93ea2da3e9c2_2970x2373.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iGMG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa4b312-ae58-4cba-aa3f-93ea2da3e9c2_2970x2373.png 424w, https://substackcdn.com/image/fetch/$s_!iGMG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa4b312-ae58-4cba-aa3f-93ea2da3e9c2_2970x2373.png 848w, https://substackcdn.com/image/fetch/$s_!iGMG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa4b312-ae58-4cba-aa3f-93ea2da3e9c2_2970x2373.png 1272w, https://substackcdn.com/image/fetch/$s_!iGMG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa4b312-ae58-4cba-aa3f-93ea2da3e9c2_2970x2373.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iGMG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa4b312-ae58-4cba-aa3f-93ea2da3e9c2_2970x2373.png" width="1456" height="1163" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/baa4b312-ae58-4cba-aa3f-93ea2da3e9c2_2970x2373.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1163,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:391824,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/175325886?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa4b312-ae58-4cba-aa3f-93ea2da3e9c2_2970x2373.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iGMG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa4b312-ae58-4cba-aa3f-93ea2da3e9c2_2970x2373.png 424w, https://substackcdn.com/image/fetch/$s_!iGMG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa4b312-ae58-4cba-aa3f-93ea2da3e9c2_2970x2373.png 848w, https://substackcdn.com/image/fetch/$s_!iGMG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa4b312-ae58-4cba-aa3f-93ea2da3e9c2_2970x2373.png 1272w, https://substackcdn.com/image/fetch/$s_!iGMG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa4b312-ae58-4cba-aa3f-93ea2da3e9c2_2970x2373.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>This diagram only shows relations with F-statistics larger than 400, a very strict threshold corresponding to a p-value of about 10^-85. Almost all relations between sensors pass a p-value threshold of 0.05. Since we have a very large number of observations, the test is powered to detect extremely small potential causes in a null hypothesis framework, and it is probably more productive to think about strength of evidence rather than null hypothesis significance testing.</p><p>The strongest link is from the living room to the office, which is believable. On the other hand, the 3D printer closet is physically located inside the office, and we are finding a stronger causal link between the living room and the 3D printer closet than from the office to the 3D printer closet. 
This seems like a false positive.</p><p>I would have expected the outdoor temperature to be a strong predictor throughout the system, but it&#8217;s not unreasonable that it&#8217;s not directly causal: sunlight hitting the apartment is likely to heat up the rooms before the outside temperature rises.</p><p>The lack of sunlight as a predictor is a general issue with the analysis. When there is an unobserved underlying causal factor, Granger analysis tends to produce false positive results. It is pretty likely that sunlight directly affects three of the rooms, acting as a hidden confounder.</p><p>The wine fridge is affected by a lot of sources. This is somewhat believable, because wine fridges are usually of much lower quality than standard fridges and don&#8217;t maintain a consistent temperature well. It is not very believable that the IT closet or the 3D printer closet temperatures directly affect the temperature of the wine fridge; they should at least act through the office or living room temperatures.</p><p>This highlights another shortcoming of this strategy. We are simply extracting the pairwise causal results. If there is an intermediate node, we are not testing whether mediation through that node explains the system better than direct connections. Mediation should be visible in the system though: if A causes C mediated through B, the triplet of nodes A, B, C should be connected as A &#8594; B, A &#8594; C, B &#8594; C.</p><p>In addition to missing sunlight, other potential interventional factors like opening windows, cooking, or running the 3D printer are not captured here. I don&#8217;t do those very often, though, so I don&#8217;t think they affect the data at the majority of the time points.</p><h2>Takeaways</h2><p>Compared to the kind of data that can be generated for gene expression, this data is extremely high resolution, has an enormous number of observations, and very little observational noise. The system is also very small compared to transcriptional regulation. 
But even with this data, accurately identifying a causal network is challenging.</p><p>This leaves me a bit pessimistic about the potential to identify transcriptional regulatory networks even with measurement technologies that currently don&#8217;t exist.</p><p>Granger analysis is a classical method that has been around for over half a century. Its shortcomings are well known. I was hoping to find modern alternatives, and in the last few years the statistics community has focused heavily on causal identification. There are many new methods and strategies, but to my surprise they tend to not make use of temporal information.</p><p>Scripts and data for this post are available on <a href="https://github.com/vals/Blog/tree/master/251005-causal-temperatures">GitHub</a>.</p><h1>From around the web</h1><ul><li><p><a href="https://blog.tahoebio.ai/p/target-deconvolution-through-data">Target Deconvolution Through Data Integration: Unifying Drug and Genetic Perturbations</a></p></li></ul><ul><li><p><a href="https://www.abnormalmapping.com/">Abnormal mapping</a></p></li><li><p><a href="https://sidestory.show/">Side Story</a></p></li><li><p><a href="https://rangedtouch.com/shelved-by-genre/">Shelved By Genre</a></p></li><li><p><a href="https://remapradio.com/">Remap radio</a></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.flickr.com/photos/val_s/54702071707/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cUg5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98d3688-b649-4e96-95c9-ff2caef1b99d_799x432.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!cUg5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98d3688-b649-4e96-95c9-ff2caef1b99d_799x432.jpeg 848w, https://substackcdn.com/image/fetch/$s_!cUg5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98d3688-b649-4e96-95c9-ff2caef1b99d_799x432.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!cUg5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98d3688-b649-4e96-95c9-ff2caef1b99d_799x432.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cUg5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98d3688-b649-4e96-95c9-ff2caef1b99d_799x432.jpeg" width="799" height="432" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e98d3688-b649-4e96-95c9-ff2caef1b99d_799x432.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:432,&quot;width&quot;:799,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:76731,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.flickr.com/photos/val_s/54702071707/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/175325886?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98d3688-b649-4e96-95c9-ff2caef1b99d_799x432.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!cUg5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98d3688-b649-4e96-95c9-ff2caef1b99d_799x432.jpeg 424w, https://substackcdn.com/image/fetch/$s_!cUg5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98d3688-b649-4e96-95c9-ff2caef1b99d_799x432.jpeg 848w, https://substackcdn.com/image/fetch/$s_!cUg5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98d3688-b649-4e96-95c9-ff2caef1b99d_799x432.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!cUg5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe98d3688-b649-4e96-95c9-ff2caef1b99d_799x432.jpeg 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Negative binomial regression and inference using a pre-trained transformer]]></title><description><![CDATA[I wanted to speed up negative binomial regression with a novel transformer model, but it actually turns out that a classical method of moments is a better solution.]]></description><link>https://www.nxn.se/p/negative-binomial-regression-and</link><guid isPermaLink="false">https://www.nxn.se/p/negative-binomial-regression-and</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Tue, 23 Sep 2025 05:53:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!SpYP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4de28-7ab6-41da-bc36-7a67e00c8f1a_1570x512.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I wanted to speed up negative binomial regression with a novel transformer model, but it actually turns out that a classical method of moments is a better solution. For this post I will focus on concepts and refer to the resulting paper for more in-depth details: <a href="https://arxiv.org/abs/2508.04111">https://arxiv.org/abs/2508.04111</a></p><h2>Differences in counts</h2><p>When analyzing scRNA-seq data, we are ultimately comparing molecule counts from different sources (genes, cell types, experimental conditions).</p><p>There are many other situations where your observations are counts.</p><p>As we learn to interpret data, we develop good intuition about the variability and magnitude of measured values. 
These intuitions are typically based on continuous values, and unfortunately counts behave differently.</p><p>Specifically, comparing magnitudes of observed counts between two groups of observations seems straightforward. Unfortunately, with a small number of samples (fewer than around ten) and observed counts of small magnitude (typical counts also below around ten), these comparisons are very challenging. Standard strategies for estimating means and variances from the data will fail, leading to inaccurate effect size estimates and p-values that can change drastically in magnitude from small sampling variation in the data.</p><p>The typical tool to solve these issues is negative binomial regression. This is used to estimate the parameters needed to compare the groups of samples by directly modeling the properties of discrete counts. The negative binomial distribution has two parameters: mean and over-dispersion. Means are used for effect sizes. Over-dispersion indicates how much unidentified variation there is between samples on top of the variation from the counting process, which is needed to get p-values for the effect size.</p><p>Negative binomial regression is not magic though. Estimating the over-dispersion is very difficult with few observations and small counts. Such a small amount of data contains very limited information. Still, it is the best we have.</p><p>There is another issue, however: the standard procedure to estimate the parameters of the negative binomial regression model is computationally expensive at scale. For an individual comparison, computation is negligible, finishing in a millisecond. However, the ability to generate data through, for example, genome-wide CRISPR screens and similar technologies means that we need to do a huge number of comparisons. In the CRISPR screen example, if we knock down 20,000 genes and read out expression for 20,000 genes, we need to perform 400,000,000 comparisons. 
Even if a comparison takes a millisecond, this will still take over a hundred hours (400,000,000 milliseconds is about 111 hours).</p><h2>Iterative and non-iterative statistics</h2><p>Negative binomial regression is slow because it typically uses an iterative method to estimate the parameters. It maximizes the likelihood by gradually improving the parameter estimates using analytical gradients.</p><p>These iterative parameter estimation methods have been around for a century, but were labor intensive before the invention of computers. A large body of statistical methods research aimed to estimate parameters without iterative updates, referred to as non-iterative statistics. These methods use simple operations in one step to estimate statistics. For example, the mean of a normal distribution is a non-iterative statistic; you sum the values and divide by the number of observations. You <em>can</em> estimate the mean with iterative maximum likelihood optimization, but it&#8217;s unnecessary, and might even produce worse results from numerical errors.</p><p>The standard iterative method to estimate the parameters in negative binomial regression is called &#8216;<a href="https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares">iteratively reweighted least squares</a>&#8217;.</p><p>Another way to estimate parameters is through the <em><a href="https://en.m.wikipedia.org/wiki/Method_of_moments_(statistics)">method of moments</a></em>. Moments are values that describe properties of a probability distribution, such as mean, variance and skewness. For a given parameterized probability distribution, the moments can be expressed as functions of the parameters. 
In special cases, you can then solve for the parameters of interest with respect to the observed moments.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://books.google.com/ngrams/graph?content=method+of+moments%2Citeratively+reweighted+least+squares%2Crestricted+maximum+likelihood&amp;year_start=1950&amp;year_end=2022&amp;case_insensitive=true&amp;corpus=en&amp;smoothing=5" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SpYP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4de28-7ab6-41da-bc36-7a67e00c8f1a_1570x512.png 424w, https://substackcdn.com/image/fetch/$s_!SpYP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4de28-7ab6-41da-bc36-7a67e00c8f1a_1570x512.png 848w, https://substackcdn.com/image/fetch/$s_!SpYP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4de28-7ab6-41da-bc36-7a67e00c8f1a_1570x512.png 1272w, https://substackcdn.com/image/fetch/$s_!SpYP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4de28-7ab6-41da-bc36-7a67e00c8f1a_1570x512.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SpYP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4de28-7ab6-41da-bc36-7a67e00c8f1a_1570x512.png" width="1456" height="475" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e0b4de28-7ab6-41da-bc36-7a67e00c8f1a_1570x512.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:475,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:80935,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://books.google.com/ngrams/graph?content=method+of+moments%2Citeratively+reweighted+least+squares%2Crestricted+maximum+likelihood&amp;year_start=1950&amp;year_end=2022&amp;case_insensitive=true&amp;corpus=en&amp;smoothing=5&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/174313935?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4de28-7ab6-41da-bc36-7a67e00c8f1a_1570x512.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SpYP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4de28-7ab6-41da-bc36-7a67e00c8f1a_1570x512.png 424w, https://substackcdn.com/image/fetch/$s_!SpYP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4de28-7ab6-41da-bc36-7a67e00c8f1a_1570x512.png 848w, https://substackcdn.com/image/fetch/$s_!SpYP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4de28-7ab6-41da-bc36-7a67e00c8f1a_1570x512.png 1272w, https://substackcdn.com/image/fetch/$s_!SpYP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0b4de28-7ab6-41da-bc36-7a67e00c8f1a_1570x512.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>Up until the beginning of the 1990s, there was increasing interest in the method of moments. In the early era of data collection, performing a statistical analysis was expensive. Puzzling out method of moments estimators for a given problem was a worthwhile effort, because the computational resources available for statistical inference were limited. 
As computational hardware became more affordable, the need for these custom solutions went away.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://ourworldindata.org/grapher/historical-cost-of-computer-memory-and-storage?time=1970..latest" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kUZZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566122ba-cfbf-4929-be4e-1d01eca0e013_3400x2400.png 424w, https://substackcdn.com/image/fetch/$s_!kUZZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566122ba-cfbf-4929-be4e-1d01eca0e013_3400x2400.png 848w, https://substackcdn.com/image/fetch/$s_!kUZZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566122ba-cfbf-4929-be4e-1d01eca0e013_3400x2400.png 1272w, https://substackcdn.com/image/fetch/$s_!kUZZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566122ba-cfbf-4929-be4e-1d01eca0e013_3400x2400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kUZZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566122ba-cfbf-4929-be4e-1d01eca0e013_3400x2400.png" width="1456" height="1028" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/566122ba-cfbf-4929-be4e-1d01eca0e013_3400x2400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1028,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:512379,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://ourworldindata.org/grapher/historical-cost-of-computer-memory-and-storage?time=1970..latest&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/174313935?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566122ba-cfbf-4929-be4e-1d01eca0e013_3400x2400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kUZZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566122ba-cfbf-4929-be4e-1d01eca0e013_3400x2400.png 424w, https://substackcdn.com/image/fetch/$s_!kUZZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566122ba-cfbf-4929-be4e-1d01eca0e013_3400x2400.png 848w, https://substackcdn.com/image/fetch/$s_!kUZZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566122ba-cfbf-4929-be4e-1d01eca0e013_3400x2400.png 1272w, https://substackcdn.com/image/fetch/$s_!kUZZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F566122ba-cfbf-4929-be4e-1d01eca0e013_3400x2400.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Besides the work required to find method of moments estimators, there is another argument for maximum likelihood: it is asymptotically efficient. It requires the smallest possible number of observations to bound the error of a parameter estimate within a given interval. 
Method of moments estimation is usually considered to require a larger number of observations for accurate results.</p><p>As our ability to collect data is outpacing performance improvements in computational hardware, we might want to revisit the concept of non-iterative statistics, to quickly get answers to the questions we have about the data.</p><h2>Pair-set transformer for comparative statistics</h2><p>In a <a href="https://www.nxn.se/p/a-pre-trained-t-test-transformer">previous post</a> I explored the idea of a pair-set transformer that can be pre-trained for the task of statistical estimation. The same architecture can be adapted to estimate the statistics needed for negative binomial regression and inference.</p><p>The general idea is to explore whether modern machine learning frameworks such as transformers can be combined with the 1980s concept of non-iterative statistics. Instead of deriving equations for particular special-case problems, we can use a generic transformer-based neural network architecture combined with synthetic data generation to train a network that predicts statistical parameters of interest.</p><p>While transformers are expensive to execute, relatively small ones can be more efficient than iterative parameter estimation. 
Training such a model will be expensive, but once it&#8217;s trained it can be reused forever.</p><p>You simply show it the data, and it will predict what the statistical parameters of interest are.</p><p>The asymptotic optimality of maximum likelihood estimation means the transformer would not be able to beat it in numerical performance, but it may be a lot faster, and at least better than method of moments estimation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j8_R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5083c8c-7216-4297-8c9c-9420ea0c3bd5_2587x668.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j8_R!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5083c8c-7216-4297-8c9c-9420ea0c3bd5_2587x668.png 424w, https://substackcdn.com/image/fetch/$s_!j8_R!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5083c8c-7216-4297-8c9c-9420ea0c3bd5_2587x668.png 848w, https://substackcdn.com/image/fetch/$s_!j8_R!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5083c8c-7216-4297-8c9c-9420ea0c3bd5_2587x668.png 1272w, https://substackcdn.com/image/fetch/$s_!j8_R!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5083c8c-7216-4297-8c9c-9420ea0c3bd5_2587x668.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!j8_R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5083c8c-7216-4297-8c9c-9420ea0c3bd5_2587x668.png" width="1456" 
height="376" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5083c8c-7216-4297-8c9c-9420ea0c3bd5_2587x668.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:376,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:406154,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/174313935?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5083c8c-7216-4297-8c9c-9420ea0c3bd5_2587x668.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!j8_R!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5083c8c-7216-4297-8c9c-9420ea0c3bd5_2587x668.png 424w, https://substackcdn.com/image/fetch/$s_!j8_R!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5083c8c-7216-4297-8c9c-9420ea0c3bd5_2587x668.png 848w, https://substackcdn.com/image/fetch/$s_!j8_R!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5083c8c-7216-4297-8c9c-9420ea0c3bd5_2587x668.png 1272w, https://substackcdn.com/image/fetch/$s_!j8_R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5083c8c-7216-4297-8c9c-9420ea0c3bd5_2587x668.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>I created a variation of the transformer architecture described in <a href="https://www.nxn.se/p/a-pre-trained-t-test-transformer">the previous post</a>, designed for the task of estimating the parameters needed to assess the magnitude and significance of differences in counts. I trained the model and wrote up <a href="https://arxiv.org/abs/2508.04111">a paper with detailed methods and findings</a>.</p><h2>Method of moments actually performs really well</h2><p>I compared the transformer-based method to the standard approach of estimating parameters through iterative maximum likelihood, as well as an analytical method of moments solution for the parameters. 
I did this because I figured the analytical solution would be extremely fast but have terrible performance.</p><p>It did not!</p><p>The method of moments solution had nearly the same estimation error as iterative maximum likelihood, much better P-value calibration than either the transformer method or the iterative method, and the most statistical power.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cjP6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61488616-c1b7-4fe8-a569-ceef80316d06_1920x546.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cjP6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61488616-c1b7-4fe8-a569-ceef80316d06_1920x546.heic 424w, https://substackcdn.com/image/fetch/$s_!cjP6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61488616-c1b7-4fe8-a569-ceef80316d06_1920x546.heic 848w, https://substackcdn.com/image/fetch/$s_!cjP6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61488616-c1b7-4fe8-a569-ceef80316d06_1920x546.heic 1272w, https://substackcdn.com/image/fetch/$s_!cjP6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61488616-c1b7-4fe8-a569-ceef80316d06_1920x546.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cjP6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61488616-c1b7-4fe8-a569-ceef80316d06_1920x546.heic" width="1456" height="414" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61488616-c1b7-4fe8-a569-ceef80316d06_1920x546.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:414,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77242,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/174313935?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61488616-c1b7-4fe8-a569-ceef80316d06_1920x546.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cjP6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61488616-c1b7-4fe8-a569-ceef80316d06_1920x546.heic 424w, https://substackcdn.com/image/fetch/$s_!cjP6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61488616-c1b7-4fe8-a569-ceef80316d06_1920x546.heic 848w, https://substackcdn.com/image/fetch/$s_!cjP6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61488616-c1b7-4fe8-a569-ceef80316d06_1920x546.heic 1272w, https://substackcdn.com/image/fetch/$s_!cjP6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61488616-c1b7-4fe8-a569-ceef80316d06_1920x546.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This was a big surprise for me.</p><p>For this analysis, I used practically realistic sample sizes: between two and ten replicates per condition. This is likely far below where the asymptotic optimality of iterative maximum likelihood starts being relevant.</p><p>While I found it interesting to work out the transformer architecture and training strategy, the more valuable part of this work was in the benchmarking. 
It showed that the method of moments solution didn&#8217;t have the problems I expected it to have.</p><h1>From around the web</h1><ul><li><p>Sequences and consequences - <a href="https://royalsocietypublishing.org/doi/10.1098/rstb.2009.0221">https://royalsocietypublishing.org/doi/10.1098/rstb.2009.0221</a></p></li><li><p>Optimizing murine sample sizes for RNA-seq studies revealed from large-scale comparative analysis - <a href="https://www.biorxiv.org/content/10.1101/2024.07.08.602525v1">https://www.biorxiv.org/content/10.1101/2024.07.08.602525v1</a></p></li><li><p>A recap of virtual cell releases circa June 2025 - <a href="https://ekernf01.github.io/virtual-cell-june-2025">https://ekernf01.github.io/virtual-cell-june-2025</a></p></li><li><p>A bioinformatician, computer scientist, and geneticist lead bioinformatic tool development&#8212;which one is better? - <a href="https://academic.oup.com/bioinformaticsadvances/article/5/1/vbaf011/7989318">https://academic.oup.com/bioinformaticsadvances/article/5/1/vbaf011/7989318</a></p></li><li><p>Which Kind of Science Reform - <a href="https://elevanth.org/blog/2025/07/09/which-kind-of-science-reform/">https://elevanth.org/blog/2025/07/09/which-kind-of-science-reform/</a></p></li><li><p>Using hierarchical modeling to get more stable rankings of gene expression - <a href="https://statmodeling.stat.columbia.edu/2025/07/30/using-hierarchical-modeling-to-get-more-stable-rankings-of-gene-expression/">https://statmodeling.stat.columbia.edu/2025/07/30/using-hierarchical-modeling-to-get-more-stable-rankings-of-gene-expression/</a></p></li><li><p>(1) Fitting hierarchical models in genetics, (2) A Stan model that runs faster with 400,000 latent parameters, (3) Super-scalable penalized maximum likelihood inference for biome problems, (4) &#8220;In the end, I basically gave up working on biology bec... 
- <a href="https://statmodeling.stat.columbia.edu/2025/07/31/bio/">https://statmodeling.stat.columbia.edu/2025/07/31/bio/</a></p></li><li><p>Overinterpreting underpowered multi-omics experiments - <a href="https://thecodon.substack.com/p/falling-in-the-trap-of-overinterpreting">https://thecodon.substack.com/p/falling-in-the-trap-of-overinterpreting</a></p></li></ul><ul><li><p>The pharma industry from Paul Janssen to today: why drugs got harder to develop and what we can do about it - <a href="https://atelfo.github.io/2023/12/23/biopharma-from-janssen-to-today.html">https://atelfo.github.io/2023/12/23/biopharma-from-janssen-to-today.html</a></p></li><li><p>The therapeutic potential of stem cells - <a href="https://royalsocietypublishing.org/doi/10.1098/rstb.2009.0149">https://royalsocietypublishing.org/doi/10.1098/rstb.2009.0149</a></p></li><li><p>Animals as chemical factories - <a href="https://worksinprogress.co/issue/animals-as-chemical-factories/">https://worksinprogress.co/issue/animals-as-chemical-factories/</a></p></li><li><p>The immunology of asthma - <a href="https://www.nature.com/articles/s41590-025-02212-9">https://www.nature.com/articles/s41590-025-02212-9</a></p></li><li><p>What's going on with gene therapies? 
(Part one) - <a href="https://nehalslearnings.substack.com/p/whats-going-on-with-gene-therapies">https://nehalslearnings.substack.com/p/whats-going-on-with-gene-therapies</a></p></li></ul><ul><li><p>The Day Novartis Chose Discovery - <a href="https://www.alexkesin.com/p/the-day-novartis-chose-discovery">https://www.alexkesin.com/p/the-day-novartis-chose-discovery</a></p></li></ul><ul><li><p>A Visual Guide to Gene Delivery - <a href="https://press.asimov.com/articles/gene-delivery">https://press.asimov.com/articles/gene-delivery</a></p></li><li><p>How pour-over coffee got good - <a href="https://worksinprogress.co/issue/how-pour-over-coffee-got-good/">https://worksinprogress.co/issue/how-pour-over-coffee-got-good/</a></p></li><li><p>THE LOTTERY OF FASCINATIONS - <a href="https://slatestarcodex.com/2013/06/30/the-lottery-of-fascinations/">https://slatestarcodex.com/2013/06/30/the-lottery-of-fascinations/</a></p></li><li><p>The Truth Is Out There, Part 3: The Game of Belief - <a href="https://www.filfre.net/2024/10/the-truth-is-out-there-part-3-the-game-of-belief/">https://www.filfre.net/2024/10/the-truth-is-out-there-part-3-the-game-of-belief/</a></p></li><li><p>Alien: Earth (2025) - <a href="https://www.themoviedb.org/tv/157239-alien-earth">https://www.themoviedb.org/tv/157239-alien-earth</a></p></li><li><p>The immunology of asthma and chronic rhinosinusitis - <a href="https://www.nature.com/articles/s41577-025-01159-0">https://www.nature.com/articles/s41577-025-01159-0</a></p></li><li><p>Hanseatic League - <a href="https://en.wikipedia.org/wiki/Hanseatic_League">https://en.wikipedia.org/wiki/Hanseatic_League</a></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.flickr.com/photos/val_s/54703237035/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!29Tq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4193fa-c51b-4549-872c-64cde51d3fae_798x432.jpeg 424w, https://substackcdn.com/image/fetch/$s_!29Tq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4193fa-c51b-4549-872c-64cde51d3fae_798x432.jpeg 848w, https://substackcdn.com/image/fetch/$s_!29Tq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4193fa-c51b-4549-872c-64cde51d3fae_798x432.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!29Tq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4193fa-c51b-4549-872c-64cde51d3fae_798x432.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!29Tq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4193fa-c51b-4549-872c-64cde51d3fae_798x432.jpeg" width="798" height="432" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ff4193fa-c51b-4549-872c-64cde51d3fae_798x432.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:432,&quot;width&quot;:798,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:73226,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.flickr.com/photos/val_s/54703237035/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/174313935?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4193fa-c51b-4549-872c-64cde51d3fae_798x432.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!29Tq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4193fa-c51b-4549-872c-64cde51d3fae_798x432.jpeg 424w, https://substackcdn.com/image/fetch/$s_!29Tq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4193fa-c51b-4549-872c-64cde51d3fae_798x432.jpeg 848w, https://substackcdn.com/image/fetch/$s_!29Tq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4193fa-c51b-4549-872c-64cde51d3fae_798x432.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!29Tq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff4193fa-c51b-4549-872c-64cde51d3fae_798x432.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p></p><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[H5AD files in GEO]]></title><description><![CDATA[I&#8217;m working on a project where I want to use a lot of well-annotated scRNA-seq data.]]></description><link>https://www.nxn.se/p/h5ad-files-in-geo</link><guid isPermaLink="false">https://www.nxn.se/p/h5ad-files-in-geo</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Fri, 25 Jul 2025 04:26:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!jnak!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f9f024-3e34-4080-b501-8f47b80e70ab_3600x2400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I&#8217;m working on a project where I want to use a lot of 
well-annotated scRNA-seq data. There are thousands of datasets available at <a href="https://www.ncbi.nlm.nih.gov/geo/">GEO</a>, but in the vast majority of cases these are basic outputs from CellRanger consisting of .mtx files with UMI counts and .tsv files with cell barcodes and gene names. That is, they are missing experimental conditions and cell type labels.</p><p>Some series on GEO have had <a href="https://anndata.readthedocs.io/en/latest/index.html">anndata-based .h5ad</a> files submitted. These are much more likely to have complete metadata and annotations included in them. For re-use, this is far more valuable than the basic CellRanger output! In particular, the cell type annotation process is fraught and time-consuming (you can usually infer experimental conditions from file names).</p><p>I was happily surprised that the number of GEO series with .h5ad files seems to be increasing over time!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jnak!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f9f024-3e34-4080-b501-8f47b80e70ab_3600x2400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jnak!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f9f024-3e34-4080-b501-8f47b80e70ab_3600x2400.png 424w, https://substackcdn.com/image/fetch/$s_!jnak!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f9f024-3e34-4080-b501-8f47b80e70ab_3600x2400.png 848w, 
https://substackcdn.com/image/fetch/$s_!jnak!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f9f024-3e34-4080-b501-8f47b80e70ab_3600x2400.png 1272w, https://substackcdn.com/image/fetch/$s_!jnak!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f9f024-3e34-4080-b501-8f47b80e70ab_3600x2400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jnak!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f9f024-3e34-4080-b501-8f47b80e70ab_3600x2400.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/01f9f024-3e34-4080-b501-8f47b80e70ab_3600x2400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:261258,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/169198604?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f9f024-3e34-4080-b501-8f47b80e70ab_3600x2400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jnak!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f9f024-3e34-4080-b501-8f47b80e70ab_3600x2400.png 424w, https://substackcdn.com/image/fetch/$s_!jnak!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f9f024-3e34-4080-b501-8f47b80e70ab_3600x2400.png 
848w, https://substackcdn.com/image/fetch/$s_!jnak!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f9f024-3e34-4080-b501-8f47b80e70ab_3600x2400.png 1272w, https://substackcdn.com/image/fetch/$s_!jnak!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01f9f024-3e34-4080-b501-8f47b80e70ab_3600x2400.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>From around the web</h2><ul><li><p><a 
href="https://www.filfre.net/2024/09/the-truth-is-out-there-part-1-the-will-to-believe/">The Truth Is Out There, Part 1: The Will to Believe</a></p></li><li><p><a href="https://www.filfre.net/2024/09/the-truth-is-out-there-part-2-the-power-of-belief/">The Truth is Out There, Part 2: The Power of Belief</a></p></li><li><p><a href="https://luispedro.substack.com/p/on-progress">On Progress Studies</a></p></li><li><p><a href="https://blog.jck.bio/p/creating-therapeutic-abundance">Creating therapeutic abundance</a></p></li><li><p><a href="https://behindbioml.substack.com/p/the-state-of-research-on-virtual">The state of research on virtual cell modeling</a></p></li><li><p><a href="https://mbh4h.substack.com/p/neuromancer-2025-review-william-gibson">Reading Neuromancer for the very first time in 2025</a></p></li><li><p><a href="https://www.noahpinion.blog/p/the-elite-overproduction-hypothesis-994">The Elite Overproduction Hypothesis</a></p></li><li><p><a href="https://srikosuri.substack.com/p/the-elusive-virtual-cell">The elusive virtual cell</a></p></li><li><p><a href="https://blog.turbine.ai/p/ic50-is-a-deep-rabbit-hole">IC50 is a deep rabbit hole</a></p></li></ul><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.flickr.com/photos/val_s/54425693548/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Vto6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe330392c-c89a-41e1-bfa2-e95636c8a05b_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Vto6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe330392c-c89a-41e1-bfa2-e95636c8a05b_799x533.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!Vto6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe330392c-c89a-41e1-bfa2-e95636c8a05b_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Vto6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe330392c-c89a-41e1-bfa2-e95636c8a05b_799x533.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Vto6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe330392c-c89a-41e1-bfa2-e95636c8a05b_799x533.jpeg" width="799" height="533" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e330392c-c89a-41e1-bfa2-e95636c8a05b_799x533.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:533,&quot;width&quot;:799,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:100365,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.flickr.com/photos/val_s/54425693548/&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/169198604?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe330392c-c89a-41e1-bfa2-e95636c8a05b_799x533.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Vto6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe330392c-c89a-41e1-bfa2-e95636c8a05b_799x533.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!Vto6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe330392c-c89a-41e1-bfa2-e95636c8a05b_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Vto6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe330392c-c89a-41e1-bfa2-e95636c8a05b_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Vto6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe330392c-c89a-41e1-bfa2-e95636c8a05b_799x533.jpeg 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" 
y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[A pre-trained t-test transformer]]></title><description><![CDATA[Generally, neural networks are nonlinear function approximators.]]></description><link>https://www.nxn.se/p/a-pre-trained-t-test-transformer</link><guid isPermaLink="false">https://www.nxn.se/p/a-pre-trained-t-test-transformer</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Thu, 10 Jul 2025 04:03:40 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1J8q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab316ae-36dc-4453-b6f8-d1ae529ef534_1830x472.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Generally, neural networks are nonlinear function approximators. Unidimensional neural networks aren&#8217;t even particularly good at function approximation, but they are amazing in high-dimensional settings. Not only do they perform very well on high-dimensional vector inputs, but over the many years they have been around, people have figured out how to use them effectively for matrix-valued inputs or even higher-order blocks of numbers. We can define and learn how to evaluate functions with very complicated <em>domains</em> defined by arrays of numbers of various sizes. If you can phrase some input data as an array of numbers, chances are high that you can make progress on the problem using a neural network with enough data.</p><p>With the introduction of transformers, the machine learning community took another large step. Transformers have <em>sets</em> as domains. With transformers we can learn to evaluate functions that take a set as input and produce, e.g., a number as output. Many of the &#8216;functions&#8217; we typically use are classically defined by multi-step algorithms. 
If you have enough data, you can learn to approximate these very complicated functions.</p><p>In many cases it&#8217;s possible to get arbitrarily large amounts of data through synthetic data generation.</p><p>As an example, we can think of performing a t-test. When you perform a t-test, you have two relatively small sets of numbers. From these two sets you calculate the t-statistic, then the degrees of freedom, and from that you can obtain a p-value. To simplify, let&#8217;s consider only the problem of calculating the t-statistic. Calculating the t-statistic isn&#8217;t a hard problem, but it can serve as a useful example.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;t \\;=\\; \\frac{\\bar{X} - \\bar{Y}}{\\sqrt{\\displaystyle\\frac{s_X^2}{n_X} \\;+\\;\\frac{s_Y^2}{n_Y}}}&quot;,&quot;id&quot;:&quot;PBSRQRKIKA&quot;}" data-component-name="LatexBlockToDOM"></div><p>If we target the typical situation of having between two and ten observations per group, we can define a fairly simple transformer architecture that takes these two sets as input.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1J8q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab316ae-36dc-4453-b6f8-d1ae529ef534_1830x472.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1J8q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab316ae-36dc-4453-b6f8-d1ae529ef534_1830x472.jpeg 424w, https://substackcdn.com/image/fetch/$s_!1J8q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab316ae-36dc-4453-b6f8-d1ae529ef534_1830x472.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!1J8q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab316ae-36dc-4453-b6f8-d1ae529ef534_1830x472.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!1J8q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab316ae-36dc-4453-b6f8-d1ae529ef534_1830x472.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1J8q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab316ae-36dc-4453-b6f8-d1ae529ef534_1830x472.jpeg" width="1456" height="376" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bab316ae-36dc-4453-b6f8-d1ae529ef534_1830x472.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:376,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:80504,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/167963651?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab316ae-36dc-4453-b6f8-d1ae529ef534_1830x472.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1J8q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab316ae-36dc-4453-b6f8-d1ae529ef534_1830x472.jpeg 424w, https://substackcdn.com/image/fetch/$s_!1J8q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab316ae-36dc-4453-b6f8-d1ae529ef534_1830x472.jpeg 
848w, https://substackcdn.com/image/fetch/$s_!1J8q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab316ae-36dc-4453-b6f8-d1ae529ef534_1830x472.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!1J8q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbab316ae-36dc-4453-b6f8-d1ae529ef534_1830x472.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This transformer architecture can learn how to interpret within-group variation and how it 
relates to between-group variation. To train the model, we can sample random values for X and Y with varying sample sizes, calculate the ground-truth t-statistic for each pair, and give these as training data.</p><p>The model is relatively small, but it still requires a large amount of training. In total, I used ~20 million synthetically generated examples to train the model for the task of calculating t-statistics. Training took about six and a half hours on my Mac Mini. After training, however, the pre-trained model can be reused to predict t-statistics.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hD8-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d7df2f-9adc-4ac2-b925-7e9681feb1b4_1200x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hD8-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d7df2f-9adc-4ac2-b925-7e9681feb1b4_1200x1200.png 424w, https://substackcdn.com/image/fetch/$s_!hD8-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d7df2f-9adc-4ac2-b925-7e9681feb1b4_1200x1200.png 848w, https://substackcdn.com/image/fetch/$s_!hD8-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d7df2f-9adc-4ac2-b925-7e9681feb1b4_1200x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!hD8-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d7df2f-9adc-4ac2-b925-7e9681feb1b4_1200x1200.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!hD8-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d7df2f-9adc-4ac2-b925-7e9681feb1b4_1200x1200.png" width="364" height="364" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/99d7df2f-9adc-4ac2-b925-7e9681feb1b4_1200x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1200,&quot;width&quot;:1200,&quot;resizeWidth&quot;:364,&quot;bytes&quot;:78239,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/167963651?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d7df2f-9adc-4ac2-b925-7e9681feb1b4_1200x1200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hD8-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d7df2f-9adc-4ac2-b925-7e9681feb1b4_1200x1200.png 424w, https://substackcdn.com/image/fetch/$s_!hD8-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d7df2f-9adc-4ac2-b925-7e9681feb1b4_1200x1200.png 848w, https://substackcdn.com/image/fetch/$s_!hD8-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d7df2f-9adc-4ac2-b925-7e9681feb1b4_1200x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!hD8-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99d7df2f-9adc-4ac2-b925-7e9681feb1b4_1200x1200.png 1456w" sizes="100vw" 
loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The parameters for the pre-trained t-test transformer model take about 5 MB of storage (or ~2 iPhone photos).</p><p>I put together a package with the pre-trained t-test transformer in a GitHub repo: <a href="https://github.com/vals/TTT">https://github.com/vals/TTT</a>. I don&#8217;t suggest replacing standard t-statistic calculations, but the package illustrates the model architecture and training scripts. 
It is exciting to think about what other functions with sets as domains we could learn.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.flickr.com/photos/val_s/54425830360/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Z456!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8d9b6e-4246-4e38-92be-4b7216e9739e_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Z456!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8d9b6e-4246-4e38-92be-4b7216e9739e_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Z456!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8d9b6e-4246-4e38-92be-4b7216e9739e_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Z456!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8d9b6e-4246-4e38-92be-4b7216e9739e_799x533.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Z456!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8d9b6e-4246-4e38-92be-4b7216e9739e_799x533.jpeg" width="799" height="533" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba8d9b6e-4246-4e38-92be-4b7216e9739e_799x533.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:533,&quot;width&quot;:799,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:58030,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.flickr.com/photos/val_s/54425830360/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/167963651?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8d9b6e-4246-4e38-92be-4b7216e9739e_799x533.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Z456!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8d9b6e-4246-4e38-92be-4b7216e9739e_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Z456!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8d9b6e-4246-4e38-92be-4b7216e9739e_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Z456!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8d9b6e-4246-4e38-92be-4b7216e9739e_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Z456!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8d9b6e-4246-4e38-92be-4b7216e9739e_799x533.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[Slot machine strategy]]></title><description><![CDATA[In the game Blue Prince, every run of the game has you fill a mansion with different rooms to wander through.]]></description><link>https://www.nxn.se/p/slot-machine-strategy</link><guid isPermaLink="false">https://www.nxn.se/p/slot-machine-strategy</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Sun, 06 Jul 2025 04:31:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!szn8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0734d-cc2e-47e2-ac9d-6ae77812c63d_1280x882.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the game <a 
href="https://www.blueprincegame.com">Blue Prince</a>, every run of the game has you fill a mansion with different rooms to wander through. One of these rooms is the <a href="https://blue-prince.fandom.com/wiki/Casino">Casino</a>, containing several slot machines and a roulette table.</p><p>One of the slot machines is broken until you figure out how to fix it, which requires sacrificing a valuable item.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!szn8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0734d-cc2e-47e2-ac9d-6ae77812c63d_1280x882.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!szn8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0734d-cc2e-47e2-ac9d-6ae77812c63d_1280x882.jpeg 424w, https://substackcdn.com/image/fetch/$s_!szn8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0734d-cc2e-47e2-ac9d-6ae77812c63d_1280x882.jpeg 848w, https://substackcdn.com/image/fetch/$s_!szn8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0734d-cc2e-47e2-ac9d-6ae77812c63d_1280x882.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!szn8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0734d-cc2e-47e2-ac9d-6ae77812c63d_1280x882.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!szn8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0734d-cc2e-47e2-ac9d-6ae77812c63d_1280x882.jpeg" width="1280" height="882" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/54a0734d-cc2e-47e2-ac9d-6ae77812c63d_1280x882.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:882,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:193086,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/167628444?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0734d-cc2e-47e2-ac9d-6ae77812c63d_1280x882.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!szn8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0734d-cc2e-47e2-ac9d-6ae77812c63d_1280x882.jpeg 424w, https://substackcdn.com/image/fetch/$s_!szn8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0734d-cc2e-47e2-ac9d-6ae77812c63d_1280x882.jpeg 848w, https://substackcdn.com/image/fetch/$s_!szn8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0734d-cc2e-47e2-ac9d-6ae77812c63d_1280x882.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!szn8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54a0734d-cc2e-47e2-ac9d-6ae77812c63d_1280x882.jpeg 1456w" 
sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The difference between the standard slot machines and the locked slot machine is that the locked slot machine allows you to respin individual reels five times instead of three times.</p><p>There are a lot of satisfying puzzles in Blue Prince, as well as random draws that involve setting up synergistic benefits. I figured this would also be true for the slot machine that requires you to sacrifice the valuable item. 
But after playing it many times, I couldn&#8217;t see a clear strategy that would beat the slot machine.</p><p>Because the difference was in the number of respins, I figured there must be some combinations of symbols on the reels whose expected value increases dramatically when you can respin more times, giving higher expected yields. I thought it might have something to do with the probabilities of getting snake and net combinations.</p><p><a href="https://www.reddit.com/r/BluePrince/comments/1k6a61n/gambling_odds_for_those_interested/">People online</a> have been collecting statistics from the slot machine and have estimated the probabilities for the different symbols on the reels:</p><ul><li><p>Blank: 28%</p></li><li><p>Coin: 30%</p></li><li><p>Coin Stack: 10%</p></li><li><p>Snake: 10%</p></li><li><p>Net: 4%</p></li><li><p>x2: 9%</p></li><li><p>Clover: 1%</p></li><li><p>Crown: 8%</p></li></ul><p>With the probabilities and the rules, I could implement simulations of playing the game to test different strategies.</p><p>I thought I could use <a href="https://en.wikipedia.org/wiki/Q-learning">Q-learning</a> to identify non-obvious strategies buried deep in the game design. 
After some iterations on different Q-learning setups, the agent wasn&#8217;t able to find any particularly revolutionary strategies.</p><p>To get some points of comparison, I also created a JavaScript version of the game that I could play multiple times to collect data on the success of strategies I had intuited from playing the game many, many times.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hdIa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18c28a37-9cee-4516-b014-d09c608b23ae_1346x1715.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hdIa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18c28a37-9cee-4516-b014-d09c608b23ae_1346x1715.png 424w, https://substackcdn.com/image/fetch/$s_!hdIa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18c28a37-9cee-4516-b014-d09c608b23ae_1346x1715.png 848w, https://substackcdn.com/image/fetch/$s_!hdIa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18c28a37-9cee-4516-b014-d09c608b23ae_1346x1715.png 1272w, https://substackcdn.com/image/fetch/$s_!hdIa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18c28a37-9cee-4516-b014-d09c608b23ae_1346x1715.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hdIa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18c28a37-9cee-4516-b014-d09c608b23ae_1346x1715.png" width="1346" height="1715" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/18c28a37-9cee-4516-b014-d09c608b23ae_1346x1715.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1715,&quot;width&quot;:1346,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:380511,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/167628444?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18c28a37-9cee-4516-b014-d09c608b23ae_1346x1715.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hdIa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18c28a37-9cee-4516-b014-d09c608b23ae_1346x1715.png 424w, https://substackcdn.com/image/fetch/$s_!hdIa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18c28a37-9cee-4516-b014-d09c608b23ae_1346x1715.png 848w, https://substackcdn.com/image/fetch/$s_!hdIa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18c28a37-9cee-4516-b014-d09c608b23ae_1346x1715.png 1272w, https://substackcdn.com/image/fetch/$s_!hdIa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18c28a37-9cee-4516-b014-d09c608b23ae_1346x1715.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I collected empirical data for 25 runs (each played until bankruptcy). 
These empirical data can be compared with having the Q-learning agent play the game for 25 runs, then plotting the total balance over the turns in playing the slot machine.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-83M!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69be0358-9e77-4f59-97e1-bfc484098e25_2400x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-83M!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69be0358-9e77-4f59-97e1-bfc484098e25_2400x1200.png 424w, https://substackcdn.com/image/fetch/$s_!-83M!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69be0358-9e77-4f59-97e1-bfc484098e25_2400x1200.png 848w, https://substackcdn.com/image/fetch/$s_!-83M!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69be0358-9e77-4f59-97e1-bfc484098e25_2400x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!-83M!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69be0358-9e77-4f59-97e1-bfc484098e25_2400x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-83M!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69be0358-9e77-4f59-97e1-bfc484098e25_2400x1200.png" width="1456" height="728" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/69be0358-9e77-4f59-97e1-bfc484098e25_2400x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:402769,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/167628444?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69be0358-9e77-4f59-97e1-bfc484098e25_2400x1200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-83M!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69be0358-9e77-4f59-97e1-bfc484098e25_2400x1200.png 424w, https://substackcdn.com/image/fetch/$s_!-83M!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69be0358-9e77-4f59-97e1-bfc484098e25_2400x1200.png 848w, https://substackcdn.com/image/fetch/$s_!-83M!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69be0358-9e77-4f59-97e1-bfc484098e25_2400x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!-83M!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69be0358-9e77-4f59-97e1-bfc484098e25_2400x1200.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The Q-learning agent had learned some strategies, but it seems I took more risky bets that worked out. 
On average, however, the performance is about equal.</p><p>I wanted to see how this compared to a naive strategy that never uses the respins and instead always cashes out after the initial spin.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DjJ_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29cccfa-38ec-4055-84c8-502ae61129c0_2400x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DjJ_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29cccfa-38ec-4055-84c8-502ae61129c0_2400x1200.png 424w, https://substackcdn.com/image/fetch/$s_!DjJ_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29cccfa-38ec-4055-84c8-502ae61129c0_2400x1200.png 848w, https://substackcdn.com/image/fetch/$s_!DjJ_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29cccfa-38ec-4055-84c8-502ae61129c0_2400x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!DjJ_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29cccfa-38ec-4055-84c8-502ae61129c0_2400x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DjJ_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29cccfa-38ec-4055-84c8-502ae61129c0_2400x1200.png" width="1456" height="728" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c29cccfa-38ec-4055-84c8-502ae61129c0_2400x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:667814,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/167628444?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29cccfa-38ec-4055-84c8-502ae61129c0_2400x1200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DjJ_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29cccfa-38ec-4055-84c8-502ae61129c0_2400x1200.png 424w, https://substackcdn.com/image/fetch/$s_!DjJ_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29cccfa-38ec-4055-84c8-502ae61129c0_2400x1200.png 848w, https://substackcdn.com/image/fetch/$s_!DjJ_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29cccfa-38ec-4055-84c8-502ae61129c0_2400x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!DjJ_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc29cccfa-38ec-4055-84c8-502ae61129c0_2400x1200.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This performed much better than the Q-learning agent or my strategies!</p><p>Generally, the cost of performing a respin does not beat out the expected values for the majority of symbol combinations on the reels. There are a few combinations that do however. For example, if you have three crowns, respinning the remaining reel at a cost of 1 gold has an expected value of 0.08 * 100 = 8 gold. 
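</p><p>As a minimal sketch of that arithmetic in Python, using the numbers from the example (a 0.08 chance of hitting the crown, a 100 gold jackpot, a 1 gold respin cost): the function name is a placeholder of mine, and the sketch ignores whatever the current symbols would pay if you cashed out instead, which a full strategy would need to account for.</p><pre><code class="language-python">def respin_expected_value(p_hit, payout, cost):
    """Return (gross, net) expected gold from respinning a single reel once."""
    gross = p_hit * payout  # expected winnings from the respin
    net = gross - cost      # expected winnings after paying for the respin
    return gross, net


# Numbers from the three-crowns example in the text.
gross, net = respin_expected_value(p_hit=0.08, payout=100, cost=1)
print(f"gross EV: {gross:.2f} gold, net EV: {net:.2f} gold")
</code></pre><p>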
I created a manual strategy that cashes out unless there is a respin available with a positive expected value, and performs that instead.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Astv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06e2afb8-4be2-4894-99ee-e251eab85ee0_2400x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Astv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06e2afb8-4be2-4894-99ee-e251eab85ee0_2400x1200.png 424w, https://substackcdn.com/image/fetch/$s_!Astv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06e2afb8-4be2-4894-99ee-e251eab85ee0_2400x1200.png 848w, https://substackcdn.com/image/fetch/$s_!Astv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06e2afb8-4be2-4894-99ee-e251eab85ee0_2400x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!Astv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06e2afb8-4be2-4894-99ee-e251eab85ee0_2400x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Astv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06e2afb8-4be2-4894-99ee-e251eab85ee0_2400x1200.png" width="1456" height="728" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06e2afb8-4be2-4894-99ee-e251eab85ee0_2400x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:848707,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/167628444?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06e2afb8-4be2-4894-99ee-e251eab85ee0_2400x1200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Astv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06e2afb8-4be2-4894-99ee-e251eab85ee0_2400x1200.png 424w, https://substackcdn.com/image/fetch/$s_!Astv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06e2afb8-4be2-4894-99ee-e251eab85ee0_2400x1200.png 848w, https://substackcdn.com/image/fetch/$s_!Astv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06e2afb8-4be2-4894-99ee-e251eab85ee0_2400x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!Astv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06e2afb8-4be2-4894-99ee-e251eab85ee0_2400x1200.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The performance of the optimal expected value strategy wasn&#8217;t much better than the cash-out strategy.</p><p>So <em>why</em> would you sacrifice the value item in order to use a slot machine with more respins, if using respins always ends up losing you gold?</p><p>There is an achievement in the game for collecting the jackpot from the slot machine (getting four crowns). If you pull three crowns and you have three respins, the chance of getting the jackpot is 1 - 0.92^3 = 22.13%, but with five respins it is 1 - 0.92^5 = 34.09%. 
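</p><p>These two percentages follow from treating each respin of the remaining reel as an independent try with a 0.08 chance of a crown. A small Python sketch of that calculation (the function name is mine, and independence between respins is an assumption):</p><pre><code class="language-python">def jackpot_chance(respins, p_crown=0.08):
    """Chance of at least one crown in `respins` independent tries on one reel."""
    return 1 - (1 - p_crown) ** respins


for respins in (3, 5):
    print(f"{respins} respins: {jackpot_chance(respins):.2%}")
# 3 respins: 22.13%
# 5 respins: 34.09%
</code></pre><p>Under the same assumption, the average number of three-crown situations you need before hitting the jackpot is 1 / jackpot_chance(k): roughly 4.5 with three respins and 2.9 with five.</p><p>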
That is a large difference if you really want to get this achievement; you end up having to play many fewer games.</p><p>The locked slot machine ends up not being about gains in gold, but about &#8216;completing&#8217; the game.</p><p>Code and notebook for this post are available at <a href="https://github.com/vals/Blog/tree/master/250705-slot-machine-strategy">https://github.com/vals/Blog/tree/master/250705-slot-machine-strategy</a></p><h1>From around the web</h1><ul><li><p><a href="https://centuryofbio.com/p/virtual-cell">What Are Virtual Cells?</a></p></li></ul><ul><li><p><a href="https://animationobsessive.substack.com/p/soviet-anime">Soviet Anime?</a></p></li></ul><ul><li><p><a href="https://kyunghyuncho.me/drug-discovery-may-be-in-the-cold-war-era/">Drug Discovery may be in the Cold War Era</a></p></li></ul><p></p>]]></content:encoded></item><item><title><![CDATA[SCVI with variational batch encoding]]></title><description><![CDATA[For the last couple of months I have been exploring batch integration strategies with SCVI and MRVI, and the possibility to optionally disable integration when encoding single cells.]]></description><link>https://www.nxn.se/p/scvi-with-variational-batch-encoding</link><guid isPermaLink="false">https://www.nxn.se/p/scvi-with-variational-batch-encoding</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Sat, 21 Jun 2025 22:53:31 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!vL6C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6ec863-10b6-41b7-8c08-a5bc8dcb8419_1800x900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For the last couple of months I have been exploring batch integration <a href="https://www.nxn.se/p/scvi-integrating-or-not">strategies with SCVI and MRVI</a>, and the possibility to <a href="https://www.nxn.se/p/scvi-inference-based-optional-integration">optionally disable integration</a> when encoding single 
cells.</p><p>These models allow you to ask questions of the data you have trained the models on. But what if you have a pre-trained model and want to apply it to new data? You will not be able to integrate out new, unseen batches. In this post I am exploring a strategy to solve this problem.</p><p>As a reminder, the conditional variational autoencoder used for batch integration in SCVI can be written as</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{aligned}\nz_n &amp;\\sim \\text{N}(0, 1), \\\\\nh_n &amp;= f(z_n, s_n), \\\\\nY_{ng} &amp;\\sim \\text{NB}(\\ell_n \\cdot h_{ng}, r_g).\n\\end{aligned}&quot;,&quot;id&quot;:&quot;TLMDHVVCBP&quot;}" data-component-name="LatexBlockToDOM"></div><p>In this formulation, z_n are the learned representations of the single cells, while s_n is an indicator for the batch identity. This allows the neural network f() to decode gene expression differently depending on the interaction between single cell representations and batch identities.</p><p>Practically, the default in SCVI is to represent a batch k as a one-hot encoded vector. The vector has length K, the total number of batches, and element k being 1 indicates batch k (all other entries are 0). Thus &#8220;one-hot&#8221;.</p><p>Of course, the vector s_n having length K means you can&#8217;t really do anything if you want to use the model with a new (K+1)th batch from some new data. You are limited to the batches you trained the original model on.</p><p>There is an <a href="https://docs.scvi-tools.org/en/stable/api/reference/scvi.module.VAE.html#scvi.module.VAE">experimental option</a> in the SCVI model called <code>batch_representation</code> with the two options <code>'one-hot'</code> and <code>'embedding'</code>. The default <code>'one-hot'</code> option implements the behavior described above. 
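</p><p>To make the one-hot behavior and its limitation concrete, here is a plain-Python sketch. It is only an illustration, not the scvi-tools implementation; the function name and the choice of K = 4 are made up for the example.</p><pre><code class="language-python">def one_hot(batch_index, n_batches):
    """Return a length-K list with a 1 at position `batch_index`, 0 elsewhere."""
    if batch_index not in range(n_batches):
        raise ValueError(
            f"batch {batch_index} was not among the {n_batches} training batches"
        )
    return [1 if k == batch_index else 0 for k in range(n_batches)]


K = 4  # number of batches seen during training (made-up value)
print(one_hot(2, K))  # [0, 0, 1, 0]

try:
    one_hot(K, K)  # a new, (K+1)th batch has no slot in the vector
except ValueError as err:
    print(err)
</code></pre><p>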
The option <code>'embedding'</code> implements a new behavior that learns a low-dimensional embedding for each batch. Practically, these embeddings are lookup tables that take a batch index and return the low-dimensional vector representing the batch. In this setting, s_{i[n]}, the representation of the batch i[n] that cell n belongs to, is one of these continuous vectors, whose values are parameters that are learned during training.</p><p>The <code>'embedding'</code> option has an interesting feature: in theory, &#8216;similar&#8217; batches should get proximal embedding vectors. This could be used to answer questions about batches. For example, if batches are patient samples, and patients have different diagnoses, you could learn which diagnoses are similar to each other.</p><p>These embeddings are finite-size lookup tables, and they are learned during training. We are still in the position that we can&#8217;t add new batches. It would be possible to extend the lookup table with the new batch and run some training to learn the parameters for this batch, but this is impractical.</p><p>We can use this batch representation strategy as inspiration for a next step.</p><p>Remember that inference in the SCVI model is done through an inference model that takes as input observed data,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;z_n | Y \\sim q_{\\theta} (z_n) = \\text{N}(g_\\mu(y_n), g_\\sigma (y_n)).&quot;,&quot;id&quot;:&quot;SDQODQPCUK&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here y_n is an array of observed UMI counts for cell n.</p><p>What if we can make a version of the SCVI model where batch embeddings s_{i[n]} are encoded using another inference model?</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{aligned}\nz_n &amp;\\sim \\text{N}(0, 1), \\\\\ns_{i[n]} &amp;\\sim \\text{N}(0, 1), \\\\\nh_n &amp;= f(z_n, s_{i[n]}), \\\\\nY_{ng} &amp;\\sim \\text{NB}(\\ell_n \\cdot h_{ng}, r_g), \\\\\n\\\\\ns_i | X &amp;\\sim q_\\theta(s_i) = 
\\text{N}(b_\\mu(x_i), b_\\sigma(x_i)).\n\\end{aligned}&quot;,&quot;id&quot;:&quot;BDRUQJSLHS&quot;}" data-component-name="LatexBlockToDOM"></div><p>This would give us <em>variationally encoded batches</em>. The known data X needs to contain rich enough information for an inference model to learn to encode it into the variational batch representations.</p><p>A good candidate for the data X is the <em>pseudobulk</em> of all cells in batch i.</p><p>Let&#8217;s try this out!</p><h2>Results</h2><p>As in the previous posts, we will use the Asian Immune Diversity Atlas (AIDA, Tian et al. 2024). The dataset has 1.1 million blood cells, but, particularly important here, it has samples from 503 donors, making it realistic to learn an inference model even after holding out 50 donors as a test set.</p><p>After implementing the model with variational batch encoding, I split out all data from 50 donors as a test set. The remaining dataset was used to train a model with the default <code>'one-hot'</code> option, a model with the experimental <code>'embedding'</code> option, and finally a model with our new <code>'variational'</code> option for <code>batch_representation</code>.</p><p>All models used 10-dimensional cell representations, the default for SCVI models. Both the <code>'embedding'</code> and <code>'variational'</code> batch representation models used 5-dimensional representations for donors. This is the default for the experimental <code>'embedding'</code> option. All models were trained for 20 epochs.</p><p>In each training run the training data is further randomly split into training and validation sets. To make performance comparable, we evaluate the ELBO (evidence lower bound) for the full non-test fraction of the data. 
This will not let us evaluate overfitting, but it does give us a global picture of average performance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vL6C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6ec863-10b6-41b7-8c08-a5bc8dcb8419_1800x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vL6C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6ec863-10b6-41b7-8c08-a5bc8dcb8419_1800x900.png 424w, https://substackcdn.com/image/fetch/$s_!vL6C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6ec863-10b6-41b7-8c08-a5bc8dcb8419_1800x900.png 848w, https://substackcdn.com/image/fetch/$s_!vL6C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6ec863-10b6-41b7-8c08-a5bc8dcb8419_1800x900.png 1272w, https://substackcdn.com/image/fetch/$s_!vL6C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6ec863-10b6-41b7-8c08-a5bc8dcb8419_1800x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vL6C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6ec863-10b6-41b7-8c08-a5bc8dcb8419_1800x900.png" width="573" height="286.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e6ec863-10b6-41b7-8c08-a5bc8dcb8419_1800x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:573,&quot;bytes&quot;:84128,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/166492906?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6ec863-10b6-41b7-8c08-a5bc8dcb8419_1800x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vL6C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6ec863-10b6-41b7-8c08-a5bc8dcb8419_1800x900.png 424w, https://substackcdn.com/image/fetch/$s_!vL6C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6ec863-10b6-41b7-8c08-a5bc8dcb8419_1800x900.png 848w, https://substackcdn.com/image/fetch/$s_!vL6C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6ec863-10b6-41b7-8c08-a5bc8dcb8419_1800x900.png 1272w, https://substackcdn.com/image/fetch/$s_!vL6C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6ec863-10b6-41b7-8c08-a5bc8dcb8419_1800x900.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 
0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The default <code>'one-hot'</code> option performs the best in terms of fitting the model to the data. 
The new <code>'variational'</code> option performs the worst.</p><p>To get a feel for the batch integration performance of the different options, we can quickly visually inspect the cell embeddings using TSNE.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aT8F!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F885cf66a-f8e4-4fde-9b84-a1af906db9b8_1600x2400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aT8F!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F885cf66a-f8e4-4fde-9b84-a1af906db9b8_1600x2400.png 424w, https://substackcdn.com/image/fetch/$s_!aT8F!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F885cf66a-f8e4-4fde-9b84-a1af906db9b8_1600x2400.png 848w, https://substackcdn.com/image/fetch/$s_!aT8F!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F885cf66a-f8e4-4fde-9b84-a1af906db9b8_1600x2400.png 1272w, https://substackcdn.com/image/fetch/$s_!aT8F!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F885cf66a-f8e4-4fde-9b84-a1af906db9b8_1600x2400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aT8F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F885cf66a-f8e4-4fde-9b84-a1af906db9b8_1600x2400.png" width="1456" height="2184" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/885cf66a-f8e4-4fde-9b84-a1af906db9b8_1600x2400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2184,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2989363,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/166492906?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F885cf66a-f8e4-4fde-9b84-a1af906db9b8_1600x2400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aT8F!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F885cf66a-f8e4-4fde-9b84-a1af906db9b8_1600x2400.png 424w, https://substackcdn.com/image/fetch/$s_!aT8F!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F885cf66a-f8e4-4fde-9b84-a1af906db9b8_1600x2400.png 848w, https://substackcdn.com/image/fetch/$s_!aT8F!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F885cf66a-f8e4-4fde-9b84-a1af906db9b8_1600x2400.png 1272w, https://substackcdn.com/image/fetch/$s_!aT8F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F885cf66a-f8e4-4fde-9b84-a1af906db9b8_1600x2400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The models were set up to learn cell representations that integrate out differences between donors. The ideal results in this case would show unmixed coloring by cell type (left column), and highly mixed coloring by donor ID (right column). The <code>'one-hot'</code> and <code>'embedding'</code> options seem to mix donors about equally well, but the new <code>'variational'</code> option does not integrate out variation due to donor as clearly.</p><h2>Variationally encoding unseen batches</h2><p>The main aim of the new <code>'variational'</code> batch representation option is to be able to apply a trained model to new data that has new batches. The model uses a low-dimensional continuous representation of batches. Are these representations meaningful? 
When we tested the different alternatives above, we held out all cells from 50 donors as a test set. Do &#8216;similar&#8217; unseen donors get variationally encoded to &#8216;similar&#8217; seen donors?</p><p>In the AIDA data, blood samples from donors were processed at three different institutes: the Genome Institute of Singapore (GIS), RIKEN (RIK), and Samsung Genome Institute (SGI). We can investigate whether donors processed at the same institute show similar batch effects by performing TSNE on the 5-dimensional variational batch embeddings representing the donors. In particular, the design of the <code>'variational'</code> batch representation option also lets us encode donors that were held out from training.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lo2V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd30c0e9-33ba-491f-8ab1-bc558427a4c4_1920x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lo2V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd30c0e9-33ba-491f-8ab1-bc558427a4c4_1920x1440.png 424w, https://substackcdn.com/image/fetch/$s_!lo2V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd30c0e9-33ba-491f-8ab1-bc558427a4c4_1920x1440.png 848w, https://substackcdn.com/image/fetch/$s_!lo2V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd30c0e9-33ba-491f-8ab1-bc558427a4c4_1920x1440.png 1272w, 
https://substackcdn.com/image/fetch/$s_!lo2V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd30c0e9-33ba-491f-8ab1-bc558427a4c4_1920x1440.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lo2V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd30c0e9-33ba-491f-8ab1-bc558427a4c4_1920x1440.png" width="1456" height="1092" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cd30c0e9-33ba-491f-8ab1-bc558427a4c4_1920x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:390854,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/166492906?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd30c0e9-33ba-491f-8ab1-bc558427a4c4_1920x1440.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lo2V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd30c0e9-33ba-491f-8ab1-bc558427a4c4_1920x1440.png 424w, https://substackcdn.com/image/fetch/$s_!lo2V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd30c0e9-33ba-491f-8ab1-bc558427a4c4_1920x1440.png 848w, 
https://substackcdn.com/image/fetch/$s_!lo2V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd30c0e9-33ba-491f-8ab1-bc558427a4c4_1920x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!lo2V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd30c0e9-33ba-491f-8ab1-bc558427a4c4_1920x1440.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The inference network for the batch representations learns to encode donors from GIS differently from donors from RIK or SGI. 
In particular, we can see that the batch inference network has generalized, and has successfully encoded unseen donors so they are similar to seen donors from the same institute.</p><p>How do the encoded batch embeddings compare to embeddings that are learned through optimization using the <code>'embedding'</code> batch representation option? We can similarly perform TSNE visualization on these learned embeddings, with the caveat that we can only get these representations for the training data with 453 donors.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SoUj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9833b060-1231-4ef2-b169-d09e70ac3e59_1500x1500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SoUj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9833b060-1231-4ef2-b169-d09e70ac3e59_1500x1500.png 424w, https://substackcdn.com/image/fetch/$s_!SoUj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9833b060-1231-4ef2-b169-d09e70ac3e59_1500x1500.png 848w, https://substackcdn.com/image/fetch/$s_!SoUj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9833b060-1231-4ef2-b169-d09e70ac3e59_1500x1500.png 1272w, https://substackcdn.com/image/fetch/$s_!SoUj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9833b060-1231-4ef2-b169-d09e70ac3e59_1500x1500.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!SoUj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9833b060-1231-4ef2-b169-d09e70ac3e59_1500x1500.png" width="448" height="448" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9833b060-1231-4ef2-b169-d09e70ac3e59_1500x1500.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:448,&quot;bytes&quot;:131878,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/166492906?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9833b060-1231-4ef2-b169-d09e70ac3e59_1500x1500.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SoUj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9833b060-1231-4ef2-b169-d09e70ac3e59_1500x1500.png 424w, https://substackcdn.com/image/fetch/$s_!SoUj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9833b060-1231-4ef2-b169-d09e70ac3e59_1500x1500.png 848w, https://substackcdn.com/image/fetch/$s_!SoUj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9833b060-1231-4ef2-b169-d09e70ac3e59_1500x1500.png 1272w, https://substackcdn.com/image/fetch/$s_!SoUj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9833b060-1231-4ef2-b169-d09e70ac3e59_1500x1500.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>Similar to how the <code>'embedding'</code> option seems better at learning to represent the single cell data, it also appears more effective at learning different types of batch effects. We can see clustering into subpopulations even within institutes in the learned representations.</p><h2>Conclusions</h2><p>It is disappointing that this new model using a variational encoder for batch representation has lower performance than the other options. 
It is still very encouraging that the ultimate goal, the ability to represent unseen batches, is working correctly!</p><p>It is very likely that 450 training points are insufficient to learn an inference model that encodes the batches accurately. It is probably worth revisiting this strategy with a much larger set of batches. There are very few individual datasets with this many batches, but the strategy could be useful when training on, e.g., all cells in cellxgene, where you want to be able to inject new data over time.</p><p>A fork and branch of scvi-tools that implements the <code>'variational'</code> batch representation option is available on GitHub at <a href="https://github.com/vals/scVI/tree/codex/add-pseudobulk-generator-and-integrate-with-vae">https://github.com/vals/scVI/tree/codex/add-pseudobulk-generator-and-integrate-with-vae</a>. Scripts and notebooks for training and generating results are available on GitHub at <a href="https://github.com/vals/Blog/tree/master/250621-variational-batch-encoding">https://github.com/vals/Blog/tree/master/250621-variational-batch-encoding</a>.</p><h1>References</h1><p>Tian, Chi, Yuntian Zhang, Yihan Tong, Kian Hong Kock, Donald Yuhui Sim, Fei Liu, Jiaqi Dong, et al. 2024. 
&#8220;Single-Cell RNA Sequencing of Peripheral Blood Links Cell-Type-Specific Regulation of Splicing to Autoimmune and Inflammatory Diseases.&#8221; Nature Genetics 56 (12): 2739&#8211;52.</p><h1>From around the web</h1><ul><li><p><a href="https://www.worksinprogress.news/p/the-duplication-crisis-the-other">The duplication crisis: the other replication crisis</a></p></li></ul><ul><li><p><a href="https://allenpike.com/2025/figma-slides-beautiful-disaster">Figma Slides is a Beautiful Disaster</a> </p><p></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.flickr.com/photos/val_s/54425443921" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FFWj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a14bf59-2900-4b29-9428-854879667724_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!FFWj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a14bf59-2900-4b29-9428-854879667724_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!FFWj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a14bf59-2900-4b29-9428-854879667724_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!FFWj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a14bf59-2900-4b29-9428-854879667724_799x533.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FFWj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a14bf59-2900-4b29-9428-854879667724_799x533.jpeg" width="799" height="533" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a14bf59-2900-4b29-9428-854879667724_799x533.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:533,&quot;width&quot;:799,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:193637,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.flickr.com/photos/val_s/54425443921&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/166492906?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a14bf59-2900-4b29-9428-854879667724_799x533.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FFWj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a14bf59-2900-4b29-9428-854879667724_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!FFWj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a14bf59-2900-4b29-9428-854879667724_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!FFWj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a14bf59-2900-4b29-9428-854879667724_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!FFWj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a14bf59-2900-4b29-9428-854879667724_799x533.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[Evaluating automatic cell number extraction from single cell papers]]></title><description><![CDATA[For the past ~7 years I have been tracking the scale of scRNA-seq experiments in a spreadsheet which we published as the &#8216;Single Cell Studies Database&#8217; a few years ago (Svensson, da Veiga Beltrame, and Pachter 2020).]]></description><link>https://www.nxn.se/p/evaluating-automatic-cell-number</link><guid isPermaLink="false">https://www.nxn.se/p/evaluating-automatic-cell-number</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Mon, 26 May 2025 06:31:45 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!GR-U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7472c1e-1dda-47a3-90b4-a7966a43e084_1920x1440.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For the past ~7 years I have been tracking the scale of scRNA-seq experiments in a spreadsheet which we published as the &#8216;<a href="https://www.nxn.se/p/single-cell-studies">Single Cell Studies Database</a>&#8217; a few years ago (Svensson, da Veiga Beltrame, and Pachter 2020).</p><p>Initially, all fields were entered manually. Eventually, I realized that a lot of article information could be automatically populated from the article DOI. This is done through a macro in Google Apps Script which queries <a href="http://crossref.org/">crossref.org</a>, receives an object containing the result data, pulls out the useful parts, formats them, and puts them in the corresponding columns of the spreadsheet. This takes care of authors, journal, title, and publication date, and made it a lot easier to quickly add studies.</p><p>All other fields are still entered manually.</p><p>It is fascinating to see how studies are growing in scale over the years. I started tracking this trend while writing a perspective article for Nature Protocols focused on technological development in the single cell genomics field, where we noted the exponential growth in scale for individual studies (Svensson, Vento-Tormo, and Teichmann 2018).</p><p>Over the years, I have learned some shortcuts for identifying the total cell numbers assayed for individual papers. Sometimes they are written in the abstract, often not. A quick search for the word &#8216;total&#8217; in the paper can lead to sentences like &#8216;in total, we generated transcriptomes for 12,000 cells from liver and 20,000 cells from spleen&#8217;, then I can add them up to 32,000. 
If that doesn&#8217;t work, I tend to search for &#8216;cells&#8217;, which is obviously used everywhere in the paper, but the beginning of the &#8216;results&#8217; section typically has numbers for collected cells. Sometimes the information is in figure legends. More often, it is just written as annotations in the figures themselves. For papers in more prestigious journals like Nature or Science, these kinds of details are more likely to be found in the supplemental material, which actually tends to be a good thing, because those are usually available for free even if you don&#8217;t have access to the journal article itself. I also quickly skim the paper, to see if there is a way of getting these numbers that I didn&#8217;t think of. If I still can&#8217;t find it, I usually try to find the data accession ID, download the data, and see if I can easily load it and learn the numbers manually that way.</p><p>Recently, there has been an explosion of tools for data extraction from web sites using large language models. I wanted to try this strategy and see how far off the results would be from my manually extracted cell numbers.</p><h2>Evaluating Firecrawl for cell number extraction</h2><p>The general pipeline is to have a scraper convert a query website into a local Markdown file, then pipe that file to an LLM together with a prompt to extract the information of interest. (It is healthy to keep in mind that a &#8216;paper&#8217; is just a post on a website by some guest authors, that happens to also have a more readable PDF version.)</p><p>There is a popular free tool called <a href="https://docs.crawl4ai.com/">Crawl4AI</a>, which has a handy CLI utility as well as a Python API. 
I played around with it for a bit, but while reading up on it I came across <a href="https://www.firecrawl.dev/extract">Firecrawl Extract</a>.</p><p>In terms of functionality, Crawl4AI and Firecrawl Extract can do the same thing, with the key difference that you bring your own LLM API key or local LLM for a model of your choice to Crawl4AI, while this is built into extraction queries in Firecrawl Extract. For a quick test, I found Firecrawl Extract easier to get started with and to get results from.</p><p>I sampled 97 scRNA-seq studies from the &#8216;Single Cell Studies Database&#8217; where I had annotated the &#8216;Reported cells total&#8217; column to use as an evaluation set.</p><p>Firecrawl Extract takes as input a URL, a prompt, and an output schema for structured outputs.</p><pre><code><code>from pydantic import BaseModel  # assumed: the schema is a pydantic model


# Structured output schema: one record per paper
class ExtractSchema(BaseModel):
    total_number_cells: int
    failure_reason: str
    url: str

prompt = '''
The page is a paper where the authors performed single-cell RNA-sequencing.
Extract the total number of cells the authors report to have collected.
Store the number in the total_number_cells field.

If you cannot extract the number of cells, give a brief reason in the failure_reason field.

Report the URL that was called to extract in the url field.
'''
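
# Sketch, not the code used for the post: one way to keep the extraction loop
# running when the intermittent ValueError appears. The names doi_to_url,
# extract_with_retry and extract_fn are hypothetical; extract_fn stands in for
# whatever client call submits the URL together with the prompt and schema above.
import time


def doi_to_url(doi):
    # A DOI resolves when prepended with https://doi.org/
    return 'https://doi.org/' + doi


def extract_with_retry(extract_fn, doi, max_attempts=3, pause_seconds=10):
    url = doi_to_url(doi)
    for attempt in range(1, max_attempts + 1):
        try:
            return extract_fn(url)
        except ValueError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            time.sleep(pause_seconds)  # wait before retrying the same URL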
</code></code></pre><p>Every DOI is a valid URL that resolves correctly when prepended with <code>https://doi.org/</code>. Even so, I had some hiccups where every ~14 papers or so my extraction loop exited with <code>ValueError: ('Extract job failed. Error: All provided URLs are invalid. Please check your input and try again.', 500)</code>. Simply rerunning the query would result in a successful extraction job, so I&#8217;m not sure what was happening there. It could be due to rate limiting, but I put a pretty generous <code>time.sleep(10)</code> between each paper.</p><h2>Results</h2><p>After extraction I could simply compare my manual annotations with the extractions from Firecrawl. In 38% of cases, Firecrawl Extract was not able to find a cell number and gave the reason that the number was not specified in the provided content. I suspect in most cases this is due to papers being behind paywalls. If I were to rerun this experiment I would add further instructions to the prompt to report whether the URL has a paywall. 
Papers where Firecrawl Extract reported to not find cell numbers are included in the figure below as red dots.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GR-U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7472c1e-1dda-47a3-90b4-a7966a43e084_1920x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GR-U!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7472c1e-1dda-47a3-90b4-a7966a43e084_1920x1440.png 424w, https://substackcdn.com/image/fetch/$s_!GR-U!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7472c1e-1dda-47a3-90b4-a7966a43e084_1920x1440.png 848w, https://substackcdn.com/image/fetch/$s_!GR-U!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7472c1e-1dda-47a3-90b4-a7966a43e084_1920x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!GR-U!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7472c1e-1dda-47a3-90b4-a7966a43e084_1920x1440.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GR-U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7472c1e-1dda-47a3-90b4-a7966a43e084_1920x1440.png" width="1456" height="1092" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e7472c1e-1dda-47a3-90b4-a7966a43e084_1920x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:129883,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/164460716?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7472c1e-1dda-47a3-90b4-a7966a43e084_1920x1440.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GR-U!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7472c1e-1dda-47a3-90b4-a7966a43e084_1920x1440.png 424w, https://substackcdn.com/image/fetch/$s_!GR-U!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7472c1e-1dda-47a3-90b4-a7966a43e084_1920x1440.png 848w, https://substackcdn.com/image/fetch/$s_!GR-U!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7472c1e-1dda-47a3-90b4-a7966a43e084_1920x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!GR-U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7472c1e-1dda-47a3-90b4-a7966a43e084_1920x1440.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The results had the exact same counts for 27% of the papers. Some of the 35% of papers with different cell counts can be explained by a few specific observations. A couple of the &#8216;Reported cells total&#8217; values are very round numbers (e.g., 24,000) while the Firecrawl Extract numbers are nearby but less round (e.g., 24,023). In these cases the manual curation is probably inaccurate. In some other cases the situation is reversed: for example, manual curation has 1,027,401 while Firecrawl reports 1,000,000. I think these are cases of Firecrawl picking up rounded numbers from the abstract (&#8216;In this study we profile a million cells &#8230;&#8217; or so). There are a number of cases where Firecrawl reports 0 cells in the papers, but does not indicate that the information is missing. 
I&#8217;m not sure why this is happening.</p><p>Excluding cases where no cell numbers are reported, or where 0 cells are reported, the average fold change error in cell numbers was 1.4x.</p><p>The results are pretty binary. Either the extraction fails to find a cell number, or it is pretty likely to be near the actual number.</p><p>For now I think I will hold off on using this automation. I think the results are pretty useful, but the practical bottleneck is that when I add a paper to the spreadsheet on the spot, I would need to either use a CLI tool or go to the Firecrawl Extract dashboard and set up an extract job, and that is just too much friction.</p><p>I also like having confirmed manually curated numbers in the database for these sorts of evaluations. If I started using this automation I would want to add some kind of tag or additional column to indicate whether the information was manually curated or automatically extracted.</p><p>Notebooks for this post are available on GitHub at <a href="https://github.com/vals/Blog/tree/master/250525-cell-number-extraction">https://github.com/vals/Blog/tree/master/250525-cell-number-extraction</a>.</p><h2>References</h2><p>Svensson, Valentine, Eduardo da Veiga Beltrame, and Lior Pachter. 2020. &#8220;A Curated Database Reveals Trends in Single-Cell Transcriptomics.&#8221; <em>Database: The Journal of Biological Databases and Curation</em> 2020 (November). <a href="https://doi.org/10.1093/database/baaa073">https://doi.org/10.1093/database/baaa073</a>.</p><p>Svensson, Valentine, Roser Vento-Tormo, and Sarah A. Teichmann. 2018. 
&#8220;Exponential Scaling of Single-Cell RNA-Seq in the Past Decade.&#8221; <em>Nature Protocols</em> 13 (4): 599&#8211;604.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.flickr.com/photos/val_s/54425830355" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mZYx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a90b8d-d4d0-401f-b9d2-837a7406c0a7_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mZYx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a90b8d-d4d0-401f-b9d2-837a7406c0a7_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!mZYx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a90b8d-d4d0-401f-b9d2-837a7406c0a7_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!mZYx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a90b8d-d4d0-401f-b9d2-837a7406c0a7_799x533.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mZYx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a90b8d-d4d0-401f-b9d2-837a7406c0a7_799x533.jpeg" width="799" height="533" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e1a90b8d-d4d0-401f-b9d2-837a7406c0a7_799x533.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:533,&quot;width&quot;:799,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:104113,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.flickr.com/photos/val_s/54425830355&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/164460716?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a90b8d-d4d0-401f-b9d2-837a7406c0a7_799x533.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mZYx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a90b8d-d4d0-401f-b9d2-837a7406c0a7_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mZYx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a90b8d-d4d0-401f-b9d2-837a7406c0a7_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!mZYx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a90b8d-d4d0-401f-b9d2-837a7406c0a7_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!mZYx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1a90b8d-d4d0-401f-b9d2-837a7406c0a7_799x533.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[Merging SCVI models]]></title><description><![CDATA[There are massive amounts of single-cell RNA-seq data, and data is still being produced.]]></description><link>https://www.nxn.se/p/merging-scvi-models</link><guid isPermaLink="false">https://www.nxn.se/p/merging-scvi-models</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Wed, 21 May 2025 05:44:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-sS8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2262dab-2a4a-44b7-aa36-3346d95f6acb_2400x1500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There are massive amounts of single-cell RNA-seq data, and data is still being 
produced. The single-cell genomics community is currently training large models with this data which can be used for downstream analysis tasks. This requires large efforts in collecting, formatting, and storing the data.</p><p>Once data is collected for training the models, the training itself needs to happen as well, requiring compute resources that are hard to come by.</p><p>Meanwhile, researchers in specific biological fields who analyze the data for specific questions might fit smaller models for their purposes. The authors of scvi-tools <a href="https://docs.scvi-tools.org/en/1.0.0/tutorials/notebooks/scvi_hub_intro_and_download.html">recently created infrastructure</a> to share pre-trained SCVI models on Hugging Face (Ergen et al. 2024).</p><p>Can we reduce training work by making use of multiple models trained individually on smaller datasets?</p><h2>Model merging</h2><p>When generative image models became popular a few years ago, many different models were trained and fine-tuned to be able to generate images in specific domains. Some models were good at generating landscapes, others were good at generating drawings.</p><p>What if you wanted a model that was good at generating drawings of landscapes?</p><p>Ideally you would get the combined training sets, and fine-tune a stable diffusion model with them. But this is resource intensive. As an alternative, people found that taking the two fine-tuned models and &#8216;merging&#8217; them can make a new model that leverages the strengths of both models (Wortsman et al. 2022).</p><p>In the generative image model and large language model community there are multiple platforms and packages for model merging. It is a popular way for hobbyists to customize models since it <a href="https://www.interconnects.ai/p/model-merging">requires no training resources</a>. 
One package, <a href="https://github.com/arcee-ai/mergekit">mergekit</a>, designed for merging popular large language models, suggests in its <a href="https://github.com/arcee-ai/mergekit/blob/main/docs/merge_methods.md">documentation</a> starting with linear merging and NuSLERP merging. </p><p>Linear merging simply means taking a weighted average of the weights of two trained models,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;W_\\text{linear} = \\alpha \\cdot W_A + (1 - \\alpha) \\cdot W_B.&quot;,&quot;id&quot;:&quot;YTBHDSOLTP&quot;}" data-component-name="LatexBlockToDOM"></div><p>NuSLERP is a more complicated merge method, which normalizes weights to lie on a unit sphere, then performs interpolation in spherical geometry,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{aligned}\na &amp;= \\frac{W_A}{||W_A||}, \\\\\nb &amp;= \\frac{W_B}{||W_B||}, \\\\\n\\theta &amp;= \\text{arccos}(\\left<a, b \\right>), \\\\\nu(\\alpha) &amp;= \\frac{\\sin(\\alpha \\cdot \\theta)}{\\sin \\theta} \\cdot a + \\frac{\\sin((1 - \\alpha) \\cdot \\theta)}{\\sin \\theta} \\cdot b, \\\\\nW_\\text{NuSLERP} &amp;= (\\alpha \\cdot r_A + (1 - \\alpha) \\cdot r_B) \\cdot u(\\alpha).\n\\end{aligned}&quot;,&quot;id&quot;:&quot;YDYHITRNGC&quot;}" data-component-name="LatexBlockToDOM"></div><p>It is hard to find proper evaluations of whether these merging strategies actually work. In particular, in the generative image model community users typically experiment iteratively until they get desirable images. 
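</p><p>As a rough illustration of the two formulas above, here is a minimal pure-Python sketch that merges two weight vectors stored as flat lists. It assumes that r_A and r_B denote the Euclidean norms of W_A and W_B, which the formula does not state explicitly. The function names linear_merge and nuslerp_merge are made up for this example, and the implementation in mergekit may well differ in its details.</p><pre>
```python
import math


def linear_merge(w_a, w_b, alpha=0.5):
    """Weighted average of two weight vectors (flat lists of floats)."""
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(w_a, w_b)]


def nuslerp_merge(w_a, w_b, alpha=0.5, eps=1e-8):
    """Spherical interpolation following the formula in the text.

    Assumption: r_A and r_B are the Euclidean norms of w_a and w_b.
    """
    r_a = math.sqrt(sum(x * x for x in w_a))
    r_b = math.sqrt(sum(y * y for y in w_b))
    a = [x / r_a for x in w_a]
    b = [y / r_b for y in w_b]

    # Angle between the unit vectors; clamp the dot product against rounding.
    dot = sum(x * y for x, y in zip(a, b))
    theta = math.acos(max(-1.0, min(1.0, dot)))
    sin_theta = math.sin(theta)

    if math.isclose(sin_theta, 0.0, abs_tol=eps):
        # (Anti)parallel directions: the spherical formula divides by zero,
        # so fall back to linear interpolation of the unit vectors.
        u = linear_merge(a, b, alpha)
    else:
        c_a = math.sin(alpha * theta) / sin_theta
        c_b = math.sin((1.0 - alpha) * theta) / sin_theta
        u = [c_a * x + c_b * y for x, y in zip(a, b)]

    # Rescale the interpolated direction by the interpolated norm.
    scale = alpha * r_a + (1.0 - alpha) * r_b
    return [scale * x for x in u]


w_a = [1.0, 0.0]
w_b = [0.0, 2.0]
merged_linear = linear_merge(w_a, w_b)
merged_nuslerp = nuslerp_merge(w_a, w_b)
print(merged_linear)  # [0.5, 1.0]
print(merged_nuslerp)
```
</pre><p>For these two orthogonal toy vectors, the linear merge has a smaller norm than either the interpolated norm of 1.5 or the spherical merge, which keeps exactly that norm. In a real model the same operation would presumably be applied to each weight tensor in turn.</p><p>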
If we want to apply these techniques to SCVI models, we can evaluate them using the reconstruction error.</p><h2>Testing model merging on SCVI models</h2><p>To evaluate whether we can combine the strengths of two independent SCVI models, we want to model two very distinct pieces of biology with them, and see if merging the models will provide a clear benefit.</p><p>Brain cells, developed from the ectoderm, and liver cells, developed from the endoderm, perform extremely different functions. The main cell types, neurons and glia for the brain, hepatocytes for the liver, use distinct and specific transcriptional regulation to arrive at their final cell states over development.</p><p>Su et al. (2021) collected 82,168 cells from mouse liver to study nonalcoholic fatty liver disease (NAFLD).</p><p>Hahn et al. (2023) collected 109,826 cells from mouse brain to study the effects of aging across the brain.</p><p>We can use these datasets together to learn how well model merging works by training models on them and evaluating how well a merged model works.</p><p>To ensure that both the liver model and the brain model learned the same amount of information from the data, the datasets were limited to the intersection of 21,576 measured genes, and both datasets were downsampled to 80,000 cells. For evaluation, 10,000 of those cells were held out as a test set from each of the datasets, leaving 70,000 cells each for training.</p><p>Both the liver model and the brain model were trained for 20 epochs on the 70,000 cells.</p><p>An even weighting of 0.5 was used when merging the models, both with linear merging and NuSLERP merging.</p><p>For comparison, an SCVI model was trained on the combined dataset of 140,000 cells. The fairest comparison is to allot this combined model the same fitting budget as the two individual models, meaning it was only allowed to train for 10 epochs. 
For an additional comparison, a fourth SCVI model was trained on the combined dataset for 20 epochs.</p><p>After training the models and performing the linear and NuSLERP merges, the two held-out test sets, with 10,000 cells each from brain and liver, were passed through the six different models to calculate reconstruction error.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-sS8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2262dab-2a4a-44b7-aa36-3346d95f6acb_2400x1500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-sS8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2262dab-2a4a-44b7-aa36-3346d95f6acb_2400x1500.png 424w, https://substackcdn.com/image/fetch/$s_!-sS8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2262dab-2a4a-44b7-aa36-3346d95f6acb_2400x1500.png 848w, https://substackcdn.com/image/fetch/$s_!-sS8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2262dab-2a4a-44b7-aa36-3346d95f6acb_2400x1500.png 1272w, https://substackcdn.com/image/fetch/$s_!-sS8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2262dab-2a4a-44b7-aa36-3346d95f6acb_2400x1500.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-sS8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2262dab-2a4a-44b7-aa36-3346d95f6acb_2400x1500.png" width="1456" height="910" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d2262dab-2a4a-44b7-aa36-3346d95f6acb_2400x1500.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:134827,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/164061380?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2262dab-2a4a-44b7-aa36-3346d95f6acb_2400x1500.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-sS8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2262dab-2a4a-44b7-aa36-3346d95f6acb_2400x1500.png 424w, https://substackcdn.com/image/fetch/$s_!-sS8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2262dab-2a4a-44b7-aa36-3346d95f6acb_2400x1500.png 848w, https://substackcdn.com/image/fetch/$s_!-sS8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2262dab-2a4a-44b7-aa36-3346d95f6acb_2400x1500.png 1272w, https://substackcdn.com/image/fetch/$s_!-sS8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2262dab-2a4a-44b7-aa36-3346d95f6acb_2400x1500.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The results give us some interesting information.</p><p>Applying a model trained on one organ to another organ is a bad idea. The individual models are highly specific to the domain they were trained on.</p><p>Model merging substantially degrades performance on in-domain data, but substantially improves out-of-domain performance. The resulting reconstruction errors appear similar to what you would get if you evaluated both models on the test data and averaged the results (somewhat better for the brain test data, 16,000 instead of 22,000).</p><p>You obtain a far better, more general model by combining the data and then fitting a model, given an equal fitting budget.</p><p>Even with double the fitting budget, a model fitted on the combined data is marginally worse than a model fitted on the individual dataset. 
This was a surprise: the difference is very small, but it is still present. The fitting was not repeated, so the difference might be explained by stochasticity.</p><p>Unfortunately, model merging does not appear to be a good alternative to collecting large amounts of data and fitting a model on all of it.</p><p>Notebooks with code for the analysis described here are available on GitHub at <a href="https://github.com/vals/Blog/tree/master/250520-model-merging">https://github.com/vals/Blog/tree/master/250520-model-merging</a>.</p><h2>References</h2><p>Ergen, Can, Valeh Valiollah Pour Amiri, Martin Kim, Aaron Streets, Adam Gayoso, and Nir Yosef. 2024. &#8220;Scvi-Hub: An Actionable Repository for Model-Driven Single Cell Analysis.&#8221; <em>bioRxiv</em>. <a href="https://doi.org/10.1101/2024.03.01.582887">https://doi.org/10.1101/2024.03.01.582887</a>.</p><p>Wortsman, Mitchell, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, et al. 2022. &#8220;Model Soups: Averaging Weights of Multiple Fine-Tuned Models Improves Accuracy without Increasing Inference Time.&#8221; <em>arXiv [Cs.LG]</em>. arXiv. <a href="http://arxiv.org/abs/2203.05482">http://arxiv.org/abs/2203.05482</a>.</p><p>Su, Qi, Sun Y. Kim, Funmi Adewale, Ye Zhou, Christina Aldler, Min Ni, Yi Wei, et al. 2021. &#8220;Single-Cell RNA Transcriptome Landscape of Hepatocytes and Non-Parenchymal Cells in Healthy and NAFLD Mouse Liver.&#8221; <em>iScience</em> 24 (11): 103233.</p><p>Hahn, Oliver, Aulden G. Foltz, Micaiah Atkins, Blen Kedir, Patricia Moran-Losada, Ian H. Guldner, Christy Munson, et al. 2023. &#8220;Atlas of the Aging Mouse Brain Reveals White Matter as Vulnerable Foci.&#8221; <em>Cell</em> 186 (19): 4117-4133.e22.</p><h2>From around the web</h2><ul><li><p><a href="https://ekernf01.github.io/multiplexed_perturbation_studies/">Multiplexed perturbations enable massive scale ... 
but how big can we go?</a></p></li><li><p><a href="https://animationobsessive.substack.com/p/breaking-away-from-disney-animation">Breaking Away from Disney Animation</a></p></li><li><p><a href="https://www.freaktakes.com/p/how-did-places-like-bell-labs-know">How did places like Bell Labs know how to ask the right questions?</a></p></li><li><p><a href="https://animationobsessive.substack.com/p/why-hey-arnold-sounded-like-that">Why &#8216;Hey Arnold!&#8217; Sounded Like That</a></p></li><li><p><a href="https://luispedro.substack.com/p/what-is-a-scientific-field">What is a scientific field?</a></p></li><li><p><a href="https://www.owlposting.com/p/what-happened-to-pathology-ai-companies">What happened to pathology AI companies?</a></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.flickr.com/photos/val_s/54425693513/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dCzz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c872f6-4a2f-49ea-ab17-abc3ac5ecc96_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!dCzz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c872f6-4a2f-49ea-ab17-abc3ac5ecc96_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!dCzz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c872f6-4a2f-49ea-ab17-abc3ac5ecc96_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!dCzz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c872f6-4a2f-49ea-ab17-abc3ac5ecc96_799x533.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!dCzz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c872f6-4a2f-49ea-ab17-abc3ac5ecc96_799x533.jpeg" width="799" height="533" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/68c872f6-4a2f-49ea-ab17-abc3ac5ecc96_799x533.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:533,&quot;width&quot;:799,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:166913,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.flickr.com/photos/val_s/54425693513/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/164061380?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c872f6-4a2f-49ea-ab17-abc3ac5ecc96_799x533.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dCzz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c872f6-4a2f-49ea-ab17-abc3ac5ecc96_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!dCzz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c872f6-4a2f-49ea-ab17-abc3ac5ecc96_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!dCzz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c872f6-4a2f-49ea-ab17-abc3ac5ecc96_799x533.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!dCzz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68c872f6-4a2f-49ea-ab17-abc3ac5ecc96_799x533.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[An attempt at speeding up TSNE using Apple MLX]]></title><description><![CDATA[I&#8217;m often encountering situations where I want to do a quick 2D visualization of some data.]]></description><link>https://www.nxn.se/p/an-attempt-at-speeding-up-tsne-using</link><guid 
isPermaLink="false">https://www.nxn.se/p/an-attempt-at-speeding-up-tsne-using</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Tue, 01 Apr 2025 05:36:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!G81g!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abafcf8-9e1e-4027-ab0c-0168a31996fc_2400x1200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I often encounter situations where I want to do a quick 2D visualization of some data. A decent tool for this is <a href="https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html">TSNE</a> (as well as the related tools <a href="https://umap-learn.readthedocs.io/en/latest/">UMAP</a> and <a href="https://pymde.org/">MDE</a>).</p><p>A few years ago, fast CUDA-based GPU implementations of these tools became available from <a href="https://docs.rapids.ai/api/cuml/stable/api/#tsne">cuML</a> and <a href="https://pymde.org/">PyMDE</a>, and I got used to being able to make a visualization in just a couple of seconds. Since I <a href="https://www.nxn.se/p/cloud-gpus-for-scvi">switched to a Mac desktop</a> for hobby projects, the most frustrating loss has been the ability to make these quick exploratory visualizations on the fly.</p><p>The extremely fast TSNE implementation in cuML from <a href="http://rapids.ai/">rapids.ai</a> uses the FFT-accelerated approximation of TSNE originally published and implemented as <a href="https://github.com/KlugerLab/FIt-SNE">FIt-SNE</a>. Unfortunately, this implementation is challenging to install. 
To make optimized TSNE implementations available for use and experimentation, several variations of TSNE were implemented in the more user-friendly package <a href="https://github.com/pavlin-policar/openTSNE">openTSNE</a>, including the version using FFT approximation.</p><p>The M-series processors in modern Macs have unified CPU/GPU architectures, where GPU cores can be used for computationally intensive tasks. Apple has released the <a href="https://ml-explore.github.io/mlx/build/html/index.html">MLX library</a> to enable general computations to take advantage of the GPU cores on the Apple silicon processors.</p><p>I was wondering if the FFT-accelerated TSNE could be implemented in MLX and speed up TSNE visualizations. I was also curious about using <a href="https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview">Claude Code</a> for translating implementations to new frameworks, so this seemed like a good opportunity.</p><p>Using the openTSNE package as context, I had Claude Code create a native MLX implementation of TSNE (MLXNative), and, following that, an MLX implementation of the FFT approximation (MLXFFT).</p><p>The MLXNative implementation worked fine, while the MLXFFT implementation ended up with some edge cases leading to outliers and some uncontrolled gradients. It also seems MLX is lacking some FFT functionality compared to the FFT implementations used in cuML and FIt-SNE/openTSNE.</p><p>My primary interest was the runtime speed when making use of the M4 Pro GPU. If runtimes were promising, I figured it would be worth digging into minor, potentially fixable, issues.</p><p>I investigated how the runtime to create TSNE embeddings depended on dataset size for the implementations, and compared that to the various implementations in openTSNE, using either one or twelve cores. 
I also compared runtimes with the default scikit-learn implementation of TSNE.</p><p>In addition, I benchmarked the <a href="https://docs.rapids.ai/api/cuml/stable/zero-code-change/">recently announced &#8216;zero code change&#8217; cuML acceleration</a> of the scikit-learn TSNE function. This used the same data, but was run on an L40S node on <a href="http://lightning.ai/">lightning.ai</a> instead of locally on the Mac.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G81g!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abafcf8-9e1e-4027-ab0c-0168a31996fc_2400x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G81g!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abafcf8-9e1e-4027-ab0c-0168a31996fc_2400x1200.png 424w, https://substackcdn.com/image/fetch/$s_!G81g!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abafcf8-9e1e-4027-ab0c-0168a31996fc_2400x1200.png 848w, https://substackcdn.com/image/fetch/$s_!G81g!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abafcf8-9e1e-4027-ab0c-0168a31996fc_2400x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!G81g!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abafcf8-9e1e-4027-ab0c-0168a31996fc_2400x1200.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!G81g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abafcf8-9e1e-4027-ab0c-0168a31996fc_2400x1200.png" width="1456" height="728" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9abafcf8-9e1e-4027-ab0c-0168a31996fc_2400x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:234555,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/160318523?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abafcf8-9e1e-4027-ab0c-0168a31996fc_2400x1200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!G81g!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abafcf8-9e1e-4027-ab0c-0168a31996fc_2400x1200.png 424w, https://substackcdn.com/image/fetch/$s_!G81g!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abafcf8-9e1e-4027-ab0c-0168a31996fc_2400x1200.png 848w, https://substackcdn.com/image/fetch/$s_!G81g!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abafcf8-9e1e-4027-ab0c-0168a31996fc_2400x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!G81g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9abafcf8-9e1e-4027-ab0c-0168a31996fc_2400x1200.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The speed of the MLXNative implementation on small test sets was exciting and promising, but increasing the dataset sizes revealed poor scaling.</p><p>For the MLXFFT implementation, the runtimes were very disappointing. 
I believe this is mostly due to the FFT library in MLX lacking necessary low-level functionality.</p><p>Both MLXNative and MLXFFT had ~90% GPU utilization when running the benchmarks, so it does seem compute streams were correctly routed.</p><p>The only other implementation with as poor performance as MLXFFT was the openTSNE implementation of BH-TSNE when using 12 cores. The single-core version actually performed a lot better.</p><p>So in the end, unfortunately, simply moving computation to the GPU cores of the M4 Pro doesn&#8217;t automatically provide performance gains.</p><p>Both openTSNEFFT and openTSNENNDescent with 12 cores are pretty usable (but still 3x slower than cuML-accelerated TSNE).</p><p>In practice, when I have datasets with 200k+ points I want to explore, I will probably use some cloud computing (or subsample to ~100k).</p><p>On a positive note, using Claude Code to create these implementations (including debugging, refactoring, benchmarking, optimization) was quite straightforward.</p><p>The MLX implementations of TSNE, along with benchmarking scripts, are available on this branch on GitHub: <a href="https://github.com/vals/openTSNE/tree/mlx-acceleration">https://github.com/vals/openTSNE/tree/mlx-acceleration</a>. 
A notebook for producing the results figure is available at <a href="https://github.com/vals/Blog/tree/master/250331-mlx-tsne">https://github.com/vals/Blog/tree/master/250331-mlx-tsne</a>.</p><h1>From around the web</h1><ul><li><p><a href="https://tedium.co/2025/03/29/severance-apple-remote-editing-weirdness/">Severed Edits</a></p></li><li><p><a href="https://www.freaktakes.com/p/the-third-university-of-cambridge">&#8220;The Third University of Cambridge&#8221;: BBN and the Development of the ARPAnet</a></p></li><li><p><a href="https://www.youtube.com/watch?v=3Wysn_hJ7IQ&amp;t=5284s">Wax Tailor - Que Sera (Phonovisions Symphonic Orchestra)</a> [YouTube]</p></li></ul><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.flickr.com/photos/val_s/53947960851/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZkUz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0fe9d1-69ea-4626-8502-06e08e14ca23_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ZkUz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0fe9d1-69ea-4626-8502-06e08e14ca23_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ZkUz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0fe9d1-69ea-4626-8502-06e08e14ca23_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ZkUz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0fe9d1-69ea-4626-8502-06e08e14ca23_799x533.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ZkUz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0fe9d1-69ea-4626-8502-06e08e14ca23_799x533.jpeg" width="799" height="533" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3b0fe9d1-69ea-4626-8502-06e08e14ca23_799x533.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:533,&quot;width&quot;:799,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:67923,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.flickr.com/photos/val_s/53947960851/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/160318523?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0fe9d1-69ea-4626-8502-06e08e14ca23_799x533.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZkUz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0fe9d1-69ea-4626-8502-06e08e14ca23_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ZkUz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0fe9d1-69ea-4626-8502-06e08e14ca23_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ZkUz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0fe9d1-69ea-4626-8502-06e08e14ca23_799x533.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!ZkUz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0fe9d1-69ea-4626-8502-06e08e14ca23_799x533.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[SCVI — Inference-based optional integration]]></title><description><![CDATA[The batch-integrating SCVI model is a conditional variational autoencoder (cVAE, Sohn et al.]]></description><link>https://www.nxn.se/p/scvi-inference-based-optional-integration</link><guid 
isPermaLink="false">https://www.nxn.se/p/scvi-inference-based-optional-integration</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Fri, 28 Mar 2025 03:15:31 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!bT7O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c731579-6300-4670-bc96-a11cc6810604_1200x1223.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The <a href="https://docs.scvi-tools.org/en/latest/api/reference/scvi.model.SCVI.html">batch-integrating SCVI model</a> is a conditional variational autoencoder (cVAE, Sohn et al. 2015), where the generative model can be written as</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{aligned} z_n &amp;\\sim \\text{N}(0, 1), \\\\ h_n &amp;= f(z_n, s_n), \\\\ Y_{ng} &amp;\\sim \\text{NB}(\\ell_n \\cdot h_{ng}, r_g). \\end{aligned}&quot;,&quot;id&quot;:&quot;RLGKFAKSHS&quot;}" data-component-name="LatexBlockToDOM"></div><p>I usually focus on this generative part of the model. It is what we use to interpret the model. Variational autoencoders (Kingma &amp; Welling, 2013) are divided into two parts: the generative model written above, and an <em>inference model</em> that is used to learn posterior distributions of z from the data. In my mind, the generative model is the part that matters the most.</p><p>Technically, you don&#8217;t need an inference model; you could consider just the generative model and find posterior distributions for all the latent z variables, using the observed data and the likelihood with Bayesian posterior sampling methods. 
But this is computationally infeasible.</p><p>Stepping back a bit, the posterior distributions of the z latent variables could, slightly more efficiently, be estimated by <em>variational distributions</em>: simplified parametric distributions chosen to be as similar to the true posterior distributions as possible:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;z_{ni} | Y \\sim q_\\theta(z_{ni}) = \\text{N}(\\mu_{ni}, \\sigma_{ni}) \\approx p(z_{ni} | Y).&quot;,&quot;id&quot;:&quot;SBVBGIKTZS&quot;}" data-component-name="LatexBlockToDOM"></div><p><em><a href="https://en.wikipedia.org/wiki/Variational_Bayesian_methods">Variational inference</a></em> is the strategy of approximating posterior distributions, which is an integration problem, by converting it to an optimization problem, where you are optimizing the parameters of the variational distributions.</p><p>If you have a single-cell RNA-seq dataset with 100,000 cells, and you want to learn 10-dimensional representations of your cells, you would need to optimize 2 * 10 * 100,000 = 2,000,000 variational parameters (a mean and a scale for each latent dimension of each cell), which ends up being a very difficult optimization problem.</p><p>One of the revolutionary insights in machine learning is that this optimization problem can be solved with neural networks (Kingma &amp; Welling, 2013). Instead of optimizing the variational parameters directly, you optimize the weights of neural networks that output the variational parameters given the observed data:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;z_{n} | Y \\sim q_\\theta(z_{n}) = \\text{N}(g_\\mu(y_n), g_\\sigma(y_n)) \\approx p(z_{n} | Y).&quot;,&quot;id&quot;:&quot;GFGLCBCWIM&quot;}" data-component-name="LatexBlockToDOM"></div><p>This is known as &#8216;autoencoding variational Bayes&#8217; or &#8216;amortized inference&#8217;, and is the other half of the variational autoencoder. 
We are very good at training neural networks, so turning a problem into a neural network training task ends up being a good solution. These neural networks represent the <em>inference model</em> in the variational autoencoder model.</p><p>In the case of the conditional variational autoencoder, there are two valid options for implementing the inference models:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{align} z_{n} | Y \\sim q_\\theta(z_{n}) &amp;= \\text{N}(g_\\mu(y_n), g_\\sigma(y_n)) \\approx p(z_{n} | Y), \\\\ z_{n} | Y \\sim q_\\theta(z_{n}) &amp;= \\text{N}(g_\\mu(y_n, s_n), g_\\sigma(y_n, s_n)) \\approx p(z_{n} | Y). \\end{align}&quot;,&quot;id&quot;:&quot;NBCGSOHRSM&quot;}" data-component-name="LatexBlockToDOM"></div><p>In option (1), the inference model is not aware of the batches to integrate. Only the generative model will be aware of the batches. The effect, though, is that the inference networks g will learn to map observed data to appropriate representations in an unsupervised manner. A strength with this choice is that you can infer z representations for new data without knowing if any particular batch category is equivalent.</p><p>In option (2), the inference model is explicitly informed of which batch the data comes from. The inference networks will use the interactions between the values in the observed data and the batches to learn potentially more efficiently to map observations to the appropriate representations. A strength with this choice is that you can obtain counterfactual representations. 
A weakness is that the model will not know how to embed data from a new batch without <a href="https://docs.scarches.org/en/latest/scvi_surgery_pipeline.html">performing architecture surgery</a>.</p><p>The default in the SCVI models from scvi-tools with batch integration is option (1), but option (2) <a href="https://docs.scvi-tools.org/en/stable/api/reference/scvi.model.SCVI.html">can be enabled</a> with the <code>encode_covariates = True</code> option.</p><p>For a demonstration, we can replicate <a href="https://www.nxn.se/p/scvi-integrating-or-not">the previous post</a>, and compare representational embeddings between an SCVI model that does not integrate out the batches, and an SCVI model that integrates out batches, but in this case using the <code>encode_covariates = True</code> option.</p><p>This can be illustrated <a href="https://www.nxn.se/p/scvi-integrating-or-not">as in the previous post</a> using the AIDA data (Tian et al. 2024).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bT7O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c731579-6300-4670-bc96-a11cc6810604_1200x1223.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bT7O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c731579-6300-4670-bc96-a11cc6810604_1200x1223.png 424w, https://substackcdn.com/image/fetch/$s_!bT7O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c731579-6300-4670-bc96-a11cc6810604_1200x1223.png 848w, 
https://substackcdn.com/image/fetch/$s_!bT7O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c731579-6300-4670-bc96-a11cc6810604_1200x1223.png 1272w, https://substackcdn.com/image/fetch/$s_!bT7O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c731579-6300-4670-bc96-a11cc6810604_1200x1223.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bT7O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c731579-6300-4670-bc96-a11cc6810604_1200x1223.png" width="1200" height="1223" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c731579-6300-4670-bc96-a11cc6810604_1200x1223.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1223,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:959710,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/160041204?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c731579-6300-4670-bc96-a11cc6810604_1200x1223.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bT7O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c731579-6300-4670-bc96-a11cc6810604_1200x1223.png 424w, 
https://substackcdn.com/image/fetch/$s_!bT7O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c731579-6300-4670-bc96-a11cc6810604_1200x1223.png 848w, https://substackcdn.com/image/fetch/$s_!bT7O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c731579-6300-4670-bc96-a11cc6810604_1200x1223.png 1272w, https://substackcdn.com/image/fetch/$s_!bT7O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c731579-6300-4670-bc96-a11cc6810604_1200x1223.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The batch-integrating version of the SCVI model with batch-aware encoding leaves out donor-to-donor variation in the representations. The effect is the same when encoding batches in integration as with the default integration explored in the previous post.</p><h2>Optional integration using a conditional inference model</h2><p><a href="https://www.nxn.se/p/scvi-integrating-or-not">In the previous post</a>, we discussed how the choice of learning representations with or without contribution from known factors depends on the questions you aim to answer. In particular, we discussed how <a href="https://docs.scvi-tools.org/en/stable/api/reference/scvi.external.MRVI.html">the MrVI model</a> lets us work in both frameworks with a single model.</p><p>We can think about this problem when considering the SCVI model with encoded batches:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{aligned} z_n &amp;\\sim \\text{N}(0, 1), \\\\ z_{n} | Y &amp;\\sim  \\text{N}(g_\\mu(y_n, s_n), g_\\sigma(y_n, s_n)), \\\\ h_n &amp;= f(z_n, s_n), \\\\ Y_{ng} &amp;\\sim \\text{NB}(\\ell_n \\cdot h_{ng}, r_g). \\end{aligned}&quot;,&quot;id&quot;:&quot;LMJPEOOLML&quot;}" data-component-name="LatexBlockToDOM"></div><p>Imagine having the option to &#8216;switch off&#8217; the s_n contribution to the encoders g and decoder f. Then this batch-integrated model would turn into the unintegrated model. Of course, the model will not know what to do if you simply remove the s_n input to the models.</p><p>We can give the model this ability by augmenting the training data. 
The original data can be expanded, so that each Y observation is used to train the model twice in an epoch: once with the observed batch category, and once with a new dummy batch category we can call <code>'unintegrated'</code>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RiyS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F081938c1-18ba-4bec-905c-09cac8013294_1200x725.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RiyS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F081938c1-18ba-4bec-905c-09cac8013294_1200x725.png 424w, https://substackcdn.com/image/fetch/$s_!RiyS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F081938c1-18ba-4bec-905c-09cac8013294_1200x725.png 848w, https://substackcdn.com/image/fetch/$s_!RiyS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F081938c1-18ba-4bec-905c-09cac8013294_1200x725.png 1272w, https://substackcdn.com/image/fetch/$s_!RiyS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F081938c1-18ba-4bec-905c-09cac8013294_1200x725.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RiyS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F081938c1-18ba-4bec-905c-09cac8013294_1200x725.png" width="1200" height="725" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/081938c1-18ba-4bec-905c-09cac8013294_1200x725.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:725,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:28799,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/160041204?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F081938c1-18ba-4bec-905c-09cac8013294_1200x725.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RiyS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F081938c1-18ba-4bec-905c-09cac8013294_1200x725.png 424w, https://substackcdn.com/image/fetch/$s_!RiyS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F081938c1-18ba-4bec-905c-09cac8013294_1200x725.png 848w, https://substackcdn.com/image/fetch/$s_!RiyS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F081938c1-18ba-4bec-905c-09cac8013294_1200x725.png 1272w, https://substackcdn.com/image/fetch/$s_!RiyS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F081938c1-18ba-4bec-905c-09cac8013294_1200x725.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Now, for the added data points where the batch category is 'unintegrated' the model and loss will be equivalent to an unintegrated model,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{aligned} z_n &amp;\\sim \\text{N}(0, 1), \\\\ z_{n} | Y &amp;\\sim  \\text{N}(g_\\mu(y_n, \\texttt{'unintegrated'}), g_\\sigma(y_n, \\texttt{'unintegrated'})), \\\\ h_n &amp;= f(z_n, \\texttt{'unintegrated'}), \\\\ Y_{ng} &amp;\\sim \\text{NB}(\\ell_n \\cdot h_{ng}, r_g). 
\\end{aligned}&quot;,&quot;id&quot;:&quot;PVKOOFKMRW&quot;}" data-component-name="LatexBlockToDOM"></div><p>After training the model with this expanded training data, we can create two sets of cell embeddings by running the observed data through the encoder g_mu with different settings for the batch categories:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{aligned} Z^*_\\text{integrated} &amp;= g_\\mu(Y, s), \\\\ Z^*_\\text{unintegrated} &amp;= g_\\mu(Y, \\texttt{'unintegrated'}). \\end{aligned}&quot;,&quot;id&quot;:&quot;LBNXISKBSE&quot;}" data-component-name="LatexBlockToDOM"></div><p>We can try this approach using the AIDA data, and see how these two different versions of the cell embeddings do or do not contain variation due to donor IDs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jxNi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b669334-df2c-47d3-8fed-9f2f8fd40236_1200x1223.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jxNi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b669334-df2c-47d3-8fed-9f2f8fd40236_1200x1223.png 424w, https://substackcdn.com/image/fetch/$s_!jxNi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b669334-df2c-47d3-8fed-9f2f8fd40236_1200x1223.png 848w, https://substackcdn.com/image/fetch/$s_!jxNi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b669334-df2c-47d3-8fed-9f2f8fd40236_1200x1223.png 1272w, 
https://substackcdn.com/image/fetch/$s_!jxNi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b669334-df2c-47d3-8fed-9f2f8fd40236_1200x1223.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jxNi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b669334-df2c-47d3-8fed-9f2f8fd40236_1200x1223.png" width="1200" height="1223" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6b669334-df2c-47d3-8fed-9f2f8fd40236_1200x1223.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1223,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:846921,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/160041204?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b669334-df2c-47d3-8fed-9f2f8fd40236_1200x1223.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jxNi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b669334-df2c-47d3-8fed-9f2f8fd40236_1200x1223.png 424w, https://substackcdn.com/image/fetch/$s_!jxNi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b669334-df2c-47d3-8fed-9f2f8fd40236_1200x1223.png 848w, 
https://substackcdn.com/image/fetch/$s_!jxNi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b669334-df2c-47d3-8fed-9f2f8fd40236_1200x1223.png 1272w, https://substackcdn.com/image/fetch/$s_!jxNi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b669334-df2c-47d3-8fed-9f2f8fd40236_1200x1223.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>These results were surprising to me! 
My expectation was that encoding the observed batches would integrate out differences between donors, while using the fixed <code>'unintegrated'</code> label would retain variation between donors. Instead, we are seeing the opposite. When batches are provided to the encoder, it introduces variation between batches into the representations.</p><p>On the positive side, there is a substantial difference between the two versions of cell embeddings. I don&#8217;t understand why the model ends up having this behavior.</p><p>I have always viewed the inference models in VAE-based models as a clever solution to the optimization problem, focusing on the generative model. Through this experiment I have gotten an appreciation for how the encoders can potentially be leveraged to solve problems. In particular, I think it is interesting how we can expand the capabilities of a model by expanding and augmenting the training data.</p><p>Notebooks with analysis code are available on GitHub: <a href="https://github.com/vals/Blog/tree/master/250327-optional-integration">https://github.com/vals/Blog/tree/master/250327-optional-integration</a></p><h1>References</h1><p>Kingma, Diederik P., and Max Welling. 2013. &#8220;Auto-Encoding Variational Bayes.&#8221; <em>arXiv [stat.ML]</em>. arXiv. <a href="http://arxiv.org/abs/1312.6114v10">http://arxiv.org/abs/1312.6114v10</a>.</p><p>Sohn, Kihyuk, Honglak Lee, and Xinchen Yan. 2015. &#8220;Learning Structured Output Representation Using Deep Conditional Generative Models.&#8221; <em>Neural Information Processing Systems</em>, December, 3483&#8211;91.</p><p>Tian, Chi, Yuntian Zhang, Yihan Tong, Kian Hong Kock, Donald Yuhui Sim, Fei Liu, Jiaqi Dong, et al. 2024. 
&#8220;Single-Cell RNA Sequencing of Peripheral Blood Links Cell-Type-Specific Regulation of Splicing to Autoimmune and Inflammatory Diseases.&#8221; <em>Nature Genetics</em> 56 (12): 2739&#8211;52.</p><h1>From around the web</h1><ul><li><p><a href="https://animationobsessive.substack.com/p/the-process-of-nimh">The Process of 'NIMH&#8217;</a></p></li><li><p><a href="https://animationobsessive.substack.com/p/who-cares-about-the-disney-method">Who Cares About the Disney Method?</a></p></li><li><p><a href="https://goodscienceproject.org/articles/managing-lockheeds-skunk-works/">Managing Lockheed&#8217;s Skunk Works</a></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.flickr.com/photos/val_s/53948416240/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7yW2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F403b6725-a220-48f8-94c8-e853fbae0e6a_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7yW2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F403b6725-a220-48f8-94c8-e853fbae0e6a_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7yW2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F403b6725-a220-48f8-94c8-e853fbae0e6a_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7yW2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F403b6725-a220-48f8-94c8-e853fbae0e6a_799x533.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!7yW2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F403b6725-a220-48f8-94c8-e853fbae0e6a_799x533.jpeg" width="799" height="533" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/403b6725-a220-48f8-94c8-e853fbae0e6a_799x533.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:533,&quot;width&quot;:799,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77166,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.flickr.com/photos/val_s/53948416240/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/160041204?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F403b6725-a220-48f8-94c8-e853fbae0e6a_799x533.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7yW2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F403b6725-a220-48f8-94c8-e853fbae0e6a_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7yW2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F403b6725-a220-48f8-94c8-e853fbae0e6a_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7yW2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F403b6725-a220-48f8-94c8-e853fbae0e6a_799x533.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!7yW2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F403b6725-a220-48f8-94c8-e853fbae0e6a_799x533.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[SCVI — Integrating or not?]]></title><description><![CDATA[In modern single cell analysis workflows you fit a model to your data that allows you to ask questions about the data.]]></description><link>https://www.nxn.se/p/scvi-integrating-or-not</link><guid 
isPermaLink="false">https://www.nxn.se/p/scvi-integrating-or-not</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Tue, 18 Mar 2025 02:28:35 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!fbw6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0790e1c-17bd-435e-852f-4b0cef426313_1200x1224.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In modern single cell analysis workflows you fit a model to your data that allows you to ask questions about the data.</p><p>A common question is &#8216;which cells are similar to each other?&#8217;</p><p>In the <a href="https://scvi-tools.org/">SCVI modeling framework</a>, vector representations of cells are learned so that similar vectors generate statistically similar transcriptional profiles. With some simplifications, the <a href="https://docs.scvi-tools.org/en/stable/user_guide/models/scvi.html">SCVI model</a> is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{aligned} z_n &amp;\\sim \\text{N}(0, 1), \\\\ h_n &amp;= f(z_n), \\\\ Y_{ng} &amp;\\sim \\text{NB}(\\ell_n \\cdot h_{ng}, r_g). \\end{aligned}&quot;,&quot;id&quot;:&quot;WXYNKLFPXS&quot;}" data-component-name="LatexBlockToDOM"></div><p>The representations reflect the varying kinds of transcriptional profiles we can expect to see given the data. Groups of similar cells form dense regions in the representation space, which are often used to assign cell types or cell states that are not known when the data is collected.</p><p>On the other hand, information we do know about the cells is also often reflected in the representation space. If you collect cells from a healthy person, as well as from a person undergoing an immune challenge, many of the immune cells in the blood will have different transcriptional profiles, reflecting their state changes as part of the immune response. 
In cell representation space, B cells from the healthy person will be in a different area than B cells from a person with an infection.</p><p>In the SCVI modeling framework, if you have kept track of which cell comes from which person, you can learn cell representations that do not include the variation due to the difference between the people. With some simplifications, the SCVI model with integration can be written as</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{aligned} z_n &amp;\\sim \\text{N}(0, 1), \\\\ h_n &amp;= f(z_n, s_n), \\\\ Y_{ng} &amp;\\sim \\text{NB}(\\ell_n \\cdot h_{ng}, r_g). \\end{aligned}&quot;,&quot;id&quot;:&quot;WUMVAGGNSY&quot;}" data-component-name="LatexBlockToDOM"></div><h2>Should you do this?</h2><p>This depends on the questions you want to find answers to using the cell representations.</p><p>You might want to learn which cells are B cells, and count how many there are from each person. This assumes there is a definition of B cells that is shared between the two people. At this point you are not interested in the different states of the B cells from the two people. If you learn representations that integrate out the differences between the blood donors, B cells will form densities in cell representation space that are consistent between the donors.</p><p>Or, you might want to specifically define a B cell state that is present in a particular infection. 
Not all cell types may change state due to an infection, and you might want to find which cell types react to an infection by explicitly asking the question of whether the cells from the two people are separated in representation space when looking at a specific cell type.</p><p>This leads to a situation where you have to use two different models depending on the questions you want to answer.</p><p>It would be better to be able to answer all our different questions using a single model.</p><p>As an example, we can use the recently published Asian Immune Diversity Atlas (AIDA, Tian et al 2024). The complete dataset has 1.1 million blood cells collected from 503 donors. If we sample 50 donors, we end up with ~100k cells with 33 different cell types annotated. We can fit one SCVI model without integration, and one model where the variation between donors is integrated out.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fbw6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0790e1c-17bd-435e-852f-4b0cef426313_1200x1224.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fbw6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0790e1c-17bd-435e-852f-4b0cef426313_1200x1224.png 424w, https://substackcdn.com/image/fetch/$s_!fbw6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0790e1c-17bd-435e-852f-4b0cef426313_1200x1224.png 848w, https://substackcdn.com/image/fetch/$s_!fbw6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0790e1c-17bd-435e-852f-4b0cef426313_1200x1224.png 1272w, 
https://substackcdn.com/image/fetch/$s_!fbw6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0790e1c-17bd-435e-852f-4b0cef426313_1200x1224.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fbw6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0790e1c-17bd-435e-852f-4b0cef426313_1200x1224.png" width="1200" height="1224" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e0790e1c-17bd-435e-852f-4b0cef426313_1200x1224.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1224,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:938467,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/159305684?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0790e1c-17bd-435e-852f-4b0cef426313_1200x1224.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fbw6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0790e1c-17bd-435e-852f-4b0cef426313_1200x1224.png 424w, https://substackcdn.com/image/fetch/$s_!fbw6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0790e1c-17bd-435e-852f-4b0cef426313_1200x1224.png 848w, 
https://substackcdn.com/image/fetch/$s_!fbw6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0790e1c-17bd-435e-852f-4b0cef426313_1200x1224.png 1272w, https://substackcdn.com/image/fetch/$s_!fbw6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0790e1c-17bd-435e-852f-4b0cef426313_1200x1224.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In the unintegrated version of the model, cell types split up by donor, reflecting variation between individuals, while in the 
integrated model we no longer see differences between individuals, but cell types are consistently defined.</p><h2>Do both at once</h2><p>The new <a href="https://docs.scvi-tools.org/en/stable/user_guide/models/mrvi.html">MrVI model</a> in <a href="https://docs.scvi-tools.org/en/stable/user_guide/index.html">scvi-tools</a> solves this problem by learning two levels of variational representations of the cells (Boyeau et al 2022). With some simplifications, we can think of the MrVI model as</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{aligned} u_n &amp;\\sim \\text{N}(0, 1), \\\\ z_n | u_n &amp;\\sim \\text{N}(u_n, 1), \\\\ z_n &amp;= f(u_n, s_n), \\\\ h_n &amp;= g(z_n), \\\\ Y_{ng} &amp;\\sim \\text{NB}(\\ell_n \\cdot h_{ng}, r_g). \\end{aligned}&quot;,&quot;id&quot;:&quot;FQVZLVBRKA&quot;}" data-component-name="LatexBlockToDOM"></div><p>There are some additional important architectural differences from the SCVI model, but for the sake of the issue discussed here, this reflects the intuition behind the model. Importantly, there are two representation spaces for the cells: U and Z. The U-space reflects cell-cell variation with variation between samples integrated out. 
The interaction between sample-sample variation and cell-cell variation gets introduced when moving to the Z-space.</p><p>The result of this is that consistent definitions of cell types will be reflected in the U-space, while different states of the cell types will be reflected in the Z-space.</p><p>Now we can fit a single MrVI model to the AIDA data, and visualize the two different representations for the single cells.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fE7J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d23306b-20ee-4d09-abdf-857a3dc48b08_1200x1223.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fE7J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d23306b-20ee-4d09-abdf-857a3dc48b08_1200x1223.png 424w, https://substackcdn.com/image/fetch/$s_!fE7J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d23306b-20ee-4d09-abdf-857a3dc48b08_1200x1223.png 848w, https://substackcdn.com/image/fetch/$s_!fE7J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d23306b-20ee-4d09-abdf-857a3dc48b08_1200x1223.png 1272w, https://substackcdn.com/image/fetch/$s_!fE7J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d23306b-20ee-4d09-abdf-857a3dc48b08_1200x1223.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!fE7J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d23306b-20ee-4d09-abdf-857a3dc48b08_1200x1223.png" width="1200" height="1223" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6d23306b-20ee-4d09-abdf-857a3dc48b08_1200x1223.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1223,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1008663,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/159305684?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d23306b-20ee-4d09-abdf-857a3dc48b08_1200x1223.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fE7J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d23306b-20ee-4d09-abdf-857a3dc48b08_1200x1223.png 424w, https://substackcdn.com/image/fetch/$s_!fE7J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d23306b-20ee-4d09-abdf-857a3dc48b08_1200x1223.png 848w, https://substackcdn.com/image/fetch/$s_!fE7J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d23306b-20ee-4d09-abdf-857a3dc48b08_1200x1223.png 1272w, https://substackcdn.com/image/fetch/$s_!fE7J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d23306b-20ee-4d09-abdf-857a3dc48b08_1200x1223.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>With the MrVI model, cell types are consistently defined for all donors in the U-space, while the Z-space reflects variation between individuals for the different cell types.</p><h1>References</h1><p>Boyeau, Pierre, Justin Hong, Adam Gayoso, Michael I. Jordan, Elham Azizi, and Nir Yosef. 2022. &#8220;Deep Generative Modeling for Quantifying Sample-Level Heterogeneity in Single-Cell Omics.&#8221; <em>bioRxiv</em>. 
<a href="https://doi.org/10.1101/2022.10.04.510898">https://doi.org/10.1101/2022.10.04.510898</a>.</p><p>Tian, Chi, Yuntian Zhang, Yihan Tong, Kian Hong Kock, Donald Yuhui Sim, Fei Liu, Jiaqi Dong, et al. 2024. &#8220;Single-Cell RNA Sequencing of Peripheral Blood Links Cell-Type-Specific Regulation of Splicing to Autoimmune and Inflammatory Diseases.&#8221; <em>Nature Genetics</em> 56 (12): 2739&#8211;52.</p><h1>From around the web</h1><ul><li><p><a href="https://arxiv.org/abs/2503.02113">Deep Learning is Not So Mysterious or Different</a></p></li><li><p><a href="https://www.filfre.net/2024/12/half-life/">Half-Life</a></p></li><li><p><a href="https://protocolized.summerofprotocols.com/p/strange-new-rules">Strange New Rules</a></p></li><li><p><a href="https://contraptions.venkateshrao.com/p/discworld-rules">Discworld Rules</a></p></li><li><p><a href="https://xkcd.com/3056/">RNA</a></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.flickr.com/photos/val_s/54389581621/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!agrW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a72fc7-1d28-4a03-be9b-fb14505582d4_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!agrW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a72fc7-1d28-4a03-be9b-fb14505582d4_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!agrW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a72fc7-1d28-4a03-be9b-fb14505582d4_799x533.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!agrW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a72fc7-1d28-4a03-be9b-fb14505582d4_799x533.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!agrW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a72fc7-1d28-4a03-be9b-fb14505582d4_799x533.jpeg" width="799" height="533" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94a72fc7-1d28-4a03-be9b-fb14505582d4_799x533.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:533,&quot;width&quot;:799,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:83035,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.flickr.com/photos/val_s/54389581621/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/159305684?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a72fc7-1d28-4a03-be9b-fb14505582d4_799x533.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!agrW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a72fc7-1d28-4a03-be9b-fb14505582d4_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!agrW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a72fc7-1d28-4a03-be9b-fb14505582d4_799x533.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!agrW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a72fc7-1d28-4a03-be9b-fb14505582d4_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!agrW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94a72fc7-1d28-4a03-be9b-fb14505582d4_799x533.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[Cloud GPUs for SCVI]]></title><description><![CDATA[When I moved last year I 
decided to get rid of my Windows PC.]]></description><link>https://www.nxn.se/p/cloud-gpus-for-scvi</link><guid isPermaLink="false">https://www.nxn.se/p/cloud-gpus-for-scvi</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Thu, 06 Mar 2025 05:52:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!NUG9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a19fec1-8664-4529-884a-b58927e83ebe_1500x900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When I moved last year I decided to get rid of my Windows PC. Windows is in a poor state. Later this year, <a href="https://www.microsoft.com/en-us/windows/end-of-support">Microsoft is ending support for Windows 10</a>. My motherboard didn&#8217;t support the TPM hardware required to run Windows 11, so I would need to get a new motherboard, which also means getting a new processor and RAM. The latest generation Intel CPUs were <a href="https://www.theverge.com/2024/7/26/24206529/intel-13th-14th-gen-crashing-instability-cpu-voltage-q-a">plagued by severe quality issues</a> and did not have stellar performance. The CrowdStrike event made it clear Windows as a whole is <a href="https://www.theverge.com/2024/7/19/24201864/crowdstrike-outage-explained-microsoft-windows-bsod">not a reliable platform</a>. In addition to all this, Windows 11 seems like a terrible environment even when working as expected, with advertisements, pop-ups, surprise setting changes and on and on.</p><p>Meanwhile, I have used Mac laptops and desktops for work for a long time. Doing simple tasks, using third party software like the Adobe suite, or making use of external hardware like cameras and audio devices is substantially easier than doing the same on Windows.</p><p>My main motivation to keep using Windows was to use GPU acceleration, which is incredible even with cheaper devices. 
For example, Lightroom uses GPU acceleration for image processing.</p><p>One of my hobbies is to play with machine learning models like SCVI, which is painfully slow without the ability to use a GPU.</p><p>After Apple moved to M series processors, many of the things I used the GPU for were incredibly fast on the M series CPUs. So I got rid of my Windows PC, and eventually got a Mac desktop.</p><p>While hardware acceleration for machine learning models on the M series processors is developing, it is still very slow and unreliable compared to GPU acceleration.</p><p>To keep playing with machine learning models, I looked around for cloud alternatives. I tried a few and found user testimonials for several. For now I have settled on using <a href="http://lightning.ai/">lightning.ai</a>.</p><p>On <a href="http://lightning.ai/">lightning.ai</a> you have four GPU options on the basic account: the L40S, L4, A10G, and T4 Nvidia GPUs. Since the main models I play with are related to scvi-tools, I wanted to compare the performance of the available GPUs relative to their typical prices (which vary daily, but some are typically cheaper). The SCVI-based models are quite small by modern standards, so higher specs, such as more GPU memory, are not necessarily more useful for these models.</p><p>To benchmark the GPUs, I took a dataset with about 200k cells and trained a standard SCVI model for 10 epochs, noting the training times. 
I also ran the same training on my Mac desktop for comparison.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NUG9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a19fec1-8664-4529-884a-b58927e83ebe_1500x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NUG9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a19fec1-8664-4529-884a-b58927e83ebe_1500x900.png 424w, https://substackcdn.com/image/fetch/$s_!NUG9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a19fec1-8664-4529-884a-b58927e83ebe_1500x900.png 848w, https://substackcdn.com/image/fetch/$s_!NUG9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a19fec1-8664-4529-884a-b58927e83ebe_1500x900.png 1272w, https://substackcdn.com/image/fetch/$s_!NUG9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a19fec1-8664-4529-884a-b58927e83ebe_1500x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NUG9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a19fec1-8664-4529-884a-b58927e83ebe_1500x900.png" width="1456" height="874" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0a19fec1-8664-4529-884a-b58927e83ebe_1500x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:874,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:59234,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/158495039?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a19fec1-8664-4529-884a-b58927e83ebe_1500x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NUG9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a19fec1-8664-4529-884a-b58927e83ebe_1500x900.png 424w, https://substackcdn.com/image/fetch/$s_!NUG9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a19fec1-8664-4529-884a-b58927e83ebe_1500x900.png 848w, https://substackcdn.com/image/fetch/$s_!NUG9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a19fec1-8664-4529-884a-b58927e83ebe_1500x900.png 1272w, https://substackcdn.com/image/fetch/$s_!NUG9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a19fec1-8664-4529-884a-b58927e83ebe_1500x900.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 
0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>All the GPU options are substantially faster than the M4 Pro in the Mac. For my purposes, the L4 GPU, which ran the 10 epochs of training in 2:30, is the optimal choice. 
Only the L40S (which has 48 GB of memory compared to the 24 GB in the L4) was faster, at 2:16 for 10 epochs.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.flickr.com/photos/val_s/53948289479/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IeWs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4911fbf9-e3a7-4d3d-a9cf-f840c31d4671_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!IeWs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4911fbf9-e3a7-4d3d-a9cf-f840c31d4671_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!IeWs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4911fbf9-e3a7-4d3d-a9cf-f840c31d4671_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!IeWs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4911fbf9-e3a7-4d3d-a9cf-f840c31d4671_799x533.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IeWs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4911fbf9-e3a7-4d3d-a9cf-f840c31d4671_799x533.jpeg" width="799" height="533" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4911fbf9-e3a7-4d3d-a9cf-f840c31d4671_799x533.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:533,&quot;width&quot;:799,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:51996,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://www.flickr.com/photos/val_s/53948289479/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nxn.se/i/158495039?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4911fbf9-e3a7-4d3d-a9cf-f840c31d4671_799x533.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IeWs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4911fbf9-e3a7-4d3d-a9cf-f840c31d4671_799x533.jpeg 424w, https://substackcdn.com/image/fetch/$s_!IeWs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4911fbf9-e3a7-4d3d-a9cf-f840c31d4671_799x533.jpeg 848w, https://substackcdn.com/image/fetch/$s_!IeWs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4911fbf9-e3a7-4d3d-a9cf-f840c31d4671_799x533.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!IeWs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4911fbf9-e3a7-4d3d-a9cf-f840c31d4671_799x533.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[The strength of a hand]]></title><description><![CDATA[In this video I model the scaling strengths of the different poker hands in Balatro.]]></description><link>https://www.nxn.se/p/the-strength-of-a-hand</link><guid isPermaLink="false">https://www.nxn.se/p/the-strength-of-a-hand</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Mon, 03 Mar 2025 02:13:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/86zNxdB1qIY" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-86zNxdB1qIY" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;86zNxdB1qIY&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div 
class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/86zNxdB1qIY?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>In this video I model the scaling strengths of the different poker hands in <a href="https://en.wikipedia.org/wiki/Balatro">Balatro</a>.</p>]]></content:encoded></item><item><title><![CDATA[Valentine Svensson]]></title><description><![CDATA[What do you mean &#8216;heterogeneity&#8217;?]]></description><link>https://www.nxn.se/p/coming-soon</link><guid isPermaLink="false">https://www.nxn.se/p/coming-soon</guid><dc:creator><![CDATA[Valentine Svensson]]></dc:creator><pubDate>Sun, 04 Aug 2024 20:55:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Yq7P!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6948d04d-c52b-49f7-ace9-686a067382a2_600x600.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>What do you mean &#8216;heterogeneity&#8217;?</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.nxn.se/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.nxn.se/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item></channel></rss>