Claude agent skill to infer experimental designs from .h5ad files
Anthropic recently launched Claude for Life Sciences, which in practical terms means a marketplace of plugins, MCPs, and agent skills focused on tasks you encounter in biomedical research.
Part of this is an agent skill for performing QC analysis of scRNA-seq data using scanpy. Agent skills were also introduced recently. These are folders with instructions about when to use the skill, what to do when the skill is triggered, and helper tools in the form of CLI scripts to perform the tasks needed.
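Concretely, a skill is just a folder. As a rough sketch (the layout below is illustrative; the script name is made up, and the real QC skill's contents will differ):

qc-skill/
├── SKILL.md        frontmatter with a name and description that tell Claude when
│                   to trigger the skill, followed by instructions for what to do
└── scripts/
    └── run_qc.py   a helper CLI tool the agent can run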
When I want to explore some idea in scRNA-seq analysis, I usually start by finding a new dataset to work with. I get tired of using the same dataset all the time, and different experimental designs are good for trying different ideas.
Once I get an interesting new dataset I usually try to figure out the design of the experiment based on how the original researchers structured the data. Many pieces of information tell the story of what was done. The data might be split across several files whose names are informative. The barcodes can indicate which experimental conditions were performed first or last. Naming conventions for samples tell you what the original idea was, what might have been tacked on later, and which sensible comparisons the researchers never reported for some reason. It is a puzzle to solve.
This can take some time, so I created a Claude agent skill to do this, available at https://github.com/vals/anndata-design-inspector.
The skill provides helper scripts that use hdf5-tools to explore the column names and categories of a provided .h5ad file straight from the command line, without needing to load the full dataset. If the tools aren't available, the skill installs them.
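For a sense of what this kind of lightweight inspection looks like, here is a minimal Python sketch using h5py. It assumes the modern anndata on-disk layout, where categorical obs columns are stored as groups holding categories and codes arrays; the skill's own scripts do the equivalent with the hdf5-tools command-line utilities instead:

import h5py

# Peek at the obs columns of an .h5ad without loading X or layers.
with h5py.File("dataset.h5ad", "r") as f:
    for name, item in f["obs"].items():
        if isinstance(item, h5py.Group) and "categories" in item:
            # Categorical column: list its level names (assuming string categories).
            cats = [c.decode() for c in item["categories"][:]]
            print(f"{name}: {len(cats)} categories, e.g. {cats[:5]}")
        elif isinstance(item, h5py.Dataset):
            # Plain column: report dtype and length.
            print(f"{name}: {item.dtype}, {item.shape}")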
The goal of the skill is to describe the experimental design of the dataset. I wanted this description to be concise and easy to compare between datasets. There has been some work on this, for example using Hasse diagrams. The currently available options are very flexible and complete, but unintuitive in the settings I usually encounter. To help with this, I created a domain-specific language that defines a grammar for experimental designs. The goal of the agent then becomes to produce a string in this grammar that describes the experiment. The package with the grammar definition also includes code for parsing design strings into an ASCII-art visualization of the structure of the experiment, like the one below. This package is also installed by the skill to create the visualizations.
┌─────────────────────── Design Structure ───────────────────────┐
│                                                                │
│ ProcessingBatch(6)════════════════════════════════════════╗    │
│                                                           ║    │
│ Center(3) ≈≈≈≈ Protocol(2)                                ║    │
│     ↓               ↓                                     ║    │
│     Patient([30 | 25 | 18])                               ║    │
│        ↓                                                  ║    │
│     Sample(2) ════════════════════════════════════════════╝    │
│        ↓                                                       │
│     Cell(~5000)                                                │
│        :                                                       │
│     CellType(42)                                               │
│                                                                │
│ Confounded: Center ≈≈ Protocol                                 │
│ Batch: ProcessingBatch ══ Sample                               │
└────────────────────────────────────────────────────────────────┘
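To give a flavor of what the grammar encodes (as a toy sketch only; the actual grammar, operators, and parser live in the package), the backbone of a design like the one above is a chain of nested factors, with annotations such as the ≈≈ confounding and ══ batch markers in the legend layered on top. A hypothetical representation of just the nesting:

from dataclasses import dataclass, field

@dataclass
class Factor:
    name: str
    levels: str                       # e.g. "2", "[30 | 25 | 18]", "~5000"
    children: list["Factor"] = field(default_factory=list)

def render(factor: Factor, depth: int = 0) -> None:
    # Print each factor, with an arrow marking nesting in its parent.
    indent = "  " * depth
    print(f"{indent}{factor.name}({factor.levels})")
    for child in factor.children:
        print(f"{indent}  ↓")
        render(child, depth + 1)

# The nesting backbone of the diagram above: two samples per patient,
# roughly 5000 cells per sample.
render(Factor("Patient", "[30 | 25 | 18]",
              [Factor("Sample", "2",
                      [Factor("Cell", "~5000")])]))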
In addition to puzzling out the experimental design, the agent using the skill infers the research context of the experiment and provides some comments about the biological theory behind it.
This all gets reported in a summarized ‘experiment card’ markdown file following a standard structure. With these markdown files I can get an idea of what the experiment was at a glance. I put up a couple of examples of experiment cards, for the datasets GSE166504 and GSE290106, as gists.
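The precise template is best seen in those gists, but the rough shape of a card (a hypothetical skeleton, not the exact structure the skill emits) is something like:

# <GEO accession>: experiment card
## Research context        what the study was after, tissue, conditions
## Experimental design     the design string plus the ASCII diagram
## Notes                   confounding, batch structure, caveats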
The visualizations simplify away many subtleties of the design, but they are designed to highlight important aspects such as crossing and nesting at different levels, which helps you think about how to handle the different types of variation and confounding present in the data.
I’ve used it on ~20 different datasets, and it usually performs quite well. Sometimes it over-summarizes the experiment, so you might miss some hierarchical structure, but it tends to give you a lot of information about what was going on.