Snakemake + DataLad + Worktrees: Automated Pipelines with Provenance Tracking

Previously on git worktrees… In the previous post, I introduced a workflow for running datalad run commands in a dedicated git worktree on a separate branch while continuing development in the main worktree. The batch-processing script was a plain bash loop: it got the job done, but it had no notion of what had already run, what was stale, or what depended on what. If the script failed halfway, a rerun after the fix would either repeat every job that had already succeeded or require me to manually comment out jobs. It also ran only one procedure across multiple subjects, not all procedures for one subject. ...

2026-04-18 · 14 min · 2909 words · Jiameng Wu

Git Worktrees + DataLad: The Missing Link Between Daily Development and Reproducible Analysis

Prelude: Scientist in a data labyrinth As an experimental neuroscientist in training, I often find myself caught between two worlds: the messy, exploratory world of data analysis, where I try to make sense of the experimental data, often by trial and error, and the structured, reproducible world of scientific publication that I aspire to, where hopefully every figure will be exactly reproducible. Between these worlds lies a labyrinth of processing pipelines, half-written scripts, and the ever-present risk of breaking the working analysis while trying to improve it. The path through the labyrinth is rarely straightforward: one is expected to hit many dead ends and discover other interesting distractions before finding the actual treasure, the key results that hopefully lead to a scientific publication or some other form of consolidated knowledge. I try to illustrate this metaphorical labyrinth that is my non-metaphorical reality in a diagram: ...

2025-12-08 · 23 min · 4840 words · Jiameng Wu