Screenshot of a video page from the dataset described in this post, as hosted at https://hub.datalad.org/distribits/recordings, with the FFmpeg, HTCondor, git-annex, and DataLad logos on top.

Fairly big video workflow

Two years ago, my colleagues published "FAIRly big: A framework for computationally reproducible processing of large-scale data". In this paper, they describe how to partition a large analysis (their example: processing anatomical images of 42 thousand subjects from UK Biobank) and use DataLad to provision data and capture provenance, so that individual results can be reproduced on a laptop, even though a cluster is needed to run the entire group analysis. The article is accompanied by a workflow template and a tutorial dataset....

2024-08-16 · 20 min · 4076 words · Michał Szczepanik