A screenshot of https://hub.datalad.org/hcp-openaccess, and the Forgejo, git-annex, and DataLad logos on top.

Hosting really large datasets with Forgejo-aneksajo

One scenario where DataLad shines is managing datasets that are larger than what a single Git repository can deal with. The foundation for that is the combination of two things: git-annex’s ability to separate Git hosting from data hosting in extremely flexible ways, and DataLad’s approach to orchestrating collections of nested repositories as a joint “mono repo”. One example of such a large dataset is the WU-Minn HCP1200 Data, a collection of brain imaging data acquired from more than a thousand individual participants by the Human Connectome Project (HCP)....

2024-08-27 · 7 min · 1296 words · Michael Hanke
Screenshot of a video page of the dataset described in this post as hosted at https://hub.datalad.org/distribits/recordings, and the FFmpeg, HTCondor, git-annex, and DataLad logos on top.

Fairly big video workflow

Two years ago, my colleagues published FAIRly big: A framework for computationally reproducible processing of large-scale data. In this paper, they describe how to partition a large analysis (their example: processing anatomical images of 42 thousand subjects from UK Biobank), using DataLad to provision data and capture provenance, so that individual results can be reproduced on a laptop, even though a cluster is needed to run the entire group analysis. The article is accompanied by a workflow template and a tutorial dataset....

2024-08-16 · 20 min · 4076 words · Michał Szczepanik
A screenshot of a tutorial repository with a video showing, and the Forgejo, git-annex, and DataLad logos on top.

Self-hosted and git-annex enabled data store with Forgejo

If we are being honest, hosting DataLad datasets is not the most straightforward thing in the world. Sure, through the power of git-annex, and with the help of a range of DataLad extension packages, they can be put pretty much anywhere. But the fact that it is possible does not imply simplicity. There used to be a time when GitLab supported git-annex repositories directly. But with the arrival of Git LFS, that support got removed....

2024-07-21 · 11 min · 2155 words · Michael Hanke
A screenshot of the Navidrome web UI with the logos of Navidrome, Picard, git-annex, and DataLad on top of it.

Self-hosted music streaming from a git-annex repository

Managing my music collection was the very first thing I tried with git-annex. That was more than 13 years ago, and to this day I continue to use the very same git-annex repository for this purpose. Screenshot of the first commit’s git log of my oldest git-annex repository that is still in active use today. I still have a clone on my laptop. I had others on machines like the small box plugged into the wall with speakers connected and running mpd....

2024-07-18 · 12 min · 2554 words · Michael Hanke