A screenshot of DataLad-Registry web UI

DataLad-Registry: Bringing Benefits of Centrality to DataLad

DataLad provides a platform for managing and uniformly accessing data resources. It also captures basic provenance information about data results within Git repository commits. However, discovering DataLad datasets, or Git repositories that DataLad has operated on, can be challenging. They can be shared anywhere online: on popular Git hosting platforms such as GitHub, on generic file hosting platforms such as OSF, or on neuroscience platforms such as GIN. They may also be available only within an organization's internal network, or on just one particular server. We built DataLad-Registry to address some of the problems in locating datasets and finding useful information about them. (For convenience, for the rest of this blog we use the term “dataset” to refer to either a DataLad dataset or a Git repo that has been “touched” by the datalad run command, i.e., one that has “DATALAD RUNCMD” in a commit message.) ...

2024-12-06 · 9 min · 1706 words · Isaac To, Austin Macdonald, Yaroslav O Halchenko