Submissions that include data and code for journal publications

Hi IM friends,

I recently received a data submission that includes raw data files plus multiple R scripts used to run a complex workflow of analyses on those data for an upcoming journal submission. The data + code submissions I’ve handled until now have been fairly simple (just a data file or two and one R script), and those were published in EDI. The submitter in this case would like to make the data + code available for journal reviewers to run in a reproducible way, and I’m not sure that simply uploading everything to EDI would be best practice. Do any other IMs have experience with data + code submissions, and how would you suggest making such data and code available to reviewers?

I found one strategy used by Christensen et al. where raw JRN data in EDI were cited, and the authors made all data + code available in a Zenodo copy of a GitHub repository. Would publishing the raw data to EDI and making the files + code available through Zenodo be a good practice to follow if a researcher is proficient in GitHub?

Thanks


We at the LNO had a working group with a similar issue that did the following:

  1. Archived all data in a public data repository
  2. Edited their scripts to assume data were downloaded from the data repository in step 1
  3. Created a GitHub “release” to tag that version of the scripts
  4. Put the ZIP file created by that GitHub release in a public code archive
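Step 2 above usually means replacing a hard-coded local file read with a download-if-missing step. The submitter's scripts are in R, where the same pattern is a `download.file()` call guarded by `file.exists()`; here is the shape of it sketched in Python (the function name and URL handling are mine, purely illustrative):

```python
# Sketch of step 2: instead of reading a file assumed to be on the
# analyst's machine, the script fetches the archived copy from the
# data repository the first time it runs, then reuses the local copy.
import urllib.request
from pathlib import Path

def fetch_data(url: str, cache_path: str) -> Path:
    """Download the archived data file once; reuse the cached copy afterwards."""
    path = Path(cache_path)
    if not path.exists():
        with urllib.request.urlopen(url) as response:
            path.write_bytes(response.read())
    return path
```

A reviewer can then run the scripts from a clean checkout with no manual data staging, since every input resolves to the archived copy.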

The script-editing phase can be cumbersome, though, depending on how the scripts were originally written, so that may not be the best solution in this case. Hope this helps as one option, though!


Is there a good public code archive out there? Zenodo is convenient if using GitHub, but are there discovery or access advantages compared to archiving the code in EDI?


I think there are different advantages of putting code in different places. We support various combinations of:

  1. Code in same data package with associated data in KNB, ADC, EDI, Dryad, Zenodo, etc.
    • code stays close to the data, but it is harder to maintain a reusable software-package structure and releases; it is easy to include “literate” computational notebooks that help tie the computation to the data and the work
  2. Code in its own package in any of those same repositories
    • works fine; the code has its own release stream and can be cited independently, but it is harder to mirror the code-release process (except with Zenodo)
  3. Code in its own package in Software Heritage, mirrored from git
    • the code has its own release stream; the package structure mirrors the packaging structure of the source code repository; software releases/tags in the code repo automatically create new releases in Software Heritage; and the repository supports the CodeMeta software metadata standard

We generally recommend a combination of (1) for simple scripts and notebooks that are tightly tied to an analysis or a particular paper/report, and (3) for reusable software that is nicely packaged, such as R or Python packages.
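For option (3), the CodeMeta standard is just a small JSON-LD file (`codemeta.json`) kept at the top of the code repository. A minimal sketch might look like the following, where every field value is a hypothetical placeholder but the property names come from the CodeMeta standard:

```json
{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "name": "example-analysis-scripts",
  "description": "Analysis scripts supporting an example journal submission",
  "codeRepository": "https://github.com/example/example-analysis-scripts",
  "version": "1.0.0",
  "programmingLanguage": "R",
  "license": "https://spdx.org/licenses/MIT"
}
```

Archives that support CodeMeta can read this file to populate their own metadata records when the repository is mirrored.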

Another issue is how data are linked into analytical code. We recommend either using a content-based identifier, or at least a stable repository URI that the repository commits to supporting over decadal time periods. This is often not the URI of the dataset landing page. We have a discussion of approaches to reproducible data access, illustrated with some EDI data packages: NCEAS Open Science Synthesis for the Delta Science Program - 11 Reproducible Data Access
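A content-based identifier is essentially a cryptographic hash of the data bytes, so any copy of a file can be named and verified independently of where it was downloaded from. A minimal sketch of the idea (the `hash://sha256/...` form follows the convention used in the content-identifier ecosystem; the function names are mine):

```python
# Sketch of content-based identification: the identifier names the
# bytes themselves, not a location, so a re-downloaded copy can be
# checked against the identifier recorded in the analysis code.
import hashlib
from pathlib import Path

def content_id(path: str) -> str:
    """Return a content-based identifier for a file's bytes."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return f"hash://sha256/{digest}"

def verify(path: str, expected_id: str) -> bool:
    """Check that a local copy still matches the recorded identifier."""
    return content_id(path) == expected_id
```

This is the property that makes content-based identifiers more durable than landing-page URLs: if the repository moves or mirrors the file, the identifier still resolves to exactly the same bytes.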

Matt
