This checklist is designed to help you understand what someone outside your research project
(or you in 5-10 years) would need to know about your data in order to build on your work. For more
information on preparing your data for reuse, check out our
exercise on how to plan for data reuse.
Image attribution: Roche DG, Lanfear R, Binning SA, Haff TM, Schwanz LE, et al. (2014) Troubleshooting Public Data Archiving: Suggestions to Increase Participation. PLoS Biol 12(1): e1001779. doi:10.1371/journal.pbio.1001779
Checklist
WHAT
What is the title of the data set?
List a title that goes beyond just the filename. Be descriptive and provide context.
Example: “Migration patterns on Columbia River Delta” NOT “final.csv”
Are there any related research publications for this data? Are there any existing
data sets that were used to create this data set?
Add a citation to any relevant publications for this dataset with a link, preferably a persistent identifier. (For more
information on persistent identifiers, see our References page.)
Example: Forstmann BU, et al. (2014) Multi-modal ultra-high resolution
structural 7-Tesla MRI data repository. Scientific Data, 1:140050.
(http://www.nature.com/articles/sdata201450)
Example: Keating JN, Donoghue PCJ (2016) Data from: Histology and affinity of anaspids, and the early evolution of
the vertebrate dermal skeleton. Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.k2qc4
WHO
Who is responsible for the data?
List the principal investigator(s) or research group that collected or contributed to the data.
Example: Dr. Phoebe Marshwana, Agriculture Lab, Michigan State University
Who can answer questions about the data?
Consider a lab email or other contact method that won’t change as people move on in their careers.
Example: Climate Impacts Group, University of Washington: cig @ uw.edu
WHERE
Where was the data collected?
This can be multiple locations or a geographic range. Use geographic coordinates, if possible.
Example: Skukuza, South Africa
Where does the data live?
Add a link to the repository where data is shared, preferably using a persistent identifier.
(Need help finding a repository? See the "Resources" on our
References page.)
WHEN
When was the data collected? What time span does the data cover?
Use the international standard date format (ISO 8601: YYYY-MM-DD hh:mm:ss) and try to be as specific as possible.
Example: 2015-07-01 to 2015-12-31
Example: Collected: June 2015. Data coverage: 1932-1944
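If your dates are generated by a script, the standard format can be produced directly rather than typed by hand. The following is a minimal sketch in Python using only the standard library; the function names `iso_date_range` and `iso_timestamp` are hypothetical helpers made up for this illustration, not part of any existing tool.

```python
from datetime import date, datetime


def iso_date_range(start: date, end: date) -> str:
    """Format a collection period as 'YYYY-MM-DD to YYYY-MM-DD'."""
    if end < start:
        raise ValueError("end date precedes start date")
    return f"{start.isoformat()} to {end.isoformat()}"


def iso_timestamp(moment: datetime) -> str:
    """Format a single observation time as 'YYYY-MM-DD hh:mm:ss'."""
    return moment.strftime("%Y-%m-%d %H:%M:%S")


# Illustrative values only: the date range mirrors the first example above,
# and the timestamp is an invented observation time.
print(iso_date_range(date(2015, 7, 1), date(2015, 12, 31)))
print(iso_timestamp(datetime(2015, 7, 1, 6, 30, 0)))
```

Writing dates this way keeps them unambiguous across regions and makes them sort correctly as plain text.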
HOW
How was the data collected?
Think of the steps taken to collect the data, the instruments and software used.
Simplified example: Minimum and maximum observed temperatures for
each day were calculated at morning high tide using CoolRead thermometers
calibrated using the XYZ method.
How was the data processed?
List the steps taken to clean and analyze the data, including tools and software, and how null or missing data were handled.
Example: Differences in site mortality were determined through survdiff tests
performed using X software version 2.10.3. Comparisons with a p-value less than 0.05 (P < 0.05) were
considered significantly different. Null data are coded as 777 and missing data as 999.
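Documenting sentinel codes matters because someone reusing the file could otherwise treat 777 or 999 as real measurements. The sketch below shows one way a reader of your data might decode such values with Python's standard library. The column names `site` and `mortality`, the sample rows, and the `decode_value` helper are assumptions invented for this illustration; only the codes 777 and 999 come from the example above.

```python
import csv
import io

NULL_CODE = "777"     # documented code for null data
MISSING_CODE = "999"  # documented code for missing data


def decode_value(raw: str):
    """Return a float, or a label saying why no measurement is present."""
    value = raw.strip()
    if value == NULL_CODE:
        return "null"
    if value == MISSING_CODE:
        return "missing"
    return float(value)


# Hypothetical sample file contents, standing in for a real data file.
sample = io.StringIO("site,mortality\nA,0.12\nB,777\nC,999\n")
rows = [(row["site"], decode_value(row["mortality"]))
        for row in csv.DictReader(sample)]
print(rows)
```

Comparing the codes as text before converting to a number avoids accidentally averaging a sentinel into the results.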
If you wrote code for processing the data, provide information on where it can be found.
Example license statement: This data set is covered by the CC0 1.0 Universal (CC0 1.0) license.
To the extent possible under law, Sharee Davis has waived all copyright and
related or neighboring rights to this data set. This work is published from: United States
Feedback
If you have suggestions for improving this checklist, feel free to submit an issue on
GitHub or contact Mozilla Science Lab at
sciencelab@mozillafoundation.org.
We’d like to see what you can do with this. Please fork and make it your own!