Getting Started

This checklist is designed to help you understand what someone outside your research project (or you in 5-10 years) would need to know about your data in order to build on your work. For more information on preparing your data for reuse, check out our exercise on how to plan for data reuse.

Image attribution: Roche DG, Lanfear R, Binning SA, Haff TM, Schwanz LE, et al. (2014) Troubleshooting Public Data Archiving: Suggestions to Increase Participation. PLoS Biol 12(1): e1001779. doi:10.1371/journal.pbio.1001779



What is the title of the data set?
  • List a title that goes beyond just the filename: be descriptive and provide context.
    • Example: “Migration patterns on Columbia River Delta” NOT “final.csv”
Are there any related research publications for this data? Are there any existing data sets that were used to create this data set?
  • Add a citation to any relevant publications for this dataset with a link, preferably a persistent identifier. (For more information on persistent identifiers, see our References page.)


Who is responsible for the data?
  • List the principal investigator(s) or research group that collected or contributed to the data.
    • Example: Dr. Phoebe Marshwana, Agriculture Lab, Michigan State University
Who can answer questions about the data?
  • Consider a lab email or other contact method that won’t change as people move on in their careers.
    • Example: Climate Impacts Group, University of Washington: cig @


Where was the data collected?
  • Can be multiple locations, geographic range. Use geographic coordinates, if possible.
    • Example: Skukuza, South Africa
Where does the data live?


When was the data collected? What time span does the data cover?
  • Use the international standard date format (YYYY-MM-DD) and try to be as specific as possible.
    • Example: 2015-07-01 to 2015-12-31
    • Example: Collected: June 2015. Data coverage: 1932-1944
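As a small illustration (using only Python's standard library; the dates are the ones from the examples above), ISO 8601 is the default string form for dates in most languages, which makes it easy to produce and parse consistently:

```python
from datetime import date

# ISO 8601 (YYYY-MM-DD) is the default string form for Python dates
start = date(2015, 7, 1)
end = date(2015, 12, 31)

print(start.isoformat())  # 2015-07-01

# A data-coverage statement like the example above
coverage = f"{start.isoformat()} to {end.isoformat()}"
print(coverage)  # 2015-07-01 to 2015-12-31

# ISO strings parse back into dates unambiguously
parsed = date.fromisoformat("2015-07-01")
```

Because YYYY-MM-DD sorts correctly as plain text, it also keeps dated filenames and spreadsheet columns in chronological order without any special handling.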


How was the data collected?
  • Describe the steps taken to collect the data, along with the instruments and software used.
    • Simplified example: Minimum and maximum observed temperatures for each day were calculated at morning high tide using CoolRead thermometers calibrated using the XYZ method.
How was the data processed?
  • Describe the steps taken to clean and analyze the data, including the tools and software used and how null or missing data were handled.
    • Example: Differences in site mortality were determined through survdiff tests performed using X software version 2.10.3. Comparisons with a p-value less than 0.05 (P < 0.05) were considered different. Null data are coded as 777 and missing data as 999.
  • If you wrote code for processing the data, provide information on where it can be found.
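Documenting sentinel codes like the 777/999 in the example above lets reusers handle them correctly on import. A minimal sketch with pandas (the column names and values are hypothetical, matching the example):

```python
import io
import pandas as pd

# Hypothetical data using the codes documented above:
# 777 = null data, 999 = missing data
raw = io.StringIO("site,mortality\nA,12\nB,777\nC,999\n")

# Pass the documented sentinel codes to na_values so they are
# treated as "no value" rather than as real measurements
df = pd.read_csv(raw, na_values=[777, 999])

print(df["mortality"].isna().sum())  # 2
```

Without that documentation, a reuser averaging the mortality column would silently include 777 and 999 as real values, which is exactly the kind of error this checklist is meant to prevent.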
How may this data be used by others?


If you have suggestions for improving this checklist, feel free to submit an issue on GitHub or contact Mozilla Science Lab. We’d like to see what you can do with this. Please fork and make it your own!