Open Data Training

Guide 2: Data Preservation & Archiving

The basics of getting your data ready for preservation and archiving.


This training module is a very quick introduction to data management and documentation, both for newcomers and for those who know a bit but want to know more. This material was produced by Mozilla Science Lab, a program that encourages the use of open source practices and web technologies to do better science.



Student Prerequisites


Total Time to Complete

About 1 hour, including transitions.

Learning Objectives

  • Identify common data management errors
  • Write human-readable metadata
  • Describe the benefits of machine-readable metadata

Content Outline

Instructor Guide

Instructor Prerequisites:

  1. Topic 1: Introductions and Discussion about Open Data

    • Introductions

      • Instructor (3 minutes)
        • Explain your background and how you became involved in open data.
      • Why this training, why Mozilla? (1 minute)
        • Introduce the training series and how it was created (collaboration, sprints, output of the fellows program)
        • Structure of the session: content exploration through activities
        • Why MSL and Mozilla are involved, and your relationship to Mozilla
      • Why open data now? (1 minute)
        • More data than ever (define types of data here)
        • Pressure from funders, who want more impact from data
        • The web as a sharing and collaboration tool
  2. Topic 2: Share frustrations

    • What problems have you encountered (or might you encounter) when trying to understand someone else’s data?

        Share a brief anecdote about a time you’ve experienced a challenge with understanding someone else’s data. (2 minutes)

        Break participants into small groups of 3-4 people and encourage them to share their own anecdotes about times they’ve struggled to understand or use poorly documented data. The goal here is to warm participants to the idea of doing a bit of ‘extra’ data documentation and data hygiene work up front. (4 minutes)

      1. Suggested follow-up question: Has there ever been a time you’ve struggled to understand data you produced yourself? (4 minutes)

        Depending on the size of the group, insights can be shared with the whole group or recorded on a whiteboard, etherpad, or other shared document.

  3. Topic 3: Common Spreadsheet Issues

    Use the list in the Quartz Guide to Bad Data for examples of common data formatting problems.

    Project the messy dataset file from Data Carpentry at the front of the room or share the file so that students can open it on their personal computers.

    At the whiteboard or in an etherpad, develop a list of problems that would need to be addressed with the data for it to be understood or reused.

    If you’re working with learners from a particular field, choosing a messy dataset from that field works best.
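
    Once the group has built its list, it can help to show that some problems are simple enough for a program to flag. The sketch below is only an illustration, not part of the Data Carpentry or Quartz material: the function name find_issues, the set of missing-value markers, and the sample rows are all assumptions made up for this example.

```python
import csv
import io

# Strings often typed in place of a missing value (an assumed, illustrative list).
MISSING_MARKERS = {"", "NA", "N/A", "na", "null", "-999", "?"}


def find_issues(csv_text):
    """Return (row_number, column, problem) tuples for a CSV given as a string."""
    issues = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column, value in row.items():
            if column is None:
                # DictReader stores fields beyond the header under the key None.
                issues.append((row_number, None, "extra fields"))
                continue
            if value is None:
                # The row had fewer fields than the header.
                issues.append((row_number, column, "missing field"))
                continue
            if value != value.strip():
                issues.append((row_number, column, "stray whitespace"))
            if value.strip() in MISSING_MARKERS:
                issues.append((row_number, column, "missing-value marker"))
    return issues


# Made-up sample data, for demonstration only.
sample = "plot,species,weight\n1,DM,40\n2, DO ,NA\n3,PF,\n"
issues = find_issues(sample)
for issue in issues:
    print(issue)
```

    A check like this only catches mechanical problems. Issues such as ambiguous column names or undocumented units still need a human reader, which is the point of the whiteboard exercise.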

  4. Topic 4: Metadata

    The goals for this lesson are to help students understand the benefits of open data, learn how to encourage others to make their data open, and identify what they can do with their own data to make it possible for someone to build on their work.

    So what exactly is metadata? What does metadata need to include?
    What is Metadata? (YouTube Video) (5 minutes)

    Review Metadata, A Love Note to the Future example in Primer 2.
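
    To make the benefit of machine-readable metadata concrete, a short demonstration may help. The sketch below stores a metadata record as JSON and lets a program check it for gaps. The field names, the REQUIRED_FIELDS list, and the record contents are hypothetical and are not taken from any metadata standard or from the primer.

```python
import json

# Hypothetical metadata record; field names and values are illustrative only.
metadata = {
    "title": "Example survey measurements",
    "creator": "A. Researcher",
    "date_collected": "2015-06-01",
    "variables": [
        {"name": "plot", "description": "Plot identifier", "unit": None},
        {"name": "weight", "description": "Animal weight", "unit": "grams"},
    ],
    "license": "CC0-1.0",
}

# Fields this sketch treats as mandatory (an assumption, not a standard).
REQUIRED_FIELDS = ["title", "creator", "date_collected", "variables", "license"]


def missing_fields(record):
    """List the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Serialize to JSON text, as it might be saved next to the data file,
# then read it back the way another program would.
encoded = json.dumps(metadata, indent=2)
decoded = json.loads(encoded)

print(missing_fields(decoded))
print([variable["name"] for variable in decoded["variables"]])
```

    Because the record has a predictable structure, the same few lines could check hundreds of datasets, which is hard to do with free-text notes alone.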

  5. Topic 5: Writing a DATA-README

    The DATA-README is a human-readable metadata document that students should get in the habit of making with every dataset they produce.

    Activity: have students write a DATA-README for a project they’re working on. A template is available here.
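
    For students who prefer to start from a script, a DATA-README can also be generated from a few descriptive fields. This is a minimal sketch only: the function name render_data_readme, the section headings, and the sample values are assumptions for illustration and do not reproduce the template mentioned above.

```python
def render_data_readme(title, description, columns, contact):
    """Build a plain-text DATA-README from a few descriptive fields.

    columns is a list of (column_name, meaning) pairs.
    """
    lines = [
        f"DATA-README: {title}",
        "",
        "Description",
        description,
        "",
        "Columns",
    ]
    for name, meaning in columns:
        lines.append(f"  {name}: {meaning}")
    lines += ["", "Contact", contact]
    return "\n".join(lines) + "\n"


# Made-up values, for demonstration only.
readme = render_data_readme(
    title="Example survey measurements",
    description="Weights of animals recorded per plot.",
    columns=[("plot", "Plot identifier"), ("weight", "Animal weight in grams")],
    contact="researcher@example.org",
)
print(readme)
```

    The output is ordinary text that can be saved alongside the dataset and edited by hand afterwards.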

  6. Topic 6: Resources and Wrap

    Provide links to Primer 1 and other relevant resources.
