Open Data Training
Guide 2: Data Preservation & Archiving
Description
Intro
This training module is a quick introduction to data management and documentation, both for newcomers and for those who know a bit but want to know more. This material was produced by Mozilla Science Lab, a program that encourages the use of open source practices and web technologies to do better science.
Level
Beginner/Novice
Student Prerequisites
None.
Total Time to Complete
About 1 hour, including transitions.
Learning Objectives
- Identify common data management errors
- Write human-readable metadata
- Describe the benefits of machine-readable metadata
Content Outline
- Workshop introductions and an introduction to our topics (5 minutes)
- Share: Frustrations trying to understand someone else's data (10 minutes)
- Common spreadsheet issues: an overview (10 minutes)
- Metadata: a love letter to the future (10 minutes)
- Writing a DATA-README (25 minutes)
- Additional Resources & Wrap Up (5 minutes)
Instructor Guide
Instructor Prerequisites:
- Familiarity with Open Data Training Primer 2
- Close review of Instructor Guides and all supporting materials for this module
Topic 1: Introductions and Discussion about Open Data
Introductions
- Instructor (3 minutes)
- Explain your background and how you became involved in open data.
- Why this training, why Mozilla? (1 minute)
  - Introduce the training series and how it was created (collaboration, sprints, output of the fellows program)
  - Structure of the session, content exploration through activities
  - Why MSL and Mozilla are involved, your relationship to Mozilla
- Why open data now? (1 minute)
  - More data than ever: define types of data here
  - Pressure from funders, who want more impact from data
  - The web as a sharing and collaboration tool
Topic 2: Share frustrations
What problems have you encountered (or might you encounter) when trying to understand someone else’s data?
- Suggested follow up question: Has there ever been a time you’ve struggled to understand data you produced, yourself? (4 minutes)
Depending on the size of the group, insights can be shared with the whole group or recorded on a whiteboard, etherpad or other shared document.
Share a brief anecdote about a time you’ve experienced a challenge with understanding someone else’s data. (2 minutes)
Break participants into small groups of 3-4 people and encourage them to share their own anecdotes about times they’ve struggled to understand or use poorly documented data. The goal here is to warm participants up to the idea of doing a bit of ‘extra’ data documentation and data hygiene work up front. (4 minutes)
Topic 3: Common Spreadsheet Issues
Use the list in the Quartz Guide to Bad Data for examples of common data formatting problems.
Project the messy dataset file from Data Carpentry at the front of the room or share the file so that students can open it on their personal computers.
At the whiteboard or in an etherpad, develop a list of problems that would need to be addressed with the data for it to be understood or reused.
If you’re working with learners from a particular field, choosing a messy dataset from that field works best.
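If the group is comfortable with a little code, some of the problems on the list can also be found automatically. The following is a minimal Python sketch, not part of the original lesson materials: the sample data, the `find_issues` function name and the set of missing-value codes are all assumptions chosen for illustration, and the row numbers assume that no cell spans more than one line.

```python
import csv
import io

# Hypothetical sample data; the empty cell, the "N/A" placeholder and the
# stray leading space are deliberate.
SAMPLE_CSV = """plot,species,weight
1,DM,40
2,,N/A
3, DO,52
"""

# Assumed list of ad hoc codes people type in place of a missing value.
MISSING_CODES = {"N/A", "NA", "NULL", "-999"}


def find_issues(csv_text):
    """Return a list of (row_number, column, problem) tuples for a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    issues = []
    # Data rows start at line 2 because line 1 holds the header.
    for row_number, row in enumerate(reader, start=2):
        for column, value in row.items():
            if column is None:
                # DictReader stores cells beyond the header under the key None.
                issues.append((row_number, "(no header)", "extra cells"))
            elif value is None or value == "":
                issues.append((row_number, column, "empty cell"))
            elif value != value.strip():
                issues.append((row_number, column, "leading or trailing whitespace"))
            elif value.upper() in MISSING_CODES:
                issues.append((row_number, column, "ad hoc missing-value code"))
    return issues


for issue in find_issues(SAMPLE_CSV):
    print(issue)
```

A script like this only catches mechanical problems. Issues such as unclear column names or undocumented units still need a person, which is a useful lead-in to the metadata topic.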
Topic 4: Metadata
The goals for this lesson are to help students understand the benefits of open data, how to encourage others to make their data open, and to identify what you can do with your own data to make it possible for someone to build on your work.
So what exactly is metadata? What does metadata need to include?
What is Metadata? (YouTube Video) (5 minutes)
Review the Metadata, A Love Note to the Future example in Primer 2.
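To make the benefit of machine-readable metadata concrete, it can help to show a small record that a program can read without human help. The sketch below is one possible illustration in Python. The field names and values are hypothetical and do not follow any particular metadata standard.

```python
import json

# Hypothetical metadata record; the field names are illustrative only.
metadata = {
    "title": "Example rodent survey",
    "creator": "A. Researcher",
    "license": "CC0-1.0",
    "columns": [
        {"name": "plot", "description": "Plot identifier", "unit": None},
        {"name": "weight", "description": "Body mass", "unit": "g"},
    ],
}

# Serialise to JSON so that other tools can parse the record.
metadata_json = json.dumps(metadata, indent=2)

# A program can now answer questions about the dataset directly,
# for example: which columns have units, and what are they?
loaded = json.loads(metadata_json)
units = {col["name"]: col["unit"] for col in loaded["columns"] if col["unit"]}
print(units)
```

The same information written as free text would be easy for a person to read, but a search index or data repository could not reliably extract the units or the license from it.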
Topic 5: Writing a DATA-README
The DATA-README is a human-readable metadata document that students should get in the habit of making with every dataset they produce.
Activity: have students write a DATA-README for a project they’re working on. A template is available here.
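For students who prefer to start from a generated skeleton, the following Python sketch drafts a DATA-README from a list of column names. It is an assumption-laden example, not the linked template: the section headings, the `draft_data_readme` function name and the sample file name are all hypothetical.

```python
def draft_data_readme(title, filename, columns):
    """Build a plain-text DATA-README skeleton for one data file."""
    lines = [
        f"DATA-README: {title}",
        "",
        f"File: {filename}",
        "Collected by: TODO",
        "Collection dates: TODO",
        "How the data were collected: TODO",
        "",
        "Columns:",
    ]
    # One entry per column so that no column is left undocumented.
    for name in columns:
        lines.append(f"- {name}: TODO (meaning, units, allowed values)")
    return "\n".join(lines) + "\n"


# Hypothetical example call; replace the arguments with a real project's details.
readme_text = draft_data_readme(
    "Example survey", "survey.csv", ["plot", "species", "weight"]
)
print(readme_text)
```

The TODO markers are the point of the exercise: students fill each one in by hand, since only the person who produced the data knows what the columns mean.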
Topic 6: Resources and Wrap
Provide links to Primer 1 and other relevant resources.
- Nine simple ways to make it easier to (re)use your data
- Metadata Guide from Australian National Data Service (a simple working-level view of the needs, issues and processes around metadata collection and creation; not discipline specific)
- Best Practices for Data Management (PDF) from DataONE: Section 5.4 (p.5)
- Metadata Directory from Research Data Alliance, which provides a list of metadata standards used in various disciplines