Your Data Can Live Forever: How to Plan for Data Reuse
This is a short writing and thinking exercise. It is best done with a partner or small group, but can also be done alone.
This exercise is intended for open science project leads, graduate students, and early-career researchers looking to make their data reusable.
You will need:
- A data set with which you are familiar
- Pen/pencil & paper or text editor
- Data Reuse Plan Template
Data reuse saves time and accelerates the pace of scientific discovery. By making your data open and available to others, you make it possible for future researchers to answer questions that haven’t yet been asked. Thinking about data reuse in advance, and documenting your plans, also saves you time by helping you plan research processes and workflows early in the research project. Finally, this documentation makes it easier for you to defend your research: remember when your second-grade teacher told you to “show your work”.
When it comes to making your work reusable, “the devil is in the details”. Upon completion of this exercise, you will have a detailed data reuse plan that you can save as a README or text file and store with your data files so others can understand and reuse your data. As a bonus, it provides an outline for the “Methodology” section of any publications that arise from this data.
Steps to Complete
Break into groups of 2-5 people. Identify one volunteer to be the "Researcher" and describe their research data set for this exercise. This person will need to be fairly familiar with how and why the data was collected.
Identify a note taker to record responses to questions from the group about the data set.
Using the Data Reuse Plan Template as a guide, members of the group ask the "Researcher" questions about their data set while the note taker records the responses. The note taker can (and is encouraged to) ask questions too. As you ask questions, think about how you would respond (or whether you could respond) to a similar question about your own data set.
If you have time, upon completion of the worksheet, review your responses and make sure they would be clear to someone viewing your data set for the first time. You are writing this for someone you have never met. Avoid jargon and abbreviations where possible.
Review & Discuss
Review the following questions and be prepared to share your responses with the larger group.
- Which parts of the template were particularly challenging? Why? What research best practices could you put into place to make it easier?
- If you weren't able to provide some of the information in the worksheet, is there a way you can get it? If not, is there something you could have done differently during your research project to collect that information?
- Are there pieces of information missing from this worksheet that would help someone understand your data and make it easier to reuse?
Open data is data that is made easily and freely available for anyone to access, use, and share without restrictions, with the possible exception of a requirement of attribution.
Metadata is information that describes, explains, locates, or in some way makes it easier to find, access, and use a resource (in this case, data). For example, metadata for a photograph may include the name of the photographer, when and where it was taken, and the type of camera and settings used to take the photograph.
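The photograph example above could be captured as a small machine-readable record stored next to the image. The following is only a sketch: the field names and all values are made-up placeholders, not part of any metadata standard or of the Data Reuse Plan Template.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PhotoMetadata:
    """Hypothetical metadata record for one photograph."""
    photographer: str
    taken_on: str   # date the photo was taken, written as YYYY-MM-DD
    location: str   # where the photo was taken
    camera: str     # type of camera
    settings: str   # camera settings used


def to_json(record):
    """Serialize the record as readable JSON text."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Placeholder values for illustration only.
example = PhotoMetadata(
    photographer="A. Researcher",
    taken_on="2020-01-31",
    location="Example field site",
    camera="Example camera model",
    settings="f/8, 1/250 s, ISO 100",
)

print(to_json(example))
```

Saving the printed text in a file alongside the photograph keeps the description and the resource together, which is the same idea as storing a README with your data files.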
A license gives explicit permissions for the use of something. This is particularly important if you want to make your data open, because some jurisdictions assign copyright to data sets, which limits their use. Several types of licenses are in common use for data. You can read more about them here: http://www.dcc.ac.uk/resources/how-guides/license-research-data.
Naming conventions are a set of predefined rules for the naming and structure of folders, files, field names, etc. (e.g., all files begin with a date, location, and project name). Naming conventions help provide context to a data set and help ensure that all members of a team follow the same standard of data collection and management.
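A naming convention is easiest to follow when it can be checked automatically. The sketch below assumes one hypothetical convention, `YYYY-MM-DD_location_project[_description].ext` in lowercase; the pattern and the function name `parse_name` are illustrative choices, so adapt them to whatever rules your team agrees on.

```python
import re
from datetime import datetime

# Hypothetical convention: date, location, and project name, separated by
# underscores, with an optional description before the file extension.
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})_"
    r"(?P<location>[a-z0-9]+)_"
    r"(?P<project>[a-z0-9]+)"
    r"(?:_(?P<description>[a-z0-9-]+))?"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def parse_name(filename):
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    try:
        # Reject strings that look like dates but are not real calendar dates.
        datetime.strptime(parts["date"], "%Y-%m-%d")
    except ValueError:
        return None
    return parts


for name in ["2019-03-14_lakeerie_algae_samples.csv", "samples_final_v2.csv"]:
    print(name, "->", "ok" if parse_name(name) else "does not follow the convention")
```

Because the function returns the parsed parts, the same check can also be used to recover the date, location, and project for each file when writing your documentation.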
A permanent identifier (or PID) is a set of numbers and/or characters, frequently in the form of a URL, that points to the location of a resource. PIDs are set up in such a way that even though the storage location of the resource may change over time (e.g. moving data from one university server to another), the PID will always point to the correct location. The DOI (Digital Object Identifier) is a commonly known type of PID.
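As a concrete illustration, a DOI can be written as a URL by placing it after the `https://doi.org/` resolver address, which then redirects to wherever the resource currently lives. The helper below is a minimal sketch: the function name `doi_to_url` is hypothetical, the DOI shown is a made-up placeholder, and the check that a DOI starts with `10.` is only a rough sanity test.

```python
# Common ways a DOI is written in practice; all are normalized to one form.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi):
    """Return the https://doi.org/ URL for a DOI written in any common form."""
    doi = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return "https://doi.org/" + doi


# Placeholder DOI for illustration only; it does not refer to a real data set.
print(doi_to_url("doi:10.1234/example-dataset"))
```

Recording the identifier in this URL form in your README gives readers a link they can follow directly, even if the data later moves to a different server.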
Follow-up Resources & Materials
You may find it useful to review this handout early in the planning stages of your project to help design its workflows.
The following resources provide more information on documenting your data and on research best practices that make documentation easier.