Welcome to Mozilla Science Lab's Open Data Primers!


Where to Share Your Data

Sharing your data means more than putting it up on your personal or departmental website. There are several questions you should ask yourself… and the people you are putting in charge of the care and feeding of your data.

1. Will people be able to find it?

Think about the last time you tried to find a book in a bookstore or in a library. Maybe you used an online catalog which told you where to look. Perhaps you scanned the shelves where books on that topic are usually located. You also may have talked to a librarian or bookstore staff member who pointed you in the right direction. They may have given you a call number that took you to exactly where the book was located on a shelf. Believe it or not, the same techniques are used when people are looking for data.

Make sure the site where you store your data is indexed by search engines like Yahoo, Google, or whichever one you use. A data repository is much more likely to be indexed than your personal or departmental website. It is also more likely to be listed at the top of the search results from a search engine. Various studies have indicated that 75-90% of people don’t click on links past the first page of results.

With all the websites in the world (over 1 BILLION as of 2014), the likelihood of someone who doesn’t know you going to your website to look for data on a topic is pretty small. It is much more likely that they would go to a place where they know information on that topic is usually kept, such as a data repository. There are many different kinds of data repositories: institutional, subject-based, format-based, etc. Many data repositories will provide you with a permanent identifier (see the Glossary in Primer 2 for a definition of "permanent identifier") that you can share in your publications, grant proposals, or through emails and social media so that others can find and cite your data.

2. Will people be able to understand it?

In our primer on How to Open Your Data, we demonstrated the importance of describing your data to make it understandable and reusable by others. It doesn’t do any good to document your data if you don’t have a way to associate that documentation with your data wherever you store it. Make sure that, wherever you decide to share your data, you can associate descriptor fields (usually called metadata fields) with your data set. You should also be able to keep supplementary files (such as a Data Reuse Plan and README) with your data files to make them more easily understandable. Metadata fields also make it easier for search engines (and other people) to discover your data set.
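As a rough illustration of keeping metadata fields together with a data set, the sketch below writes a small JSON file next to a data file. The function name, the ".metadata.json" naming, and the field names are assumptions made for this example; a real repository will define its own metadata schema and its own way of entering it.

```python
import json
from pathlib import Path


def write_metadata_sidecar(data_file, metadata):
    """Write metadata as a JSON file next to data_file and return its path.

    The sidecar naming used here is a hypothetical convention for
    illustration, not a standard.
    """
    data_path = Path(data_file)
    # Example: "survey.csv" becomes "survey.csv.metadata.json"
    sidecar = data_path.with_name(data_path.name + ".metadata.json")
    # Record which data file the description belongs to, then the fields.
    record = {"data_file": data_path.name, **metadata}
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Calling write_metadata_sidecar("survey.csv", {"title": "Example survey", "creator": "A. Researcher"}) would leave a description file sitting beside the data, so the two can be uploaded and moved together.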

3. How long will it be available?

As anyone who has had to maintain a website knows, just putting information on the web does not mean it will be available forever. It costs money to maintain online services. The easy answer may seem to be putting it on your departmental website, where it becomes the institution’s responsibility to maintain it, but will they? What happens if you leave your institution? Will they still keep your research website going? If you decide to go with an established data repository, check what its long-term sustainability plan is. A site that is freely available today may not be free, or may not even exist, tomorrow or in five or ten years. Consider checking to see if a site participates in CLOCKSS (a partnership between libraries and publishers to develop a distributed dark archive for online content).

4. Will the data be taken care of?

Finally, wherever you choose to share your data, you will want to see what sort of preservation plan is in place for the long-term care of your data. This goes beyond just keeping the site operational. Is there a plan for maintaining file integrity over time? If you share through a repository, will the repository be responsible for updating the file formats of the data as older versions are no longer supported, or does that responsibility fall on you? This is where an institutional repository may best serve you, as it has a vested interest in making sure your research data is available and accessible for the long term.
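One common way to check file integrity over time is to record a checksum for each file when you deposit it and compare against it later. The sketch below shows the idea using SHA-256 from Python's standard library. The function names and the manifest layout are assumptions for this example; a repository may use a different algorithm or handle these checks for you.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 checksum of a file as a hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file under directory (by relative path) to its checksum."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(directory, manifest):
    """List files in the manifest that are now missing or have changed."""
    current = build_manifest(directory)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

You might save the result of build_manifest alongside your data when you first share it, then run changed_files against that saved manifest from time to time. An empty list means every recorded file still matches its original checksum.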