Welcome to Mozilla Science Lab's Study Group Orientation!


4.3 Reproducibility

You’re preparing your research for publication, and the temptation may be to focus on the results and discussion sections of your paper-- after all, that’s what will make the biggest splash! But consider how to use publication to make your work reproducible, so that other researchers can successfully recreate your results using your data, code, and methods. (Reproducing the results of a study is a bit different from replicating a study, where another researcher uses your methods and your code but collects or generates a new data set. Both replication and reproduction are ways another researcher may try to verify the results of a published study. For more on reproducibility versus replicability, see “A Statistical Definition for Reproducibility and Replicability,” by Patil et al.)

By making your work reproducible, you:

  • Increase the usefulness of your research by enabling others to easily build on your results, and re-use your research materials
  • Ensure validity and trust in your results, and help to support the validity of future studies that are based on your work
  • Increase accuracy, trust, and confidence in your field broadly

Publishing studies that can be reproduced or replicated may seem like a no-brainer. But it’s not an inevitable outcome of every publication. In 2012, cancer researchers Begley and Ellis published a comment in the journal Nature called “Drug development: Raise standards for preclinical cancer research.” The article describes a crisis in the quality of scientific literature in cancer research. Working over a period of 10 years, Begley and his team at Amgen attempted to replicate the results of 53 “landmark” studies in the field, but were able to confirm the results of only 6 of those studies (11%).

Some of these non-replicable studies had resulted in hundreds of secondary publications, building on unconfirmed results and likely leading to the development and eventual testing of ineffective drugs in cancer patients. Certainly, drug development is a complex problem, with models and technologies that are challenging to work with. But the intense pressure to publish early and often can result in the submission of studies that lack the level of documentation needed for either reproduction or replication of results, and that don’t tell the full story of the research. A glance at the website Retraction Watch, a project of the Center for Scientific Integrity, shows that the problem of publishing unverifiable results isn’t confined to oncology research. For one perspective on how this plays out in different fields, see Roger Peng’s blog post "A Simple Explanation for the Replication Crisis in Science".

In their comment, Begley and Ellis call for more rigorous documentation practices, such as the inclusion in a published paper of the experimental methods and data from all trials of a given drug-- not just the few trials that succeeded. A truly reproducible study should contain a complete narrative of the research, including well-documented methods, code, and data.

There are a number of tools and practices that can help you tell a coherent research story, without gaps or fuzzy areas. See biostatistician Karl Broman’s terrific tutorial, “Initial Steps Toward Reproducible Research,” for more on how you might get started. Another great resource is anthropologist Ben Marwick’s presentation “Reproducible Research: A View from the Social Sciences.” As mentioned in the introduction to this section, you don’t have to adopt every best practice in reproducibility at once! Find the ones that seem most promising for your work, and give them a try.