pi3: Open Science Essentials: Reproducibility

Tuesday, 2 January 2018

Open science essentials in 2 minutes, part 3

Let’s define it this way: reproducibility is when your experiment or data analysis can be reliably repeated. It isn’t replicability, which we can define as repeating an experiment and its subsequent analysis and getting qualitatively similar results with the new data. (These aren’t universally accepted definitions, but they are common, and enough to get us started.)

Reproducibility is a bedrock of science – we all know that our methods section should contain enough detail to allow an independent researcher to repeat our experiment. With the increasing use of computational methods in psychology, there’s increasing need – and increasing ability – for us to share more than just a description of our experiment or analysis.

Reproducible methods

Using sites like the Open Science Framework, you can share stimuli and other materials. If you use open source experiment software like PsychoPy or Tatool, you can easily share the full scripts which run your experiment, and people on different platforms, or without your software licenses, can still run it.

Reproducible analysis

Equally important is making your analysis reproducible. You’d think that with the same data, another person – or even you in the future – would get the same results. Not so! Most analyses include thousands of small choices. A misstep in any of these small choices – lost participants, copy/paste errors, mislabeled cases, unclear exclusion criteria – can derail an analysis, meaning you get different results each time (and different results from what you’ve published).

Fortunately, a solution is at hand! You need to use analysis software that allows you to write a script to convert your raw data into your final output. That means no more Excel sheets (no history of what you’ve done = very bad – don’t be these guys) and no more point-and-click SPSS analysis.
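
To make "a script to convert your raw data into your final output" concrete, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration: the column names (participant, condition, rt_ms), the reaction-time cut-offs, and the example data are all made up. The point is only that every choice, including the exclusion criteria, is written down once and applied the same way on every run.

```python
import csv
import io
import statistics

# Hypothetical exclusion criteria (assumed values, not from any real study).
# Writing them here means they are applied identically on every run.
MIN_RT_MS = 200
MAX_RT_MS = 2000


def load_trials(csv_file):
    """Read raw trials from an open CSV file with the assumed columns
    participant, condition, rt_ms."""
    return [
        {
            "participant": row["participant"],
            "condition": row["condition"],
            "rt_ms": float(row["rt_ms"]),
        }
        for row in csv.DictReader(csv_file)
    ]


def apply_exclusions(trials):
    """Keep only trials whose reaction time falls inside the stated bounds."""
    return [t for t in trials if MIN_RT_MS <= t["rt_ms"] <= MAX_RT_MS]


def mean_rt_by_condition(trials):
    """Return a dict mapping each condition to its mean reaction time in ms."""
    by_condition = {}
    for t in trials:
        by_condition.setdefault(t["condition"], []).append(t["rt_ms"])
    return {cond: statistics.mean(rts) for cond, rts in sorted(by_condition.items())}


def run_analysis(csv_file):
    """Go from raw data to final output in one scripted, repeatable step."""
    trials = load_trials(csv_file)
    kept = apply_exclusions(trials)
    return {
        "n_raw": len(trials),
        "n_kept": len(kept),
        "mean_rt_ms": mean_rt_by_condition(kept),
    }


if __name__ == "__main__":
    # Made-up data standing in for a raw data file on disk.
    raw = io.StringIO(
        "participant,condition,rt_ms\n"
        "p01,congruent,450\n"
        "p01,incongruent,610\n"
        "p02,congruent,150\n"
        "p02,incongruent,590\n"
        "p03,congruent,470\n"
        "p03,incongruent,2500\n"
    )
    print(run_analysis(raw))
```

With a real study you would pass an opened data file instead of the in-memory example. Because the number of raw and retained trials is part of the output, lost participants and changed exclusion rules show up immediately rather than years later.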

Bottom line: You must script your analysis – trust me on this one

Open data + code

You need to share and document your data and your analysis code. All this is harder work than just writing down the final result of an analysis once you’ve managed to obtain it, but it makes for more robust analysis, and allows someone else to reproduce your analysis easily in the future.
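
One small, optional way to document shared data is to publish a list of the data files with a checksum for each, so that anyone re-running the analysis can confirm they are starting from exactly the same raw files. The sketch below is one possible approach using only the Python standard library; the function names and the idea of a JSON manifest file are illustrative assumptions, not a standard required by any repository.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file under data_dir (by relative path) to its checksum."""
    data_dir = Path(data_dir)
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def write_manifest(data_dir, manifest_path):
    """Write the manifest as JSON; keep manifest_path outside data_dir."""
    manifest = build_manifest(data_dir)
    Path(manifest_path).write_text(
        json.dumps(manifest, indent=2, sort_keys=True) + "\n"
    )
    return manifest
```

Sharing the resulting manifest alongside the data and the analysis script lets Future You check, before anything else, whether a file has been edited since the published analysis was run.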

The most likely beneficiary is you – your most likely collaborator in the future is Past You, and Past You doesn’t answer email. Every analysis I’ve ever done I’ve had to repeat, sometimes years later. It saves time in the long run to invest in making a reproducible analysis the first time around.

Further Reading

Nick Barnes: Publish your computer code: it is good enough

British Ecological Society: Guide to Reproducible Code

Gael Varoquaux: Computational practices for reproducible science

Advanced

Reproducible Computational Workflows with Continuous Analysis

Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research

Part of a series for graduate students in psychology.
Part 1: pre-registration.
Part 2: the Open Science Framework.
Part 3: Reproducibility.