library(ggplot2)

iris |>
  ggplot(aes(x = Sepal.Length, y = Petal.Width)) +
  geom_point() +
  coord_fixed(ratio = 1)
In a nutshell
Filippo Gambarota
Last modified: 2025-09-29
Hardwicke et al. (2022) estimated the prevalence of open science practices in 250 psychology articles published between 2014 and 2017.
Curating and sharing data, scripts, and materials is an important practice that benefits the scientific community as a whole.
But what are the short-term benefits of changing our workflow to focus on reproducibility?
Especially for data analysis and writing, some important features are:
Nuijten et al. (2016) analyzed reporting errors in a sample of over 250,000 p-values reported in eight major psychology journals from 1985 until 2013. About half of the articles contained at least one p-value that was inconsistent with its test statistic and degrees of freedom.
Bakker & Wicherts (2011) analyzed 281 articles and found that around 18% of statistical results were incorrectly reported.
Thus, how can we improve the way we produce scientific papers?
Usually we work with two (or more) elements:
The problem with this approach is that we have to manually combine the elements into the final document. This manual merge reduces reproducibility and increases the probability of making errors.
There are roughly 4 elements in a scientific document:
Figures, created with code (e.g., ggplot2 in R, etc.) or imported from other sources (e.g., a picture of your equipment, the experimental paradigm, etc.)
In Microsoft Word or Google Docs, the elements are usually combined in a uniform and coherent writing experience. You can write, add styling (bold, italic, etc.), add/remove images, create tables, etc.
This is usually called a What You See Is What You Get (WYSIWYG) approach. MS Word (or Google Docs) is called a WYSIWYG editor.
In Google Docs or Word, it is not clear how the text is formatted (we just press “bold”), but the idea is roughly the following.
We need to choose some tags, that is, pieces of text and/or symbols that the software will recognize and interpret not as text but as special elements.
This system of tags is called a markup language, and it defines how the plain text should be formatted.
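To make the idea of tags concrete, here is a minimal sketch in base R. It is a toy interpreter, not a real Markdown or HTML parser: the function name `render_toy_markup` and the two tags it understands (`**text**` for bold, `*text*` for italic) are assumptions chosen only for illustration.

```r
# Toy markup interpreter: NOT a real parser, only an illustration.
# Assumed tags: **text** means bold, *text* means italic.
render_toy_markup <- function(text) {
  # Replace **...** first, so the double asterisks are not read as italics
  html <- gsub("\\*\\*(.+?)\\*\\*", "<b>\\1</b>", text, perl = TRUE)
  # Then replace the remaining *...* pairs
  html <- gsub("\\*(.+?)\\*", "<i>\\1</i>", html, perl = TRUE)
  html
}

plain <- "This is **important** and this is *emphasized*."
formatted <- render_toy_markup(plain)
print(formatted)
```

The plain text stays readable on its own, while the software decides what the asterisks mean. This is roughly what happens, in a much more robust way, when a markup language is interpreted.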
We can open a very simple HTML document and see the source code.
Apparently (beyond learning nerdy stuff) there is no advantage in using a markup language such as HTML or LaTeX for standard documents. In fact, there are some problems:
There are several advantages (trust me):
Markdown is a very simple and readable markup language that is nowadays used by a lot of software and services. It is (out of the box) less flexible than HTML or LaTeX, but you can learn it in 20 minutes (really), and even people who have never seen it can read Markdown documents.
You can see an example of a Markdown document. The final result can be easily seen here.
The real power of Markdown is the ability to compile the document into basically anything, such as Word, HTML, PDF, etc. There is a software called pandoc that, given an .md file, can create different types of documents.
For example, pandoc can be run from the command line. Basically, it takes the .md file as a general input, converts the document into the plain-text target format (e.g., .tex or .html), and then compiles the document to create the output.
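The conversion described above can be sketched from R as follows. This is only an illustration under some assumptions: the Markdown content and the file names are made up, and the conversion step runs only if pandoc is installed and available on the system path. The command itself, `pandoc input.md -o output.html`, is the standard way to let pandoc infer the output format from the file extension.

```r
# Write a small Markdown file (the content is made up for illustration)
md_file <- tempfile(fileext = ".md")
writeLines(c(
  "# A title",
  "",
  "Some text with **bold** and *italic* words.",
  "",
  "- first item",
  "- second item"
), md_file)

# Target file: pandoc infers the output format from the extension
html_file <- sub("\\.md$", ".html", md_file)

# Run the conversion only if pandoc is available on this machine
if (nzchar(Sys.which("pandoc"))) {
  # Equivalent to typing: pandoc input.md -o output.html
  status <- system2("pandoc", c(shQuote(md_file), "-o", shQuote(html_file)))
  if (status == 0) cat(readLines(html_file), sep = "\n")
} else {
  message("pandoc not found: skipping the conversion")
}
```

Changing the extension of the target file (e.g., to .docx) asks pandoc for a different output format from the same .md source. A PDF output additionally requires a LaTeX installation.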
In standard WYSIWYG editors, the figures are external files embedded into the document. To update a figure, we need to re-create it externally and then re-import it into the document.
But in essence, if you use a programming language (e.g., MATLAB, Python or R), the figure is no longer the output file but some lines of code.
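As a minimal sketch of this idea, the function below regenerates a figure from scratch every time it is called. The function name `make_figure` and the output path are hypothetical, and base R graphics are used instead of ggplot2 only to keep the example free of package dependencies.

```r
# The figure is fully described by these lines, not by an image file
make_figure <- function(path) {
  pdf(path, width = 5, height = 5)  # open a PDF graphics device
  on.exit(dev.off())                # close the device when the function exits
  plot(
    iris$Sepal.Length, iris$Petal.Width,
    xlab = "Sepal length", ylab = "Petal width"
  )
  invisible(path)
}

# Re-running this line re-creates the figure, e.g., after the data change
fig_path <- make_figure(tempfile(fileext = ".pdf"))
file.exists(fig_path)
```

If the data or the styling change, there is nothing to re-import by hand: the same code produces the updated file.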
What about having a method that interprets not only the markup but also a programming language? Thus, we no longer produce the figure externally; instead, we include directly in the document the code that produces some elements of the final result (e.g., a figure or a table).
This is the exact definition of literate programming. There are several tools, but the most famous ones are:
The idea of Quarto is to write a single *.qmd file. The file is compiled and, using the knitr package (if using R), the R code is evaluated. The text is converted from Markdown to the required format (e.g., LaTeX if the output is PDF). The code output is included in the intermediate file (e.g., tex or html). Finally, the document is compiled into the output format.
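The workflow above can be sketched with a minimal *.qmd file written from R. Everything in the file (title, text, and code chunk) is made up for illustration, and the rendering step runs only if the Quarto command line tool is installed (with knitr available for the R chunk). The chunk delimiter is built with `strrep()` only to avoid typing three backticks inside the example.

```r
# Three backticks: the delimiter of a code chunk in a .qmd file
fence <- strrep("`", 3)

# Content of a minimal .qmd file (made up for illustration)
qmd_lines <- c(
  "---",
  'title: "A minimal example"',
  "format: html",
  "---",
  "",
  "The mean sepal length is `r round(mean(iris$Sepal.Length), 2)`.",
  "",
  paste0(fence, "{r}"),
  "plot(iris$Sepal.Length, iris$Petal.Width)",
  fence
)
qmd_file <- file.path(tempdir(), "example.qmd")
writeLines(qmd_lines, qmd_file)

# Render only if the Quarto command line tool is installed
if (nzchar(Sys.which("quarto"))) {
  # Equivalent to typing: quarto render example.qmd
  system2("quarto", c("render", shQuote(qmd_file)))
} else {
  message("quarto not found: the .qmd file was written but not rendered")
}
```

In this sketch, both the number in the sentence and the figure are computed when the document is rendered, so nothing is copied by hand into the final output.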