Data Storytelling

Data is fundamental to everything in the world today. However, data is hard to understand unless you know how to look at it and what to take from it. Not everyone can see the value of data, and those who cannot do not know what to do with it.

The ability to take data - to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it - that’s going to be a hugely important skill in the next decades.

Dr. Hal R. Varian (Google’s Chief Economist), 2009 

As data grows exponentially, so does the need for companies to find people with the skills to extract, analyse and visualise it. How do you create an aesthetically pleasing and useful graph? How do you translate the story in the data from numbers to a picture so everyone can understand it and use it appropriately?

Data is a precious thing and will last longer than the systems themselves

Tim Berners-Lee (Father of the World Wide Web)

4 Components of Data Storytelling

Data storytelling is more than creating a graph to represent the data; it is a way to communicate what the data is telling you, tailored to a specific audience. It is the last step of data analysis, but arguably one of the most important.

There are 4 main components when it comes to data storytelling:

  1. Audience: Who is looking at the data? What is the data being used for?

  2. Data: The evidence to support the conclusions made and what is being presented

  3. Visuals: Illustrations as to what is happening.

  4. Narrative: The story to go with the data to weave all the information together.

Data storytelling helps you to build a connection between the data and your audience and it makes your argument more memorable.

When deciding what to present, think about who the audience is, what they want to know and what their end goal may be. The same applies when choosing which data and which visualisations to show them. Even if the data does not support what they hoped for, presenting it is still a step in the right direction. This is why the audience is a key part of the data storytelling process.

When creating visuals there are a few quick tricks which can help your visuals become more engaging and appealing to the audience. 

  • Remove grey backgrounds: ggplot2 in R uses a grey background by default, which can make a graph appear dull. Changing it to plain white, for example with theme_bw() or theme_classic(), usually gives a cleaner, more visually pleasing result.

  • Gridlines: Gridlines can be useful when used appropriately, but sometimes they just get in the way. Whether they are useful depends on your audience and personal preference. (See examples below.)

  • Labels: Adding labels to your graphs can be useful, but overcrowding a graph with too many labels is a distraction. Finding the balance is key: decide which dates or figures matter most and label only those.
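
As a rough sketch of the labelling advice above, the R code below labels only one point per species (its largest petal length) instead of every observation. This is an illustration under assumptions, not the method used for the figures in this post: it uses a scatter plot rather than boxplots because single points are easier to label, and the helper data frame top_petals is a name made up for this example.

```r
library(ggplot2)

# Hypothetical helper data: one row per species holding its largest
# petal length. which.max() keeps only the first row if there are ties.
top_petals <- do.call(rbind, lapply(split(iris, iris$Species), function(d) {
  d[which.max(d$Petal.Length), ]
}))

p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point() +
  # Label only the three rows in top_petals, not all 150 observations
  geom_text(
    data = top_petals,
    aes(label = Petal.Length),
    vjust = -0.8,
    show.legend = FALSE
  ) +
  xlab("Sepal Length") +
  ylab("Petal Length") +
  theme_classic()

print(p)
```

Passing a smaller data frame to geom_text() is one simple way to keep the labels to the few values that matter while the points still show the full data set.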

The commonality between science and art is trying to see profoundly - to develop strategies of seeing and showing.

Edward Tufte

When a narrative accompanies the data, the audience gets an explanation of what the data shows and what conclusions to draw from it. When visuals are added, the audience can see those conclusions for themselves and may spot outliers or unusual patterns that they would not notice without a visual aid.

By combining data, narrative and visualisations, you have a way to engage your audience. Used correctly, this gives the data you are representing a voice and can potentially influence change and decisions.

Using the iris data set in R, I have created boxplots with ggplot2 (see below). The top graph uses the default ggplot2 settings. The two graphs underneath show exactly the same data with a few modifications to make them more visually appealing: the one on the bottom left has gridlines and the one on the right does not. The R code used to create each graph appears underneath it.

Other R packages can also help you produce nicer looking output. For example, formattable is a great way to get nicer looking tables.
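
As a minimal sketch of what that might look like, the code below builds a small summary of the iris data and passes it to formattable(). The summary table iris_summary and the colour choices are assumptions made for this illustration, and the formattable package must be installed separately.

```r
library(formattable)

# Hypothetical summary table: mean sepal and petal length per species
iris_summary <- aggregate(
  cbind(Sepal.Length, Petal.Length) ~ Species,
  data = iris,
  FUN = mean
)

# Shade one column by value and draw in-cell bars for the other
formattable(
  iris_summary,
  list(
    Sepal.Length = color_tile("white", "lightblue"),
    Petal.Length = color_bar("orange")
  )
)
```

Shading or bars inside the table cells let the audience compare values at a glance without a separate chart.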

Data storytelling helps make your presentation slides and data more memorable, more engaging and, if done correctly, persuasive. Knowing what the data says is one important skill; being able to communicate it effectively is another. In my opinion, data storytelling is one of the key parts of being a data scientist and is very important in industry today.

RplotDefault.png
# Default ggplot2 settings: grey background with white gridlines
library(ggplot2)
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_boxplot()
Rplot_theme_bw.png
# Same plot with custom colours, tidier labels and theme_bw():
# white background, gridlines kept
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_boxplot() +
  scale_color_manual(labels = c("Setosa", "Versicolor", "Virginica"), values = c("blue", "red", "orange")) +
  xlab("Sepal Length") +
  ylab("Petal Length") +
  theme_bw()
Rplot_theme_classic.png
# Same plot with theme_classic(): white background, no gridlines
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_boxplot() +
  scale_color_manual(labels = c("Setosa", "Versicolor", "Virginica"), values = c("blue", "red", "orange")) +
  xlab("Sepal Length") +
  ylab("Petal Length") +
  theme_classic()