When you need to present numerical information, the answer to one question can help you get the right message across as effectively as possible.
That question is:
What story are you trying to tell?
Answering it not only lets you define the questions you want your readers to be able to answer through your visualisations. It also supports the selection of the most appropriate visualisation technique, and even the selection of the best form of representation – in text, in a table, or as a chart.
While you can define your question(s) by examining your data simply to confirm a feeling you already have, a better approach, I believe, is to explore the data using analytical and visualisation techniques.
This allows for the discovery of potentially interesting features in the data, which, in turn, will suggest interesting questions and opportunities.
And while you may find nothing, you may find gold. So, take the time to look and explore.
Initial examination of your data
Once you’ve collected, or otherwise obtained, your data, you must review its completeness and its fitness for what you need it to do.
The size and complexity of your data will guide the tool set you use to explore, filter, sort and search the data to establish its quality. But whether you use Microsoft Excel, Tableau, or any of the other tools out there, you should be on the lookout for the following:
Do you have all the data you need, or do you need more? Does it cover the required period? Does it include all the variables you need?
Are there any errors in the data? Does the accuracy of the data appear as expected? Are there any: incomplete or missing items; duplicate items; formatting issues; obvious unusual values or outliers?
What types of data are included? (There are many ways to classify data type, but I normally only consider numeric continuous, numeric discrete, nominal and ordinal.)
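For a small data set, these checks can be sketched in a few lines of Python. This is only a minimal illustration, assuming hypothetical records with `region`, `month` and `sales` fields (the data and the `quality_report` helper are made up for this example). The same checks apply whichever tool you use.

```python
from collections import Counter

# Hypothetical sample records (illustrative only, not real data).
rows = [
    {"region": "North", "month": "2023-01", "sales": 120.0},
    {"region": "South", "month": "2023-01", "sales": None},   # missing value
    {"region": "North", "month": "2023-01", "sales": 120.0},  # duplicate row
    {"region": "East", "month": "2023-02", "sales": 95.5},
]

def quality_report(records):
    """Count missing values per field, duplicate rows, and the value types seen per field."""
    missing = Counter()
    types = {}
    seen = Counter()
    for record in records:
        # A row is a duplicate when every field matches an earlier row.
        seen[tuple(sorted(record.items(), key=lambda kv: kv[0]))] += 1
        for field, value in record.items():
            if value is None or value == "":
                missing[field] += 1
            else:
                types.setdefault(field, set()).add(type(value).__name__)
    duplicates = sum(count - 1 for count in seen.values() if count > 1)
    return {"missing": dict(missing), "duplicates": duplicates, "types": types}

report = quality_report(rows)
print(report)
```

A field that reports more than one type (say, both `str` and `float`) is usually a sign of a formatting issue worth a closer look.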
Cleaning your data
Following on from your initial examination, you need to organise and clean your data, eliminating any errors found.
Correcting formatting issues, removing duplicates and deciding how to deal with gaps caused by missing data are some of the more common fixes you will need to apply.
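As a rough sketch of those common fixes, the Python below tidies inconsistent text, drops exact duplicates, and sets incomplete rows aside for a decision rather than silently filling them in. The field names, the sample values and the `clean` helper are all assumptions made for illustration. How you treat missing data in practice depends on your own data.

```python
# Hypothetical raw records with typical problems (illustrative only).
raw = [
    {"region": " north ", "sales": 120.0},  # stray spaces, wrong case
    {"region": "North", "sales": 120.0},    # duplicate once tidied
    {"region": "SOUTH", "sales": None},     # missing value
    {"region": "East", "sales": 95.5},
]

def clean(records):
    """Return (complete, incomplete) rows after tidying text and removing duplicates."""
    complete, incomplete, seen = [], [], set()
    for record in records:
        tidy = {"region": record["region"].strip().title(), "sales": record["sales"]}
        key = (tidy["region"], tidy["sales"])
        if key in seen:
            continue  # an identical row has already been kept
        seen.add(key)
        if tidy["sales"] is None:
            incomplete.append(tidy)  # set aside: decide later how to handle the gap
        else:
            complete.append(tidy)
    return complete, incomplete

complete, incomplete = clean(raw)
print(complete)
print(incomplete)
```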
Preparing your data for analysis
Now you have a complete and clean set of data, you need to carry out several steps to prepare it for analysis and presentation. Here, you may want to:
- Consider whether you are looking at the data with the right level of detail. Would it be better to consolidate it further, or should you go for even greater depth?
- Consider other, associated data for context, or to help with the communication of your message.
- Remove data which you do not intend to use (or need) for your analysis.
- Abbreviate category and label text to aid readability in tables or charts.
- Split up compound variables. For example, pulling out the year from a date value.
- Merge variables to create new ones. For example, creating a full name from first name and surname fields.
- Create summary statistics for use in analysis. For example, percentages, averages or variances.
- Standardise or normalise your data, for example where variables are measured on different scales. You may also need to convert absolute values to percentages where the totals differ across the variables you want to compare.
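A few of these preparation steps can be sketched in Python. The example below pulls the year out of a date, merges two name fields, and converts absolute amounts to percentages of the total. The records and field names are hypothetical, chosen only to illustrate the idea.

```python
from datetime import date

# Hypothetical order records (illustrative only).
orders = [
    {"first_name": "Ann", "surname": "Example", "order_date": date(2022, 11, 3), "amount": 50.0},
    {"first_name": "Lee", "surname": "Sample", "order_date": date(2023, 1, 15), "amount": 150.0},
]

total = sum(order["amount"] for order in orders)

prepared = [
    {
        # Merge variables: full name from first name and surname.
        "name": f'{order["first_name"]} {order["surname"]}',
        # Split a compound variable: keep only the year of the date.
        "year": order["order_date"].year,
        # Summary statistic: each order as a percentage of the total.
        "share_pct": round(100 * order["amount"] / total, 1),
    }
    for order in orders
]

for row in prepared:
    print(row)
```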
Identifying the key messages
Start by looking at the raw data one more time and consider a few descriptive and statistical properties of that data.
- Can it be sorted or ranked in any meaningful order?
- What’s the range (the maximum value minus the minimum value) for each numeric variable?
- Are there any outliers (unusually large or small values) compared with the rest?
- Are there any obvious trends or patterns over time? What direction do they go and how steep is the slope?
- And are there any obvious relationships in the data? Are there clusters? Are there gaps?
This gives us a good ‘feel’ for the data, and maybe a few ideas for where we might go next.
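The first three checks in the list above can be sketched with Python's standard library. The values are hypothetical. The outlier test uses the common "1.5 times the interquartile range" rule of thumb, which is my choice for this sketch and only one of several ways to flag unusual values.

```python
import statistics

# Hypothetical measurements (illustrative only).
values = [12, 15, 14, 13, 16, 15, 14, 48]

ordered = sorted(values)                  # ranking the values
value_range = max(values) - min(values)   # maximum minus minimum

# Quartiles, then flag anything more than 1.5 x IQR beyond them.
q1, _median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
outliers = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print("sorted:", ordered)
print("range:", value_range)
print("outliers:", outliers)
```

Here the single value of 48 accounts for most of the range, which is exactly the kind of feature worth questioning before going further.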
Using visual analysis to find the stories
But while you can accomplish a lot by reviewing the raw data, you should also explore your data visually to see what comparisons, trends and patterns you can identify; to learn about its shape and any internal relationships.
This is what will really help you unearth some interesting features within the data and give you the stories you can tell.
So, what characteristics in the data will lead to the identification of our key stories? What questions do the results suggest?
Comparison and proportion
The creation of a simple bar chart can often uncover useful features in the data set. This allows you to compare across values and categories to pick out the following physical attributes.
What does the range of the key variables, the differences between the highs and lows, show? Does a wide range show improvement or decline? Does a narrow range show consistency?
Are there any unusually large or small values compared with the rest of the data? If yes, they may skew your interpretations.
Sorting your data can help identify the large, medium, and small values.
Are the proportions of categories within variables meaningful? Is the contribution of individual items to the whole significant?
How do the actual values compare against averages, standard deviations, targets, and forecasts?
A histogram can be used to show the distribution of a variable’s values. What does the shape of the distribution of each key variable suggest? Is the data symmetrical, or is it skewed? Is it random, or uniform? Are there gaps or clusters?
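If you want a quick look at a distribution before opening a charting tool, a rough text histogram is often enough at the exploratory stage. The sketch below assumes whole-number values and equal-width bins. The `text_histogram` helper and the sample ages are hypothetical.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Group integer values into equal-width bins; return (bin_start, count) pairs."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    start, stop = min(counts), max(counts)
    # Include empty bins so that gaps in the data stay visible.
    return [(b, counts.get(b, 0)) for b in range(start, stop + bin_width, bin_width)]

ages = [21, 23, 25, 27, 31, 34, 35, 36, 38, 52]  # hypothetical values
bins = text_histogram(ages, 10)

for bin_start, count in bins:
    bin_end = bin_start + 10 - 1
    print(f"{bin_start}-{bin_end}: {'#' * count}")
```

The empty 40-49 bin and the lone value in the 50s show up immediately as a gap and a possible outlier.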
Trends and patterns
Does the data vary over time? If so, use a line chart to uncover patterns and trends.
Are values increasing, decreasing, or do they remain pretty flat? Is there a cross-over in two or more variable values (indicating an important change in their relationship)?
Rate of change
How quickly do any changes occur? Are the changes fairly linear, or are they more exponential?
Are there consistent patterns or is there significant variation? Is there a cyclical element, for example seasonality? Is the pattern more random?
Are the patterns we see meaningful, or do they simply represent noise within the data?
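Before plotting, a couple of quick calculations can hint at the answers to the questions above. The sketch below uses hypothetical monthly values. It computes the period-to-period changes (are they steady or growing?) and an overall slope from a simple least-squares line, written out by hand so that it needs nothing beyond plain Python.

```python
# Hypothetical monthly values (illustrative only).
monthly = [100, 104, 109, 115, 122, 130]

# Change from one period to the next: roughly constant suggests a linear trend,
# steadily growing changes suggest something closer to exponential.
changes = [later - earlier for earlier, later in zip(monthly, monthly[1:])]

def trend_slope(ys):
    """Least-squares slope of ys against their positions 0, 1, 2, ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

slope = trend_slope(monthly)
print("changes:", changes)
print("average change per period:", slope)
```

In this made-up series the changes grow every month, so a single straight-line slope understates what is happening at the end of the period.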
Relationships and connections
For many data sets, you may also want to see if there are strong relationships between variables. So, creating a quick scatter plot may well be worthwhile at this stage.
Clusters and gaps
Is there evidence of the data falling into clusters? Is any bunching the result of another variable (for example, different locations or processes)? Are there gaps in the values? Can you establish an underlying reason?
Can you see any significant values that do not fit the norm and change the dynamics of a variable’s range?
Are there any correlations (strong or weak) between two or more variables? What do such correlations suggest?
Is there any relevance to the composition and distribution of the data’s categories and subcategories?
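Alongside a scatter plot, a correlation coefficient gives a quick numerical read on how strongly two variables move together. The sketch below computes Pearson's coefficient by hand on hypothetical figures. The variable names `ad_spend` and `sales` are assumptions for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical paired observations (illustrative only).
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

r = pearson(ad_spend, sales)
print(round(r, 2))
```

Values near +1 or -1 point to a strong linear relationship and values near 0 to a weak one. A single coefficient can hide clusters and gaps, so it is best read together with the scatter plot.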
What about the charts?
Remember, the graphs and charts you create at this stage are exploratory. They are meant for you (and maybe your team) only. So, you don’t really need to worry about their presentation. (Unlike later when you present the data to others – in which case you need to consider the most appropriate representation, declutter and format for explanation.)
And the chart types I suggest above are just a few from the large gallery of options you have available. You can learn more about this from other articles on this website, or by taking my course ‘Communicating Numbers: Effective visualisation of data’ on Udemy.com.
The next step
Answer all the questions above, and you should have a better idea of the stories your data can tell.
Do you feel the key message(s) can be gained from a simple comparison between two items? Or would it be better served by showing the contribution of individual items to the whole? Is the change over time relevant? Or, is it about relationships?
Just remember to seek out those questions and stories before attempting any explanatory visualisation that you intend to show to others.
Your reports, presentations and other representations will all be more effective for it.