Ah, statistics. If ever there was a topic guaranteed to turn off an audience, this would be among the favourites. But it’s important, so I’ll persevere since someone, somewhere, might find this little primer useful.

But before I go into the detail of the what and the why, it’s worth making two points about statistics. The first point is that statistics is the worst-taught subject in the education system. This is simply because those teaching the subject have neither the aptitude nor the enthusiasm necessary. The second point is that statistics is the best tool we have to see the world for how it really is, rather than as the comforting, but false, picture that our brain paints for us.

As learning and development practitioners become more focused on performance, and identifying ways to improve performance, it follows that we are going to be working with performance data more often.

This presents some challenges and requires those working in L&D to develop their statistical knowledge and skills. We don’t need to be experts, but we do need to know the common pitfalls that people naturally fall into and the right questions to ask when we want to establish whether a workplace learning or performance intervention has had a real-world impact, or if an intervention is even necessary.

Our internal error machine

Our intuitive reaction to some information is incredibly accurate, especially in those areas where we have genuine expertise. However, understanding the true nature of cause and effect, especially when it is presented in the form of numerical information, is not one of those areas.

Science, and especially the medical profession, has developed tools to overcome our innate psychological problems in this area. Unfortunately, it’s mostly people who have received scientific or statistical training who are equipped with these tools; not the natural catchment for those working in L&D.

To someone who hasn’t gone through that kind of training, it can seem like the person who insists it’s necessary to find or use a control group, or that a larger sample is needed, is causing unnecessary additional work. That’s quite simply because what makes sense to us intuitively is so powerful. But what feels right in these areas is so often wrong that a basic grounding in statistics can help prevent you from making mistakes and your organisation from wasting money.

Starting out

To start the ball rolling for anyone wanting to develop their statistical knowledge, I thought I’d put together a list of resources worth consulting. These aren’t dry textbooks, but blogs, articles and book chapters from engaging and passionate writers who can help get you started on the road to statistical confidence.

Nathan Green’s S Word is a series in the Guardian that aims to demystify the basic tools of statistics and explain how to use them: http://www.guardian.co.uk/science/series/nathan-green-statistics.

Michael Blastland’s Go Figure column on the BBC’s News website takes news stories and explains how the reality is more complicated than what’s reported. It’s a brilliant way of introducing critical statistical thinking through familiar stories.

Several chapters in Ben Goldacre’s book Bad Science use familiar medical examples to explain complex statistical concepts like ‘regression to the mean’ and ‘confounding variables’ in straightforward language. The following chapters are particularly valuable:

  • Chapter 4: Homeopathy
  • Chapter 8: ‘Pill Solves Complex Social Problem’
  • Chapter 12: Why Clever People Believe Stupid Things
  • Chapter 13: Bad Stats

Daniel Kahneman’s book Thinking, Fast and Slow (which I will admit to being almost obsessively impressed with) covers ‘regression toward the mean’ and ‘the law of small numbers’ in an engaging and easy-to-understand way.

Super Crunchers by Ian Ayres gives real-life examples of where data has been used to make predictions and improve business by overturning long-held beliefs. In it he covers the main techniques that can be used to interrogate data, along with some key concepts such as the dreaded standard deviation. By giving context to these concepts, he makes it easier to understand how they can be used in everyday business.

To help you decide what to focus on first, these are the concepts you need to grasp in order to truly understand performance data:

Statistical significance (p-values)

One of the most surprising lessons you learn when you start learning about the statistical nature of our world is just how many things can occur simply by chance. So many events that we think are remarkable, or ‘must mean something’, are simply the result of random variation. A significance test gives you a p-value: the probability of seeing an effect at least as large as the one you observed if nothing but chance were at work. Why is this important for us to understand? Well, let’s say that a sales training programme was followed by sales improving by 12%. Calculating the significance of that result will tell us how likely an increase of that size would be if only natural variation in sales performance were behind it.
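To make that less abstract, here is a minimal sketch of one way to estimate a p-value, a permutation test, written in Python. Everything in it is an assumption for illustration: the sales figures are invented, the function name is my own, and a real analysis would need more data and a statistician’s eye. The idea is to shuffle the ‘trained’ and ‘untrained’ labels many times and count how often chance alone produces a gap at least as big as the one observed.

```python
import random


def permutation_p_value(trained, untrained, n_shuffles=10_000, seed=0):
    """Share of random relabellings whose gap is at least as big as the observed gap."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed_gap = sum(trained) / len(trained) - sum(untrained) / len(untrained)

    pooled = list(trained) + list(untrained)
    n_trained = len(trained)
    hits = 0
    for _ in range(n_shuffles):
        # Pretend the training made no difference: hand out the labels at random.
        rng.shuffle(pooled)
        fake_trained = pooled[:n_trained]
        fake_untrained = pooled[n_trained:]
        gap = sum(fake_trained) / len(fake_trained) - sum(fake_untrained) / len(fake_untrained)
        if gap >= observed_gap:
            hits += 1
    return hits / n_shuffles


# Invented monthly sales per person (not real data): the trained group
# averages 112, the untrained group 100, i.e. a 12% gap.
trained_sales = [112, 98, 125, 105, 120, 111, 104, 121]
untrained_sales = [100, 95, 108, 99, 104, 97, 102, 95]

p_value = permutation_p_value(trained_sales, untrained_sales)
print(f"Estimated p-value: {p_value:.3f}")
```

A small p-value says that a gap this size would be unusual if the training had done nothing. It does not, on its own, prove that the training caused the gap.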

Correlation (is not causation)

This is the most famous phrase to come out of statistics, and is simply a mantra against the most common mistake that people make when interpreting data: assuming that because two things happen together, one must cause the other. It’s an easy mistake to make, because the assumption is often true. But terrible mistakes can be made, and have been, by getting this wrong.
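A small simulation can show how two things end up correlated without either causing the other. This is only a sketch with invented numbers: I assume a hidden third factor (here called experience, a hypothetical choice) that pushes up both the number of courses someone attends and their sales. In the code, courses attended has no effect on sales at all, yet the two still move together.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient, written out by hand."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / (spread_x * spread_y) ** 0.5


rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical hidden factor that influences both measures.
experience = [rng.gauss(0, 1) for _ in range(2000)]

# Both measures depend on experience plus their own noise.
# Note that sales is never calculated from courses_attended.
courses_attended = [e + rng.gauss(0, 1) for e in experience]
sales = [e + rng.gauss(0, 1) for e in experience]

r = pearson(courses_attended, sales)
print(f"Correlation between courses attended and sales: {r:.2f}")
```

Anyone looking only at courses and sales would see a clear relationship and might credit the courses. The relationship is real; the causal story is not.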

Post hoc ergo propter hoc

Literally meaning ‘after [this], therefore because of [this]’, post hoc ergo propter hoc describes the temptation to assume that because one event (such as an improvement in sales performance) follows another (such as the running of some form of sales training), the later event occurred because of the earlier one, rather than because of other explanations such as random variation or ‘regression to the mean’. Which brings us to …

Regression to the mean

Someone starts to perform very poorly. Their manager gives them a stern talking-to and their performance improves again. The manager pats themselves on the back for a job well done; their talking-to did the trick. Right?

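Not necessarily. Extreme performances, good or bad, tend to be followed by more ordinary ones, because part of any unusually good or bad spell is down to chance. This is regression to the mean, and it means the improvement may well have happened without the talking-to. It is probably the most difficult concept to truly understand, but it’s important if performance data is to be understood properly.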

The most famous example of regression to the mean is the so-called ‘Sports Illustrated curse’, which suggests that sportswomen and sportsmen who appear on the cover of the magazine are jinxed in some way, so that the run of great performance that got them on the cover mysteriously ends. In reality, an athlete makes the cover after an exceptional run, and exceptional runs are usually followed by a return to more typical form.
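If that still feels slippery, a simulation can help. The sketch below is built entirely on assumptions of my own: each person’s score in a period is a stable level of skill plus some luck, and the numbers are invented. Nobody receives a talking-to, a magazine cover or any other intervention between the two periods.

```python
import random


def simulate_two_periods(n_people=1000, seed=1):
    """Compare the worst and best performers of period 1 with their own period 2 scores."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    # Assumed model: score = stable skill + luck that changes every period.
    skill = [rng.gauss(100, 10) for _ in range(n_people)]
    period_1 = [s + rng.gauss(0, 10) for s in skill]
    period_2 = [s + rng.gauss(0, 10) for s in skill]

    # Rank people by their period 1 score and pick the bottom and top 10%.
    ranked = sorted(range(n_people), key=lambda i: period_1[i])
    tenth = n_people // 10
    worst, best = ranked[:tenth], ranked[-tenth:]

    def average(scores, people):
        return sum(scores[i] for i in people) / len(people)

    return {
        "worst_before": average(period_1, worst),
        "worst_after": average(period_2, worst),
        "best_before": average(period_1, best),
        "best_after": average(period_2, best),
    }


results = simulate_two_periods()
for label, value in results.items():
    print(f"{label}: {value:.1f}")
```

The worst performers ‘improve’ and the best ‘decline’ with no intervention at all, simply because the luck that helped put them at the extremes does not repeat.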

Sample sizes, confidence level and confidence interval

You want your evaluation results to be credible. So, if you have a learning solution that 1000 people participate in, how many people’s performance should you be measuring before and after the learning intervention so you can be confident in your results? Well, thankfully there are calculators out there for that, and understanding exactly what a survey or sample data is, and is not, telling you is an incredibly important skill to master.
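For the curious, this is roughly what many of those calculators do behind the scenes. The sketch below uses a standard formula for estimating a proportion (Cochran’s formula with a finite population correction); the function name and default settings are my own assumptions, and it takes the cautious view that responses could split 50/50. It suits survey-style questions more than before-and-after performance comparisons, which need a different calculation.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence_level=0.95, margin_of_error=0.05, proportion=0.5):
    """Sample size needed to estimate a proportion in a finite population."""
    # z-score for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

    # Sample size if the population were effectively unlimited.
    unlimited = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2

    # Correct for the fact that the population is finite.
    corrected = unlimited / (1 + (unlimited - 1) / population)
    return math.ceil(corrected)


needed = sample_size(1000)
print(f"People to sample out of 1000: {needed}")
```

With these assumptions, the 1000 participants in the example call for a sample of 278 people at a 95% confidence level and a 5% margin of error. Tighten the margin of error and the number rises quickly.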

The law of small numbers (or hasty generalisation)

The regions in which the sales penetration of a brewing company’s beer is highest are mostly rural, sparsely populated and located in traditionally Conservative-voting counties in England. How would you explain this?

In the few seconds you thought about that, you formulated hypotheses, rejected many of them and may well have settled on the explanation that perhaps the company’s brand plays better in rural areas than urban areas, or that there was no simple explanation for this based on the information given.

However, the key information is that the regions are sparsely populated, which means that random variation is more likely to throw up extreme results. The regions with the lowest sales penetration are just as likely to be sparsely populated. Understanding how much small samples within a larger data set can distort the picture is essential if you want to avoid drawing the wrong conclusions about the causes of excellent or poor performance.
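Here is a sketch of that effect, using invented numbers throughout. I assume 200 hypothetical regions, half with 50 residents and half with 5,000, and give every resident exactly the same 10% chance of buying the beer. Any difference between regions is therefore pure chance.

```python
import random

TRUE_RATE = 0.10  # assumed: every resident has the same chance of buying
SMALL, LARGE = 50, 5000  # assumed region populations


def simulate_regions(n_regions=200, seed=2):
    """Return a (population, observed sales penetration) pair for each region."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    regions = []
    for i in range(n_regions):
        population = SMALL if i % 2 == 0 else LARGE
        buyers = sum(rng.random() < TRUE_RATE for _ in range(population))
        regions.append((population, buyers / population))
    return regions


regions = sorted(simulate_regions(), key=lambda region: region[1])
bottom_10, top_10 = regions[:10], regions[-10:]

small_in_top = sum(population == SMALL for population, _ in top_10)
small_in_bottom = sum(population == SMALL for population, _ in bottom_10)
print(f"Small regions among the 10 highest: {small_in_top}")
print(f"Small regions among the 10 lowest: {small_in_bottom}")
```

The small regions dominate both ends of the league table, even though nothing about them is different. The same thing happens with small teams, small branches and small cohorts of learners.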

The very best method for thoroughly analysing data ever

I left this to the end, almost as a reward if you made it this far! Reading the blogs, columns and books I’ve recommended and understanding the concepts I’ve highlighted is a good start. But there’s an incredibly valuable technique you can use to make sure you’ve truly understood what the data is telling you, whether that’s data you’re using as part of a performance analysis or the evaluation of a learning intervention: get help from someone who’s already got the skills.

You’ll find them in finance, business analysis, R&D, marketing or IT. Nearly all organisations have experts in analysing data who would be able to help out and make sure that you’re not missing anything obvious. Being from another area, they may also have an alternative viewpoint that could prove useful. Don’t be a martyr and struggle on with that Excel sheet for hours when a colleague in another department might be able to work out that tricky standard deviation in five minutes.