CourseKata - 6.6 Interpreting and Using Z-Scores

How Z-Scores Are Different From Standard Deviation

Standard deviation (SD) is roughly the average deviation of all scores from the mean. It can be seen as an indicator of the spread of the distribution. A z-score uses SD as a sort of ruler for measuring how far an individual score is above or below the mean.

A z-score tells you how many standard deviations a score is from the mean of its distribution, but doesn’t tell you what the standard deviation is (or what the mean is). Another way to think about it is that a z-score is a way of comparing a deviation of a score (the numerator) to the standard deviation of the distribution (the denominator).
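Written as a formula, the comparison described above might look like the following. The symbols are a notational assumption made here for illustration: x stands for an individual score, x-bar for the mean of its distribution, and s for the standard deviation.

```latex
z = \frac{x - \bar{x}}{s}
```

The numerator is the score's deviation from the mean, and the denominator is the standard deviation, so z counts how many standard deviations fit into that deviation.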

Let’s use z-scores to help us make sense of our Thumb data. Calculate the z-score for a 65.1 mm thumb.

require(coursekata)

# this saves the mean and standard deviation of Thumb
mean <- mean(Fingers$Thumb)
sd <- sd(Fingers$Thumb)

# calculate the z-score for a 65.1 mm Thumb:
# the deviation from the mean, divided by the standard deviation
(65.1 - mean) / sd

0.572534942855165

A single z-score tells us how many standard deviations this particular 65.1 mm thumb is from the mean: about 0.57 standard deviations above it. Because the standard deviation is roughly the average distance of all scores from the mean, most scores are likely to cluster between one standard deviation below and one standard deviation above the mean. Scores that are two or three standard deviations from the mean are less common. Z-scores let us describe a score more precisely than just saying it is bigger or smaller than the mean.
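One rough way to check the clustering idea is to convert every score to a z-score and count what proportion falls within a given number of standard deviations. The sketch below is only illustrative: `prop_within` is a hypothetical helper (not part of coursekata), and `scores` is a small made-up vector, not real thumb measurements.

```r
# Hypothetical helper: proportion of scores whose z-score
# lies within k standard deviations of the mean
prop_within <- function(x, k = 1) {
  z <- (x - mean(x)) / sd(x)  # z-score for every score in x
  mean(abs(z) <= k)           # share of z-scores between -k and +k
}

# Made-up example values, for illustration only
scores <- c(52, 55, 58, 60, 60, 61, 63, 65, 70, 76)

prop_within(scores, k = 1)  # share within one SD of the mean
prop_within(scores, k = 2)  # share within two SDs of the mean
```

With coursekata loaded, the same helper could presumably be applied to `Fingers$Thumb` to see how the thumb lengths are spread around their mean.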

Using Z-Scores to Compare Scores From Different Distributions

One more use for the z-score is to compare scores that come from different distributions, even if the variables are measured on different scales.

[Figure: A histogram of the distribution of scores, with a blue vertical line indicating the mean and a red vertical line indicating our friend’s score.]

Here’s the distribution of scores for all players of the video game Kargle again. We know that the distribution is roughly normal, the mean score is 35,000, and the standard deviation is 5,000.

Our friend’s score of 45,000 is 10,000 points above the mean, which is two standard deviations, so her z-score is +2. Wow, two standard deviations from the mean! Not a lot of scores are way up there.

Now let’s say you have another friend who doesn’t play Kargle at all. She plays a similar game, though—Spargle! Spargle may be similar, but it has a completely different scoring system. Although the scores on Spargle are roughly normally distributed, their mean is 50, and the standard deviation is 5. This other friend has a high score of 65 on Spargle.

Now: what if we want to know which friend, in general, is a better gamer? The one who plays Kargle, or the one who plays Spargle? This is a hard question, and there are lots of ways to answer it. The z-score provides one way.

We’ve summarized the z-scores for your two friends in the table below.

Player	Player Score	Game Mean	Game SD	Player Z-Score
Kargle Player	45,000	35,000	5,000	+2.0
Spargle Player	65	50	5	+3.0

Looking at the z-scores helps us to compare the abilities of these two players, even though they play games with different scoring systems. Based on the z-scores, we could say that the Spargle player is a better gamer, because she scored three standard deviations above the mean, compared with only two standard deviations above the mean for the Kargle player.
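The two z-scores in the table can be reproduced with the same calculation used for the thumb data. This is a minimal sketch using only the means, standard deviations, and scores stated above; the variable names are chosen here for illustration.

```r
# Kargle: mean 35,000, SD 5,000, player score 45,000
kargle_z <- (45000 - 35000) / 5000

# Spargle: mean 50, SD 5, player score 65
spargle_z <- (65 - 50) / 5

kargle_z   # 2
spargle_z  # 3

# TRUE if the Spargle player is further above her game's mean, in SD units
spargle_z > kargle_z
```

Because each deviation is divided by its own game's standard deviation, the results are on a common scale even though the raw scores differ by a factor of roughly a thousand.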

Of course, nothing is really definite with such comparisons. Someone might argue that Spargle is a much easier game, and so the people who play it tend to be novices. Maybe the Kargle player is better, because even though her z-score is lower, she is being compared to a more awesome group of gamers!
