Course Outline

list Statistics and Data Science (ABC)

Book
  • Statistics and Data Science (ABC)

4.3 More Ways to Visualize Relationships: Point and Jitter Plots

We have learned how to make visualizations of outcome variables as a function of explanatory variables (e.g., histograms and bar graphs in a facet grid). We will learn a few more visualizations in this section.

A scatterplot is a common way to show the relationship between an outcome variable and an explanatory variable. A scatterplot will show each data point as a dot on a graph. A scatterplot in ggformula can be made with the function gf_point(). Let’s try using gf_point() to examine Thumb lengths by Sex.

gf_point(Thumb ~ Sex, data = Fingers)

A scatterplot of the distribution of Thumb by Sex in Fingers. The explanatory variable is on the x-axis, and the outcome variable is on the y-axis. Each point represents an observational unit’s values.

You can change the color of the points with the argument color (much like we did before) and the size of the points with size.

gf_point(Thumb ~ Sex, data = Fingers, color = "orange", size = 5)

A scatterplot of the distribution of Thumb by Sex in Fingers with customized color and size of the points.

The problem with these gf_point() plots is that you can’t tell when a point is on top of another point. We can jitter these points around a little so that you can see all the individual points better. We’ll use the function gf_jitter() to create a jitter plot depicting Thumb length by Sex. This function is just like making a gf_point() plot, except that the points will be a little jittered both vertically and horizontally.

gf_jitter(Thumb ~ Sex, data = Fingers)

A jitter plot of the distribution of Thumb by Sex in Fingers. Points are jittered both horizontally and vertically to avoid overlapping.

We can play with a few arguments to modify the jitter plot. As always, we can use the argument color. We can also use the argument height to change how the points get jittered vertically (i.e., a little bit up or down). In this situation, we might want the vertical jitter (height) to be set to 0 so that a point at 60 mm really is a person who has a thumb length of 60.

gf_jitter(Thumb ~ Sex, data = Fingers, color = "orange", height = 0)

A jitter plot of the distribution of Thumb by Sex in Fingers with customized color and without vertical jitter of the points.

We can use width to change how the points get jittered horizontally (i.e., a little bit left or right). Height and width can be set to values between 0 and 1.

gf_jitter(Thumb ~ Sex, data = Fingers, color = "orange", width = .1, height = 0)

A jitter plot of the distribution of Thumb by Sex in Fingers with customized color and horizontal jitter and without vertical jitter of the points.

If a point is in the Female column, it’s a female’s thumb length. But being more to the left or right within the female column doesn’t mean anything. The jitter is there just so the points do not overlap too much and obscure how many females have that thumb length.

In a jitter plot, a dense row of points shows that there are a lot of people with that thumb length. For instance, look at all the Female points at 60 mm. More points means more people with that particular thumb length.

Just like a scatterplot, in a jitter plot you can change the size of the points by including the argument size. You can also change the transparency of the points using the argument alpha. alpha can take values from 0 (more transparent) to 1 (more opaque).

gf_jitter(Thumb ~ Sex, data = Fingers, color = "orange", 
   size = 5, alpha = .2, width = .05, height = 0)

A jitter plot of the distribution of Thumb by Sex in Fingers with customized color, size, transparency and horizontal jitter and without vertical jitter of the points. Less transparent (more opaque) areas indicate a higher frequency of data points.

Try making jitter plots for a few of the variables from the Fingers data frame. Play around with some of the arguments such as height, width, color, size, and alpha. Try making a jitter plot one way, then switching which variable is on the x-axis and which on the y-axis. Does it work both ways?

require(coursekata) MindsetMatters <- Lock5withR::MindsetMatters %>% mutate(WtLost = ifelse(Wt2 < Wt, "lost", "not lost")) # Play around with gf_jitter gf_jitter(Thumb ~ Sex, data = Fingers) ex() %>% check_function("gf_jitter")
CK Code: ch4-7

There isn’t any restriction on what kind of variable you can put in the x- or y-axis. You can put an outcome or explanatory variable in either position. You can also put a categorical or quantitative variable in either position. For example, in this jitter plot, we have put Sex on the y-axis and Thumb length on the x-axis.

gf_jitter(Sex ~ Thumb, data = Fingers, color = "orange")

A jitter plot of the distribution of Sex by Thumb in Fingers with customized color of the points. Thumb appears on the x-axis, and Sex is on the y-axis.

Even though you can put the outcome variable anywhere, it is more common to put the outcome variable on the y-axis. We will follow that convention in our jitter plots because it conforms with what people expect, and thus makes them easier to interpret.

Responses