Statistics and Data Science: A Modeling Approach

4.3 More Ways to Visualize Relationships: Point and Jitter Plots

We have learned how to make visualizations of outcome variables as a function of explanatory variables (e.g., histograms and bar graphs in a facet grid). We will learn a few more visualizations in this section.

A scatterplot is a common way to show the relationship between an outcome variable and an explanatory variable. A scatterplot will show each data point as a dot on a graph. A scatterplot in ggformula can be made with the function gf_point(). Let’s try using gf_point() to examine Thumb lengths by Sex.

gf_point(Thumb ~ Sex, data = Fingers)

Scatterplot of thumb lengths by sex

You can change the color of the points with the argument color (much like we did before) and the size of the points with size.

gf_point(Thumb ~ Sex, data = Fingers, color = "orange", size = 5)

Same scatterplot of thumb lengths by sex as above, but with larger points in orange color

The problem with these gf_point() plots is that you can’t tell when a point is on top of another point. We can jitter these points around a little so that you can see all the individual points better. We’ll use the function gf_jitter() to create a jitter plot depicting Thumb length by Sex. This function is just like making a gf_point() plot, except that the points will be a little jittered both vertically and horizontally.

gf_jitter(Thumb ~ Sex, data = Fingers)

Jitter plot of Thumb length by Sex

We can play with a few arguments to modify the jitter plot. As always, we can use the argument color. We can also use the argument height to change how the points get jittered vertically (i.e., a little bit up or down). In this situation, we might want the vertical jitter (height) to be set to 0 so that a point at 60 mm really is a person who has a thumb length of 60.

gf_jitter(Thumb ~ Sex, data = Fingers, color = "orange", height = 0)

Jitter plot of Thumb length by Sex with orange-colored points

We can use width to change how the points get jittered horizontally (i.e., a little bit left or right). Height and width can be set to values between 0 and 1.

gf_jitter(Thumb ~ Sex, data = Fingers, color = "orange", width = .1, height = 0)

Jitter plot of Thumb length by Sex showing points more distinctly clustered to vertical male and female lines

If a point is in the Female column, it’s a female’s thumb length. But being more to the left or right within the female column doesn’t mean anything. The jitter is there just so the points do not overlap too much and obscure how many females have that thumb length.

In a jitter plot, a dense row of points shows that there are a lot of people with that thumb length. For instance, look at all the Female points at 60 mm. More points means more people with that particular thumb length.

Just like a scatterplot, in a jitter plot you can change the size of the points by including the argument size. You can also change the transparency of the points using the argument alpha. alpha can take values from 0 (more transparent) to 1 (more opaque).

gf_jitter(Thumb ~ Sex, data = Fingers, color = "orange", 
   size = 5, alpha = .2, width = .05, height = 0)

Similar Jitter plot of Thumb length by Sex as above, now with larger and more transparent points

Try making jitter plots for a few of the variables from the Fingers data frame. Play around with some of the arguments such as height, width, color, size, and alpha. Try making a jitter plot one way, then switching which variable is on the x-axis and which on the y-axis. Does it work both ways?

require(tidyverse) require(mosaic) require(Lock5withR) require(supernova) MindsetMatters <- Lock5withR::MindsetMatters %>% mutate(WtLost = ifelse(Wt2 < Wt, "lost", "not lost")) # Play around with gf_jitter gf_jitter(Thumb ~ Sex, data = Fingers) ex() %>% check_function("gf_jitter")
DataCamp: ch4-7

There isn’t any restriction on what kind of variable you can put in the x- or y-axis. You can put an outcome or explanatory variable in either position. You can also put a categorical or quantitative variable in either position. For example, in this jitter plot, we have put Sex on the y-axis and Thumb length on the x-axis.

gf_jitter(Sex ~ Thumb, data = Fingers, color = "orange")

Jitter plot of Thumb length by Sex, with Sex on y-axis and Thumb length on x-axis

Even though you can put the outcome variable anywhere, it is more common to put the outcome variable on the y-axis. We will follow that convention in our jitter plots because it conforms with what people expect, and thus makes them easier to interpret.