Suppose we are interested in characterizing 1.69-oz (47.9-g) packages of plain M&Ms. We obtain 30 bags (ten from each of three stores) and, for each bag, report the number of blue, brown, green, orange, red, and yellow M&Ms and their combined net weight. For yellow, the number in parentheses is the number of yellow M&Ms among the first five drawn from the bag. Table 2 summarizes the data for the last six samples. The full set of data for all 30 samples is available as a separate spreadsheet or R file.
bag id | store | blue | brown | green | orange | red | yellow | net weight (g) |
---|---|---|---|---|---|---|---|---|
25 | CVS | 07 | 13 | 00 | 04 | 15 | 16 (2) | 48.212 |
26 | Target | 06 | 15 | 01 | 13 | 10 | 14 (1) | 51.682 |
27 | CVS | 05 | 17 | 06 | 04 | 08 | 19 (1) | 50.802 |
28 | Kroger | 01 | 21 | 06 | 05 | 19 | 14 (0) | 49.055 |
29 | Target | 04 | 12 | 06 | 05 | 13 | 14 (2) | 46.577 |
30 | Kroger | 15 | 08 | 09 | 06 | 10 | 08 (1) | 48.317 |
Having collected some data, our next step is to examine it for possible problems, such as missing values or errors introduced when we recorded the data, and to identify important variables and interesting patterns or trends within or between these variables. Although this information is embedded within the data itself, it is often difficult to see when the data is displayed as a table, particularly if the data set is large. Instead, we use one or more simple visualizations of the data.
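Some of the checks described above, such as looking for missing or impossible values, can also be automated before any plotting. The following is a minimal sketch in Python, using only the six bags shown in Table 2 rather than the full set of 30. The tuple layout and the function name `check_bag` are assumptions made for illustration and are not part of the original data files.

```python
# Rows copied from Table 2 (bags 25-30 only; the other 24 bags are not shown).
# Columns: bag id, store, blue, brown, green, orange, red, yellow,
# yellow among the first five drawn, net weight in grams.
BAGS = [
    (25, "CVS", 7, 13, 0, 4, 15, 16, 2, 48.212),
    (26, "Target", 6, 15, 1, 13, 10, 14, 1, 51.682),
    (27, "CVS", 5, 17, 6, 4, 8, 19, 1, 50.802),
    (28, "Kroger", 1, 21, 6, 5, 19, 14, 0, 49.055),
    (29, "Target", 4, 12, 6, 5, 13, 14, 2, 46.577),
    (30, "Kroger", 15, 8, 9, 6, 10, 8, 1, 48.317),
]


def check_bag(row):
    """Return a list of problems found in one row (empty if none)."""
    if any(value is None for value in row):
        return ["missing value"]
    counts = row[2:8]       # blue through yellow
    yellow = row[7]
    first_five = row[8]     # yellow M&Ms among the first five drawn
    weight = row[9]
    problems = []
    if any(c < 0 or c != int(c) for c in counts):
        problems.append("count is not a non-negative integer")
    # At most five M&Ms were drawn, and no more yellow than the bag holds.
    if not 0 <= first_five <= min(5, yellow):
        problems.append("yellow-in-first-five is out of range")
    if weight <= 0:
        problems.append("net weight is not positive")
    return problems


for row in BAGS:
    total = sum(row[2:8])
    print(row[0], row[1], "total M&Ms:", total, "problems:", check_bag(row))
```

A check like this catches only mechanical errors; it says nothing about patterns or trends, which is where the visualizations come in.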
Two simple visualizations are box and whisker plots and dot plots, examples of which are shown in Figure 1 using the data for yellow M&Ms. Note that neither plot has meaningful information along the y-axis as the vertical dimension simply helps us visualize the data. The vertical distribution of points in the dot plot, for example, is the result of jittering, which offsets samples that share a common value so that, we hope, each appears as a distinct point.
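To make the idea of jittering concrete, here is a minimal sketch in Python. It is not the code used to draw Figure 1; the function name `jitter`, the `height` parameter, and the fixed seed are assumptions chosen for illustration, and the data are the yellow counts for the six bags in Table 2 only.

```python
import random

# Yellow counts for bags 25-30 from Table 2 (six of the 30 bags).
yellow = [16, 14, 19, 14, 14, 8]


def jitter(values, height=1.0, seed=0):
    """Pair each value with a random vertical offset in [0, height].

    The x-coordinate (the data) is left unchanged; only the y-coordinate,
    which carries no information, is randomized.
    """
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    return [(x, rng.uniform(0.0, height)) for x in values]


points = jitter(yellow)
for x, y in sorted(points):
    print(f"yellow = {x:2d}, vertical offset = {y:.2f}")
```

The three bags that each contain 14 yellow M&Ms keep the same x-value but receive different vertical offsets, so they are likely to appear as separate points rather than as a single overprinted one.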
Investigation 6. Use the dot plot in Figure 1 to deduce the general structure of a box and whisker plot, giving particular attention to the position along the x-axis of the three vertical lines that make up the yellow box and the two vertical lines that make up the whiskers on either side of the yellow box. You might begin by tabulating the number of samples that fall to the left of the box, that fall within the box, including its boundaries, and that fall to the right of the box, and the number of samples that lie to the left and to the right of the line inside the box.
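The tabulation suggested in this investigation can be sketched in Python as follows. The function name `tally` is hypothetical, and the three positions passed to it (13, 15, and 17) are placeholder values, not values read from Figure 1; substitute the positions you read from the plot. The data are again only the six yellow counts from Table 2.

```python
# Yellow counts for bags 25-30 from Table 2 (six of the 30 bags).
yellow = [16, 14, 19, 14, 14, 8]


def tally(values, box_left, middle_line, box_right):
    """Count samples relative to the box edges and the line inside the box."""
    return {
        "left of box": sum(v < box_left for v in values),
        # Samples on either boundary are counted as within the box.
        "within box": sum(box_left <= v <= box_right for v in values),
        "right of box": sum(v > box_right for v in values),
        "left of middle line": sum(v < middle_line for v in values),
        "right of middle line": sum(v > middle_line for v in values),
    }


# Placeholder positions only; replace with values read from the plot.
counts = tally(yellow, box_left=13, middle_line=15, box_right=17)
for label, n in counts.items():
    print(f"{label}: {n}")
```

Comparing these counts with the total number of samples is one way to start guessing what fraction of the data each part of the box is meant to enclose.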