Although box and whisker plots, dot plots, and histograms help us see qualitative patterns in our data, they do not allow us to express this information in a quantitative way. For example, in Figure 3 and in Investigation 10 we learned that the distribution of yellow M&Ms in 1.69-oz bags is relatively similar among the three sources. We also saw two differences. First, the plot for samples purchased from Target has much shorter whiskers, and its individual results seem more tightly clustered than those for samples purchased at CVS and at Kroger. Second, the box for the samples purchased from Kroger is quite a bit wider than the boxes for the samples from CVS and Target.
Qualitative phrases such as “relatively similar,” “much shorter,” “more tightly clustered,” and “quite a bit wider” are, frankly, fuzzy, but in the absence of a more quantitative way to characterize our data, we have little choice but to adopt such fuzzy terms. When we summarize data, our goal is to report quantitative characteristics, or statistics, that we can use to provide clearer statements about the differences and the similarities between results for different variables, or between the results for a variable and an expected result already known to us. In this part of the case study we consider several useful statistics that we can use to summarize the data for our samples.
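To make the contrast between fuzzy and quantitative descriptions concrete, the following is a minimal sketch in Python. The counts are hypothetical values invented for illustration; they are not the case study's data. The sketch uses the interquartile range as a stand-in for the width of a box and the overall range as a rough stand-in for the reach of the whiskers, which assumes no results are plotted as outliers.

```python
import statistics

# Hypothetical counts of yellow M&Ms per bag, invented for illustration only.
# These are NOT the measurements reported in the case study.
counts = {
    "CVS": [10, 12, 14, 15, 17, 20],
    "Kroger": [8, 11, 14, 16, 19, 22],
    "Target": [13, 14, 14, 15, 15, 16],
}


def summarize(values):
    """Return the median, interquartile range, and range for a list of counts."""
    # quantiles() with n=4 returns the three quartile cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "median": median,
        "iqr": q3 - q1,                      # width of the box in a box plot
        "range": max(values) - min(values),  # total spread of the results
    }


summaries = {store: summarize(values) for store, values in counts.items()}

for store, s in summaries.items():
    print(f"{store}: median={s['median']}, IQR={s['iqr']}, range={s['range']}")
```

With numbers such as these in hand, a phrase like "quite a bit wider" can be replaced by a direct comparison of interquartile ranges, and "more tightly clustered" by a comparison of ranges.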
Investigation 15. Before we consider ways to summarize our data, we need to draw a distinction between a sample and a population. We collect and analyze samples with the hope that we can deduce something about the properties of the population. Using our data for M&Ms as an example, define the terms sample and population.