Introduction to Data and to Data Analysis

At the heart of any analysis is data. Sometimes our data is categorical and sometimes it is numerical; sometimes our data conveys order and sometimes it does not; sometimes our data has an absolute reference and sometimes it has an arbitrary reference; and sometimes our data takes on discrete values and sometimes it takes on continuous values. Whatever its form, when we gather data our intent is to extract from it information that can help us solve a problem. In this case study we consider how to find meaning in data, including ways to describe data, to visualize data, to summarize data, to model data, and to draw conclusions from data.

If we are to consider how to describe, to visualize, to summarize, to model, and to draw conclusions from data, then we need some data with which we can work. For the purpose of this case study, we need data that is easy to gather and easy to understand, and that allows us to ask interesting questions; it is helpful, as well, if we can find expected results for at least some of our questions so that we can check our analysis. It also is helpful if you can gather your own data so that you can repeat and verify our work, or so that you can extend our analysis. A simple system that meets these criteria is a bag of M&Ms, whose contents we can analyze. There is a rich history of using M&Ms to introduce or to illustrate the analysis of data in a variety of disciplines; Appendix 1 provides examples of such studies. Although this system may seem trivial, keep in mind that reporting the percentage of yellow M&Ms in a bag is analogous to reporting the concentration of Pb²⁺ in a sample of soil: both express the amount of an analyte present in a unit of its matrix.
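To make the analogy concrete, the short Python sketch below reports the percentage of each color in a bag, treating each color as an analyte and the bag as its matrix. The function name color_percentages and the contents of the bag are hypothetical, invented only for illustration; they are not data from this case study.

```python
from collections import Counter


def color_percentages(colors):
    """Return the percentage of each color in a list of color names."""
    counts = Counter(colors)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the bag is empty")
    # Amount of analyte (one color) per unit of matrix (the whole bag).
    return {color: 100 * n / total for color, n in counts.items()}


# Hypothetical bag, invented for illustration only (not measured data).
bag = ["yellow"] * 5 + ["red"] * 3 + ["blue"] * 2

percentages = color_percentages(bag)
print(percentages["yellow"])  # 50.0
```

Replacing the hypothetical list with the colors you record from your own bag gives the corresponding percentages for your sample.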

Interspersed within the case study’s narrative is a series of investigations, each of which asks you to stop and consider one or more important issues. Some of these investigations include interactive figures, created using plot.ly, that present data for you to analyze. The image below

shows the tools for interacting with the data, which are available when the cursor enters the figure; from left to right, the tools are:

  1. zoom by clicking and dragging within the figure
  2. pan from side-to-side by clicking and dragging
  3. zoom in
  4. zoom out
  5. autoscale (returns figure to original magnification; double-clicking within a figure also autoscales the data)
  6. show closest on hover (provides x-axis and y-axis values for one data set)
  7. compare data on hover (provides x-axis and y-axis values for all data sets)
  8. link to plot.ly

Some figures include data for multiple analytes or data sets and, as a consequence, include a legend; clicking on an analyte's or a data set's name in the figure's legend toggles on and off the display of the corresponding data.