Introduction to Bivariate Data

Prerequisites
Variables, Distributions, Histograms, Measures of Central Tendency, Variability, Shape

Learning Objectives

  1. Define "bivariate data"
  2. Define "scatter plot"
  3. Distinguish between a linear and a nonlinear relationship
  4. Identify positive and negative associations from a scatter plot

Measures of central tendency, variability, and shape summarize a single variable by describing its distribution. Often, however, more than one variable is recorded for each individual. When two variables are recorded for each individual, the data are called bivariate data.

Figure 1 shows a scatter plot of the paired ages of spouses. Each point represents one couple: the x-axis gives the age of the husband and the y-axis the age of the wife.

Figure 1. Scatter plot showing wife's age as a function of husband's age.
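
To make the construction of a scatter plot concrete, here is a minimal sketch that draws one as text. The age pairs are invented for illustration and are not the data behind Figure 1, and the function name `ascii_scatter` is hypothetical. In practice a plotting library would normally be used. The sketch assumes that the X values are not all equal and that the Y values are not all equal.

```python
# Hypothetical (husband_age, wife_age) pairs, invented for illustration only.
pairs = [(25, 23), (30, 29), (35, 31), (40, 41), (50, 47), (60, 58)]

def ascii_scatter(points, width=30, height=10):
    """Return a text scatter plot: X runs left to right, Y runs bottom to top."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each value to a column or row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return "\n".join("".join(line) for line in grid)

plot = ascii_scatter(pairs)
print(plot)
```

Each asterisk marks one pair, so the youngest couple appears at the lower left and the oldest at the upper right.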

There are two important characteristics of the data revealed by Figure 1. First, it is clear that there is a strong relationship between the husband's age and the wife's age: the older the husband, the older the wife. When one variable (Y) increases as the second variable (X) increases, we say that X and Y have a positive association. Conversely, when Y decreases as X increases, we say that X and Y have a negative association.
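
The direction of an association can also be checked numerically. The sketch below uses one common approach, which the text itself does not prescribe: it sums the products of each point's deviations from the mean of X and the mean of Y and looks at the sign. The function name `association_direction` and both data sets are hypothetical.

```python
from statistics import mean

def association_direction(xs, ys):
    """Classify the direction from the sign of the summed products of deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"   # Y tends to be above its mean when X is above its mean
    if total < 0:
        return "negative"   # Y tends to be below its mean when X is above its mean
    return "none"

# Invented data: ages that rise together, and a Y that falls as X rises.
husband_ages = [25, 30, 35, 40, 50, 60]
wife_ages = [23, 29, 31, 41, 47, 58]
x_values = [1, 2, 3, 4, 5]
y_values = [10, 8, 7, 4, 1]

print(association_direction(husband_ages, wife_ages))
print(association_direction(x_values, y_values))
```

The sign only indicates direction. It says nothing about how strong the relationship is or whether it is linear.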

Second, the points cluster along a straight line. When this occurs, the relationship is called a linear relationship. Not all scatter plots show linear relationships.
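
One way to see the difference between a linear and a nonlinear relationship is to look at the slope between neighboring points. The sketch below uses idealized, noise-free data invented for illustration, and the function name `consecutive_slopes` is hypothetical. It assumes all X values are distinct. With real data the slopes would fluctuate even when the underlying relationship is linear.

```python
def consecutive_slopes(points):
    """Return the slopes between neighboring points after sorting by X."""
    pts = sorted(points)
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pts, pts[1:])]

# Invented, noise-free examples: points on a line and points on a curve.
linear = [(x, 2 * x + 1) for x in range(6)]
curved = [(x, x ** 2) for x in range(6)]

linear_slopes = consecutive_slopes(linear)
curved_slopes = consecutive_slopes(curved)
print(linear_slopes)
print(curved_slopes)
```

The slopes are constant for the points on the line and change steadily for the points on the curve, which is why the second set would not cluster along a straight line in a scatter plot.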

Scatter plots that show linear relationships between variables can differ in several ways, including the slope of the line about which the points cluster and how tightly the points cluster about that line. A statistical measure of the strength of the relationship between variables that takes these factors into account is the subject of the next section.
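
The two ways in which linear scatter plots can differ can be illustrated with a small sketch. It assumes an ordinary least-squares line as "the line about which the points cluster" and uses the root-mean-square distance of the points from that line as a rough indication of tightness. Both choices are assumptions made for illustration and are not the measure introduced in the next section. The function names and data are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def rms_residual(xs, ys):
    """Root-mean-square vertical distance of the points from the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return mean((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) ** 0.5

# Invented data sharing the same X values.
xs = [1, 2, 3, 4, 5, 6]
tight = [1.1, 1.9, 3.1, 3.9, 5.1, 5.9]   # close to the line Y = X
loose = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]   # scattered more widely around Y = X
steep = [2 * x for x in xs]               # exactly on the line Y = 2X

print(fit_line(xs, steep)[0])
print(rms_residual(xs, tight), rms_residual(xs, loose))
```

The `steep` data have a larger slope than the other two sets, while `tight` and `loose` follow roughly the same line but differ in how closely the points stay to it.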