Introduction to Bivariate Data
Prerequisites
Variables, Distributions, Histograms, Measures of Central Tendency, Variability, Shape
Learning Objectives
 Define "bivariate data"
 Define "scatter plot"
 Distinguish between a linear and a nonlinear relationship
 Identify positive and negative associations from a scatterplot
Measures of central tendency, variability, and
shape summarize a single variable by providing important information
about its distribution. Often, more than one variable is collected
on each individual. When two variables are recorded for each
individual, the resulting data are called bivariate data.
Figure 1 shows a scatter plot of the paired ages of spouses.
The x-axis represents the age of the husband and the y-axis
the age of the wife.
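To make the construction of a scatter plot concrete, here is a minimal sketch that draws a crude text version using only the Python standard library. Each pair becomes one point, placed horizontally by its x value and vertically by its y value. The ages below are invented for illustration and are not the data behind the figure; the function name text_scatter and its width and height parameters are likewise hypothetical.

```python
# Hypothetical (husband_age, wife_age) pairs, invented for illustration only.
pairs = [(25, 23), (30, 29), (36, 33), (41, 42), (50, 47), (58, 55), (64, 60)]


def text_scatter(points, width=30, height=10):
    """Return a crude text scatter plot: x runs left to right, y bottom to top.

    Assumes the x values are not all equal and the y values are not all equal.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each coordinate onto the available columns and rows.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return "\n".join("".join(line) for line in grid)


plot = text_scatter(pairs)
print(plot)
```

With these made-up ages the stars run from the lower left to the upper right, which is the pattern the text goes on to describe.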
There are two important characteristics of the data
revealed by Figure 1. First, it is clear that there is a strong
relationship between the husband's age and the wife's age: the
older the husband, the older the wife. When one variable (Y) increases
with the second variable (X), we say that X and Y have a positive
association. Conversely, when Y decreases as X increases,
we say that they have a negative
association.
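The direction of an association can also be checked numerically. The sketch below is one rough heuristic and is not taken from the text: it sums the products of each point's deviations from the mean of X and the mean of Y. The sum is positive when large X values tend to go with large Y values, and negative when they tend to go with small Y values. The function name association_direction and both data sets are hypothetical.

```python
def association_direction(xs, ys):
    """Classify the direction of association between paired values.

    Heuristic: the sign of the summed products of deviations from the means.
    """
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "none"


# Invented ages: the wife's age rises with the husband's age.
husband_ages = [25, 30, 36, 41, 50, 58, 64]
wife_ages = [23, 29, 33, 42, 47, 55, 60]
print(association_direction(husband_ages, wife_ages))  # positive

# Invented data in which Y falls as X rises.
x_values = [1, 2, 3, 4]
y_values = [8, 6, 5, 1]
print(association_direction(x_values, y_values))  # negative
```

This only reports the direction; it says nothing about how strong the relationship is.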
Second, the points cluster along a straight line.
When this occurs, the relationship is called a linear
relationship. Not all scatter plots show linear relationships.
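As an illustration of the difference, the sketch below compares the slopes between neighboring points for two invented data sets. Points that lie exactly on a straight line give the same slope everywhere, while points on a curve give slopes that change. Real data rarely fall exactly on a line, so this is only a demonstration of the idea, not a test for linearity. The function name consecutive_slopes and the data are hypothetical.

```python
def consecutive_slopes(points):
    """Return the slope between each pair of neighboring points, ordered by x.

    Assumes all x values are distinct.
    """
    ordered = sorted(points)
    return [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in zip(ordered, ordered[1:])
    ]


# Invented points lying exactly on the line y = 2x + 1.
linear_points = [(1, 3), (2, 5), (3, 7), (4, 9), (5, 11)]
# Invented points lying on the curve y = x squared.
curved_points = [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]

linear_slopes = consecutive_slopes(linear_points)
curved_slopes = consecutive_slopes(curved_points)
print(linear_slopes)  # [2.0, 2.0, 2.0, 2.0]
print(curved_slopes)  # [3.0, 5.0, 7.0, 9.0]
```

Both data sets show a positive association, but only the first is linear.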
Scatter plots that show linear relationships between
variables can differ in several ways including the slope of the
line about which they cluster and how tightly the points cluster
about the line. A statistical measure of the strength of the relationship
between variables that takes these factors into account is the
subject of the next section.
