Introduction to Bivariate Data
Prerequisites: Histograms, Measures of Central Tendency, Variability
- Define "bivariate data"
- Define "scatterplot"
- Distinguish between a linear and a nonlinear relationship
- Identify positive and negative associations from a scatterplot
Measures of central tendency, variability, and
spread summarize a single variable by providing important information
about its distribution. Often, more than one variable is collected
on each individual. Data that record two variables for each
individual are called bivariate data. Figure 1 shows a scatter
plot of the paired ages of spouses. The x-axis represents
the age of the husband and the y-axis the age of the wife.
Figure 1. Scatter plot showing wife's age
as a function of husband's age.
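To make the idea of a scatter plot concrete, the following is a minimal sketch that pairs two variables and draws one point per individual on a character grid. The ages are invented for illustration and are not the data behind Figure 1; the names `husband_ages`, `wife_ages`, `pairs`, and `text_scatter` are hypothetical, and a real analysis would normally use a plotting library rather than text output.

```python
# Hypothetical ages in years, invented for illustration only;
# they are NOT the data plotted in Figure 1.
husband_ages = [25, 30, 36, 41, 47, 52, 60, 66]
wife_ages = [24, 27, 35, 38, 48, 50, 57, 64]

# Bivariate data: one (x, y) pair per couple.
pairs = list(zip(husband_ages, wife_ages))


def text_scatter(pairs, width=40, height=12):
    """Return a rough character-grid scatter plot of (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid dividing by zero when all values on an axis are equal.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1

    grid = [[" "] * width for _ in range(height)]
    for x, y in pairs:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # Row 0 of the grid is the top line, so flip the y direction.
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line) for line in grid)


print(text_scatter(pairs))
```

Each asterisk marks one couple: its horizontal position comes from the husband's age (x) and its vertical position from the wife's age (y). With these made-up values the points rise from the lower left to the upper right.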
There are two important characteristics of the data
revealed by Figure 1. First, it is clear that there is a strong
relationship between the husband's age and the wife's age: the
older the husband, the older the wife. When one variable (Y) increases
with the second variable (X), we say that X and Y have a positive
association. Conversely, when Y decreases as X increases,
we say that they have a negative association.
Second, the points cluster along a straight line.
When this occurs, the relationship is called a linear
relationship. Not all scatter plots show linear relationships.
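The direction of an association can also be checked numerically. The sketch below is one simple approach, not a method taken from this text: it sums the products of each point's deviations from the two means and reports the sign of that sum. The function name `association_direction` and all data values are hypothetical.

```python
from statistics import mean


def association_direction(xs, ys):
    """Classify the direction of association from the sign of the
    summed products of deviations from the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    # A product is positive when x and y fall on the same side of their
    # means, and negative when they fall on opposite sides.
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "none"


# Invented values for illustration only.
rising = association_direction([25, 30, 36, 41, 47], [24, 27, 35, 38, 48])
falling = association_direction([1, 2, 3, 4, 5], [10, 8, 7, 5, 2])
# A curved (nonlinear) pattern: y is x squared on symmetric x values.
curved = association_direction([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(rising, falling, curved)
```

The sign only describes direction. In the curved example Y depends on X completely, yet the check reports "none" because the relationship is not linear, which is one reason to look at the scatter plot itself before relying on a single number.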
Scatter plots that show linear relationships between
variables can differ in several ways including the slope of the
line about which they cluster and how tightly the points cluster
about the line. A statistical measure of the strength of the relationship
between variables that takes these factors into account is the
subject of the next section.