Distributions
Prerequisites
Variables
Learning Objectives
- Define "distribution"
- Interpret a frequency distribution
- Distinguish between a frequency distribution and a probability distribution
- Construct a grouped frequency distribution for a continuous
variable
- Identify the skew of a distribution
- Identify bimodal, leptokurtic, and platykurtic distributions
Distributions of Discrete Variables
Table 1 is called a "frequency
table" and it describes the distribution of M&M color
frequencies. Not surprisingly, this kind of table is called a
frequency
distribution. Often a frequency distribution is shown graphically
as in Figure 1.
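As a rough illustration of how a frequency distribution is tabulated, the sketch below counts how often each color appears in a small sample. The sample list and the function name `frequency_distribution` are hypothetical; the values are not the data from Table 1.

```python
from collections import Counter

# Hypothetical sample of candy colors (NOT the data from Table 1).
sample = ["brown", "red", "brown", "yellow", "green", "brown",
          "blue", "red", "orange", "yellow", "brown", "green"]

def frequency_distribution(values):
    """Return a dict mapping each distinct value to the number of times it occurs."""
    return dict(Counter(values))

freq = frequency_distribution(sample)

# Print the frequency table, most frequent color first (ties in alphabetical order).
for color, count in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{color:<8}{count:>3}")
```

The frequencies always add up to the number of observations in the sample, which is a quick check that nothing was dropped while counting.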
We call Figure 2 a probability
distribution because if you choose an M&M at random, the
probability of getting, say, a brown M&M is equal to the proportion
of M&M's that are brown (0.30).
Notice that the distributions in Figures 1 and 2
are not identical. Figure 1 portrays the distribution in a sample
of 55 M&M's. Figure 2 shows the proportions for all M&M's.
Chance factors involving the machines used by the manufacturer
introduce random variation into the different bags produced. Some
bags will have a distribution of colors that is close to Figure
2; others will be farther away.
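The difference between a probability distribution and the frequency distribution of a particular bag can be explored with a small simulation. In this sketch, only the brown proportion (0.30) comes from the text; the other proportions, the function name `sample_proportions`, and the fixed random seed are assumptions made purely for illustration.

```python
import random
from collections import Counter

# Hypothetical population proportions. Only brown = 0.30 is taken from the text;
# the remaining values are invented so that the proportions sum to 1.
population = {"brown": 0.30, "red": 0.20, "yellow": 0.20,
              "green": 0.10, "orange": 0.10, "blue": 0.10}

def sample_proportions(probabilities, n, rng):
    """Draw n items at random and return the observed proportion of each category."""
    colors = list(probabilities)
    weights = [probabilities[c] for c in colors]
    draws = rng.choices(colors, weights=weights, k=n)
    counts = Counter(draws)
    return {c: counts[c] / n for c in colors}

rng = random.Random(0)  # fixed seed so the run is repeatable
for bag in range(1, 4):
    props = sample_proportions(population, 55, rng)
    print(bag, {c: round(p, 2) for c, p in props.items()})
```

Each simulated bag of 55 gives somewhat different proportions, some closer to the population values than others, which mirrors the bag-to-bag variation described above.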
Continuous Variables
The variable "color of M&M" used
in this example is a discrete
variable, and its distribution is also called discrete.
Let us now extend the concept of a distribution to continuous
variables. With a continuous variable measured precisely enough, it is
unlikely that any two scores will be exactly the same, so nearly every
frequency would be 1 and a plain frequency table would reveal little. The
solution to this problem is to create a grouped
frequency distribution. In a grouped frequency
distribution, scores falling within various ranges are tabulated.
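A grouped frequency distribution can be sketched as follows. The scores, the starting point, and the interval width are hypothetical choices; a different width would give a different table. Each interval here includes its lower limit and excludes its upper limit, which is one common convention rather than the only one.

```python
import math

def grouped_frequency(scores, start, width, n_bins):
    """Count the scores in n_bins intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for x in scores:
        i = math.floor((x - start) / width)
        if 0 <= i < n_bins:  # scores outside the covered range are ignored
            counts[i] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(n_bins)]

# Hypothetical response times in milliseconds; no two values are identical.
times = [512.4, 538.9, 561.2, 575.0, 583.7, 601.3, 604.8, 622.5, 647.1, 698.6]

table = grouped_frequency(times, start=500, width=50, n_bins=4)
for lower, upper, n in table:
    print(f"{lower}-{upper}: {n}")
```

Every raw score has a frequency of 1, but the grouped table shows where the scores concentrate.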
Probability Densities
Distributions for continuous variables are called
probability
density distributions. Some probability densities
have particular importance in Statistics. A very important
one is shaped like a bell, and called the normal
distribution. Many naturally occurring phenomena
can be approximated surprisingly well by this distribution.
It will serve to illustrate some features of all continuous
distributions.
An example of a normal distribution is shown in Figure
3. The Y axis in the normal distribution represents the
"density of probability." Intuitively, it shows the chance
of obtaining values near corresponding points on the X axis.
In Figure 3, for example, the probability of an observation
with value near 40 is about half of the probability of an observation
with value near 50. Although this text does not discuss the
concept of probability density in detail, you should keep the
following ideas in mind about the curve that describes a continuous
distribution (like the normal distribution). First, the area
under the curve equals 1. Second, the probability of any exact
value of X is 0. Finally, the area under the curve and bounded
between two given points on the X axis is the probability that
a number chosen at random will fall between the two points.
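These three properties can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library with a mean of 50 and a standard deviation of 10; these parameters are assumptions chosen for illustration and are not read from Figure 3. The helper `area_under_curve` is a hypothetical name for a simple midpoint-rule approximation.

```python
from statistics import NormalDist

# Assumed parameters for illustration only (not taken from Figure 3).
dist = NormalDist(mu=50, sigma=10)

def area_under_curve(pdf, a, b, steps=10_000):
    """Approximate the area under pdf between a and b with the midpoint rule."""
    h = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * h) for i in range(steps)) * h

# 1. The total area under the curve is 1 (integrating 10 standard deviations each way).
total = area_under_curve(dist.pdf, -50, 150)

# 2. The probability of any exact value is 0: an interval of zero width has zero area.
point = area_under_curve(dist.pdf, 50, 50)

# 3. The area between two points is the probability of falling between them.
between = area_under_curve(dist.pdf, 40, 60)
exact_between = dist.cdf(60) - dist.cdf(40)  # same probability from the CDF

print(round(total, 4), point, round(between, 4), round(exact_between, 4))
```

Note that the height of the curve at a point is a density, not a probability; only areas correspond to probabilities.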
Shapes of Distributions
Distributions have different shapes; they don't
all look like the normal distribution in Figure 3. For example,
the normal probability density is higher in the middle compared
to its two tails. Other distributions need not have this feature.
There is even variation among the distributions that we call "normal."
For example, some normal distributions are more spread out than
the one shown in Figure 3 (their tails begin to hit the X axis
further from the middle of the curve, for example, at 10 and
90 if drawn in place of Figure 3). Others are less spread out
(their tails might approach the X axis at 30 and 70). More information
on the normal distribution can be found in a later chapter
completely devoted to it.
The normal distribution shown in Figure 3 is symmetric;
if you folded it in the middle, the two sides would match perfectly.
Figure 4 shows the discrete distribution of scores on a psychology
test. This distribution is not symmetric: the tail in the positive
direction extends further than the tail in the negative direction.
A distribution with the longer tail extending in the positive
direction is said to have a positive
skew. It is also described as "skewed to the
right."
Figure 5 shows the salaries of major league baseball
players in 1974 (in thousands of dollars). This distribution has
an extreme positive skew.
Although less common, some distributions have
negative
skew. Figure 6 shows the scores on a 20-point problem
on a statistics exam. Since the tail of the distribution extends
to the left, this distribution is skewed
to the left.
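The direction of skew can also be summarized with a number. The sketch below computes one common moment-based skewness coefficient (the third central moment divided by the 1.5 power of the second); this particular formula is an assumption brought in for illustration and is not defined in this section. The data sets are hypothetical.

```python
def skewness(scores):
    """Moment-based skewness: positive for a longer right tail, negative for a longer left tail."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical scores with a long tail in the positive direction.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 4, 10]
# Mirror image of the same scores on a 20-point scale: the tail now extends left.
left_tailed = [20 - x for x in right_tailed]
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed) > 0)  # positive skew
print(skewness(left_tailed) < 0)   # negative skew
print(skewness(symmetric))         # no skew
```

Mirroring a distribution flips the sign of the coefficient without changing its size, which matches the idea that positive and negative skew differ only in the direction of the longer tail.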
The distributions shown so far all have one distinct
high point or peak. The distribution in Figure 8 has two distinct
peaks. A distribution with two peaks is called a bimodal
distribution.
Distributions also differ from each other in terms
of how large or "fat" their tails are. Figure 9 shows
two distributions that differ in this respect. The upper distribution
has relatively more scores in its tails; its shape is called leptokurtic.
The lower distribution has relatively fewer scores in its tails;
its shape is called platykurtic.
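Tail heaviness is often quantified with a kurtosis coefficient. The sketch below computes excess kurtosis (the fourth central moment divided by the square of the second, minus 3), under the common convention that a normal distribution scores 0, positive values suggest a leptokurtic shape, and negative values suggest a platykurtic shape. The formula, the convention, and the data sets are assumptions added for illustration; this section does not define them.

```python
def excess_kurtosis(scores):
    """Moment-based excess kurtosis: 0 for a normal distribution under this convention."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in scores) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Hypothetical scores: most values near the middle, plus a few far out in the tails.
heavy_tailed = [-10, -1, -1, 0, 0, 0, 0, 1, 1, 10]
# Hypothetical scores spread evenly, with no extreme values.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(excess_kurtosis(heavy_tailed) > 0)  # relatively more scores in the tails
print(excess_kurtosis(flat) < 0)          # relatively fewer scores in the tails
```

With small samples this coefficient is quite unstable, so it is best read as a rough descriptive summary rather than a firm classification.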