Distributions

Prerequisites
Variables

Learning Objectives

Define "distribution"
Interpret a frequency distribution
Distinguish between a frequency distribution and a probability distribution
Construct a grouped frequency distribution for a continuous variable
Identify the skew of a distribution
Identify bimodal, leptokurtic, and platykurtic distributions

Distributions of Discrete Variables

Table 1. Distribution of colors.

Color     Frequency
Brown            17
Red              18
Yellow            7
Green             7
Blue              2
Orange            4

Table 1 is a frequency table. It lists how many of the 55 M&M's in the sample have each color, so it describes the distribution of M&M color frequencies. Not surprisingly, a table like this is called a frequency distribution. A frequency distribution is often shown graphically, as in Figure 1.

Figure 1. Distribution of 55 M&M's.
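If you have the raw observations rather than the counts, a frequency distribution can be tallied in a few lines. The sketch below is a minimal Python illustration, not part of the original text. It assumes the 55 M&M's are stored as a list of color names (rebuilt here from the counts in Table 1) and uses the standard library's collections.Counter to count them.

```python
from collections import Counter

# Hypothetical raw data: one entry per M&M, rebuilt from the counts in Table 1.
bag = (["Brown"] * 17 + ["Red"] * 18 + ["Yellow"] * 7
       + ["Green"] * 7 + ["Blue"] * 2 + ["Orange"] * 4)

# Counter tallies how often each color occurs: a frequency distribution.
freq = Counter(bag)

# Print one row per color, in the order the colors first appear in the list.
for color, count in freq.items():
    print(f"{color:<8}{count:>4}")

print("Total:", sum(freq.values()))
```

The printed table has the same six rows as Table 1, and the total is 55.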

We call Figure 2 a probability distribution because, if you choose an M&M at random from all M&M's, the probability of getting a brown one, say, is equal to the proportion of all M&M's that are brown (0.30).

Figure 2. Distribution of all M&M's.

Notice that the distributions in Figures 1 and 2 are not identical. Figure 1 portrays the distribution in a sample of 55 M&M's. Figure 2 shows the proportions for all M&M's. Chance factors involving the machines used by the manufacturer introduce random variation into the different bags produced. Some bags will have a distribution of colors that is close to Figure 2; others will be farther away.
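Dividing each frequency in Table 1 by the total gives the sample proportions, which can be compared with the proportion of 0.30 brown M&M's among all M&M's. The sketch below also simulates a few bags to show chance variation from bag to bag. The simulation is a simplifying assumption, not a model of the manufacturer's machines: it treats each of the 55 candies in a bag as an independent draw that is brown with probability 0.30. The helper brown_in_bag is hypothetical.

```python
import random

# Sample counts from Table 1 (55 M&M's).
counts = {"Brown": 17, "Red": 18, "Yellow": 7, "Green": 7, "Blue": 2, "Orange": 4}
n = sum(counts.values())

# Dividing each frequency by the total gives the sample proportions.
proportions = {color: c / n for color, c in counts.items()}
print(f"Brown in this sample: {proportions['Brown']:.3f}")  # 0.309

# Proportion of brown among all M&M's, as stated in the text.
P_BROWN = 0.30

def brown_in_bag(rng, bag_size=55, p=P_BROWN):
    """Hypothetical helper: count brown candies in one simulated bag,
    assuming each candy is independently brown with probability p."""
    return sum(rng.random() < p for _ in range(bag_size))

rng = random.Random(1)  # fixed seed so the run is reproducible
bags = [brown_in_bag(rng) for _ in range(10)]
print("Brown counts in 10 simulated bags:", bags)
```

Even though every simulated bag is drawn from the same population proportion, the brown counts differ from bag to bag, which is the kind of random variation described above.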

Continuous Variables

The variable "color of M&M" used in this example is a discrete variable, and its distribution is also called discrete. Let us now extend the concept of a distribution to continuous variables. When a continuous variable is measured precisely enough, no two scores are likely to be exactly the same, so a simple frequency table would show a frequency of 1 for every score. The solution to this problem is to create a grouped frequency distribution, in which the scores falling within each of a set of ranges are counted.
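A grouped frequency distribution needs only a starting point, an interval width, and a number of intervals. In the minimal sketch below, the response times, the starting point of 0.6, and the width of 0.2 are all hypothetical choices made for illustration. Each interval includes its lower bound but not its upper bound.

```python
def grouped_frequency(scores, start, width, n_bins):
    """Tally scores into n_bins half-open intervals
    [start + k*width, start + (k+1)*width) for k = 0 .. n_bins-1."""
    counts = [0] * n_bins
    for x in scores:
        k = int((x - start) // width)  # index of the interval containing x
        if 0 <= k < n_bins:            # scores outside all intervals are skipped
            counts[k] += 1
    return counts

# Hypothetical response times in seconds. Every value is distinct,
# so an ungrouped frequency table would show a frequency of 1 for each.
times = [0.62, 0.71, 0.76, 0.83, 0.85, 0.88, 0.91, 0.94, 1.02, 1.07, 1.13, 1.21]

start, width, n_bins = 0.6, 0.2, 4
for k, count in enumerate(grouped_frequency(times, start, width, n_bins)):
    low = start + k * width
    print(f"{low:.1f}-{low + width:.1f}: {count}")
```

With these numbers the four intervals receive 3, 5, 3, and 1 scores, which shows where the times cluster in a way the ungrouped list cannot. A different width or starting point would give a somewhat different picture.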

Probability Densities

Distributions for continuous variables are called probability density distributions. Some probability densities have particular importance in statistics. A very important one is shaped like a bell and is called the normal distribution. Many naturally occurring phenomena can be approximated surprisingly well by this distribution. It will serve to illustrate some features of all continuous distributions.

An example of a normal distribution is shown in Figure 3. The Y axis in the normal distribution represents the "density of probability." Intuitively, it shows the chance of obtaining values near corresponding points on the X axis. In Figure 3, for example, the probability of an observation with a value near 40 is about half the probability of an observation with a value near 50. Although this text does not discuss the concept of probability density in detail, you should keep the following ideas in mind about the curve that describes a continuous distribution (like the normal distribution). First, the total area under the curve equals 1. Second, the probability of any exact value of X is 0. Finally, the area under the curve between two given points on the X axis is the probability that a number chosen at random will fall between those two points.

Figure 3. A normal distribution.
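The three facts about area can be checked numerically. The sketch below assumes a normal distribution with a mean of 50 and a standard deviation of 10 (hypothetical values, not read from Figure 3) and approximates areas under the curve with the trapezoidal rule.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal density curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h

# Hypothetical parameters: mean 50, standard deviation 10.
MU, SIGMA = 50, 10

def pdf(x):
    return normal_pdf(x, MU, SIGMA)

print(area(pdf, MU - 10 * SIGMA, MU + 10 * SIGMA))  # whole curve: about 1
print(area(pdf, 40, 60))                            # P(40 < X < 60): about 0.683
print(area(pdf, 45, 45))                            # a single exact value: 0.0
```

The last line shows why the probability of any exact value is 0: an interval of zero width encloses zero area, however high the curve is at that point.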

Shapes of Distributions

Distributions have different shapes; they don't all look like the normal distribution in Figure 3. For example, the normal probability density is higher in the middle than in its two tails. Other distributions need not have this feature. There is even variation among the distributions that we call "normal." For example, some normal distributions are more spread out than the one shown in Figure 3 (their tails approach the X axis farther from the middle of the curve, for example at 10 and 90 if drawn in place of Figure 3). Others are less spread out (their tails might approach the X axis at 30 and 70). More information on the normal distribution can be found in a later chapter devoted entirely to it.

The normal distribution shown in Figure 3 is symmetric; if you folded it in the middle, the two sides would match perfectly. Figure 4 shows the discrete distribution of scores on a psychology test. This distribution is not symmetric: the tail in the positive direction extends further than the tail in the negative direction. A distribution with the longer tail extending in the positive direction is said to have a positive skew. It is also described as "skewed to the right."

Figure 4. A distribution with a positive skew.
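The text identifies skew by looking at which tail is longer. One common numerical summary, the moment coefficient of skewness (the average cubed deviation from the mean divided by the cube of the standard deviation), agrees in sign: it is positive when the longer tail points right and negative when it points left. The sketch below is an illustration on hypothetical scores, not a method from this chapter.

```python
def skewness(xs):
    """Moment coefficient of skewness: mean cubed deviation / SD cubed."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # population variance
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical scores: most are low, a few stretch far to the right.
right_tail = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9, 14]
# Mirror image: most are high, a few stretch far to the left.
left_tail = [20 - x for x in right_tail]

print(skewness(right_tail))  # positive: skewed to the right
print(skewness(left_tail))   # negative: skewed to the left
```

Because cubing keeps the sign of each deviation, the few far-out scores in the long tail dominate the sum and set its sign.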

Figure 5 shows the salaries of major league baseball players in 1974 (in thousands of dollars). This distribution has an extreme positive skew.

Figure 5. A distribution with a very large positive skew. This histogram shows the salaries of major league baseball players.

Although less common, some distributions have negative skew. Figure 6 shows the scores on a 20-point problem on a statistics exam. Since the tail of the distribution extends to the left, this distribution is skewed to the left.

Figure 6. A distribution with negative skew. This histogram shows the frequencies of various scores on a 20-point question on a statistics test.

The distributions shown so far all have one distinct high point or peak. The distribution in Figure 8 has two distinct peaks. A distribution with two peaks is called a bimodal distribution.

Figure 8. Frequencies of times between eruptions of the Old Faithful geyser. Notice the two distinct peaks: one at 1.85 and the other at 3.85.
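Peaks can also be spotted in a table of grouped frequencies by looking for an interval whose frequency is higher than both of its neighbors. The count_peaks helper below is a hypothetical, deliberately crude rule applied to made-up frequencies; it is a sketch, not a standard test for bimodality.

```python
def count_peaks(freqs):
    """Hypothetical rule: count interior intervals whose frequency is
    strictly higher than the frequencies on both sides."""
    return sum(
        1
        for i in range(1, len(freqs) - 1)
        if freqs[i] > freqs[i - 1] and freqs[i] > freqs[i + 1]
    )

# Hypothetical grouped frequencies for two histograms.
one_peak = [1, 4, 9, 12, 8, 3, 1]
two_peaks = [2, 9, 14, 6, 3, 7, 15, 10, 2]

print(count_peaks(one_peak))   # 1
print(count_peaks(two_peaks))  # 2: a bimodal shape
```

Real data are noisy, so a small bump can register as an extra peak, and this rule ignores ties between neighboring intervals and peaks in the first or last interval. Looking at the histogram, as with Figure 8, remains the main check.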

Distributions also differ from each other in terms of how large or "fat" their tails are. Figure 9 shows two distributions that differ in this respect. The upper distribution has relatively more scores in its tails; its shape is called leptokurtic. The lower distribution has relatively fewer scores in its tails; its shape is called platykurtic.

Figure 9. Distributions differing in kurtosis. The top distribution has long tails. It is called "leptokurtic." The bottom distribution has short tails. It is called "platykurtic."
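Kurtosis is usually judged from a graph, as in Figure 9, but one common numerical measure is excess kurtosis: the fourth standardized moment minus 3, which is the value of that moment for a normal distribution. Positive values suggest a leptokurtic shape and negative values a platykurtic one. The sketch below applies this measure to two hypothetical data sets; it is an illustration, not a definition used elsewhere in this text.

```python
def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (the value for a normal distribution)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # population variance
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Hypothetical data centered at 0: the first has a few extreme scores
# in its tails, the second is spread evenly with no long tails at all.
long_tails = [-9, -2, -1, -1, 0, 0, 0, 0, 1, 1, 2, 9]
short_tails = list(range(-5, 6))  # evenly spaced values from -5 to 5

print(excess_kurtosis(long_tails))   # positive: leptokurtic
print(excess_kurtosis(short_tails))  # negative: platykurtic
```

Raising deviations to the fourth power makes the few extreme scores in long tails dominate, which is why fat-tailed data score high on this measure.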