Mikki Hebl and David Lane
- Distinguish between a sample and a population
- Define inferential statistics
- Identify biased samples
- Distinguish between simple random sampling and stratified sampling
- Distinguish between random sampling and random assignment
Populations and samples
In statistics, we often rely on a sample
(that is, a small subset of a larger set of data) to draw
inferences about the larger set. The larger set is known as the population
from which the sample is drawn.
Example #1: You have been hired by the National
Election Commission to examine how the American people feel about
the fairness of the voting procedures in the U.S. Whom will you ask?
It is not practical to ask every single American
how he or she feels about the fairness of the voting procedures.
Instead, we query a relatively small number of Americans, and
draw inferences about the entire country from their responses.
The Americans actually queried constitute our sample of the larger
population of all Americans. The mathematical procedures whereby
we convert information about the sample into intelligent guesses
about the population fall under the rubric of inferential statistics.
A sample is typically a small subset of the population.
In the case of voting attitudes, we would sample a few thousand
Americans drawn from the hundreds of millions that make up the
country. In choosing a sample, it is therefore crucial that it
not over-represent one kind of citizen
at the expense of others. For example, something would be wrong
with our sample if it happened to be made up entirely of Florida
residents. If the sample held only Floridians, it could not be
used to infer the attitudes of other Americans. The same problem
would arise if the sample were composed only of Republicans.
Inferential statistics are based on the assumption that sampling
is random. We trust a random sample to represent different segments
of society in close to the appropriate proportions (provided
the sample is large enough; see below).
Example #2: We are interested in examining how
many math classes have been taken on average by current graduating
seniors at American colleges and universities during their four
years in school. Whereas our population in the last example included
all US citizens, now it involves just the graduating seniors
throughout the country. This is still a large set since there
are thousands of colleges and universities, each enrolling many
students. (New York University, for example, enrolls 48,000 students.)
It would be prohibitively costly to examine the transcript of every college
senior. We therefore take a sample of college seniors and then
make inferences to the entire population based on what we find.
To make the sample, we might first choose some public and private
colleges and universities across the United States. Then we might
sample 50 students from each of these institutions. Suppose that
the average number of math classes taken by the people in our
sample were 3.2. Then we might speculate that 3.2 approximates
the number we would find if we had the resources to examine every
senior in the entire population. But we must be careful about
the possibility that our sample is non-representative of the
population. Perhaps we chose an overabundance of math majors,
or chose too many technical institutions that have heavy math
requirements. Such bad sampling makes our sample unrepresentative
of the population of all seniors.
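The effect of over-sampling math majors can be sketched with a small simulation. Everything below is an assumption made for illustration: the population of 10,000 seniors, the 10% share of math majors, and the ranges of classes taken are invented numbers, not data from any real survey.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 seniors (all numbers invented):
# 10% are math majors who took 8 to 12 math classes,
# the remaining 90% took 0 to 4 math classes.
math_majors = [random.randint(8, 12) for _ in range(1_000)]
other_majors = [random.randint(0, 4) for _ in range(9_000)]
population = math_majors + other_majors

# Random sample: every senior has the same chance of being selected.
random_sample = random.sample(population, 500)

# Biased sample: drawn only from the math majors.
biased_sample = random.sample(math_majors, 500)

population_mean = mean(population)
random_sample_mean = mean(random_sample)
biased_sample_mean = mean(biased_sample)

print(f"population mean:    {population_mean:.2f}")
print(f"random sample mean: {random_sample_mean:.2f}")
print(f"biased sample mean: {biased_sample_mean:.2f}")
```

In this sketch the random sample's mean should land near the population mean, while the biased sample's mean cannot fall below 8, because every student in it is a math major.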
To solidify your understanding
of sampling bias,
consider the following examples. For each, try to identify the population
and the sample, and then reflect on whether the sample is likely
to yield the information desired.
Example #3: A substitute teacher wants to know
how students in the class did on their last test. The teacher asks the
10 students sitting in the front row to state their latest test
score. He concludes from their report that the class did extremely
well. What is the sample? What is the population? Can you identify
any problems with choosing the sample in the way that the teacher did?
In Example #3, the population consists of all
students in the class. The sample is made up of just the 10 students
sitting in the front row. The sample is not likely to be representative
of the population. Those who sit in the front row tend to be more
interested in the class and tend to score higher on tests. Hence,
the sample may perform at a higher level than the population.
Example #4: A coach is interested in how many
cartwheels the average college freshman at his university can
do. Eight volunteers from the freshman class step forward. After
observing their performance, the coach concludes that college
freshmen can do an average of 16 cartwheels in a row without stopping.
In Example #4, the population is the class of
all freshmen at the coach's university. The sample is composed
of the 8 volunteers. The sample is poorly chosen because volunteers
are more likely to be able to do cartwheels than the average
freshman; people who can't do cartwheels probably did not volunteer!
In the example, we are also not told the gender of the volunteers.
Were they all women, for example? That might affect the outcome,
contributing to the non-representative nature of the sample
(if the school is co-ed).
Sampling Bias is Discussed in More Detail Here
Simple Random Sampling
Researchers adopt a variety of sampling strategies.
The most straightforward is simple
random sampling. Such sampling requires every member
of the population to have an equal chance of being selected into
the sample. In addition, the selection of one member must be independent
of the selection of every other member. That is, picking one
member from the population must not increase or decrease the probability
of picking any other member (relative to the others). In this
sense, we can say that simple random sampling chooses a sample
by pure chance. To check your understanding
of simple random sampling, consider the following example. What
is the population? What is the sample? Was the sample picked by
simple random sampling? Is it biased?
Example #5: A research scientist is interested
in studying the experiences of twins raised together versus those
raised apart. She obtains a list of twins from the National
Twin Registry, and selects two subsets of individuals
for her study. First, she chooses all those in the registry whose
last name begins with Z. Then she turns to all those whose last
name begins with B. Because there are so many names that start
with B, however, our researcher decides to incorporate only every
other name into her sample. Finally, she mails out a survey and
compares characteristics of twins raised apart versus together.
In Example #5, the population consists of all
twins recorded in the National Twin Registry. It is important
that the researcher only make statistical generalizations to the
twins on this list, not to all twins in the nation or world. That
is, the National Twin Registry may not be representative of all
twins. Even if inferences are limited to the Registry, a number
of problems affect the sampling procedure we described. For instance,
choosing only twins whose last names begin with Z does not give
every individual an equal chance of being selected into the sample.
Moreover, such a procedure risks over-representing ethnic groups
with many surnames that begin with Z. There are other reasons
why choosing just the Z's may bias the sample. Perhaps such people
are more patient than average because they often find themselves
at the end of the line! The same problem occurs with choosing
twins whose last name begins with B. An additional problem for
the B's is that the every-other-one procedure prevented
adjacent names on the B part of the list from both being selected.
Just this defect alone means the sample was not formed through
simple random sampling.
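The difference between the every-other-name procedure and simple random sampling can be sketched in a few lines. The list of names below is a hypothetical stand-in for the B section of a registry, and the helper `has_adjacent_pair` is introduced only for this illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in for the B section of an alphabetical list.
b_names = [f"B-name-{i:03d}" for i in range(100)]

# Every-other-name selection, as in Example #5.
every_other = b_names[::2]

# Simple random sample of the same size, drawn without replacement:
# every possible subset of 50 names is equally likely.
srs = random.sample(b_names, k=len(every_other))


def has_adjacent_pair(sample, ordered_list):
    """Return True if two sampled names are neighbors in ordered_list."""
    positions = sorted(ordered_list.index(name) for name in sample)
    return any(b - a == 1 for a, b in zip(positions, positions[1:]))


# The every-other-name procedure can never pick two neighbors.
print(has_adjacent_pair(every_other, b_names))  # False by construction

# A simple random sample is free to include neighbors.
print(has_adjacent_pair(srs, b_names))
```

Under the every-other-name rule, selecting one name guarantees that its neighbors are excluded, so selections are not independent. The simple random sample places no such restriction on which names can appear together.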
Sample size matters
Recall that the definition of a random sample
is a sample in which every member of the population has an equal
chance of being selected. This means that the sampling procedure,
rather than the results of the
procedure, defines what it means for a sample to be random. Random
samples, especially if the sample size is small, are not necessarily
representative of the entire population. For example, if a random
sample of 20 subjects were taken from a population with an equal
number of males and females, there would be a nontrivial probability
(0.06) that 70% or more of the sample would be female. (To see
how to obtain this probability, see the section
on the binomial distribution.)
Such a sample would not be representative, although it would
be drawn randomly. Only a large sample size makes it likely
that our sample is close to representative of the population.
For this reason, inferential statistics take into account
the sample size when generalizing results from samples
to populations. In later chapters, you'll see what kinds of
mathematical techniques ensure this sensitivity to sample size.
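The probability of 0.06 quoted above can be checked directly from the binomial distribution. The sketch below assumes each sampled subject is female with probability 0.5, independently of the others, and sums the probabilities of getting 14 or more females out of 20 (14 is 70% of 20). The function name `prob_at_least` is chosen here for illustration.

```python
from math import comb


def prob_at_least(n, k_min, p=0.5):
    """Probability of k_min or more successes in n independent trials."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


# 70% or more females in a random sample of 20 means 14 or more.
prob_20 = prob_at_least(20, 14)
print(round(prob_20, 4))  # 0.0577, which rounds to the 0.06 quoted above

# The same 70% threshold with a sample ten times larger (140 of 200).
prob_200 = prob_at_least(200, 140)
print(prob_200)
```

The second call illustrates the role of sample size: with 200 subjects, a sample that is 70% or more female is far less likely than with 20.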
More complex sampling
Sometimes it is not feasible to build a sample
using simple random sampling. To see the problem, consider the
fact that both Dallas and Houston are competing to be hosts of
the 2012 Olympics. Imagine that you are hired to assess whether
most Texans prefer Houston to Dallas as the host, or the reverse.
Given the impracticality of obtaining the opinion of every single
Texan, you must construct a sample of the Texas population. But
now notice how difficult it would be to proceed by simple random
sampling. For example, how will you contact those individuals
who don't vote and don't have a phone? Even among people
you find in the telephone book, how can you identify those who
have just relocated to California (and had no reason to inform
you of their move)? What do you do about the fact that since the
beginning of the study, an additional 4,212 people took up residence
in the state of Texas? As you can see, it is sometimes very difficult
to develop a truly random procedure. For this reason, other kinds
of sampling techniques have been devised. We now discuss two of them.
Random assignment
In experimental research, populations are often
hypothetical. For example, in an experiment comparing the effectiveness
of a new anti-depressant drug with a placebo,
there is no actual population of individuals taking the drug.
In this case, a specified population of people with some degree
of depression is defined and a random sample is taken from
this population. The sample is then randomly divided into two
groups; one group is assigned to the treatment condition (drug)
and the other group is assigned to the control condition (placebo).
This random division of the sample into two groups is called random
assignment. Random assignment
is critical for the validity
of an experiment. For example, consider the bias that could
be introduced if the first 20 subjects to show up at the experiment
were assigned to the experimental group and the second 20
subjects were assigned to the control group. It is possible
that subjects who show up late tend to be more depressed than
those who show up early, thus making the experimental group
less depressed than the control group even before the treatment
was administered.
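Random assignment of 40 subjects to two groups of 20 can be sketched as follows. The subject IDs are hypothetical placeholders listed in order of arrival. Shuffling the list before splitting it means arrival order has no influence on which group a subject joins.

```python
import random

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical subject IDs, listed in order of arrival.
subjects = [f"subject_{i:02d}" for i in range(1, 41)]

# Shuffle a copy so the original arrival order is left intact.
shuffled = subjects[:]
random.shuffle(shuffled)

# Split the shuffled list in half: drug versus placebo.
drug_group = shuffled[:20]
placebo_group = shuffled[20:]

print(len(drug_group), len(placebo_group))  # 20 20
```

Contrast this with taking `subjects[:20]` as the drug group, which would reproduce the first-to-arrive bias described above.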
In experimental research of this kind, failure
to assign subjects randomly to groups is generally more serious
than having a non-random sample. Failure to randomize (the former
error) invalidates the experimental findings. A non-random sample
(the latter error) simply restricts the generalizability of
the results.
Stratified sampling
Since simple random sampling often does not ensure
a representative sample, a sampling method called stratified
random sampling is sometimes used to make the sample more
representative of the population. This method can be used if the
population has a number of distinct "strata" or groups.
In stratified sampling, you first identify members of the population
who belong to each group. Then you randomly sample from each of
those subgroups in such a way that the sizes of the subgroups
in the sample are proportional to their sizes in the population.
Let's take an example: Suppose you were interested
in views of capital punishment at an urban university. You have
the time and resources to interview 200 students. The student
body is diverse with respect to age; many older people work during
the day and enroll in night courses (average age is 39), while
younger students generally enroll in day classes (average age
of 19). It is possible that night students have different views
about capital punishment than day students. If 70% of the students
are day students, it makes sense to ensure that 70% of the sample
consists of day students. Thus, your sample of 200 students would
consist of 140 day students and 60 night students. The proportion
of day students in the sample and in the population (the entire
university) would be the same. Inferences to the entire population
of students at the university would therefore be more secure.
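The day and night student example can be sketched as a proportional stratified sample. The student body size of 10,000 and the student IDs are assumptions made for illustration. Only the 70/30 split and the sample size of 200 come from the example above.

```python
import random

random.seed(3)  # fixed seed so the sketch is repeatable

# Hypothetical student body of 10,000: 70% day students, 30% night students.
day_students = [f"day_{i}" for i in range(7_000)]
night_students = [f"night_{i}" for i in range(3_000)]
strata = {"day": day_students, "night": night_students}

sample_size = 200
population_size = sum(len(members) for members in strata.values())

sample = {}
for name, members in strata.items():
    # Proportional allocation: each stratum's share of the sample
    # matches its share of the population. With other proportions,
    # rounding could make the parts sum to slightly more or less
    # than sample_size.
    k = round(sample_size * len(members) / population_size)
    # Within a stratum, selection is a simple random sample.
    sample[name] = random.sample(members, k)

print(len(sample["day"]), len(sample["night"]))  # 140 60
```

Each stratum is sampled at random, but the sizes of the two parts are fixed in advance, so the sample cannot end up with too many or too few night students by chance.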