Unequal Sample Sizes
Prerequisites
ANOVA Designs, Multi-Factor Designs
Learning Objectives
 State why unequal n can be a problem
 Define confounding
 Compute weighted and unweighted means
 Distinguish between Type I and Type III sums of squares
 Describe why the cause of the unequal sample sizes makes a difference for the interpretation
The Problem of Confounding
Whether by design, accident,
or necessity, the number of subjects in each of the various
experimental conditions may not be equal. For example, the sample
sizes for the Obesity
and Relationships case study are shown in Table 1.
Although the sample sizes were approximately equal, the
Acquaintance/Typical condition had the most subjects. Since n is
used to refer to the sample size of an individual group, designs
with unequal sample sizes are sometimes referred to as designs
with unequal n.
We consider an absurd design to illustrate the
main problem caused by unequal n. Suppose an experimenter were
interested in the effect of diet and exercise on cholesterol.
The sample sizes are shown in Table 2.
What makes this example absurd is that there are no subjects
in either the Low-Fat/No-Exercise condition or the High-Fat/Moderate-Exercise
condition. The (hypothetical) data showing change in
cholesterol are shown in Table 3.
The last column shows the mean change in cholesterol for the
two Diet conditions whereas the last row shows the mean change
for the two Exercise conditions. The value of 15 in the lower-right-most
cell in the table is the mean of all subjects.
We see from the last column that those on the
low-fat diet lowered their cholesterol an average of 25 units
whereas those on the high-fat diet lowered theirs by only 5 units.
However, there is no way of knowing whether the difference is
due to diet or to exercise since every subject in the low-fat
condition was in the moderate-exercise condition and every subject
in the high-fat condition was in the no-exercise condition. Therefore,
Diet and Exercise are completely confounded. The problem with unequal n is that it causes confounding.
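A minimal sketch of complete confounding follows. The six scores are hypothetical (they are not the Table 3 data) and were chosen only so that the marginal means match the 25, 5, and 15 quoted above; the helper name `marginal_means` is also just illustrative. Because only two of the four cells contain subjects, grouping by Diet and grouping by Exercise split the subjects in exactly the same way, so the two "effects" are numerically identical and cannot be separated.

```python
from statistics import mean

# HYPOTHETICAL records: (diet, exercise, reduction in cholesterol).
# Only two of the four Diet x Exercise cells contain any subjects.
records = [
    ("low-fat", "moderate", 20), ("low-fat", "moderate", 25), ("low-fat", "moderate", 30),
    ("high-fat", "none", 0), ("high-fat", "none", 5), ("high-fat", "none", 10),
]

def marginal_means(rows, factor_index):
    """Mean reduction at each level of one factor, ignoring the other factor."""
    groups = {}
    for row in rows:
        groups.setdefault(row[factor_index], []).append(row[2])
    return {level: mean(scores) for level, scores in groups.items()}

diet_means = marginal_means(records, 0)
exercise_means = marginal_means(records, 1)
grand_mean = mean(row[2] for row in records)

diet_effect = diet_means["low-fat"] - diet_means["high-fat"]
exercise_effect = exercise_means["moderate"] - exercise_means["none"]

# The two differences are the same number computed from the same split of subjects.
print(diet_effect, exercise_effect, grand_mean)
```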
Weighted and Unweighted Means
The difference between weighted and unweighted
means is critical for understanding how to
deal with the confounding resulting from unequal n.
Weighted and unweighted means will be explained
using the
data shown in Table 4. Here, Diet and Exercise
are confounded because 80% of the subjects in the low-fat
condition exercised as compared to 20% of those in the high-fat
condition. However, there is not complete confounding as there
was with the data in Table 3.
The weighted mean for the low-fat condition is computed
as the mean of the low-fat moderate-exercise mean and the
low-fat no-exercise mean, weighted in accordance with sample size. To
compute a weighted mean, you multiply each mean by its sample
size, sum the products, and divide by N, the total number of observations. Since
there are four subjects in the low-fat moderate-exercise condition
and one subject in the low-fat no-exercise condition, the means
are weighted by factors of 4 and 1.
The weighted mean is denoted Mw.
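Written out, and assuming cell means of 27.5 and 20 (the values implied by the weighted mean of 26 and the unweighted mean of 23.75 reported in this section for the low-fat condition), the calculation would be:

```latex
M_w = \frac{(4)(27.5) + (1)(20)}{4 + 1} = \frac{130}{5} = 26
```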
The weighted mean for the low-fat
condition is also the mean of all five
scores in this condition. Thus if you ignore the factor "Exercise,"
you are implicitly computing weighted means.
The unweighted
mean for the low-fat condition (Mu) is
simply the mean of the two means.
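The two kinds of mean can be checked with a short sketch. The five low-fat scores below are an assumption: they are not copied from Table 4, only chosen to give cell means of 27.5 and 20 with cell sizes of 4 and 1. The function names are illustrative.

```python
from statistics import mean

# ASSUMED scores (reduction in cholesterol) for the low-fat condition.
moderate_exercise = [20, 25, 30, 35]   # n = 4, cell mean 27.5
no_exercise = [20]                     # n = 1, cell mean 20

def weighted_mean(cells):
    """Multiply each cell mean by its n, sum the products, divide by total N."""
    total_n = sum(len(cell) for cell in cells)
    return sum(len(cell) * mean(cell) for cell in cells) / total_n

def unweighted_mean(cells):
    """Mean of the cell means, ignoring how many subjects are in each cell."""
    return mean([mean(cell) for cell in cells])

cells = [moderate_exercise, no_exercise]
m_w = weighted_mean(cells)
m_u = unweighted_mean(cells)

# Ignoring Exercise and pooling all five scores gives the weighted mean.
pooled = mean(moderate_exercise + no_exercise)
print(m_w, m_u, pooled)
```

The pooled mean and the weighted mean agree, which is why ignoring a factor amounts to using weighted means.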
One way to evaluate the main effect of Diet is to compare the weighted mean for the low-fat
diet (26) with the weighted mean for the high-fat diet (4).
This difference of 22 is called "the effect of diet
ignoring exercise" and is misleading since
most of the low-fat subjects exercised and most of the high-fat
subjects did not. However, the difference between the unweighted
means of 15.625 (23.75 minus 8.125) is not affected by this confounding
and is therefore a better measure of the main effect. In short,
the weighted means ignore the effects of other variables (exercise
in this example) and result in confounding; unweighted means control
for the effect of other variables and therefore eliminate the
confounding.
Statistical analysis programs use different terms
for means that are computed controlling for other effects. SPSS
calls them "estimated marginal means" whereas SAS and SAS JMP
call them "least squares means."
Types of Sums of Squares
The section on Multi-Factor
ANOVA stated that, when there is unequal n,
the sum of squares total is not equal to the sum of the
sums of squares for all the other sources of variation.
This is because the confounded sums of squares are not apportioned
to any source of variation. For the
data in Table 4, the sum of squares for Diet is 390.625,
the sum of squares for Exercise is 180.625, and the sum of squares
confounded between these two factors is 819.375 (the calculation
of this value is beyond the scope of this introductory text).
In the ANOVA Summary Table shown in Table 5, this large portion
of the sum of squares is not apportioned to any source of variation
and represents the "missing" sums of squares. That
is, if you add up the sums of squares for Diet, Exercise, D
x E, and Error, you get 902.625. If you add the confounded sum
of squares of 819.375 to this value you get the total sum of
squares of 1722.00. When confounded sums of squares are not
apportioned to any source of variation, the sums of squares
are called Type
III sums of squares. Type III sums of squares are, by
far, the most common and if sums of squares are not otherwise
labeled, it can safely be assumed that they are Type III.
When confounded sums of squares are apportioned
to sources of variation, the sums of squares are called Type
I sums of squares. The order in which the confounded sums
of squares are apportioned is determined by the order in which
the effects are listed. The first effect gets any sums of squares
confounded between it and any of the other effects. The second
gets the sums of squares confounded between it and subsequent
effects, but not confounded with the first effect, etc. The Type
I sums of squares are shown in Table 6. As you can see, with Type
I sums of squares, the sum of all sums of squares is the total
sum of squares.
In Type II sums of squares,
sums of squares confounded between main effects are not apportioned
to any source of variation whereas sums of squares confounded
between main effects and interactions are apportioned to the main
effects. In our example, there is no confounding between the D x E
interaction and either of the main effects. Therefore, the Type
II sums of squares are equal to the Type III sums of squares.
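The three types can be compared with a small model-comparison sketch. Everything specific here is an assumption rather than the author's procedure: the ten scores are not taken from Table 4 (they were chosen to reproduce the cell sizes and the sums of squares of 390.625, 180.625, 819.375, and 1722 quoted above), the factors use +1/-1 effect coding, and `solve` and `rss` are illustrative helpers. Each sum of squares is the drop in residual sum of squares when an effect is added to a model; the types differ only in which other effects are already in that model.

```python
from fractions import Fraction

# ASSUMED scores (reduction in cholesterol); not taken from Table 4.
# d: +1 = low-fat, -1 = high-fat; e: +1 = moderate exercise, -1 = none.
data = [
    (+1, +1, 20), (+1, +1, 25), (+1, +1, 30), (+1, +1, 35),  # low-fat, moderate
    (+1, -1, 20),                                            # low-fat, none
    (-1, +1, 15),                                            # high-fat, moderate
    (-1, -1, -6), (-1, -1, 6), (-1, -1, -5), (-1, -1, 10),   # high-fat, none
]
y = [Fraction(score) for _, _, score in data]
one = [1 for _ in data]
d = [row[0] for row in data]
e = [row[1] for row in data]
de = [a * b for a, b in zip(d, e)]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination; A is assumed nonsingular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def rss(*cols):
    """Residual sum of squares after a least-squares fit of y on the given columns."""
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * v for a, v in zip(ci, y)) for ci in cols]
    beta = solve(A, rhs)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(len(y))]
    return sum((v - f) ** 2 for v, f in zip(y, fitted))

total = rss(one)               # total sum of squares
error = rss(one, d, e, de)     # error sum of squares (full model)

type1 = {  # each effect adjusted only for the effects listed before it
    "Diet": rss(one) - rss(one, d),
    "Exercise": rss(one, d) - rss(one, d, e),
    "D x E": rss(one, d, e) - error,
}
type2 = {  # main effects adjusted for each other but not for the interaction
    "Diet": rss(one, e) - rss(one, d, e),
    "Exercise": rss(one, d) - rss(one, d, e),
    "D x E": rss(one, d, e) - error,
}
type3 = {  # each effect adjusted for all other effects
    "Diet": rss(one, e, de) - error,
    "Exercise": rss(one, d, de) - error,
    "D x E": rss(one, d, e) - error,
}

for name, ss in (("Type I", type1), ("Type II", type2), ("Type III", type3)):
    unapportioned = total - error - sum(ss.values())
    print(name, {k: float(v) for k, v in ss.items()}, "unapportioned:", float(unapportioned))
```

With Diet listed first, the Type I sum of squares for Diet absorbs the confounded portion, so nothing is left unapportioned. Under Type II and Type III the same portion is assigned to no effect.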
Unweighted Mean Analysis
Type III sums of squares are used to test differences
in unweighted means. However, there is an alternative method for
testing the same hypotheses tested using Type III sums of squares.
This method, unweighted means analysis,
is computationally simpler than the standard method but is
an approximate test rather than an exact test. It is, however,
a very good approximation in all but extreme cases. Moreover,
it is exactly the same as the traditional test for effects with
one degree of freedom. The Analysis Lab uses unweighted means
analysis and therefore may not match the results of other computer
programs exactly when there is unequal n and the df are greater
than one.
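For a 2 x 2 design, the computation behind unweighted means analysis can be sketched in a few lines. This is an illustration under stated assumptions, not the Analysis Lab's code: the low-fat cell means (27.5 and 20) follow from the means reported earlier, while the high-fat cell means (15 and 1.25) and the high-fat cell sizes (1 and 4) are assumed values chosen to be consistent with the sums of squares quoted above. The sketch treats every cell as if it had n equal to the harmonic mean of the cell sizes, and the shortcut `n_h * L**2 / 4` applies only to one-degree-of-freedom effects in a 2 x 2 layout.

```python
from fractions import Fraction

# Cell means and sizes keyed by (diet, exercise).
# Low-fat means follow from the text; high-fat means and sizes are ASSUMED.
cells = {
    ("low-fat", "moderate"): (Fraction("27.5"), 4),
    ("low-fat", "none"): (Fraction("20"), 1),
    ("high-fat", "moderate"): (Fraction("15"), 1),
    ("high-fat", "none"): (Fraction("1.25"), 4),
}

# Harmonic mean of the cell sizes: number of cells / sum of reciprocals.
n_h = Fraction(len(cells)) / sum(Fraction(1, n) for _, n in cells.values())

def ss_one_df(sign):
    """SS for a 1-df effect in a 2 x 2 design, using a +1/-1 contrast L of cell means."""
    L = sum(sign(diet, exercise) * m for (diet, exercise), (m, _) in cells.items())
    return n_h * L ** 2 / 4

ss_diet = ss_one_df(lambda diet, exercise: 1 if diet == "low-fat" else -1)
ss_exercise = ss_one_df(lambda diet, exercise: 1 if exercise == "moderate" else -1)

print(float(n_h), float(ss_diet), float(ss_exercise))
```

Under these assumptions the results match the Type III sums of squares for Diet and Exercise given earlier, as expected for effects with one degree of freedom.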
Causes of Unequal Samples
None of the methods for dealing with unequal sample
sizes are valid if the experimental treatment is the source of
the unequal sample sizes. Imagine an experiment seeking to determine
whether publicly performing an embarrassing act would affect one's
attitude toward public disclosure. In this imaginary experiment,
the experimental group is asked to reveal to a group of people
the most embarrassing thing they have ever done. The control group
is asked to describe what they had at their last meal. Twenty
subjects are recruited for the experiment and randomly divided
into two equal groups of 10, one for the experimental treatment
and one for the control. Following the description, subjects are
given an attitude survey concerning public disclosure. So far,
this seems like a valid experimental design. However, of the 10
subjects in the experimental group, four withdrew from the experiment
because they did not wish to publicly describe an embarrassing
situation. None of the subjects in the control group withdrew.
Even if the data analysis shows a significant effect, it would
not be valid to conclude that the treatment had an effect because
a likely alternative explanation cannot be ruled out. Namely,
subjects who were willing to describe an embarrassing situation
differed from those who were not even before the experiment began.
Thus, the differential dropout rate destroyed the random assignment of
subjects to conditions, a critical feature of the experimental
design. No amount of statistical adjustment can compensate for
this flaw.
