Within-Subjects ANOVA
Prerequisites
Introduction to ANOVA, ANOVA Designs, Multi-Factor ANOVA, Difference Between Means, Correlated Pairs
Learning Objectives
 Define a within-subjects factor
 Explain why a within-subjects design can be expected to have more power than a between-subjects design
 Be able to create the Source and df columns of an ANOVA summary table for a one-way within-subjects design
 Explain error in terms of interaction
 Discuss the problem of carryover effects
 Be able to create the Source and df columns of an ANOVA summary table for a design with one between-subjects and one within-subjects variable
 Define sphericity
 Describe the consequences of violating the assumption of sphericity
 Discuss courses of action that can be taken if sphericity is violated
Within-subjects factors involve comparisons of the same subjects under different conditions. For example, in the ADHD Treatment Study, each child's performance was measured four times, once after being on each of four drug doses for a week. Therefore, each subject's performance was measured at each of the four levels of the factor "Dose." Note the difference from between-subjects factors, for which each subject's performance is measured only once and the comparisons are among different groups of subjects. A within-subjects factor is sometimes referred to as a repeated-measures factor since repeated measurements are taken on each subject. An experimental design in which the independent variable is a within-subjects factor is called a within-subjects design.
An advantage of within-subjects designs is that individual differences in subjects' overall levels of performance are controlled. This is important because subjects invariably will differ from one another. In an experiment on problem solving, some subjects will be better than others regardless of the condition they are in. Similarly, in a study of blood pressure, some subjects will have higher blood pressure than others regardless of the condition. Within-subjects designs control these individual differences by comparing the scores of a subject in one condition to the scores of the same subject in other conditions. In this sense each subject serves as his or her own control. This typically gives within-subjects designs considerably more power than between-subjects designs.
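The power advantage can be illustrated with a small simulation using hypothetical data (not the case-study scores): when stable subject-to-subject differences are large, they stay in the error term of a between-subjects comparison but cancel out of a within-subjects comparison.

```python
# Sketch with hypothetical data: why letting each subject serve as his or
# her own control gains power when subjects differ a lot from one another.
import numpy as np

rng = np.random.default_rng(0)
n = 24
subject_level = rng.normal(0, 2.0, n)                 # stable individual differences
cond_a = subject_level + rng.normal(0, 1.0, n)        # scores in Condition A
cond_b = subject_level + 0.5 + rng.normal(0, 1.0, n)  # same subjects, +0.5 effect

# Between-subjects style comparison: subject differences stay in the error.
pooled_sd = np.sqrt((cond_a.var(ddof=1) + cond_b.var(ddof=1)) / 2)
t_between = (cond_b.mean() - cond_a.mean()) / (pooled_sd * np.sqrt(2 / n))

# Within-subjects comparison: subject_level cancels out of each difference.
diff = cond_b - cond_a
t_within = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))
# t_within is typically much larger in magnitude than t_between
```

The numerators of the two t statistics are identical here; only the error terms differ, which is exactly the point.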
One-Factor Designs
Let's consider
how to analyze the data from the ADHD
treatment case study. These data consist of the scores
of 24 children with ADHD on a delay of gratification (DOG)
task. Each child was tested under four dosage levels. For
now we will be concerned only with testing the difference
between the mean in the placebo (D0) condition and the mean in
the highest dosage condition (D60). The details of the computations
are relatively unimportant since they are almost universally
done by computers. Therefore we jump right to the ANOVA Summary
table shown in Table 1.
The first source of variation, "Subjects," refers
to the differences among subjects. If all the subjects had exactly
the same mean (across the two dosages) then the sum of squares
for subjects would be zero; the more subjects differ from each other, the larger the sum of squares for subjects.
Dosage refers to
the differences between the two dosage levels. If the means for
the two dosage levels were equal, the sum of squares would be
zero. The larger the difference between means, the larger the
sum of squares.
The error reflects
the degree to which the effect of dosage is different for
different subjects. If subjects all responded very similarly to
the drug, then the error would be very low. For example, if all
subjects performed moderately better with the high dose than they
did with the placebo, then the error would be low. On the other
hand, if some subjects did better with the placebo while others
did better with the high dose, then the error would be high. It
should make intuitive sense that the less consistent the effect
of the drug, the larger the drug effect would have to be in order
to be significant. The degree to which the effect of the drug
differs depending on the subject is the Subjects x Drug interaction.
Recall that an interaction occurs when the effect of one variable
differs depending on the level of another variable. In this case,
the size of the error term is the extent to which the effect of
the variable "Drug" differs depending on the level of
the variable
"Subjects." Note that each subject is a different level
of the variable "Subjects."
Other portions of the summary table have the same meaning as in between-subjects ANOVA. The F for dosage is the mean square for dosage divided by the mean square error. For these data, the F is significant with p = 0.004. Notice that this F test is equivalent to the t test for correlated pairs, with F = t².
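The equivalence is easy to verify numerically. The sketch below uses made-up scores for 24 subjects under two conditions (not the actual case-study data) and computes the within-subjects F directly from the sums of squares:

```python
# Sketch: for two conditions, the within-subjects ANOVA F equals the
# squared paired t. Scores here are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
placebo = rng.normal(40, 10, 24)          # D0 scores
d60 = placebo + rng.normal(5, 6, 24)      # same subjects at the high dose

t, p = stats.ttest_rel(d60, placebo)      # t test for correlated pairs

# Two-condition within-subjects ANOVA, computed directly.
scores = np.column_stack([placebo, d60])  # 24 subjects x 2 doses
grand = scores.mean()
ss_dose = scores.shape[0] * ((scores.mean(axis=0) - grand) ** 2).sum()
ss_subj = scores.shape[1] * ((scores.mean(axis=1) - grand) ** 2).sum()
ss_total = ((scores - grand) ** 2).sum()
ss_error = ss_total - ss_dose - ss_subj   # Subjects x Dose interaction
f = (ss_dose / 1) / (ss_error / 23)       # df = 1 and 23
# f agrees with t**2 up to rounding
```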
Table 2 shows the ANOVA Summary Table when all four
doses are included in the analysis. Since there are now four dosage
levels rather than two, the df for dosage is three rather than
one. Since the error is the Subjects x Dosage interaction, the
df for error is the df for "Subjects" (23) times the df for Dosage
(3) and is equal to 69.
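The Source and df columns for any one-way within-subjects design follow mechanically from the number of subjects and the number of levels. A minimal sketch (the function name and the "Dosage" label are my own, chosen to match this example):

```python
# Sketch: Source and df columns for a one-way within-subjects ANOVA.
# The error df is the Subjects x Factor interaction df.
def within_subjects_df(n_subjects, n_levels, factor="Dosage"):
    df_subjects = n_subjects - 1
    df_factor = n_levels - 1
    df_error = df_subjects * df_factor          # interaction df
    df_total = n_subjects * n_levels - 1
    return [("Subjects", df_subjects),
            (factor, df_factor),
            ("Error", df_error),
            ("Total", df_total)]

# ADHD example: 24 subjects, four dosage levels.
table = within_subjects_df(24, 4)
```

Note that the component df sum to the total df (23 + 3 + 69 = 95), a useful check on any summary table.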
Carryover Effects
Often performing in one condition affects performance in a subsequent condition in such a way as to make a within-subjects design impractical. For example, consider an experiment with two conditions. In both conditions, subjects are presented with pairs of words. In Condition A subjects are asked to judge whether the words have similar meaning whereas in Condition B subjects are asked to judge whether they sound similar. In both conditions, subjects are given a surprise memory test at the end of the presentation. If condition were a within-subjects variable, then there would be no surprise the second time and it is likely that the subjects would try to memorize the words.
Not all carryover effects cause such serious problems.
For example, if subjects get fatigued by performing a task, then
they would be expected to do worse on the second condition they
were in. However, as long as half the subjects are in Condition
A first and Condition B second, the fatigue effect itself would
not invalidate the results, although it would add noise and reduce
power. The carryover effect is symmetric in that having Condition
A first affects performance in Condition B to the same degree
that having Condition B first affects performance in Condition
A.
Asymmetric carryover effects cause more serious
problems. For example, suppose performance in Condition B were
much better if preceded by Condition A whereas performance in
Condition A was approximately the same regardless of whether it
was preceded by Condition B. With this kind of carryover effect
it is probably better to use a between-subjects design.
One Between-Subjects and One Within-Subjects Factor
In the Stroop Interference case study, subjects
performed three tasks: naming colors, reading color words, and
naming the ink color of color words. Some of the subjects were
males and some of the subjects were females. Therefore this design
had two factors: gender and task. The ANOVA Summary Table for
this design is shown in Table 3.
The computations for the sum of squares will not
be covered since computations are normally done by software. However,
there are some important things to learn from the summary table.
First notice that there are two error terms: one for the between-subjects variable Gender and one for both the within-subjects variable Task and the interaction of the between-subjects variable and the within-subjects variable. Typically, the mean square error for the between-subjects variable will be higher than the other mean square error. In this example, the mean square error for Gender is about twice as large as the other mean square error.
The degrees of freedom for the between-subjects variable is equal to the number of levels of the between-subjects variable minus one. In this example it is one since there are two levels of gender. Similarly, the degrees of freedom for the within-subjects variable is equal to the number of levels of the variable minus one. In this example, it is two since there are
three tasks. The degrees of freedom for the interaction is the
product of the degrees of freedom of the two variables. For the
Gender x Task interaction, the degrees of freedom is the product
of degrees of freedom Gender (which is 1) and the degrees of freedom
Task (which is 2) and is equal to 2.
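The df bookkeeping for a design with one between-subjects and one within-subjects factor can be sketched the same way. The subject count of 47 used below is inferred from the within-subjects error df in Table 3 (90 = 45 × 2), so treat it as an assumption:

```python
# Sketch: Source and df columns for one between-subjects factor (groups)
# and one within-subjects factor (levels). Two error terms appear:
# subjects within groups (between error), and that term's interaction
# with the within-subjects factor (within error).
def mixed_design_df(n_subjects, n_groups, n_levels):
    df_between = n_groups - 1
    df_error_between = n_subjects - n_groups        # subjects within groups
    df_within = n_levels - 1
    df_interaction = df_between * df_within
    df_error_within = df_error_between * df_within
    return {"Gender": df_between,
            "Error (between)": df_error_between,
            "Task": df_within,
            "Gender x Task": df_interaction,
            "Error (within)": df_error_within}

# Stroop example: assuming 47 subjects, 2 genders, 3 tasks.
dfs = mixed_design_df(47, 2, 3)
```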
Assumption of Sphericity
Within-subjects ANOVA makes a restrictive
assumption about the variances and the correlations
among the dependent variables. Although the details of the assumption
are beyond the scope of this book, it is approximately correct
to say that it is assumed that all the correlations are equal
and all the variances are equal. Table 4 shows the correlations
among the three dependent variables in the Stroop Interference
case study.
Note that the correlation between the word reading
and the color naming variables of 0.7013 is much higher than
the correlation between either of these variables with the interference
variable. Moreover, as shown in Table 5, the variances among the
variables differ greatly.
Naturally, the assumption of sphericity, like all assumptions, refers to populations, not samples. However, it is clear from these sample data that the assumption is not met here.
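An informal way to see this kind of violation is to compute the correlation matrix and the variances directly. The sketch below uses fabricated scores built to mimic the pattern described (two strongly related tasks plus a noisier, weakly related third), not the actual Stroop data:

```python
# Sketch with fabricated data: unequal correlations and unequal variances
# of the kind that make the sphericity assumption implausible.
import numpy as np

rng = np.random.default_rng(2)
base = rng.normal(0, 1, 48)                       # shared subject ability
word = base + rng.normal(0, 0.3, 48)              # word reading
color = base + rng.normal(0, 0.3, 48)             # color naming
interference = rng.normal(0, 2.0, 48)             # weakly related, larger spread

data = np.column_stack([word, color, interference])
corr = np.corrcoef(data, rowvar=False)            # 3 x 3 correlation matrix
variances = data.var(axis=0, ddof=1)              # one variance per task
# corr[0, 1] is far larger than corr[0, 2] or corr[1, 2], and the
# variances differ greatly -- the pattern described in the text
```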
Consequences of Violating the Assumption of
Sphericity
Although ANOVA is robust to most violations of
its assumptions, the assumption of sphericity is an exception:
Violating the assumption of sphericity leads to a substantial
increase in the Type I error rate. Moreover, this assumption is
rarely met in practice. Although violations of this assumption received little attention in the past, the current consensus among data analysts is that they can no longer be ignored.
Approaches to Dealing with Violations of Sphericity
If an effect is highly significant, there is a
conservative test that can be used to protect against an inflated
Type I error rate. This test consists of adjusting the degrees
of freedom for all within-subjects variables as follows: The degrees
of freedom numerator and denominator are divided by the number
of scores per subject minus one. Consider the effect of Task shown
in Table 3. There are three scores per subject and therefore the
degrees of freedom should be divided by two. The adjusted degrees
of freedom are:
(2)(1/2) = 1 for the numerator and
(90)(1/2) = 45 for the denominator
The probability value is obtained using the F
probability calculator with the new degrees of freedom parameters.
The probability of an F of 228.06 or larger with 1 and 45 degrees
of freedom is less than 0.001. Therefore, there is no need to
worry about the assumption violation in this case.
Possible violation of sphericity does make a difference in the interpretation of the analysis shown in Table 2. The probability value of an F of 5.18 with 1 and 23 degrees of freedom is 0.032, a value that would lead to a more cautious conclusion than the p value of 0.003 shown in Table 2.
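Both adjustments are easy to reproduce with an F-distribution tail probability. A hedged sketch (the helper name is my own; the F and df values come from Tables 2 and 3):

```python
# Sketch of the conservative correction: divide the numerator and
# denominator df by (scores per subject - 1), then recompute p.
from scipy import stats

def conservative_p(f_value, df_num, df_den, scores_per_subject):
    k = scores_per_subject - 1
    return stats.f.sf(f_value, df_num / k, df_den / k)  # upper-tail p

# Task effect, Table 3: F = 228.06 with 2 and 90 df, 3 scores per subject.
p_task = conservative_p(228.06, 2, 90, 3)     # evaluated at df 1 and 45

# Dosage effect, Table 2: F = 5.18 with 3 and 69 df, 4 scores per subject.
p_dosage = conservative_p(5.18, 3, 69, 4)     # evaluated at df 1 and 23
```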
The correction described above is very conservative and should only be used when, as in Table 3, the probability value is very low. A better correction, but one that is very complicated to calculate, is to multiply the degrees of freedom by a quantity called ε. There are two methods of calculating ε. The correction called the Huynh-Feldt (or H-F) is slightly preferred to the one called the Geisser-Greenhouse (or G-G), although both work well. The G-G correction is generally considered a little too conservative.
A final method for dealing with violations of sphericity is to use a multivariate approach to within-subjects variables. This method has much to recommend it, but it is beyond the scope of this text.
