Specific Comparisons (Independent Groups)
Prerequisites
Difference Between Two Means (Independent Groups)
Learning Objectives
 Define linear combination
 Specify a linear combination in terms of coefficients
 Do a significance test for a specific comparison
There are many occasions on which the comparisons
among means are more complicated than simply comparing one mean
with another. This section shows how to test these more complex
comparisons. The methods in this section assume that the comparison
among means was decided on before looking at the data. Therefore
these comparisons are called planned comparisons.
A different procedure is necessary for unplanned
comparisons.
Let's begin with the made-up data from a hypothetical
experiment shown in Table 1. Twelve subjects were selected from
a population of high-self-esteem subjects (esteem = 1) and an
additional 12 subjects were selected from a population of low-self-esteem
subjects (esteem = 2). Subjects then performed on a task and (independent
of how well they really did) half were told they succeeded (outcome
= 1) and the other half were told they failed (outcome = 2). Therefore
there were six subjects in each esteem/success combination and
24 subjects altogether.
After the task, subjects were asked to rate (on
a 10-point scale) how much of their outcome (success or failure)
they attributed to themselves as opposed to being due to the nature
of the task.
The means of the four conditions are shown in
Table 2.
There are several questions we can ask about the
data. We begin by asking whether, on average, subjects who were
told they succeeded differed significantly from subjects who were
told they failed. The means for subjects in the success condition
are 7.333 for the high-self-esteem subjects and 5.500 for the
low-self-esteem subjects. Therefore, the mean of all subjects
in the success condition is (7.333 + 5.500)/2 = 6.417. Similarly,
the mean for all subjects in the failure condition is (4.833 +
7.833)/2 = 6.333. The question is, how do we do a significance
test for this difference of 6.417 - 6.333 = 0.083?
The first step is to express this difference in
terms of a linear combination of a set of coefficients and the
means. This may sound complex, but it is really pretty easy. We
can compute the mean of the success conditions by multiplying
each success mean by 0.5 and then adding the result. In other
words, we compute
(.5)(7.333) + (.5)(5.500)
= 3.67 + 2.75
= 6.42
Similarly, we can compute the mean of the failure
conditions by multiplying each failure mean by 0.5 and then adding
the results:
(.5)(4.833) + (.5)(7.833)
= 2.417 + 3.917
= 6.33
The difference between the two means can be expressed
as
.5 x 7.333 + .5 x 5.500 - (.5 x 4.833 + .5 x 7.833)
= .5 x 7.333 + .5 x 5.500 - .5 x 4.833 - .5 x 7.833
We therefore can compute the difference between
the "success" mean and the "failure" mean
by multiplying each "success" mean by 0.5, each "failure"
mean by -0.5, and adding the results. In Table 3, the coefficient
column is the multiplier and the product column is the result
of the multiplication. If we add up the four values in the product
column we get
L = 3.667 + 2.750 - 2.417 - 3.917 = 0.083
This is the same value we got when we computed
the difference between means previously (within rounding error).
We call the value "L" for "linear combination."
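The calculation of L can be sketched in a few lines of Python. This is only an illustration: the function name `linear_combination` and the ordering of the cells (success/high, success/low, failure/high, failure/low) are choices made here, and the means are the rounded values quoted in the text, so the result agrees with 0.083 only within rounding error.

```python
# Cell means quoted in the text, in the (assumed) order:
# success/high, success/low, failure/high, failure/low.
means = [7.333, 5.500, 4.833, 7.833]

# Coefficients for "mean of success conditions minus mean of failure conditions".
coefficients = [0.5, 0.5, -0.5, -0.5]


def linear_combination(coefs, values):
    """Return the sum of each coefficient times its corresponding mean."""
    if len(coefs) != len(values):
        raise ValueError("need exactly one coefficient per mean")
    return sum(c * m for c, m in zip(coefs, values))


L = linear_combination(coefficients, means)
print(f"L = {L:.4f}")
```

Changing the coefficients while keeping the same means is all that is needed to express a different comparison.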
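Now, the question is whether our value of L is significantly different
from 0. The general formula for L is:
$$L = \sum_i c_i M_i$$
where c_i is the ith coefficient
and M_i is the ith mean. As shown above,
L = 0.083. The formula for testing L for significance is shown
below:
$$t = \frac{L}{\sqrt{\dfrac{\sum_i c_i^2 \, MSE}{n}}}$$
In this example, the sum of the squared coefficients is
$$\sum_i c_i^2 = 0.5^2 + 0.5^2 + (-0.5)^2 + (-0.5)^2 = 1.000$$
MSE is the mean of the variances. The four variances
are shown in Table 4. Their mean is 1.625. Therefore MSE = 1.625.
The value of n is the number of subjects in each
group. Here, n = 6.
Putting it all together,
$$t = \frac{0.083}{\sqrt{\dfrac{(1.000)(1.625)}{6}}} = 0.16$$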
We need to know the degrees of freedom in order
to compute the probability value. The degrees of freedom are
df = N - k
where N is the total number of subjects (24) and
k is the number of groups (4). Therefore, df = 20. Using the Online
Calculator, we find that the two-tailed probability value is 0.874.
Therefore, the difference between the "success" condition
and the "failure" condition is not significant.
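The whole test can be reproduced with the Python standard library. The sketch below assumes equal group sizes, as in this example, and takes MSE = 1.625 and n = 6 from the text. The helper names (`t_pdf`, `two_tailed_p`, `comparison_t_test`) are hypothetical, and the p value is obtained by numerically integrating the t density with Simpson's rule, which stands in for the Online Calculator rather than reproducing how it works.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value: 1 minus the area between -|t| and +|t|.

    The area is found with Simpson's rule, so `steps` must be even.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # density is symmetric about 0
    return max(0.0, 1.0 - central_area)


def comparison_t_test(means, coefs, mse, n):
    """Return (L, t, df, p) for a planned comparison among equal-sized groups."""
    L = sum(c * m for c, m in zip(coefs, means))
    t = L / math.sqrt(sum(c * c for c in coefs) * mse / n)
    k = len(means)
    df = n * k - k  # N - k, with N = n * k subjects in total
    return L, t, df, two_tailed_p(t, df)


# Means in the order success/high, success/low, failure/high, failure/low.
L, t, df, p = comparison_t_test(
    [7.333, 5.500, 4.833, 7.833], [0.5, 0.5, -0.5, -0.5], mse=1.625, n=6
)
print(f"t = {t:.2f}, df = {df}, p = {p:.3f}")
```

With these inputs, t should come out near 0.16 on 20 degrees of freedom, with a two-tailed probability near 0.874.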
Online Calculator: t distribution
A more interesting question about the results is
whether the effect of outcome (success or failure) differs depending
on the self-esteem of the subject. For example, success may make
high-self-esteem subjects more likely
to attribute the outcome to themselves, whereas success may make
low-self-esteem subjects less likely
to attribute the outcome to themselves.
To test this, we have to test a difference between
differences. Specifically, is the difference between success and
failure outcomes for the high-self-esteem subjects different from
the difference between success and failure outcomes for the low-self-esteem
subjects? The means shown in Table 5 suggest that this is the case.
For the high-self-esteem subjects, the difference between the
success and failure means is 7.333 - 4.833 = 2.500. For the low-self-esteem
subjects, the difference is 5.500 - 7.833 = -2.333. The difference
between differences is 2.500 - (-2.333) = 4.83.
The coefficients to test this difference between
differences are shown in Table 5.
If it is hard to see where these coefficients
came from, consider that our difference between differences was
computed this way:
(7.33 - 4.83) - (5.50 - 7.83)
= 7.33 - 4.83 - 5.50 + 7.83
= (1)7.33 + (-1)4.83 + (-1)5.50 + (1)7.83
The values in parentheses are the coefficients.
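To continue the calculations, L = 4.83 and the sum of the squared
coefficients is 1² + (-1)² + (-1)² + 1² = 4. With MSE = 1.625 and
n = 6 as before,
$$t = \frac{4.83}{\sqrt{\dfrac{(4)(1.625)}{6}}} = 4.64$$
With 20 degrees of freedom, the two-tailed p value is 0.0002. Therefore, the
difference between differences is highly significant.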
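As a rough check, the difference between differences can be computed both directly and as a linear combination. The sketch below assumes the cell order success/high, success/low, failure/high, failure/low, which is why the coefficients appear as 1, -1, -1, 1 rather than in the order of the derivation above. MSE = 1.625 and n = 6 are taken from the earlier comparison, and the variable names are illustrative.

```python
import math

# Cell means in the (assumed) order:
# success/high, success/low, failure/high, failure/low.
means = [7.333, 5.500, 4.833, 7.833]

# (success - failure) for high self-esteem minus (success - failure) for low.
coefs = [1, -1, -1, 1]

mse = 1.625  # mean of the four cell variances, from the text
n = 6        # subjects per cell

# The linear combination and the direct calculation should agree.
L = sum(c * m for c, m in zip(coefs, means))
direct = (7.333 - 4.833) - (5.500 - 7.833)

sum_sq = sum(c * c for c in coefs)  # 4 for these coefficients
t = L / math.sqrt(sum_sq * mse / n)

print(f"L = {L:.2f}, direct = {direct:.2f}, t = {t:.2f}")
```

Because the coefficients are larger than in the success-versus-failure comparison, both L and its standard error grow, so it is the ratio t that matters, not the size of L alone.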
In a later chapter on Analysis
of Variance, you
will see that comparisons such as this are testing what is called
an interaction. In general, there
is an interaction when the effect of one variable differs as
a function of the level of another variable. In this example
the effect of the outcome variable is different depending on
the subject's self-esteem. For the high-self-esteem subjects,
success led to more self-attributions than did failure; for
the low-self-esteem subjects, success led to fewer self-attributions
than did failure.
Multiple Comparisons
The more comparisons you make, the greater your
chance of a Type I error. It is useful to distinguish between
two error rates: (1) the per-comparison error
rate and (2) the familywise error rate.
The per-comparison error rate is the probability of a Type I error
for a particular comparison. The familywise error rate is the
probability of making one or more Type I errors in a family or
set of comparisons. In the attribution experiment discussed above,
we computed two comparisons. If we use the 0.05 level for each
comparison, then the per-comparison rate is simply 0.05. The familywise
rate can be complex to compute. Fortunately, there is a simple approximation
that is fairly accurate when the number of comparisons is small.
Defining α as the per-comparison error rate and c as the number
of comparisons, the following inequality always holds for
the familywise error rate (FW):
FW ≤ cα
This inequality is called the Bonferroni
inequality.
The Bonferroni inequality can be used to control
the familywise error rate as follows: If you want the familywise
error rate to be α, you use α/c as the per-comparison
error rate. This correction, called the Bonferroni
correction, will generally result in a familywise error
rate less than α.
Should the familywise error rate be controlled?
Unfortunately, there is no clear-cut answer to this question.
The disadvantage of controlling the familywise error rate is that
it makes it more difficult to obtain a significant result for
any given comparison: The more comparisons you do, the lower the
per-comparison rate must be and therefore the harder it is to
reach significance. That is, the power is lower when you control
the familywise error rate. The advantage is that you have a lower
chance of making a Type I error.
One consideration is the definition of a family
of comparisons. Let's say you conducted a study in which you
were interested in whether there was a difference between male
and female babies in the age at which they started crawling.
After you finished analyzing the data, a colleague of yours
had a totally different research question: Do babies who are
born in the winter differ from those born in the summer in the
age they start crawling? Should the familywise rate be controlled
or should it be allowed to be greater than 0.05? Our view is
that there is no reason you should be penalized (by lower power)
just because your colleague used the same data to address a
different research question. Therefore, the familywise error
rate need not be controlled. Consider the two comparisons done
on the attribution
example at the beginning of this section: These comparisons
are testing completely different hypotheses. Therefore, controlling
the familywise rate is not necessary.
Now consider a study designed to investigate the
relationship between various variables and the ability of subjects
to predict the outcome of a coin flip. One comparison is between
males and females; a second comparison is between those over 40
and those under 40; a third is between vegetarians and nonvegetarians,
and a fourth is between firstborns and others. The question of
whether these four comparisons are testing different hypotheses
depends on your point of view. On the one hand, there is nothing
about whether age makes a difference that is related to whether
diet makes a difference. In that sense, the comparisons are addressing
different hypotheses. On the other hand, the whole series of comparisons
could be seen as addressing the general question of whether anything
affects the ability to predict the outcome of a coin flip. If
nothing does, then allowing the familywise rate to be high means
that there is a high probability of reaching the wrong conclusion.
Orthogonal Comparisons
In the preceding sections, we talked about comparisons
being independent. Independent comparisons are often called orthogonal
comparisons. There is a simple test to determine whether
two comparisons are orthogonal: If the sum of the products of
the coefficients is 0, then the comparisons are orthogonal. Consider
again the experiment on the attribution of success or failure.
Table 6 shows the coefficients previously presented in Table
3 and in Table 5. The column "C1"
contains the coefficients from the comparison shown in Table
3; the column "C2" contains the coefficients from
the comparison shown in Table 5. The column
labeled "Product" is the product of these two columns.
Note that the sum of the numbers in this column is 0. Therefore,
the two comparisons are orthogonal.
Table 7 shows two comparisons that are not orthogonal.
The first compares the high-self-esteem subjects to low-self-esteem
subjects; the second considers only those in the success group and
compares high-self-esteem subjects to low-self-esteem subjects.
The failure group is ignored by using 0's as coefficients. Clearly
the comparison of these two groups of subjects for the whole sample
is not independent of the comparison of them for the success group.
You can see that the sum of the products of the coefficients is
0.5 and not 0.
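The orthogonality check is easy to sketch in Python. The first pair of coefficient sets comes from the success-versus-failure comparison and the difference-between-differences comparison discussed earlier. The second pair is an assumption: the coefficients are inferred from the verbal description of the two non-orthogonal comparisons and chosen so that the sum of products is 0.5, since the table itself is not reproduced here. The cell order (success/high, success/low, failure/high, failure/low) and the function names are also choices made for this illustration.

```python
def sum_of_products(c1, c2):
    """Sum of the products of corresponding coefficients."""
    if len(c1) != len(c2):
        raise ValueError("coefficient sets must have the same length")
    return sum(a * b for a, b in zip(c1, c2))


def are_orthogonal(c1, c2, tol=1e-12):
    """Two comparisons are orthogonal when the sum of products is 0."""
    return abs(sum_of_products(c1, c2)) <= tol


# Order: success/high, success/low, failure/high, failure/low.
success_vs_failure = [0.5, 0.5, -0.5, -0.5]
difference_of_differences = [1, -1, -1, 1]

# Assumed coefficients for the non-orthogonal pair.
high_vs_low = [0.5, -0.5, 0.5, -0.5]
high_vs_low_success_only = [0.5, -0.5, 0, 0]

print(sum_of_products(success_vs_failure, difference_of_differences))
print(are_orthogonal(success_vs_failure, difference_of_differences))
print(sum_of_products(high_vs_low, high_vs_low_success_only))
print(are_orthogonal(high_vs_low, high_vs_low_success_only))
```

The tolerance argument is there only because coefficients such as 1/3 do not have exact floating-point representations; with the values above the sums are exact.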
