Explain why the null hypothesis should not be accepted

Describe how a non-significant result can increase confidence that
the null hypothesis is false

Discuss the problems of affirming a negative conclusion

When a significance test results in a high probability
value, it means that the data provide little or no evidence that
the null hypothesis is false. However, the high probability value
is not evidence that the null hypothesis is true. The problem
is that it is impossible to distinguish a null effect from a very
small effect. For example, in the James Bond Case Study, suppose Mr.
Bond is, in fact, just barely better than chance at judging whether
a martini was shaken or stirred. Assume he has a 0.51 probability
of being correct on a given trial (π = 0.51). Let's say Experimenter
Jones (who did not know π =
0.51) tested Mr. Bond and found he was correct 49 times out of
100 tries. How would the significance test come out? The experimenter’s
significance test would be based on the assumption that Mr. Bond
has a 0.50 probability of being correct on each trial (π =
0.50). Given this assumption, the probability of his being correct
49 or more times out of 100 is 0.62. This means that the probability
value is 0.62, far higher than the conventional significance
level of 0.05. This result, therefore, does not give even a hint
that the null hypothesis is false. However, we know (but Experimenter
Jones does not) that π =
0.51 and not 0.50 and therefore that the null hypothesis is false.
So, if Experimenter Jones had concluded that the null hypothesis
was true based on the statistical analysis, he or she would have
been mistaken. Concluding that the null hypothesis is true is
called accepting the null hypothesis. To do so is a serious error.
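
The probability value of 0.62 in the example above can be checked directly from the binomial distribution. The following is a minimal sketch using only the Python standard library; the function name `binomial_upper_tail` is introduced here for illustration and is not part of the case study.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p: float) -> float:
    """Probability of `successes` or more correct answers in `trials` tries."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Null hypothesis: Mr. Bond is guessing, so pi = 0.50 on every trial.
p_value = binomial_upper_tail(49, 100, 0.50)
print(round(p_value, 2))  # 0.62
```

The calculation uses π = 0.50 because the significance test is always computed under the null hypothesis, not under the true (and unknown to the experimenter) value of 0.51.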

Do not accept the null hypothesis when you do
not reject it.

So how should the non-significant result be interpreted?
The experimenter should report that there is no credible evidence
Mr. Bond can tell whether a martini was shaken or stirred, but
that there is no proof that he cannot. It is generally impossible
to prove a negative. What if I claimed to have been Socrates in
an earlier life? Since I have no evidence for this claim, I would
have great difficulty convincing anyone that it is true. However,
no one would be able to prove definitively that I was not.

Often a non-significant finding increases one's
confidence that the null hypothesis is false. Consider the following
hypothetical example. A researcher develops a treatment for anxiety
that he or she believes is better than the traditional treatment.
A study is conducted to test the relative effectiveness of the
two treatments: 20 subjects are randomly divided into two groups
of 10. One group receives the new treatment and the other receives
the traditional treatment. The mean anxiety level is lower for
those receiving the new treatment than for those receiving
the traditional treatment. However, the difference is not significant.
The statistical analysis shows that a difference as large as or larger
than the one obtained in the experiment would occur 11% of the
time even if there were no true difference between the treatments.
In other words, the probability value is 0.11. A naive researcher
would interpret this finding as evidence that the new treatment
is no more effective than the traditional treatment. However,
the sophisticated researcher, although disappointed that the effect
was not significant, would be encouraged that the new treatment
led to less anxiety than the traditional treatment. The data support
the thesis that the new treatment is better than the traditional
one even though the effect is not statistically significant. This
researcher should have more confidence that the new treatment
is better than he or she had before the experiment was conducted.
However, the support is weak and the data are inconclusive. What
should the researcher do? A reasonable course of action would
be to do the experiment again. Let's say the researcher repeated
the experiment and again found the new treatment was better than
the traditional treatment. However, once again the effect was
not significant and this time the probability value was 0.07.
The naive researcher would think that two out of two experiments
failed to find significance and therefore the new treatment is
unlikely to be better than the traditional treatment. The sophisticated
researcher would note that two out of two times the new treatment
was better than the traditional treatment. Moreover, two experiments
each providing weak support that the new treatment is better,
when taken together, can provide strong support. Using a method
for combining probabilities, it can be determined that combining
the probability values of 0.11 and 0.07 results in a probability
value of 0.045. Therefore, these two non-significant findings
taken together result in a significant finding.
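
The text does not name the method used to combine the two probability values. One common choice is Fisher's method, and the sketch below assumes that method; it does reproduce the value of 0.045. It also assumes the two experiments are independent. The function name `fisher_combined_p` is introduced here for illustration.

```python
from math import exp, factorial, log


def fisher_combined_p(p_values):
    """Combine independent p-values using Fisher's method (an assumption here)."""
    k = len(p_values)
    # Test statistic: -2 * sum(ln p), chi-square with 2k df under the null.
    statistic = -2.0 * sum(log(p) for p in p_values)
    half = statistic / 2.0
    # For an even number of degrees of freedom (2k), the chi-square
    # upper-tail probability has this closed form.
    return exp(-half) * sum(half**i / factorial(i) for i in range(k))


combined = fisher_combined_p([0.11, 0.07])
print(round(combined, 3))  # 0.045
```

Neither 0.11 nor 0.07 is below 0.05 on its own, yet the combined value is, which is the point made by the sophisticated researcher.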

Although there is never a statistical basis for
concluding that an effect is exactly zero, a statistical analysis
can demonstrate that an effect is most likely small. This
is done by computing a confidence interval.
If all effect sizes in the interval are small, then it can be
concluded that the effect is small. For example, suppose an experiment
tested the effectiveness of a treatment for insomnia. Assume that
the mean time to fall asleep was 2 minutes shorter for those receiving
the treatment than for those in the control group and that this
difference was not significant. If the 95% confidence interval
ranged from -4 to 8 minutes, then the researcher would be justified
in concluding that the benefit is eight minutes or less. However,
the researcher would not be justified in concluding that the null hypothesis
is true, or even that it is supported.
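
The reasoning about the confidence interval can be written as a small check. This is only a sketch: the function names and the 10-minute threshold are hypothetical, and what counts as a "small" effect is a subject-matter judgment rather than a statistical one.

```python
def largest_plausible_effect(ci_low: float, ci_high: float) -> float:
    """Largest effect magnitude, in either direction, inside the interval."""
    return max(abs(ci_low), abs(ci_high))


def effect_is_small(ci_low: float, ci_high: float, threshold: float) -> bool:
    """True only if every value in the interval is within `threshold` of zero."""
    return largest_plausible_effect(ci_low, ci_high) <= threshold


# 95% confidence interval from the insomnia example, in minutes.
low, high = -4.0, 8.0
print(largest_plausible_effect(low, high))  # 8.0
# Hypothetical threshold: treat anything up to 10 minutes as a small effect.
print(effect_is_small(low, high, threshold=10.0))  # True
# Zero lies inside the interval, so the difference is not significant.
print(low <= 0.0 <= high)  # True
```

The interval bounds the effect at eight minutes, but because it also contains non-zero values, it gives no grounds for concluding that the effect is exactly zero.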