The frequency polygon in Figure 1, which shows 642 psychology test scores,
was constructed from the frequency table shown in Table 1.
The first label on the X-axis is 35. This
represents an interval extending from 29.5 to 39.5. Since the
lowest test score is 46, this interval has a frequency of 0. The
point labeled 45 represents the interval from 39.5 to 49.5. There
are three scores in this interval. There are 147 scores in the
interval that surrounds 85.
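The link between class intervals, the labeled points, and the frequencies can be sketched in R. This is only an illustration under stated assumptions: the scores are made up (they are not the 642 test scores in Table 1), the helper name `fp_points` is hypothetical, and the whole-number breaks follow the convention of the R code further down this page, so each point sits at 45, 55, and so on.

```r
# Hypothetical scores, for illustration only (not the data in Table 1)
scores <- c(46, 52, 58, 61, 67, 67, 73, 74, 78, 85, 88, 93)

# Class boundaries 40, 50, ..., 100 (same convention as the page's R code)
breaks <- seq(40, 100, by = 10)

# Hypothetical helper: x = class midpoints, y = frequencies, padded with
# one empty class on each side so the polygon starts and ends at zero
fp_points <- function(x, breaks) {
  width  <- diff(breaks)[1]
  counts <- hist(x, breaks = breaks, plot = FALSE)$counts
  mids   <- head(breaks, -1) + width / 2
  data.frame(mid  = c(mids[1] - width, mids, mids[length(mids)] + width),
             freq = c(0, counts, 0))
}

fp <- fp_points(scores, breaks)
plot(fp$mid, fp$freq, type = "b", pch = 16,
     xlab = "Test Score", ylab = "Frequency")
```

The first and last rows of `fp` are the empty classes, which play the same role as the point labeled 35 in Figure 1.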
You can easily discern the shape of the distribution
from Figure 1. Most of the scores are between 65 and 115. It
is clear that the distribution is not symmetric inasmuch as
good scores (to the right) trail off more gradually than poor
scores (to the left). In the terminology of Chapter 3 (where
we will study shapes of distributions more systematically),
the distribution is skewed.
Frequency polygons are useful for comparing distributions.
This is achieved by overlaying the frequency polygons drawn for
different data sets. Figure 3 provides an example. The data come
from a task in which the goal is to move a computer cursor to a
target on the screen as fast as possible. On 20 of the trials,
the target was a small rectangle; on the other 20, the target
was a large rectangle. Time to reach the target was recorded on
each trial. The two distributions (one for each target) are plotted
together in Figure 3. The figure shows that, although there is
some overlap in times, it generally took longer to move the cursor
to the small target than to the large one.
It is also possible to plot two cumulative frequency
distributions in the same graph. This is illustrated in Figure
4 using the same data from the cursor task. The difference in distributions
for the two targets is again evident.
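How two cumulative distributions are compared can be sketched in a few lines of R. The movement times below are invented for illustration (they are not the cursor-task data), and the helper name `cum_freq` is hypothetical. The key assumption is that both groups share the same class boundaries, so their cumulative counts line up.

```r
# Hypothetical movement times in msec, for illustration only
small <- c(620, 680, 710, 750, 790, 820, 860, 910, 950, 1020)
large <- c(430, 470, 510, 540, 560, 590, 630, 660, 720, 780)

# Common class boundaries for both groups
breaks <- seq(400, 1100, by = 100)

# Hypothetical helper: cumulative frequency at each upper class boundary
cum_freq <- function(x) cumsum(hist(x, breaks = breaks, plot = FALSE)$counts)

cum_small <- cum_freq(small)
cum_large <- cum_freq(large)

# Each column ends at that group's sample size
data.frame(upper = breaks[-1], small = cum_small, large = cum_large)
```

In this made-up example the cumulative count for the large target is at least as high as that for the small target at every boundary, which is the pattern to look for when one group tends to finish sooner.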
Note that the graphs on this page were not created in R.
However, the R code shown here produces very similar graphs. Make sure to put the data files in R's working directory (check it with getwd()).
R code written by David Scott
Data files for Figures 1 and 2
Data files for Figures 3 and 4
# Figure 1
tests = read.csv(file = 'psych_scores.csv')
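bk = seq(40,170,10) # histogram breaks (class boundaries)
tk = seq(35,175,10) # polygon x-values: class midpoints plus one empty class on each side
nuk = c( 0, hist( tests[[1]], bk, plot=FALSE )$counts, 0 ) # counts padded with a zero at each end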
main="Frequency polygon for the psychology test scores"
plot(tk,nuk,type="l",col=4,xlab="Test Score",ylab="Frequency",lwd=2,main=main,ylim=c(0,160))
points(tk,nuk,pch=16,col=4,cex=1.5); abline(h=seq(0,160,20),lwd=.5)
# Figure 2
tests = read.csv(file = 'psych_scores.csv')
cum.nuk = cumsum(nuk) # reuses tk and nuk from the Figure 1 code, which must be run first
main="Cumulative frequency polygon for the psychology test scores"
plot(tk,cum.nuk,type="l",col=4,xlab="Test Score",ylab="Cumulative Frequency",
lwd=2,main=main,ylim=c(0,700))
points(tk,cum.nuk,pch=16,col=4,cex=1.5); abline(h=seq(0,700,100),lwd=.5)
# Figure 3
target = read.csv(file = 'target_size.csv')
bk = seq(400,1100,100) # histogram breaks (class boundaries)
tk = seq(350,1150,100) # polygon x-values: class midpoints plus one empty class on each side
dat = target[[2]] # first 20 values: small target; last 20: large target
nuk1 = c( 0, hist( dat[ 1:20], bk, plot=FALSE )$counts, 0 )
nuk2 = c( 0, hist( dat[21:40], bk, plot=FALSE )$counts, 0 )
main="Overlaid Frequency polygons"
plot(tk,nuk1,type="l",col=2,xlab="Time (msec)",ylab="Frequency",lwd=2,main=main,ylim=c(0,10))
points(tk,nuk1,pch=16,col=2,cex=2); abline(h=seq(0,10,2.5),lwd=.5,lty=2)
lines(tk,nuk2,col=4,lwd=2); points(tk,nuk2,pch=16,cex=2,col=4)
text(1000,4,"small target",cex=1.5)
text(720,8,"large target",cex=1.5)
# Figure 4
target = read.csv(file = 'target_size.csv')
cum.nuk1 = cumsum(nuk1) # reuses tk, nuk1 and nuk2 from the Figure 3 code, which must be run first
cum.nuk2 = cumsum(nuk2)
main="Overlaid cumulative frequency polygons"
plot(tk,cum.nuk1,type="l",col=2,xlab="Time (msec)", ylab="Cumulative Frequency",
lwd=2,main=main,ylim=c(0,20))
points(tk,cum.nuk1,pch=16,col=2,cex=2); abline(h=seq(0,20,5))
lines(tk,cum.nuk2,col=4,lwd=2); points(tk,cum.nuk2,pch=16,col=4,cex=2)
text(850,12,"small target",cex=1.5)
text(450,18,"large target",cex=1.5)