Frequency Polygons


Author(s)

David M. Lane

Prerequisites

Histograms

Learning Objectives

Create and interpret frequency polygons
Create and interpret cumulative frequency polygons
Create and interpret overlaid frequency polygons

Frequency polygons are a graphical device for understanding the shapes of distributions. They serve the same purpose as histograms, but are especially helpful for comparing sets of data. Frequency polygons are also a good choice for displaying cumulative frequency distributions.

To create a frequency polygon, start just as you would for a histogram, by choosing a class interval. Then draw an X-axis representing the values of the scores in your data. Mark the middle of each class interval with a tick mark, and label it with the middle value of the class. Draw the Y-axis to indicate the frequency of each class. Place a point above the middle of each class interval at the height corresponding to its frequency. Finally, connect the points. You should include one class interval below the lowest value in your data and one above the highest value. Because these two classes have a frequency of zero, the graph will then touch the X-axis on both sides.
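The construction steps above can be sketched in R. This is a minimal sketch, assuming equal-width classes whose first lower limit (`first_lower`) lies below the lowest score; the helper name `polygon_points` and its arguments are hypothetical and not part of the original R code on this page. Because `hist()` reports exact midpoints, the class from 39.5 to 49.5 is placed at 44.5, whereas Figure 1 labels that class 45.

```r
# Hypothetical helper: (x, y) points of a frequency polygon for raw scores.
# Assumes equal-width classes starting at `first_lower`, which must lie
# below the lowest score.
polygon_points <- function(x, first_lower, width) {
  # Number of classes needed to reach the highest score
  n_classes <- ceiling((max(x) - first_lower) / width)
  breaks <- first_lower + width * (0:n_classes)
  h <- hist(x, breaks = breaks, plot = FALSE)
  # Put each count above its class midpoint, then add an empty class
  # below and above so the polygon touches the X-axis at both ends
  data.frame(
    x = c(h$mids[1] - width, h$mids, h$mids[length(h$mids)] + width),
    y = c(0, h$counts, 0)
  )
}

# Possible usage with the data file used later on this page:
# tests <- read.csv("psych_scores.csv")
# pts <- polygon_points(tests[[1]], first_lower = 39.5, width = 10)
# plot(pts$x, pts$y, type = "o", pch = 16, xlab = "Test Score", ylab = "Frequency")
```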

Figure 1 shows a frequency polygon for 642 psychology test scores. It was constructed from the frequency table shown in Table 1.

Table 1. Frequency Distribution of Psychology Test Scores.

Lower Limit	Upper Limit	Count	Cumulative Count
29.5	39.5	0	0
39.5	49.5	3	3
49.5	59.5	10	13
59.5	69.5	53	66
69.5	79.5	107	173
79.5	89.5	147	320
89.5	99.5	130	450
99.5	109.5	78	528
109.5	119.5	59	587
119.5	129.5	36	623
129.5	139.5	11	634
139.5	149.5	6	640
149.5	159.5	1	641
159.5	169.5	1	642
169.5	179.5	0	642
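Since each point of a frequency polygon is just a class label paired with its count, Figure 1 can be redrawn from Table 1 alone, without the raw scores. A minimal R sketch with the counts typed in from the table; the data frame name `table1` is illustrative, and the X-axis labels (35, 45, ..., 175) are the ones used in Figure 1.

```r
# Table 1 entered by hand: lower class limits and counts
table1 <- data.frame(
  lower = seq(29.5, 169.5, by = 10),
  count = c(0, 3, 10, 53, 107, 147, 130, 78, 59, 36, 11, 6, 1, 1, 0)
)
table1$upper <- table1$lower + 10
# Tick labels used in Figure 1 (35 marks the 29.5-39.5 class, and so on)
table1$label <- seq(35, 175, by = 10)

# Connect the points; the empty first and last classes pin the ends to zero
plot(table1$label, table1$count, type = "o", pch = 16, col = 4,
     xlab = "Test Score", ylab = "Frequency",
     main = "Frequency polygon for the psychology test scores")
```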

The first label on the X-axis is 35, which represents the interval from 29.5 to 39.5. Since the lowest test score is 46, this interval has a frequency of 0. The point labeled 45 represents the interval from 39.5 to 49.5, which contains three scores. The interval around 85 (79.5 to 89.5) contains 147 scores.

You can easily discern the shape of the distribution from Figure 1. Most of the scores are between 65 and 115. The distribution is clearly not symmetric: good scores (to the right) trail off more gradually than poor scores (to the left). In the terminology of Chapter 3 (where we will study shapes of distributions more systematically), the distribution is skewed.
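Both observations can be checked against Table 1. The R sketch below, using counts copied from the table, computes the share of scores in the classes labeled 65 through 115 and compares how far each tail extends from the peak class. The tail comparison is only an informal check of asymmetry, not a formal measure of skew.

```r
# Class counts from Table 1 and the X-axis labels of Figure 1
count <- c(0, 3, 10, 53, 107, 147, 130, 78, 59, 36, 11, 6, 1, 1, 0)
label <- seq(35, 175, by = 10)

# Share of scores in the classes labeled 65 through 115
in_range <- label >= 65 & label <= 115
sum(count[in_range])                          # 574
round(sum(count[in_range]) / sum(count), 3)   # 0.894

# Distance from the peak class to the lowest and highest
# classes that contain any scores
peak <- label[which.max(count)]               # 85
nonzero <- label[count > 0]
c(below = peak - min(nonzero), above = max(nonzero) - peak)  # below = 40, above = 80
```

The right tail reaches twice as far from the peak as the left tail, which is the asymmetry visible in Figure 1.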

Figure 1. Frequency polygon for the psychology test scores.

A cumulative frequency polygon for the same test scores is shown in Figure 2. The graph is the same as before except that the Y value for each point is the number of students in the corresponding class interval plus the number of students in all lower intervals. For example, there are no scores in the interval labeled "35," three in the interval labeled "45," and 10 in the interval labeled "55." Therefore, the Y value corresponding to "55" is 0 + 3 + 10 = 13. Since 642 students took the test, the cumulative frequency for the last interval is 642.
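The running total described above is exactly what R's `cumsum()` computes. A minimal sketch using the counts from Table 1; it reproduces the Cumulative Count column and draws a polygon like Figure 2.

```r
# Class counts from Table 1 and the X-axis labels used in Figures 1 and 2
count <- c(0, 3, 10, 53, 107, 147, 130, 78, 59, 36, 11, 6, 1, 1, 0)
label <- seq(35, 175, by = 10)

# Each Y value is the count in its class plus the counts in all lower classes
cum_count <- cumsum(count)
cum_count[label == 55]          # 13 (0 + 3 + 10)
cum_count[length(cum_count)]    # 642, the number of students

plot(label, cum_count, type = "o", pch = 16, col = 4,
     xlab = "Test Score", ylab = "Cumulative Frequency",
     main = "Cumulative frequency polygon for the psychology test scores")
```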

Figure 2. Cumulative frequency polygon for the psychology test scores.

Frequency polygons are useful for comparing distributions. This is achieved by overlaying the frequency polygons drawn for different data sets. Figure 3 provides an example. The data come from a task in which the goal is to move a computer cursor to a target on the screen as fast as possible. On 20 of the trials, the target was a small rectangle; on the other 20, the target was a large rectangle. Time to reach the target was recorded on each trial. The two distributions (one for each target) are plotted together in Figure 3. The figure shows that, although there is some overlap in times, it generally took longer to move the cursor to the small target than to the large one.
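The essential step when overlaying polygons is to bin both data sets with the same class limits, so that their points share the same X positions. A minimal R sketch of that idea; `overlay_polygons` is a hypothetical helper, not part of the original R code, and it assumes equal-width classes whose breaks cover the values of both groups.

```r
# Hypothetical helper: overlaid frequency polygons for two groups.
# Both groups are binned with the SAME breaks, so their points line up.
overlay_polygons <- function(a, b, breaks, labels = c("group 1", "group 2")) {
  width <- diff(breaks)[1]                     # assumes equal-width classes
  h_a <- hist(a, breaks = breaks, plot = FALSE)
  h_b <- hist(b, breaks = breaks, plot = FALSE)
  # Add an empty class below and above so each polygon touches the X-axis
  x <- c(h_a$mids[1] - width, h_a$mids, h_a$mids[length(h_a$mids)] + width)
  y_a <- c(0, h_a$counts, 0)
  y_b <- c(0, h_b$counts, 0)
  plot(x, y_a, type = "o", pch = 16, col = 2, ylim = c(0, max(y_a, y_b)),
       xlab = "Value", ylab = "Frequency")
  lines(x, y_b, type = "o", pch = 16, col = 4)
  legend("topright", legend = labels, col = c(2, 4), lwd = 1, pch = 16)
  invisible(data.frame(x = x, a = y_a, b = y_b))
}

# Possible usage with the cursor-task file used later on this page
# (first 20 times: small target; last 20: large target):
# target <- read.csv("target_size.csv")
# overlay_polygons(target[[2]][1:20], target[[2]][21:40],
#                  breaks = seq(400, 1100, 100),
#                  labels = c("small target", "large target"))
```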

Figure 3. Overlaid frequency polygons.

It is also possible to plot two cumulative frequency distributions in the same graph. This is illustrated in Figure 4 using the same data from the cursor task. The difference in distributions for the two targets is again evident.
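The cumulative version follows the same pattern: bin both groups with shared breaks, pad with an empty class at each end, and accumulate with `cumsum()`. A minimal R sketch; `cumulative_overlay` is a hypothetical helper, not part of the original R code, and it assumes equal-width classes covering both groups. Each curve ends at its group size (20 trials per target in the cursor task).

```r
# Hypothetical helper: overlaid cumulative frequency polygons for two groups.
# Both groups use the same breaks so their cumulative counts line up.
cumulative_overlay <- function(a, b, breaks, labels = c("group 1", "group 2")) {
  width <- diff(breaks)[1]               # assumes equal-width classes
  mids <- breaks[-1] - width / 2         # centre of each class
  x <- c(mids[1] - width, mids, mids[length(mids)] + width)
  # Counts per class, padded with an empty class at each end, then accumulated
  cum_counts <- function(v) {
    cumsum(c(0, hist(v, breaks = breaks, plot = FALSE)$counts, 0))
  }
  cum_a <- cum_counts(a)
  cum_b <- cum_counts(b)
  plot(x, cum_a, type = "o", pch = 16, col = 2, ylim = c(0, max(cum_a, cum_b)),
       xlab = "Value", ylab = "Cumulative Frequency")
  lines(x, cum_b, type = "o", pch = 16, col = 4)
  legend("bottomright", legend = labels, col = c(2, 4), lwd = 1, pch = 16)
  invisible(data.frame(x = x, a = cum_a, b = cum_b))
}
```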

Figure 4. Overlaid cumulative frequency polygons.


Note that the graphs on this page were not created in R; however, the R code shown here produces very similar graphs. Put the data files in R's working directory (check it with getwd()) before running the code.

R code written by David Scott

Data file for Figures 1 and 2: psych_scores.csv
Data file for Figures 3 and 4: target_size.csv

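# Figure 1
tests = read.csv(file = 'psych_scores.csv')
bk = seq(39.5, 169.5, 10) # class limits from Table 1 (39.5-49.5, ..., 159.5-169.5)
tk = seq(35, 175, 10)     # X-axis points: class labels plus one empty class at each end
# Counts per class, padded with a zero at each end so the polygon touches the X-axis
nuk = c(0, hist(tests[[1]], breaks = bk, plot = FALSE)$counts, 0)
main = "Frequency polygon for the psychology test scores"
plot(tk, nuk, type = "l", col = 4, xlab = "Test Score", ylab = "Frequency", lwd = 2, main = main, ylim = c(0, 160))
points(tk, nuk, pch = 16, col = 4, cex = 1.5); abline(h = seq(0, 160, 20), lwd = .5)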

# Figure 2 (uses tk and nuk from Figure 1)
cum.nuk = cumsum(nuk) # running total of the class counts
main = "Cumulative frequency polygon for the psychology test scores"
plot(tk, cum.nuk, type = "l", col = 4, xlab = "Test Score", ylab = "Cumulative Frequency", lwd = 2, main = main, ylim = c(0, 700))
points(tk, cum.nuk, pch = 16, col = 4, cex = 1.5); abline(h = seq(0, 700, 100), lwd = .5)

# Figure 3
target = read.csv(file = 'target_size.csv')
bk = seq(400, 1100, 100)  # class limits for the movement times
tk = seq(350, 1150, 100)  # X-axis points: class midpoints plus one empty class at each end
dat = target[[2]]         # rows 1-20: small target; rows 21-40: large target
# Both groups are binned with the same limits so the polygons line up
nuk1 = c(0, hist(dat[ 1:20], breaks = bk, plot = FALSE)$counts, 0)
nuk2 = c(0, hist(dat[21:40], breaks = bk, plot = FALSE)$counts, 0)
main = "Overlaid frequency polygons"
plot(tk, nuk1, type = "l", col = 2, xlab = "Time (msec)", ylab = "Frequency", lwd = 2, main = main, ylim = c(0, 10))
points(tk, nuk1, pch = 16, col = 2, cex = 2); abline(h = seq(0, 10, 2.5), lwd = .5, lty = 2)
lines(tk, nuk2, col = 4, lwd = 2); points(tk, nuk2, pch = 16, cex = 2, col = 4)
text(1000, 4, "small target", cex = 1.5)
text(720, 8, "large target", cex = 1.5)

# Figure 4 (uses tk, nuk1, and nuk2 from Figure 3)
cum.nuk1 = cumsum(nuk1) # small target
cum.nuk2 = cumsum(nuk2) # large target
main = "Overlaid cumulative frequency polygons"
plot(tk, cum.nuk1, type = "l", col = 2, xlab = "Time (msec)", ylab = "Cumulative Frequency", lwd = 2, main = main, ylim = c(0, 20))
points(tk, cum.nuk1, pch = 16, col = 2, cex = 2); abline(h = seq(0, 20, 5))
lines(tk, cum.nuk2, col = 4, lwd = 2); points(tk, cum.nuk2, pch = 16, col = 4, cex = 2)
text(850, 12, "small target", cex = 1.5)
text(450, 18, "large target", cex = 1.5)
