### A Post By: Gary Ernest Davis

A *five-number summary* of a list of numeric data values gives:

- The median of the data
- The upper and lower quartiles of the data
- The minimum value of the data
- The maximum value of the data

The *median* of a numerical data set is a number *m* such that the number of data points less than *m* is the same as the number of data points greater than *m*. For an odd number of data points, the median is chosen to be the middle data point, while for an even number of data points, the median is usually chosen to be the average of the two data points nearest the middle of the data.
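As a quick illustration of the odd and even cases, here is a minimal sketch in R. The vectors `odd` and `even` and the helper `median_by_hand` are made up for this example; they are not the birth-weight data used later in the post.

```r
# Hypothetical data, chosen only to illustrate the two cases
odd  <- c(7, 1, 5, 3, 9)       # five values
even <- c(7, 1, 5, 3, 9, 11)   # six values

# The rule described above, written out by hand
median_by_hand <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                  # odd count: the middle data point
  } else {
    mean(s[c(n / 2, n / 2 + 1)])    # even count: average of the two middle points
  }
}

median_by_hand(odd)    # 5
median_by_hand(even)   # 6

# R's built-in median() follows the same convention
median(odd)
median(even)
```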

The *lower quartile* splits the data below the median in the same way as the median splits the whole data set, and the *upper quartile* splits the data above the median in the same way.
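The "median of each half" idea can be sketched directly in R. One detail the description leaves open is whether, for an odd number of data points, the median itself belongs to both halves. The hypothetical helper `quartiles_by_halves` below exposes that choice as an assumed `include_median` argument; the name and the argument are illustrative, not part of R.

```r
# Quartiles as medians of the lower and upper halves of the sorted data.
# include_median is an assumption of this sketch: when TRUE and the number
# of points is odd, the middle point is counted in both halves.
quartiles_by_halves <- function(x, include_median = TRUE) {
  s <- sort(x)
  n <- length(s)
  half <- if (include_median) ceiling(n / 2) else floor(n / 2)
  lower_half <- s[seq_len(half)]
  upper_half <- s[(n - half + 1):n]
  c(lower = median(lower_half), upper = median(upper_half))
}

x_odd <- c(1, 3, 5, 7, 9)   # hypothetical data

quartiles_by_halves(x_odd, include_median = TRUE)    # lower 3, upper 7
quartiles_by_halves(x_odd, include_median = FALSE)   # lower 2, upper 8
```

With an even number of data points the two settings give the same answer, since no single data point sits at the median.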

In R, a five-number summary is produced by the *summary* function. The example below uses the data on birth weights of children of non-smoking mothers, as used in this post on histograms:

> summary(data)

gives the output:

| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|------|---------|--------|------|---------|------|
| 55 | 113 | 123 | 123 | 134 | 176 |

Note that this “five-number summary” contains the mean as well as the other information (so it’s really a six-number summary!).

A five-number summary can also be calculated using the following R function:

> fivenum(data)

which gives the output:

[1] 55 113 123 134 176

Note that this is indeed a five-number summary, but unlike the *summary* function, *fivenum* does not include headers for the numbers.
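If labels are wanted, they can be attached by hand. The sketch below uses a hypothetical vector `x` and label names of my own choosing; *fivenum* itself does not supply them.

```r
# Hypothetical data, not the birth-weight data
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

# fivenum() returns a plain numeric vector; setNames() attaches labels to it
labelled <- setNames(
  fivenum(x),
  c("Min.", "Lower hinge", "Median", "Upper hinge", "Max.")
)

labelled
```

The result is still an ordinary numeric vector, so a single value can be pulled out by name, for example `labelled["Median"]`.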

Occasionally, these two commands for calculating five-number summaries will give slightly different results. The reason is that the two commands calculate the 1st and 3rd quartiles in different ways (there is no entirely agreed-upon definition of quartiles among data analysts). For a fuller discussion of this issue see:

Exploratory Data Analysis: The 5-Number Summary – Two Different Methods in R, by Eric Cai.
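A small example makes the discrepancy concrete. The sketch below uses a hypothetical four-value vector `x` (not the birth-weight data). *fivenum* returns the medians of the lower and upper halves, while *summary* takes its quartiles from R's *quantile* function, whose default method interpolates between data points.

```r
# Hypothetical data with an even number of points
x <- c(1, 2, 3, 4)

five <- fivenum(x)                    # 1.0 1.5 2.5 3.5 4.0
quar <- quantile(x, c(0.25, 0.75))    # 1.75 and 3.25 with the default method

five
summary(x)   # reports 1.75 and 3.25 as the 1st and 3rd quartiles
quar
```

The minimum, median, and maximum agree; only the quartiles differ (1.5 versus 1.75, and 3.5 versus 3.25).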

The five-number summary was first used by John Wilder Tukey:

Tukey, J. W. (1977). Exploratory Data Analysis. Reading, MA: Addison-Wesley.

so it is often called Tukey’s five-number summary.

Prior to Tukey, Arthur Lyon Bowley used a seven-number summary, consisting of the first and ninth deciles in addition to the median, the upper and lower quartiles, and the minimum and maximum:

Bowley, A. L. (1915). An Elementary Manual of Statistics. P. S. King & Son, Limited.
