KNOWLEDGEBASE - ARTICLE #1660

Prism's algorithm for determining the automatic bin width when creating a frequency distribution.

When creating a frequency distribution histogram, you need to decide on a bin width. The number of bins in the histogram is determined by the range of the data and the bin width. Set the bin width too small, and you get lots of bins and the frequency distribution looks like a broken comb, which doesn't really show you how the data are distributed. Set the bin width too large, and you get too few bins, and the distribution has too little information to be useful.
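
As a rough illustration, the bin count is the data range divided by the bin width, rounded up. This sketch assumes the bins are contiguous, start at the smallest value, and that the last bin also holds the largest value. The function name `bins_needed` is hypothetical, and this is not Prism's own code.

```python
import math

def bins_needed(values, bin_width):
    """Number of equal-width bins needed to cover the data range.

    Assumes bins start at min(values) and the last bin includes max(values).
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    data_range = max(values) - min(values)
    # Always at least one bin, even if every value is identical.
    return max(1, math.ceil(data_range / bin_width))

data = [2, 3.5, 4, 4.5, 5, 6, 7.25, 9]   # range = 7
print(bins_needed(data, 0.3))  # 24 narrow bins: the "broken comb"
print(bins_needed(data, 5.0))  # 2 wide bins: too little detail
```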

One way to avoid this problem is to plot a cumulative frequency distribution. A cumulative distribution can be plotted (in Prism and other programs) without any bins. The cumulative frequency jumps upward at each value in the data set.
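
A minimal Python sketch of why no bins are needed: sort the values and keep a running count, which steps up at each value (by more than one when values are tied). The function name `cumulative_frequency` is hypothetical.

```python
from collections import Counter

def cumulative_frequency(values):
    """Return (value, cumulative count) pairs, one per distinct value.

    The running count steps up at each value; tied values step up together.
    """
    counts = Counter(values)
    running = 0
    steps = []
    for value in sorted(counts):
        running += counts[value]
        steps.append((value, running))
    return steps

print(cumulative_frequency([3, 1, 4, 1, 5]))
# [(1, 2), (3, 3), (4, 4), (5, 5)]
```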

But for ordinary frequency distributions, a decision must be made about the number of bins. The only general rule is that the ideal number of bins depends on the size of the data set. If the frequency distribution tabulates the frequency of a huge number of values, it makes sense to use a small bin width and so create lots of bins. If the frequency distribution is for a small data set, a larger bin width makes sense. Many algorithms have been devised to define the ideal bin width.

Prism uses this algorithm to determine the default bin width (which you can easily override).

For each data set on the data table, compute:

DataSetBinCount = 1 + Log2(DataSetPointCount) [Log2 is the logarithm to base 2]
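
The bin-count formula translates directly into Python. The article does not say whether DataSetBinCount is rounded to a whole number, so this sketch (with the hypothetical name `data_set_bin_count`) leaves it unrounded and uses it as-is.

```python
import math

def data_set_bin_count(point_count):
    """Bin count from the formula above: 1 + log2(n), left unrounded."""
    if point_count < 1:
        raise ValueError("a data set needs at least one point")
    return 1 + math.log2(point_count)

print(data_set_bin_count(8))    # 4.0
print(data_set_bin_count(100))  # about 7.64
```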

DataSetBinWidth = (DataSetMaxValue - DataSetMinValue) / DataSetBinCount
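
Combining the two formulas, a per-data-set bin width can be sketched as follows. The function name `data_set_bin_width` is hypothetical. In the example, eight points spanning 0 to 20 give a bin count of 1 + log2(8) = 4, so the width is 20 / 4 = 5.

```python
import math

def data_set_bin_width(values):
    """(max - min) / (1 + log2(n)) for a single data set."""
    bin_count = 1 + math.log2(len(values))
    return (max(values) - min(values)) / bin_count

print(data_set_bin_width([0, 3, 5, 8, 11, 14, 17, 20]))  # 5.0
```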

Average the computed bin widths across all data sets:

BinWidth = Mean of all DataSetBinWidth
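
Averaging is a plain arithmetic mean, so each data set counts equally regardless of its size. A sketch with hypothetical function names:

```python
import math
from statistics import mean

def data_set_bin_width(values):
    """(max - min) / (1 + log2(n)) for a single data set."""
    return (max(values) - min(values)) / (1 + math.log2(len(values)))

def average_bin_width(data_sets):
    """Arithmetic mean of the per-data-set bin widths."""
    return mean(data_set_bin_width(ds) for ds in data_sets)

data_sets = [
    [0, 3, 5, 8, 11, 14, 17, 20],  # 8 points, range 20 -> width 5.0
    [1, 2, 3, 5],                  # 4 points, range 4  -> width 4/3
]
print(average_bin_width(data_sets))  # about 3.17
```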

Round that value (BinWidth) down to the closest 'nice-looking' value, i.e., a value whose last digit (in scientific notation) is 0, 2, or 5.
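
The description of 'nice-looking' values is brief, so the following is only one possible reading, stated here as an assumption: round down onto the common 1-2-5 series (..., 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, ...). Prism's exact rule may differ, and the name `round_down_nice` is hypothetical.

```python
import math

def round_down_nice(width):
    """Round width down to the largest m * 10**k with m in (1, 2, 5).

    ASSUMPTION: "nice-looking" is read as the 1-2-5 series; this is an
    illustration, not Prism's documented rule.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    exponent = math.floor(math.log10(width))
    mantissa = width / 10 ** exponent  # in [1, 10), up to rounding error
    for nice in (5, 2, 1):
        if mantissa >= nice:
            return nice * 10 ** exponent
    # Only reached if floating-point error pushes the mantissa just below 1.
    return 10 ** exponent

print(round_down_nice(3.7))  # 2
print(round_down_nice(740))  # 500
```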
If all the values in the original data set are integers, then round that value (BinWidth) down to the nearest integer (unless that step would make a bin width of zero; in that case, set the bin width to 1).
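
Putting the steps together, here is a self-contained sketch of the whole procedure. It is not Prism's source code: the function names are hypothetical, the 'nice-looking' step assumes the 1-2-5 series, and because the article gives no rule for non-integer data whose values are all identical (a zero average width), the sketch simply raises an error in that case.

```python
import math
from statistics import mean

def round_down_nice(width):
    """ASSUMPTION: round down onto the 1-2-5 series (1, 2, 5, 10, 20, ...)."""
    exponent = math.floor(math.log10(width))
    mantissa = width / 10 ** exponent
    for nice in (5, 2, 1):
        if mantissa >= nice:
            return nice * 10 ** exponent
    return 10 ** exponent  # guards against floating-point error only

def auto_bin_width(data_sets):
    """Sketch of the default bin width procedure described above."""
    # Per-data-set width: range / (1 + log2(n)), then the mean across sets.
    widths = [(max(ds) - min(ds)) / (1 + math.log2(len(ds))) for ds in data_sets]
    width = mean(widths)
    all_integers = all(float(v).is_integer() for ds in data_sets for v in ds)
    if width > 0:
        width = round_down_nice(width)
    elif not all_integers:
        raise ValueError("zero data range: no rule for this case in the sketch")
    # Integer-valued data: round down to an integer, but never below 1.
    if all_integers:
        width = max(1, math.floor(width))
    return width

print(auto_bin_width([[0, 3, 5, 8, 11, 14, 17, 20], [1, 2, 3, 5]]))  # 2
print(auto_bin_width([[1, 2, 2, 3]]))  # 1 (0.5 would round down to 0)
```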

The first step is based on Sturges (1). We extended that method as shown above.

Reference:

1. Sturges, H. A. (1926). "The choice of a class interval". Journal of the American Statistical Association, 21(153): 65–66.
