
By default, the **hist()** function in R uses Sturges’ Rule to determine how many bins to use in a histogram.

Sturges’ Rule uses the following formula to determine the optimal number of bins to use in a histogram:

**Optimal Bins = ⌈log₂(n) + 1⌉**

where:

**n:** The total number of observations in the dataset.

**⌈ ⌉:** Symbols that mean “ceiling” – i.e. round the answer up to the nearest integer.

For example, if there are 31 observations in a dataset, Sturges’ Rule gives the following calculation for the optimal number of bins to use in a histogram:

**Optimal Bins** = ⌈log₂(31) + 1⌉ = ⌈4.954 + 1⌉ = ⌈5.954⌉ = **6**.

According to Sturges’ Rule, we should use 6 bins in the histogram to visualize this dataset.
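The calculation above can be reproduced directly in R. The following is a minimal sketch: the helper name `sturges_bins` is hypothetical (it is not part of base R), and the vector `x` is placeholder data whose only relevant property is its length. The built-in `nclass.Sturges()` function applies the same formula to a data vector.

```r
# Sturges' Rule: ceiling(log2(n) + 1), where n is the number of observations
# (sturges_bins is a hypothetical helper name used only for illustration)
sturges_bins <- function(n) ceiling(log2(n) + 1)

sturges_bins(31)  # 6, matching the worked example above

# Base R's nclass.Sturges() takes the data vector itself rather than n
x <- seq_len(31)  # placeholder data; only its length matters here
nclass.Sturges(x)
```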

If you use the **hist()** function in R, Sturges’ Rule will be used to automatically choose the number of bins to display in the histogram.

    hist(data)

Even if you pass a single number to the **breaks** argument, R treats it only as a “suggestion”: the breakpoints are rounded to “pretty” values, so the final number of bins may differ from the number you asked for.

    hist(data, breaks = 7)
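One way to see what R actually did with the suggestion is to inspect the histogram object instead of the plot. The sketch below uses a hypothetical vector `x` (not the dataset from this tutorial) and `plot = FALSE`, which returns the computed breakpoints and counts without drawing anything. The final call to `pretty()` is included only to illustrate the kind of rounded breakpoints involved.

```r
# Hypothetical data for illustration (not the dataset from this tutorial)
x <- c(2, 4, 4, 7, 9, 12, 13, 15, 18, 21, 22, 25, 27, 29, 31, 33)

# plot = FALSE returns the histogram object without drawing it
h <- hist(x, breaks = 7, plot = FALSE)

h$breaks              # breakpoints R actually chose
length(h$breaks) - 1  # number of bins actually used

# pretty() generates "round" breakpoints covering a range
pretty(range(x), n = 7)
```

Comparing `length(h$breaks) - 1` with the requested number shows whether R followed the suggestion for a given dataset.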

However, you can use the following code to force R to use a specific number of bins in a histogram:

    #create histogram with 7 bins
    hist(data, breaks = seq(min(data), max(data), length.out = 8))

**Note**: Set **length.out** to one more than the desired number of bins, because k bins require k + 1 breakpoints (here, 7 bins require 8 breakpoints).
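The same idea can be wrapped in a small function so the "plus one" is handled in one place. This is only a sketch: `exact_breaks` is a hypothetical helper name, the vector `x` is made-up data, and the code assumes `x` contains no missing values.

```r
# Hypothetical helper: k equal-width bins need k + 1 breakpoints
exact_breaks <- function(x, k) {
  seq(min(x), max(x), length.out = k + 1)
}

# Hypothetical data for illustration (assumed to contain no NA values)
x <- c(2, 4, 4, 7, 9, 12, 13, 15, 18, 21, 22, 25, 27, 29, 31, 33)

# plot = FALSE returns the histogram object without drawing it
h <- hist(x, breaks = exact_breaks(x, 7), plot = FALSE)

length(h$counts)  # number of bins
diff(h$breaks)    # bin widths, equal up to floating-point rounding
```

Because the breakpoints are supplied as a vector rather than a single number, R uses them as given instead of treating them as a suggestion.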

The following example shows how to use this code in practice.

**Example: Specify Histogram Breaks in R**

Suppose we have the following dataset in R with 16 values:

#create vector of 16 values data

If we use the **hist()** function, R will create the following histogram with 5 bins:

    #create histogram
    hist(data)

**Note**: R used Sturges’ Rule to determine that 5 bins was the optimal number of bins to use to visualize a dataset with 16 observations.

If we attempt to use the **breaks** argument to specify 7 bins to use in the histogram, R will only take this as a “suggestion” and instead choose to use 10 bins:

    #attempt to create histogram with 7 bins
    hist(data, breaks = 7)

However, we can use the following code to force R to use 7 bins in the histogram:

    #create histogram with 7 bins
    hist(data, breaks = seq(min(data), max(data), length.out = 8))

Notice that the result is a histogram with exactly 7 bins of equal width.

**Additional Resources**

The following tutorials explain how to perform other common operations in R:

How to Create a Relative Frequency Histogram in R

How to Plot Multiple Histograms in R