
# How to Perform a Breusch-Pagan Test in Python

In regression analysis, heteroscedasticity refers to the unequal scatter of residuals. Specifically, it refers to the case where there is a systematic change in the spread of the residuals over the range of measured values.
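To make the definition concrete, the following minimal sketch simulates data whose noise grows with the predictor and then compares the spread of the residuals in the lower and upper halves of the predictor's range. The data are simulated purely for illustration: the seed, sample size, and coefficients are arbitrary assumptions and are unrelated to the dataset used later in this tutorial.

```python
import numpy as np

# Simulated data for illustration only (all values here are arbitrary assumptions)
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 500)
noise = rng.normal(0, 0.5 * x)  # noise standard deviation grows with x
y = 2 + 3 * x + noise

# Fit a straight line by ordinary least squares and compute the residuals
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Compare the spread of the residuals for small and large values of x
low_spread = resid[x < 5.5].std()
high_spread = resid[x >= 5.5].std()
print(f"residual std (x < 5.5):  {low_spread:.2f}")
print(f"residual std (x >= 5.5): {high_spread:.2f}")
```

With data generated this way, the residuals for large values of x should be noticeably more spread out than those for small values, which is the pattern that heteroscedasticity describes.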

Heteroscedasticity is a problem because ordinary least squares (OLS) regression assumes that the residuals come from a population that has homoscedasticity, which means constant variance.

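When heteroscedasticity is present in a regression analysis, the OLS coefficient estimates remain unbiased, but their standard errors are no longer reliable. As a result, the p-values and confidence intervals reported for the model become hard to trust.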

One way to determine if heteroscedasticity is present in a regression analysis is to use a Breusch-Pagan Test.

This tutorial explains how to perform a Breusch-Pagan Test in Python.

## Example: Breusch-Pagan Test in Python

For this example we'll use the following dataset that describes the attributes of 10 basketball players:

import numpy as np
import pandas as pd

#create dataset
df = pd.DataFrame({'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
'rebounds': [11, 8, 10, 6, 6, 9, 6, 10, 10, 7]})

#view dataset
df

rating	points	assists	rebounds
0	90	25	5	11
1	85	20	7	8
2	82	14	7	10
3	88	16	8	6
4	94	27	5	6
5	90	20	7	9
6	76	12	6	6
7	75	15	9	10
8	87	14	9	10
9	86	19	5	7

We will fit a multiple linear regression model using rating as the response variable and points, assists, and rebounds as the explanatory variables. Then we will perform a Breusch-Pagan Test to determine if heteroscedasticity is present in the regression.

Step 1: Fit a multiple linear regression model.

First, we'll fit a multiple linear regression model:

import statsmodels.formula.api as smf

#fit regression model
fit = smf.ols('rating ~ points+assists+rebounds', data=df).fit()

#view model summary
print(fit.summary())

Step 2: Perform a Breusch-Pagan test.

Next, we'll perform a Breusch-Pagan test to determine if heteroscedasticity is present.

from statsmodels.compat import lzip
import statsmodels.stats.api as sms

#perform Breusch-Pagan test
names = ['Lagrange multiplier statistic', 'p-value',
'f-value', 'f p-value']
test = sms.het_breuschpagan(fit.resid, fit.model.exog)

lzip(names, test)

[('Lagrange multiplier statistic', 6.003951995818433),
('p-value', 0.11141811013399583),
('f-value', 3.004944880309618),
('f p-value', 0.11663863538255281)]

A Breusch-Pagan test uses the following null and alternative hypotheses:

The null hypothesis (H0): Homoscedasticity is present.

The alternative hypothesis (Ha): Homoscedasticity is not present (i.e. heteroscedasticity exists).

In this example, the Lagrange multiplier statistic for the test is 6.004 and the corresponding p-value is 0.1114. Because this p-value is not less than 0.05, we fail to reject the null hypothesis. We do not have sufficient evidence to say that heteroscedasticity is present in the regression model.
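To see where the Lagrange multiplier statistic comes from, here is a minimal sketch that computes it by hand. It assumes the studentized (Koenker) form of the test, which to the best of our knowledge is what `het_breuschpagan` computes by default: the squared residuals are regressed on the same explanatory variables, and the statistic is the number of observations times the R-squared of that auxiliary regression. The sketch uses only NumPy on the dataset from this example, and the variable names are illustrative.

```python
import numpy as np

# Same data as the basketball example above
rating = np.array([90, 85, 82, 88, 94, 90, 76, 75, 87, 86], dtype=float)
points = np.array([25, 20, 14, 16, 27, 20, 12, 15, 14, 19], dtype=float)
assists = np.array([5, 7, 7, 8, 5, 7, 6, 9, 9, 5], dtype=float)
rebounds = np.array([11, 8, 10, 6, 6, 9, 6, 10, 10, 7], dtype=float)

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(rating), points, assists, rebounds])

# Step 1: residuals of the original regression
beta, *_ = np.linalg.lstsq(X, rating, rcond=None)
resid = rating - X @ beta

# Step 2: auxiliary regression of the squared residuals on the same predictors
sq_resid = resid ** 2
gamma, *_ = np.linalg.lstsq(X, sq_resid, rcond=None)
fitted = X @ gamma

# Step 3: R-squared of the auxiliary regression, then LM = n * R-squared
ss_res = np.sum((sq_resid - fitted) ** 2)
ss_tot = np.sum((sq_resid - sq_resid.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
n = len(rating)
lm_stat = n * r_squared

print(f"auxiliary R-squared: {r_squared:.4f}")
print(f"LM statistic:        {lm_stat:.4f}")
```

Under the null hypothesis, this statistic is compared with a chi-square distribution whose degrees of freedom equal the number of explanatory variables (3 here). If the assumption about the default form of the test holds, the printed LM statistic should agree with the value reported by `het_breuschpagan`.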

## How to Fix Heteroscedasticity

In the previous example, the Breusch-Pagan test found no significant evidence of heteroscedasticity in the regression model.

However, when heteroscedasticity actually is present, there are three common ways to remedy the situation:

1. Transform the dependent variable. One way to fix heteroscedasticity is to transform the dependent variable in some way. One common transformation is to simply take the log of the dependent variable.

2. Redefine the dependent variable. Another way to fix heteroscedasticity is to redefine the dependent variable. One common way to do so is to use a rate for the dependent variable, rather than the raw value.

3. Use weighted regression. Another way to fix heteroscedasticity is to use weighted regression. This type of regression assigns a weight to each data point based on the variance of its fitted value, so that data points with higher variance receive smaller weights. When the proper weights are used, this can eliminate the problem of heteroscedasticity.

Read more details about each of these three methods in this post.