*63*

One of the key assumptions in linear regression is that there is no correlation between the residuals, e.g. the residuals are independent.

To test for first-order autocorrelation, we can perform a Durbin-Watson test. However, if weâ€™d like to test for autocorrelation at higher orders then we need to perform a **Breusch-Godfrey test**.

This test uses the following hypotheses:

**H _{0} (null hypothesis): **There is no autocorrelation at any order less than or equal to

*p*.

**H _{A} (alternative hypothesis): **There exists autocorrelation at some order less than or equal to

*p*.

The test statistic follows a Chi-Square distribution withÂ *p* degrees of freedom.

If the p-value that corresponds to this test statistic is less than a certain significance level (e.g. 0.05) then we can reject the null hypothesis and conclude that autocorrelation exists among the residuals at some order less than or equal to *p*.

To perform a Breusch-Godfrey test in Python, we can use the **acorr_breusch_godfrey()** function from the **statsmodels **library.

The following step-by-step example explains how to perform the Breusch-Godfrey test in Python.

**Step 1: Create the Data**

First, letâ€™s create a dataset that contains two predictor variables (x1 and x2) and one response variable (y).

import pandas as pd #create dataset df = pd.DataFrame({'x1': [3, 4, 4, 5, 8, 9, 11, 13, 14, 16, 17, 20], 'x2': [7, 7, 8, 8, 12, 4, 5, 15, 9, 17, 19, 19], 'y': [24, 25, 25, 27, 29, 31, 34, 34, 39, 30, 40, 49]}) #view first five rows of dataset df.head() x1 x2 y 0 3 7 24 1 4 7 25 2 4 8 25 3 5 8 27 4 8 12 29

**Step 2: Fit a Regression Model**

Next, we can fit a multiple linear regression model using x1 and x2 as predictor variables and y as the response variable.

**import statsmodels.api as sm
#define response variable
y = df['y']
#define predictor variables
x = df[['x1', 'x2']]
#add constant to predictor variables
x = sm.add_constant(x)
#fit linear regression model
model = sm.OLS(y, x).fit()
**

**Step 3: Perform the Breusch-Godfrey Test**

Next, weâ€™ll perform the Breusch-Godfrey test to test for autocorrelation among the residuals at order *p*. For this example, weâ€™ll choose *p* = 3.

**import statsmodels.stats.diagnostic as dg
#perform Breusch-Godfrey test at order ***p* = 3
print(dg.acorr_breusch_godfrey(model, nlags=3))
(8.70314827, 0.0335094873, 5.27967224, 0.0403980576)

The first value in the output represents the test statistic and the second value represents the corresponding p-value.

From the output we can see the following:

- Test statistic X
^{2}=**8.7031** - P-value =
**0.0335**

Since this p-value is less than 0.05, we can reject the null hypothesis and conclude that autocorrelation exists among the residuals at some order less than or equal to 3.

**How to Handle Autocorrelation**

If you reject the null hypothesis and conclude that autocorrelation is present in the residuals, then you have a few different options to correct this problem if you deem it to be serious enough:

- For positive serial correlation, consider adding lags of the dependent and/or independent variable to the model.
- For negative serial correlation, check to make sure that none of your variables areÂ
*overdifferenced*. - For seasonal correlation, consider adding seasonal dummy variables to the model.

**Additional Resources**

A Complete Guide to Linear Regression in Python

How to Perform a Durbin-Watson Test in Python

How to Perform a Ljung-Box Test in Python