
A **likelihood ratio test** compares the goodness of fit of two nested regression models.

A nested model is simply one that contains a subset of the predictor variables in the overall regression model.

For example, suppose we have the following regression model with four predictor variables:

Y = β_{0} + β_{1}x_{1} + β_{2}x_{2} + β_{3}x_{3} + β_{4}x_{4} + ε

One example of a nested model would be the following model with only two of the original predictor variables:

Y = β_{0} + β_{1}x_{1} + β_{2}x_{2} + ε

To determine whether these two models fit the data significantly differently, we can perform a likelihood ratio test, which uses the following null and alternative hypotheses:

**H_{0}:** The full model and the nested model fit the data equally well. Thus, you should **use the nested model**.

**H_{A}:** The full model fits the data significantly better than the nested model. Thus, you should **use the full model**.

If the p-value of the test is below a certain significance level (e.g. 0.05), then we can reject the null hypothesis and conclude that the full model offers a significantly better fit.
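The decision rule above can be sketched as a small helper. The test statistic is -2 times the difference between the log-likelihood of the nested model and that of the full model, and it is compared against a Chi-Squared distribution whose degrees of freedom equal the number of predictors dropped from the full model. The function name `likelihood_ratio_test` and the log-likelihood values below are hypothetical and chosen only for illustration.

```python
from scipy.stats import chi2


def likelihood_ratio_test(ll_reduced, ll_full, df_diff, alpha=0.05):
    """Return (statistic, p_value, reject_null) for two nested models."""
    # Test statistic: -2 * (log-likelihood of reduced - log-likelihood of full)
    statistic = -2 * (ll_reduced - ll_full)
    # Upper-tail probability of a Chi-Squared distribution with df_diff degrees of freedom
    p_value = chi2.sf(statistic, df_diff)
    # Reject the null hypothesis when the p-value falls below the significance level
    return statistic, p_value, bool(p_value < alpha)


# Hypothetical log-likelihoods, not taken from any real dataset
stat, p, reject = likelihood_ratio_test(ll_reduced=-105.0, ll_full=-100.0, df_diff=2)
print(stat, p, reject)
```

In this made-up case the statistic is 10.0 and the p-value is well below 0.05, so we would reject the null hypothesis and prefer the full model.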

The following step-by-step example shows how to perform a likelihood ratio test in Python.

**Step 1: Load the Data**

In this example, we'll show how to fit the following two regression models in Python using data from the **mtcars** dataset:

**Full model:** mpg = β_{0} + β_{1}disp + β_{2}carb + β_{3}hp + β_{4}cyl

**Reduced model:** mpg = β_{0} + β_{1}disp + β_{2}carb

First, we'll load the dataset:

    import statsmodels.api as sm
    import pandas as pd
    import scipy.stats

    # define URL where dataset is located
    url = "https://raw.githubusercontent.com/Statology/Python-Guides/main/mtcars.csv"

    # read in data
    data = pd.read_csv(url)


**Step 2: Fit the Regression Models**

First, we'll fit the full model and calculate the log-likelihood of the model:

    # define response variable
    y1 = data['mpg']

    # define predictor variables
    x1 = data[['disp', 'carb', 'hp', 'cyl']]

    # add constant to predictor variables
    x1 = sm.add_constant(x1)

    # fit regression model
    full_model = sm.OLS(y1, x1).fit()

    # calculate log-likelihood of model
    full_ll = full_model.llf
    print(full_ll)
    -77.55789711787898

Then, we'll fit the reduced model and calculate the log-likelihood of the model:

    # define response variable
    y2 = data['mpg']

    # define predictor variables
    x2 = data[['disp', 'carb']]

    # add constant to predictor variables
    x2 = sm.add_constant(x2)

    # fit regression model
    reduced_model = sm.OLS(y2, x2).fit()

    # calculate log-likelihood of model
    reduced_ll = reduced_model.llf
    print(reduced_ll)
    -78.60301334355185
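To see where these log-likelihood values come from, here is a minimal sketch that computes the Gaussian log-likelihood of a least-squares fit directly with NumPy, with the error variance set to its maximum likelihood estimate (the sum of squared residuals divided by n). This should correspond to what `llf` reports for an OLS fit. The helper name `ols_log_likelihood` and the synthetic data are assumptions for illustration only and are not the mtcars data.

```python
import numpy as np


def ols_log_likelihood(X, y):
    """Gaussian log-likelihood of an OLS fit, evaluated at the ML estimates."""
    n = len(y)
    # Least-squares coefficients
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ssr = residuals @ residuals
    # Log-likelihood with the error variance set to its ML estimate, ssr / n
    return -n / 2 * (np.log(2 * np.pi) + np.log(ssr / n) + 1)


# Synthetic data (an assumption for illustration, not the mtcars data)
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=(n, 4))
y = 1.0 + 2.0 * x[:, 0] - 1.0 * x[:, 1] + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x])            # intercept + 4 predictors
X_reduced = np.column_stack([np.ones(n), x[:, :2]])  # intercept + first 2 predictors

ll_full = ols_log_likelihood(X_full, y)
ll_reduced = ols_log_likelihood(X_reduced, y)
print(ll_full, ll_reduced)
```

Because the reduced model is nested in the full model, its residual sum of squares can never be smaller, so its log-likelihood can never be higher. The likelihood ratio test asks whether the gap is larger than chance alone would explain.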

**Step 3: Perform the Likelihood Ratio Test**

Next, we'll use the following code to perform the likelihood ratio test:

    # calculate likelihood ratio Chi-Squared test statistic
    LR_statistic = -2*(reduced_ll-full_ll)
    print(LR_statistic)
    2.0902324513457415

    # calculate p-value of test statistic using 2 degrees of freedom
    p_val = scipy.stats.chi2.sf(LR_statistic, 2)
    print(p_val)
    0.35165094613502257

From the output we can see that the Chi-Squared test statistic is **2.0902** and the corresponding p-value is **0.3517**.
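As a quick sanity check, these two numbers can be reproduced with only the standard library. With exactly 2 degrees of freedom, the upper-tail probability of the Chi-Squared distribution has the closed form exp(-x / 2), so no statistics package is needed for this particular case. The sketch below reuses the log-likelihood values printed in Step 2.

```python
import math

# Log-likelihoods reported in Step 2
full_ll = -77.55789711787898
reduced_ll = -78.60301334355185

# Likelihood ratio test statistic
LR_statistic = -2 * (reduced_ll - full_ll)

# Closed form of the Chi-Squared upper-tail probability, valid only for 2 degrees of freedom
p_val = math.exp(-LR_statistic / 2)

print(round(LR_statistic, 4), round(p_val, 4))
# 2.0902 0.3517
```

This shortcut only applies when the two models differ by exactly two predictors. For any other difference, use `scipy.stats.chi2.sf` as shown earlier.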

Since this p-value is not less than 0.05, we fail to reject the null hypothesis.

This means we do not have sufficient evidence that the full model fits the data better than the nested model. Thus, we should use the nested model because the additional predictor variables in the full model don't offer a significant improvement in fit.

Thus, our final model would be:

mpg = β_{0} + β_{1}disp + β_{2}carb

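**Note**: We used 2 degrees of freedom when calculating the p-value because the full model contains two more predictor variables (hp and cyl) than the reduced model. In general, the degrees of freedom for the test equal the difference in the number of predictor variables between the two models.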
