
# How to Perform a Likelihood Ratio Test in Python

A likelihood ratio test compares the goodness of fit of two nested regression models.

A nested model is simply one that contains a subset of the predictor variables in the overall regression model.

For example, suppose we have the following regression model with four predictor variables:

Y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε

One example of a nested model would be the following model with only two of the original predictor variables:

Y = β0 + β1x1 + β2x2 + ε

To determine if these two models are significantly different, we can perform a likelihood ratio test which uses the following null and alternative hypotheses:

H0: The full model and the nested model fit the data equally well. Thus, you should use the nested model.

HA: The full model fits the data significantly better than the nested model. Thus, you should use the full model.

If the p-value of the test is below a certain significance level (e.g. 0.05), then we can reject the null hypothesis and conclude that the full model offers a significantly better fit.
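The test statistic itself is LR = −2 × (log-likelihood of the reduced model − log-likelihood of the full model), which follows a Chi-Square distribution with degrees of freedom equal to the difference in the number of predictor variables. As a minimal sketch with made-up log-likelihood values (not from any real model fit):

```
from scipy.stats import chi2

#hypothetical log-likelihoods, for illustration only
ll_full = -77.56
ll_reduced = -78.60

#test statistic: -2 times the difference in log-likelihoods
lr_stat = -2 * (ll_reduced - ll_full)

#p-value from a Chi-Square distribution with 2 degrees of freedom
#(the two models differ by 2 predictor variables)
p_val = chi2.sf(lr_stat, df=2)
```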

The following step-by-step example shows how to perform a likelihood ratio test in Python.

### Step 1: Load the Data

In this example, we'll show how to fit the following two regression models in Python using data from the mtcars dataset:

Full model: mpg = β0 + β1disp + β2carb + β3hp + β4cyl

Reduced model: mpg = β0 + β1disp + β2carb

First, we'll load the dataset:

```
import statsmodels.api as sm
import pandas as pd
import scipy.stats

#define URL where dataset is located
url = "https://raw.githubusercontent.com/Statology/Python-Guides/main/mtcars.csv"

#read in the data
data = pd.read_csv(url)
```


### Step 2: Fit the Regression Models

First, we'll fit the full model and calculate the log-likelihood of the model:

```
#define response variable
y1 = data['mpg']

#define predictor variables
x1 = data[['disp', 'carb', 'hp', 'cyl']]

#add constant to predictor variables
x1 = sm.add_constant(x1)

#fit regression model
full_model = sm.OLS(y1, x1).fit()

#calculate log-likelihood of model
full_ll = full_model.llf

print(full_ll)

-77.55789711787898
```

Then, we'll fit the reduced model and calculate the log-likelihood of the model:

```
#define response variable
y2 = data['mpg']

#define predictor variables
x2 = data[['disp', 'carb']]

#add constant to predictor variables
x2 = sm.add_constant(x2)

#fit regression model
reduced_model = sm.OLS(y2, x2).fit()

#calculate log-likelihood of model
reduced_ll = reduced_model.llf

print(reduced_ll)

-78.60301334355185
```

### Step 3: Perform the Likelihood Ratio Test

Next, we'll use the following code to perform the likelihood ratio test:

```#calculate likelihood ratio Chi-Squared test statistic
LR_statistic = -2*(reduced_ll-full_ll)

print(LR_statistic)

2.0902324513457415

#calculate p-value of test statistic using 2 degrees of freedom
p_val = scipy.stats.chi2.sf(LR_statistic, 2)

print(p_val)

0.35165094613502257
```

From the output we can see that the Chi-Squared test statistic is 2.0902 and the corresponding p-value is 0.3517.

Since this p-value is not less than .05, we will fail to reject the null hypothesis.

This means the full model and the nested model fit the data equally well. Thus, we should use the nested model because the additional predictor variables in the full model don't offer a significant improvement in fit.

Thus, our final model would be:

mpg = β0 + β1disp + β2carb

Note: We used 2 degrees of freedom when calculating the p-value because this is the difference in the number of predictor variables between the two models (4 - 2 = 2).
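The calculation from Step 3 can be wrapped in a small helper function for reuse. This is a sketch (not part of the original tutorial) that takes the two log-likelihoods and the degrees of freedom directly:

```
from scipy.stats import chi2

def likelihood_ratio_test(ll_full, ll_reduced, df):
    """Return the LR test statistic and p-value for two nested models,
    given their log-likelihoods and the difference in predictor counts."""
    lr_stat = -2 * (ll_reduced - ll_full)
    p_val = chi2.sf(lr_stat, df)
    return lr_stat, p_val

#example using the log-likelihoods computed in Step 2
stat, p = likelihood_ratio_test(-77.55789711787898, -78.60301334355185, 2)
```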