A residual is the difference between an observed value and a predicted value in a regression model.

It is calculated as:

Residual = Observed value – Predicted value
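As a minimal sketch of this definition, the snippet below computes the residuals for three made-up observations. The numbers are purely illustrative and are not taken from the dataset used later in this tutorial.

```python
# Hypothetical observed and predicted values (illustrative only)
observed = [76, 78, 85]
predicted = [74, 80, 85]

# Residual = observed value - predicted value, computed point by point
residuals = [obs - pred for obs, pred in zip(observed, predicted)]

print(residuals)  # [2, -2, 0]
```

A positive residual means the model predicted too low for that observation, a negative residual means it predicted too high, and a residual of zero means the prediction was exact.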

One way to understand how well a regression model fits a dataset is to calculate the **residual sum of squares**, which is calculated as:

Residual sum of squares = Σ(e_{i})^{2}

where:

**Σ**: A Greek symbol that means “sum”

**e_{i}**: The i^{th} residual

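The lower the residual sum of squares, the more closely the model’s predictions match the observed values. Because the value depends on the scale of the response variable and on the number of observations, it is most useful for comparing models fit to the same dataset.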
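To make the formula concrete, here is a small sketch that computes the residual sum of squares for two sets of predictions on the same observed values. The data, the two "models," and the helper name `residual_sum_of_squares` are all hypothetical and chosen only for illustration.

```python
# Hypothetical observed values and predictions from two candidate models
observed = [76, 78, 85, 88]
predicted_a = [75, 80, 84, 88]  # model A
predicted_b = [70, 82, 90, 85]  # model B

def residual_sum_of_squares(observed, predicted):
    """Sum of squared residuals: sum of (observed_i - predicted_i)^2."""
    return sum((obs - pred) ** 2 for obs, pred in zip(observed, predicted))

rss_a = residual_sum_of_squares(observed, predicted_a)
rss_b = residual_sum_of_squares(observed, predicted_b)

print(rss_a)  # 6
print(rss_b)  # 86
```

In this toy comparison, model A has the smaller residual sum of squares, so its predictions sit closer to the observed values than those of model B.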

This tutorial provides a step-by-step example of how to calculate the residual sum of squares for a regression model in Python.

**Step 1: Enter the Data**

For this example we’ll enter data for the number of hours spent studying, total prep exams taken, and exam score received by 14 different students:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'hours': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4, 3, 6, 5],
                   'exams': [1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4, 3, 2, 4],
                   'score': [76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90, 75, 96, 90]})

**Step 2: Fit the Regression Model**

Next, we’ll use the OLS() function from the statsmodels library to perform ordinary least squares regression, using “hours” and “exams” as the predictor variables and “score” as the response variable:

**import statsmodels.api as sm
#define response variable
y = df['score']
#define predictor variables
x = df[['hours', 'exams']]
#add constant to predictor variables
x = sm.add_constant(x)
#fit linear regression model
model = sm.OLS(y, x).fit()
#view model summary
print(model.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                  score   R-squared:                       0.722
Model:                            OLS   Adj. R-squared:                  0.671
Method:                 Least Squares   F-statistic:                     14.27
Date:                Sat, 02 Jan 2021   Prob (F-statistic):           0.000878
Time:                        15:58:35   Log-Likelihood:                -41.159
No. Observations:                  14   AIC:                             88.32
Df Residuals:                      11   BIC:                             90.24
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         71.8144      3.680     19.517      0.000      63.716      79.913
hours          5.0318      0.942      5.339      0.000       2.958       7.106
exams         -1.3186      1.063     -1.240      0.241      -3.658       1.021
==============================================================================
Omnibus:                        0.976   Durbin-Watson:                   1.270
Prob(Omnibus):                  0.614   Jarque-Bera (JB):                0.757
Skew:                          -0.245   Prob(JB):                        0.685
Kurtosis:                       1.971   Cond. No.                         12.1
==============================================================================
**

**Step 3: Calculate the Residual Sum of Squares**

We can use the following code to calculate the residual sum of squares for the model:

**print(model.ssr)
293.25612951525414
**

The residual sum of squares turns out to be **293.256**.
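As an optional cross-check that is not part of the original steps, the sketch below refits the same model with NumPy alone and computes the residual sum of squares directly from the definition. Using NumPy here is an assumption, and the variable names are illustrative. If the data matches Step 1, the printed value should agree with `model.ssr` up to floating-point rounding.

```python
import numpy as np

# Same data as in Step 1
hours = np.array([1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4, 3, 6, 5], dtype=float)
exams = np.array([1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4, 3, 2, 4], dtype=float)
score = np.array([76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90, 75, 96, 90], dtype=float)

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones_like(hours), hours, exams])

# Least-squares fit; the second return value holds the sum of squared residuals
coef, lstsq_rss, rank, _ = np.linalg.lstsq(X, score, rcond=None)

# Residual = observed - predicted, then square each residual and sum
residuals = score - X @ coef
rss = float(np.sum(residuals ** 2))

print(rss)
```

Computing the value by hand like this makes it clear that the `ssr` attribute is simply the sum of the squared differences between the observed and fitted exam scores.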

**Additional Resources**

How to Perform Simple Linear Regression in Python

How to Perform Multiple Linear Regression in Python

Residual Sum of Squares Calculator