
We often use three different sum of squares values to measure how well a regression line actually fits a dataset:

**1. Sum of Squares Total (SST)** – The sum of squared differences between individual data points (y_{i}) and the mean of the response variable (ȳ).

- SST = Σ(y_{i} – ȳ)^{2}

**2. Sum of Squares Regression (SSR)** – The sum of squared differences between predicted data points (ŷ_{i}) and the mean of the response variable (ȳ).

- SSR = Σ(ŷ_{i} – ȳ)^{2}

**3. Sum of Squares Error (SSE)** – The sum of squared differences between predicted data points (ŷ_{i}) and observed data points (y_{i}).

- SSE = Σ(ŷ_{i} – y_{i})^{2}
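
The three formulas above translate almost directly into vectorized R. The following is a minimal sketch, not part of the original walkthrough: the helper name `sum_of_squares` and the three-point toy vectors are hypothetical and chosen only for illustration (the `y_hat` values are the least-squares fitted values of `y` regressed on x = 1, 2, 3).

```r
# Hypothetical helper (an assumption for illustration): computes the three
# sums of squares from observed values y and predicted values y_hat.
sum_of_squares <- function(y, y_hat) {
  stopifnot(length(y) == length(y_hat))
  y_bar <- mean(y)                    # mean of the response variable
  c(SST = sum((y - y_bar)^2),         # total variation
    SSR = sum((y_hat - y_bar)^2),     # variation explained by the model
    SSE = sum((y_hat - y)^2))         # unexplained (residual) variation
}

# Toy illustration with made-up values
y     <- c(1, 3, 2)
y_hat <- c(1.5, 2, 2.5)

ss <- sum_of_squares(y, y_hat)
ss
```

Writing the calculation as a function makes it easy to reuse with any pair of observed and predicted vectors, including the fitted values of a model.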

The following step-by-step example shows how to calculate each of these metrics for a given regression model in R.

**Step 1: Create the Data**

First, let's create a dataset that contains the number of hours studied and exam score received for 20 different students at a certain college:

    #create data frame
    df <- data.frame(hours=c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3,
                             3, 4, 4, 4, 5, 5, 6, 7, 7, 8),
                     score=c(68, 76, 74, 80, 76, 78, 81, 84, 86, 83,
                             88, 85, 89, 94, 93, 94, 96, 89, 92, 97))

    #view first six rows of data frame
    head(df)

      hours score
    1     1    68
    2     1    76
    3     1    74
    4     2    80
    5     2    76
    6     2    78

**Step 2: Fit a Regression Model**

Next, we'll use the **lm()** function to fit a simple linear regression model using score as the response variable and hours as the predictor variable:

    #fit regression model
    model <- lm(score ~ hours, data = df)

    #view model summary
    summary(model)

    Call:
    lm(formula = score ~ hours, data = df)

    Residuals:
        Min      1Q  Median      3Q     Max 
    -8.6970 -2.5156 -0.0737  3.1100  7.5495 

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  73.4459     1.9147  38.360
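
The sums of squares are built from the model's predicted values, so it can help to see how to pull those out of the fitted object. The sketch below recreates the data and model from the steps above and uses the standard extractor functions `coef()`, `fitted()`, and `resid()`; the variable names `y_hat` and `res` are just illustrative choices.

```r
# Recreate the data and model from Steps 1 and 2
df <- data.frame(hours=c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3,
                         3, 4, 4, 4, 5, 5, 6, 7, 7, 8),
                 score=c(68, 76, 74, 80, 76, 78, 81, 84, 86, 83,
                         88, 85, 89, 94, 93, 94, 96, 89, 92, 97))
model <- lm(score ~ hours, data = df)

# estimated intercept and slope
coef(model)

# predicted exam scores (y-hat), one per student
y_hat <- fitted(model)

# residuals: observed score minus predicted score
res <- resid(model)

# each observed score equals its fitted value plus its residual
all.equal(unname(y_hat + res), df$score)
```

The vectors returned by `fitted()` and `resid()` are the building blocks for SSR and SSE in the next step.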

**Step 3: Calculate SST, SSR, and SSE**

We can use the following syntax to calculate SST, SSR, and SSE:

    #find sse
    sse <- sum((fitted(model) - df$score)^2)
    sse

    [1] 331.0749

    #find ssr
    ssr <- sum((fitted(model) - mean(df$score))^2)
    ssr

    [1] 917.4751

    #find sst
    sst <- sum((df$score - mean(df$score))^2)
    sst

    [1] 1248.55

The metrics turn out to be:

- **Sum of Squares Total (SST):** 1248.55
- **Sum of Squares Regression (SSR):** 917.4751
- **Sum of Squares Error (SSE):** 331.0749
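
As a cross-check, the same quantities can be read from base R's ANOVA table rather than computed by hand. This is an alternative route, not the method used in the steps above: for a simple linear regression, the `Sum Sq` column of `anova(model)` holds SSR in the predictor row and SSE in the `Residuals` row, and `deviance()` on an `lm` fit returns the residual sum of squares. The names `ss_table` and `sse_check` are illustrative.

```r
# Recreate the data and model from Steps 1 and 2
df <- data.frame(hours=c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3,
                         3, 4, 4, 4, 5, 5, 6, 7, 7, 8),
                 score=c(68, 76, 74, 80, 76, 78, 81, 84, 86, 83,
                         88, 85, 89, 94, 93, 94, 96, 89, 92, 97))
model <- lm(score ~ hours, data = df)

# ANOVA table: the "hours" row holds SSR, the "Residuals" row holds SSE
ss_table <- anova(model)
ssr <- ss_table["hours", "Sum Sq"]
sse <- ss_table["Residuals", "Sum Sq"]

# deviance() of an lm fit is the residual sum of squares, i.e. SSE
sse_check <- deviance(model)

c(SSR = ssr, SSE = sse, SSE_deviance = sse_check)
```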

We can verify that SST = SSR + SSE:

- SST = SSR + SSE
- 1248.55 = 917.4751 + 331.0749
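
The identity can also be checked numerically instead of by eye. The sketch below recomputes the three values and compares them with `all.equal()`, which allows for floating-point rounding. It also computes the cross term Σ(ŷ_{i} – ȳ)(y_{i} – ŷ_{i}), which is zero for a least-squares fit that includes an intercept; that is why the decomposition holds here. The names `y_bar` and `cross_term` are illustrative.

```r
# Recreate the data and model from Steps 1 and 2
df <- data.frame(hours=c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3,
                         3, 4, 4, 4, 5, 5, 6, 7, 7, 8),
                 score=c(68, 76, 74, 80, 76, 78, 81, 84, 86, 83,
                         88, 85, 89, 94, 93, 94, 96, 89, 92, 97))
model <- lm(score ~ hours, data = df)

y_bar <- mean(df$score)
sst <- sum((df$score - y_bar)^2)
ssr <- sum((fitted(model) - y_bar)^2)
sse <- sum((fitted(model) - df$score)^2)

# TRUE when SST equals SSR + SSE up to floating-point tolerance
all.equal(sst, ssr + sse)

# cross term between explained and residual parts; numerically zero
# for a least-squares fit with an intercept
cross_term <- sum((fitted(model) - y_bar) * resid(model))
cross_term
```

If the model were fit without an intercept, or the predictions came from something other than least squares, the cross term would generally not vanish and SST would not split cleanly into SSR and SSE.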

We can also manually calculate the R-squared of the regression model:

- R-squared = SSR / SST
- R-squared = 917.4751 / 1248.55
- R-squared = 0.7348

This tells us that **73.48%** of the variation in exam scores can be explained by the number of hours studied.
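
The manual R-squared can be compared against the value R reports. The sketch below, which recreates the data and model from the earlier steps, computes R-squared as SSR / SST and as 1 – SSE / SST, then checks both against the `r.squared` component of `summary(model)`. The names `r2_manual`, `r2_alt`, and `r2_summary` are illustrative.

```r
# Recreate the data and model from Steps 1 and 2
df <- data.frame(hours=c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3,
                         3, 4, 4, 4, 5, 5, 6, 7, 7, 8),
                 score=c(68, 76, 74, 80, 76, 78, 81, 84, 86, 83,
                         88, 85, 89, 94, 93, 94, 96, 89, 92, 97))
model <- lm(score ~ hours, data = df)

sst <- sum((df$score - mean(df$score))^2)
ssr <- sum((fitted(model) - mean(df$score))^2)
sse <- sum((fitted(model) - df$score)^2)

# R-squared as the share of total variation explained by the model
r2_manual <- ssr / sst

# equivalent form: one minus the unexplained share
r2_alt <- 1 - sse / sst

# value reported by summary()
r2_summary <- summary(model)$r.squared

round(c(manual = r2_manual, alt = r2_alt, summary = r2_summary), 4)
```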

**Additional Resources**

You can use the following calculators to automatically calculate SST, SSR, and SSE for any simple linear regression line: