
# How to Create a Residual Plot by Hand

A residual plot is a type of plot that displays the values of a predictor variable in a regression model along the x-axis and the values of the residuals along the y-axis.

This plot is used to assess whether a linear model is appropriate for the data and whether the residuals exhibit heteroscedasticity (non-constant variance).

The following step-by-step example shows how to create a residual plot for a regression model by hand.

### Step 1: Find the Predicted Values

Suppose we want to fit a regression model to the following dataset:

Using statistical software (such as Excel, R, Python, or SPSS), we can find that the fitted regression model is:

y = 10.4486 + 1.3037(x)
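For readers curious where such coefficients come from, here is a minimal sketch of the ordinary least squares formulas for a single predictor. The function name `fit_line` is hypothetical, and the three points are made up purely for illustration (they lie exactly on y = 1 + 2x); they are not the dataset used in this example.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = sum of cross-deviations divided by sum of squared x deviations
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The fitted line always passes through the point of means
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Hypothetical data lying exactly on y = 1 + 2x (not the article's dataset)
intercept, slope = fit_line([1, 2, 3], [3, 5, 7])
print(intercept, slope)
```

Statistical software performs the same calculation, usually with additional output such as standard errors.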

We can then use this model to predict the value of y, based on the value of x. For example, if x = 3, then we would predict y to be:

y = 10.4486 + 1.3037(3) = 14.359

We can repeat this process for every observation in our dataset:

### Step 2: Find the Residuals

A residual for a given observation in our dataset is calculated as:

Residual = observed value – predicted value

For example, the residual of the first observation would be calculated as:

Residual = 15 – 14.359 = 0.641
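As a rough sketch, the residual calculation looks like this in Python. The function names and default arguments are illustrative assumptions; the defaults are the rounded coefficients from Step 1, so the result for the first observation is about 0.640 rather than exactly 0.641.

```python
def predict(x, intercept=10.4486, slope=1.3037):
    """Predicted y from the fitted line (rounded coefficients from Step 1)."""
    return intercept + slope * x

def residual(x, y_observed):
    """Residual = observed value minus predicted value."""
    return y_observed - predict(x)

# First observation: x = 3, observed y = 15
print(round(residual(3, 15), 3))
```

A positive result means the observed value lies above the fitted line, and a negative result means it lies below.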

We can repeat this process for every observation in our dataset:

### Step 3: Create the Residual Plot

Lastly, we can create a residual plot by placing the x values along the x-axis and the residual values along the y-axis.

For example, the first point we'll place in our plot is (3, 0.641).

The next point we'll place in our plot is (5, 0.033).

We'll continue until we've placed all 10 (x, residual) pairs in the plot:

Any point above zero in the plot represents a positive residual. This means the observed value for y is greater than the value predicted by the regression model.

Any point below zero represents a negative residual. This means the observed value for y is less than the value predicted by the regression model.

Since the points in the plot are randomly scattered around a residual value of 0 with no clear pattern, this indicates that the relationship between x and y is linear and a linear regression model is appropriate to use.

And since the spread of the residuals doesn't systematically increase or decrease as the predictor variable gets larger, heteroscedasticity is not a problem with this regression model.