
**Correlation** and **regression** are two terms in statistics that are related, but not quite the same.

In this tutorial, we’ll provide a brief explanation of both terms and explain how they’re similar and different.

**What is Correlation?**

**Correlation** measures the linear association between two variables, *x* and *y*. It has a value between -1 and 1 where:

- -1 indicates a perfectly negative linear correlation between two variables
- 0 indicates no linear correlation between two variables
- 1 indicates a perfectly positive linear correlation between two variables
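To make the three reference values concrete, here is a minimal sketch that computes the Pearson correlation coefficient from its standard definition. The helper name `pearson_r` and the tiny datasets are made up for illustration; they are not the exam data used in this tutorial.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative (made-up) data
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # y rises in lockstep with x
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # y falls in lockstep with x
r_zero = pearson_r(x, [2, 1, 0, 1, 2])   # symmetric V shape: no linear trend

print(r_pos, r_neg, r_zero)
```

The third dataset shows that a correlation near 0 only rules out a linear association: *y* still depends on *x* there, just not along a straight line.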

For example, suppose we have the following dataset that contains two variables: (1) Hours studied and (2) Exam Score received for 20 different students:

If we created a scatterplot of hours studied vs. exam score, here’s what it would look like:

Just from looking at the plot, we can tell that students who study more tend to earn higher exam scores. In other words, we can visually see that there is a **positive correlation** between the two variables.

Using a calculator, we can find that the correlation between these two variables is r = **0.915**. Since this value is close to 1, it confirms that there is a strong positive correlation between the two variables.

**What is Regression?**

**Regression** is a method we can use to understand how changing the value of the *x* variable affects the value of the *y* variable.

A regression model uses one variable, *x*, as the predictor variable, and the other variable, *y*, as the response variable. It then finds an equation with the following form that best describes the relationship between the two variables:

$\hat{y} = b_0 + b_1 x$

where:

- $\hat{y}$: The predicted value of the response variable
- $b_0$: The y-intercept (the value of y when x is equal to zero)
- $b_1$: The regression coefficient (the average change in y for a one-unit increase in x)
- $x$: The value of the predictor variable
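As a rough sketch of how the intercept and the regression coefficient can be found, the snippet below uses the ordinary least squares formulas for one predictor. That "best describes" means least squares is an assumption here, although it is the usual criterion. The function name `fit_line` and the five data points are hypothetical, not the tutorial's dataset.

```python
def fit_line(x, y):
    """Return (b0, b1) for the least squares line y_hat = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx               # slope: change in y per one-unit change in x
    b0 = mean_y - b1 * mean_x    # intercept: predicted y when x is zero
    return b0, b1

# Hypothetical data: hours studied and exam scores for five students
hours = [1, 2, 3, 4, 5]
scores = [68, 71, 72, 77, 78]

b0, b1 = fit_line(hours, scores)
print(round(b0, 2), round(b1, 2))
```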

For example, consider our dataset from earlier:

Using a linear regression calculator, we find that the following equation best describes the relationship between these two variables:

Predicted exam score = 65.47 + 2.58*(hours studied)

The way to interpret this equation is as follows:

- The predicted exam score for a student who studies zero hours is **65.47**.
- The average increase in exam score associated with one additional hour studied is **2.58**.

We can also use this equation to predict the score that a student will receive based on the number of hours studied.

For example, a student who studies 6 hours is expected to receive a score of **80.95**:

Predicted exam score = 65.47 + 2.58*(6) = **80.95**.
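The same calculation can be wrapped in a small function. This is only a sketch: the coefficients come from the fitted equation above, while the names `B0`, `B1`, and `predict_exam_score` are made up for illustration.

```python
# Coefficients from the fitted equation in this tutorial
B0 = 65.47  # intercept: predicted score at zero hours studied
B1 = 2.58   # slope: predicted change in score per additional hour

def predict_exam_score(hours_studied):
    """Predicted exam score for a given number of hours studied."""
    return B0 + B1 * hours_studied

six_hour_prediction = predict_exam_score(6)
print(round(six_hour_prediction, 2))  # 80.95
```

Predictions are most trustworthy for hours within the range covered by the original data; the equation says nothing reliable about values far outside it.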

We can also plot this equation as a line on a scatterplot:

We can see that the regression line “fits” the data quite well.

Recall that the correlation between these two variables was r = **0.915**. It turns out that we can square this value to get a number called “r-squared” that describes the proportion of the variance in the response variable that can be explained by the predictor variable.

In this example, $r^2 = 0.915^2 \approx$ **0.837**. This means that 83.7% of the variation in exam scores can be explained by the number of hours studied.
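For a regression with a single predictor, squaring the correlation gives the same number as the "explained variance" definition of r-squared, 1 minus the residual sum of squares divided by the total sum of squares. The sketch below checks this on hypothetical data (five made-up students, not the tutorial's dataset).

```python
import math

# Hypothetical data: hours studied and exam scores for five students
hours = [1, 2, 3, 4, 5]
scores = [68, 71, 72, 77, 78]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n
sxy = sum((h - mean_x) * (s - mean_y) for h, s in zip(hours, scores))
sxx = sum((h - mean_x) ** 2 for h in hours)
syy = sum((s - mean_y) ** 2 for s in scores)

# Correlation coefficient
r = sxy / math.sqrt(sxx * syy)

# Least squares line and its predictions
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x
predicted = [b0 + b1 * h for h in hours]

# r-squared as the share of variation explained by the line
ss_res = sum((s - p) ** 2 for s, p in zip(scores, predicted))
ss_tot = syy
r_squared = 1 - ss_res / ss_tot

# The two quantities agree up to floating-point rounding
print(r ** 2, r_squared)
```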

**Correlation vs. Regression: Similarities & Differences**

Here is a summary of the similarities and differences between correlation and regression:

**Similarities:**

- Both quantify the direction of a relationship between two variables.
- Both quantify the strength of a relationship between two variables.

**Differences:**

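- Regression treats one variable as the predictor and the other as the response, so swapping them gives a different equation. Correlation is symmetric: it is the same whichever variable we call *x*. Neither method, by itself, shows a cause-and-effect relationship.
- Regression is able to use an equation to predict the value of one variable, based on the value of another variable. Correlation does not do this.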
- Regression uses an equation to quantify the relationship between two variables. Correlation uses a single number.
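One way to see the difference between a single number and an equation is to swap the roles of the two variables. In the sketch below, which uses hypothetical data and made-up helper names, the correlation is unchanged by the swap while the regression slope is not.

```python
import math

def sums_of_deviations(x, y):
    """Return (sxy, sxx, syy) for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy, sxx, syy

def pearson_r(x, y):
    sxy, sxx, syy = sums_of_deviations(x, y)
    return sxy / math.sqrt(sxx * syy)

def slope(x, y):
    """Least squares slope when y is regressed on x."""
    sxy, sxx, _ = sums_of_deviations(x, y)
    return sxy / sxx

# Hypothetical data: hours studied and exam scores for five students
hours = [1, 2, 3, 4, 5]
scores = [68, 71, 72, 77, 78]

r_hours_scores = pearson_r(hours, scores)
r_scores_hours = pearson_r(scores, hours)   # same value: correlation is symmetric

slope_scores_on_hours = slope(hours, scores)  # points of score per hour
slope_hours_on_scores = slope(scores, hours)  # hours per point of score

print(r_hours_scores, r_scores_hours)
print(slope_scores_on_hours, slope_hours_on_scores)
```

The two slopes are tied together through the correlation: their product equals r-squared.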

**Additional Resources**

The following tutorials offer more in-depth explanations of topics covered in this post.

An Introduction to the Pearson Correlation Coefficient

An Introduction to Simple Linear Regression

Simple Linear Regression Calculator

What is a Good R-squared Value?