*21*

**Regression **is a statistical technique we can use to explain the relationship between one or more predictor variables and a response variable. The most common type of regression is linear regression, which we use when the relationship between the predictor variable and the response variable is *linear*.

That is, when the predictor variable increases, the response variable tends to increase as well. For example, we may use a linear regression model to describe the relationship between the number of hours studied (predictor variable) and the score that a student receives on an exam (response variable).

However, sometimes the relationship between a predictor variable and a response variable is *non-linear*. One common type of non-linear relationship is a **quadratic relationship**, which may look like a U or an upside-down U on a graph.

That is, when the predictor variable increases the response variable tends to increase as well, but after a certain point the response variable begins to decrease as the predictor variable keeps increasing.

For example, we may use a quadratic regression model to describe the relationship between the number of hours spent working and a person’s reported happiness levels. Perhaps the more a person works, the more fulfilled they feel, but once they reach a certain threshold, more work actually leads to stress and decreased happiness. In this case, a quadratic regression model would fit the data better than a linear regression model.

Let’s walk through an example of how to perform quadratic regression in Excel.

**Quadratic Regression in Excel**

Suppose we have data on the number of hours worked per week and the reported happiness level (on a scale of 0-100) for 16 different people:

First, let’s create a scatterplot to see if linear regression is an appropriate model to fit to the data.

Highlight cells **A2:B17**. Next, click the INSERT tab along the top ribbon, then click *Scatter *in the *Charts *area. This will produce a scatterplot of the data:

It’s easy to see that the relationship between hours worked and reported happiness is *not *linear. In fact, it follows a “U” shape, which makes it a perfect candidate for **quadratic regression**.

Before we fit the quadratic regression model to the data, we need to create a new column for the squared values of our predictor variable.

First, highlight all of the values in column B and drag them to column C.

Next, type in the formula **=A2^2 **in cell B2. This produces the value **36**. Next, click on the bottom right corner of cell B2 and drag the formula down to fill in the remaining cells in column B.

Next, we will fit the quadratic regression model.

Click on DATA along the top ribbon, then click the *Data Analysis* option on the far right. If you do not see this option, then you first need to install the free Analysis ToolPak.

Once you click *Data Analysis*, a box will pop up. Click *Regression *and then click *OK*.

Next, fill in the following values in the *Regression *box that pops up. Then click *OK*.

The following results will be displayed:

Here is how to interpret various numbers from the output:

**R Square: **Also known as the *coefficient of determination, *this is the proportion of the variance in the response variable that can be explained by the predictor variables. In this example,the R-square is **0.9092**, which indicates that 90.92% of the variance in the reported happiness levels can be explained by the number of hours worked and the number of hours worked ^2.

**Standard error: **The standard error of the regression is the average distance that the observed values fall from the regression line. In this example, the observed values fall an average of** 9.519 units **from the regression line.

**F Statistic****: **The F statistic is calculated as regression MS / residual MS. This statistic indicates whether the regression model provides a better fit to the data than a model that contains no independent variables. In essence, it tests if the regression model as a whole is useful. Generally if none of the predictor variables in the model are statistically significant, the overall F statistic is also not statistically significant. In this example, the F statistic is **65.09 **and the corresponding p-value is

**Regression coefficients: **The regression coefficients in the last table give us the numbers necessary to write the estimated regression equation:

**y _{hat} = b_{0} + b_{1}x_{1} + b_{2}x_{1}^{2}**

In this example, the estimated regression equation is:

**reported happiness level = -30.252 + 7.173(Hours worked) -0.106(Hours worked) ^{2}**

We can use this equation to calculate the expected happiness level of an individual based on their hours worked. For example, the expected happiness level of someone who works 30 hours per week is:

reported happiness level = -30.252 + 7.173(30) -0.106(30)^{2} =** 88.649**.

**Additional Resources**

**How to Add a Quadratic Trendline in Excel How to Read and Interpret a Regression Table**

**What is a Good R-squared Value?**

Understanding the Standard Error of the Regression

A Simple Guide to Understanding the F-Test of Overall Significance in Regression

Understanding the Standard Error of the Regression

A Simple Guide to Understanding the F-Test of Overall Significance in Regression