
# How to Calculate Partial Correlation in Python

In statistics, we often use the Pearson correlation coefficient to measure the linear relationship between two variables. However, sometimes we're interested in understanding the relationship between two variables while controlling for a third variable.

For example, suppose we want to measure the association between the number of hours a student studies and the final exam score they receive, while controlling for the student's current grade in the class. In this case, we could use a partial correlation to measure the relationship between hours studied and final exam score.
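For reference, with a single control variable the partial correlation can be written in terms of the three pairwise Pearson correlations. The symbols here are our own labels rather than anything defined elsewhere in this tutorial: $x$ is hours studied, $y$ is final exam score, and $z$ is current grade.

$$
r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}}
$$

This is the standard first-order formula; with more than one control variable, a different approach (for example, correlating regression residuals) is typically used.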

This tutorial explains how to calculate partial correlation in Python.

### Example: Partial Correlation in Python

Suppose we have the following Pandas DataFrame that displays the current grade, total hours studied, and final exam score for 10 students:

```
import numpy as np
import pandas as pd

#create DataFrame
data = {'currentGrade': [82, 88, 75, 74, 93, 97, 83, 90, 90, 80],
        'hours': [4, 3, 6, 5, 4, 5, 8, 7, 4, 6],
        'examScore': [88, 85, 76, 70, 92, 94, 89, 85, 90, 93],
        }

df = pd.DataFrame(data, columns = ['currentGrade','hours', 'examScore'])

#view DataFrame
df

   currentGrade  hours  examScore
0            82      4         88
1            88      3         85
2            75      6         76
3            74      5         70
4            93      4         92
5            97      5         94
6            83      8         89
7            90      7         85
8            90      4         90
9            80      6         93
```

To calculate the partial correlation between hours and examScore while controlling for currentGrade, we can use the partial_corr() function from the pingouin package, which uses the following syntax:

partial_corr(data, x, y, covar)

where:

• data: name of the dataframe
• x, y: names of columns in the dataframe
• covar: the name of the covariate column in the dataframe (i.e. the variable you're controlling for)

Here is how to use this function in this particular example:

```
#install the pingouin package (run in a terminal, or prefix with ! in a notebook)
#pip install pingouin

import pingouin as pg

#find partial correlation between hours and exam score while controlling for grade
pg.partial_corr(data=df, x='hours', y='examScore', covar='currentGrade')

          n      r         CI95%     r2  adj_r2  p-val   BF10  power
pearson  10  0.191  [-0.5, 0.73]  0.036  -0.238  0.598  0.438  0.082
```

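We can see that the partial correlation between hours studied and final exam score is 0.191, which is a small positive correlation. As hours studied increases, exam score tends to increase as well, assuming current grade is held constant. Note, however, that the p-value of 0.598 means this correlation is not statistically significant in a sample of only 10 students.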
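As a cross-check that does not depend on pingouin, the same quantity can be sketched with NumPy and pandas alone. The two helper functions below (`partial_corr_formula` and `partial_corr_residuals`) are hypothetical names introduced for illustration, not part of any library. The first applies the standard first-order formula to the pairwise Pearson correlations. The second regresses each variable on the covariate and correlates the residuals. Both assume a single covariate.

```python
import numpy as np
import pandas as pd

# Same data as in the example above
df = pd.DataFrame({
    'currentGrade': [82, 88, 75, 74, 93, 97, 83, 90, 90, 80],
    'hours': [4, 3, 6, 5, 4, 5, 8, 7, 4, 6],
    'examScore': [88, 85, 76, 70, 92, 94, 89, 85, 90, 93],
})

def partial_corr_formula(df, x, y, covar):
    """Illustrative helper: first-order partial correlation computed
    from the three pairwise Pearson correlations."""
    r = df[[x, y, covar]].corr()
    r_xy, r_xz, r_yz = r.loc[x, y], r.loc[x, covar], r.loc[y, covar]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def partial_corr_residuals(df, x, y, covar):
    """Illustrative helper: correlate what is left of x and y after
    removing the linear effect of covar from each."""
    z = df[covar].to_numpy(dtype=float)
    resid = {}
    for col in (x, y):
        v = df[col].to_numpy(dtype=float)
        slope, intercept = np.polyfit(z, v, 1)  # simple linear fit of col on covar
        resid[col] = v - (slope * z + intercept)
    return np.corrcoef(resid[x], resid[y])[0, 1]

r_formula = partial_corr_formula(df, 'hours', 'examScore', 'currentGrade')
r_resid = partial_corr_residuals(df, 'hours', 'examScore', 'currentGrade')
print(round(r_formula, 3), round(r_resid, 3))
```

Both approaches should agree with each other and round to 0.191, the value reported by partial_corr() above. Unlike pingouin, this sketch does not produce a confidence interval or p-value.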

To calculate the partial correlation between multiple variables at once, we can use the .pcorr() method, which pingouin adds to pandas DataFrames once the package has been imported:

```
#calculate all pairwise partial correlations, rounded to three decimal places
df.pcorr().round(3)