
How to Perform Multivariate Normality Tests in Python

When we'd like to test whether or not a single variable is normally distributed, we can create a Q-Q plot to visualize the distribution or we can perform a formal statistical test like an Anderson-Darling Test or a Jarque-Bera Test.
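As a quick illustration of the single-variable case, the sketch below computes the points of a normal Q-Q plot and runs a Jarque-Bera test with SciPy. The simulated data, the seed, and the variable names are assumptions made for this example; an Anderson-Darling test is available separately as scipy.stats.anderson.

```python
import numpy as np
from scipy import stats

# Simulated sample (an assumption for illustration only)
rng = np.random.default_rng(0)
x = rng.normal(size=200)

# Q-Q plot data: theoretical normal quantiles vs. ordered sample values,
# plus the slope, intercept, and correlation r of the fitted line
(theoretical_q, ordered_x), (slope, intercept, r) = stats.probplot(x, dist="norm")
print(f"Q-Q correlation: {r:.4f}")

# Jarque-Bera test: the null hypothesis is that skewness and kurtosis match a normal distribution
jb_stat, jb_pval = stats.jarque_bera(x)
print(f"Jarque-Bera statistic: {jb_stat:.4f}, p-value: {jb_pval:.4f}")
```

A Q-Q correlation close to 1 and a large Jarque-Bera p-value are both consistent with normality. Passing plot=plt to probplot() (with matplotlib imported as plt) draws the Q-Q plot itself.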

However, when we'd like to test whether or not several variables are normally distributed as a group, we must perform a multivariate normality test.

This tutorial explains how to perform the Henze-Zirkler multivariate normality test for a given dataset in Python.

Related: If we'd like to identify outliers in a multivariate setting, we can use the Mahalanobis distance.
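For readers who want to see what that looks like, here is a minimal sketch of squared Mahalanobis distances computed with NumPy. The simulated data and the planted outlier in the first row are assumptions made for this example.

```python
import numpy as np

# Simulated data with one planted outlier in row 0 (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[0] = [6.0, -6.0, 6.0]

# Squared Mahalanobis distance of each row from the sample mean
center = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - center
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

# The row with the largest distance is the most extreme observation
print("Most extreme row:", d2.argmax())
print("Its squared distance:", round(float(d2.max()), 2))
```

Rows with unusually large distances are candidate outliers. A common rule of thumb compares the squared distances to a chi-square distribution with as many degrees of freedom as there are variables.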

Example: Henze-Zirkler Multivariate Normality Test in Python

The Henze-Zirkler Multivariate Normality Test determines whether or not a group of variables follows a multivariate normal distribution. The null and alternative hypotheses for the test are as follows:

H0 (null): The variables follow a multivariate normal distribution.

Ha (alternative): The variables do not follow a multivariate normal distribution.
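For intuition about what the test measures, the sketch below computes the Henze-Zirkler statistic directly with NumPy, following the published formula as we understand it. It is not pingouin's code: the helper name henze_zirkler_statistic is hypothetical, the simulated data is an assumption, the covariance matrix is assumed to be invertible, and the p-value step is left out.

```python
import numpy as np

def henze_zirkler_statistic(X):
    """Return the Henze-Zirkler statistic for an (n_samples, n_features) array."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape

    # Inverse of the maximum-likelihood covariance estimate (divides by n);
    # assumes the covariance matrix is full rank
    S_inv = np.linalg.inv(np.cov(X, rowvar=False, bias=True))

    # Squared Mahalanobis distance of each row from the sample mean
    centered = X - X.mean(axis=0)
    d_j = np.einsum("ij,jk,ik->i", centered, S_inv, centered)

    # Squared Mahalanobis distance between every pair of rows
    pair_diff = X[:, None, :] - X[None, :, :]
    d_jk = np.einsum("abi,ij,abj->ab", pair_diff, S_inv, pair_diff)

    # Smoothing parameter beta, which depends on n and p
    b = (1 / np.sqrt(2)) * ((2 * p + 1) * n / 4) ** (1 / (p + 4))
    b2 = b ** 2

    return n * (
        np.exp(-b2 / 2 * d_jk).sum() / n**2
        - 2 * (1 + b2) ** (-p / 2) * np.exp(-b2 / (2 * (1 + b2)) * d_j).mean()
        + (1 + 2 * b2) ** (-p / 2)
    )

# Simulated multivariate normal data (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
hz_value = henze_zirkler_statistic(X)
print(f"HZ statistic: {hz_value:.4f}")
```

The statistic is a weighted distance between the empirical characteristic function of the standardized data and that of a standard multivariate normal, so larger values indicate a stronger departure from multivariate normality.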

To perform this test in Python we can use the multivariate_normality() function from the pingouin library.

First, we need to install pingouin:

```
pip install pingouin
```

Next, we can import the multivariate_normality() function and use it to perform a Multivariate Test for Normality for a given dataset:

```
#import necessary packages
from pingouin import multivariate_normality
import pandas as pd
import numpy as np

#create a dataset with three variables x1, x2, and x3
#(no random seed is set, so the values differ on each run)
df = pd.DataFrame({'x1': np.random.normal(size=50),
                   'x2': np.random.normal(size=50),
                   'x3': np.random.normal(size=50)})

#perform the Henze-Zirkler Multivariate Normality Test
multivariate_normality(df, alpha=.05)

#example output from one run (your numbers will vary)
#HZResults(hz=0.5956866563391165, pval=0.6461804077893423, normal=True)
```

The results of the test are as follows:

• H-Z Test Statistic: 0.59569
• p-value: 0.64618

Since the p-value of the test is not less than our specified alpha value of .05, we fail to reject the null hypothesis. We do not have sufficient evidence to say that the dataset departs from a multivariate normal distribution, so it is reasonable to treat it as multivariate normal.

Related: Learn how the Henze-Zirkler test is used in real-life medical applications in this research paper.