Quadratic discriminant analysis is a method you can use when you have a set of predictor variables and you'd like to classify a response variable into two or more classes.

It is considered to be the non-linear equivalent to linear discriminant analysis.
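To make the "non-linear" distinction concrete, the sketch below fits both models on the iris data used later in this tutorial and compares the covariance estimates each one stores. It assumes scikit-learn's `store_covariance=True` option on both estimators; the variable names are illustrative only.

```python
from sklearn import datasets
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

# iris: 150 flowers, 4 measurements each, 3 species
X, y = datasets.load_iris(return_X_y=True)

# LDA pools a single covariance matrix across all classes
lda = LinearDiscriminantAnalysis(store_covariance=True).fit(X, y)

# QDA estimates a separate covariance matrix for every class
qda = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X, y)

lda_cov_shape = lda.covariance_.shape
qda_cov_shapes = [c.shape for c in qda.covariance_]

print(lda_cov_shape)   # one shared (4, 4) matrix
print(qda_cov_shapes)  # three (4, 4) matrices, one per species
```

Because each class gets its own covariance matrix, the boundary between two classes is a quadratic surface rather than a straight line or plane.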

This tutorial provides a step-by-step example of how to perform quadratic discriminant analysis in Python.

**Step 1: Load Necessary Libraries**

First, we'll load the necessary functions and libraries for this example:

    from sklearn.model_selection import train_test_split
    from sklearn.model_selection import RepeatedStratifiedKFold
    from sklearn.model_selection import cross_val_score
    from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
    from sklearn import datasets
    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np

**Step 2: Load the Data**

For this example, we'll use the **iris** dataset from the sklearn library. The following code shows how to load this dataset and convert it to a pandas DataFrame to make it easy to work with:

    #load iris dataset
    iris = datasets.load_iris()

    #convert dataset to pandas DataFrame
    df = pd.DataFrame(data = np.c_[iris['data'], iris['target']],
                      columns = iris['feature_names'] + ['target'])
    df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
    df.columns = ['s_length', 's_width', 'p_length', 'p_width', 'target', 'species']

    #view first five rows of DataFrame
    df.head()

       s_length  s_width  p_length  p_width  target species
    0       5.1      3.5       1.4      0.2     0.0  setosa
    1       4.9      3.0       1.4      0.2     0.0  setosa
    2       4.7      3.2       1.3      0.2     0.0  setosa
    3       4.6      3.1       1.5      0.2     0.0  setosa
    4       5.0      3.6       1.4      0.2     0.0  setosa

    #find how many total observations are in dataset
    len(df.index)

    150

We can see that the dataset contains 150 total observations.

For this example, we'll build a quadratic discriminant analysis model to classify which species a given flower belongs to.

We'll use the following predictor variables in the model:

- Sepal length
- Sepal width
- Petal length
- Petal width

And we'll use them to predict the response variable *Species*, which takes on the following three potential classes:

- setosa
- versicolor
- virginica
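As a quick check on how the 150 observations are spread across these three classes, the following sketch counts the rows per species. It reloads the dataset so that it runs on its own, and the `counts` name is just for illustration.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# iris.target holds integer codes 0, 1, 2; bincount tallies each code
per_class = np.bincount(iris.target)

# pair each species name with its number of observations
counts = {str(name): int(n) for name, n in zip(iris.target_names, per_class)}

print(counts)  # {'setosa': 50, 'versicolor': 50, 'virginica': 50}
```

The classes are perfectly balanced, which is worth knowing before interpreting an accuracy score.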

**Step 3: Fit the QDA Model**

Next, we'll fit the QDA model to our data using the QuadraticDiscriminantAnalysis class from sklearn:

    #define predictor and response variables
    X = df[['s_length', 's_width', 'p_length', 'p_width']]
    y = df['species']

    #fit the QDA model
    model = QuadraticDiscriminantAnalysis()
    model.fit(X, y)
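Once a model like this is fitted, it can help to look at what it learned. The sketch below is a standalone variant that uses plain NumPy arrays instead of the DataFrame (an assumption made only to keep it self-contained) and prints the fitted `classes_`, `priors_`, and `means_` attributes.

```python
from sklearn import datasets
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

iris = datasets.load_iris()
X = iris.data
y = iris.target_names[iris.target]  # species names as strings

model = QuadraticDiscriminantAnalysis().fit(X, y)

print(model.classes_)  # class labels, in sorted order
print(model.priors_)   # class proportions estimated from the training data
print(model.means_)    # one row of feature means per class
```

With no `priors` argument supplied, the priors are taken from the class frequencies, so each one is 1/3 for this balanced dataset.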

**Step 4: Use the Model to Make Predictions**

Once we've fit the model using our data, we can evaluate how well the model performed by using repeated stratified k-fold cross validation.

For this example, we'll use 10 folds and 3 repeats:

    #define method to evaluate model
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

    #evaluate model
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    print(np.mean(scores))

    0.97333333333334

We can see that the model achieved a mean accuracy of **97.33%**.
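To see what "10 folds and 3 repeats" means in practice, the following sketch inspects the splits directly. It is a self-contained illustration that works on NumPy arrays rather than the DataFrame; names such as `fold_class_counts` are hypothetical.

```python
import numpy as np
from sklearn import datasets
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = datasets.load_iris(return_X_y=True)

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

# 10 folds x 3 repeats = 30 train/test splits in total
n_splits_total = cv.get_n_splits()

# stratification keeps the class balance of the full dataset in every test fold
fold_class_counts = [np.bincount(y[test_idx]).tolist()
                     for _, test_idx in cv.split(X, y)]

# one accuracy value per split
scores = cross_val_score(QuadraticDiscriminantAnalysis(), X, y,
                         scoring='accuracy', cv=cv)

print(n_splits_total)        # 30
print(fold_class_counts[0])  # [5, 5, 5]
print(scores.mean(), scores.std())
```

Reporting the standard deviation alongside the mean gives a sense of how much the accuracy varies from one split to another.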

We can also use the model to predict which class a new flower belongs to, based on input values:

    #define new observation
    new = [5, 3, 1, .4]

    #predict which class the new observation belongs to
    model.predict([new])

    array(['setosa'], dtype='

We can see that the model predicts this new observation to belong to the species called *setosa*.
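Beyond the predicted label, it can be useful to see how confident the model is. The sketch below is a standalone variant that fits the model on NumPy arrays (an assumption made for self-containment) and calls `predict_proba` to get the estimated probability of each class for the same new observation.

```python
from sklearn import datasets
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

iris = datasets.load_iris()
X = iris.data
y = iris.target_names[iris.target]  # species names as strings

model = QuadraticDiscriminantAnalysis().fit(X, y)

# same new observation as above
new = [5, 3, 1, .4]

predicted = model.predict([new])[0]
probs = model.predict_proba([new])[0]  # one probability per class

# columns of predict_proba follow the order of model.classes_
for label, p in zip(model.classes_, probs):
    print(label, round(float(p), 4))
```

The predicted class is simply the one with the highest estimated probability.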

You can find the complete Python code used in this tutorial here.