# How to Perform One-Hot Encoding in R

One-hot encoding is used to convert categorical variables into a format that can be used by machine learning algorithms.

The basic idea of one-hot encoding is to create new variables that take on values 0 and 1 to represent the original categorical values.
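To make the idea concrete, here is a minimal base R sketch that builds the 0/1 columns by hand. The vector `team` and the object `encoded` are illustrative assumptions and are not part of the step-by-step example.

```r
# Hypothetical categorical vector, used only for illustration
team <- c("A", "B", "C", "A")

# For each distinct category, flag where it occurs (1) and where it does not (0)
encoded <- sapply(sort(unique(team)), function(level) as.integer(team == level))

# Each row contains a single 1, in the column of its original category
encoded
```

This is only meant to show the mechanics. In practice a helper function is usually more convenient than encoding by hand.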

For example, a categorical variable that contains the team names A, B, and C can be converted into three new variables, one per team, that contain only 0 and 1 values.

The following step-by-step example shows how to perform one-hot encoding in R for a small dataset of teams and points.

### Step 1: Create the Data

First, let's create the following data frame in R:

```
#create data frame
df <- data.frame(team=c('A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'),
                 points=c(25, 12, 15, 14, 19, 23, 25, 29))

#view data frame
df

  team points
1    A     25
2    A     12
3    B     15
4    B     14
5    B     19
6    B     23
7    C     25
8    C     29
```
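Before encoding, it can help to confirm how many distinct categories the variable holds, since that determines how many 0/1 columns to expect. The sketch below recreates the same data frame so that it runs on its own; the names `n_levels` and `team_counts` are assumptions introduced for illustration.

```r
# Same data frame as in Step 1
df <- data.frame(team=c('A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'),
                 points=c(25, 12, 15, 14, 19, 23, 25, 29))

# Number of distinct categories, i.e. how many 0/1 columns to expect
n_levels <- length(unique(df$team))

# Number of rows that fall into each category
team_counts <- table(df$team)

n_levels
team_counts
```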

### Step 2: Perform One-Hot Encoding

Next, let's use the dummyVars() function from the caret package to perform one-hot encoding on the 'team' variable in the data frame:

```
library(caret)

#define one-hot encoding function
dummy <- dummyVars(" ~ .", data=df)

#perform one-hot encoding on data frame
final_df <- data.frame(predict(dummy, newdata=df))

#view final data frame
final_df

  teamA teamB teamC points
1     1     0     0     25
2     1     0     0     12
3     0     1     0     15
4     0     1     0     14
5     0     1     0     19
6     0     1     0     23
7     0     0     1     25
8     0     0     1     29
```

Notice that three new columns were added to the data frame since the original 'team' column contained three unique values.

Also notice that the original 'team' column was dropped from the data frame since it's no longer needed.
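As a cross-check that does not depend on caret, the same layout can be sketched with base R's model.matrix(). This is a swapped-in alternative, not the dummyVars() approach used above, and the name `check_df` is an assumption introduced for illustration.

```r
# Same data frame as in Step 1
df <- data.frame(team=c('A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'),
                 points=c(25, 12, 15, 14, 19, 23, 25, 29))

# Base R alternative: '- 1' removes the intercept so that every level
# of 'team' gets its own indicator column
check_df <- data.frame(model.matrix(~ team + points - 1, data=df))

# One indicator column per unique team, plus the unchanged 'points' column
names(check_df)

# Every row should have exactly one 1 across the indicator columns
rowSums(check_df[, c("teamA", "teamB", "teamC")])
```

If the column names and row sums match what is described above, the encoding has the expected shape.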

The one-hot encoding is complete and we can now feed this dataset into any machine learning algorithm that we'd like.

Note: You can find the complete documentation for the dummyVars() function in the caret package reference.