
The **Levenshtein distance** between two strings is the minimum number of single-character edits required to turn one word into the other.

The word “edits” includes substitutions, insertions, and deletions.

For example, suppose we have the following two words:

- PARTY
- PARK

The Levenshtein distance between the two words (i.e. the number of edits we have to make to turn one word into the other) would be **2**: substitute the T with a K (PARTY → PARKY), then delete the Y (PARKY → PARK).
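To make the definition concrete, here is a minimal base R sketch of the textbook dynamic-programming approach, assuming every edit costs 1. The function name levenshtein_dp is hypothetical and is not part of any package; it only illustrates the idea and is not how any particular library implements it.

```r
# Hypothetical helper for illustration only (not part of stringdist):
# textbook dynamic-programming Levenshtein distance with unit edit costs.
levenshtein_dp <- function(s, t) {
  s_chars <- strsplit(s, "")[[1]]
  t_chars <- strsplit(t, "")[[1]]
  n <- length(s_chars)
  m <- length(t_chars)

  # d[i + 1, j + 1] holds the distance between the first i characters of s
  # and the first j characters of t
  d <- matrix(0L, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n  # turning i characters into nothing takes i deletions
  d[1, ] <- 0:m  # turning nothing into j characters takes j insertions

  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (s_chars[i] == t_chars[j]) 0L else 1L
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + 1L,  # deletion
        d[i + 1, j] + 1L,  # insertion
        d[i, j] + cost     # substitution (or a free match)
      )
    }
  }
  d[n + 1, m + 1]
}

levenshtein_dp("PARTY", "PARK")
# [1] 2
```

The bottom-right cell of the table holds the distance between the two full strings, which is why the function returns d[n + 1, m + 1].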

In practice, the Levenshtein distance is used in many different applications including approximate string matching, spell-checking, and natural language processing.

This tutorial explains how to calculate the Levenshtein distance between strings in R by using the **stringdist()** function from the **stringdist** package.

This function uses the following basic syntax:

```r
#load stringdist package
library(stringdist)

#calculate Levenshtein distance between two strings
stringdist("string1", "string2", method = "lv")
```

Note that this function can calculate many different distance metrics. By specifying method = "lv", we tell the function to calculate the Levenshtein distance.

**Example 1: Levenshtein Distance Between Two Strings**

The following code shows how to calculate the Levenshtein distance between the two strings “party” and “park” using the **stringdist()** function:

```r
#load stringdist package
library(stringdist)

#calculate Levenshtein distance between two strings
stringdist('party', 'park', method = 'lv')

[1] 2
```

The Levenshtein distance turns out to be **2**.
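As a cross-check that does not depend on any add-on package, base R ships adist() in the utils package, which computes a generalized Levenshtein distance and, with its default unit costs, should agree with the result above. The variable name d_base is only illustrative.

```r
# adist() comes with base R (utils package); with default arguments each
# insertion, deletion, and substitution costs 1
d_base <- utils::adist("party", "park")

# adist() returns a matrix, so drop() reduces the 1 x 1 result to a plain number
drop(d_base)
# [1] 2
```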

**Example 2: Levenshtein Distance Between Two Vectors**

The following code shows how to calculate the Levenshtein distance between the corresponding elements of two vectors (the first string in one vector against the first string in the other, the second against the second, and so on):

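```r
#load stringdist package
library(stringdist)

#define vectors
a <- c('Mavs', 'Spurs', 'Lakers', 'Cavs')
b <- c('Rockets', 'Pacers', 'Warriors', 'Celtics')

#calculate Levenshtein distance between two vectors
stringdist(a, b, method='lv')

[1] 6 4 5 5
```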

The way to interpret the output is as follows:

- The Levenshtein distance between ‘Mavs’ and ‘Rockets’ is **6**.
- The Levenshtein distance between ‘Spurs’ and ‘Pacers’ is **4**.
- The Levenshtein distance between ‘Lakers’ and ‘Warriors’ is **5**.
- The Levenshtein distance between ‘Cavs’ and ‘Celtics’ is **5**.
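Note that the output contains one value per position, not one value per possible pair. If the distance between every string in the first vector and every string in the second is needed, a full distance matrix is the usual tool. The sketch below uses base R's adist() for this so that it runs without extra packages; the stringdist package also offers stringdistmatrix() for the same purpose. The vectors are the team names from the interpretation above, and the name m is only illustrative.

```r
# team names taken from the example above
a <- c('Mavs', 'Spurs', 'Lakers', 'Cavs')
b <- c('Rockets', 'Pacers', 'Warriors', 'Celtics')

# all-pairs Levenshtein distances: rows follow a, columns follow b
m <- utils::adist(a, b)
dimnames(m) <- list(a, b)
m

# the diagonal holds the position-by-position distances
diag(m)
# [1] 6 4 5 5
```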

**Example 3: Levenshtein Distance Between Data Frame Columns**

The following code shows how to calculate the Levenshtein distance between the strings in two columns of a data frame, row by row:

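```r
#load stringdist package
library(stringdist)

#define data
data <- data.frame(a = c('Mavs', 'Spurs', 'Lakers', 'Cavs'),
                   b = c('Rockets', 'Pacers', 'Warriors', 'Celtics'))

#calculate Levenshtein distance
stringdist(data$a, data$b, method='lv')

[1] 6 4 5 5
```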

We could then append the Levenshtein distance as a new column in the data frame if we’d like:

```r
#save Levenshtein distance as vector
lev <- stringdist(data$a, data$b, method='lv')

#append Levenshtein distance as new column
data$lev <- lev

#view data frame
data

       a        b lev
1   Mavs  Rockets   6
2  Spurs   Pacers   4
3 Lakers Warriors   5
4   Cavs  Celtics   5
```
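Once the distance is stored as a column, it can be used like any other numeric variable, for example to keep only the closest matches or to sort rows by similarity. The sketch below rebuilds the data frame with the values shown in the output above so that it runs on its own; the cutoff of 4 and the name close_matches are arbitrary choices for illustration.

```r
# rebuild the data frame, copying the lev values from the output above
data <- data.frame(a = c('Mavs', 'Spurs', 'Lakers', 'Cavs'),
                   b = c('Rockets', 'Pacers', 'Warriors', 'Celtics'),
                   lev = c(6, 4, 5, 5))

# keep only rows at or below an (arbitrary) distance cutoff
close_matches <- data[data$lev <= 4, ]
close_matches

# sort rows from most similar pair to least similar pair
data[order(data$lev), ]
```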

**Additional Resources**

How to Calculate Hamming Distance in R

How to Calculate Euclidean Distance in R

How to Calculate Manhattan Distance in R