Home » Pandas: How to Sort DataFrame Based on String Column

Pandas: How to Sort DataFrame Based on String Column

by Tutor Aspire

You can use the following methods to sort the rows of a pandas DataFrame based on the values in a particular string column:

Method 1: Sort by String Column (when column only contains characters)

df = df.sort_values('my_string_column')

Method 2: Sort by String Column (when column contains characters and digits)

#create 'sort' column that contains digits from 'my_string_column'
df['sort'] = df['my_string_column'].str.extract('(d+)', expand=False).astype(int)

#sort rows based on digits in 'sort' column
df = df.sort_values('sort')

The following examples show how to use each method in practice.

Example 1: Sort by String Column (when column only contains characters)

Suppose we have the following pandas DataFrame that contains information about the sales of various products at some grocery store:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'product': ['Apples', 'Oranges', 'Bananas', 'Lettuce', 'Beans'],
                   'sales': [18, 22, 19, 14, 29]})

#view DataFrame
print(df)

   product  sales
0   Apples     18
1  Oranges     22
2  Bananas     19
3  Lettuce     14
4    Beans     29

We can use the following syntax to sort the rows of the DataFrame based on the strings in the product column:

#sort rows from A to Z based on string in 'product' column
df = df.sort_values('product')

#view updated DataFrame
print(df)

   product  sales
0   Apples     18
2  Bananas     19
4    Beans     29
3  Lettuce     14
1  Oranges     22

Notice that the rows are now sorted from A to Z based on the strings in the product column.

If you’d like to instead sort from Z to A, simply add the argument ascending=False:

#sort rows from Z to A based on string in 'product' column
df = df.sort_values('product', ascending=False)

#view updated DataFrame
print(df)

   product  sales
1  Oranges     22
3  Lettuce     14
4    Beans     29
2  Bananas     19
0   Apples     18

Notice that the rows are now sorted from Z to A based on the strings in the product column.

Example 2: Sort by String Column (when column contains characters and digits)

Suppose we have the following pandas DataFrame that contains information about the sales of various products at some grocery store:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'product': ['A3', 'A5', 'A22', 'A50', 'A2', 'A7', 'A9', 'A13'],
                   'sales': [18, 22, 19, 14, 14, 11, 20, 28]})

#view DataFrame
print(df)

  product  sales
0      A3     18
1      A5     22
2     A22     19
3     A50     14
4      A2     14
5      A7     11
6      A9     20
7     A13     28

Notice that the strings in the product column contain both characters and digits.

If we attempt to sort the rows of the DataFrame using the values in the product column, the strings will not be sorted in the correct order based on the digits:

import pandas as pd

#sort rows based on strings in 'product' column
df = df.sort_values('product')

#view updated DataFrame
print(df)

  product  sales
7     A13     28
4      A2     14
2     A22     19
0      A3     18
1      A5     22
3     A50     14
5      A7     11
6      A9     20

Instead, we must create a new temporary column called sort that contains only the digits from the product column, then sort by the values in the sort column, then drop the column entirely:

import pandas as pd

#create new 'sort' column that contains digits from 'product' column
df['sort'] = df['product'].str.extract('(d+)', expand=False).astype(int)

#sort rows based on digits in 'sort' column
df = df.sort_values('sort')

#drop 'sort' column
df = df.drop('sort', axis=1)

#view updated DataFrame
print(df)

  product  sales
4      A2     14
0      A3     18
1      A5     22
5      A7     11
6      A9     20
7     A13     28
2     A22     19
3     A50     14

Notice that the rows are now sorted by the strings in the product column and the digits are sorted in the correct order.

Note: You can find the complete documentation for the pandas sort_values() function here.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

Pandas: How to Sort by Date
Pandas: How to Sort Columns by Name
Pandas: How to Sort by Both Index and Column

You may also like