Pandas DataFrame.drop_duplicates()

by Online Tutorials Library July 14, 2022

The drop_duplicates() method handles a common data-cleaning task: removing duplicate rows from a DataFrame. By default, two rows count as duplicates when they match in every column.

Syntax

  DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)

Parameters

subset: A column label or a list of column labels. Only these columns are compared when identifying duplicates. The default, None, compares all columns.
keep: Controls which of the duplicate rows is kept. It accepts three values:
- 'first': Drops duplicates except for the first occurrence. This is the default.
- 'last': Drops duplicates except for the last occurrence.
- False: Drops all duplicates, keeping none of the occurrences.
inplace: A boolean. The default is False, which returns a new DataFrame and leaves the original unchanged. If True, the duplicate rows are removed from the original DataFrame and the method returns None.
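The effect of subset and keep is easier to see side by side. The following is a minimal sketch, assuming a small sample DataFrame; the City column and the variable names are hypothetical and only added for illustration.

```python
import pandas as pd

# Hypothetical sample data: the Name/Age values plus an illustrative City column.
info = pd.DataFrame({
    "Name": ["Parker", "Smith", "William", "Parker"],
    "Age": [21, 32, 29, 21],
    "City": ["Austin", "Boston", "Chicago", "Denver"],
})

# With subset=None (the default) every column is compared.
# No two rows match in all three columns, so nothing is dropped.
all_columns = info.drop_duplicates()

# Compare only Name and Age: rows 0 and 3 are now duplicates of each other.
keep_first = info.drop_duplicates(subset=["Name", "Age"], keep="first")
keep_last = info.drop_duplicates(subset=["Name", "Age"], keep="last")
keep_none = info.drop_duplicates(subset=["Name", "Age"], keep=False)

print(len(all_columns))           # 4
print(keep_first.index.tolist())  # [0, 1, 2]
print(keep_last.index.tolist())   # [1, 2, 3]
print(keep_none.index.tolist())   # [1, 2]
```

Note that keep=False removes both Parker rows, while 'first' and 'last' each retain one of them.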

Return

Returns a DataFrame with the duplicate rows removed, or None if inplace=True.
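The difference between the two return behaviours can be checked directly. This is a short sketch, assuming the same Name/Age sample data as the example in this article; the variable names deduped and result are illustrative.

```python
import pandas as pd

info = pd.DataFrame({
    "Name": ["Parker", "Smith", "William", "Parker"],
    "Age": [21, 32, 29, 21],
})
original_len = len(info)

# Default (inplace=False): a new DataFrame is returned and info is left untouched.
deduped = info.drop_duplicates()
print(len(info), len(deduped))  # 4 3

# inplace=True: info itself is modified and the call returns None.
result = info.drop_duplicates(inplace=True)
print(result)     # None
print(len(info))  # 3
```

Because the in-place call returns None, assigning its result back to the DataFrame variable would discard the data.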

Example

  import pandas as pd
  emp = {"Name": ["Parker", "Smith", "William", "Parker"],
         "Age": [21, 32, 29, 21]}
  info = pd.DataFrame(emp)
  print(info)

Output

        Name  Age
  0   Parker   21
  1    Smith   32
  2  William   29
  3   Parker   21

  import pandas as pd
  emp = {"Name": ["Parker", "Smith", "William", "Parker"],
         "Age": [21, 32, 29, 21]}
  info = pd.DataFrame(emp)
  # Drop rows that repeat an earlier row in every column
  info = info.drop_duplicates()
  print(info)

Output

        Name  Age
  0   Parker   21
  1    Smith   32
  2  William   29
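In the output above the remaining index labels happen to be consecutive because the dropped row was the last one. The surviving rows keep their original labels, so other choices of keep can leave gaps. The sketch below, assuming the same sample data, shows this and one way to renumber the rows with reset_index(); the variable names are illustrative.

```python
import pandas as pd

info = pd.DataFrame({
    "Name": ["Parker", "Smith", "William", "Parker"],
    "Age": [21, 32, 29, 21],
})

# keep="last" drops row 0, so the surviving rows keep labels 1, 2, 3.
last_kept = info.drop_duplicates(keep="last")
print(last_kept.index.tolist())   # [1, 2, 3]

# reset_index(drop=True) renumbers the rows from 0 and discards the old labels.
renumbered = last_kept.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```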
