Apache Pig DISTINCT Operator
The Apache Pig DISTINCT operator removes duplicate tuples from a relation. It compares whole tuples, not individual fields. To find the duplicates, Pig first sorts the data, so the output does not preserve the original order of the input.
Example of DISTINCT Operator
In this example, we eliminate the duplicate tuples from a relation loaded from a text file.
Steps to execute DISTINCT Operator
- Create a text file on your local machine and add some values to it, including a few duplicate rows.
- Check the values written in the text file.
- Upload the text file to a directory on HDFS.
- Start Pig in MapReduce run mode.
- Load the file that contains the data.
- Execute the statement and verify the loaded data.
- Apply the DISTINCT operator to eliminate the duplicate tuples.
- Execute the statement and verify the result.
Here, we get the desired output: each distinct tuple appears only once.
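The steps above can be sketched as a short Pig script, which can also be typed line by line in the Grunt shell. This is a minimal illustration, not the original tutorial's exact session: the file name `distinct_data.txt`, the HDFS directory `/pigexample`, the sample rows, and the field names `a1`, `a2`, `a3` are all assumptions made for the example.

```pig
-- Prerequisites (steps 1 and 4): a local file named distinct_data.txt
-- (hypothetical name) containing the sample rows below, and Pig started
-- in MapReduce mode with: pig -x mapreduce
--   1,2,3
--   1,2,3
--   4,5,6
--   7,8,9
--   4,5,6

-- Step 2: check the values written in the local file.
sh cat distinct_data.txt

-- Step 3: upload the file to an HDFS directory (hypothetical path).
fs -mkdir -p /pigexample
fs -put distinct_data.txt /pigexample/

-- Step 5: load the comma-separated file; the schema is illustrative.
A = LOAD '/pigexample/distinct_data.txt' USING PigStorage(',')
    AS (a1:int, a2:int, a3:int);

-- Step 6: print the relation; the duplicate tuples are still present.
DUMP A;

-- Step 7: remove duplicate tuples. DISTINCT compares whole tuples.
B = DISTINCT A;

-- Step 8: print the result; each distinct tuple appears once,
-- and the order of the output is not guaranteed.
DUMP B;
```

With the sample rows assumed here, the relation B should hold three tuples, since (1,2,3) and (4,5,6) each occur twice in the input.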