
Apache Spark Distinct Function

by Online Tutorials Library

Spark Distinct Function

In Spark, the distinct() function returns a new dataset that contains only the distinct elements of the source dataset.

Example of Distinct function

In this example, we ignore the duplicate elements and retrieve only the distinct elements.

  • To open Spark in Scala mode, start the Spark shell with the spark-shell command.

  • Create an RDD from a parallelized collection.
  • Read back the generated result to confirm the contents of the RDD.

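A minimal sketch of creating and reading the RDD, assuming it is run inside the Spark shell, where the SparkContext is already available as sc. The name data and the sample values are illustrative assumptions, not taken from the original.

```scala
// Assumes spark-shell, which predefines the SparkContext as `sc`.
// `data` and its values are hypothetical sample input containing a duplicate.
val data = sc.parallelize(List(10, 20, 20, 40))

// collect() is an action that returns all elements of the RDD
// to the driver program as an Array.
val original = data.collect()
println(original.mkString(", "))
```

At this point the duplicate value is still present in the collected array.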

  • Apply the distinct() function to drop the duplicate elements.
  • Read back the generated result to confirm that only distinct elements remain.

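A minimal sketch of applying distinct(), again assuming the Spark shell with its predefined sc. The names data, distinctData and result and the sample values are illustrative assumptions. Because distinct() does not guarantee the order of the returned elements, the sketch sorts the collected array before printing.

```scala
// Assumes spark-shell, which predefines the SparkContext as `sc`.
// `data` is hypothetical sample input containing a duplicate value.
val data = sc.parallelize(List(10, 20, 20, 40))

// distinct() is a transformation: it returns a new RDD without duplicates
// and leaves the source RDD unchanged.
val distinctData = data.distinct()

// collect() is an action that triggers the computation.
// Element order after distinct() is not guaranteed, so sort for display.
val result = distinctData.collect().sorted
println(result.mkString(", "))  // 10, 20, 40
```

Since distinct() has to compare elements across partitions, it involves a shuffle, which can be costly on large datasets.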

The result contains only the distinct elements: each value that was duplicated in the source RDD now appears once.

