RDD Persistence Spark provides a convenient way to work on a dataset by persisting it in memory across operations. While persisting an…
spark components
-
Spark Architecture Spark follows a master-slave architecture. Its cluster consists of a single master and multiple slaves. The Spark architecture depends…
-
RDD Shared Variables In Spark, when a function is passed to a transformation operation, it is executed on a remote cluster node.…
-
Spark Cartesian Function In Spark, the Cartesian function generates a Cartesian product of two datasets and returns all possible combinations of…
-
What is RDD? The RDD (Resilient Distributed Dataset) is Spark’s core abstraction. It is a collection of elements, partitioned across the…
-
Spark reduceByKey Function In Spark, the reduceByKey function is a frequently used transformation operation that performs aggregation of data. It receives key-value…
-
Spark cogroup Function In Spark, the cogroup function operates on different datasets, say (K, V) and (K, W), and returns a…
-
Spark sortByKey Function In Spark, the sortByKey function sorts the elements by key. It receives key-value pairs (K, V) as an input,…
-
Spark Components The Spark project consists of different types of tightly integrated components. At its core, Spark is a computational engine that…
-
Spark Take Function In Spark, the take function behaves like an array. It receives an integer value (say, n) as a…