
PySpark SparkConf

by Online Tutorials Library

SparkConf

What is SparkConf?

SparkConf provides the configuration for any Spark application. Before a Spark application can be launched on a local machine or on a cluster, a few configuration settings and parameters have to be defined, and SparkConf is the class used to do so.

Features of SparkConf and their usage

The most commonly used methods of SparkConf when working with PySpark are given below:

  • set(key, value) – sets a configuration property.
  • setMaster(value) – sets the master URL to connect to.
  • setAppName(value) – sets the application name.
  • get(key, defaultValue=None) – gets the configuration value for a key, returning the default if the key is not set.
  • setSparkHome(value) – sets the Spark installation path on worker nodes.

Consider the following example to understand some attributes of SparkConf:
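
The snippet itself is not reproduced on this page, so the code below is a minimal sketch of what it would have looked like; the application name "PySpark Demo App" comes from the output shown next, while the local master URL is an assumption.

from pyspark import SparkConf, SparkContext

# Build the configuration: set an application name and an (assumed) local master URL.
conf = SparkConf().setAppName("PySpark Demo App").setMaster("local[2]")
sc = SparkContext(conf=conf)

# Reading the property back returns the configured application name.
print(repr(conf.get("spark.app.name")))
sc.stop()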

Output:

'PySpark Demo App' 

The first thing any Spark program does is create a SparkContext object, which tells the application how to access a cluster. To accomplish this, a SparkConf is passed in so that the SparkContext object holds the application's configuration information. SparkContext is described in detail below:

SparkContext

What is SparkContext?

The SparkContext is the first and most essential object that gets created when we run any Spark application. Generating the SparkContext is the first step of any Spark driver application, because it is the entry point to all Spark functionality. In the PySpark shell it is available by default as sc.

Note: Keep in mind that the PySpark shell has already created a SparkContext as sc, so trying to create another SparkContext will raise an error.
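
As a small illustration (a sketch, not taken from the original page): only one SparkContext can be active per driver process, so creating a second one fails no matter which variable it is assigned to.

from pyspark import SparkContext

sc = SparkContext(master="local", appName="First App")  # the usual context, like the shell's sc

try:
    # Attempting to create a second SparkContext raises an error.
    other = SparkContext(master="local", appName="Second App")
except ValueError as error:
    print(error)  # e.g. "Cannot run multiple SparkContexts at once; ..."

sc.stop()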

Parameters:

SparkContext accepts the following parameters, described below:

master

The URL of the cluster that Spark connects to.

appName

A name for your job, shown on the cluster web UI.

sparkHome

The directory where Spark is installed on the cluster nodes.

pyFiles

The .zip or .py files to send to the cluster and add to the PYTHONPATH.

environment

The environment variables to set on the worker nodes.

batchSize

The number of Python objects represented as a single Java object. Set it to 1 to disable batching, to 0 to choose the batch size automatically based on object sizes, or to -1 to use an unlimited batch size.

serializer

The serializer used for RDDs.

conf

An object of SparkConf used to set all the Spark properties.

profiler_cls

A class of custom Profiler used to do profiling; the default is pyspark.profiler.BasicProfiler.

Among these parameters, master and appName are the most widely used. The following is the initial code of any PySpark application.
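
The listing itself is not shown on this page; a minimal sketch of that boilerplate (the application name "First App" is only an example) is:

from pyspark import SparkContext

# "local" runs Spark on the local machine; master and appName are passed directly.
sc = SparkContext(master="local", appName="First App")

print(sc.master)   # local
print(sc.appName)  # First App
sc.stop()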

