Is it possible to get the current spark context settings in PySpark?







I'm trying to get the path to spark.worker.dir for the current SparkContext.





If I explicitly set it as a config param, I can read it back out of SparkConf, but is there any way to access the complete config (including all defaults) using PySpark?


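For reference, a minimal PySpark sketch of the explicit-setting case described above (the directory value is just a placeholder):

from pyspark import SparkConf, SparkContext

# Explicitly setting the property makes it readable from SparkConf again.
conf = (SparkConf()
        .setMaster("local")
        .setAppName("conf-demo")
        .set("spark.worker.dir", "/tmp/spark-worker"))
sc = SparkContext(conf=conf)

print(conf.get("spark.worker.dir"))  # visible only because it was set explicitly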





No - you can get the conf object but not the things you're looking for. Defaults are not available through SparkConf (they're hardcoded in the sources). And spark.worker.dir sounds like a configuration for the Worker daemon, not something your app would see.
– vanza
Jun 1 '15 at 3:34







My answer directly addresses your question: please provide feedback.
– javadba
Jun 13 '15 at 20:38




10 Answers



Yes: sc._conf.getAll()



Which uses the method:


SparkConf.getAll()



as accessed by


SparkContext.sc._conf



Note the underscore: that makes this tricky. I had to look at the Spark source code to figure it out ;)



But it does work:


In [4]: sc._conf.getAll()
Out[4]:
[(u'spark.master', u'local'),
 (u'spark.rdd.compress', u'True'),
 (u'spark.serializer.objectStreamReset', u'100'),
 (u'spark.app.name', u'PySparkShell')]





perfect, thank you!
– noli
Aug 10 '15 at 2:56





also, note that the underscore means that the package developers think that accessing this data element isn't a great idea.
– Boris Gorelik
Mar 17 '16 at 9:27





"Note that only values explicitly specified through spark-defaults.conf, SparkConf, or the command line will appear. For all other configuration properties, you can assume the default value is used." (see spark.apache.org/docs/latest/…)
– asmaier
Sep 14 '17 at 8:33



Spark 2.1+



spark.sparkContext.getConf().getAll() where spark is your SparkSession (gives you a list of (key, value) tuples with all configured settings)


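A minimal sketch, assuming an existing (or newly created) SparkSession named spark:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("conf-demo").getOrCreate()

# getAll() returns a list of (key, value) tuples; wrap it in dict() for lookups by key.
conf_map = dict(spark.sparkContext.getConf().getAll())
print(conf_map.get("spark.app.name"))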





This should be the accepted answer.
– chhantyal
Jul 9 at 10:18





@chhantyal no. When the question was asked there was no Spark 2.1. The top answer works for all versions of Spark, especially old ones.
– wotanii
Aug 9 at 13:15



Spark 1.6+


sc.getConf.getAll.foreach(println)





1.6.3: >>> sc.getConf.getAll.foreach(println) AttributeError: 'SparkContext' object has no attribute 'getConf'
– dovka
Jan 17 '17 at 11:53






@dovka - I used the same sc.getConf.getAll.foreach(println) as suggested by @ecesena and it worked fine for me (in Scala) - Perhaps the syntax is not for Python?
– codeaperature
Feb 25 '17 at 2:30








Not in pyspark 1.6.0 as you can see here: spark.apache.org/docs/1.6.0/api/python/…
– Bradley Kreider
Feb 15 at 18:16



For a complete overview of your Spark environment and configuration I found the following code snippets useful:



SparkContext:


for item in sorted(sc._conf.getAll()): print(item)



Hadoop Configuration:


hadoopConf = {}
iterator = sc._jsc.hadoopConfiguration().iterator()
while iterator.hasNext():
    prop = iterator.next()
    hadoopConf[prop.getKey()] = prop.getValue()
for item in sorted(hadoopConf.items()): print(item)



Environment variables:


import os
for item in sorted(os.environ.items()): print(item)



You can use:


ssc.sparkContext.getConf.getAll



For example, I often have the following at the top of my Spark programs:


logger.info(ssc.sparkContext.getConf.getAll.mkString("\n"))
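A comparable PySpark sketch, assuming a SparkSession named spark and the standard logging module:

import logging

from pyspark.sql import SparkSession

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

spark = SparkSession.builder.appName("log-conf-demo").getOrCreate()

# Log every configured (key, value) pair, one per line, at application start.
logger.info("\n".join("%s=%s" % kv for kv in spark.sparkContext.getConf().getAll()))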



Update configuration in Spark 2.3.1



To change the default spark configurations you can follow these steps:



Import the required classes


from pyspark.conf import SparkConf
from pyspark.sql import SparkSession



Get the default configurations


spark.sparkContext._conf.getAll()



Update the default configurations


conf = spark.sparkContext._conf.setAll([('spark.executor.memory', '4g'),
                                        ('spark.app.name', 'Spark Updated Conf'),
                                        ('spark.executor.cores', '4'),
                                        ('spark.cores.max', '4'),
                                        ('spark.driver.memory', '4g')])



Stop the current Spark Session


spark.sparkContext.stop()



Create a Spark Session


spark = SparkSession.builder.config(conf=conf).getOrCreate()
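Putting the steps together, a minimal end-to-end sketch (assuming an existing SparkSession named spark; the memory values are placeholders):

from pyspark.sql import SparkSession

# Start from the current configuration and override a few values.
conf = spark.sparkContext._conf.setAll([
    ('spark.executor.memory', '4g'),
    ('spark.driver.memory', '4g'),
])

# A context's configuration is fixed at creation time, so the running context
# must be stopped before the new settings can take effect.
spark.sparkContext.stop()
spark = SparkSession.builder.config(conf=conf).getOrCreate()

print(dict(spark.sparkContext.getConf().getAll()).get('spark.executor.memory'))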



Not sure if you can get all the default settings easily, but specifically for the worker dir, it's quite straightforward:


from pyspark import SparkFiles
print(SparkFiles.getRootDirectory())



Just for the record, the analogous Java version:


Tuple2<String, String>[] kv = sparkConf.getAll();
for (int i = 0; i < kv.length; i++)
    System.out.println(kv[i]);



For Spark 2+, when using Scala, you can also use:


spark.conf.getAll; //spark as spark session



Unfortunately, no, the Spark platform as of version 2.3.1 does not provide any way to programmatically access the value of every property at run time. It provides several methods to access the values of properties that were explicitly set through a configuration file (like spark-defaults.conf), set through the SparkConf object when you created the session, or set through the command line when you submitted the job, but none of these methods will show the default value for a property that was not explicitly set. For completeness, the best options are:


The Spark application web UI, usually at http://<driver>:4040, has an "Environment" tab with a table of the property values.

The SparkContext keeps a hidden reference to its configuration in PySpark, and the configuration provides a getAll method: spark.sparkContext._conf.getAll().

Spark SQL provides the SET command, which returns a table of property values: spark.sql("SET").toPandas(). You can also use SET -v to include a column with each property's description.



(These three methods all return the same data on my cluster.)
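For instance, a quick sketch of the Spark SQL route (assuming pandas is installed so that toPandas() works):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("set-demo").getOrCreate()

# SET returns the explicitly configured properties as key/value rows.
print(spark.sql("SET").toPandas())

# SET -v adds a column describing each property.
print(spark.sql("SET -v").toPandas())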






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
