Is it possible to get the current spark context settings in PySpark?
I'm trying to get the path to spark.worker.dir for the current SparkContext.
If I explicitly set it as a config param, I can read it back out of SparkConf, but is there any way to access the complete config (including all defaults) using PySpark?
My answer directly addresses your question: please provide feedback
– javadba
Jun 13 '15 at 20:38
10 Answers
Yes: sc._conf.getAll()
which uses the method SparkConf.getAll(), accessed via sc._conf (where sc is your SparkContext).
Note the underscore: that makes this tricky. I had to look at the Spark source code to figure it out ;)
But it does work:
In [4]: sc._conf.getAll()
Out[4]:
[(u'spark.master', u'local'),
(u'spark.rdd.compress', u'True'),
(u'spark.serializer.objectStreamReset', u'100'),
(u'spark.app.name', u'PySparkShell')]
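If you want a single setting such as the one from the question, a small sketch like this works (assuming a live SparkContext named sc; the lookup returns None unless spark.worker.dir was explicitly set):
# Turn the list of (key, value) tuples into a dict for easy lookup
conf_dict = dict(sc._conf.getAll())
# Returns None when the property was never explicitly set
print(conf_dict.get("spark.worker.dir"))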
perfect, thank you!
– noli
Aug 10 '15 at 2:56
also, note that the underscore means that the package developers think that accessing this data element isn't a great idea.
– Boris Gorelik
Mar 17 '16 at 9:27
"Note that only values explicitly specified through spark-defaults.conf, SparkConf, or the command line will appear. For all other configuration properties, you can assume the default value is used." (see spark.apache.org/docs/latest/…)
– asmaier
Sep 14 '17 at 8:33
Spark 2.1+
spark.sparkContext.getConf().getAll()
where spark is your SparkSession (gives you a list of (key, value) tuples with all configured settings).
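For example, a quick way to dump the whole thing sorted by key (a sketch, assuming an existing SparkSession named spark):
# Print every explicitly configured property, sorted by key
for key, value in sorted(spark.sparkContext.getConf().getAll()):
    print(key, "=", value)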
This should be the accepted answer.
– chhantyal
Jul 9 at 10:18
@chhantyal no. When the question was asked there was no Spark 2.1. The top answer works for all versions of Spark, especially old ones
– wotanii
Aug 9 at 13:15
Spark 1.6+
sc.getConf.getAll.foreach(println)
1.6.3: >>> sc.getConf.getAll.foreach(println) AttributeError: 'SparkContext' object has no attribute 'getConf'
– dovka
Jan 17 '17 at 11:53
@dovka - I used the same sc.getConf.getAll.foreach(println) as suggested by @ecesena and it worked fine for me (in Scala) - perhaps the syntax is not for Python?
– codeaperature
Feb 25 '17 at 2:30
Not in pyspark 1.6.0 as you can see here: spark.apache.org/docs/1.6.0/api/python/…
– Bradley Kreider
Feb 15 at 18:16
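In PySpark 1.x, where SparkContext has no getConf() method (as the comments above note), one workaround is the private _conf attribute and its toDebugString() method - a sketch, assuming a running SparkContext named sc:
# The private _conf attribute exposes the underlying SparkConf in PySpark;
# toDebugString() prints one explicitly set property per line
print(sc._conf.toDebugString())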
For a complete overview of your Spark environment and configuration I found the following code snippets useful:
SparkContext:
for item in sorted(sc._conf.getAll()): print(item)
Hadoop Configuration:
hadoopConf = {}
iterator = sc._jsc.hadoopConfiguration().iterator()
while iterator.hasNext():
    prop = iterator.next()
    hadoopConf[prop.getKey()] = prop.getValue()
for item in sorted(hadoopConf.items()): print(item)
Environment variables:
import os
for item in sorted(os.environ.items()): print(item)
You can use:
ssc.sparkContext.getConf.getAll
For example, I often have the following at the top of my Spark programs:
logger.info(ssc.sparkContext.getConf.getAll.mkString("\n"))
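A rough PySpark equivalent of that logging line might look like this (a sketch, assuming a SparkSession named spark and the standard logging module):
import logging

logger = logging.getLogger(__name__)
# Log every explicitly configured property, one per line
logger.info("\n".join("%s=%s" % kv for kv in spark.sparkContext.getConf().getAll()))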
Update configuration in Spark 2.3.1
To change the default spark configurations you can follow these steps:
Import the required classes
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
Get the default configurations
spark.sparkContext._conf.getAll()
Update the default configurations
conf = spark.sparkContext._conf.setAll([('spark.executor.memory', '4g'), ('spark.app.name', 'Spark Updated Conf'), ('spark.executor.cores', '4'), ('spark.cores.max', '4'), ('spark.driver.memory','4g')])
Stop the current Spark Session
spark.sparkContext.stop()
Create a Spark Session
spark = SparkSession.builder.config(conf=conf).getOrCreate()
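Putting the steps together, a minimal end-to-end sketch (the property values shown are just placeholders, not recommendations):
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Inspect the current (explicitly set) configuration
print(spark.sparkContext._conf.getAll())

# Build an updated SparkConf from the existing one
conf = spark.sparkContext._conf.setAll([('spark.executor.memory', '4g'),
                                        ('spark.app.name', 'Spark Updated Conf')])

# Restart the session so the new settings take effect
spark.sparkContext.stop()
spark = SparkSession.builder.config(conf=conf).getOrCreate()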
Not sure if you can get all the default settings easily, but specifically for the worker dir, it's quite straightforward:
from pyspark import SparkFiles
print(SparkFiles.getRootDirectory())
Just for the record, the analogous Java version:
Tuple2<String, String>[] conf = sparkConf.getAll();
for (int i = 0; i < conf.length; i++)
    System.out.println(conf[i]);
For Spark 2+ you can also use the following when using Scala:
spark.conf.getAll; //spark as spark session
Unfortunately, no, the Spark platform as of version 2.3.1 does not provide any way to programmatically access the value of every property at run time. It provides several methods to access the values of properties that were explicitly set through a configuration file (like spark-defaults.conf), set through the SparkConf object when you created the session, or set through the command line when you submitted the job, but none of these methods will show the default value for a property that was not explicitly set. For completeness, the best options are:
The Spark UI, at http://<driver>:4040, lists the configuration properties on its Environment tab.
The SparkContext keeps a reference to its configuration, and the configuration provides a getAll method: spark.sparkContext._conf.getAll().
Spark SQL provides the SET command, which returns a table of the explicitly set properties: spark.sql("SET").toPandas(). You can also use SET -v to include a column with the property's description.
(These three methods all return the same data on my cluster.)
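For instance, a quick sketch of the Spark SQL route (assuming an existing SparkSession named spark and that pandas is installed for toPandas()):
# Table of explicitly set properties
props = spark.sql("SET").toPandas()
print(props)

# Same table plus a description column for each property
props_verbose = spark.sql("SET -v").toPandas()
print(props_verbose.head())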
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
No - you can get the conf object but not the things you're looking for. Defaults are not available through SparkConf (they're hardcoded in the sources). And spark.worker.dir sounds like a configuration for the Worker daemon, not something your app would see.
– vanza
Jun 1 '15 at 3:34