How to aggregate custom application logs in Spark on HDInsight?
CONTEXT
I want to configure custom logging in an application written in Python and running on an HDInsight Spark cluster (hence Hortonworks-style).
HDInsight cluster type: Spark 2.2 on Linux (HDI 3.6), Spark version: 2.2.0.2.6.3.2-13
My requirements are as follows:
RESEARCH
I managed to modify log4j.properties, creating a custom log appender and a logger that uses it. It writes to a file, but I'm failing to make YARN aggregate the logs.
When I tried to use the standard $spark.yarn.app.container.log.dir/filename.log, it got resolved to /filename.log and returned a permission denied error, both in pyspark and using spark-submit, but the file filename.log appeared in the RM UI (it was empty though).
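The /filename.log result is consistent with the property being undefined in the JVM that configures logging: log4j 1.x replaces a placeholder it cannot resolve with an empty string. The following is a simplified model of that substitution, not log4j's own code; it assumes the placeholder is written in the ${...} form, and the directory value used in the second call is purely illustrative.

```python
import re


def substitute(value, props):
    """Replace each ${key} with props[key]; unknown keys become ''."""
    return re.sub(r"\$\{([^}]*)\}", lambda m: props.get(m.group(1), ""), value)


template = "${spark.yarn.app.container.log.dir}/filename.log"

# Property not defined: the directory part vanishes and the file
# ends up at the filesystem root, where writing is usually denied.
unset_path = substitute(template, {})

# Property defined (illustrative value): the file lands in that directory.
set_path = substitute(
    template,
    {"spark.yarn.app.container.log.dir": "/var/log/hadoop-yarn/container/app/ctr"},
)
```

If this is what happens on the cluster, the permission denied error would come from the attempt to create a file directly under /.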
The path spark.yarn.app.container.log.dir should normally look like this: /var/log/hadoop-yarn/container/<applicationId>/<containerId>, e.g.: /var/log/hadoop-yarn/container/application_1504924099862_7571/container_e16_1504924099862_7571_01_000005. So the solution I was considering is to set the appender destination file from within the application, using either the value of spark.yarn.app.container.log.dir or the applicationId and containerId.
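Given a directory with the layout described above, the two identifiers can be recovered from its last two path components. This is a minimal sketch; the helper name split_container_log_dir is hypothetical, and the prefix checks assume the application_ and container_ naming seen in the example path.

```python
import posixpath


def split_container_log_dir(log_dir):
    """Return (application_id, container_id) from .../<applicationId>/<containerId>."""
    parent, container_id = posixpath.split(log_dir.rstrip("/"))
    application_id = posixpath.basename(parent)
    # Fail loudly if the path does not follow the expected layout.
    if not application_id.startswith("application_"):
        raise ValueError("unexpected application directory: %r" % application_id)
    if not container_id.startswith("container_"):
        raise ValueError("unexpected container directory: %r" % container_id)
    return application_id, container_id


app_id, container_id = split_container_log_dir(
    "/var/log/hadoop-yarn/container/"
    "application_1504924099862_7571/"
    "container_e16_1504924099862_7571_01_000005"
)
```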
In both cases I don't know how to do it in Python: spark.yarn.app.container.log.dir looks unset (sc._conf.getAll() doesn't contain it) and I don't know where to look for containerId, other than extracting it from the spark.yarn.app.container.log.dir path.
I managed to obtain spark.yarn.app.container.log.dir in Scala thanks to "How do I get the YARN ContainerId from inside the container?", but it returns multiple paths, so I'm not sure if it is usable.
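One possible way to cope with a value that holds several paths is to split it and pick one directory explicitly. The sketch below rests on assumptions that are not verified on HDInsight: that YARN exports the container log directories in a LOG_DIRS environment variable, that multiple directories are comma-separated, and that the Python process inherits that environment. The helper names are hypothetical.

```python
import posixpath


def container_log_dirs(environ):
    """Split a LOG_DIRS-style value into a list of directories (assumed comma-separated)."""
    raw = environ.get("LOG_DIRS", "")
    return [part.strip() for part in raw.split(",") if part.strip()]


def pick_log_file(filename, environ):
    """Return a file path under the first listed log directory, or None if none is set."""
    dirs = container_log_dirs(environ)
    return posixpath.join(dirs[0], filename) if dirs else None


# Illustrative environment; inside a real container one would pass os.environ.
fake_env = {"LOG_DIRS": "/disk1/container_01, /disk2/container_01"}
chosen = pick_log_file("filename.log", fake_env)
missing = pick_log_file("filename.log", {})
```

Picking the first entry is an arbitrary choice here; whether files written to any one of the listed directories get aggregated would need to be checked on the cluster.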
QUESTIONS
Is it possible that spark.yarn.app.container.log.dir has different values from the Scala and Python APIs?
How can I read the value of spark.yarn.app.container.log.dir in pyspark, knowing that I can do this using System.getProperty("spark.yarn.app.container.log.dir") in Scala?
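For reference, the Scala call can be mirrored from pyspark through the Py4J gateway that a SparkContext exposes. This is a hedged sketch: sc._jvm is a private PySpark attribute, the helper name jvm_system_property is hypothetical, and the call reads the driver JVM only, so it may well return None when the driver does not run inside a YARN container.

```python
def jvm_system_property(sc, name):
    """Read a Java system property from the driver JVM behind a SparkContext.

    Returns None when the property is not set in that JVM.
    """
    # sc._jvm is PySpark's view of the JVM via Py4J (private API).
    return sc._jvm.java.lang.System.getProperty(name)


# Usage inside a pyspark session, where `sc` already exists:
#   log_dir = jvm_system_property(sc, "spark.yarn.app.container.log.dir")
```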
Can I make YARN aggregate logs from a custom appender without using spark.yarn.app.container.log.dir?