Databricks Spark notebook re-using Scala objects between runs?


I have written an Azure Databricks Scala notebook (based on a JAR library), and I run it with a Databricks job once every hour.



In the code, I use the Application Insights Java SDK for log tracing and initialize a GUID that marks the "RunId". I do this in a Scala object constructor:


import com.microsoft.applicationinsights.{TelemetryClient, TelemetryConfiguration}

object AppInsightsTracer {
  TelemetryConfiguration.getActive().setInstrumentationKey("...")
  val tracer = new TelemetryClient()
  val properties = new java.util.HashMap[String, String]()
  properties.put("RunId", java.util.UUID.randomUUID.toString)

  def trackEvent(name: String): Unit = {
    tracer.trackEvent(name, properties, null)
  }
}




The notebook itself simply calls the code in the JAR:


import com.mypackage._
Flow.go()



I expect to get a different "RunId" every hour. The weird behavior I am seeing is that all runs log exactly the same "RunId"!
It is as if the Scala object constructor code runs exactly once and is re-used between notebook runs...



Do Spark/Databricks notebooks retain context between runs? If so, how can this be avoided?
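
For reference, here is a minimal standalone sketch (plain Scala, no Databricks) of the behavior I suspect: a Scala object's body is evaluated only once per JVM, the first time the object is accessed, so the GUID is fixed for the lifetime of that JVM.

object RunIdHolder {
  // A Scala object's body runs once per JVM, on first access.
  val runId: String = java.util.UUID.randomUUID.toString
}

object OncePerJvmDemo extends App {
  // Both lines print the same UUID, because the initializer above ran only once.
  println(RunIdHolder.runId)
  println(RunIdHolder.runId)
}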




1 Answer



You start with a new context every time you refresh the notebook.



I would recommend saving your RunId to a file on disk, reading that file at the start of every notebook run, and then incrementing the RunId in the file.
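
A minimal sketch of that idea, assuming the driver can reach DBFS through the local /dbfs mount (the path and object name below are just examples, not part of your code):

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object RunIdStore {
  // Example path only; on Databricks the /dbfs mount exposes DBFS as a local file system on the driver.
  private val runIdFile = Paths.get("/dbfs/tmp/myjob/run_id.txt")

  // Read the previous RunId (0 if the file is missing), increment it,
  // persist the new value, and return it for this run's telemetry.
  def nextRunId(): Long = {
    val previous =
      if (Files.exists(runIdFile))
        new String(Files.readAllBytes(runIdFile), StandardCharsets.UTF_8).trim.toLong
      else 0L
    val next = previous + 1
    Files.createDirectories(runIdFile.getParent)
    Files.write(runIdFile, next.toString.getBytes(StandardCharsets.UTF_8))
    next
  }
}

Calling something like RunIdStore.nextRunId() at the start of Flow.go(), instead of generating the GUID in the object initializer, would give each hourly run its own id.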






