spark.sql vs SQLContext




I have used SQL in Spark, as in this example:


results = spark.sql("select * from ventas")



where ventas is a DataFrame, previously cataloged as a table:


df.createOrReplaceTempView('ventas')



but I have seen other ways of working with SQL in Spark, using the class SQLContext:


df = sqlContext.sql("SELECT * FROM table")



What is the difference between the two?



Thanks in advance




2 Answers



SparkSession is now the preferred entry point for working with Spark. The functionality of both HiveContext and SQLContext is available through this single SparkSession object.



By creating the view with df.createOrReplaceTempView('ventas'), you are already using the current syntax.



From a user's perspective (not a contributor's), I can only rehash what the developers provided in the upgrade notes:



Before 2.0, obtaining a SQLContext needed an extra call to the factory that creates it. With SparkSession, they made things a lot more convenient.





If you take a look at the source code, you'll notice that the SQLContext class is mostly marked @deprecated. Closer inspection shows that the most commonly used methods simply call sparkSession.





For more info, take a look at the developer notes, Jira issues, conference talks on Spark 2.0, and the Databricks blog.





