spark dataframe groupping does not count nulls

I have a spark DataFrame which is grouped by a column aggregated with a count:

df.groupBy('a').agg(count("a")).show +---------+----------------+ |a |count(a) | +---------+----------------+ | null| 0| | -90| 45684| +---------+----------------+ df.select('a').filter('aisNull').count

returns

warning: there was one feature warning; re-run with -feature for details res9: Long = 26834

which clearly shows that the null values were not counted initially.

What is the reason for this behaviour? I would have expected (if nullat all is contained in the grouping result) to properly see the counts.

null

2 Answers
2

Yes, count applied to a specific column does not count the null-values. If you want to include the null-values, use:

count

df.groupBy('a).agg(count("*")).show

What is the reason for this behaviour?

SQL-92 standard. In particular (emphasis mine):

Let T be the argument or argument source of a <set function specification>.

If COUNT(*) is specified, then the result is the cardinality of T.

Otherwise, let TX be the single-column table that is the result of applying the <value expression> to each row of T and eliminating null values.

If DISTINCT is specified, then let TXA be the result of eliminating redundant duplicate values from TX. Otherwise, let TXA be
TX.

If the COUNT is specified, then the
result is the cardinality of TXA.

By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

搜尋此網誌

Sfyjdyy

spark dataframe groupping does not count nulls

spark dataframe groupping does not count nulls

2 Answers
2

Popular posts from this blog

Firebase Auth - with Email and Password - Check user already registered

Dynamically update html content plain JS

How to determine optimal route across keyboard

spark dataframe groupping does not count nulls

spark dataframe groupping does not count nulls

2 Answers 2

Popular posts from this blog

Firebase Auth - with Email and Password - Check user already registered

Dynamically update html content plain JS

How to determine optimal route across keyboard

2 Answers
2