How to do 2 tests when filtering an RDD in PySpark?


I have 2 parameters: NB_line = 10 and NB2_line = 11.

I have a Python function in which I test the number of fields in each line of my RDD, to find the lines that are not OK.
A valid line can now have one of two lengths: NB_line = 10 or NB2_line = 11.


NB_line=10

NB2_line=11

At first, my filter looked like this:

rddLignesErreur=rddstats.filter(lambda x : len(x) != NB_line)

After the use case changed, I modified it like this:

rddLignesErreur=rddstats.filter(lambda x : len(x) != NB_line or len(x) != NB2_line)

Is this correct or not? I'm a beginner in Python.

Thank you

The or is syntactically fine: within a lambda expression, you write plain Python code. Note, however, that if NB_line and NB2_line are different, your condition will always be true.
– Oli
Aug 6 at 9:58
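A quick check in plain Python (no Spark needed) illustrates the comment above. The helper names is_error_or and is_error_and are hypothetical, introduced only for this sketch: the first reproduces the condition from the question, the second combines the two tests with and, so that a length counts as an error only when it matches neither valid size.

```python
# The two valid record lengths from the question.
NB_line = 10
NB2_line = 11

def is_error_or(n):
    # Condition from the question: since n can never equal 10 and 11
    # at the same time, at least one "!=" is always true.
    return n != NB_line or n != NB2_line

def is_error_and(n):
    # Combined with "and": n is an error only if it matches
    # neither NB_line nor NB2_line.
    return n != NB_line and n != NB2_line

for n in (9, 10, 11, 12):
    print(n, is_error_or(n), is_error_and(n))
```

The or version prints True for every length, including the valid 10 and 11, while the and version prints False only for 10 and 11.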


1 Answer

Why not just use not in?


rddLignesErreur = rddstats.filter(lambda x: len(x) not in (NB_line, NB2_line))
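The predicate can be tried locally without a Spark cluster. The sketch below assumes each record is a list of fields (the sample records are made up) and uses Python's built-in filter on a list as a stand-in for the RDD. Note the parentheses around the two values: without them, the comma ends the lambda, so NB2_line becomes a separate argument to filter().

```python
NB_line = 10
NB2_line = 11

# Made-up sample records standing in for rddstats: each record is a list of fields.
records = [
    ["f"] * 10,  # valid: 10 fields
    ["f"] * 11,  # valid: 11 fields
    ["f"] * 9,   # error: too few fields
    ["f"] * 12,  # error: too many fields
]

# Keep only the records whose length is neither 10 nor 11.
errors = list(filter(lambda x: len(x) not in (NB_line, NB2_line), records))
print([len(r) for r in errors])  # [9, 12]
```

The same lambda can be passed unchanged to rddstats.filter, because RDD.filter accepts any Python function that returns a boolean for each element.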

Thank you, I will try it.
– vero
Aug 6 at 10:25
