How to test two conditions when filtering an RDD in PySpark?







I have 2 parameters:


NB_line = 10
NB2_line = 11



I have a Python function that flags the lines of my RDD whose length is not OK.
A line can now have one of two valid lengths: NB_line = 10 or NB2_line = 11.





Initially, my filter looked like this:


rddLignesErreur=rddstats.filter(lambda x : len(x) != NB_line)



After the use case evolved, I changed it to this:


rddLignesErreur=rddstats.filter(lambda x : len(x) != NB_line or len(x) != NB2_line)



Is this correct or not? I'm a beginner in Python.



Thank you





The or keyword itself is valid. Within a lambda expression, you have to write plain Python code. Note also that if NB_line and NB2_line are different, your condition will always be true, since no length can equal both values at once.
– Oli
Aug 6 at 9:58
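To illustrate the comment above, here is a minimal sketch in plain Python (no Spark needed) comparing the or condition from the question with an and version. The helper names is_error_or and is_error_and and the sample rows are hypothetical, chosen only for this illustration.

```python
NB_line = 10
NB2_line = 11

def is_error_or(x):
    # Condition from the question: true for every length,
    # because no length can equal both 10 and 11.
    return len(x) != NB_line or len(x) != NB2_line

def is_error_and(x):
    # Alternative: true only when the length matches neither value.
    return len(x) != NB_line and len(x) != NB2_line

# Sample rows with 9, 10, 11 and 12 fields (made-up data).
rows = [["a"] * n for n in (9, 10, 11, 12)]

flagged_or = [len(r) for r in rows if is_error_or(r)]
flagged_and = [len(r) for r in rows if is_error_and(r)]

print(flagged_or)   # [9, 10, 11, 12]
print(flagged_and)  # [9, 12]
```

With or, every row is flagged as an error, including the valid ones. With and, only the rows of length 9 and 12 are flagged.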







1 Answer



Why not just use not in?




lambda x: len(x) not in (NB_line, NB2_line)
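As a rough sketch of how this predicate behaves, the example below applies it to an ordinary Python list with the built-in filter. The names VALID_LENGTHS and is_error and the sample rows are assumptions made for illustration. The same function could presumably be passed to rddstats.filter from the question.

```python
NB_line = 10
NB2_line = 11

# Hypothetical helper: the set of accepted lengths, grouped in a tuple.
# The parentheses matter: "not in" needs a container on its right-hand side.
VALID_LENGTHS = (NB_line, NB2_line)

def is_error(x):
    # True when the length matches none of the accepted values.
    return len(x) not in VALID_LENGTHS

# Sample rows with 9, 10, 11 and 12 fields (made-up data).
rows = [list(range(n)) for n in (9, 10, 11, 12)]

error_lengths = [len(r) for r in filter(is_error, rows)]
print(error_lengths)  # [9, 12]

# With the RDD from the question, the equivalent call would be:
# rddLignesErreur = rddstats.filter(is_error)
```

Keeping the accepted lengths in one tuple also means a third valid length can be added later without touching the condition.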





Thank you, I will try it.
– vero
Aug 6 at 10:25





