Filter a data frame by applying a cutoff on multiple columns in R







I have a data frame as follows:


     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] A    4    NA   NA   1.55 4    NA
[2,] B    NA   NA   4    0.56 NA   NA
[3,] C    4    4    NA   0.62 4    4
[4,] D    NA   NA   NA   1.61 4    NA
[5,] E    4    NA   NA   0.5  4    NA



What I would like to get as the output after filtering is:


     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[3,] C    4    4    NA   0.62 4    4
[5,] E    4    NA   NA   0.5  4    NA



I would like to keep only the rows that have at least one value equal to 4 in columns 2 to 4 and at least one value equal to 4 in columns 6 to 7.



I was thinking of using the following command, but I am not sure how to use it properly so that it gives the correct output.

Here is the command:


new.df <- df %>%
dplyr::filter_at((vars(c(2:4)), any_vars(. == 4) & vars(c(6:7)), any_vars(. == 4))



Do you have any idea how I can get the desired new.df?
Thanks!





Because of the way you formatted your example data frame, it's impossible to copy it into R to work with. This makes it really hard for us to try to solve your problem. Please use dput to make a version of it we can put directly into R. Take a look at "How to make a great R reproducible example" to see more.
– divibisan
Aug 8 at 19:22
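To illustrate the dput suggestion, here is a minimal sketch that rebuilds the example data and checks that the text printed by dput() parses back into an identical data frame, which is what makes it easy for others to copy. The column names col1 to col7 are an assumption (the question only shows matrix indices); they match the data frame used in the last answer.

```r
# Example data; the names col1..col7 are assumed, not taken from the question.
df <- data.frame(
  col1 = c("A", "B", "C", "D", "E"),
  col2 = c(4, NA, 4, NA, 4),
  col3 = c(NA, NA, 4, NA, NA),
  col4 = c(NA, 4, NA, NA, NA),
  col5 = c(1.55, 0.56, 0.62, 1.61, 0.5),
  col6 = c(4, NA, 4, 4, 4),
  col7 = c(NA, NA, 4, NA, NA),
  stringsAsFactors = FALSE
)

# dput() prints R code that rebuilds the object; this is the text to paste
# into a question.
dput(df)

# Round trip: parse the printed text and compare with the original.
txt <- paste(capture.output(dput(df)), collapse = "\n")
df_copy <- eval(parse(text = txt))
identical(df, df_copy)
```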






3 Answers



In base R you could do something like:


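# keep rows with at least one 4 in columns 2-4 AND at least one 4 in columns 6-7
df[rowSums(df[2:4] == 4, na.rm = TRUE) > 0 & rowSums(df[6:7] == 4, na.rm = TRUE) > 0, ]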
col1 col2 col3 col4 col5 col6 col7
1 A 4 NA NA 1.55 4 NA
3 C 4 4 NA 0.62 4 4
5 E 4 NA NA 0.50 4 NA
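To see why the base R version works, the same filter can be broken into steps: df[2:4] == 4 gives a logical matrix that is NA wherever the data is NA, rowSums(..., na.rm = TRUE) counts the 4s in each row, and > 0 turns those counts into "at least one" flags. A minimal sketch, assuming the col1 to col7 names; the intermediate names (hits_2_4, in_2_4, keep, and so on) are only for illustration.

```r
df <- data.frame(
  col1 = c("A", "B", "C", "D", "E"),
  col2 = c(4, NA, 4, NA, 4),
  col3 = c(NA, NA, 4, NA, NA),
  col4 = c(NA, 4, NA, NA, NA),
  col5 = c(1.55, 0.56, 0.62, 1.61, 0.5),
  col6 = c(4, NA, 4, 4, 4),
  col7 = c(NA, NA, 4, NA, NA),
  stringsAsFactors = FALSE
)

# Logical matrices: TRUE where the value is 4, NA where the value is missing.
hits_2_4 <- df[2:4] == 4
hits_6_7 <- df[6:7] == 4

# Count the 4s in each row, ignoring NAs, and flag rows with at least one.
in_2_4 <- rowSums(hits_2_4, na.rm = TRUE) > 0
in_6_7 <- rowSums(hits_6_7, na.rm = TRUE) > 0

# Both flags must be TRUE (element-wise AND); then subset the rows.
keep <- in_2_4 & in_6_7
new.df <- df[keep, ]
new.df
```

Because na.rm = TRUE turns an all-NA row into a count of 0, row D is dropped rather than producing NA in the row index.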



A slightly different use of dplyr:




library(dplyr)

df %>%
  filter_at(vars(col2, col3, col4), any_vars(. == 4)) %>%
  filter_at(vars(col6, col7), any_vars(. == 4))

col1 col2 col3 col4 col5 col6 col7
1 A 4 NA NA 1.55 4 NA
2 C 4 4 NA 0.62 4 4
3 E 4 NA NA 0.50 4 NA



With column positions:


df %>%
filter_at(c(2,3,4), any_vars(. == 4)) %>%
filter_at(c(6,7), any_vars(. == 4))
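In newer versions of dplyr (1.0.4 and later), the superseded filter_at()/any_vars() pair can be replaced by if_any(), which lets both groups go into one filter() joined with &. A minimal sketch, assuming the col1 to col7 names and a dplyr version that provides if_any():

```r
library(dplyr)  # if_any() requires dplyr >= 1.0.4

df <- data.frame(
  col1 = c("A", "B", "C", "D", "E"),
  col2 = c(4, NA, 4, NA, 4),
  col3 = c(NA, NA, 4, NA, NA),
  col4 = c(NA, 4, NA, NA, NA),
  col5 = c(1.55, 0.56, 0.62, 1.61, 0.5),
  col6 = c(4, NA, 4, 4, 4),
  col7 = c(NA, NA, 4, NA, NA),
  stringsAsFactors = FALSE
)

# if_any() is TRUE for a row when the condition holds in at least one of the
# selected columns; & requires both column groups to have a match.
new.df <- df %>%
  filter(if_any(col2:col4, ~ .x == 4) & if_any(col6:col7, ~ .x == 4))
new.df
```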





Thanks for your response. I have tried this code. I want to do an AND operation, something like ((col1 | col2 | col3) == 4) & ((col6 | col7) == 4). Is there any way I can do it with dplyr?
– yas.f
Aug 8 at 19:50





Adam Warner posted a solution which is exactly like that.
– tmfmnk
Aug 8 at 20:11
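On the question in the comments: piping one filter into another already performs an AND, since the second filter only sees the rows that passed the first. A minimal check of that equivalence, written with plain filter() and the assumed col1 to col7 names:

```r
library(dplyr)

df <- data.frame(
  col1 = c("A", "B", "C", "D", "E"),
  col2 = c(4, NA, 4, NA, 4),
  col3 = c(NA, NA, 4, NA, NA),
  col4 = c(NA, 4, NA, NA, NA),
  col5 = c(1.55, 0.56, 0.62, 1.61, 0.5),
  col6 = c(4, NA, 4, 4, 4),
  col7 = c(NA, NA, 4, NA, NA),
  stringsAsFactors = FALSE
)

# Two chained filters: a row must pass the first AND the second.
chained <- df %>%
  filter(col2 == 4 | col3 == 4 | col4 == 4) %>%
  filter(col6 == 4 | col7 == 4)

# One filter with both conditions joined by &.
combined <- df %>%
  filter((col2 == 4 | col3 == 4 | col4 == 4) & (col6 == 4 | col7 == 4))

identical(chained, combined)
```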




I am not sure what the problem is, unless it is too verbose for you and you want a way to avoid naming the columns.


df = data.frame(col1 = c("A", "B", "C", "D", "E"),
col2 = c(4, NA, 4, NA, 4),
col3 = c(NA, NA, 4, NA, NA),
col4 = c(NA, 4, NA, NA, NA),
col5 = c(1.55, 0.56, 0.62, 1.61, 0.5 ),
col6 = c(4, NA, 4, 4, 4),
col7 = c(NA, NA, 4, NA, NA))

library(dplyr)

df %>% filter((col2 == 4 | col3 == 4 | col4 == 4) & (col6 == 4 | col7 == 4))



Which produces:


col1 col2 col3 col4 col5 col6 col7
1 A 4 NA NA 1.55 4 NA
2 C 4 4 NA 0.62 4 4
3 E 4 NA NA 0.50 4 NA
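This works even though the data contain NA because R's | and & use three-valued logic: TRUE | NA is TRUE, while FALSE | NA and NA & TRUE are NA, and filter() drops rows whose condition is NA. A small base R sketch of the per-row conditions, assuming the same df; the names cond_2_4, cond_6_7, cond and kept are only for illustration.

```r
df <- data.frame(
  col1 = c("A", "B", "C", "D", "E"),
  col2 = c(4, NA, 4, NA, 4),
  col3 = c(NA, NA, 4, NA, NA),
  col4 = c(NA, 4, NA, NA, NA),
  col5 = c(1.55, 0.56, 0.62, 1.61, 0.5),
  col6 = c(4, NA, 4, 4, 4),
  col7 = c(NA, NA, 4, NA, NA),
  stringsAsFactors = FALSE
)

# Per-row result of each group; NA appears where no column is 4 but some are NA.
cond_2_4 <- with(df, col2 == 4 | col3 == 4 | col4 == 4)
cond_6_7 <- with(df, col6 == 4 | col7 == 4)
cond <- cond_2_4 & cond_6_7

data.frame(id = df$col1, cond_2_4, cond_6_7, cond)

# which() keeps only the TRUE positions, dropping NA the same way filter() does.
kept <- df[which(cond), ]
kept
```

Row B shows the effect: cond_2_4 is TRUE but cond_6_7 is NA, so cond is NA and the row is dropped.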





