Filter a data frame by applying a cutoff on multiple columns in R
I have a data frame as follows:
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] A 4 NA NA 1.55 4 NA
[2,] B NA NA 4 0.56 NA NA
[3,] C 4 4 NA 0.62 4 4
[4,] D NA NA NA 1.61 4 NA
[5,] E 4 NA NA 0.5 4 NA
What I would like to get as the output after filtering is:
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[3,] C 4 4 NA 0.62 4 4
[5,] E 4 NA NA 0.5 4 NA
I would like to keep the rows that have at least one value equal to 4 in columns 2 to 4 and at least one value equal to 4 in columns 6 to 7.
I was thinking of using the following command, but I am not sure how to write it so that it gives the correct output.
Here is the command:
new.df <- df %>%
dplyr::filter_at((vars(c(2:4)), any_vars(. == 4) & vars(c(6:7)), any_vars(. == 4))
Do you have any idea how I can get the desired new.df?
Thanks!
3 Answers
In base R you could do something like:
df[rowSums(df[2:4] == 4, na.rm = TRUE) > 0 & rowSums(df[6:7] == 4, na.rm = TRUE) > 0, ]
col1 col2 col3 col4 col5 col6 col7
1 A 4 NA NA 1.55 4 NA
3 C 4 4 NA 0.62 4 4
5 E 4 NA NA 0.50 4 NA
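To make the one-liner easier to follow, here is a minimal sketch that unpacks it into steps. The sample data and the column names (col1 to col7) are taken from another answer in this thread; the intermediate names hits_2_to_4, hits_6_to_7 and keep are only illustrative.

```r
# Sample data as posted elsewhere in this thread.
df <- data.frame(
  col1 = c("A", "B", "C", "D", "E"),
  col2 = c(4, NA, 4, NA, 4),
  col3 = c(NA, NA, 4, NA, NA),
  col4 = c(NA, 4, NA, NA, NA),
  col5 = c(1.55, 0.56, 0.62, 1.61, 0.5),
  col6 = c(4, NA, 4, 4, 4),
  col7 = c(NA, NA, 4, NA, NA)
)

# df[2:4] == 4 is a logical matrix; rowSums() counts the TRUE cells per row.
# na.rm = TRUE treats NA as "not a match" instead of propagating NA.
hits_2_to_4 <- rowSums(df[2:4] == 4, na.rm = TRUE)
hits_6_to_7 <- rowSums(df[6:7] == 4, na.rm = TRUE)

# Keep rows with at least one match in each group of columns.
keep <- hits_2_to_4 > 0 & hits_6_to_7 > 0
result <- df[keep, ]
print(result)
```

Spelling out na.rm = TRUE rather than passing a positional T is slightly safer, because T is an ordinary variable that can be reassigned.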
A slightly different use of dplyr:
df %>%
filter_at(vars(col2, col3, col4), any_vars(. == 4)) %>%
filter_at(vars(col6, col7), any_vars(. == 4))
col1 col2 col3 col4 col5 col6 col7
1 A 4 NA NA 1.55 4 NA
2 C 4 4 NA 0.62 4 4
3 E 4 NA NA 0.50 4 NA
With column positions:
df %>%
filter_at(c(2,3,4), any_vars(. == 4)) %>%
filter_at(c(6,7), any_vars(. == 4))
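In current dplyr the scoped verbs such as filter_at() are superseded, and the same condition can be written as a single filter() call that combines two if_any() tests with &. The sketch below assumes dplyr 1.0.4 or later (where if_any() is available) and reuses the sample data posted elsewhere in this thread.

```r
library(dplyr)

# Sample data as posted elsewhere in this thread.
df <- data.frame(
  col1 = c("A", "B", "C", "D", "E"),
  col2 = c(4, NA, 4, NA, 4),
  col3 = c(NA, NA, 4, NA, NA),
  col4 = c(NA, 4, NA, NA, NA),
  col5 = c(1.55, 0.56, 0.62, 1.61, 0.5),
  col6 = c(4, NA, 4, 4, 4),
  col7 = c(NA, NA, 4, NA, NA)
)

# if_any() is TRUE for a row when the test holds in at least one of the
# selected columns. filter() drops rows where the combined condition is
# FALSE or NA, so rows with only NA in a group are removed.
result <- df %>%
  filter(
    if_any(c(col2, col3, col4), ~ .x == 4) &
      if_any(c(col6, col7), ~ .x == 4)
  )
print(result)
```

Column positions such as 2:4 and 6:7 should also work in place of the names inside if_any().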
Thanks for your response. I have tried this code. I want to do an & operation, something like ((col2 | col3 | col4) == 4) & ((col6 | col7) == 4). Is there any way I can do it with dplyr?
– yas.f
Aug 8 at 19:50
Adam Warner posted a solution which is exactly like that.
– tmfmnk
Aug 8 at 20:11
I am not certain what is wrong with writing the conditions out explicitly, unless that is too verbose for you and you want a way to avoid naming the columns.
library(dplyr)  # provides %>% and filter()
df = data.frame(col1 = c("A", "B", "C", "D", "E"),
col2 = c(4, NA, 4, NA, 4),
col3 = c(NA, NA, 4, NA, NA),
col4 = c(NA, 4, NA, NA, NA),
col5 = c(1.55, 0.56, 0.62, 1.61, 0.5 ),
col6 = c(4, NA, 4, 4, 4),
col7 = c(NA, NA, 4, NA, NA))
df %>% filter((col2 == 4 | col3 == 4 | col4 == 4) & (col6 == 4 | col7 == 4))
Which produces:
col1 col2 col3 col4 col5 col6 col7
1 A 4 NA NA 1.55 4 NA
2 C 4 4 NA 0.62 4 4
3 E 4 NA NA 0.50 4 NA
Because of the way you formatted your example data frame, it's impossible to copy it into R to work with. This makes it really hard for us to try to solve your problem. Please use dput to make a version of it we can put directly into R. Take a look at How to make a great R reproducible example to see more.
– divibisan
Aug 8 at 19:22
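As a rough illustration of the dput suggestion, the sketch below prints a copy-pasteable definition of a data frame and checks that evaluating that text rebuilds the same object. The names df_text and df_copy are only illustrative, and the data is the sample posted elsewhere in this thread.

```r
# Sample data as posted elsewhere in this thread.
df <- data.frame(
  col1 = c("A", "B", "C", "D", "E"),
  col2 = c(4, NA, 4, NA, 4),
  col3 = c(NA, NA, 4, NA, NA),
  col4 = c(NA, 4, NA, NA, NA),
  col5 = c(1.55, 0.56, 0.62, 1.61, 0.5),
  col6 = c(4, NA, 4, 4, 4),
  col7 = c(NA, NA, 4, NA, NA)
)

# dput() writes an R expression that recreates the object;
# capture.output() collects that text so it can be pasted into a post.
df_text <- paste(capture.output(dput(df)), collapse = "\n")
cat(df_text, "\n")

# Anyone reading the post can rebuild the data frame from the text.
df_copy <- eval(parse(text = df_text))
print(all.equal(df, df_copy))
```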