Order dataframe based on inherent values within column

I would like to capture inherent values of within the dataframe and then order the columns and rows from descending to ascending based on the number of events in each column and row.

Sample Data

#A tibble: 26 x 9 sample_id Gene_A Gene_B Gene_C Gene_D Gene_E Gene_F Gene_G Gene_H <fct> <int> <int> <int> <int> <int> <int> <int> <int> 1 A -1 0 0 0 -1 0 0 -1 2 B 1 0 -1 1 -1 -1 -1 0 3 C 1 0 -1 0 1 0 0 -1 4 D -1 0 0 -1 1 1 -1 1 5 E 1 1 1 1 -1 1 -1 0 6 F -1 -1 1 1 1 -1 0 0 7 G 0 0 -1 -1 0 -1 0 -1 8 H 1 1 1 0 1 -1 -1 0 9 I 0 -1 -1 -1 0 -1 0 1 10 J -1 0 0 1 -1 -1 0 1 # ... with 16 more rows

The final result I am going for is a table that is ordered on the following hierarchy:

Here is the sample output:

# A tibble: 26 x 9 sample_id Gene_B Gene_G Gene_H Gene_A Gene_C Gene_D Gene_F Gene_E * <chr> <int> <int> <int> <int> <int> <int> <int> <int> 1 A 0 0 -1 -1 0 0 0 -1 2 U 0 -1 0 0 0 -1 0 1 3 C 0 0 -1 1 -1 0 0 1 4 G 0 0 -1 0 -1 -1 -1 0 5 W 0 -1 1 1 0 1 0 0 6 Y 0 0 1 1 0 1 1 0 7 I -1 0 1 0 -1 -1 -1 0 8 J 0 0 1 -1 0 1 -1 -1 9 O 0 1 0 0 1 -1 1 1 10 P 1 -1 -1 0 -1 0 0 -1 # ... with 16 more rows

My first though was to take absolute sum and add a column with total for each sample, take absolute sum and add a row with total for each column and then use order.

Generating Sample Data

dummy.tb <- tibble (sample_id = (sample (1:30,30)), Gene_A = (sample (-1:1,30, replace = T)), Gene_B = (sample (-1:1,30, replace = T))) dummy1.tb <- tibble (Gene_C = (sample (-1:1,30, replace = T)), Gene_D = (sample (-1:1,30, replace = T)), Gene_E = (sample (-1:1,30, replace = T))) dummy2.tb <- tibble (Gene_F = (sample (-1:1,30, replace = T)), Gene_G = (sample (-1:1,30, replace = T)), Gene_H = (sample (-1:1,30, replace = T))) dummy.tb <- cbind.data.frame(dummy.tb, dummy1.tb, dummy2.tb) dummy.genes <- c ("Gene_A", "Gene_B", "Gene_C", "Gene_D", "Gene_E", "Gene_F", "Gene_G", "Gene_H") dummy.total <- as.data.frame (dummy.total)

Add Col_Total and Column Sums

dummy.total <- dummy.tb %>% bind_rows(summarise_all(., funs(if(is.numeric(.)) sum(abs(.)) else "Col_Total"))) dummy.total <- as.data.frame (dummy.total)

sort for column

dummy.total <- dummy.total [,order(dummy.total[nrow(dummy.total),], decreasing = FALSE)]

delete Col_Total row

dummy.total <- dummy.total %>% filter (!sample_id == "Col_Total") dummy.total <- as.data.frame (dummy.total)

Add Row Totals and Row Sums

dummy.total <- dummy.total %>% mutate (Row_Total = rowSums (abs((select (., one_of(dummy.genes)))))) dummy.total <- as.data.frame (dummy.total)

sort for row

dummy.total <- dummy.total [order (dummy.total [,ncol(dummy.total)], decreasing = FALSE),]

delete Row_Total column

dummy.total <- dummy.total %>% select (-Row_Total)

Reorder with sample_id

dummy.total <- dummy.total %>% select (sample_id, everything()) dummy.total <- as.tibble(dummy.total)

This code does works and gives the desired output but it seems too verbose with lot of changes in structures along the way. Any suggestions using tidyverse or other methods? I tried using arrange () instead of order for ordering the columns based on values on Col_Total but did not seem to work.

arrange ()

Col_Total

Note: It is per event, not just a straight forward sum so absolute values are included to calculate the number of events.

Please correct the quotes. It is giving errors
– akrun
Aug 9 at 23:38

Changed the quotes for dummy.genes
– KP1
Aug 10 at 14:34

1 Answer
1

For this problem matrix is a better fit as you handle homogeneous (numeric) data. If you assign column names and sample_id to dimnames of matrix you will be able to preserve the column and row identifiers after sorting.

matrix

sample_id

dimnames

matrix

I recommend you to use set.seed so your examples will be reproducible and it will be possible to validate the answer with the desired output.

set.seed

Please see as below:

set.seed(123) n <- 30 m <- 9 mat <- matrix( sample(-1:1, n * m, replace = TRUE), nrow = n, dimnames = list(1:n, paste("Gene", LETTERS[1:m], sep = "_")) ) foo <- mat[, order(colSums(abs(mat)))] bar <- foo[order(rowSums(abs(foo))), ] head(bar)

Output:

Gene_F Gene_D Gene_I Gene_G Gene_C Gene_A Gene_H Gene_B Gene_E 18 -1 0 0 0 0 -1 0 0 1 15 0 0 0 1 0 -1 -1 -1 0 27 0 0 0 0 1 0 -1 -1 -1 1 1 -1 0 1 0 -1 0 1 0 3 0 0 -1 1 0 0 -1 1 -1 6 0 -1 1 0 0 -1 1 0 1

By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

搜尋此網誌

Sfyjdyy

Order dataframe based on inherent values within column

Order dataframe based on inherent values within column

1 Answer
1

Popular posts from this blog

Firebase Auth - with Email and Password - Check user already registered

Dynamically update html content plain JS

How to determine optimal route across keyboard

Order dataframe based on inherent values within column

Order dataframe based on inherent values within column

1 Answer 1

Popular posts from this blog

Firebase Auth - with Email and Password - Check user already registered

Dynamically update html content plain JS

How to determine optimal route across keyboard

1 Answer
1