R Feature Selection with caret - Limit results plot to top 10 and also store full results into data frame




I am relatively new to R and trying my hand at feature selection for the first time. I followed a tutorial online that used the PimaIndiansDiabetes dataset as an example. I repeated the steps in this tutorial on my own dataset that has over 110 features.



I have included the sample code for the tutorial I used below. The only difference is that my code has a larger dataset and different naming conventions.



When I plot the importance values for my own results, the plot shows more than 110 items. Does anybody know how I can limit this to the top 10?


library(mlbench)
library(caret)
# ensure results are repeatable
set.seed(7)

# load the dataset
data(PimaIndiansDiabetes)

# prepare training scheme
control <- trainControl(method="repeatedcv", number=10, repeats=3)

# train the model
model <- train(diabetes~., data=PimaIndiansDiabetes, method="lvq",
               preProcess="scale", trControl=control)

# estimate variable importance
importance <- varImp(model, scale=FALSE)

# summarize importance
print(importance)

# plot importance
plot(importance)



I also want to be able to store these full results into a dataframe.
I tried the following command:


importanceDF <- as.data.frame(importance)



but I get the following error:


Error in as.data.frame.default(importance) :
  cannot coerce class "varImp.train" to a data.frame



Apologies if this is a simple question; I have tried googling but have yet to find an answer that works.



Thanks in advance,



Amy



EDIT:



As per zacdav's answer I have applied the following logic:


importance$importance
temp <- importance
temp$importance <- importance$importance[1:5, ]
plot(temp)



However, I noticed that when I originally run plot(importance), the order in the sample data is as follows:


Importance
glucose 0.7881
mass 0.6876
age 0.6869
pregnant 0.6195
pedigree 0.6062
pressure 0.5865
triceps 0.5536
insulin 0.5379



Then when I run
temp$importance <- importance$importance[1:5, ]
plot(temp)



I get the following order:


glucose
pregnant
pressure
triceps
insulin



This takes the first 5 rows as they appear in the original table rather than the top 5 by importance.



I tried running the following:


# put into DF
importanceDF <- importance$importance
# sort
importanceDF_Ordered <- importanceDF[order(-importanceDF$neg),]
temp <- importanceDF_Ordered



The last line then gives an error:


Error in `$<-.data.frame`(`*tmp*`, "importance", value = list(neg =
c(0.619514925373134, :
replacement has 5 rows, data has 8




1 Answer



If you look at the structure of the importance object, you will see it is a list of three elements: a data.frame holding the importance values for each response class, plus two pieces of metadata. You can index the data.frame using the $ notation.


str(importance)

List of 3
$ importance:'data.frame': 8 obs. of 2 variables:
..$ neg: num [1:8] 0.62 0.788 0.586 0.554 0.538 ...
..$ pos: num [1:8] 0.62 0.788 0.586 0.554 0.538 ...
$ model : chr "ROC curve"
$ calledFrom: chr "varImp"
- attr(*, "class")= chr "varImp.train"



So to get the data.frame, all you need is importance$importance:


importance$importance
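
To keep the full results in a data frame for sorting or export, it may help to move the feature names (stored as row names) into an ordinary column. The following is a minimal sketch rather than a definitive recipe. To avoid retraining the model, `imp` is a stand-in for importance$importance built from the values printed in the question, and the column name `feature` is my own choice.

```r
# Stand-in for importance$importance, using the values printed in the question.
# With real results, use: imp <- importance$importance
imp <- data.frame(
  neg = c(0.6195, 0.7881, 0.5865, 0.5536, 0.5379, 0.6876, 0.6062, 0.6869),
  pos = c(0.6195, 0.7881, 0.5865, 0.5536, 0.5379, 0.6876, 0.6062, 0.6869),
  row.names = c("pregnant", "glucose", "pressure", "triceps",
                "insulin", "mass", "pedigree", "age")
)

# Move the row names into a regular column ("feature" is an assumed name)
# so they survive exports such as write.csv(..., row.names = FALSE).
importanceDF <- data.frame(feature = rownames(imp), imp, row.names = NULL)

# Sort from most to least important. neg and pos are identical in this
# example, so either column can be used as the sort key.
importanceDF <- importanceDF[order(importanceDF$neg, decreasing = TRUE), ]

head(importanceDF, 3)
```

The sorted data frame keeps all rows, so it can be stored or written out in full, independently of how many features are plotted.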



To plot only a subset of the features, you can modify the data.frame inside the object. I would suggest working on a copy so that the analysis does not need to be rerun. A crude example is as follows:


temp <- importance
temp$importance <- importance$importance[1:5, ]
plot(temp)



I have chosen to plot the first five rows by using the 1:5 row index on the data.frame to overwrite the temp object's data.frame.
If you are interested in calling the plot method directly, use caret:::plot.varImp.train.
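
Because 1:5 selects rows in their stored order, the subset is not necessarily the five most important features. One way around this, sketched below under some assumptions, is to order the rows by importance before subsetting. The `importance` object here is a stand-in rebuilt from the str() output and the values printed in the question, and `n_top` and `ranked` are names I made up for illustration. As far as I know, caret's plot method for varImp.train objects also accepts a `top` argument, which may be the simpler route.

```r
# Stand-in for the varImp() result, shaped like the str() output in this
# answer; with real results, skip this and use your own `importance`.
importance <- structure(
  list(
    importance = data.frame(
      neg = c(0.6195, 0.7881, 0.5865, 0.5536, 0.5379, 0.6876, 0.6062, 0.6869),
      pos = c(0.6195, 0.7881, 0.5865, 0.5536, 0.5379, 0.6876, 0.6062, 0.6869),
      row.names = c("pregnant", "glucose", "pressure", "triceps",
                    "insulin", "mass", "pedigree", "age")
    ),
    model = "ROC curve",
    calledFrom = "varImp"
  ),
  class = "varImp.train"
)

n_top <- 5  # assumed name; set to 10 for the 110-feature data set

# Order the rows by importance first, then keep the leading n_top rows.
imp <- importance$importance
ranked <- imp[order(imp$neg, decreasing = TRUE), , drop = FALSE]

temp <- importance
temp$importance <- head(ranked, n_top)
rownames(temp$importance)

# Drawing needs caret's plot method, so only attempt it in an interactive
# session where caret is installed.
if (interactive() && requireNamespace("caret", quietly = TRUE)) {
  print(plot(temp))
  # Alternative: let caret pick the top rows itself via `top`.
  print(plot(importance, top = n_top))
}
```

Note that `temp$importance` must be assigned on a copy of the varImp.train list, as above. Assigning it on a plain data.frame of 8 rows is what produces the "replacement has 5 rows, data has 8" error shown in the question.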







Thank you so much for your help, I really appreciate it.
– Amy
Aug 6 at 2:14





Thanks for your help. However, I noticed that when I run plot(temp), it does not actually plot the features in order of importance. Rather, it takes the rows as they appear originally. I have edited my question to include the code I tried in order to fix this; would you mind looking to see where I am going wrong, please?
– Amy
Aug 6 at 3:04





