R problems using rpart with 4000 records and 13 attributes
I have attempted to email the author of this package without success, just wondering if anybody else has experienced this.
I am having an issue using rpart on 4000 rows of data with 13 attributes.
I can run the same test on 300 rows of the same data with no issue.
When I run on 4000 rows, Rgui.exe runs consistently at 50% CPU and the
UI hangs; it will stay like this for at least 4-5 hours if I let it
run, and never exits or becomes responsive.
Here is the code I am using on both the 300-row and 4000-row subsets:
train <- read.csv("input.csv", header=T)
y <- train[, 18]
x <- train[, 3:17]
library(rpart)
fit <- rpart(y ~ ., x)
Is this a known limitation of rpart, or am I doing something wrong? Are there potential workarounds?
Can you reproduce the problem when you feed rpart random data of similar dimensions, rather than your real data (from input.csv)? If not, it is probably a problem with your data (formatting, perhaps?). After importing your data with read.csv, check it for format issues by looking at the output of str(train).
# How to do an equivalent rpart fit on some random data of equivalent dimension
dats <- data.frame(matrix(rnorm(4000 * 14), nrow = 4000))
y <- dats[, 1]
x <- dats[, -1]
library(rpart)
system.time(fit <- rpart(y ~ ., x))
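For reference, a minimal sketch of the format check suggested above; it assumes the same input.csv as in the question, which is not shown here. If a column that should be numeric comes back as character or factor, something in the file (such as a stray header row) is forcing the coercion.

train <- read.csv("input.csv", header = TRUE)
str(train)            # columns that should be numeric but appear as character/factor are suspect
sapply(train, class)  # quick overview of every column's type
summary(train)        # NAs introduced by coercion also stand out here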
The problem here was a data preparation error: a header row had been re-written far down in the middle of the data set.
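For completeness, a hedged sketch (not from the original post) of how such a stray header row could be detected and removed before refitting. It assumes the same input.csv and column layout as in the question; a repeated header forces otherwise-numeric columns to be read as character, so rows whose first field equals the first column name are the culprits.

raw <- read.csv("input.csv", header = TRUE, stringsAsFactors = FALSE)
bad <- which(raw[[1]] == names(raw)[1])           # rows that repeat the header line
if (length(bad) > 0) raw <- raw[-bad, ]           # drop the stray header rows
raw[] <- lapply(raw, type.convert, as.is = TRUE)  # restore numeric column types
y <- raw[, 18]                                    # same column selection as the question
x <- raw[, 3:17]
library(rpart)
fit <- rpart(y ~ ., data = x)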