Extend memory size limit in R

I have an R program that combines 10 files, each 296 MB in size, and I have increased the memory limit to 8 GB (the size of my RAM):

--max-mem-size=8192M

When I ran the program I got an error saying:

In type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
  Reached total allocation of 7646Mb: see help(memory.size) 

Here is my R program:

S1 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1_400.txt");
S2 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_401_800.txt");
S3 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_801_1200.txt");
S4 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1201_1600.txt");
S5 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1601_2000.txt");
S6 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_2001_2400.txt");
S7 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_2401_2800.txt");
S8 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_2801_3200.txt");
S9 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_3201_3600.txt");
S10 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_3601_4000.txt");
options(max.print=154.8E10);
combine_result <- rbind(S1,S2,S3,S4,S5,S6,S7,S8,S9,S10)
write.table(combine_result,file="C:/sim_omega3_1_4000.txt",sep=";",
             row.names=FALSE,col.names=TRUE, quote = FALSE);

Can anyone help me with this?

Thanks,

Shruti.


I suggest following the advice in the Memory usage section of ?read.csv2:

Memory usage:

 These functions can use a surprising amount of memory when reading
 large files.  There is extensive discussion in the ‘R Data
 Import/Export’ manual, supplementing the notes here.

 Less memory will be used if ‘colClasses’ is specified as one of
 the six atomic vector classes.  This can be particularly so when
 reading a column that takes many distinct numeric values, as
 storing each distinct value as a character string can take up to
 14 times as much memory as storing it as an integer.

 Using ‘nrows’, even as a mild over-estimate, will help memory
 usage.

 Using ‘comment.char = ""’ will be appreciably faster than the
 ‘read.table’ default.

 ‘read.table’ is not the right tool for reading large matrices,
 especially those with many columns: it is designed to read _data
 frames_ which may have columns of very different classes.  Use
 ‘scan’ instead for matrices.
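
As a rough sketch of how those options could be applied to one of the files above (the column count and classes below are assumptions, not your real schema; substitute the actual layout of your data):

## Sketch only: replace the placeholder column count and classes with the real ones.
n_cols <- 25                                             # assumed number of columns per file
S1 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1_400.txt",
                colClasses   = rep("numeric", n_cols),   # keep numbers numeric, not character
                nrows        = 450000,                   # mild over-estimate of the row count
                comment.char = "")                       # already the read.csv2 default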


Memory allocation needs contiguous blocks, and the size a file takes on disk may not be a good indication of how large the object is once loaded into R. Can you look at one of these S objects with the function:

?object.size
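
For example, to compare the in-memory size of one piece with its size on disk (a sketch, using the first file above):

print(object.size(S1), units = "Mb")                                   # in-memory size of the loaded data frame
file.info("C:/Sim_Omega3_results/sim_omega3_1_400.txt")$size / 1024^2  # size on disk, in MB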

Here is a function I use to see what is taking up the most space in R:

getsizes <- function() {
    z <- sapply(ls(envir = globalenv()), function(x) object.size(get(x)))
    as.matrix(rev(sort(z))[1:10])
}
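
Calling it with no arguments lists the ten largest objects in the global environment, with sizes in bytes:

getsizes()   # run after loading the files to see which objects dominate memory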


If you remove(S1, S2, S3, S4, S5, S6, S7, S8, S9, S10) and then call gc() after calculating combine_result, you might free enough memory. I also find that running the script with Rscript seems to allow access to more memory than the GUI does, if you are on Windows.
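
Concretely, that clean-up might look like this (a sketch of the tail end of your script):

combine_result <- rbind(S1, S2, S3, S4, S5, S6, S7, S8, S9, S10)
rm(S1, S2, S3, S4, S5, S6, S7, S8, S9, S10)   # drop the ten pieces once they are combined
gc()                                          # release the freed memory before writing
write.table(combine_result, file = "C:/sim_omega3_1_4000.txt", sep = ";",
            row.names = FALSE, col.names = TRUE, quote = FALSE)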


If these files are in a standard format and you want to do this in R, why bother reading and writing CSV at all? Use readLines/writeLines:

files_in <- file.path("C:/Sim_Omega3_results",c(
    "sim_omega3_1_400.txt",
    "sim_omega3_401_800.txt",
    "sim_omega3_801_1200.txt",
    "sim_omega3_1201_1600.txt",
    "sim_omega3_1601_2000.txt",
    "sim_omega3_2001_2400.txt",
    "sim_omega3_2401_2800.txt",
    "sim_omega3_2801_3200.txt",
    "sim_omega3_3201_3600.txt",
    "sim_omega3_3601_4000.txt"))


# Copy the first file verbatim (keeping its header row) to the output,
# then append each remaining file with its header line dropped.
file.copy(files_in[1], out_file_name <- "C:/sim_omega3_1_4000.txt")
file_out <- file(out_file_name, "at")   # open for appending in text mode
for (file_in in files_in[-1]) {
    x <- readLines(file_in)
    writeLines(x[-1], file_out)         # x[-1] skips the header line
}
close(file_out)
