R text mining package DocumentTermMatrix with a dictionary in the control list takes way too much memory [closed]
Closed 6 years ago.
I have noticed that DocumentTermMatrix(myCorpus, control=list(dictionary=myDict))
consumes far more memory than DocumentTermMatrix(myCorpus).
Why does this happen?
Any leads?
Here is the code snippet:
library(tm)
library(XML)
source("MyXMLReader.r") # contains the myXML reader code
myCorpus <- Corpus(DirSource(paste(basepath,"corpus",sep="")),
                   readerControl = list(reader = myXMLReader))
myDict = unlist(readLines("some-file-containing-a-fixed-vocab"))
Now here is my question:
dtm = DocumentTermMatrix(myCorpus) # takes very little extra RAM
dtm = DocumentTermMatrix(myCorpus, control=list(dictionary=myDict)) # takes a whole lot of RAM, which is not released even after dtm is formed
I suspect there is a memory leak, and possibly a bug.
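To put numbers on the difference, something like the following might help. It is only a rough sketch: the documents, the vocabulary, and the helper measure_dtm are made-up stand-ins (my real corpus and vocabulary file are not shown here), and it relies on base R's gc() and object.size() to report peak memory and the size of each matrix.

```r
library(tm)

# Hypothetical stand-in data: the real corpus and vocabulary file are not shown.
docs <- c("text mining with a fixed vocabulary",
          "memory use of a document term matrix",
          "mining documents for terms")
my_corpus <- VCorpus(VectorSource(docs))
my_dict <- c("mining", "memory", "vocabulary")

# Hypothetical helper: build a DTM and report its size plus the peak
# memory (Mb) that R recorded while building it.
measure_dtm <- function(corpus, control = list()) {
  gc(reset = TRUE)                       # reset the "max used" counters
  dtm <- DocumentTermMatrix(corpus, control = control)
  g <- gc()                              # collect, then read the counters
  list(dtm = dtm,
       size_bytes = as.numeric(object.size(dtm)),
       peak_mb = sum(g[, ncol(g)]))      # last column is "max used" in Mb
}

plain <- measure_dtm(my_corpus)
with_dict <- measure_dtm(my_corpus, control = list(dictionary = my_dict))

cat("plain:     ", plain$size_bytes, "bytes, peak", plain$peak_mb, "Mb\n")
cat("dictionary:", with_dict$size_bytes, "bytes, peak", with_dict$peak_mb, "Mb\n")

# Drop the results and collect again to see whether the memory comes back.
rm(plain, with_dict)
gc()
```

If the "used" figures from the final gc() fall back to roughly their starting level, the memory was only waiting for garbage collection rather than leaked. Swapping in the real myCorpus and myDict should show how large the gap actually is.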