Reading the last n lines from a huge text file

I've tried something like this

file_in <- file("myfile.log","r")
x <- readLines(file_in, n=-100)

but I'm still waiting...

Any help would be greatly appreciated


If you know how many lines the log has, I'd use scan for this and skip everything before the lines you want (here, the first 100 lines are skipped):

scan("foo.txt",sep="\n",what="char(0)",skip=100)
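
To make the arithmetic explicit: with a known total, the number of lines to skip is the total minus n. A minimal sketch, using a temporary file as a stand-in for the log; the names path, total and n are illustrative and the total is simply assumed to be known:

```r
# Build a small stand-in for the log: 1000 numbered lines.
path <- tempfile(fileext = ".log")
writeLines(sprintf("line %d", 1:1000), path)

total <- 1000L   # assumed to be known in advance
n <- 100L        # number of trailing lines wanted

# Skip everything except the last n lines; max() guards against short files.
last <- scan(path, what = character(), sep = "\n",
             skip = max(total - n, 0L), quiet = TRUE)

length(last)   # 100
last[1]        # "line 901"
```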

If you have no clue how many lines you need to skip, you have no choice but to do one of the following:

  • reading in everything and taking the last n lines (in case that's feasible),
  • using scan("foo.txt",sep="\n",what=list(NULL)) to figure out how many records there are, or
  • using some algorithm to go through the file, keeping only the last n lines every time
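
The second option, counting the lines before deciding how many to skip, could also be done without scan. A rough sketch that counts lines in blocks so the whole file never sits in memory; count_lines and chunk are made-up names, and the block size is an arbitrary assumption:

```r
# Count lines without holding the whole file in memory.
# `count_lines` and `chunk` are illustrative names, not from the answer above.
count_lines <- function(path, chunk = 10000L) {
  con <- file(path, "r")
  on.exit(close(con))
  total <- 0
  repeat {
    # readLines continues from where the previous call stopped
    got <- length(readLines(con, n = chunk))
    if (got == 0L) break
    total <- total + got
  }
  total
}

path <- tempfile()
writeLines(as.character(1:25000), path)
count_lines(path)   # 25000
```

The result can then be used to work out the skip value for scan.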

The last option could look like :

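ReadLastLines <- function(x,n,...){
  con <- file(x)
  open(con)
  on.exit(close(con))
  # read the first n lines (after any skip= passed through ...)
  out <- scan(con,nmax=n,what="char(0)",sep="\n",quiet=TRUE,...)

  # then read one line at a time, dropping the oldest line for each new one
  while(TRUE){
    tmp <- scan(con,nmax=1,what="char(0)",sep="\n",quiet=TRUE)
    if(length(tmp)==0) break
    out <- c(out[-1],tmp)
  }
  out
}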

allowing :

ReadLastLines("foo.txt",100)

or

ReadLastLines("foo.txt",100,skip=1e+7)

in case you know you have more than 10 million lines. This can save on the reading time when you start having extremely big logs.
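
Reading one line per call gets slow on big files. A possible variation on the same rolling-buffer idea reads blocks of lines with readLines instead; this is only a sketch, and read_last_lines_chunked and chunk are hypothetical names with an arbitrary default block size:

```r
# Keep a rolling buffer of the last n lines, reading `chunk` lines at a time.
# `read_last_lines_chunked` and `chunk` are illustrative names.
read_last_lines_chunked <- function(path, n, chunk = 10000L) {
  con <- file(path, "r")
  on.exit(close(con))
  out <- character()
  repeat {
    block <- readLines(con, n = chunk)
    if (length(block) == 0L) break
    # append the new block, then trim back to the last n lines
    out <- tail(c(out, block), n)
  }
  out
}

path <- tempfile()
writeLines(sprintf("line %d", 1:25000), path)
res <- read_last_lines_chunked(path, 100)
res[1]   # "line 24901"
```

Memory use stays bounded by roughly n plus chunk lines, whatever the size of the file.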


EDIT: In fact, I wouldn't even use R for this, given the size of your file. On Unix, you can use the tail command. There is a Windows version as well, somewhere in a toolkit, though I haven't tried it yet.


You could do this with read.table by specifying the skip parameter. If the lines should not be parsed into variables, set the separator to '\n', as @Joris Meys pointed out, and also set as.is=TRUE to get character vectors instead of factors.

Small example (skipping the first 2000 lines):

df <- read.table('foo.txt', sep='\n', as.is=TRUE, skip=2000)
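
Log lines often contain apostrophes or '#' characters, which read.table treats as quotes and comments by default. A small sketch of the same call with both switched off, on a made-up four-line file; the file contents and the variable names are assumptions for illustration only:

```r
# A tiny stand-in log whose last lines contain a quote and a '#'.
path <- tempfile(fileext = ".log")
writeLines(c("line 1", "line 2", "it's line 3", "# line 4"), path)

# quote = "" and comment.char = "" stop read.table from interpreting
# apostrophes and '#' inside the log lines.
df <- read.table(path, sep = "\n", as.is = TRUE, skip = 2,
                 quote = "", comment.char = "")

# The result is a one-column data frame; pull out the character vector.
lines <- df[[1]]
lines
```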


As @JorisMeys already mentioned, the Unix command tail would be the easiest way to solve this problem. However, I want to propose a seek-based R solution that starts reading from the end of the file:

tailfile <- function(file, n) {
  bufferSize <- 1024L
  size <- file.info(file)$size

  pos <- size          # offset where the text read so far begins
  text <- ""
  k <- 0L              # number of newlines seen so far

  f <- file(file, "rb")
  on.exit(close(f))

  # walk backwards from the end of the file, one buffer at a time
  while (pos > 0 && k <= n) {
    len <- min(bufferSize, pos)   # the last chunk may be shorter than the buffer
    pos <- pos - len
    seek(f, where=pos)
    # seek offsets are in bytes, so read bytes rather than characters
    chars <- readChar(f, nchars=len, useBytes=TRUE)
    k <- k + sum(charToRaw(chars) == as.raw(10L))
    text <- paste0(chars, text)   # prepend: this chunk comes earlier in the file
  }

  tail(strsplit(text, "\n", fixed=TRUE)[[1L]], n)
}

tailfile("myfile.log", n=100)


You can read the last n lines as follows.

Step 1 - Read your file, for example df <- read.csv("hw1_data.csv"). Note that this reads the entire file first, so it only suits files that fit in memory.

Step 2 - Use the tail function to get the last n rows:

tail(df, 2)


Some folks have said it already, but if you have a large log, it is most efficient to only read in what you need instead of reading it all into memory, then subsetting what you need.

For this, we use R's system() to run the Linux tail command.

Read the last 10 lines of the log:

system("tail path/to/my_file.log")

Read the last 2 lines of the log:

system("tail -n 2 path/to/my_file.log")

Read the last 2 lines of the log and capture the output in a character vector:

last_2_lines <- system("tail -n 2 path/to/my_file.log", intern = TRUE)
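
If the path or n comes from a variable, building the command string by hand gets fiddly. One possible wrapper uses system2 and quotes the path; tail_lines is a hypothetical helper name, and the sketch assumes a tail binary on the PATH, falling back to reading the whole file otherwise:

```r
# `tail_lines` is a hypothetical helper name; it assumes a `tail` binary
# on the PATH and falls back to reading the whole file otherwise.
tail_lines <- function(path, n = 10L) {
  if (nzchar(Sys.which("tail"))) {
    # sprintf("%d", ...) avoids scientific notation for large n
    system2("tail", c("-n", sprintf("%d", as.integer(n)), shQuote(path)),
            stdout = TRUE)
  } else {
    tail(readLines(path), n)
  }
}

path <- tempfile(fileext = ".log")
writeLines(sprintf("line %d", 1:50), path)
tail_lines(path, 2)   # "line 49" "line 50"
```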


To see the last few lines, apply tail to the lines that were read rather than to the connection itself:

tail(readLines(file_in),100)