Output formatting in R

I am new to R and trying to do some correlation analysis on multiple sets of data. I am able to do the analysis, but I am trying to figure out how I can output the results of my data. I'd like to have output like the following:

 NAME,COR1,COR2
 ....,....,....
 ....,....,....

If I could write such a file to output, I could then post-process it as needed. My processing script looks like this:

run_analysis <- function(logfile, name)
{
  preds <- read.table(logfile, header=T, sep=",")

  # do something with the data: create some_col, another_col, etc.

  result1 <- cor(some_col, another_col)
  result2 <- cor(some_col2, another_col2)

  # somehow output name,result1,result2 to a CSV file
}

args <- commandArgs(trailingOnly = TRUE)
date <- args[1]
basepath <- args[2]
logbase <- paste(basepath, date, sep="/")
logfile_pattern <- paste( "*", date, "csv", sep=".")
logfiles <- list.files(path=logbase, pattern=logfile_pattern)

for (f in logfiles) {
  name = unlist(strsplit(f,"\\."))[1]
  logfile = paste(logbase, f, sep="/")
  run_analysis(logfile, name)
}

Is there an easy way to create a blank data frame and then add data to it, row by row?


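Have you looked at the functions in R for writing data to files? For instance, write.csv, or write.table when you need to append to an existing file. Perhaps something like this:

rs <- data.frame(name = name, COR1 = result1, COR2 = result2)
# write.csv ignores append, so use write.table to add a row to an existing file
write.table(rs, "path/to/file", sep = ",", append = TRUE, col.names = FALSE, row.names = FALSE)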
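To make the append-per-call idea concrete, here is a minimal sketch. It uses write.table, since write.csv does not support appending, and writes the header only on the first call. The helper name append_result, the temporary file, and the numbers are all made up for illustration.

```r
# Hypothetical helper: append one result row to a CSV file,
# writing the header only when the file does not exist yet.
append_result <- function(outfile, name, result1, result2) {
  rs <- data.frame(NAME = name, COR1 = result1, COR2 = result2)
  first_write <- !file.exists(outfile)
  write.table(rs, outfile, sep = ",", append = !first_write,
              col.names = first_write, row.names = FALSE, quote = FALSE)
}

outfile <- tempfile(fileext = ".csv")        # illustrative path
append_result(outfile, "setA", 0.91, -0.12)  # made-up values
append_result(outfile, "setB", 0.45, 0.30)

out <- read.csv(outfile)
print(out)
```

Inside run_analysis you would call such a helper once per log file, passing name, result1 and result2.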


I like using the foreach library for this sort of thing:

library(foreach)

run_analysis <- function(logfile, name) {
  preds <- read.table(logfile, header=TRUE, sep=",")
  # do something with the data: create some_col, another_col, etc.
  result1 <- cor(some_col, another_col)
  result2 <- cor(some_col2, another_col2)

  # Return one row of results.
  data.frame(name=name, cor1=result1, cor2=result2)
}

args <- commandArgs(trailingOnly = TRUE)
date <- args[1]
basepath <- args[2]
logbase <- paste(basepath, date, sep="/")
# list.files() expects a regular expression, so convert the glob with glob2rx()
logfile_pattern <- glob2rx(paste("*", date, "csv", sep="."))
logfiles <- list.files(path=logbase, pattern=logfile_pattern)

## Collect results from run_analysis into a table, by rows.
dat <- foreach (f=logfiles, .combine="rbind") %do% {
  name = unlist(strsplit(f,"\\."))[1]
  logfile = paste(logbase, f, sep="/")
  run_analysis(logfile, name)
}

## Write output.
write.csv(dat, "output.dat", quote=FALSE, row.names=FALSE)

Each call to run_analysis returns one row of output, and foreach binds those rows into a single table called dat (the .combine="rbind" argument is what causes the row binding). Then you can just use write.csv to get the output you want.
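If you would rather not depend on foreach, the same row-collecting pattern also works in base R with lapply and do.call(rbind, ...). The following is only a sketch: the column names x1, y1, x2, y2 and the synthetic input files are assumptions, since the question does not show the real data.

```r
# Same shape as run_analysis above, but with assumed column names.
run_analysis <- function(logfile, name) {
  preds <- read.table(logfile, header = TRUE, sep = ",")
  data.frame(name = name,
             cor1 = cor(preds$x1, preds$y1),
             cor2 = cor(preds$x2, preds$y2))
}

# Create two small synthetic input files so the sketch runs on its own.
logbase <- tempfile("logs")
dir.create(logbase)
set.seed(1)
for (n in c("setA", "setB")) {
  d <- data.frame(x1 = rnorm(20), y1 = rnorm(20),
                  x2 = rnorm(20), y2 = rnorm(20))
  write.csv(d, file.path(logbase, paste(n, "csv", sep = ".")),
            row.names = FALSE)
}

logfiles <- list.files(path = logbase, pattern = "\\.csv$")

# One single-row data frame per file, then bind them all at once.
rows <- lapply(logfiles, function(f) {
  name <- unlist(strsplit(f, "\\."))[1]
  run_analysis(file.path(logbase, f), name)
})
dat <- do.call(rbind, rows)

outfile <- file.path(logbase, "output.dat")
write.csv(dat, outfile, quote = FALSE, row.names = FALSE)
```

Binding once at the end also avoids growing a data frame row by row inside a loop, which copies the table on every iteration.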
