Improving RODBC-Postgres Write Performance
I've recently begun using RODBC to connect to PostgreSQL as I couldn't get RPostgreSQL to compile and run in Windows x64. I've found that read performance is similar between the two packages, but write performance is not. For example, using RODBC (where z is a ~6.1M row dataframe):
library(RODBC)
con <- odbcConnect("PostgreSQL84")
#autoCommit=FALSE seems to speed things up
odbcSetAutoCommit(con, autoCommit = FALSE)
system.time(sqlSave(con, z, "ERASE111", fast = TRUE))
user  system elapsed
275.34  369.86 1979.59 
odbcEndTran(con, commit = TRUE)
odbcCloseAll()
Whereas for the开发者_如何学Go same ~6.1M row dataframe using RPostgreSQL (under 32-bit):
library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname="gisdb", user="postgres", password="...")
system.time(dbWriteTable(con, "ERASE222", z))
user  system elapsed 
467.57   56.62  668.29 
dbDisconnect(con)
So, in this test, RPostgreSQL is about 3X as fast as RODBC in writing tables.  This performance ratio seems to stay more-or-less constant regardless of the number of rows in the dataframe (but the number of columns has far less effect).  I do notice that RPostgreSQL uses something like COPY <table> FROM STDIN while RODBC issues a bunch of INSERT INTO <table> (columns...) VALUES (...) queries.  I also notice that RODBC seems to choose int8 for integers, while RPostgreSQL chooses int4 where appropriate. 
I need to do this kind of dataframe copy often, so I would very sincerely appreciate any advice on speeding up RODBC. For example, is this just inherent in ODBC, or am I not calling it properly?
It seems there is no immediate answer to this, so I'll post a kludgy workaround in case it is helpful for anyone.
Sharpie is correct--COPY FROM is by far the fastest way to get data into Postgres. Based on his suggestion, I've hacked together a function that gives a significant performance boost over RODBC::sqlSave().   For example, writing a 1.1 million row (by 24 column) dataframe took 960 seconds (elapsed) via sqlSave vs 69 seconds using the function below.  I wouldn't have expected this since the data are written once to disk then again to the db.
library(RODBC)
con <- odbcConnect("PostgreSQL90")
#create the table
createTab <- function(dat, datname) {
  #make an empty table, saving the trouble of making it by hand
  res <- sqlSave(con, dat[1, ], datname)
  res <- sqlQuery(con, paste("TRUNCATE TABLE",datname))
  #write the dataframe
  outfile = paste(datname, ".csv", sep = "")
  write.csv(dat, outfile)
  gc()   # don't know why, but memory is 
         # not released after writing large csv?
  # now copy the data into the table.  If this doesn't work,
  # be sure that postgres has read permissions for the path
  sqlQuery(con,  
  paste("COPY ", datname, " FROM '", 
    getwd(), "/", datname, 
    ".csv' WITH NULL AS 'NA' DELIMITER ',' CSV HEADER;", 
    sep=""))
  unlink(outfile)
}
odbcClose(con)
 
         加载中,请稍侯......
 加载中,请稍侯......
      
精彩评论