Recommendations for database with R [closed]
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 2 years ago.
I am using R to run simulations using time series data. I have been using arrays to store data, but I need a less memory-intensive solution for storing data at intermediate steps in order to document the process. I am not a programmer, so I am looking for something relatively easy to set up on multiple platforms if possible (Windows, Mac, Linux). I also need to be able to directly call the database from R since learning another language is not feasible now. Ideally, I would like to be able to read and write frequently to the database in a manner similar to an array, though I don't know if that is realistic. I will gladly sacrifice speed for ease of use, but I am willing to put in the work to learn open-source solutions. Any suggestions would be appreciated.
Quick comments:
- R is good at this: as a language for programming with data, it has plenty of database interfaces
- There is an entire manual devoted to data import/export, and it has a section on relational databases, so start there.
- R has the widely-used DBI package, which provides a unified interface for many backends, among them SQLite, MySQL, PostgreSQL, Oracle, ... Use that, maybe with RSQLite to get something going quickly (see the sketch just after this list). You can still switch backends afterwards.
- There is also RODBC but I find ODBC tedious to work with.
- R also has a specialised variant in the TSdbi package by Paul Gilbert, which brings the DBI-style abstraction to time series databases. It also supports multiple backends.
- The data.table package was written for this kind of work and is very fast at indexing and aggregation.
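For instance, here is a minimal sketch of the DBI workflow with the RSQLite backend (the file and table names here are arbitrary); the same calls work unchanged against the other backends:

library(DBI)
# RSQLite is file-based, so there is no database server to install or configure
con <- dbConnect(RSQLite::SQLite(), "simulations.sqlite")
# write an intermediate result as a table
step1 <- data.frame(t = 1:10, value = rnorm(10))
dbWriteTable(con, "step1", step1, overwrite = TRUE)
# read it back into a data.frame
step1_back <- dbReadTable(con, "step1")
dbDisconnect(con)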
"I also need to be able to directly call the database from R"
I suggest setting up MySQL with the RMySQL interface.
Once the DB connection is open, you can query the database and pull the data into R. For example:
library(DBI)
library(RMySQL)
# open a connection; dbname, user, password and host are placeholders
con <- dbConnect(MySQL(), dbname = "mydb", user = "me",
                 password = "secret", host = "localhost")
# Run an SQL statement by first creating a result set object
rs <- dbSendQuery(con, statement = paste(
  "SELECT w.laser_id, w.wavelength, p.cut_off",
  "FROM WL w, PURGE p",
  "WHERE w.laser_id = p.laser_id",
  "ORDER BY w.laser_id"))
# we now fetch records from the result set into a data.frame
data <- fetch(rs, n = -1) # extract all rows
dbClearResult(rs)
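When you are finished, close the connection with dbDisconnect(con).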
RMySQL: R interface to the MySQL database
Database interface and MySQL driver for R. This version complies with the database interface definition as implemented in the package DBI 0.2-2.
MySQL Database:
Available for all the platforms you cited in the question (Windows, Mac, Linux), and more; downloads are available from the MySQL site.
Do you really need a database solution for your purpose? You say you want a "solution for storing data at intermediate steps"; how about simply saving the data array to disk at the required time points?
Edit: to make it possible to retrieve the information, you can embed meta-information, e.g. trial index and/or timestamp, in the filename. Then later you can locate and load the file using the correct filename.
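A minimal sketch of that idea in base R (save_step() is a hypothetical helper, and the "results" directory and filename pattern are arbitrary choices):

# save each intermediate array with the trial index and a timestamp in the filename
save_step <- function(x, trial, dir = "results") {
  dir.create(dir, showWarnings = FALSE)
  fname <- file.path(dir, sprintf("trial%03d_%s.rds",
                                  trial, format(Sys.time(), "%Y%m%d_%H%M%S")))
  saveRDS(x, fname)
  invisible(fname)
}
# later: locate and reload the files saved for, say, trial 5
files <- list.files("results", pattern = "^trial005_", full.names = TRUE)
x <- readRDS(files[1])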
You can also take a look at the ff package.
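ff keeps arrays on disk and only pages the accessed chunks into RAM, while exposing ordinary array indexing. A minimal sketch (the filename is arbitrary):

library(ff)
# disk-backed numeric matrix; only accessed chunks are held in memory
arr <- ff(vmode = "double", dim = c(1000, 1000), filename = "sim_store.ff")
arr[1, 1:5] <- rnorm(5) # write like a normal array
arr[1, 1:5]             # read back
close(arr)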