How can I do SQL like operations on a R data frame?
For example, I have a data frame with data across categories and subcategories and I want to be able to get row with maximum value in a particular column etc.
SQL is what comes to mind first. But since I am not interested in joins or indices etc, python's list comprehensions would do the same thing better with a more modern syntax.
What's best practice in R for such operations?
EDIT:
For now I think I am fine with which.max
. Why I asked the question the way I did is simply that I have come to learn that in R there are many libraries etc doing pretty much the same thing. Just by reading the documentation it's very hard to evaluate how popular (ie how well the library fulfills its purpose). My personal experience with Python is that the day you figure ou开发者_StackOverflow中文版t how to use list comprehensions (with itertools
as a bonus), you are pretty much covered. Over time this has evolved as best practice, you don't see lambda
and filter
for example that often in the general python debate these days as list comprehensions does the same thing easier and more uniform.
If you really mean SQL, a pretty straightforward answer is the 'sqldf' package:
http://cran.at.r-project.org/web/packages/sqldf/index.html
From the help for ?sqldf
library(sqldf)
a1s <- sqldf("select * from warpbreaks limit 6")
Some additional context would help, but from the sounds of it - you may be looking for which.max()
or the related functions. For group by operations, I default to the plyr
family of functions, but there are certainly faster alternatives in base R if speed is of utmost importance.
library(plyr)
#Make a local copy of mycars data and add the rownames as a column since ddply
#seems to drop them. I've never encountered that before actually...
myCars <- mtcars
myCars$carname <- rownames(myCars)
#Find the max mpg
myCars[which.max(myCars$mpg) ,]
mpg cyl disp hp drat wt qsec vs am gear carb carname
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.9 1 1 4 1 Toyota Corolla
#Find the max mpg by cylinder category
ddply(myCars, "cyl", function(x) x[which.max(x$mpg) ,])
mpg cyl disp hp drat wt qsec vs am gear carb carname
1 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 Toyota Corolla
2 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Hornet 4 Drive
3 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 Pontiac Firebird
精彩评论