
Best R data structure to return table value counts

The following code returns a data.frame with two columns:

  q="SELECT t,count(*) AS count FROM data GROUP BY t"
  dbGetQuery(con,q)    #Returns a data frame

t is a DATE column, so output looks like:

       t       count
1 2011-09-22     1438

All I'm really interested in is whether any records already exist for a given date, but I will also use the count as a sanity check.

In C++ I'd return a std::map<std::string,int> or std::unordered_map<std::string,int> (*). In PHP I'd use an associative array with the date as the key.

What is the best data structure in R? Is it a 2-column data.frame? My first thought was to turn the t column into rownames:


But data.frame rownames are not unique, so conceptually it does not quite fit. I'm also not sure if it makes using it any quicker.

(Any and all definitions of "best": quickest, least memory, code clarity, least surprise for experienced R developers, etc. Maybe there is one solution for all; if not then I'd like to understand the trade-offs and when to choose each alternative.)

*: (for C++) If benchmarking showed this was a bottleneck, I might convert the datestamp to a YYYYMMDD integer and use std::unordered_map<int,int>; knowing the data only covers a few years I might even use a block of memory with one int per day between min(t) and max(t) (wrapping all that in a class).

Contingency tables are actually arrays (or matrices) and are very easy to create. The dimnames hold the values, and the array or matrix at its "core" holds the count data. The table() and tapply() functions are the natural creators. You access the counts with "[", and use dimnames() followed by "[" to get the row and column names. I would also say it is wiser to use the "Date" class for dates than to store them in "character" vectors.
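As a rough illustration of that approach, here is a minimal sketch. It assumes the query result has already been fetched into a data frame; `res` below is a hypothetical stand-in for the output of dbGetQuery(con, q), and the second row and the `raw_dates` vector are made-up sample data.

```r
# Hypothetical stand-in for the data frame returned by dbGetQuery(con, q).
res <- data.frame(
  t     = as.Date(c("2011-09-22", "2011-09-23")),
  count = c(1438L, 12L)
)

# tapply() builds a one-dimensional array whose dimnames are the dates.
counts <- tapply(res$count, res$t, sum)

# Dimnames are stored as character, so format a Date key before the lookup.
key <- format(as.Date("2011-09-22"))

has_records <- key %in% dimnames(counts)[[1]]  # do any records exist?
n_records   <- counts[[key]]                   # the count, as a sanity check

# table() produces the same kind of structure from raw, unaggregated dates.
raw_dates <- as.Date(c("2011-09-22", "2011-09-22", "2011-09-23"))
tab <- table(raw_dates)
```

Here the existence check is a `%in%` test against the dimnames, and the count is a lookup by name. Keeping `t` as "Date" in the data frame means sorting and date arithmetic still work before the values are turned into character dimnames.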






