Filtering a data frame with multiple conditions

2023-03-23 01:36

I am looking to subset a data frame in R. My syntax is obviously wrong (i.e. it produces the wrong results).

    data[i,]$m_cnt <- nrow(w_data[
        w_data$direction >= data[i,]$min_a &
        w_data$direction < data[i,]$max_a & 
        w_data$windspeed >= 3 & 
        w_data$windspeed < 15,
    ])/records;

Similar question: Filtering a data.frame

The w_data data.frame (simplified for brevity) consists of wind speed and wind direction time-series data.

time_stamp          windspeed    direction
2010-06-01 00:00    12.2          125
2010-06-03 02:50    17.4          312
2010-06-05 21:30    2.1           132
2010-06-12 15:10    7.8           71
2010-06-22 17:40    2.6           307
2010-06-30 03:20    5.1           310

The R statement above is supposed to count the number of records within a certain wind direction range (for instance >=120° and <135°) and within a certain wind speed range (in this example >=3 m/s and <15 m/s). The count is then converted into a percentage of the total number of measurements taken, so the example above should equal 1 record out of 6 = 16.66%. The percentage is then recorded in another data.frame (data), which has the structure:

min_a    max_a    l_cnt    m_cnt    h_cnt
0        15       0        0        0
15       30       0        0        0
30       45       0        0        0 
45       60       0        0        0 
60       75       0        0.1666   0
75       90       0        0        0
90       105      0        0        0
105      120      0        0        0 
120      135      0.1666   0.1666   0
135      150      0        0        0
150      165      0        0        0
165      180      0        0        0
180      195      0        0        0 
195      210      0        0        0 
210      225      0        0        0
225      240      0        0        0
240      255      0        0        0
255      270      0        0        0 
270      285      0        0        0
285      300      0        0        0
300      315      0.1666   0.1666   0.1666
315      330      0        0        0 
330      345      0        0        0
345      360      0        0        0

The problem I am experiencing is that the percentages do not sum to 100% (they do in this example, but not when I run my script over tens of thousands of records).

I have also experienced weird results, such as:

    data[i,]$l_cnt <- nrow(w_data[
                                w_data$direction >= data[i,]$min_a &
                                w_data$direction < data[i,]$max_a &  
                                w_data$windspeed <= 3,
                          ])/records;

    data[i,]$m_cnt <- nrow(w_data[
                                w_data$direction >= data[i,]$min_a &
                                w_data$direction < data[i,]$max_a & 
                                w_data$windspeed <= 15,
                          ])/records;

    data[i,]$h_cnt <- nrow(w_data[
                                w_data$direction >= data[i,]$min_a &
                                w_data$direction < data[i,]$max_a & 
                                w_data$windspeed > 15,
                          ])/records;

Produces totals of:

l_cnt    0,360637343 
m_cnt    0,187836625
h_cnt    0,811938959
total    1,360412926

But if I qualify the m_cnt calculation with both a greater-than and a less-than condition, i.e.:

    data[i,]$m_cnt <- nrow(w_data[
        w_data$direction >= data[i,]$min_a &
        w_data$direction < data[i,]$max_a & 
        w_data$windspeed >= 3 & 
        w_data$windspeed < 15,
    ])/records;

I get:

l_cnt    0
m_cnt    0,360637343
h_cnt    0,811938959
total    1,172576302

This is probably close to what you want:

> # data
> w_data
  windspeed direction
1      12.2       125
2      17.4       312
3       2.1       132
4       7.8        71
5       2.6       307
6       5.1       310

> # grouping by cut
> w_data <- transform(w_data,
+                     dg = cut(direction, breaks=0:24*15),
+                     wg = cut(windspeed, breaks=c(0, 3, 15, Inf)))

> # now the data looks like:
> w_data
  windspeed direction        dg       wg
1      12.2       125 (120,135]   (3,15]
2      17.4       312 (300,315] (15,Inf]
3       2.1       132 (120,135]    (0,3]
4       7.8        71   (60,75]   (3,15]
5       2.6       307 (300,315]    (0,3]
6       5.1       310 (300,315]   (3,15]
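
One caveat: by default cut() closes each interval on the right, so the bins above are (120,135] rather than the >=120 and <135 ranges described in the question, and a direction of exactly 0 would become NA. A minimal sketch of the same grouping with left-closed bins, assuming the column names from the question and using its six sample rows:

```r
# Sample rows from the question
w_data <- data.frame(
  windspeed = c(12.2, 17.4, 2.1, 7.8, 2.6, 5.1),
  direction = c(125, 312, 132, 71, 307, 310)
)

# right = FALSE gives left-closed bins [a,b), matching ">= min_a & < max_a"
w_data <- transform(w_data,
                    dg = cut(direction, breaks = 0:24 * 15, right = FALSE),
                    wg = cut(windspeed, breaks = c(0, 3, 15, Inf), right = FALSE))

# Share of all records in each direction/speed cell
pct <- table(w_data$dg, w_data$wg) / nrow(w_data)
sum(pct)  # equals 1 only when every record falls into some bin
```

With right = FALSE a direction recorded as exactly 360 falls outside the last bin [345,360) and becomes NA, so such values may need to be mapped to 0 first.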

> # tabulate and calculate the percentage
> table(w_data$dg, w_data$wg) / nrow(w_data)

                (0,3]    (3,15]  (15,Inf]
  (0,15]    0.0000000 0.0000000 0.0000000
  (15,30]   0.0000000 0.0000000 0.0000000
  (30,45]   0.0000000 0.0000000 0.0000000
  (45,60]   0.0000000 0.0000000 0.0000000
  (60,75]   0.0000000 0.1666667 0.0000000
  (75,90]   0.0000000 0.0000000 0.0000000
  (90,105]  0.0000000 0.0000000 0.0000000
  (105,120] 0.0000000 0.0000000 0.0000000
  (120,135] 0.1666667 0.1666667 0.0000000
  (135,150] 0.0000000 0.0000000 0.0000000
  (150,165] 0.0000000 0.0000000 0.0000000
  (165,180] 0.0000000 0.0000000 0.0000000
  (180,195] 0.0000000 0.0000000 0.0000000
  (195,210] 0.0000000 0.0000000 0.0000000
  (210,225] 0.0000000 0.0000000 0.0000000
  (225,240] 0.0000000 0.0000000 0.0000000
  (240,255] 0.0000000 0.0000000 0.0000000
  (255,270] 0.0000000 0.0000000 0.0000000
  (270,285] 0.0000000 0.0000000 0.0000000
  (285,300] 0.0000000 0.0000000 0.0000000
  (300,315] 0.1666667 0.1666667 0.1666667
  (315,330] 0.0000000 0.0000000 0.0000000
  (330,345] 0.0000000 0.0000000 0.0000000
  (345,360] 0.0000000 0.0000000 0.0000000
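
As for why the totals in the question exceed 1: in the three-way version, windspeed <= 15 also matches every record already counted by windspeed <= 3, so l_cnt and m_cnt overlap; in the revised version, <= 3 and >= 3 still overlap at exactly 3. Another possible cause, and this is only an assumption about the real data, is missing values: indexing a data frame with a logical vector that contains NA returns an all-NA row for each NA, and nrow() counts those rows. The sketch below uses a small hypothetical data frame (w) to show the difference between nrow() on the subset and sum() on the condition:

```r
# Hypothetical data with one missing wind speed, to show the NA effect
w <- data.frame(
  windspeed = c(12.2, NA, 2.1),
  direction = c(125, 130, 132)
)

# Same style of condition as in the question, for the 120-135 degree bin
in_bin <- w$direction >= 120 & w$direction < 135 &
          w$windspeed >= 3 & w$windspeed < 15

nrow(w[in_bin, ])          # counts the NA row as well as the matching row
sum(in_bin, na.rm = TRUE)  # counts only rows where the condition is TRUE
```

If this applies, replacing nrow(w_data[cond, ]) with sum(cond, na.rm = TRUE) in the loop should stop the overcounting. Note that table() drops NA groups by default, so with missing values the table above would sum to less than 1 when divided by nrow(w_data).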
