
Help with a persistent problem when using the 'subset' function in R

I would like to use the subset function in R to extract smaller groups from my panel-study time series data.

My data is a data frame with six columns: district (8 districts), gender, age interval (4 groups), year, month and a count column.

Example:

  District Gender Year Month AgeGroupNew TotalDeaths
1 Eastern  Female 2003     1           0           4
2 Eastern  Female 2003     1        01-4           1
3 Eastern  Female 2003     1       05-14           1
4 Eastern  Female 2003     1         15+          91
5 Eastern  Female 2003     2           0           4
6 Eastern  Female 2003     2        01-4           1

I would like to extract a smaller subset for each combination of district, gender and age interval, to get something like this:

     District  Gender Year Month AgeGroupNew TotalDeaths
     Northern    Male 2003     1        01-4           0
     Northern    Male 2003     2        01-4           1
     Northern    Male 2003     3        01-4           0
     Northern    Male 2003     4        01-4           3
     Northern    Male 2003     5        01-4           4
     Northern    Male 2003     6        01-4           6
     Northern    Male 2003     7        01-4           5
     Northern    Male 2003     8        01-4           0
     Northern    Male 2003     9        01-4           1
     Northern    Male 2003    10        01-4           2
     Northern    Male 2003    11        01-4           0
     Northern    Male 2003    12        01-4           1
     Northern    Male 2004     1        01-4           1
     Northern    Male 2004     2        01-4           0

continuing through to

     Northern    Male 2006    11        01-4           0
     Northern    Male 2006    12        01-4           0
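Since the goal is one subset per district, gender and age interval, an alternative to calling subset() once per combination is split(), which cuts the data frame into a list with one piece per combination. The sketch below swaps in split() for subset() and uses a small hypothetical data frame standing in for datNew (only the column names come from the question). Note that the list names are built from the stored values themselves, so any stray characters in District would also appear in the names.

```r
# Hypothetical toy data standing in for datNew; column names follow the question.
datNew <- data.frame(
  District    = c("Eastern", "Eastern", "Northern", "Northern"),
  Gender      = c("Female", "Female", "Male", "Male"),
  Year        = c(2003, 2003, 2003, 2003),
  Month       = c(2, 1, 1, 2),
  AgeGroupNew = c("01-4", "01-4", "01-4", "01-4"),
  TotalDeaths = c(1, 1, 0, 1),
  stringsAsFactors = FALSE
)

# One data frame per District x Gender x AgeGroupNew combination.
# drop = TRUE discards combinations that have no rows at all.
groups <- split(datNew,
                list(datNew$District, datNew$Gender, datNew$AgeGroupNew),
                drop = TRUE)

# Put each piece in time order (Year, then Month).
groups <- lapply(groups, function(d) d[order(d$Year, d$Month), ])

names(groups)                   # one name per combination, joined with "."
groups[["Northern.Male.01-4"]]  # the Northern / Male / 01-4 time series
```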

So far I have been using the following, which DWin pointed out in a previous question:

> subset(datNew, subset=(District=="Eastern" &  Gender=="Female" &  AgeGroupNew=="01-4"))
[1] District    Gender      Year        Month       AgeGroupNew TotalDeaths
<0 rows> (or 0-length row.names)

But R keeps returning the empty result above, which it shouldn't.

Other combinations work, but including 'District' in the condition seems to cause this <0 rows> (or 0-length row.names) result.

This works:

> head(subset(datNew, Year=="2004" & Month=="8" & AgeGroupNew =="0"))
         District Gender Year Month AgeGroupNew TotalDeaths
77       Eastern  Female 2004     8           0          10
269      Eastern    Male 2004     8           0           6
461  Khayelitsha  Female 2004     8           0          13
653  Khayelitsha    Male 2004     8           0          15
845  Klipfontein  Female 2004     8           0           7
1037 Klipfontein    Male 2004     8           0           6

but this does not:

> head(subset(datNew, District=="Eastern" & Gender=="Female" & AgeGroupNew =="0"))
[1] District    Gender      Year        Month       AgeGroupNew TotalDeaths
<0 rows> (or 0-length row.names)

Why would District cause this? There should definitely be rows matching that combination; as far as I know there is plenty of data.

After experimenting, and based on other posts, I got a small step closer to what I want, but it still doesn't work:

> head(subset(datNew,datNew[[1]] %in% District[1] & Gender=="Female" & AgeGroupNew=="0"))
   District Gender Year Month AgeGroupNew TotalDeaths
1  Eastern  Female 2003     1           0           4
5  Eastern  Female 2003     2           0           4
9  Eastern  Female 2003     3           0           5
13 Eastern  Female 2003     4           0          12
17 Eastern  Female 2003     5           0           7
21 Eastern  Female 2003     6           0          13

With this I am unable to choose any of the other districts, such as "Southern", "Khayelitsha", etc., no matter whether I change datNew[[1]] and District[1] to 2 or 3. I also don't really know what %in% does above.

I am really stuck. Any help, please.
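For reference, x %in% y asks, for each element of x, whether it appears anywhere in y, and returns a logical vector as long as x (base R defines it as match(x, table, nomatch = 0) > 0). In the call above, datNew[[1]] is the first column (District), and because subset() evaluates its condition inside datNew, District[1] is simply the first value of that same column, stored characters included. The condition therefore keeps rows whose District equals row 1's District; changing the index to 2 or 3 on datNew[[ ]] picks a different column (Gender, Year) instead of a different district. The vectors below are hypothetical illustrations.

```r
# %in% tests membership element by element and returns one TRUE/FALSE per element.
districts <- c("Eastern", "Northern", "Eastern", "Southern")

districts %in% "Eastern"
#> [1]  TRUE FALSE  TRUE FALSE

# Several allowed values at once: keep anything that is Northern or Southern.
districts %in% c("Northern", "Southern")
#> [1] FALSE  TRUE FALSE  TRUE

# Hypothetical data frame mimicking the question: District[1] inside subset()
# is the first stored District value, so this keeps rows equal to row 1's value.
toy <- data.frame(District = c("Eastern", "Eastern", "Southern"),
                  Gender   = c("Female", "Male", "Female"),
                  stringsAsFactors = FALSE)
subset(toy, District %in% District[1])
```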


Prediction: show us the result of str(datNew$District[1]) and all will be revealed. I predict the value contains a non-printing character, perhaps a trailing space (or two).

If str() does reveal a single trailing space, the correct code would be:
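A quick way to test this prediction: str() and unique() print character values inside quotes, so trailing spaces become visible, and nchar() counts them. The data frame below is a hypothetical reproduction that assumes District was stored with a trailing space; if District is a factor in the real data, levels(datNew$District) shows the same thing.

```r
# Hypothetical reproduction: District values carry a trailing space.
datNew <- data.frame(
  District    = c("Eastern ", "Eastern ", "Southern "),
  Gender      = c("Female", "Female", "Male"),
  AgeGroupNew = c("0", "01-4", "0"),
  stringsAsFactors = FALSE
)

str(datNew$District[1])    # value is printed in quotes, so the space shows
nchar(datNew$District[1])  # 8 characters, whereas "Eastern" has 7
unique(datNew$District)    # every distinct value, quoted

# The exact comparison can never match "Eastern " ...
nrow(subset(datNew, District == "Eastern" & Gender == "Female" & AgeGroupNew == "0"))
# ... while including the space does.
nrow(subset(datNew, District == "Eastern " & Gender == "Female" & AgeGroupNew == "0"))
```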

subset(datNew, District=="Eastern " & Gender=="Female" & AgeGroupNew =="0")
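Hard-coding "Eastern " only works if every district has exactly one trailing space. A sturdier option, sketched below on hypothetical data, is to strip leading and trailing whitespace from the column once with base R's trimws() (available since R 3.2.0), after which the plain names compare as expected. trimws() returns a character vector, so wrap it in factor() if District needs to stay a factor. If the data are read with read.csv() or read.table(), the strip.white = TRUE argument may also remove such spaces at import, though it only applies to unquoted fields.

```r
# Hypothetical data: inconsistent trailing whitespace (one space vs. two).
datNew <- data.frame(
  District    = c("Eastern ", "Eastern ", "Southern  "),
  Gender      = c("Female", "Female", "Male"),
  AgeGroupNew = c("0", "01-4", "0"),
  stringsAsFactors = FALSE
)

# Remove leading/trailing spaces, tabs and newlines once, for the whole column.
datNew$District <- trimws(datNew$District)

unique(datNew$District)  # clean names, no stray spaces

# Now the natural comparison works for every district.
eastern <- subset(datNew, District == "Eastern" & Gender == "Female" & AgeGroupNew == "0")
nrow(eastern)
```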