How to add a new column and aggregate values in R
I am completely new to gnuplot and am only trying this because I need to learn it. I have a values in three columns where the first represents the filename (date and time, one hour interval) and the remaining two columns represent two different entities Prop1 and Prop2.
Datetime Prop1 Prop2
20110101_0000.开发者_JAVA百科txt 2 5
20110101_0100.txt 2 5
20110101_0200.txt 2 5
...
20110101_2300.txt 2 5
20110201_0000.txt 2 5
20110101_0100.txt 2 5
...
20110201_2300.txt 2 5
...
I need to aggregate the data by the hour of the day (the **_0100) which is the last four numeric digits. So, I want to create another column called hour which tells me the hour of the day. It means 0000 = 0h, 0100 = 1h, ...... 2200 = 22h
etc.
I then want to get the sum of Prop1 and Prop2 for each hour, so in the end get something like.
Hour Prop1 Prop2
0h 120 104
1h 230 160
...
10h 90 110
...
23h 100 200
and the get a line plot of Prop1 and Prop2.
A general solution with gsub :
Data$Hour <- gsub(".+_(\\d+).txt","\\1",Data$Datetime)
EDIT :
You can use Data$Hour <- substr(Data$Hour,1,2)
to get just the hour. As said in the comments, if you always have exactly the same structure in Datetime, you could use substr()
immediately:
Data$Hour <- substr(Data$Datetime,10,11)
Then you can use aggregate
, tapply
, by
, ... whatever to do what you want. To sum both Prop1 and Prop2, you can use aggregate, eg:
aggregate(Data[2:3],list(Data$Hour),sum)
with the dataset :
zz<-textConnection("Datetime Prop1 Prop2
20110101_0000.txt 2 5
20110101_0100.txt 2 5
20110101_0200.txt 2 5
20110101_2300.txt 2 5
20110201_0000.txt 2 5
20110201_0100.txt 2 5
20110201_0200.txt 2 5
20110201_2300.txt 2 5")
Data <- read.table(zz,header=T,as.is=T)
精彩评论