Extract Date in R

I struggle mightily with dates in R and could do this pretty easily in SPSS, but I would love to stay within R for my project.

I have a date column in my data frame and want to remove the year completely, leaving only the month and day. Here is a peek at my original data.

> head(ds$date)
[1] "2003-10-09" "2003-10-11" "2003-10-13" "2003-10-15" "2003-10-18" "2003-10-20"
> class((ds$date))
[1] "Date"

I would like it to look like this:

> head(ds$date)
[1] "10-09" "10-11" "10-13" "10-15" "10-18" "10-20"
> class((ds$date))
[1] "Date"

If possible, I would love to set the first date to be October 1st instead of January 1st.

Any help you can provide will be greatly appreciated.

EDIT: I felt like I should add some context. I want to plot an NHL player's performance over the course of a season which starts in October and ends in April. To add to this, I would like to facet the plots by each season which is a separate column in my data frame. Because I want to compare cumulative performance over the course of the season, I believe that I need to remove the year portion, but maybe I don't; as I indicated, I struggle with dates in R. What I am looking to accomplish is a plot that compares cumulative performance over relative dates by season and have the x-axis start in October and end in April.


> d = as.Date("2003-10-09", format="%Y-%m-%d")
> format(d, "%m-%d")
[1] "10-09"


Is this what you are looking for?
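
One caveat worth illustrating: format() returns character strings, so the result is no longer of class "Date". A minimal sketch, assuming a small made-up vector in place of ds$date:

```r
## stand-in for ds$date (made-up values taken from the question)
dates <- as.Date(c("2003-10-09", "2003-10-11", "2003-10-13"))

## drop the year; the result is a character vector, not a Date
month_day <- format(dates, "%m-%d")
month_day
class(month_day)
```

Because the year is part of what makes a Date a Date, a "month-day only" column cannot keep the Date class; it is best used for labels rather than for date arithmetic.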

library(ggplot2)
library(reshape2)  ## provides melt()
## make up data for two seasons a and b
a = as.Date("2010/10/1")
b = as.Date("2011/10/1")
a.date <- seq(a, by='1 week', length=28)
b.date <- seq(b, by='1 week', length=28)

## make up some score data  
a.score <- abs(trunc(rnorm(28, mean = 10, sd = 5)))
b.score <- abs(trunc(rnorm(28, mean = 10, sd = 5)))

## create a data frame   
df <- data.frame(a.date, b.date, a.score, b.score)
df

## Since I am using ggplot, I need a "long format" data frame
df.molt <- melt(df, measure.vars = c("a.score", "b.score"))
levels(df.molt$variable) <- c("First season", "Second season")
df.molt

Then I plot the data with ggplot2:

## plot it
ggplot(df.molt, aes(x = a.date, y = value)) + geom_point() +
  geom_line() + facet_wrap(~variable, ncol = 1) +
  scale_x_date(name = "Date", date_labels = "%m-%d")

If you want to modify the x-axis (e.g., the display format), then you'll probably be interested in scale_x_date.
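
With real data, each season has its own calendar years, so the dates need to be moved onto a shared axis before faceting. Here is a rough sketch of one way to do that in base R. The helper name align_to_season and the reference years 1999/2000 are my own assumptions, chosen so that February 29 stays a valid date:

```r
## Hypothetical helper: map every date onto one reference season
## (Oct-Dec -> ref_year, Jan-Sep -> ref_year + 1) so all seasons share an x-axis.
align_to_season <- function(x, start_month = 10, ref_year = 1999) {
  lt <- as.POSIXlt(x)
  month <- as.integer(lt$mon + 1)   # POSIXlt months are 0-based
  ref <- as.integer(ifelse(month >= start_month, ref_year, ref_year + 1))
  as.Date(sprintf("%d-%02d-%02d", ref, month, lt$mday))
}

d <- as.Date(c("2003-10-09", "2004-02-29", "2011-04-15"))
aligned <- align_to_season(d)
aligned
```

The result is still a Date vector, so it can go on the x aesthetic and be labelled with "%m-%d", while the real season stays in its own column for faceting.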

You have to remember that Date is a numeric format: it stores the number of days that have passed since the "origin" of the internal date counting, which is 1970-01-01 in R:

> str(Date)
Class 'Date'  num [1:10] 14245 14360 14475 14590 14705 ...

Excel stores dates the same way (as a count of days from an origin), if you want a reference. Hence the solution with format is perfectly valid.
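
A quick way to see this numeric storage for yourself (a small sketch using a date from the question):

```r
d <- as.Date("2003-10-09")

unclass(d)                          # the underlying day count
origin_days <- as.numeric(as.Date("1970-01-01"))
origin_days                         # the origin itself is day 0

next_week <- d + 7                  # arithmetic is done in days
next_week
```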

Now if you want to set the first date of a year as October 1st, you can construct a year index like this:

redefine.year <- function(x,start="10-1"){
  year <- as.numeric(strftime(x,"%Y"))
  yearstart <- as.Date(paste(year,start,sep="-"))

  year + (x >= yearstart) - min(year) + 1
}

Testing code :

Start <- as.Date("2009-1-1")    
Stop <- as.Date("2011-11-1")
Date <- seq(Start,Stop,length.out=10)

data.frame( Date=as.character(Date),
            year=redefine.year(Date))

gives

         Date year
1  2009-01-01    1
2  2009-04-25    1
3  2009-08-18    1
4  2009-12-11    2
5  2010-04-05    2
6  2010-07-29    2
7  2010-11-21    3
8  2011-03-16    3
9  2011-07-09    3
10 2011-11-01    4
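
Building on the same idea, you could also compute how many days into the season each date falls, which gives a relative x-axis starting in October. This is only a sketch; season_start is a name I made up, and it assumes the same "10-1" start as above:

```r
## Hypothetical helper: the October 1st that opens the season each date belongs to
season_start <- function(x, start = "10-1") {
  year <- as.numeric(format(x, "%Y"))
  this_year_start <- as.Date(paste(year, start, sep = "-"))
  ## dates before this year's start belong to the season that began the year before
  start_year <- ifelse(x >= this_year_start, year, year - 1)
  as.Date(paste(start_year, start, sep = "-"))
}

d <- as.Date(c("2003-10-09", "2004-01-15", "2004-04-04"))
days_in <- as.numeric(d - season_start(d))
days_in
```

Plotting against days_in (or cumulative sums grouped by season) lets seasons from different years line up on one axis.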