Last Observation Carried Forward (na.locf) on Panel (cross section) Time Series
Is there a way to use the na.locf function to fill in NA values in cross-sectional (panel) time series data?
I have a panel dataset covering several years, set up similarly to the following:
library(zoo) #actual [r] code and data!
library(plm)
data(Produc)
a<-data.frame(Produc)
b<-subset(a,state=="WYOMING"|state=="WISCONSIN",select = state:hwy) # limit to an easy subset
The data has suppression (i.e. missing values not released by the government data agency), and I'd like to simply carry the last observation forward to fill in the NA values.
b[[2,4]]<-NA
b[[17,4]]<-NA
b[[18,3]]<-NA
c<-na.locf(b,na.rm=FALSE,fromLast=FALSE)
Using the na.locf function will fill the NAs, but nothing stops it from crossing panel boundaries: an NA in one city's first year is filled with the previous city's last-year value (or, with fromLast=TRUE, a city's last year is filled with the next city's first-year value). I am beginning to think that I need to split the data frame into individual city frames.
Building on AzadA's comment
ddply splits a data frame into pieces by the levels of the variable(s) you choose, applies the desired function to each piece, and combines the results back into one data frame.
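library(plyr)
# General format: ddply(data.frame, variable(s) to split by, function, further arguments passed to the function)
# Start from a data frame that holds the variables you want to fill, plus those needed for ordering and splitting.
new.data <- ddply(a, .(city), na.locf, na.rm = FALSE) # apply na.locf within each city, in row order, to all variables; na.rm = FALSE keeps leading NAs so no rows are dropped
a$b <- new.data$b # do this for each variable to swap the filled values in for the old ones (assumes a is already sorted by city, since ddply returns the groups in sorted order)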
For more info: http://cran.r-project.org/web/packages/plyr/plyr.pdf
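To make the per-group fill concrete, here is a minimal sketch on a made-up toy panel. The data frame `panel`, its columns `city`, `year` and `value`, and the result name `filled` are all hypothetical and not from the original post. It fills a single column rather than the whole data frame, which keeps the behaviour easy to check.

```r
library(zoo)   # na.locf
library(plyr)  # ddply

# Hypothetical toy panel: two cities, three years each
panel <- data.frame(
  city  = rep(c("A", "B"), each = 3),
  year  = rep(2001:2003, times = 2),
  value = c(1, NA, 3, NA, 5, NA)
)

# Fill within each city only. na.rm = FALSE keeps leading NAs,
# so every piece returns the same number of rows it started with.
filled <- ddply(panel, .(city), function(d) {
  d <- d[order(d$year), ]                      # make sure years are in order
  d$value <- na.locf(d$value, na.rm = FALSE)   # carry last observation forward
  d
})

print(filled)
```

City B's first year stays NA here because there is no earlier observation within that city to carry forward, which is the behaviour the question asks for.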
As you suspect, the easiest way is to partition your data frame into separate data frames along either the City or Year dimension (e.g. using split), apply na.locf to each piece, and then reassemble them with unsplit.
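The split / na.locf / unsplit route could look roughly like the sketch below. The toy data frame `city_panel` and its column names are invented for illustration, and the naive whole-column fill is included only to show the boundary problem described in the question.

```r
library(zoo)  # na.locf

# Hypothetical toy panel: two cities, three years each
city_panel <- data.frame(
  city  = rep(c("A", "B"), each = 3),
  year  = rep(2001:2003, times = 2),
  value = c(NA, 2, NA, NA, NA, 6)
)

# Naive fill over the whole column: city A's last value leaks into city B
naive <- na.locf(city_panel$value, na.rm = FALSE)

# Split by city, fill each piece separately, then put the rows back
pieces <- split(city_panel, city_panel$city)
pieces <- lapply(pieces, function(d) {
  d$value <- na.locf(d$value, na.rm = FALSE)  # keep leading NAs
  d
})
by_city <- unsplit(pieces, city_panel$city)   # restores the original row order

print(naive)
print(by_city$value)
```

Because unsplit uses the same grouping vector that was passed to split, the rows come back in their original order, so the filled column can be compared or assigned back directly.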
Alternatively, it might be easier to structure your data with City as the rownames and Year as the colnames (or vice versa), keeping a list of matrices, one per variable. In that case you just use apply to forward-fill along the appropriate dimension.
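A rough sketch of the matrix layout, assuming one row per city and one column per year. The matrix `m` and its values are made up for illustration.

```r
library(zoo)  # na.locf

# Hypothetical wide layout for one variable: rows are cities, columns are years
m <- matrix(c(1, NA, 3,
              NA, 5, NA),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("A", "B"), c("2001", "2002", "2003")))

# apply() over rows returns each filled row as a column, so transpose back
wide_filled <- t(apply(m, 1, na.locf, na.rm = FALSE))
dimnames(wide_filled) <- dimnames(m)  # restore city and year labels explicitly

print(wide_filled)
```

With a list of such matrices (one per variable), the same fill could be wrapped in a function and run over the list with lapply.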