开发者

xtpcse from Stata - how to rewrite in R

I am currently learning R. I have no previous knowledge of STATA.

I want to reanalyze a study which was done in Stata (xtpcse linear regression with panel-corrected standard errors). I could not find the model or more detailed code in Stata or any other hint how to rewrite this in R. I have the plm package for econometrics installed for R. That's as far as I got.

The first lines of the .do file from STATA are copied below (I just saw that it's pretty unreadable. Here is a link to the txt file in which I copied the .do content: http://dl.dropbox.com/u/4004629/This%20was%20in%20the%20.do%20file.txt). I have no idea of how to go about this in a better way. I tried google-ing STATA and R comparison and the like but it did not work.

All data for the study I want to replicate are here:

https://umdrive.memphis.edu/rblanton/public/ISQ_data

---STATA---
Group variable:   c_code                        Number of obs      =       265
Time variable:    year                          Number of groups   =        27
Panels:           correlated (unbalanced)       Obs per group: min =         3
Autocorrelation:  common AR(1)                                 avg =  9.814815
Sigma computed by pairwise selection                           max =        14
Estimated covariances      =       378          R-squared          =    0.8604
Estimated autocorrelations =         1          Wald chi2(11)      =   8321.15
Estimated coefficients     =        15          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |           Panel-corrected
        food |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lag_food |   .8449038    .062589    13.50   0.000     .7222316     .967576
        ciri |   -.010843   .0222419    -0.49   0.626    -.0544364    .0327504
   human_cap |   .0398406   .0142954     2.79   0.005     .0118222    .0678591
  worker_rts |  -.1132705   .0917999    -1.23   0.217    -.2931951     .066654
    polity_4 |   .0113995    .014002     0.81   0.416    -.0160439    .0388429
 market_size |   .0322474   .0696538     0.46   0.643    -.1042716    .1687665
      income |   .0382918   .0979499     0.39   0.696    -.1536865    .2302701
 econ_growth |   .0145589   .0105009     1.39   0.166    -.0060224    .0351402
   log_trade |  -.3062828   .1039597    -2.95   0.003    -.5100401   -.1025256
  fix_dollar |  -.0351874   .1129316    -0.31   0.755    -.2565293    .1861545
    fixed_xr |  -.4941214   .2059608    -2.40   0.016     -.897797   -.0904457
    xr_fluct |   .0019044   .0106668     0.18   0.858    -.0190021    .0228109
  lab_growth |   .0396278   .0277936     1.43   0.154    -.0148466    .0941022
     english |  -.1594438   .1963916    -0.81   0.417    -.5443641    .2254766
       _cons |   .4179213   1.656229     0.25   0.801    -2.828227     3.66407
-------------+----------------------------------------------------------------
         rho |   .0819359
------------------------------------------------------------------------------

. xtpcse fab_metal lag_fab_metal ciri human_cap worker_rts polity_4 market
> income econ_growth log_trade fix_dollar fixed_xr xr_fluct lab_growth
> english, pairwise corr(ar1)

Update:

I just tried Vincent's code. I tried the pcse2 and vcovBK code, and they both worked (even though I'm not sure what to do with the correlation matrix that comes out of vcocBK).

However, I still have troubles reproducing the estimates of the regression coefficients in the paper I'm reanalyzing. I'm following their recipe as good as I can, the only step I'm missing is, I think, the part where in Stata "Autocorrelation: common AR(1)" is done. The paper I'm analyzing says: "OLS regression using panel corrected standard errors (Beck/Katz '95), control for first order correlation within each panel (corr AR1 option in Stata)."

How do I control for first order correlation within each panel in R?

Here is what I did so far on my data:

## run lm 
res.lm <- lm(total_FDI ~ ciri + human_cap + worker_rts + polity开发者_JAVA百科_4 + lag_total + market_size + income + econ_growth + log_trade + fixed_xr + fix_dollar + xr_fluct + english + lab_growth, data=D)
## run pcse
res.pcse <- pcse2(res.lm,groupN="c_code",groupT="year",pairwise=TRUE)


As Ramnath mentioned, the pcse package will do what Stata's xtpcse does. Alternatively, you could use the vcovBK() function from the plm package. If you opt for the latter option, make sure you use the cluster='time' option, which is what the Beck & Katz (1995) article suggests and what the Stata command implements.

The pcse package works well, but there are some issues that makes a lot of intuitive user inputs unacceptable, especially if your dataset is unbalanced. You might want to try this re-write of the function that I coded a while ago. Just load the pcse package, load the pcse2 function, and use it by following the instructions in the pcse documentation. IMHO, the function pasted below is cleaner, more flexible and more robust than the one provided by the pcse folks. Simple benchmarks also suggest that my version may be 5 to 10 times faster than theirs, which may matter for big datasets.

Good luck!

library(Matrix)
pcse2 <- function(object, groupN, groupT, pairwise=TRUE){
  ## Extract basic model info
  groupT <- tail(as.character((match.call()$groupT)), 1)
  groupN <- tail(as.character((match.call()$groupN)), 1)
  dat <- eval(parse(text=object$call$data))

  ## Sanity checks
  if(!"lm" %in% class(object)){stop("Formula object must be of class 'lm'.")}
  if(!groupT %in% colnames(dat)){stop(paste(groupT, 'was not found in data', object$call$data))}
  if(!groupN %in% colnames(dat)){stop(paste(groupN, 'was not found in data', object$call$data))}
  if(anyDuplicated(paste(dat[,groupN], dat[,groupT]))>0){stop(paste('There are duplicate groupN-groupT observations in', object$call$data))}
  if(length(dat[is.na(dat[,groupT]),groupT])>0){stop('There are missing unit indices in the data.')}
  if(length(dat[is.na(dat[,groupN]),groupN])>0){stop('There are missing time indices in the data.')}

  ## Expand model frame to include groupT, groupN, resid columns.
  f <- as.formula(object$call$formula)
  f.expanded <- update.formula(f, paste(". ~ .", groupN, groupT, sep=" + "))
  dat.pcse <- model.frame(f.expanded, dat) 
  dat.pcse$e <- resid(object)  

  ## Extract basic model info (part II)
  N <- length(unique(dat.pcse[,groupN]))
  T <- length(unique(dat.pcse[,groupT]))
  nobs <- nrow(dat.pcse)
  is.balanced <- length(resid(object)) == N * T

  ## If balanced dataset, calculate as in Beck & Katz (1995)
  if(is.balanced){
    dat.pcse <- dat.pcse[order(dat.pcse[,groupN], dat.pcse[,groupT]),]
    X <- model.matrix(f, dat.pcse)
    E <- t(matrix(dat.pcse$e, N, T, byrow=TRUE))
    Omega <- kronecker((crossprod(E) / T), Matrix(diag(1, T)) )

  ## If unbalanced and pairwise, calculate as in Franzese (1996)
  }else if(pairwise==TRUE){
    ## Rectangularize
    rectangle <- expand.grid(unique(dat.pcse[,groupN]), unique(dat.pcse[,groupT]))
    names(rectangle) <- c(groupN, groupT)
    rectangle <- merge(rectangle, dat.pcse, all.x=TRUE)
    rectangle <- rectangle[order(rectangle[,groupN], rectangle[,groupT]),]
    valid <- ifelse(is.na(rectangle$e),0,1) 
    rectangle[is.na(rectangle)] <- 0
    X <- model.matrix(f, rectangle)
    X[valid==0,1] <- 0

    ## Calculate pcse
    E <- crossprod(t(matrix(rectangle$e, N, T, byrow=TRUE)))
    V <- crossprod(t(matrix(valid, N, T, byrow=TRUE)))
    if (length(V[V==0]) > 0){stop("Error! A CS-unit exists without any obs or without any obs in a common period with another CS-unit. You must remove that unit from the data passed to pcse().")}
    Omega <-  kronecker(E/V, Matrix(diag(1, T)))

  ## If unbalanced and casewise, caluate based on largest rectangular subset of data
  }else{ 
    ## Rectangularize
    rectangle <- expand.grid(unique(dat.pcse[,groupN]), unique(dat.pcse[,groupT]))
    names(rectangle) <- c(groupN, groupT)
    rectangle <- merge(rectangle, dat.pcse, all.x=TRUE)
    rectangle <- rectangle[order(rectangle[,groupN], rectangle[,groupT]),]
    valid <- ifelse(is.na(rectangle$e),0,1) 
    rectangle[is.na(rectangle)] <- 0
    X <- model.matrix(f, rectangle)
    X[valid==0,1] <- 0

    ## Keep only years for which we have the max number of observations
    large.panels <- by(dat.pcse, dat.pcse[,groupT], nrow) # How many valid observations per year?
    if(max(large.panels) < N){warning('There is no time period during which all units are observed. Consider using pairwise estimation.')}
    T.balanced <- names(large.panels[large.panels==max(large.panels)]) # Which years have max(valid observations)?
    T.casewise <- length(T.balanced)
    dat.balanced <- dat.pcse[dat.pcse[,groupT] %in% T.balanced,] # Extract biggest rectangular subset
    dat.balanced <- dat.balanced[order(dat.balanced[,groupN], dat.balanced[,groupT]),]
    e <- dat.balanced$e

    ## Calculate pcse as in Beck & Katz (1995)
    E <- t(matrix(dat.balanced$e, N, T.casewise, byrow=TRUE))
    Omega <- kronecker((crossprod(E) / T.casewise), Matrix(diag(1, T)))
  }

  ## Finish evaluation, clean and output
  salami <- t(X) %*% Omega %*% X
  bread <- solve(crossprod(X))
  sandwich <- bread %*% salami %*% bread
  colnames(sandwich) <- names(coef(object))
  row.names(sandwich) <- names(coef(object))
  pcse <- sqrt(diag(sandwich))
  b <- coef(object)
  tstats <- b/pcse
  df <- nobs - ncol(X)
  pval <- 2*pt(abs(tstats), df, lower.tail=FALSE)
  res <- list(vcov=sandwich, pcse=pcse, b=b, tstats=tstats, df=df, pval=pval, pairwise=pairwise, 
              nobs=nobs, nmiss=(N*T)-nobs, call=match.call())
  class(res) <- "pcse"
  return(res)
}


Look at the pcse package, which considers panel corrected standard errors. You certainly have to look at the documentation in STATA to figure out the assumptions made and cross check that with pcse.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜