Reshaping data in "chain" format (Stata .dta file)
I've got data in "chain" format: some subjects, called "locks", get a treatment, and other subjects, called "links", are recruited from each lock. My data are therefore shaped both wide and long. How can I write a Stata program to reshape them for running models? My data start like this:
idlock idlink1 idlink2 ...
1 10 11 ...
2 20 21 ...
21 30 31 ...
A link can become a lock later on, but it is still part of the chain of the original lock. So 21 is a link in the chain that starts with 2. Each new lock has up to 5 links (idlink1-idlink5).
More detail on what you want to do with the data is needed, but the first thing I would do is create some variables that summarize the number of links per lock (that is, describe the chains). You can then treat the data as long panel data, with the initial lock as the panel ID and the number of links or nodes in the chain as the time variable. I assume you have more variables in the dataset that you want to model (I've generated a random DV and some IVs as stand-ins); you can then fit whatever model you want using Stata's suite of -xt- commands (some examples are provided below):
*******************************! BEGIN EXAMPLE
//this first part will input the dataset into stata//
clear
inp id link0 link1 link2 link3 link4
1 1 2 3 4 5
1000 97 98 99 . .
3 . . . . .
4 . . . . .
5 6 7 8 9 10
6 . . . . .
7 . . . . .
8 11 12 13 14 15
9 . . . . .
10 . . . . .
11 . . . . .
12 . . . . .
13 . . . . .
14 . . . . .
15 . . . . .
99 100 . . . .
100 101 . . . .
101 . . . . .
end
//grab local macro with variables of interest//
unab cou: link*
di "`cou'"
//1. DETERMINE THE INITIAL LOCK//
tempvar pn
g `pn' = .
forval z=0/4{
forval x=1/`=_N' {
replace `pn'= id[_n-`x'] if id==link`z'[_n-`x']
}
}
gen ilock=.
lab var ilock "Initial Lock #"
replace ilock=1 if mi(`pn')
order ilock
l ilock
//2. Links assoc. with each ilock //
**count those with no links established**
count if mi(link0)
//ilocks//
levelsof id if ilock==1, local(ilocks)
foreach n in `ilocks' {
//initial step//
preserve
keep if id==`n'
global s`n' "`=link0' `=link1' `=link2' `=link3' `=link4'"
di "${s`n'}"
global s`n':subinstr global s`n' "." "", all
di "${s`n'}"
restore
}
macro li
//branches off each ilock//
foreach n in `ilocks' {
//branches//
di in red "Branches for macro s`n'"
di as err "${s`n'}"
forval b = 1/10 {
qui token `"${s`n'}"'
while "`1'" != "" {
*di in y "`1'"
preserve
keep if id==`1'
if _N==1 {
global s`n' ${s`n'} `=link0' `=link1' `=link2' `=link3' `=link4'
di "${s`n'}"
global s`n':subinstr global s`n' "." "", all
di in yellow "${s`n'}"
global s`n':list uniq global(s`n')
}
restore
mac shift
}
}
}
//g ilock_number = ilock number if ilocks==branches//
g ilock_number = .
foreach n in `ilocks' {
replace ilock_number = id if id==`n'
di in y "${s`n'}"
global s`n':list uniq global(s`n')
qui token `"${s`n'}"'
while "`1'" != "" {
di in y "`1'"
replace ilock_number = `n' if id==`1'
mac shift
}
}
order ilock_number
sort ilock_number id
count if mi(ilock)
**Descriptives: count # of linknodes**
sort ilock id
bys ilock_number: count if mi(ilock)
sort id ilock
bys ilock_number, rc0: g linknodes = _n
order id link* linknodes ilock_n
l id link* ilock linknodes ilock_n, ta clean div
**descriptives**
ta ilock
ta ilock linknodes
**here are all the chains in your data**
levelsof ilock_number, loc(al)
foreach v in `al' {
macro list s`v'
}
// Running models //
**what kind of model do you want to run?**
**assume using ids to identify panels-->
**create fake dv/iv's for models**
drawnorm iv1-iv5
g dv = abs(int(rbinomial(10, .5)))
xtset ilock_number linknodes
xtreg dv iv*, re
**or model some link/lock info like the #links**
bys ilock_number: g ttl_nodes = _N
xtpoisson ttl_nodes iv* dv , re
*******************************! END EXAMPLE
^note: watch for wrapping issues in the code above!
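For the reshaping step itself, a minimal sketch with -reshape long- might look like the following. It assumes the question's variable names (idlock, idlink1, idlink2) and uses only the three rows shown there; `slot` and `nlinks` are names made up for illustration, not part of the original data.

```stata
* Sketch only: toy data copied from the question (two link columns shown)
clear
input idlock idlink1 idlink2
1 10 11
2 20 21
21 30 31
end

* Wide -> long: one row per lock/link pair; slot (hypothetical name) holds 1, 2, ...
reshape long idlink, i(idlock) j(slot)

* Drop empty link slots, if any
drop if missing(idlink)

* Number of links recruited directly by each lock (hypothetical name nlinks)
bysort idlock: generate nlinks = _N

list, sepby(idlock)
```

In the long layout, a link that later becomes a lock (21 here) appears once in idlink and again in idlock, which is what tracing a chain back to its initial lock relies on.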