multiple missing types from SAS to SQL server
I have a SAS dataset that I am converting to sql server 2005. In the sas dataset there are numeric 开发者_JAVA技巧variables that allow for multiple missing values.
For example column AGE is specified as
0-124 = number
. = missing .A = Invalid .B = UnavailableNow I want to use a number for the sql server datatype and need to maintain the integrity of the data. My first thought is to create a reference table for the actual value from sas:
-1 = .A = Invalid
-2 = .B = UnavailableThis seems messy since age would require entry for each age 1-124. Curious if anyone else has run across this and has a better solution.
I imagine that most of the time you're not going to care about why values are missing, just that they are missing - but by encoding it as negative integers, you'll have to be sure to filter those out every time you do anything with that variable. That's backwards: you should do something special to incorporate the kind-of-missingness information, not to run basic math and statistics.
Rather than trying to recreate in SQL Server a data type that it doesn't have, why not work with what it's got? I would probably just send the missing values as NULL to SQL Server and then encode the type of missingness in a second variable. When you actually need to know why data are missing, it's still there; but when you don't, SQL Server will handle the missing values in your variable using its default methods for dealing with them.
You could use a format to recode the missings to negatives while leaving the non-missings unchanged.
proc format;
value chgmiss
.=.
.a=-1
.b=-2
.c=-3
;
run;
data a;
input original;
new=put(original, chgmiss3.);
put original= new=;
datalines;
.
9
102
3
2
15
90
.a
.b
.
78
;
run;
精彩评论