开发者

multiple missing types from SAS to SQL server

I have a SAS dataset that I am converting to sql server 2005. In the sas dataset there are numeric 开发者_JAVA技巧variables that allow for multiple missing values.

For example column AGE is specified as

0-124 = number

. = missing

.A = Invalid

.B = Unavailable

Now I want to use a number for the sql server datatype and need to maintain the integrity of the data. My first thought is to create a reference table for the actual value from sas:

-1 = .A = Invalid

-2 = .B = Unavailable

This seems messy since age would require entry for each age 1-124. Curious if anyone else has run across this and has a better solution.


I imagine that most of the time you're not going to care about why values are missing, just that they are missing - but by encoding it as negative integers, you'll have to be sure to filter those out every time you do anything with that variable. That's backwards: you should do something special to incorporate the kind-of-missingness information, not to run basic math and statistics.

Rather than trying to recreate in SQL Server a data type that it doesn't have, why not work with what it's got? I would probably just send the missing values as NULL to SQL Server and then encode the type of missingness in a second variable. When you actually need to know why data are missing, it's still there; but when you don't, SQL Server will handle the missing values in your variable using its default methods for dealing with them.


You could use a format to recode the missings to negatives while leaving the non-missings unchanged.

proc format;
  value chgmiss
    .=.
    .a=-1
    .b=-2
    .c=-3
;
run;

data a;
input original;
new=put(original, chgmiss3.);
put original=  new=;
datalines;
.
9
102
3
2
15
90
.a
.b
.
78
;
run;
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜