SQL to standardize PostalCode data
I have a SQL Server table with one column that stores Canadian Postal Codes. Sometimes they're in the proper format H0H 0H0, and othertimes they're written like H0H0H0 or H0H-0H0. I'm looking to be able to update all records to standardize the format as H0H 0H0 but I cannot figure out the SQ开发者_高级运维L. Any help would be appreciated.
UPDATE PostalCodes SET PostalCode = LEFT(PostalCode, 3) + ' ' + RIGHT(PostalCode, 3 )
This assumes that the postal code field has already been trimmed on both the left and right sides. It literally grabs the left 3 digits, places a space and then the right 3 digits.
Assuming that the following constraints hold:
- the field in question is always at least 6 characters in length.
- the first segment of the postal code is in columns 1-3 of the field
- the last segment is in the rightmost 3 columns of the field.
- There are zero or more extraneous characters located at column 4.
This should do the trick, for any number of characters in the field.:
update PostalCodes
set PostalCode = stuff( PostalCode , 4 , len(PostalCode) - 6 , ' ' )
Alternatively, you could do something like this — a case statement to look for all the different ways people have found to create junk postal codes and treat each according to its needs:
update PostalCodes
set PostalCode = case
when PostalCode like '[A-Z][0-9][A-Z][ -][0-9][A-Z][0-9]' then ...
when PostalCode like '![A-Z][0-9][A-Z][0-9][A-Z][0-9]' then ...
...
else PostalCode
end
A select statement like
select PostalCode,count(*) from PostalCodes where PostalCode not like '[A-Z][0-9][A-Z] [0-9][A-Z][0-9]' group by PostalCode order by 2 desc
should show you the junk data.
精彩评论