
StringToDecimal Function - T-SQL Question

I have some dirty input data that is being imported into a raw source table in SQL Server (2008 R2). Fields that the input provider defines as decimal(9,2) or decimal(4,2) arrive as strings, and the strings do not always conform to the data definition (go figure!).

We import the data from flat files into the raw tables, then apply some conversion scripts to insert the 'cleaned' data into tables with the proper data types assigned to columns.

For instance:

raw_table
TotalAmount varchar(12)

clean_table
TotalAmount decimal(9,2)

Now, my question is this. If I want to do some 'basic' cleanup on this, I would want to do it in a function along the lines of:

CREATE FUNCTION [dbo].[StringToDecimal]
(
    @conversionString VARCHAR(12)
)   
RETURNS DECIMAL(9,2)
AS
BEGIN   

    DECLARE @rsp DECIMAL(9,2)

    IF ISNUMERIC( LTRIM(RTRIM(REPLACE(@conversionString,' ',''))) ) = 1
         BEGIN
             SET @rsp = ISNULL( CONVERT( decimal(17,6), NULLIF( LTRIM(RTRIM(REPLACE(@conversionString,' ',''))),'') ), 0 )
         END
    ELSE
         BEGIN
             SET @rsp = 0 -- or we can return NULL here
         END

    RETURN @rsp
END

However, how could one go about supporting decimals of various sizes in this mix? Is there a way to parametrize the response type? I considered returning a decimal of the largest size we generally see and converting it again on the other end, but that runs into arithmetic overflow issues.

Would appreciate any thoughts/insight into solving this one!


Is there a way to parametrize the response type?

It's simpler than you think. Just return as a VARCHAR and do the casting to decimal(x,y) from the VARCHAR. You don't even need to cast - you can directly assign a VARCHAR (as long as it holds valid decimal data) to a decimal column/variable.

I would create two functions instead. StringToDecimal2 does the actual cleanup and returns either the cleaned string or one of six "error codes" (INVALID1 to INVALID6), which you can use to check why a string is invalid. The wrapper dbo.StringToDecimal simply turns those codes into NULL.

CREATE FUNCTION [dbo].[StringToDecimal2]
(
    @conversionString VARCHAR(12),
    @precision int,  -- total digits
    @scale int  -- after decimal point
)   
RETURNS VARCHAR(100)
AS
BEGIN
    -- remove spaces, we'll allow this error. no need to trim
    set @conversionString = REPLACE(@conversionString,' ','')
    -- note: 1,234.56 (thousands separated) will be invalid, so will 1,234,56 (European decimals)
    -- well, ok, let's clean up the thousands separators. BUT! It will incorrectly scale European decimals
    set @conversionString = REPLACE(@conversionString,',','')

    -- we don't support scientific notation either, so 1e4 (10,000) is out

    if @conversionString like '%[^0-9.+-]%' return 'INVALID1' -- only digits and decimal are valid (plus +-)
    if @conversionString like '%.%.%' return 'INVALID2' -- too many decimals
    if @conversionString like '_%[+-]%' return 'INVALID3' -- +- symbol not in the first position
    if @conversionString like '[.+-]' return 'INVALID4' -- a single character from "+-."
    if @conversionString like '[+-].' return 'INVALID5' -- symbol and decimal only

    -- add a decimal place so it is easier to work with below
    if @conversionString not like '%.%'
        set @conversionString = @conversionString + '.'

    -- allow decimal places to go only as far as scale
    set @conversionString = left(@conversionString, charindex('.', @conversionString)+@scale)

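    -- ensure the data is within precision number of digits in total
    -- (a leading + or - is not a digit, so it is excluded from the count)
    if charindex('.', @conversionString) - case when @conversionString like '[+-]%' then 1 else 0 end > @precision - @scale + 1
        return 'INVALID6' -- too many digits before decimal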

    RETURN @conversionString
END
GO

CREATE FUNCTION [dbo].[StringToDecimal]
(
    @conversionString VARCHAR(12),
    @precision int,  -- total digits
    @scale int  -- after decimal point
)
RETURNS VARCHAR(100)
AS
BEGIN
    RETURN case when [dbo].[StringToDecimal2](@conversionString, @precision, @scale) like 'INVALID%'
                then null
                else [dbo].[StringToDecimal2](@conversionString, @precision, @scale) end
END
GO

Some tests:

select [dbo].[StringToDecimal2]('12342342', 9,2)

select convert(decimal(9,2),[dbo].[StringToDecimal]('1234234', 9,2))
select convert(decimal(9,2),[dbo].[StringToDecimal]('12342342', 9,2))
select convert(decimal(9,2),[dbo].[StringToDecimal]('123423.3333', 9,2))
select convert(decimal(20,10),[dbo].[StringToDecimal]('123423sd.3333', 20,10))
select convert(decimal(20,10),[dbo].[StringToDecimal]('123423sd..3333', 20,10))
select convert(decimal(20,10),[dbo].[StringToDecimal]('-123423.3333', 20,10))
select convert(decimal(20,10),[dbo].[StringToDecimal]('+123423..3333', 20,10))


Thanks for the extra information. It sounds like you have three steps:

  1. Remove all characters from the string that are not digits or a decimal point (do you ever get multiple points in one string?)
  2. Convert to (9,5) or (4,1) as appropriate (how do you decide this? is there rounding? does 10X.781 become 10.78100 or 10.7 or 10.8?)
  3. Insert/update the final value somewhere
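To make the questions in steps 1 and 2 concrete, here is a minimal Python sketch of the strip-then-convert idea. The helper names strip_non_numeric and to_scale are made up for illustration, and the choice of rounding mode is exactly the open question above, so it is passed in rather than decided.

```python
import re
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def strip_non_numeric(raw):
    """Step 1: keep only digits and decimal points."""
    return re.sub(r"[^0-9.]", "", raw)


def to_scale(raw, scale, rounding):
    """Step 2: convert to a Decimal with `scale` digits after the point.

    Returns None when the cleaned string is not a single valid number,
    for example when it still contains more than one decimal point.
    """
    cleaned = strip_non_numeric(raw)
    if not re.fullmatch(r"[0-9]+(\.[0-9]*)?|\.[0-9]+", cleaned):
        return None
    quantum = Decimal(1).scaleb(-scale)  # scale=2 gives Decimal("0.01")
    return Decimal(cleaned).quantize(quantum, rounding=rounding)


print(to_scale("10X.781", 5, ROUND_HALF_UP))  # 10.78100
print(to_scale("10X.781", 1, ROUND_DOWN))     # 10.7
print(to_scale("10X.781", 1, ROUND_HALF_UP))  # 10.8
print(to_scale("1.2.3", 2, ROUND_HALF_UP))    # None
```

Note that this sketch drops minus signs along with everything else, because step 1 as written keeps only digits and the decimal point.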

Based on point 1 alone, I would immediately avoid TSQL and think about an external script or CLR procedure. A CLR function could do the parsing, but you still have the problem of returning different data types.

Since this appears to be some kind of ETL task, in my environment I would probably implement it as a script component in an SSIS package. The component would do the parsing and send the clean data to different outputs for further processing. If it was a one-time task I would use a Python script to parse the input data and generate INSERT or UPDATE statements.
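A rough sketch of what such a one-off Python script might look like, assuming the clean_table / TotalAmount names from the question. The function names parse_decimal and insert_statement are hypothetical, and rounding half up is my assumption; swap in whatever rule your data provider expects. Values that do not parse, or that do not fit decimal(precision, scale), become NULL.

```python
import re
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

# Optional sign, then digits with an optional fraction (or a bare fraction).
NUMBER = re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)")


def parse_decimal(raw, precision, scale):
    """Return a Decimal that fits decimal(precision, scale), or None."""
    text = raw.replace(" ", "")
    if not NUMBER.fullmatch(text):
        return None
    try:
        value = Decimal(text).quantize(
            Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP
        )
    except InvalidOperation:  # more digits than the decimal context allows
        return None
    # decimal(p, s) allows at most p - s digits before the decimal point.
    if abs(value) >= Decimal(10) ** (precision - scale):
        return None
    return value


def insert_statement(raw, precision=9, scale=2):
    """Build one INSERT for clean_table; unusable input becomes NULL."""
    value = parse_decimal(raw, precision, scale)
    literal = "NULL" if value is None else str(value)
    return f"INSERT INTO clean_table (TotalAmount) VALUES ({literal});"


for raw in ["1234234", "12342342", "123423.3333", "123423sd.3333"]:
    print(insert_statement(raw))
```

Because the literal is produced from a validated Decimal rather than from the raw string, nothing from the input file is pasted into the generated SQL unchecked.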

I don't know if any of those solutions are suitable for you, but maybe they'll give you some ideas. And you should probably avoid the ISNUMERIC() function: it returns 1 for strings such as '$' and '1e4', which then fail to convert to decimal. Search this site or Google to find more of the 'strange' input that it considers to be numeric.
