
T-SQL Query to identify varchar fields that consist of a single repeating char/digit?

I need to clean phone numbers stored as varchar. There is bad data where unknown phone numbers are stored as a sequence of a single digit. Eventually more complex checks (area code & prefix matching) will be done, but for now I want a simple query to catch the obviously bad records.

So for example:

Valid Phone Number: 3289903829

Invalid Phone Number: 1111111111

Now if the bogus phone numbers are the proper length (10 digits), they are easy to find and clean.

    UPDATE customers
    SET phone = NULL
    WHERE phone IN ('0000000000','9999999999','8888888888','7777777777','6666666666','5555555555','4444444444','3333333333','2222222222','1111111111')

However, sometimes the bogus phones are of arbitrary length (likely due to typos): 11 ones, or 9 ones, or n ones.

How can I identify strings that consist entirely of the same char/digit?

1111111 - match
4444 - match
1112 - no match
4445555 - no match 


You can get the first character and replicate it:

    where phone = replicate(left(phone,1), len(phone))
        and phone is not null
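
Put into a full statement, that predicate might look like the sketch below. It reuses the `customers` table and `phone` column from the question. The `LEN(phone) > 1` guard is my own assumption: without it, an empty string and any one-character value also match, because they trivially equal their first character repeated.

```sql
-- Sketch: null out phone values made of one repeated character.
-- Table/column names (customers, phone) come from the question.
UPDATE customers
SET phone = NULL
WHERE phone = REPLICATE(LEFT(phone, 1), LEN(phone))
  AND LEN(phone) > 1;  -- assumption: leave empty and single-character values alone
```

NULL phones are skipped automatically, since both comparisons evaluate to unknown for them. It is probably worth running the same WHERE clause in a SELECT first to review what would be cleared.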


Depending on how fast you need it to run, your other option is to populate a temp table and then join your phone numbers against it. If you are doing it multiple times, you could even create a real table so you don't have to re-create it on each run. To make it faster, you could also index the field. Your mileage may vary depending on how fast you need it to be relative to the number of records you have to compare.

    -- Lookup table of every single-digit repetition, lengths 1 through 12
    CREATE TABLE #Numbers
    (
        PhoneNumber VARCHAR(13) NOT NULL
    )

    DECLARE @number INT
    SET @number = 0    -- start at 0 so that runs of zeros are included

    DECLARE @counter INT

    -- Outer loop: each digit 0-9
    WHILE (@number <= 9)
    BEGIN
        SET @counter = 1

        -- Inner loop: each length 1-12
        WHILE (@counter < 13)
        BEGIN
            INSERT INTO #Numbers
            SELECT REPLICATE(CAST(@number AS VARCHAR(1)), @counter)

            SET @counter = @counter + 1
        END

        SET @number = @number + 1
    END

    -- Rows in Phone whose number is a single repeated digit
    SELECT * FROM Phone p JOIN #Numbers n ON p.PhoneNumber = n.PhoneNumber

This way you don't have to recalculate the field you are comparing the number to each time.
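
For the "real table" variant, a set-based sketch along these lines could replace the loops. The table name `dbo.RepeatedDigitNumbers` is made up for illustration, and the `VALUES` row constructors assume SQL Server 2008 or later. The primary key gives you the index on the field.

```sql
-- Hypothetical permanent lookup table (name is illustrative only).
CREATE TABLE dbo.RepeatedDigitNumbers
(
    PhoneNumber VARCHAR(13) NOT NULL PRIMARY KEY  -- the key is backed by an index
);

-- Every digit 0-9 crossed with every length 1-12: 120 rows.
INSERT INTO dbo.RepeatedDigitNumbers (PhoneNumber)
SELECT REPLICATE(d.digit, l.n)
FROM (VALUES ('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9')) AS d(digit)
CROSS JOIN (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) AS l(n);

-- Same join as before, now against the permanent table.
SELECT p.*
FROM Phone p
JOIN dbo.RepeatedDigitNumbers n ON p.PhoneNumber = n.PhoneNumber;
```

The table only has to be built once, and widening the length range is just a matter of adding values to the second list (and widening the column if needed).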


Maybe you could create a SQL function to do this.

I think the guts of it would look something like this:

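    DECLARE @field VARCHAR(10)
    SET @field = '11111'

    DECLARE @len INT
    SET @len = LEN(@field)

    DECLARE @counter INT
    SET @counter = 1

    DECLARE @firstChar VARCHAR(1)
    SET @firstChar = NULL

    DECLARE @currentChar VARCHAR(1)
    SET @currentChar = NULL

    DECLARE @allSameNumber BIT
    SET @allSameNumber = 1

    -- Walk the string and stop at the first character that breaks the pattern
    WHILE @counter <= @len AND @allSameNumber = 1 BEGIN

        SET @currentChar = SUBSTRING(@field, @counter, 1)
        -- Remember the first character to compare the rest against
        IF @firstChar IS NULL BEGIN
            SET @firstChar = @currentChar
        END
        -- Fail on a non-digit or on a character that differs from the first.
        -- LIKE '[0-9]' is stricter than ISNUMERIC, which also accepts '.', '+', '-' and '$'
        IF @currentChar NOT LIKE '[0-9]' OR @currentChar <> @firstChar BEGIN
            SET @allSameNumber = 0
        END
        SET @counter = @counter + 1

    END

    SELECT @allSameNumber    -- 1 for '11111'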
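
Wrapped up as a function, it might look something like the sketch below. The name `dbo.IsSingleRepeatedDigit` and the `VARCHAR(50)` parameter size are placeholders I picked, and I have swapped the character-by-character loop for a single `REPLICATE` comparison to keep it short, so treat it as one possible shape rather than a drop-in version of the loop above. NULL and empty input are treated as "no match", which is also an assumption.

```sql
-- Hypothetical scalar function; name and parameter size are illustrative.
CREATE FUNCTION dbo.IsSingleRepeatedDigit (@field VARCHAR(50))
RETURNS BIT
AS
BEGIN
    DECLARE @result BIT
    SET @result = 0

    -- Match only when the first character is a digit and the whole
    -- string equals that digit repeated LEN(@field) times.
    IF LEN(@field) > 0
       AND LEFT(@field, 1) LIKE '[0-9]'
       AND @field = REPLICATE(LEFT(@field, 1), LEN(@field))
        SET @result = 1

    RETURN @result
END
GO  -- batch separator for SSMS/sqlcmd; CREATE FUNCTION must be alone in its batch

-- Usage against the table from the question
SELECT phone
FROM customers
WHERE dbo.IsSingleRepeatedDigit(phone) = 1
```

Keep in mind that a scalar function in a WHERE clause is evaluated per row, so on a large table the inline predicate may well be faster.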
