Select records with a substring from another table

I have these two tables:

data
id   |email
-----|-------------
1    |xxx@gmail.com
2    |yyy@gmial.com
3    |zzzgimail.com

errors
error     |correct
----------|----------
@gmial.com|@gmail.com
gimail.com|@gmail.com

How can I select from data all the records with an email error? Thanks.


SELECT d.id, d.email
FROM data d
    INNER JOIN errors e ON d.email LIKE '%' + e.error

That would do it. However, a LIKE with a wildcard at the start of the pattern prevents an index seek on the column being matched, so you may see poor performance on large tables.

A better approach would be to define a computed column on the data table that holds the REVERSE of the email field, and index it. This turns the above query into a LIKE condition with the wildcard at the end, like so:

SELECT d.id, d.email
FROM data d
    INNER JOIN errors e ON d.emailreversed LIKE REVERSE(e.error) + '%'

In this case, performance would be better as it would allow an index to be used.
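
For reference, the computed column and its index could be set up roughly as follows. This is only a sketch under a few assumptions: the table lives in the dbo schema, email is a varchar short enough to be used as an index key, and the index name IX_data_emailreversed is made up for illustration. The column name emailreversed matches the one used in the query above.

```sql
-- Assumption: dbo.data (id, email) already exists, as in the question.
-- REVERSE is deterministic, so the computed column can be persisted and indexed.
ALTER TABLE dbo.data
    ADD emailreversed AS REVERSE(email) PERSISTED;
GO  -- batch separator understood by SSMS and sqlcmd

-- Hypothetical index name; the key is the reversed email so that
-- LIKE REVERSE(e.error) + '%' becomes a prefix match on the index.
CREATE NONCLUSTERED INDEX IX_data_emailreversed
    ON dbo.data (emailreversed);
GO
```

Because the column is computed, it stays in sync with email automatically on inserts and updates.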

I blogged a full write up on this approach a while ago here.


Assuming the error is always at the end of the string:

declare @data table (
    id int,
    email varchar(100)
)

insert into @data
    (id, email)
    select 1, 'xxx@gmail.com' union all
    select 2, 'yyy@gmial.com' union all
    select 3, 'zzzgimail.com'

declare @errors table (
    error varchar(100),
    correct varchar(100)
)

insert into @errors
    (error, correct)
    select '@gmial.com', '@gmail.com' union all
    select 'gimail.com', '@gmail.com'   

select d.id, 
       d.email, 
       isnull(replace(d.email, e.error, e.correct), d.email) as CorrectedEmail
    from @data d
        left join @errors e
            on right(d.email, LEN(e.error)) = e.error
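
The left join above returns every row, matched or not. If only the records with a known error are wanted, as the question asks, the same condition can be used with an inner join. A self-contained sketch reusing the sample data from the question:

```sql
declare @data table (id int, email varchar(100))
insert into @data (id, email)
    select 1, 'xxx@gmail.com' union all
    select 2, 'yyy@gmial.com' union all
    select 3, 'zzzgimail.com'

declare @errors table (error varchar(100), correct varchar(100))
insert into @errors (error, correct)
    select '@gmial.com', '@gmail.com' union all
    select 'gimail.com', '@gmail.com'

-- inner join keeps only the emails that end with one of the listed errors
select d.id,
       d.email,
       e.error,
       e.correct
    from @data d
        inner join @errors e
            on right(d.email, len(e.error)) = e.error
```

With the sample rows this returns ids 2 and 3 and leaves out id 1, whose address is already correct.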


Well, in reality you can't with the info you have provided.

In SQL you would need to maintain a table of "correct" domains. With that you could do a simple query to find non-matches.
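
A rough sketch of that idea, assuming a lookup table of known-good domains. The table variable @valid_domains and its column domain_name are hypothetical names, not something from the question:

```sql
-- Sample rows from the question.
DECLARE @data TABLE (id int, email varchar(100));
INSERT INTO @data (id, email)
    SELECT 1, 'xxx@gmail.com' UNION ALL
    SELECT 2, 'yyy@gmial.com' UNION ALL
    SELECT 3, 'zzzgimail.com';

-- Hypothetical list of "correct" domains.
DECLARE @valid_domains TABLE (domain_name varchar(100));
INSERT INTO @valid_domains (domain_name)
    SELECT 'gmail.com';

-- Rows whose email does not end in '@' followed by a known-good domain.
SELECT d.id, d.email
FROM @data d
WHERE NOT EXISTS (
    SELECT 1
    FROM @valid_domains v
    WHERE d.email LIKE '%@' + v.domain_name
);
```

With the sample data this flags ids 2 and 3. The trade-off is that every legitimate domain has to be listed, otherwise valid addresses get flagged too.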

You could use some "non" SQL functionality in SQL Server to do a regular expression check, however that kind of logic does not belong in SQL (IMO).


select * from 
(select 1 as id, 'xxx@gmail.com' as email union
 select 2 as id, 'yyy@gmial.com' as email union
 select 3 as id, 'zzzgimail.com' as email) data join

(select '@gmial.com' as error, '@gmail.com' as correct union
 select 'gimail.com' as error, '@gmail.com' as correct ) errors

 on data.email like '%' + error + '%' 

I think that if the pattern had no wildcard at the beginning, only somewhere after it, the query could benefit from an index. Using full-text search might help as well.
