Select records with a substring from another table
I have these two tables:
data
id | email
---+--------------
1  | xxx@gmail.com
2  | yyy@gmial.com
3  | zzzgimail.com
errors
error      | correct
-----------+-----------
@gmial.com | @gmail.com
gimail.com | @gmail.com
How can I select all the records from data that have an email error? Thanks.
SELECT d.id, d.email
FROM data d
-- '%' + e.error builds a suffix pattern such as '%@gmial.com'
INNER JOIN errors e ON d.email LIKE '%' + e.error
That would do it. However, a LIKE pattern that starts with a wildcard prevents SQL Server from seeking on an index over the column being matched, so you may see poor performance.
A better approach is to add a computed column to the data table that holds the REVERSE of the email field, and to index it. The query then becomes a LIKE condition with the wildcard at the end:
SELECT d.id, d.email
FROM data d
INNER JOIN errors e ON d.emailreversed LIKE REVERSE(e.error) + '%'
In this case performance should be better, because the pattern now has a fixed prefix and the index on the reversed column can be used for a seek.
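For reference, here is a minimal, self-contained sketch of that setup. It assumes SQL Server; #data and #errors are temporary stand-ins for the question's tables, and IX_data_emailreversed is a made-up index name. On the existing table the column would instead be added with ALTER TABLE data ADD emailreversed AS REVERSE(email) PERSISTED.

```sql
-- Sketch only: #data/#errors stand in for the question's tables,
-- and IX_data_emailreversed is a hypothetical index name.
SET ANSI_NULLS ON;          -- indexes on computed columns require these options
SET QUOTED_IDENTIFIER ON;

CREATE TABLE #data (
    id            int          NOT NULL PRIMARY KEY,
    email         varchar(100) NOT NULL,
    -- REVERSE is deterministic, so the computed column can be persisted and indexed.
    emailreversed AS REVERSE(email) PERSISTED
);
CREATE INDEX IX_data_emailreversed ON #data (emailreversed);

CREATE TABLE #errors (
    error   varchar(100) NOT NULL,
    correct varchar(100) NOT NULL
);

INSERT INTO #data (id, email)
VALUES (1, 'xxx@gmail.com'), (2, 'yyy@gmial.com'), (3, 'zzzgimail.com');

INSERT INTO #errors (error, correct)
VALUES ('@gmial.com', '@gmail.com'), ('gimail.com', '@gmail.com');

-- The misspelled suffix becomes a fixed prefix of the reversed email,
-- e.g. '@gmial.com' -> 'moc.laimg@%', so the wildcard is now at the end.
SELECT d.id, d.email
FROM #data AS d
INNER JOIN #errors AS e
    ON d.emailreversed LIKE REVERSE(e.error) + '%';
```

With the sample rows this returns ids 2 and 3. Whether the optimizer actually chooses a seek depends on table size and statistics; with three rows it will most likely just scan.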
I blogged a full write-up on this approach a while ago.
Assuming the error is always at the end of the string:
declare @data table (
id int,
email varchar(100)
)
insert into @data
(id, email)
select 1, 'xxx@gmail.com' union all
select 2, 'yyy@gmial.com' union all
select 3, 'zzzgimail.com'
declare @errors table (
error varchar(100),
correct varchar(100)
)
insert into @errors
(error, correct)
select '@gmial.com', '@gmail.com' union all
select 'gimail.com', '@gmail.com'
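-- LEFT JOIN keeps every row; where no error matches, e.error is NULL,
-- so REPLACE returns NULL and ISNULL falls back to the original email.
select d.id,
       d.email,
       isnull(replace(d.email, e.error, e.correct), d.email) as CorrectedEmail
from @data d
left join @errors e
    -- match only when the email ends with the misspelled suffix
    on right(d.email, LEN(e.error)) = e.error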
Well, in reality you can't with the info you have provided.
In SQL you would need to maintain a table of "correct" domains. With that you could do a simple query to find non-matches.
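A minimal sketch of that idea, assuming SQL Server. The @valid_domains table and its single entry are hypothetical, and the domain is taken to be whatever follows the '@'.

```sql
-- Sample rows from the question.
DECLARE @data TABLE (id int, email varchar(100));
INSERT INTO @data (id, email)
VALUES (1, 'xxx@gmail.com'), (2, 'yyy@gmial.com'), (3, 'zzzgimail.com');

-- Hypothetical whitelist of domains considered correct.
DECLARE @valid_domains TABLE (domain varchar(100) PRIMARY KEY);
INSERT INTO @valid_domains (domain) VALUES ('gmail.com');

-- Return every email that does not end in '@' followed by a whitelisted domain.
-- Addresses with no '@' at all (like 'zzzgimail.com') never match, so they are returned too.
SELECT d.id, d.email
FROM @data AS d
WHERE NOT EXISTS (
    SELECT 1
    FROM @valid_domains AS v
    WHERE d.email LIKE '%@' + v.domain
);
```

This returns ids 2 and 3. Note that it also flags any legitimate domain missing from the list, which is why the list has to be maintained.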
You could use some non-SQL functionality in SQL Server to do a regular expression check, however that kind of logic does not belong in SQL (IMO).
select *
from (select 1 as id, 'xxx@gmail.com' as email union
      select 2 as id, 'yyy@gmial.com' as email union
      select 3 as id, 'zzzgimail.com' as email) data
join (select '@gmial.com' as error, '@gmail.com' as correct union
      select 'gimail.com' as error, '@gmail.com' as correct) errors
  on data.email like '%' + error + '%'
I think that if the pattern didn't start with a wildcard (a wildcard anywhere after the first character is fine), the query could benefit from an index. A full-text search could help too.