开发者

update duplicate or similar records in the same table

I have a table that allows records to be inserted which are very similar but differ by a single column value or two. For example the following records are in the table:

ID 开发者_开发技巧   TITLE                                 URL                             COUNTRY

1494  Hollywood Reporter                    http://www.hollywoodreporter.com    USA
1497  Hollywood Reporter via Drudge Report  http://www.hollywoodreporter.com    NULL
2158  Hollywood Reporter via                http://www.hollywoodreporter.com    NULL

I would like to update the country column in the last two records where the url is the same. Also I would also like to know how to sort my table by url column so that all duplicate urls are grouped together or even if they are similar as you have in some cases like:

http://www.hollywoodreporter.com       http://www.blog.hollywoodreporter.com

Thanks in advance.

Tried the following and it worked where the urls were the same

UPDATE t1

SET t1.country = t2.country FROM Sources AS t1 JOIN sources AS t2 ON t1.url = t2.url WHERE t1.url= t2.url;

just got to figure out the rest. thanks everyone


UPDATE

I was able to edit the records that had matching urls but for the ones that are similar like http://www.pantagraph.com http://pantagraph.com http://pantagraph.com/titles

were not updated. When using the sort and select I can view all these records but when trying to update them it does not work. I even just tried this simple version of:

 select * from Sources s
 where s.url like url 

which shows the records but when updating it did not work.

update Sources 
set country = s.country 
from Sources s 
 where s.url like url


You could use a subquery:

update  yt
set     country = (
                  select  distinct country 
                  from    YourTable yt2 
                  where   yt.url = yt2.url 
                          and yt2.country is not null
                  )
from    YourTable yt

This should give an error if there is an URL with different countries: in that case you should adapt the query to choose one of the countries.

Per your comment, to find rows with conflicting countries:

select  url
,       count(distinct country) as NumberOfCountries
from    YourTable
where   country is not null
group by
        url
,       country
having  count(distinct country) > 1


For sorting the URL, try adding another column for just "domain name". You would extract the domain name (hollywoodreporter.com), store it, and sort on that column. Alternatively, you could use regular expressions in your ORDER BY clause, but that could lead to performance problems.


update table
set
    table.country = t2.country
from
    table t2
where
    table.url = t2.url
    and t2.country is not null

You're joining the table to itself based on url and updating from rows that have something in the country field.

But, if you've got that much duplicated data, your design is probably bad. Try normalizing the db if you can.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜