开发者

SQL Performance (Replace)

I have a table that has about开发者_开发技巧 400,000+ rows. I am writing some pattern matching code but need to clean up a column before I do so. This boils down to doing a replace-like operation.

I tried listing them all out one at a time...

 Update T_ADDRESS set ADDR_LINEONE = REPLACE(ADDR_LINEONE,' southeast ',' se ')
 Update T_ADDRESS set ADDR_LINEONE = REPLACE(ADDR_LINEONE,' southwest ',' sw ')

Since I have over 500 of these...it took too long.

Now I am trying to nest them...

 Update T_ADDRESS set ADDR_LINEONE = REPLACE(REPLACE(ADDR_LINEONE,' southwest ',' sw '),' southeast ',' se ')

But this is still painfully slow. I need to make this code work on tables of all sizes (1 record to 5 million records).

Anyone have any advice? I am using SQL Server by the way.


You have to always scan the table end-to-end no matter how fancy you do the REPLACE. This is what is killing performance, and it cannot be changed since you have to way of indexing the ADDR_LINEONE field in any sensible manner.

Since this should be a one-time only operation, the long time should not matter.

If this is a repeated operation, then your problem is not here, is in how you load the data into the table: do the transformation before you save the data, otherwise you stand no chance.


Write a CLR procedure. TSQL is not great at (or designed for) handling large numbers of string manipulations.

Regular Expressions Make Pattern Matching And Data Extraction Easier


Create #TEMP table with new values then inner join Something like this

create table #TempValues
(oldAres varchar(12),newadres varchar(2))

insert into #TempValues
select 'southeast','se'
union all 
select 'southwest','sw'

update T_ADDRESS
set addr_lineone=t.newadres
from T_ADDRESS inner join #TempValues t on T_ADDRESS.addr_lineone=t.oldAdres


You could write the update statement to use a CASE statement in place of the multiple REPLACEs although I've no idea whether this will execute any faster for you.


Old question, but still useful. Your query updates all the columns of the table, with a large penalty if this column is indexed. You can improve the speed by filtering the fields that do not need to be updated as follows:

Update T set ADDR_LINEONE = REPLACE(ADDR_LINEONE,' southeast ',' se ') FROM T_ADDRESS T WHERE ADDR_LINEONE LIKE '% southeast %'
Update T set ADDR_LINEONE = REPLACE(ADDR_LINEONE,' southwest ',' sw ') FROM T_ADDRESS T WHERE ADDR_LINEONE LIKE '% southwest %'

Unless all your addresses contain southeast or southwest, this will be an order of magnitude faster!

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜