Selectively deleting large amounts of data from a Sybase table
Using Sybase ASE 15 for this - I have a large number of rows (up to 10 million) to delete from a table on a regular basis, but I want to keep the most recently added data, which rules out using truncate directly on the table.
delete from master_table where...
Using the above delete is very slow, so my strategy is to move the data I want to keep into a temp table, truncate the master table, and move the data back in again from the temp table, i.e.
1) select * into #temp_table from master_table where date_updated > dateadd(mi, -15, getdate()) and node_type != 'X'
2) truncate table master_table
3) insert into master_table select * from #temp_table
This is almost good enough - steps 1 and 2 have great performance, but the insert back into the master is too slow.
So my question really boils down to whether there is a quick way of doing either of:
delete from master_table where...
insert into xyz select * from...
Or I'm open to alternative approaches!
Probably your best solution is to use partitioning.
I do not know the details of partitioning on Sybase; however, if you can create time-based partitions, you can potentially drop old data by altering the partitions.
You will, however, need something that creates future partitions and drops old ones - a piece of software you have to maintain (it may be a stored procedure or a script that runs on the database server, or elsewhere as a "cron" job, etc.).
You'll also need to ensure that those with node_type='X' are deleted correctly.
Maybe you can create two sets of daily partitions, one for node_type = 'X' and another for the other node types, create new partitions each day (for tomorrow, and maybe the day after), and drop the old ones you don't need, or merge them if you do need the data; a rough sketch follows.
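To make that concrete - a hedged sketch of ASE 15 range partitioning, where the partition names and boundary dates are hypothetical. Note that repartitioning copies the data and needs the 'select into/bulkcopy' database option, and partition-level drops are not available in every ASE 15.x release, so verify against your version's documentation:
-- Hypothetical: repartition master_table by range on date_updated.
-- Partition names and boundary values are illustrative only.
alter table master_table partition by range (date_updated)
    (p_week01 values <= ('Jan 08 2011'),
     p_week02 values <= ('Jan 15 2011'))
-- A maintenance job would add partitions ahead of time...
alter table master_table add partition
    (p_week03 values <= ('Jan 22 2011'))
-- ...and purge expired ones. Dropping a single partition directly
-- is not supported in all ASE 15.x releases; on older releases you
-- may have to repartition or purge those rows another way.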
Copying out of the master_table (into the temp_table) is fast, but copying those rows back into the master_table is slow, which suggests you have indexes, constraints and triggers on the master_table. You may need to look at those to see if they are really needed on a table that takes bulk inserts and deletes "on a regular basis", and find alternatives based on your business need.
The solution below assumes master_table does not have any dependencies or constraints. Since you are doing this "on a regular basis", and you delete most of the master_table rows anyway, it will be faster to use a permanent temp_table.
-- Create the copy table on the first run, if it does not already
-- exist. This is your permanent temp_table. Note that Sybase ASE
-- does not support create table ... as select; the ASE idiom is
-- select into, which needs the 'select into/bulkcopy' database
-- option enabled.
select * into master_table_copy
from master_table
where 1 = 2
-- Copy rows you want to keep
insert into master_table_copy
select * from master_table
where date_updated > dateadd(mi, -15, getdate())
and node_type != 'X'
truncate table master_table
-- Rename the tables
exec sp_rename 'master_table', 'master_table_orig'
exec sp_rename 'master_table_copy', 'master_table'
exec sp_rename 'master_table_orig', 'master_table_copy' -- use for next time
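One caveat (the index below is hypothetical, since the post does not show the table's indexes): indexes stay attached to their table object through sp_rename, and a table created with select into has none, so after the swap you may need to recreate the indexes on the new master_table:
-- Hypothetical index rebuild after the swap; substitute the real
-- index definitions from the original master_table.
create index idx_master_date_updated
    on master_table (date_updated)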
Depending on your circumstances, fast bcp could work for making the insert run quickly. That will change your overall design to need a shell script (or batch file), but it could work if your table is set up to allow fast bcp (broadly: no indexes or triggers on the target table, and the 'select into/bulkcopy' database option enabled); a sketch follows.
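A rough sketch of such a script, assuming the kept rows were staged in a permanent table like the master_table_copy above. The server, database, login and file names are all placeholders; -c is character mode and -b sets the commit batch size:
# Hypothetical bcp round trip; MYSERVER, mydb, myuser and keep.dat
# are placeholders - bcp prompts for the password when -P is omitted.
bcp mydb..master_table_copy out keep.dat -c -Umyuser -SMYSERVER
bcp mydb..master_table in keep.dat -c -b 10000 -Umyuser -SMYSERVER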
The other thing to look at is why the insert is slow. Is it disk I/O? Too many indexes needing to be updated? Some tweaking of your database structure might speed it up.
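As a starting point for that check - a minimal sketch using standard ASE system procedures, which report the table's indexes and other attributes:
-- Show the table definition, attributes and partitioning
sp_help master_table
-- List just the indexes
sp_helpindex master_table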