Oracle performance issue with massive inserts and truncates (AWR attached)
I'm using Oracle to synchronize events between two nodes of my application server (never mind why/if that's the best way/etc. This is a given).
To do so, I'm using an "events" table that one node (the "active") writes new events to and the other node (the "passive") reads from. The table looks like:
Event UUID (UUID) || Event ID (long) || Event Data (several columns of different types)
The event ID is a number that constantly increases (application controlled, not a sequence) and signifies the revision the internal model would be at after applying the event data. The Event UUID has a unique constraint. I have a single index on the event ID to facilitate the select SQL - "Select ... where Event_ID > XXX order by Event_ID" - where XXX is the internal revision number of the passive node. Once in a while I truncate the table (using "truncate reuse storage").
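For concreteness, here is a minimal sketch of the setup as described. The column names, types, and the RAW(16) representation of the UUID are my assumptions, not the actual schema:

```sql
CREATE TABLE events (
    event_uuid  RAW(16)        NOT NULL,  -- UUID column; RAW(16) is one common Oracle choice
    event_id    NUMBER(19)     NOT NULL,  -- application-controlled revision, always increasing
    event_data  VARCHAR2(4000),           -- stands in for the several data columns of different types
    CONSTRAINT events_uuid_uk UNIQUE (event_uuid)
);

-- The single index on the event ID, serving the passive node's catch-up query
CREATE INDEX events_id_ix ON events (event_id);

-- Passive node: fetch everything past its internal revision (:rev)
SELECT event_uuid, event_id, event_data
  FROM events
 WHERE event_id > :rev
 ORDER BY event_id;

-- Periodic cleanup, as described above
TRUNCATE TABLE events REUSE STORAGE;
```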
[Actually, I use three such tables in a round-robin order, so I can always truncate the one I'm about to write to while my passive node has time to "catch up". A sketch of the rotation follows below.]

After several hours of inserting and truncating where everything is fine, I start getting a lot of "noise" from the database and response time drops dramatically. This can go on for an hour or two (or even more); then all of a sudden it stops and response time returns to its normal level. The AWR reports point toward the truncate statement and a bit toward the insert statements. I suspect something is going on with DBWR, but I can't pinpoint it.

Note that this performance degradation happens even when the secondary node (the one running the "SELECT" statements) is off, so it's pure insert/truncate. Note 2: this issue does NOT reproduce on MSSQL.
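A sketch of the round-robin rotation as I understand it (the table names and the rotation order are illustrative, not the actual ones):

```sql
-- Three identical tables, used in rotation: events_1, events_2, events_3.
-- Before the active node switches its inserts to the next table, that
-- table is truncated, so the passive node has had a full rotation to
-- finish reading it.
TRUNCATE TABLE events_3 REUSE STORAGE;  -- about to become the write target
-- Active node now switches inserts from events_2 to events_3, while the
-- passive node drains events_2 (and whatever remains of events_1).
```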
The question: why is this happening? What can I do to stop it? Are there alternatives to this design (the closer the alternative is to the current design, the better)?
Update 1: I might have misled with the title. This is not a single massive insert but a trickle of inserts as events are generated in the application server.
Update 2: AWR compare of a sample from the first period (good) and the second period (bad) is at http://pastehtml.com/view/1eirn20.html
Update 3: new AWR diff at http://pastehtml.com/view/1eirn20.html
Update 4: Solved. Apparently it WAS the storage (thanks, ik_zelf!). Just goes to show that abstractions aren't really abstract. In the end, it's a magnetized spinning platter.
The AWR reports show a clear indication that the I/O time doubles in the bad period compared to the first period. Check the storage usage. It could very well be that the storage is shared between systems and that, for example, another system is taking a backup and causing a bad period. Check the logs of all connected systems/backups and try to correlate their times with your test findings.
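One way to watch this from the database side (a sketch; it assumes you have access to V$SYSTEM_EVENT and that the degradation shows up as higher average I/O waits) is to sample the I/O related wait events during a good period and a bad period and compare the averages:

```sql
-- Cumulative waits since instance startup; run once in a good period and
-- once in a bad period, then compare avg_wait_ms between the two samples.
-- 'local write wait' is the event typically seen while TRUNCATE waits for
-- DBWR to flush the object's dirty blocks.
SELECT event,
       total_waits,
       ROUND(time_waited_micro / 1000, 1) AS time_waited_ms,
       ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_wait_ms
  FROM v$system_event
 WHERE event IN ('db file sequential read',
                 'db file scattered read',
                 'db file parallel write',
                 'log file parallel write',
                 'local write wait')
 ORDER BY time_waited_micro DESC;
```

If the average wait per I/O roughly doubles in the bad window, as the AWR diff suggests, the bottleneck is below the database: shared storage contention (a backup, another system's batch load) fits the on/off pattern described in the question.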