bulk insert to SQL Server, insert statements vs one insert XML statement?
The application layer (ColdFusion) needs to insert multiple rows into SQL Server 2005.
I was thinking of using loop in app layer to construct multiple input statements and send to SQL Server over JDBC in a single connection.
My coll开发者_C百科eague however suggests to construct an XML and bulk insert the XML instead.
Which one is the better method?
For inserts of millions of rows, I've used BULK INSERT, writing data to a CSV file that the SQL Server instance has access to. This outperforms any type of insert via JDBC, but at the cost of reduced flexibility. For smaller numbers of rows, using JDBC's Statement.addBatch()
and Statement.executeBatch()
can be used to avoid the overhead of sending many small commands.
Depending upon your requirements, you may have to put all this in one transaction, or you may be able to split up into several transactions, if full ACID guarantees for the entire data set are not required.
Here is an article discussing bulk insert of XML. I have no data to base any conclusion on, but at a guess I would imagine BULK INSERT the raw row data will be faster since there is no OPENXML transformation required. Of course, if your data is already in XML then this makes sense, but if not, then staying with tabular data is probably simplest, and possibly most performant.
The XML will be the better approach IMO, although I'd also recommend you create a stored procedure to handle the actual processing instead of running a inline query.
Basically, you'll send the XML data as a single argument, then inside the SP you'll run INSERT INTO SELECT statement, selecting from the XML into some table or group of tables.
DECLARE @FOO xml;
SET @FOO = '<things><thing><id>1</id></thing><thing><id>2</id></thing><thing><id>3</id></thing><thing><id>4</id></thing></things>';
SELECT
ParamValues.id.value('.', 'int') AS thing_id
FROM
@FOO.nodes('/things/thing/id') AS ParamValues(id)
This will create a table with a single column "thing_id". Now all you have to do is something like
INSERT INTO someTable (someID)
SELECT
ParamValues.id.value('.', 'int') AS thing_id
FROM
@FOO.nodes('/things/thing/id') AS ParamValues(id)
and you've got a single INSERT to handle however many rows of XML you have.
One INSERT at a time will be unnecessarily slow due to all that network latency.
A better performing way would be to send several transactions as a single batch in one network round trip and commit them as a single unit of work.
If you have a very large number of records you might want to consider a hybrid approach: loop over a number of batches and send them to be committed as a unit of work. This will perform, even for large transactions, and you won't force your database to maintain a large rollback log before committing the whole transaction.
I'm not a fan if your XML solution means inserting the raw XML stream as a CLOB. How will you query it once it's in the database? You lose everything that SQL gives you: the ability to query. All you can do is XPath for particular values if you store the raw XML. And updates mean replacing the entire CLOB.
精彩评论