
How do I add parameters to a bulk insert statement to protect against SQL injection and increase performance?

I am importing a file from a user on the web that contains 150K rows and has to be broken up, resulting in about 1.6M items that will be added to the database.

At the moment I add the primary record first and then add the children afterwards, using the key that was returned from the first insert.

While I can precompile the query and re-use it, I'd like to group the lot of them together, but I'm concerned that I won't be able to parameterize the queries at that point.
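
Roughly, the current pattern looks like this minimal JDBC-style sketch (parent_rec/child_rec and their columns are placeholders, not the real schema):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class CurrentImport {
        // Hypothetical schema: parent_rec(id, business_key) and child_rec(parent_id, sku, qty).
        static void importRow(Connection conn, String businessKey, String[][] items) throws SQLException {
            long parentId;

            // 1) Insert the primary record and grab the generated key.
            try (PreparedStatement parent = conn.prepareStatement(
                    "INSERT INTO parent_rec (business_key) VALUES (?)",
                    Statement.RETURN_GENERATED_KEYS)) {
                parent.setString(1, businessKey);
                parent.executeUpdate();
                try (ResultSet keys = parent.getGeneratedKeys()) {
                    keys.next();
                    parentId = keys.getLong(1);
                }
            }

            // 2) Insert the children one by one, parameterized so user data never
            //    touches the SQL text -- but this is one round trip per item.
            try (PreparedStatement child = conn.prepareStatement(
                    "INSERT INTO child_rec (parent_id, sku, qty) VALUES (?, ?, ?)")) {
                for (String[] item : items) {
                    child.setLong(1, parentId);
                    child.setString(2, item[0]);
                    child.setInt(3, Integer.parseInt(item[1]));
                    child.executeUpdate();
                }
            }
        }
    }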

At the moment I'm only importing at around 300 rows/sec, or about 3,000 queries/sec, via the query method.


I'm not sure what your constraints are for how you can load the data, but there are a few good avenues to get good bulk import rates into the database (see Performing Bulk Copy Operations). When working with a data import process, I always found it helpful to break it up into phases:

  1. Import phase - various bulk methodologies available depending on your situation (a rough sketch follows this list)
  2. Staging phase - processing work, e.g. data validation, building key relations, data scrubbing, etc.
  3. Final insert into the "live" tables (hopefully a set-based insert)
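
As a rough sketch of what the import phase might look like with plain parameterized batches (raw_staging and its columns are hypothetical, and a driver-level bulk copy API such as the one described in Performing Bulk Copy Operations would likely be faster still), the idea is to push the raw rows in with large batches and almost no logic:

    import java.io.BufferedReader;
    import java.sql.Connection;
    import java.sql.PreparedStatement;

    public class StagingLoader {
        // Hypothetical staging table: raw_staging(line_no, col1, col2, col3), no constraints.
        static void loadRaw(Connection conn, BufferedReader file) throws Exception {
            conn.setAutoCommit(false);  // commit in large chunks, not per row
            String sql = "INSERT INTO raw_staging (line_no, col1, col2, col3) VALUES (?, ?, ?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                String line;
                int lineNo = 0;
                while ((line = file.readLine()) != null) {
                    String[] f = line.split("\t", -1);            // assume a tab-delimited upload
                    ps.setInt(1, ++lineNo);
                    ps.setString(2, f[0]);
                    ps.setString(3, f.length > 1 ? f[1] : null);  // defer validation to the staging phase
                    ps.setString(4, f.length > 2 ? f[2] : null);
                    ps.addBatch();                                // parameterized, so no injection risk
                    if (lineNo % 5000 == 0) {                     // flush in batches to cut round trips
                        ps.executeBatch();
                        conn.commit();
                    }
                }
                ps.executeBatch();
                conn.commit();
            }
        }
    }

Validation, key building, and scrubbing then happen against raw_staging in step 2, where set-based SQL can do the heavy lifting.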

It can be very efficient to pick all of the data up in the first pass with little to no logic work, and move it to a staging area en masse, either going into temp tables or permanent staging tables for that purpose. Then you can do any processing work on the data to have everything properly structured and cleaned before mass inserting it into its final home in the live tables. This can also give you a layer of insulation from any malicious data or SQL injection attacks by having one or more intermediary steps.

This separation allows the bulk import task to be as fast as you can make it, because there is hopefully very little logic required to do a mass import into a big staging dumping ground. Then you can apply whatever logic is required to slice up the data in whatever way is appropriate. Additionally, if you have several steps that need to be done in the staging phase, you can break this up into however many smaller steps are needed and focus on optimizing the biggest/slowest parts.

In your situation, if there is a way that you can structure the data after getting to the staging phase to match what is in the live tables, you might be able to insert it as one big set. Being able to build PK->foreign key relations in the staging phase before the final insert (as well as handling any other data processing work) can allow you to go from iterative inserts to one big bulk set insert; set-based is usually a very good thing. That is, of course, if your system/constraints allow for you to do so.
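
For example, if the staging rows carry the same business key as their parent rows, the final insert of the children can become one set-based statement instead of millions of iterative ones (staging_items, parent_rec, child_rec, and business_key are again hypothetical names):

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class FinalInsert {
        // Resolve the parent foreign key with a join to the staging table instead of
        // fetching generated keys one row at a time.
        static void moveToLive(Connection conn) throws SQLException {
            String sql =
                "INSERT INTO child_rec (parent_id, sku, qty) " +
                "SELECT p.id, s.sku, s.qty " +
                "FROM   staging_items s " +
                "JOIN   parent_rec p ON p.business_key = s.business_key";
            try (Statement stmt = conn.createStatement()) {
                int inserted = stmt.executeUpdate(sql);  // one statement, one set
                System.out.println("Inserted " + inserted + " child rows");
            }
        }
    }

No user input is concatenated into that statement (the values already live in the staging table), so a plain Statement is fine here; the injection surface was handled when the data was parameterized into staging.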

I'm not sure if any of that applies to your situation or not, or if I'm way off base from what you were asking; but hopefully there is something in there that can be useful.
