Using C# to generate and insert 100,000s of records into a Postgres database
I am programming a project in C# where many records are generated and need to be stored in a database. At the moment what I do (which is VERY slow) is store all of these results as a list of structs, then at the end iterate through the list and append all of the records to an SQL query string. The issue is that it takes ages to iterate through a list when it contains 100,000s of items, and a similar-sized insert needs to be performed several times in the simulation.

I've considered building the string from the off, i.e. appending each record to the string directly rather than storing it in a list. I've also considered storing the records in a temporary file and using SQL COPY. I don't have much experience dealing with this amount of data, so your feedback would be appreciated.
Thanks in advance
What you should try is populating a file with your data and then using the built-in COPY command. This is the recommended method of bulk-loading a database. http://www.postgresql.org/docs/8.3/interactive/sql-copy.html
When building the CSV temp file, take care to follow the CSV spec. If your column data contains newlines (\n, \r), commas (,) or quotes ("), then escape each quote (") with a second quote
data=data.Replace("\"", "\"\"");
and surround the data with quotes
data="\""+data+"\"";
Something like
public string CSVEscape(string columnData)
{
    // Quote the field only when it contains characters that need escaping
    if (columnData.Contains("\n") || columnData.Contains("\r") ||
        columnData.Contains("\"") || columnData.Contains(","))
    {
        // Double up embedded quotes, then wrap the whole field in quotes
        return "\"" + columnData.Replace("\"", "\"\"") + "\"";
    }
    return columnData;
}
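If you'd rather not manage a temp file at all, the Npgsql ADO.NET provider can stream rows straight into COPY from C#. A minimal sketch, assuming Npgsql 3+ (its BeginTextImport API) and a placeholder table "mytable" with columns x, y, z; connString and the records collection stand in for your own:

```csharp
using Npgsql; // assumes the Npgsql ADO.NET provider is referenced

// Stream rows to the server with COPY instead of writing a temp file.
// "mytable" and its columns are placeholders for your actual schema.
using (var conn = new NpgsqlConnection(connString))
{
    conn.Open();
    using (var writer = conn.BeginTextImport(
        "COPY mytable (x, y, z) FROM STDIN (FORMAT CSV)"))
    {
        foreach (var record in records)
        {
            // CSVEscape is the helper above; numeric fields need no escaping
            writer.WriteLine(string.Format("{0},{1},{2}",
                record.X, record.Y, CSVEscape(record.Z)));
        }
    } // disposing the writer completes the COPY operation
}
```

The upside is that the data never touches disk on the client side; the escaping rules are the same as for a CSV file.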
If I'm reading your question correctly, you're sending the PostgreSQL server a string that looks something like this:
INSERT INTO mytable (x, y, z) VALUES (1,2,3), (4,5,6), ...
What you should do instead is
- start a transaction
- prepare the statement
INSERT INTO mytable (x, y, z) VALUES ($1, $2, $3)
- for each struct in your list, execute the prepared statement with the appropriate fields
- commit the transaction
(Sorry, no code, because I don't know C#'s DB APIs.)
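To fill in the C# gap: with the Npgsql provider (assumed here; any ADO.NET provider looks similar, and "mytable", records and connString are illustrative), the steps above translate to roughly:

```csharp
using Npgsql;
using NpgsqlTypes;

// Sketch of the transaction + prepared statement approach, assuming Npgsql.
using (var conn = new NpgsqlConnection(connString))
{
    conn.Open();
    using (var tx = conn.BeginTransaction())
    using (var cmd = new NpgsqlCommand(
        "INSERT INTO mytable (x, y, z) VALUES (@x, @y, @z)", conn, tx))
    {
        cmd.Parameters.Add(new NpgsqlParameter("x", NpgsqlDbType.Integer));
        cmd.Parameters.Add(new NpgsqlParameter("y", NpgsqlDbType.Integer));
        cmd.Parameters.Add(new NpgsqlParameter("z", NpgsqlDbType.Integer));
        cmd.Prepare(); // plan the statement once on the server

        foreach (var record in records)
        {
            // Only the parameter values change per row
            cmd.Parameters["x"].Value = record.X;
            cmd.Parameters["y"].Value = record.Y;
            cmd.Parameters["z"].Value = record.Z;
            cmd.ExecuteNonQuery();
        }
        tx.Commit(); // one commit for the whole batch
    }
}
```

Wrapping everything in a single transaction is what gives most of the speedup: without it, each INSERT pays the cost of its own commit.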
I wouldn't bother figuring out COPY IN unless the approach I described above is still way too slow. I get nervous when inserting data into a database requires any sort of text munging on my part.
If you see low performance with the OOP approach (structs/classes), the first thing to do is measure and optimize the code as much as possible.

If performance is still not good enough after optimization in your specific context, I would drop the OOP approach and switch to raw SQL.

One solution could be, as you said in your post, to append the string for each entity to a big file as it is generated, so that at the end of generation you have the complete SQL statement. The problem here is the testability of the solution.

But somewhere you have to pay: you cannot have both comfort and performance at this scale.
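If you do go the raw-SQL route, appending to a StringBuilder as each record is generated avoids the second pass over a 100,000-item list entirely. A hedged sketch; the table, column names, and the GenerateX/Y/Z functions are placeholders for your own generation code:

```csharp
using System.Text;

// Build one multi-row INSERT as the records are generated, instead of
// storing structs in a list and iterating over them again at the end.
// Values here are integers; string values would need proper escaping.
var sql = new StringBuilder("INSERT INTO mytable (x, y, z) VALUES ");
for (int i = 0; i < recordCount; i++)
{
    if (i > 0) sql.Append(',');
    sql.AppendFormat("({0},{1},{2})",
        GenerateX(i), GenerateY(i), GenerateZ(i));
}
// sql.ToString() is now the complete statement, executed once
```

Note that string concatenation with + in a loop would be quadratic here; StringBuilder keeps the build linear in the number of records.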