Optimising a lot of inserts with JDBC and MySQL
I have to perform a large number of inserts (in this instance 27k) and I want to find an optimal way to do this. Right now this is the code that I have. As you can see I'm using a prepared statement and batches, and I'm executing every 1000 rows (I have also tried smaller numbers such as 10 and 100, but the time was again way too long). One thing omitted from the query is that there is an auto-generated ID, in case that matters:
private void parseIndividualReads(String file, DBAccessor db) {
    BufferedReader reader;
    try {
        Connection con = db.getCon();
        PreparedStatement statement = null;
        statement = con.prepareStatement("INSERT INTO `vgsan01_process_log`.`contigs_and_large_singletons` (`seq_id` ,`length` ,`ws_id` ,`num_of_reads`) VALUES (?, ?, ?, ?)");
        long count = 0;
        reader = new BufferedReader(new FileReader(logDir + "/" + file));
        String line;
        while ((line = reader.readLine()) != null) {
            if (count != 0 && count % 1000 == 0)
                statement.executeBatch();
            if (line.startsWith(">")) {
                count++;
                String res[] = parseHeader(line);
                statement.setString(1, res[0]);
                statement.setInt(2, Integer.parseInt(res[1]));
                statement.setInt(3, id);
                statement.setInt(4, -1);
                statement.addBatch();
            }
        }
        statement.executeBatch();
    } catch (FileNotFoundException ex) {
        Logger.getLogger(VelvetStats.class.getName()).log(Level.SEVERE, "Error opening file: " + file, ex);
    } catch (IOException ex) {
        Logger.getLogger(VelvetStats.class.getName()).log(Level.SEVERE, "Error reading from file: " + file, ex);
    } catch (SQLException ex) {
        Logger.getLogger(VelvetStats.class.getName()).log(Level.SEVERE, "Error inserting individual statistics " + file, ex);
    }
}
Any other tips on what might be changed to speed up the process? A single insert statement doesn't carry much data - I'd say no more than 50 characters across all 4 columns.
EDIT:
Okay, following the advice given I have restructured the method as follows. The speed-up is immense. You could also try playing with the 1000 value, which might yield better results:
private void parseIndividualReads(String file, DBAccessor db) {
    BufferedReader reader;
    PrintWriter writer;
    try {
        Connection con = db.getCon();
        con.setAutoCommit(false);
        Statement st = con.createStatement();
        StringBuilder sb = new StringBuilder(10000);
        reader = new BufferedReader(new FileReader(logDir + "/" + file));
        writer = new PrintWriter(new BufferedWriter(new FileWriter(logDir + "/velvet-temp-contigs", true)), true);
        String line;
        long count = 0;
        while ((line = reader.readLine()) != null) {
            if (count != 0 && count % 1000 == 0) {
                sb.deleteCharAt(sb.length() - 1);
                st.executeUpdate("INSERT INTO `vgsan01_process_log`.`contigs_and_large_singletons` (`seq_id` ,`length` ,`ws_id` ,`num_of_reads`) VALUES " + sb);
                sb.delete(0, sb.capacity());
                count = 0;
            }
            //we basically build a giant VALUES (),(),()... string that we use for the insert
            if (line.startsWith(">")) {
                count++;
                String res[] = parseHeader(line);
                sb.append("('" + res[0] + "','" + res[1] + "','" + id + "','" + "-1'" + "),");
            }
        }
        //insert all the remaining stuff
        sb.deleteCharAt(sb.length() - 1);
        st.executeUpdate("INSERT INTO `vgsan01_process_log`.`contigs_and_large_singletons` (`seq_id` ,`length` ,`ws_id` ,`num_of_reads`) VALUES " + sb);
        con.commit();
    } catch (FileNotFoundException ex) {
        Logger.getLogger(VelvetStats.class.getName()).log(Level.SEVERE, "Error opening file: " + file, ex);
    } catch (IOException ex) {
        Logger.getLogger(VelvetStats.class.getName()).log(Level.SEVERE, "Error reading from file: " + file, ex);
    } catch (SQLException ex) {
        Logger.getLogger(VelvetStats.class.getName()).log(Level.SEVERE, "Error working with mysql", ex);
    }
}
You have other options:
- Use LOAD DATA INFILE, from the MySQL documentation.
- If you want to do it in Java, use a single Statement that inserts 1000 rows at a time in the form: "INSERT INTO mytable (col1,col2) VALUES (val1,val2),(val3,val4),..."
But I would recommend the first solution.
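For reference, here's a rough sketch of the first approach from JDBC. It assumes the rows have already been written out to a tab-separated file on the client (the velvet-temp-contigs file the edited code opens would do), and that LOCAL INFILE is permitted by both the server and Connector/J (e.g. allowLoadLocalInfile=true on the connection URL):

Statement st = con.createStatement();
// path to a client-side, tab-separated file with one row per line;
// the file name here reuses the temp file written above, but any readable path works
String path = logDir + "/velvet-temp-contigs";
st.execute("LOAD DATA LOCAL INFILE '" + path + "' "
        + "INTO TABLE `vgsan01_process_log`.`contigs_and_large_singletons` "
        + "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n' "
        + "(`seq_id`, `length`, `ws_id`, `num_of_reads`)");
st.close();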
The fastest way to do what you want to do is to load directly from the file (http://dev.mysql.com/doc/refman/5.5/en/load-data.html).
There are some problems with loading from a file - firstly, the file needs to be readable by the server, which often isn't the case. Error handling can be a pain, and you can end up with inconsistent or incomplete data if the data in your file does not meet the schema's expectations.
It also depends on what the actual bottleneck is - if the table you're inserting into has a lot of other activity, you may be better off using INSERT DELAYED (http://dev.mysql.com/doc/refman/5.5/en/insert-delayed.html).
The official line on speeding up inserts is here: http://dev.mysql.com/doc/refman/5.5/en/insert-speed.html
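For completeness, INSERT DELAYED is just the normal statement with the DELAYED keyword; in MySQL 5.5 it only works with certain storage engines (MyISAM, MEMORY, ARCHIVE - not InnoDB) and it is deprecated in later versions. The values below are made up:

Statement st = con.createStatement();
// DELAYED returns immediately and lets the server queue the row for insertion
st.executeUpdate("INSERT DELAYED INTO `vgsan01_process_log`.`contigs_and_large_singletons` "
        + "(`seq_id`, `length`, `ws_id`, `num_of_reads`) VALUES ('NODE_1', 350, 42, -1)");
st.close();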
Depending on the structure of your data, you have a potential bug in your "execute batch every 1000 iterations" logic.
If the frequency of lines beginning with ">" is low, then you could have an instance where the following happens (leading to lots of unnecessary executeBatch calls):
line in data file        events in program
-------------------------------------------------------
> some data              (count=999)
> some more data         (count=1000)
another line             (execute batch, count=1000)
more unprocessed         (execute batch, count=1000)
some more                (execute batch, count=1000)
So I would move the if (count != 0 && count % 1000 == 0) check inside the if (line.startsWith(">")) block.
Note, I'm not sure whether this can happen with your data or how much of a speed-up it would be.
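For illustration, the change would look roughly like this in the original batch version (sketch only; the rest of the method stays as posted):

while ((line = reader.readLine()) != null) {
    if (line.startsWith(">")) {
        count++;
        String res[] = parseHeader(line);
        statement.setString(1, res[0]);
        statement.setInt(2, Integer.parseInt(res[1]));
        statement.setInt(3, id);
        statement.setInt(4, -1);
        statement.addBatch();
        // flush only after 1000 rows have actually been added to the batch
        if (count % 1000 == 0) {
            statement.executeBatch();
        }
    }
}
statement.executeBatch(); // flush the remainder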