many queries in a task to generate json

2023-02-14 18:00 问答作者：

So I've got a task to build which is going to archive a ton of data in our DB into JSON.

To give you a better idea of what is happening; X has 100s of Ys, and Y has 100s of Zs and so on. I'm creating a json file for every X, Y, and Z. But every X json file has an array of ids for the child Ys of X, and likewise the Ys store an array of child Zs..

It more complicated than that in many cases, but you should get an idea of the complexity involved from that example I think.

I was using ColdFusion but it seems to be a bad choice for this task because it is crashing due to memory errors. It seems to me that if it were removing queries from memory that are no longer referenced while running the task (ie: garbage collecting) then the task should have enough memory, but afaict ColdFusion isn't doing any garbage collection at all, and must be doing it after a request is complete.

So I'm looking either for advice on how to better achieve my task in CF, o开发者_开发问答r for recommendations on other languages to use..

Thanks.

1) If you have debugging enabled, coldfusion will hold on to your queries until the page is done. Turn it off!

2) You may need to structDelete() the query variable to allow it to be garbage collected, otherwise it may persist as long as the scope that has a reference to it exists. eg., <cfset structDelete(variables,'myQuery') />

3) A cfquery pulls the entire ResultSet into memory. Most of the time this is fine. But for reporting on a large result set, you don't want this. Some JDBC drivers support setting the fetchSize, which in a forward, read only fashion, will let you get a few results at a time. This way you can deal with thousands and thousands of rows, without swamping memory. I just generated a 1GB csv file in ~80 seconds, using less than 100mb of heap. This requires dropping out to Java. But it kills two birds with one stone. It reduces the amount of data brought in at a time by the JDBC driver, and since you're working directly with the ResultSet, you don't hit the cfloop problem @orangepips mentioned. Granted, it's not for those without some Java chops.

You can do it something like this (you need cfusion.jar in your build path):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.sql.ResultSet;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

import au.com.bytecode.opencsv.CSVWriter;
import coldfusion.server.ServiceFactory;

public class CSVExport {
    public static void export(String dsn,String query,String fileName) {
        Connection conn = null;
        Statement stmt = null;
        ResultSet rs = null;
        FileWriter fw = null;
        BufferedWriter bw = null;


        try {
            DataSource ds = ServiceFactory.getDataSourceService().getDatasource(dsn);
            conn = ds.getConnection();
            // we want a forward-only, read-only result.
            // you may want need to use a PreparedStatement instead.
            stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY
            );
            // we only want to go forward!
            stmt.setFetchDirect(ResultSet.FETCH_FORWARD);
            // how many records to pull back at a time.
            // the hard part is balancing memory usage, and round trips to the database.
            // basically sacrificing speed for a lower memory hit.
            stmt.setFetchSize(256);
            rs = stmt.executeQuery(query);
            // do something with the ResultSet, for example write to csv using opencsv
            // the key is to stream it. you don't want it stored in memory.
            // so excel spreadsheets and pdf files are out, but text formats like 
            // like csv, json, html, and some binary formats like MDB (via jackcess)
            // that support streaming are in.
            fw = new FileWriter(fileName);
            bw = new BufferedWriter(fw);
            CSVWriter writer = new CSVWriter(bw);
            writer.writeAll(rs,true);
        }
        catch (Exception e) {
            // handle your exception.
            // maybe try ServiceFactory.getLoggingService() if you want to do a cflog.
            e.printStackTrace();
        }
        finally() {
            try {rs.close()} catch (Exception e) {}
            try {stmt.close()} catch (Exception e) {}
            try {conn.close()} catch (Exception e) {}
            try {bw.close()} catch (Exception e) {}
            try {fw.close()} catch (Exception e) {}
        }
    }
}

Figuring out how to pass parameters, logging, turning this into a background process (hint: extend Thread) etc. are separate issues, but if you grok this code, it shouldn't be too difficult.

4) Perhaps look at Jackson for generating your json. It supports streaming, and combined with the fetchSize, and a BufferedOutputStream, you should be able to keep the memory usage way down.

Eric, you are absolutely correct about ColdFusion garbage collection not removing query information from memory until request end and I've documented it fairly extensively in another SO question. In short, you hit OoM Exceptions when you loop over queries. You can prove it using a tool like VisualVM to generate a heap dump while the process is running and then running the resulting dump through Eclipse Memory Analyzer Tool (MAT). What MAT would show you is a large hierarchy, starting with an object named (I'm not making this up) CFDummyContent that holds, among other things, references to cfquery and cfqueryparam tags. Note, attempting to change it up to stored procs or even doing the database interaction via JDBC does not make difference.

So. What. To. Do?

This took me a while to figure out, but you've got 3 options in increasing order of complexity:

<cthread/>
asynchronous CFML gateway
daisy chain http requests

Using cfthread looks like this:

<cfloop ...>
    <cfset threadName = "thread" & createUuid()>
    <cfthread name="#threadName#" input="#value#">
        <!--- do query stuff --->
        <!--- code has access to passed attributes (e.g. #attributes.input#) --->
        <cfset thread.passOutOfThread = somethingGeneratedInTheThread>
    </cfthread>
    <cfthread action="join" name="#threadName#">
    <cfset passedOutOfThread = cfthread["#threadName#"].passOutOfThread>
</cfloop>

Note, this code is not taking advantage of asynchronous processing, thus the immediate join after each thread call, but rather the side effect that cfthread runs in its own request-like scope independent of the page.

I'll not cover ColdFusion gateways here. HTTP daisy chaining means executing an increment of the work, and at the end of the increment launching a request to the same algorithm telling it to execute the next increment.

Basically, all three approaches allow those memory references to be collected mid process.

And yes, for whoever asks, bugs have been raised with Adobe, see the question referenced. Also, I believe this issue is specific to Adobe ColdFusion, but have not tested Railo or OpenDB.

Finally, have to rant. I've spent a lot of time tracking this one down, fixing it in my own large code base, and several others listed in the question referenced have as well. AFAIK Adobe has not acknowledge the issue much-the-less committed to fixing it. And, yes it's a bug, plain and simple.

继续阅读：coldfusion

many queries in a task to generate json

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？