
Ruby or Python for heavy import script?

I have an application I wrote in PHP (on Symfony) that imports large CSV files (up to 100,000 lines). It has a real memory usage problem: once it gets through about 15,000 rows, it grinds to a halt.

I know there are measures I could take within PHP but I'm kind of done with PHP, anyway.

If I wanted to write an app that imports CSV files, do you think there would be any significant difference between Ruby and Python? Is either one geared more toward import-related tasks? I realize I'm asking a question based on very little information. Feel free to ask me to clarify things, or just speak really generally.

If it makes any difference, I really like Lisp and I would prefer the Lispier of the two languages, if possible.


What are you importing the CSV file into? Couldn't you parse the CSV file in a way that doesn't load the whole thing into memory at once (i.e. work with one line at a time)?

If so, then you can use Ruby's standard CSV library to do something like the following:

require 'csv'

# CSV.foreach reads and yields one parsed row at a time, so the
# whole file never has to fit in memory.
CSV.foreach('csvfile.csv') do |row|
  p row  # executes once for each row
end

Now don't take this answer as an immediate reason to switch to Ruby. I'd be very surprised if PHP didn't have similar functionality in its CSV library, so you should investigate PHP more thoroughly before deciding that you need to switch languages.


What are you importing the CSV file into? Couldn't you parse the CSV file in a way that doesn't load the whole thing into memory at once (i.e. work with one line at a time)?

If so, then you can use Python's standard csv library to do something like the following:

import csv

# The csv reader is an iterator: it parses one row at a time, so the
# whole file never has to fit in memory.
with open('csvfile.csv', newline='') as source:
    reader = csv.reader(source)
    for row in reader:
        print(row)  # do whatever with each row here

Now don't take this answer as an immediate reason to switch to Python. I'd be very surprised if PHP didn't have similar functionality in its CSV library, etc.


The equivalent in Python (wait for it):

import csv

# csv.reader is lazy: it yields one parsed row at a time.
with open("some.csv", newline='') as source:
    for row in csv.reader(source):
        print(row)

This code does not load the entire CSV file into memory first; instead, it parses the file line by line with iterators. I bet your problem is happening after each line is read, where you are somehow buffering the data (by storing it in a dictionary or array of some sort).

When dealing with big data, you need to discard the data as fast as you can and buffer as little as possible. In the example above, print is doing just that: it performs some operation on the line of data but doesn't store or buffer any of it, so Python's garbage collector can free each row as soon as the loop moves on to the next one.
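To make that concrete, here is a minimal sketch of a full streaming import. The file name, column names, and SQLite target are all assumptions for illustration; the point is that each row goes straight to the database and nothing accumulates on the Python side, so memory use stays flat however long the file is.

import csv
import sqlite3

# Hypothetical target: a two-column table in a local SQLite file.
conn = sqlite3.connect('import.db')
conn.execute('CREATE TABLE IF NOT EXISTS people (name TEXT, email TEXT)')

# Assumes a hypothetical people.csv with exactly two columns per row.
with open('people.csv', newline='') as source:
    for row in csv.reader(source):
        # Each row is written out immediately and never kept around.
        conn.execute('INSERT INTO people (name, email) VALUES (?, ?)', row)

conn.commit()  # a single commit at the end keeps the import fast
conn.close()

If one big transaction feels risky for a 100,000-line file, committing every few thousand rows is a reasonable middle ground.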

I hope this helps.


I think the problem is that you are loading the whole CSV into memory at once. If that is the case, then I am sure Python or Ruby is going to blow up on you in exactly the same way. I am a big fan of Python, but that is just a personal opinion.
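To sketch the distinction, using a hypothetical big.csv: the first form below holds every parsed row in memory at once, which is the pattern that blows up in any language; the second keeps only one row alive at a time.

import csv

# The pattern that blows up: the whole file is parsed into one big
# list before any work starts, so memory grows with the file.
with open('big.csv', newline='') as f:
    all_rows = list(csv.reader(f))

# The streaming pattern: only one row is alive at a time, so memory
# use stays flat regardless of file size.
with open('big.csv', newline='') as f:
    for row in csv.reader(f):
        print(row)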
