Ruby - Array.join versus String Concatenation (Efficiency)

I recall getting a scolding for concatenating Strings in Python once upon a time. I was told that it is more efficient to create a List of Strings in Python and join them later. I carried this practice over into JavaScript and Ruby, although I am unsure whether it has the same benefit in the latter.

Can anyone tell me whether it is more efficient (in resources and execution time) to build an Array of Strings and call :join on it, or to concatenate a String as needed, in the Ruby programming language?

Thanks.


Try it yourself with the Benchmark class.

require "benchmark"

n = 1000000
Benchmark.bmbm do |x|
  x.report("concatenation") do
    foo = ""
    n.times do
      foo << "foobar"
    end
  end

  x.report("using lists") do
    foo = []
    n.times do
      foo << "foobar"
    end
    string = foo.join
  end
end

This produces the following output:

Rehearsal -------------------------------------------------
concatenation   0.300000   0.010000   0.310000 (  0.317457)
using lists     0.380000   0.050000   0.430000 (  0.442691)
---------------------------------------- total: 0.740000sec

                    user     system      total        real
concatenation   0.260000   0.010000   0.270000 (  0.309520)
using lists     0.310000   0.020000   0.330000 (  0.363102)

So it looks like concatenation is a little faster in this case. Benchmark on your system for your use-case.


Funny, benchmarking gives surprising results (unless I'm doing something wrong):

require 'benchmark'

N = 1_000_000
Benchmark.bm(20) do |rep|

  rep.report('+') do
    N.times do
      res = 'foo' + 'bar' + 'baz'
    end
  end

  rep.report('join') do
    N.times do
      res = ['foo', 'bar', 'baz'].join
    end
  end

  rep.report('<<') do
    N.times do
      res = 'foo' << 'bar' << 'baz'
    end
  end
end

gives

jablan@poneti:~/dev/rb$ ruby concat.rb 
                          user     system      total        real
+                     1.760000   0.000000   1.760000 (  1.791334)
join                  2.410000   0.000000   2.410000 (  2.412974)
<<                    1.380000   0.000000   1.380000 (  1.376663)

join turns out to be the slowest. It might have to do with creating the array, but that's something you would have to do anyway.
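To see how much of that is array-creation cost (my own follow-up experiment, not part of the original benchmark), you can hoist the literal array out of the timed loop and compare:

```ruby
require 'benchmark'

N = 1_000_000
PARTS = ['foo', 'bar', 'baz'] # built once, outside the timed loops

Benchmark.bm(25) do |rep|
  # Rebuilds the array on every iteration, as in the benchmark above.
  rep.report('join (array in loop)') do
    N.times { ['foo', 'bar', 'baz'].join }
  end

  # Joins a prebuilt array, isolating the cost of join itself.
  rep.report('join (prebuilt array)') do
    N.times { PARTS.join }
  end
end
```

Absolute timings will vary by machine and Ruby version; the gap between the two rows tells you how much the array literal itself contributes.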

Oh BTW,

jablan@poneti:~/dev/rb$ ruby -v
ruby 1.9.1p378 (2010-01-10 revision 26273) [i486-linux]


Yes, it's the same principle. I remember a ProjectEuler puzzle where I tried it both ways; calling join was much faster.

If you check out the Ruby source, join is implemented entirely in C, so it's going to be a lot faster than concatenating strings (no intermediate object creation, no garbage collection):

/*
 *  call-seq:
 *     array.join(sep=$,)    -> str
 *  
 *  Returns a string created by converting each element of the array to
 *  a string, separated by <i>sep</i>.
 *     
 *     [ "a", "b", "c" ].join        #=> "abc"
 *     [ "a", "b", "c" ].join("-")   #=> "a-b-c"
 */

static VALUE
rb_ary_join_m(argc, argv, ary)
    int argc;
    VALUE *argv;
    VALUE ary;
{
    VALUE sep;

    rb_scan_args(argc, argv, "01", &sep);
    if (NIL_P(sep)) sep = rb_output_fs;

    return rb_ary_join(ary, sep);
}

where rb_ary_join is:

VALUE rb_ary_join(ary, sep)
    VALUE ary, sep;
{
    long len = 1, i;
    int taint = Qfalse;
    VALUE result, tmp;

    if (RARRAY(ary)->len == 0) return rb_str_new(0, 0);
    if (OBJ_TAINTED(ary) || OBJ_TAINTED(sep)) taint = Qtrue;

    for (i=0; i<RARRAY(ary)->len; i++) {
        tmp = rb_check_string_type(RARRAY(ary)->ptr[i]);
        len += NIL_P(tmp) ? 10 : RSTRING(tmp)->len;
    }
    if (!NIL_P(sep)) {
        StringValue(sep);
        len += RSTRING(sep)->len * (RARRAY(ary)->len - 1);
    }
    result = rb_str_buf_new(len);
    for (i=0; i<RARRAY(ary)->len; i++) {
        tmp = RARRAY(ary)->ptr[i];
        switch (TYPE(tmp)) {
          case T_STRING:
            break;
          case T_ARRAY:
            if (tmp == ary || rb_inspecting_p(tmp)) {
                tmp = rb_str_new2("[...]");
            }
            else {
                VALUE args[2];

                args[0] = tmp;
                args[1] = sep;
                tmp = rb_protect_inspect(inspect_join, ary, (VALUE)args);
            }
            break;
          default:
            tmp = rb_obj_as_string(tmp);
        }
        if (i > 0 && !NIL_P(sep))
            rb_str_buf_append(result, sep);
        rb_str_buf_append(result, tmp);
        if (OBJ_TAINTED(tmp)) taint = Qtrue;
    }

    if (taint) OBJ_TAINT(result);
    return result;
}


I was just reading about this. Attached is a link that talks about it.

Building-a-String-from-Parts

From what I understand, in Python and Java strings are immutable objects (unlike arrays), while in Ruby both strings and arrays are equally mutable. There might be a minimal difference in speed between building a string with String#concat or << versus Array#join, but it doesn't seem to be a big issue.

I think the link will explain this a lot better than I did.
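A small sketch of that mutability difference in Ruby: << appends to the receiver in place, while + allocates and returns a brand-new String.

```ruby
# Ruby Strings are mutable: << grows the same object,
# while + builds a fresh String each time.
s = "foo"
id_before = s.object_id
s << "bar"                       # mutates s itself
puts s                           # => "foobar"
puts s.object_id == id_before    # => true (same object)

t = "foo"
id_before = t.object_id
t = t + "bar"                    # allocates a new String and rebinds t
puts t                           # => "foobar"
puts t.object_id == id_before    # => false (different object)
```

This is why << can be cheap in a loop while repeated + (or +=) keeps producing new intermediate objects for the garbage collector to clean up.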

Thanks,

Martin


" The problem is the bulk of the data as a whole. In his first situation, he had two kinds of data piling up: (1) a temporary string for each row in his CSV file, with fixed quotations and such, and (2) the giant string containing everything. If each string is 1k and there are 5,000 rows...

Scenario One: build a big string from little strings

temporary strings: 5 megs (5,000k)
massive string: 5 megs (5,000k)
TOTAL: 10 megs (10,000k)

Dave's improved script swapped the massive string for an array. He kept the temporary strings, but stored them in an array. The array only ends up costing 5000 * sizeof(VALUE) rather than the full size of each string, and generally a VALUE is four bytes.

Scenario Two: storing strings in an array

strings: 5 megs (5,000k)
massive array: 20k

Then, when we need the big string, we call join. Now we're up to ten megs, and suddenly all those strings become temporary strings that can all be released at once. It's a huge cost at the end, but it's a lot more efficient than a gradual crescendo that eats resources the whole time. "

http://viewsourcecode.org/why/hacking/theFullyUpturnedBin.html

So for memory and garbage-collection performance, it's actually better to delay the operation until the end, just as I was taught to do in Python. The reason being that you get one big chunk of allocation toward the end and an instant release of all the temporary objects.
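A minimal sketch of the pattern the quote describes (the row count and field contents are made up for illustration): accumulate small strings in an Array and pay for the one big String in a single join at the end.

```ruby
# Each row is its own small String stored in an Array; the array only
# holds references (one VALUE per row), not copies of the text.
rows = []
5_000.times do |i|
  rows << %("field-#{i}","quoted","data")  # small temporary string per row
end

# One big allocation at the end. After this, every row string becomes
# garbage at the same moment and can be collected together.
csv = rows.join("\n")

puts rows.length                     # => 5000
puts csv.start_with?(%("field-0"))   # => true
```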


@jergason's answer shows that concatenation is slightly faster, but this is because the shovel operator << is allowed to modify the original string.

If we run the same benchmark with frozen_string_literal: true at the top, you get this result:

Rehearsal -------------------------------------------------
using lists     0.140621   0.015146   0.155767 (  0.308191)
concatenation Traceback (most recent call last):
    8: from main.rb:5:in `<main>'
    7: from /usr/lib/ruby/2.5.0/benchmark.rb:255:in `bmbm'
    6: from /usr/lib/ruby/2.5.0/benchmark.rb:255:in `inject'
    5: from /usr/lib/ruby/2.5.0/benchmark.rb:255:in `each'
    4: from /usr/lib/ruby/2.5.0/benchmark.rb:257:in `block in bmbm'
    3: from /usr/lib/ruby/2.5.0/benchmark.rb:293:in `measure'
    2: from main.rb:16:in `block (2 levels) in <main>'
    1: from main.rb:16:in `times'
main.rb:17:in `block (3 levels) in <main>': can't modify frozen String (FrozenError)

And if you update the concatenation benchmark to use += instead of <<, you'll find that the concatenation benchmark effectively never terminates, because += allocates a brand-new String on every iteration.

Therefore, Array#join is faster than calling += multiple times.
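A small sketch of the difference: each += call copies the entire accumulated string into a new object (so n appends do roughly O(n²) work), while << keeps mutating the same buffer in place.

```ruby
# += rebinds the variable to a freshly allocated String, copying
# everything accumulated so far; << grows the same object.
acc = String.new
first_id = acc.object_id
acc += "x"                        # new String allocated and copied
puts acc.object_id == first_id    # => false (different object)

acc2 = String.new
first_id = acc2.object_id
acc2 << "x"                       # same String, grown in place
puts acc2.object_id == first_id   # => true (same object)
```

String.new is used here so the example works even with frozen_string_literal: true in effect.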
