Is #inject on hashes considered good style?

2023-01-06 17:35 问答作者：

Inside the Rails code, people tend to use the Enumerable#inject method to create hashes, like this:

somme_enum.inject({}) do |hash, element|
  hash[element.foo] = element.bar
  hash
 end

While this appears to have become a common idiom, does anyone see an advantage over the "naive" version, which would go like:

hash = {}
some_enum.each { |element| hash[element.foo] = element.bar }

The only advantage I see for the first version is that you do it in a closed block and you don't (explicitly) initialize开发者_运维百科 the hash. Otherwise it abuses a method unexpectedly, is harder to understand and harder to read. So why is it so popular?

As Aleksey points out, Hash#update() is slower than Hash#store(), but that got me thinking about the overall efficiency of #inject() vs a straight #each loop, so I benchmarked a few things:

require 'benchmark'
module HashInject
  extend self

  PAIRS = 1000.times.map {|i| [sprintf("s%05d",i).to_sym, i]}

  def inject_store
    PAIRS.inject({}) {|hash, sym, val| hash[sym] = val ; hash }
  end

  def inject_update
    PAIRS.inject({}) {|hash, sym, val| hash.update(val => hash) }
  end

  def each_store
    hash = {}
    PAIRS.each {|sym, val| hash[sym] = val }
    hash
  end

  def each_update
    hash = {}
    PAIRS.each {|sym, val| hash.update(val => hash) }
    hash
  end

  def each_with_object_store
    PAIRS.each_with_object({}) {|pair, hash| hash[pair[0]] = pair[1]}
  end

  def each_with_object_update
    PAIRS.each_with_object({}) {|pair, hash| hash.update(pair[0] => pair[1])}
  end

  def by_initialization
    Hash[PAIRS]
  end

  def tap_store
    {}.tap {|hash| PAIRS.each {|sym, val| hash[sym] = val}}
  end

  def tap_update
    {}.tap {|hash| PAIRS.each {|sym, val| hash.update(sym => val)}}
  end

  N = 10000

  Benchmark.bmbm do |x|
    x.report("inject_store") { N.times { inject_store }}
    x.report("inject_update") { N.times { inject_update }}
    x.report("each_store") { N.times {each_store }}
    x.report("each_update") { N.times {each_update }}
    x.report("each_with_object_store") { N.times {each_with_object_store }}
    x.report("each_with_object_update") { N.times {each_with_object_update }}
    x.report("by_initialization") { N.times {by_initialization}}
    x.report("tap_store") { N.times {tap_store }}
    x.report("tap_update") { N.times {tap_update }}
  end

end

And the results:

Rehearsal -----------------------------------------------------------
inject_store             10.510000   0.120000  10.630000 ( 10.659169)
inject_update             8.490000   0.190000   8.680000 (  8.696176)
each_store                4.290000   0.110000   4.400000 (  4.414936)
each_update              12.800000   0.340000  13.140000 ( 13.188187)
each_with_object_store    5.250000   0.110000   5.360000 (  5.369417)
each_with_object_update  13.770000   0.340000  14.110000 ( 14.166009)
by_initialization         3.040000   0.110000   3.150000 (  3.166201)
tap_store                 4.470000   0.110000   4.580000 (  4.594880)
tap_update               12.750000   0.340000  13.090000 ( 13.114379)
------------------------------------------------- total: 77.140000sec

                              user     system      total        real
inject_store             10.540000   0.110000  10.650000 ( 10.674739)
inject_update             8.620000   0.190000   8.810000 (  8.826045)
each_store                4.610000   0.110000   4.720000 (  4.732155)
each_update              12.630000   0.330000  12.960000 ( 13.016104)
each_with_object_store    5.220000   0.110000   5.330000 (  5.338678)
each_with_object_update  13.730000   0.340000  14.070000 ( 14.102297)
by_initialization         3.010000   0.100000   3.110000 (  3.123804)
tap_store                 4.430000   0.110000   4.540000 (  4.552919)
tap_update               12.850000   0.330000  13.180000 ( 13.217637)
=> true

Enumerable#each is faster than Enumerable#inject, and Hash#store is faster than Hash#update. But the fastest of all is to pass an array in at initialization time:

Hash[PAIRS]

If you're adding elements after the hash has been created, the winning version is exactly what the OP was suggesting:

hash = {}
PAIRS.each {|sym, val| hash[sym] = val }
hash

But in that case, if you're a purist who wants a single lexical form, you can use #tap and #each and get the same speed:

{}.tap {|hash| PAIRS.each {|sym, val| hash[sym] = val}}

For those not familiar with tap, it creates a binding of the receiver (the new hash) inside the body, and finally returns the receiver (the same hash). If you know Lisp, think of it as Ruby's version of LET binding.

Since people have asked, here's the testing environment:

# Ruby version    ruby 2.0.0p247 (2013-06-27) [x86_64-darwin12.4.0]
# OS              Mac OS X 10.9.2
# Processor/RAM   2.6GHz Intel Core i7 / 8GB 1067 MHz DDR3

Beauty is in the eye of the beholder. Those with some functional programming background will probably prefer the inject-based method (as I do), because it has the same semantics as the fold higher-order function, which is a common way of calculating a single result from multiple inputs. If you understand inject, then you should understand that the function is being used as intended.

As one reason why this approach seems better (to my eyes), consider the lexical scope of the hash variable. In the inject-based method, hash only exists within the body of the block. In the each-based method, the hash variable inside the block needs to agree with some execution context defined outside the block. Want to define another hash in the same function? Using the inject method, it's possible to cut-and-paste the inject-based code and use it directly, and it almost certainly won't introduce bugs (ignoring whether one should use C&P during editing - people do). Using the each method, you need to C&P the code, and rename the hash variable to whatever name you wanted to use - the extra step means this is more prone to error.

inject (aka reduce) has a long and respected place in functional programming languages. If you're ready to take the plunge, and want to understand a lot of Matz's inspiration for Ruby, you should read the seminal Structure and Interpretation of Computer Programs, available online at http://mitpress.mit.edu/sicp/.

Some programmers find it stylistically cleaner to have everything in one lexical package. In your hash example, using inject means you don't have to create an empty hash in a separate statement. What's more, the inject statement returns the result directly -- you don't have to remember that it's in the hash variable. To make that really clear, consider:

[1, 2, 3, 5, 8].inject(:+)

total = 0
[1, 2, 3, 5, 8].each {|x| total += x}

The first version returns the sum. The second version stores the sum in total, and as a programmer, you have to remember to use total rather than the value returned by the .each statement.

One tiny addendum (and purely idomatic -- not about inject): your example might be better written:

some_enum.inject({}) {|hash, element| hash.update(element.foo => element.bar) }

...since hash.update() returns the hash itself, you don't need the extra hash statement at the end.

update

@Aleksey has shamed me into benchmarking the various combinations. See my benchmarking reply elsewhere here. Short form:

hash = {}
some_enum.each {|x| hash[x.foo] = x.bar}
hash

is the fastest, but can be recast slightly more elegantly -- and it's just as fast -- as:

{}.tap {|hash| some_enum.each {|x| hash[x.foo] = x.bar}}

I have just found in Ruby inject with initial being a hash a suggestion to use each_with_object instead of inject:

hash = some_enum.each_with_object({}) do |element, h|
  h[element.foo] = element.bar
end

Seems natural to me.

Another way, using tap:

hash = {}.tap do |h|
  some_enum.each do |element|
    h[element.foo] = element.bar
  end
end

If you are returning a hash, using merge can keep it cleaner so you don't have to return the hash afterward.

some_enum.inject({}){|h,e| h.merge(e.foo => e.bar) }

If your enum is a hash, you can get key and value nicely with the (k,v).

some_hash.inject({}){|h,(k,v)| h.merge(k => do_something(v)) }

I think it has to do with people not fully understanding when to use reduce. I agree with you, each is the way it should be

继续阅读：ruby ruby-on-rails

Is #inject on hashes considered good style?

update

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

update

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？