开发者

Arrays as separate type

Some scripting languages, such as Python and Javascript, have arrays (aka lists) as a separate datatype from hash tables (aka dictionaries, maps, objects). In other scripting languages, such as PHP and Lua, an array is merely a hash table whose keys happen to be integers. (The implementation may be optimized for that special case, as is done in the current version of Lua, but that's transparent to the language semantics.)

Which is the better approach?

  1. The unified approach is more elegant in the sense of having one thing rather than two, thoug开发者_开发技巧h the gain isn't quite as large as it might seem at first glance, since you still need to have the notion of iterating over the numeric keys specifically.

  2. The unified approach is arguably more flexible. You can start off with nested arrays, find you need to annotate them with other stuff, and just add the annotations, without having to rework the data structures to interleave the arrays with hash tables.

  3. In terms of efficiency, it seems to be pretty much a wash (provided the implementation optimizes for the special case, as Lua does).

What am I missing? Does the separate approach have any advantages?


Having separate types means that you can make guarantees about performance, and you know that you will have "normal" semantics for things like array slicing. If you have a unified system, you need to work out what all operations, such as slicing, mean on sparse arrays.


An array is more than a table intentionally restricted to consecutive integer keys. It's a sequence, a collection of n items (not key-value pairs, just the values) with a well-defined order. This is, in my opinion, a data structure that has no place for additional data in the form of non-integer keys. It's conceptually simpler.

Also, implementing the two seperately may be simpler, especially when considering the addition of an optimization (which is apparently obscure enough that a performance-oriented language like Lua didn't implement it for many many years) which makes arrays perform well.

Also, the flexibility point is arguable. If the need for more complex annotation arises, it's quite possible that you'll soon also need polymorphism, in which case you should just switch to objects with an array among other attributes.


As mentioned, there are speed and complexity issues involved in having two separate types. However, one of the things that I find important about having two types is that it expresses the intent of the datastore.

  • A list is a an ordered list of items. The items and their order ARE the data, the keys only exist in a conceptual manner to describe the order of the items.
  • A map is a mapping of keys to values. The keys and the values they represent ARE the data.

The point to note that the keys are part of the data for a map, they're not for a list... conceptually. When you choose one data type over the other, you're specifying your intent.

I'll add as an aside that every language that shares a data type for lists and maps has certain... annoyances that come along with it. There are always certain concessions that need to be made to allow the combination, and they can bite you sometimes. It's generally not a big deal, but it can be annoying.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜