Why can't I store string keys in an Associative Array?

I'm new to the D programming language and have just started reading The D Programming Language (TDPL).

I ran into an error when trying one of the book's associative array examples:

#!/usr/bin/rdmd
import std.stdio, std.string;

void main() {
    uint[string] dict;
    foreach (line; stdin.byLine()) {
        foreach (word; splitter(strip(line))) {
            if (word in dict) continue;
            auto newId = dict.length;
            dict[word] = newId;
            writeln(newId, '\t', word);
        }   
    }   
}

DMD shows this error message:

./vocab.d(11): Error: associative arrays can only be assigned values with immutable keys, not char[]

I'm using the DMD compiler, version 2.051.

My guess is that the rules for associative arrays have changed since TDPL was written.

How should I use associative arrays with string keys?

Thanks.

Update:

I found the solution in a later part of the book.

Use idup to make an immutable duplicate of the word before using it as a key in the array.

So

dict[word.idup] = newId;

would do the job.

But is that efficient?


Associative arrays require that their keys be immutable. It makes sense when you think about the fact that if it's not immutable, then it might change, which means that its hash changes, which means that when you go to get the value out again, the computer won't find it. And if you go to replace it, you'll end up with another value added to the associative array (so, you'll have one with the correct hash and one with an incorrect hash). However, if the key is immutable, it cannot change, and so there is no such problem.

Prior to dmd 2.051, the example worked (which was a bug). It has now been fixed though, so the example in TDPL is no longer correct. However, it's not so much the case that the rules for associative arrays have changed as that there was a bug in them which was not caught. The example compiled when it shouldn't have, and Andrei missed it. It's listed in the official errata for TDPL and should be fixed in future printings.

The corrected code should use either dict[word.idup] or dict[to!string(word)]. word.idup creates an immutable duplicate of word. to!string(word), on the other hand, converts word to a string in the most appropriate manner. As word is a char[] in this case, that means using idup. However, if word were already a string, it would simply return the value that was passed in and not needlessly copy it. So, in the general case, to!string(word) is the better choice (particularly in templated functions), but in this case either works just fine (to!() is in std.conv).
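As a rough sketch of that fix, the example could be restructured like this. The helper name buildVocab is hypothetical (it is not from TDPL), and the value type is changed from uint to size_t on the assumption that you may compile for a 64-bit target, where dict.length does not implicitly convert to uint.

```d
import std.algorithm : splitter;
import std.conv : to;
import std.stdio : stdin, writeln;
import std.string : strip;

// Hypothetical helper: gives each distinct word an id in order of first appearance.
// Lines is any range whose elements are char[] (such as stdin.byLine()).
size_t[string] buildVocab(Lines)(Lines lines) {
    size_t[string] dict;
    foreach (line; lines) {
        foreach (word; splitter(strip(line))) {
            if (word in dict) continue;      // lookup works with a char[] key
            immutable newId = dict.length;
            dict[to!string(word)] = newId;   // insertion needs an immutable key
            writeln(newId, '\t', word);
        }
    }
    return dict;
}

void main() {
    buildVocab(stdin.byLine());
}
```

Only the insertion needs the conversion; the `in` lookup still takes the mutable word directly, so words that are already in the table are never copied.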

It is technically possible to cast a char[] to a string, but it's generally a bad idea. If you know that the char[] will never change, then you can get away with it, but in the general case, you're risking problems, since the compiler will then assume that the resulting string can never change, and it could generate code which is incorrect. It may even segfault. So, don't do it unless profiling shows that you really need the extra efficiency of avoiding the copy, you can't otherwise avoid the copy by doing something like just using a string in the first place (so no conversion would be necessary), and you know that the string will never be changed.

In general, I wouldn't worry too much about the efficiency of copying strings. Generally, you should be using string instead of char[], so you can copy them around (that is, copy their reference around, e.g. str1 = str2;, rather than copying their entire contents like dup and idup do) without worrying about it being particularly inefficient.

The problem with the example is that stdin.byLine() returns a char[] rather than a string (presumably to avoid copying the data if it's not necessary). So, splitter() returns a char[], and so word is a char[] instead of a string. Now, you could do splitter(strip(line.idup)) or splitter(strip(line).idup) instead of iduping the key. That way, splitter() would return a string rather than a char[], but that's probably just about as efficient as iduping word. Regardless, because of where the text comes from originally, it's a char[] instead of a string, which forces you to idup it somewhere along the line if you intend to use it as a key in an associative array. In the general case, however, it's better to just use string and not char[]. Then you don't need to idup anything.
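The difference between copying a reference and copying the contents can be sketched as follows. This is only an illustration; the variable names are made up for the example.

```d
import std.conv : to;

void main() {
    string a = "hello";
    string b = a;                 // copies only the reference (pointer + length)
    assert(b is a);               // both refer to the same data

    assert(to!string(a) is a);    // already a string: returned as-is, no copy

    char[] buf = "hello".dup;     // mutable copy of the characters
    string c = buf.idup;          // allocates a new immutable copy
    buf[0] = 'j';                 // changing the buffer...
    assert(c == "hello");         // ...leaves the immutable copy untouched
}
```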

EDIT:
Actually, even if you find a situation where casting from char[] to string seems both safe and necessary, consider using std.exception.assumeUnique(). It's essentially the preferred way of converting a mutable array to an immutable one when you need to and know that you can. It would typically be done in cases where you've constructed an array which you couldn't make immutable because you had to build it in pieces, but which has no other references, and you don't want to create a deep copy of it. It wouldn't be useful in situations like the example that you're asking about though, since you really do need to copy the array.
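A minimal sketch of the kind of case where assumeUnique fits: a buffer is built in pieces, nothing else refers to it, and it is then handed out as a string without a copy. The function makeGreeting is a hypothetical example, not part of any library.

```d
import std.exception : assumeUnique;

// Hypothetical example: builds a string in pieces in a private mutable buffer.
string makeGreeting(string name) {
    auto buf = new char[](7 + name.length);
    buf[0 .. 7] = "Hello, ";      // fill the first piece (7 characters)
    buf[7 .. $] = name;           // fill the rest
    // No other reference to buf exists, so treating it as immutable is safe.
    // assumeUnique also sets buf to null so it can't be modified afterwards.
    return assumeUnique(buf);
}

void main() {
    assert(makeGreeting("D") == "Hello, D");
}
```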


No, it's not efficient, since it obviously duplicates the string. If you can guarantee that the string you create will never be modified in memory, feel free to cast it explicitly with cast(immutable)str instead of duplicating it.

(Although, I've noticed that the garbage collector works well, so I suggest you don't actually try that unless you see a bottleneck, since you might decide to change the string later. Just place a comment in your code to help you find the bottleneck later, if it exists.)
