
Do compilers usually have special optimizations for strings?

You often see things like

std::map<std::string, somethingelse> m_named_objects;

or

std::string state;

//...

if(state == "EXIT")
   exit();
else if(state == "california")
   hot();

where people use strings purely to make something more readable. The same thing could easily be achieved with something like integer IDs.

Can modern compilers (msvc, g++, etc.) usually employ special optimizations for these types of cases? Or should this be avoided because of bad performance or for other reasons?


Can modern compilers (msvc, g++, etc.) usually employ special optimizations for these types of cases?

As far as I know, compilers don't make those kinds of optimizations. It's definitely not a "standard" optimization.

...where people use strings purely to make something more readable.

At least for your second case, it seems to me that enumerations are more readable and can be faster (since integer comparisons are cheap relative to string comparisons).

enum State
{
    Alabama,
    Alaska,
    Arizona,
    Arkansas, 
    California,
    Colorado,
    Connecticut,
    Delaware,
    // ... More
};

// ...

State state = California;
if(state == California) { /* true */ }
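The enum approach above can be taken a step further. Below is a minimal sketch, assuming C++17, that uses a scoped enum (`enum class`) and shows how the question's `if`/`else` chain becomes a `switch` on an integer-backed value. A small name function covers the one place where text is still useful, namely printing and debugging. The names `state_name` and `dispatch` are hypothetical, and `dispatch` returns a description instead of calling `exit()`/`hot()` so it stays self-contained.

```cpp
#include <cassert>
#include <string_view>

// Scoped enum: readable names, but comparisons are integer comparisons.
// Only a few states are listed to keep the sketch short.
enum class State { Exit, Alabama, California };

// Converting back to text is only needed for logging or debugging.
constexpr std::string_view state_name(State s) {
    switch (s) {
        case State::Exit:       return "EXIT";
        case State::Alabama:    return "Alabama";
        case State::California: return "California";
    }
    return "unknown";  // not reached for valid enumerators
}

// The question's if/else chain as a switch; the compiler can lower this to
// a few integer compares or a jump table.
std::string_view dispatch(State s) {
    switch (s) {
        case State::Exit:       return "exit";
        case State::California: return "hot";
        default:                return "other";
    }
}
```

A side benefit of `enum class` over a plain `enum` is that the enumerators do not leak into the surrounding scope and do not implicitly convert to `int`.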


Libraries do.

Compilers may merge identical string literals so that they share a single copy in memory (often called string pooling); this is safe because string literals are constants.
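Literal pooling only helps with strings known at compile time. For strings that are built at run time, library code can get a similar effect by interning: store each distinct string once and hand out a small integer ID, so later comparisons are integer comparisons. The sketch below is a hypothetical `Interner` class written for illustration, not a standard or library API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical interner: each distinct string is stored once and mapped to
// a dense integer ID.
class Interner {
public:
    using Id = std::uint32_t;

    // Returns the existing ID for `s`, or assigns the next free one.
    Id intern(const std::string& s) {
        auto it = ids_.find(s);
        if (it != ids_.end()) return it->second;
        Id id = static_cast<Id>(names_.size());
        names_.push_back(s);
        ids_.emplace(s, id);
        return id;
    }

    // Maps an ID back to its text, e.g. for printing.
    const std::string& name(Id id) const {
        assert(id < names_.size());  // only IDs returned by intern() are valid
        return names_[id];
    }

private:
    std::unordered_map<std::string, Id> ids_;  // text -> ID
    std::vector<std::string> names_;           // ID -> text
};
```

The string is hashed once when it is interned (for example, when a config file is parsed); after that, code passes the `Id` around and compares IDs instead of strings.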

All C++ standard library implementations I'm currently aware of sport a 'small string optimization' (SSO), meaning that no extra heap allocation needs to occur for short strings, i.e.

std::string a("small");

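will store its characters inside the std::string object itself, so if a is a local variable, the whole string lives in automatic (stack) storage; in highly optimized code it might perhaps even be kept in registers.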
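One way to observe SSO without a profiler is to check whether `data()` points into the `std::string` object itself. This is a sketch that relies on implementation behavior: SSO is not required by the standard, and the inline capacity varies (roughly 15 to 22 characters on common 64-bit implementations). `uses_inline_buffer` is a hypothetical helper name.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Returns true if the string's characters are stored inside the
// std::string object (small string optimization) rather than on the heap.
// std::less gives a well-defined total order even for unrelated pointers.
bool uses_inline_buffer(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(s);
    const char* chars = s.data();
    std::less<const char*> lt;
    return !lt(chars, obj_begin) && lt(chars, obj_end);
}
```

With mainstream libraries, `uses_inline_buffer(std::string("small"))` should be true, while a 100-character string needs a heap buffer and gives false.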


If you need blazingly fast string lookups and can afford some time spent building your data structure, look at tries (Wikipedia: Trie, Radix tree).
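To make the trie suggestion concrete, here is a minimal sketch of a plain (uncompressed) trie. To keep each node small, it assumes keys consist only of lowercase ASCII letters. A radix tree additionally merges chains of single-child nodes, which this sketch does not do. The class name `Trie` and its interface are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Minimal trie for keys made of 'a'..'z' (an assumption of this sketch).
template <typename Value>
class Trie {
public:
    // Returns false if the key contains characters outside 'a'..'z'.
    bool insert(const std::string& key, Value value) {
        Node* node = &root_;
        for (char c : key) {
            if (c < 'a' || c > 'z') return false;
            auto& child = node->children[static_cast<std::size_t>(c - 'a')];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        node->value = std::move(value);
        return true;
    }

    // Cost depends on the key length, not on how many keys are stored.
    std::optional<Value> find(const std::string& key) const {
        const Node* node = &root_;
        for (char c : key) {
            if (c < 'a' || c > 'z') return std::nullopt;
            node = node->children[static_cast<std::size_t>(c - 'a')].get();
            if (!node) return std::nullopt;
        }
        return node->value;  // empty if `key` is only a prefix of stored keys
    }

private:
    struct Node {
        std::array<std::unique_ptr<Node>, 26> children;  // one per letter
        std::optional<Value> value;                      // set at key ends
    };
    Node root_;
};
```

The trade-off is memory: every node carries 26 child pointers, which is what the "time spent building your data structure" buys back in lookup speed.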

As far as drop-in replacements go, a lot can usually be gained by using a properly tuned hash map instead of a red-black-tree-based one:

std::map<std::string, somethingelse> m_named_objects;

Replace it with

std::unordered_map<std::string, somethingelse> m_named_objects;

Be happy.
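Much of the "properly tuned" part comes down to sizing the table. Below is a small sketch in which `Object` is a hypothetical stand-in for the question's `somethingelse`, and `build_table`/`lookup` are illustrative names. Calling `reserve()` with the expected number of entries avoids rehashing while the table is filled.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the question's "somethingelse".
struct Object {
    int id;
};

using Table = std::unordered_map<std::string, Object>;

// Pre-sizes the bucket array so inserting all names does not rehash.
Table build_table(std::initializer_list<std::string> names) {
    Table table;
    table.reserve(names.size());
    int next_id = 0;
    for (const auto& n : names) table.emplace(n, Object{next_id++});
    return table;
}

// Average O(1): hash the key once, then compare it only against the few
// entries in its bucket, instead of O(log n) string comparisons in std::map.
const Object* lookup(const Table& t, const std::string& name) {
    auto it = t.find(name);
    return it == t.end() ? nullptr : &it->second;
}
```

Note that an `unordered_map` gives up ordered iteration, so it is only a drop-in replacement when nothing relies on the keys coming out sorted.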


In the examples given, the compiler generally cannot optimize, because the string contents are only known at run time.

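std::map<std::string, int> does not have the most desirable performance characteristics, because every lookup performs O(log n) calls to operator<() on std::string, and each of those compares the strings' characters, which is relatively expensive compared to a single integer comparison.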
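There is one related overhead you can remove cheaply when keeping `std::map`. With the default comparator, `m.find("EXIT")` first constructs a temporary `std::string` key. Since C++14, a transparent comparator (`std::less<>`) lets `find()` take a `std::string_view` or literal directly. This sketch (the alias `NamedInts` and the helper `value_or` are hypothetical) avoids that temporary. It does not make the O(log n) comparisons themselves any cheaper.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>

// std::less<> is "transparent", which enables heterogeneous lookup:
// find() can compare stored std::string keys against a string_view.
using NamedInts = std::map<std::string, int, std::less<>>;

// Looks up `key` without building a temporary std::string.
int value_or(const NamedInts& m, std::string_view key, int fallback) {
    auto it = m.find(key);
    return it == m.end() ? fallback : it->second;
}
```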


Optimizations for strings are for libraries, not compilers. If you want string-like identifiers, enums are one possibility. But a better one, particularly for printing and debugging, is a fixed-length identifier string class.

It would be convertible to const char * and std::string, but it would have zero memory allocations. Instead, it would just be a wrapper around a 32-character (or whatever you want) array.

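The best part is that, since it's an identifier, you don't care about ASCII character-by-character comparisons. operator< can just read the 32-character array as 8 uint32_ts, or even as 4 uint64_ts. All you need is some consistent ordering, not a specific (e.g. alphabetical) one. operator== can do similar tests. For this to work, the bytes after the terminator must be zero-filled, so that equal identifiers always have identical arrays.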
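Here is a minimal sketch of such a class, assuming a 32-byte buffer. `FixedId` is a hypothetical name. Each 8-byte word is loaded with `std::memcpy`, which sidesteps alignment and strict-aliasing problems with reinterpreting the `char` array. On a little-endian machine, the resulting order is not alphabetical, which, as noted above, is fine for an identifier.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Fixed-size identifier: up to 31 characters plus '\0', no heap allocation.
class FixedId {
public:
    static constexpr std::size_t kSize = 32;
    static constexpr std::size_t kWords = kSize / 8;

    // Copies at most 31 characters; this sketch simply drops the tail.
    explicit FixedId(const std::string& s) {
        std::size_t n = s.size() < kSize - 1 ? s.size() : kSize - 1;
        std::memcpy(bytes_.data(), s.data(), n);
    }

    const char* c_str() const { return bytes_.data(); }
    std::string str() const { return std::string(c_str()); }

    // Compares 4 words instead of up to 32 characters.
    friend bool operator==(const FixedId& a, const FixedId& b) {
        for (std::size_t i = 0; i < kWords; ++i)
            if (a.word(i) != b.word(i)) return false;
        return true;
    }

    // Some consistent total order (lexicographic over words), not alphabetical.
    friend bool operator<(const FixedId& a, const FixedId& b) {
        for (std::size_t i = 0; i < kWords; ++i) {
            std::uint64_t x = a.word(i), y = b.word(i);
            if (x != y) return x < y;
        }
        return false;
    }

private:
    // Loads the i-th 8-byte word of the buffer.
    std::uint64_t word(std::size_t i) const {
        std::uint64_t w;
        std::memcpy(&w, bytes_.data() + i * 8, sizeof w);
        return w;
    }

    std::array<char, kSize> bytes_{};  // zero-filled, so unused bytes match
};
```

Because it has `operator<`, it can serve directly as a `std::map` key, e.g. `std::map<FixedId, somethingelse>`.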

It's a pretty simple class to write. If you want case-insensitive comparisons, you could just convert the string to lowercase when you copy it into the object.
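The lowercase-on-copy idea can be a one-line normalization applied before the text reaches the identifier's constructor. The helper `normalize_case` below is a hypothetical name and only handles ASCII letters in the default "C" locale. Note that `std::tolower` must be given an `unsigned char` value, because passing a negative `char` is undefined behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Lowercases ASCII letters so "EXIT" and "exit" produce identical bytes,
// which then makes the word-wise comparisons case-insensitive for free.
std::string normalize_case(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}
```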

If you need strings longer than 31 bytes (the 32nd byte holds the \0 terminator), then I would suggest truncating the string down to size. But truncate from the middle of the given string, not the end: the beginnings and ends of identifiers tend to be more distinctive than the middle. You could even put some special characters in a truncated string to mark it as a truncated version.
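Middle truncation can be sketched as follows. `truncate_middle` is a hypothetical helper, and the `~` marker is an arbitrary choice. The function keeps the start and end of the string and inserts the marker where characters were removed. With `max_len = 31`, the result fits a 32-byte buffer with its terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Shortens `s` to at most `max_len` characters by cutting from the middle.
// `marker` flags that the result was truncated.
std::string truncate_middle(const std::string& s, std::size_t max_len,
                            const std::string& marker = "~") {
    if (s.size() <= max_len) return s;                      // already fits
    if (max_len <= marker.size()) return s.substr(0, max_len);
    std::size_t keep = max_len - marker.size();             // chars kept
    std::size_t head = (keep + 1) / 2;                      // head gets the odd one
    std::size_t tail = keep - head;
    return s.substr(0, head) + marker + s.substr(s.size() - tail);
}
```

For example, `truncate_middle("abcdefghij", 5)` keeps two characters from each end and yields `"ab~ij"`.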

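It is also possible to take this idea further and put a hash in the string. The first 4 bytes would be a hash of the original string, not of the truncated one. Comparison tests would just use the hash, and the other 28 bytes are there to make it human-readable. (Two different strings can occasionally share a hash, so comparing only the hash carries a small risk of false matches.)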
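Below is a sketch of the hash-prefixed layout. It uses 32-bit FNV-1a purely as an example hash, since the answer does not name one, and `HashedId` is a hypothetical class. Comparisons look only at the 4-byte hash of the full original string, so two long identifiers that truncate to the same text still compare unequal whenever their hashes differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// 32-bit FNV-1a, chosen here only as a simple, well-known example hash.
std::uint32_t fnv1a(const std::string& s) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Layout: 4-byte hash of the original string, then up to 27 readable
// characters plus '\0' (4 + 28 = 32 bytes).
class HashedId {
public:
    explicit HashedId(const std::string& s) : hash_(fnv1a(s)) {
        std::size_t n = s.size() < sizeof text_ - 1 ? s.size() : sizeof text_ - 1;
        std::memcpy(text_, s.data(), n);
    }

    const char* c_str() const { return text_; }  // for humans only

    // Only the hash takes part in comparisons.
    friend bool operator==(const HashedId& a, const HashedId& b) {
        return a.hash_ == b.hash_;
    }
    friend bool operator<(const HashedId& a, const HashedId& b) {
        return a.hash_ < b.hash_;
    }

private:
    std::uint32_t hash_;
    char text_[28] = {};  // zero-filled, truncated copy of the original
};
```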
