C++ interpreter conceptual problem
I've built an interpreter in C++ for a language created by me.
One main problem in the design was that I had two different types in the language: number and string. So I have to pass around a struct like:
class myInterpreterValue
{
    myInterpreterType type;
    int intValue;
    string strValue;
};
Objects of this class are passed around millions of times a second during, for example, a countdown loop in my language.
Profiling showed that 85% of the run time is spent in the allocation function of the string template.
This is pretty clear to me: my interpreter has a bad design and doesn't use pointers enough. Yet I don't have an option: I can't use pointers in most cases, as I just have to make copies.
What can I do about this? Is a class like this a better idea?
vector<string> strTable;
vector<int> intTable;
class myInterpreterValue
{
    myInterpreterType type;
    int locationInTable;
};
So the class only knows what type it represents and the position in the table.
This, however, has disadvantages again: I'd have to add temporary values to the string/int vector table and then remove them again, which would eat a lot of performance too.
Help: how do interpreters of languages like Python or Ruby do that? They somehow need a struct that represents a value in the language, something that can be either an int or a string.
I suspect many values aren't strings. So the first thing you can do is to get rid of the string object if you don't need it. Put it into a union. Another thing is that probably many of your strings are small, so you can avoid heap allocation by storing small strings in the object itself. LLVM has the SmallString template for that. And then you can use string interning, as another answer says too. LLVM has the StringPool class for that: call intern("foo") and get a smart pointer referring to a shared string potentially used by other myInterpreterValue objects too.
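The interning idea can be sketched with the standard library alone. This is not LLVM's StringPool: StringInterner and its intern member are hypothetical names, and the sketch hands out plain pointers rather than smart pointers, so pooled strings live as long as the pool.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical helper (not LLVM's StringPool): every distinct string is
// stored once, and callers keep only a pointer to the shared copy.
class StringInterner {
    std::unordered_set<std::string> pool_;
public:
    // Returns a pointer to the pooled copy, inserting it on first use.
    // Pointers to unordered_set elements stay valid when the set rehashes.
    const std::string* intern(const std::string& s) {
        return &*pool_.insert(s).first;
    }
    std::size_t size() const { return pool_.size(); }
};
```

With this, comparing two interned strings for equality is a pointer comparison, and copying a value that holds one copies only the pointer.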
The union can be written like this:
class myInterpreterValue {
    boost::variant<int, string> value;
};
boost::variant does the type tagging for you. If you don't have Boost, you can implement it like this. C++ before C++11 has no portable way to query a type's alignment, so we push some types that possibly require a large alignment into the storage union.
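#include <new>      // placement new
#include <string>
using std::string;

class myInterpreterValue {
    union Storage {
        // for getting alignment
        long double ld_;
        long long ll_;
        // for getting size
        int i1;
        char s1[sizeof(string)];
        // for access
        char c;
    };
    enum type { IntValue, StringValue } m_type;
    Storage m_store;
    int *getIntP() { return reinterpret_cast<int*>(&m_store.c); }
    string *getStringP() { return reinterpret_cast<string*>(&m_store.c); }
    // construct our stored object in place as a copy of other's
    void copyFrom(myInterpreterValue const& other) {
        void const *src = &other.m_store.c;
        if(other.m_type == StringValue)
            new (static_cast<void*>(&m_store.c)) string(*static_cast<string const*>(src));
        else
            new (static_cast<void*>(&m_store.c)) int(*static_cast<int const*>(src));
        m_type = other.m_type;
    }
    // destroy the stored object; only the string needs it
    void destroy() {
        if(m_type == StringValue) {
            getStringP()->~string(); // call destructor
            m_type = IntValue;       // nothing left to destroy
        }
    }
public:
    myInterpreterValue(string const& str) {
        m_type = StringValue;
        new (static_cast<void*>(&m_store.c)) string(str);
    }
    myInterpreterValue(int i) {
        m_type = IntValue;
        new (static_cast<void*>(&m_store.c)) int(i);
    }
    // copies must run the string's copy constructor; the implicit
    // byte-wise copy would make two objects destroy the same string
    myInterpreterValue(myInterpreterValue const& other) { copyFrom(other); }
    myInterpreterValue &operator=(myInterpreterValue const& other) {
        if(this != &other) { destroy(); copyFrom(other); }
        return *this;
    }
    ~myInterpreterValue() { destroy(); }
    string &asString() { return *getStringP(); }
    int &asInt() { return *getIntP(); }
};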
You get the idea.
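If a C++17 compiler is available, std::variant provides the same tagged storage without Boost or hand-written placement new. The following is a minimal sketch under that assumption; Value and describe are illustrative names, not part of the interpreter in the question.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Sketch assuming C++17; Value and describe are illustrative names.
using Value = std::variant<int, std::string>;

// Dispatch on the active alternative; the variant keeps the type tag itself.
inline std::string describe(const Value& v) {
    if (const int* i = std::get_if<int>(&v))
        return "int:" + std::to_string(*i);
    return "string:" + std::get<std::string>(v);
}
```

A Value holding an int never constructs a std::string, so copying it in a countdown loop involves no string allocation.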
I think some dynamic languages cache all equivalent strings at runtime with a hash lookup and only store pointers. In each iteration of a loop where the string stays the same, there would therefore be just a pointer assignment or at most a call to a string hashing function. I know some languages (Smalltalk, I think?) do this not only with strings but also with small numbers. See the Flyweight pattern.
IANAE on this one. If that doesn't help, you should give the loop code and walk us through how it's being interpreted.
In both Python and Ruby, integers are objects. So it's not a question of a "value" being either an integer or a string; it can be anything at all. Furthermore, everything in both of those languages is garbage collected. There's no need to copy objects: pointers can be used internally, so long as they are safely stored somewhere the garbage collector will see them.
So, one solution to your problem would be:
#include <cstdio>   // snprintf
#include <string>
using std::string;

class myInterpreterValue {
public:
    virtual ~myInterpreterValue() {}
    // example of a possible member function
    virtual string toString() const = 0;
};
class myInterpreterStringValue : public myInterpreterValue {
    string value;
public:
    explicit myInterpreterStringValue(string const& v) : value(v) {}
    virtual string toString() const { return value; }
};
class myInterpreterIntValue : public myInterpreterValue {
    int value;
public:
    explicit myInterpreterIntValue(int v) : value(v) {}
    virtual string toString() const {
        char buf[32]; // room for a 64-bit int, sign and terminator
        std::snprintf(buf, sizeof buf, "%d", value);
        return buf;
    }
};
Then use virtual calls and dynamic_cast to switch on or check types, instead of comparing against values of myInterpreterType.
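As a rough illustration of checking types with dynamic_cast, here is a self-contained sketch. The names Value, IntValue, StringValue and addAsText are made up for the example, and the "+" semantics (integer addition for two ints, concatenation otherwise) are an assumption, not something stated about the language in the question.

```cpp
#include <cassert>
#include <string>

// Illustrative hierarchy; all names here are assumptions for the sketch.
struct Value {
    virtual ~Value() {}
    virtual std::string toString() const = 0;
};
struct IntValue : Value {
    int value;
    explicit IntValue(int v) : value(v) {}
    std::string toString() const override { return std::to_string(value); }
};
struct StringValue : Value {
    std::string value;
    explicit StringValue(const std::string& v) : value(v) {}
    std::string toString() const override { return value; }
};

// Evaluate "a + b" and return the result as text: integer addition when
// both operands are ints, otherwise concatenation of their text forms.
inline std::string addAsText(const Value& a, const Value& b) {
    const IntValue* x = dynamic_cast<const IntValue*>(&a);
    const IntValue* y = dynamic_cast<const IntValue*>(&b);
    if (x && y) return std::to_string(x->value + y->value);
    return a.toString() + b.toString();
}
```

The dynamic_cast to a pointer type yields a null pointer when the operand is not an IntValue, which replaces the explicit comparison against a type enum.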
The usual thing to do at this point is worry that virtual function calls and dynamic_cast might be slow. Both Ruby and Python use virtual function calls all over the place, albeit not C++ virtual calls: for both languages the "standard" implementation is in C, with custom mechanisms for polymorphism. But there's no reason in principle to assume that "virtual" means "performance out the window".
That said, I expect they probably both have some clever optimisations for certain uses of integers, including as loop counters. But if you're currently seeing that most of your time is spent copying empty strings, then virtual function calls by comparison are near-instantaneous.
The real worry is how you're going to do resource management: depending on what your plans are for your interpreted language, garbage collection might be more trouble than you want to go to.
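One lighter-weight option than a tracing garbage collector is reference counting, which std::shared_ptr provides out of the box. This is only a sketch of that alternative, not what Python or Ruby necessarily do internally; StrValue, StrRef and makeStr are hypothetical names. Note that plain reference counting cannot reclaim values that refer to each other in a cycle.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical immutable string value, shared through reference counting.
struct StrValue {
    std::string text;
    explicit StrValue(const std::string& t) : text(t) {}
};
using StrRef = std::shared_ptr<const StrValue>;

// Allocate the string once; later copies of the StrRef share it.
inline StrRef makeStr(const std::string& t) {
    return std::make_shared<StrValue>(t);
}
```

Copying a StrRef bumps a counter instead of copying the characters, and the string is freed when the last reference goes away.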
The easiest way to solve the allocation problem would be to make the string member a pointer to string, and only allocate it when you create a string value. You can also use a union to save memory.
class myInterpreterValue
{
    myInterpreterType type;
    union {
        int asInt;
        string* asString;
    } value;
};
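A raw pointer in a union needs explicit ownership rules, otherwise copies either leak or free the same string twice. The sketch below shows one way to handle that, assuming each value owns its string; PtrValue and its members are illustrative names, not the original class.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Sketch only: each PtrValue owns its string, if it has one.
class PtrValue {
public:
    enum Type { Int, String };
    explicit PtrValue(int i) : type_(Int) { value_.asInt = i; }
    explicit PtrValue(const std::string& s) : type_(String) {
        value_.asString = new std::string(s);  // allocate only for strings
    }
    PtrValue(const PtrValue& o) : type_(o.type_) {
        if (type_ == String) value_.asString = new std::string(*o.value_.asString);
        else value_.asInt = o.value_.asInt;    // int copy: no allocation
    }
    PtrValue& operator=(PtrValue o) {          // copy-and-swap
        std::swap(type_, o.type_);
        std::swap(value_, o.value_);
        return *this;
    }
    ~PtrValue() { if (type_ == String) delete value_.asString; }
    Type type() const { return type_; }
    int asInt() const { return value_.asInt; }
    const std::string& asString() const { return *value_.asString; }
private:
    union Slot { int asInt; std::string* asString; };
    Type type_;
    Slot value_;
};
```

Integer values, which dominate a countdown loop, are then copied without touching the heap; only string values pay for an allocation.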