Questions Regarding the Implementation of a Simple CPU Emulator
Background Information: Ultimately, I would like to write an emulator of a real machine such as the original Nintendo or Gameboy. However, I decided that I need to start somewhere much, much simpler. My computer science advisor/professor offered me the specifications for a very simple imaginary processor that he created, as something to emulate first. There is one register (the accumulator) and 16 opcodes. Each instruction consists of 16 bits: the first 4 contain the opcode and the remaining 12 are the operand. The instructions are given as strings in binary format, e.g., "0101 0101 0000 1111".
My Question: In C++, what is the best way to parse the instructions for processing? Please keep my ultimate goal in mind. Here are some points I've considered:
I can't just process and execute the instructions as I read them because the code is self-modifying: an instruction can change a later instruction. The only way I can see around this would be to store all changes and, for each instruction, check whether a change needs to be applied. This could lead to a massive number of comparisons on every executed instruction, which isn't good. And so, I think I have to recompile the instructions into another format.
Although I could parse the opcode as a string and process it, there are instances where the instruction as a whole has to be taken as a number. The increment opcode, for example, could modify even the opcode section of an instruction.
If I were to convert the instructions to integers, I'm not sure how I could then parse just the opcode or operand section of the int. Even if I were to recompile each instruction into three parts (the whole instruction as an int, the opcode as an int, and the operand as an int), that still wouldn't solve the problem, as I might have to increment an entire instruction and later parse the affected opcode or operand. Moreover, would I have to write a function to perform this conversion, or is there some C++ library function that converts a string in "binary format" to an integer (like Integer.parseInt(str1, 2) in Java)?
Also, I would like to be able to perform operations such as shifting bits. I'm not sure how that can be achieved, but that might affect how I implement this recompilation.
Thank you for any help or advice you can offer!
Parse the original code into an array of integers. This array is your computer's memory.
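One way to do that parsing, assuming each instruction arrives as a string like "0101 0101 0000 1111": strip the spaces and call std::stoul with base 2, which is the standard-library counterpart of Java's Integer.parseInt(str, 2). The function names parse_instruction and load_program are hypothetical, and the 16-bit range check reflects the instruction width described in the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Convert one instruction written as a binary string into a 16-bit word.
// Spaces are ignored; std::stoul(..., 2) does the base-2 conversion.
std::uint16_t parse_instruction(const std::string& text)
{
    std::string digits;
    for (char c : text)
        if (c != ' ')
            digits += c;

    std::size_t used = 0;  // how many characters stoul actually consumed
    const unsigned long value = std::stoul(digits, &used, 2);  // throws on empty/non-binary input
    if (used != digits.size() || value > 0xFFFF)
        throw std::invalid_argument("not a 16-bit binary instruction: " + text);
    return static_cast<std::uint16_t>(value);
}

// Build the machine's memory: one 16-bit word per instruction, in program order.
std::vector<std::uint16_t> load_program(const std::vector<std::string>& lines)
{
    std::vector<std::uint16_t> memory;
    memory.reserve(lines.size());
    for (const std::string& line : lines)
        memory.push_back(parse_instruction(line));
    return memory;
}
```

For example, parse_instruction("0101 0101 0000 1111") yields 0x550F. Checking `used` matters because std::stoul silently stops at the first character that is not a valid digit.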
Use bitwise operations to extract the various fields. For instance, this:
unsigned int x = 0xfeed;
unsigned int opcode = (x >> 12) & 0xf;
will extract the topmost four bits (0xf, here) from a 16-bit value stored in an unsigned int. You can then use e.g. switch() to inspect the opcode and take the proper action:
#include <cstdio>

enum { OP_ADD = 0 };  // opcode values come from your processor's spec

// Execute the instruction at memory[pc] and return the address of the next one.
unsigned int execute(unsigned int *memory, unsigned int pc)
{
    const unsigned int opcode = (memory[pc++] >> 12) & 0xf;
    switch (opcode)
    {
    case OP_ADD:
        /* Do whatever the ADD instruction's definition mandates. */
        return pc;
    default:
        std::fprintf(stderr, "** Non-implemented opcode %x found in location %x\n", opcode, pc - 1);
    }
    return pc;
}
Modifying memory is just a case of writing into your array of integers, perhaps also using some bitwise math if needed.
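As a sketch of that "bitwise math", assuming memory is a vector of 16-bit words with the opcode in bits 15-12 and the operand in bits 11-0: a store overwrites a whole word, while patching just one field means clearing it with a mask and OR-ing in the new value. The helper names write_word and patch_operand are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Overwrite a whole word, e.g. for a "store accumulator" instruction.
void write_word(std::vector<std::uint16_t>& memory, std::uint16_t addr, std::uint16_t value)
{
    memory.at(addr) = value;
}

// Replace only the operand field (bits 11-0) of the word at addr,
// keeping its opcode field (bits 15-12) unchanged.
void patch_operand(std::vector<std::uint16_t>& memory, std::uint16_t addr, std::uint16_t operand)
{
    memory.at(addr) = static_cast<std::uint16_t>((memory.at(addr) & 0xF000) | (operand & 0x0FFF));
}
```

With a word of 0x3005, patch_operand(mem, addr, 0x0ABC) leaves 0x3ABC: the opcode 3 survives and only the low 12 bits change.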
I think the best approach is to read the instructions, convert them to unsigned integers, and store them into memory, then execute them from memory.
Once you've parsed the instructions and stored them to memory, self-modification is much easier than storing a list of changes for each instruction. You can just change the memory at that location (assuming you don't ever need to know what the old instruction was).
Since you're converting the instructions to integers, the problem of having to treat a whole instruction as a number (for example, when incrementing it) is moot.
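To illustrate, here is a sketch of an increment applied to a stored instruction, assuming (hypothetically) that the machine's increment adds 1 to the whole 16-bit word and wraps from 0xFFFF to 0x0000. Because the word is a plain integer, a carry out of the operand field flows into the opcode field automatically, which is exactly the case the question worries about. The names increment_word, opcode_of and operand_of are illustrative only.

```cpp
#include <cstdint>
#include <vector>

// Add 1 to the whole word at addr; the cast back to 16 bits provides the wrap-around.
void increment_word(std::vector<std::uint16_t>& memory, std::uint16_t addr)
{
    memory.at(addr) = static_cast<std::uint16_t>(memory.at(addr) + 1);
}

// Decode the fields only when they are needed, e.g. at fetch time.
unsigned opcode_of(std::uint16_t word) { return (word >> 12) & 0xFu; }
unsigned operand_of(std::uint16_t word) { return word & 0x0FFFu; }
```

A word of 0x0FFF (opcode 0, operand 0xFFF) becomes 0x1000 (opcode 1, operand 0) after one increment, so the next fetch simply decodes the new opcode.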
To parse the opcode and operand sections, you'll need to use bit shifting and masking. For example, to get the opcode, shift the instruction down by 12 bits (instruction >> 12), and also mask with 0xf if the value might be wider than 16 bits. To get the operand, mask off the upper 4 bits (instruction & 0x0fff).

You mean your machine has instructions that shift bits? That shouldn't affect how you store the operands. When you get to executing one of those instructions, you can just use the C++ bit-shifting operators << and >>.
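A minimal sketch of both points, assuming a 16-bit accumulator: decode splits a word into its two fields, and the shift helpers use << and >> but keep the result to 16 bits. In C++ a std::uint16_t is promoted to int before shifting, so the left shift goes through std::uint32_t to avoid signed overflow, and shift counts of 16 or more are handled explicitly. The names Fields, decode, shift_left and shift_right are hypothetical.

```cpp
#include <cstdint>

struct Fields
{
    unsigned opcode;   // bits 15-12
    unsigned operand;  // bits 11-0
};

Fields decode(std::uint16_t word)
{
    return { (word >> 12) & 0xFu, word & 0x0FFFu };
}

// Shift the accumulator left; bits pushed past bit 15 are discarded by the cast.
std::uint16_t shift_left(std::uint16_t acc, unsigned n)
{
    if (n >= 16)
        return 0;  // every bit shifted out
    return static_cast<std::uint16_t>(static_cast<std::uint32_t>(acc) << n);
}

// Logical right shift: acc is unsigned, so zeros come in from the left.
std::uint16_t shift_right(std::uint16_t acc, unsigned n)
{
    if (n >= 16)
        return 0;
    return static_cast<std::uint16_t>(acc >> n);
}
```

If your professor's spec defines an arithmetic (sign-preserving) right shift instead, that one would need extra handling of bit 15.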
Just in case it helps, here's the last CPU emulator I wrote in C++. Actually, it's the only emulator I've written in C++.
The spec's language is slightly idiosyncratic but it's a perfectly respectable, simple VM description, possibly quite similar to your prof's VM:
http://www.boundvariable.org/um-spec.txt
Here's my (somewhat over-engineered) code, which should give you some ideas. For instance it shows how to implement mathematical operators, in the Giant Switch Statement in um.cpp:
http://www.eschatonic.org/misc/um.zip
You can probably find other implementations to compare against with a web search, since plenty of people entered the contest (I wasn't one of them: I did it much later), though I'd guess not many are in C++.
If I were you, I'd only store the instructions as strings to start with, if that's the way that your virtual machine specification defines operations on them. Then convert them to integers as needed, every time you want to execute them. It'll be slow, but so what? Yours isn't a real VM that you're going to be using to run time-critical programs, and a dog-slow interpreter still illustrates the important points you need to know at this stage.
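If you do take the strings-first route, a sketch might look like the following, assuming memory holds each instruction as 16 characters of '0'/'1' with the spaces already removed. It uses std::bitset<16>, whose string constructor throws std::invalid_argument on any other character, to convert at execution time and to write results back as text. The function names are hypothetical.

```cpp
#include <bitset>
#include <cstddef>
#include <string>
#include <vector>

// Convert the stored text to an integer only when the instruction is executed.
unsigned opcode_at(const std::vector<std::string>& memory, std::size_t pc)
{
    const std::bitset<16> bits(memory.at(pc));  // leftmost character is bit 15
    return static_cast<unsigned>(bits.to_ulong() >> 12);
}

// A whole-word increment, stored back in the same string format.
void increment_at(std::vector<std::string>& memory, std::size_t addr)
{
    const std::bitset<16> bits(memory.at(addr));
    const std::bitset<16> next(bits.to_ulong() + 1);  // only the low 16 bits are kept
    memory.at(addr) = next.to_string();
}
```

Every access pays for a conversion, which is the slowness mentioned above; swapping the vector of strings for a vector of integers later only changes these helpers.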
It's possible though that the VM actually defines everything in terms of integers, and the strings are just there to describe the program when it's loaded into the machine. In that case, convert the program to integers at the start. If the VM stores programs and data together, with the same operations acting on both, then this is the way to go.
The way to choose between them is to look at the opcode which is used to modify the program. Is the new instruction supplied to it as an integer, or as a string? Whichever it is, the simplest thing to start with is probably to store the program in that format. You can always change later once it's working.
In the case of the UM described above, the machine is defined in terms of "platters" with space for 32 bits. Clearly these can be represented in C++ as 32-bit integers, so that's what my implementation does.
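For illustration, a sketch of turning raw program bytes into 32-bit platters with std::uint32_t. It assumes big-endian order (most significant byte first); check the spec for the actual file format. It also assumes, by analogy with the 16-bit machine, that the operator number sits in the top four bits. The names bytes_to_platters and platter_opcode are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack every complete group of 4 bytes into one platter, most significant byte first.
// Any trailing bytes that do not form a full platter are ignored.
std::vector<std::uint32_t> bytes_to_platters(const std::vector<unsigned char>& bytes)
{
    std::vector<std::uint32_t> platters;
    for (std::size_t i = 0; i + 3 < bytes.size(); i += 4)
        platters.push_back((std::uint32_t{bytes[i]} << 24) |
                           (std::uint32_t{bytes[i + 1]} << 16) |
                           (std::uint32_t{bytes[i + 2]} << 8) |
                           std::uint32_t{bytes[i + 3]});
    return platters;
}

// Top four bits of a platter: the same shift-and-mask trick as for 16-bit words.
unsigned platter_opcode(std::uint32_t platter) { return (platter >> 28) & 0xFu; }
```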
I created an emulator for a custom cryptographic processor. I exploited the polymorphism of C++ by creating a hierarchy of instruction classes:
#include <cstddef>
#include <string>

struct Instruction // Contains common methods & data to all instructions.
{
    virtual ~Instruction() = default;  // needed when deleting through a base pointer
    virtual void execute() = 0;
    virtual std::size_t get_instruction_size() const = 0;
    virtual unsigned int get_opcode() const = 0;
    virtual const std::string& get_instruction_name() const = 0;
};
class Math_Instruction
    : public Instruction
{
    // Operations common to all math instructions.
};
class Branch_Instruction
    : public Instruction
{
    // Operations common to all branch instructions.
};
class Add_Instruction
    : public Math_Instruction
{
    // Overrides execute(), get_opcode(), etc. for ADD.
};
I also had a couple of factories. At least two would be useful:
- Factory to create instruction from text.
- Factory to create instruction from opcode
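A factory keyed on opcode could be sketched as below. This is a trimmed-down, hypothetical version of the hierarchy above (name returned by value, no execute()), and the opcode values 0x1 and 0x2 are assumptions chosen for the example. A text factory would look the same, switching on the mnemonic instead of the opcode bits.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

struct Instruction
{
    virtual ~Instruction() = default;
    virtual unsigned get_opcode() const = 0;
    virtual std::string get_instruction_name() const = 0;
};

struct Add_Instruction : Instruction
{
    unsigned get_opcode() const override { return 0x1; }  // assumed opcode value
    std::string get_instruction_name() const override { return "ADD"; }
};

struct Jump_Instruction : Instruction
{
    unsigned get_opcode() const override { return 0x2; }  // assumed opcode value
    std::string get_instruction_name() const override { return "JMP"; }
};

// Factory: decode the opcode field of a 16-bit word and build the matching object.
std::unique_ptr<Instruction> make_instruction(unsigned word)
{
    switch ((word >> 12) & 0xFu)
    {
    case 0x1: return std::make_unique<Add_Instruction>();
    case 0x2: return std::make_unique<Jump_Instruction>();
    default:  throw std::invalid_argument("unknown opcode");
    }
}
```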
The instruction classes should have methods to load their data from an input source (e.g. std::istream) or text (std::string). The corresponding output methods should also be supported (such as writing out the instruction name and opcode).
I had the application create objects from an input file and place them into a vector of Instruction pointers. The executor method would run the execute() method of each instruction in the vector. The call trickled down to the leaf instruction object, which performed the detailed execution.
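A minimal sketch of that executor, assuming std::unique_ptr for ownership. The vector has to hold pointers because Instruction is abstract, so a std::vector<Instruction> could not store the derived objects. Unlike the answer's execute(void), this hypothetical version passes a Machine state object explicitly instead of relying on globals.

```cpp
#include <memory>
#include <vector>

struct Machine { int accumulator = 0; };  // hypothetical shared state

struct Instruction
{
    virtual ~Instruction() = default;
    virtual void execute(Machine& m) = 0;
};

struct Add_Instruction : Instruction
{
    explicit Add_Instruction(int value) : value(value) {}
    void execute(Machine& m) override { m.accumulator += value; }
    int value;
};

// Run every instruction once, in order; virtual dispatch reaches the leaf class.
void run_all(std::vector<std::unique_ptr<Instruction>>& program, Machine& m)
{
    for (auto& instr : program)
        instr->execute(m);
}
```

This linear loop is the simplest form. Once branches exist, the executor needs a program counter that the branch instructions can change.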
There are other global objects that may need emulation as well. In my case some included the data bus, registers, ALU and memory locations.
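For the 16-bit accumulator machine in the question, that state might be sketched as a single struct. The 4096-word memory size is an assumption that follows only if the 12-bit operand is used as an address (2^12 = 4096). The alu_add helper is a hypothetical stand-in for an ALU that wraps like a real 16-bit register.

```cpp
#include <array>
#include <cstdint>

struct Cpu
{
    std::uint16_t accumulator = 0;
    std::uint16_t pc = 0;
    std::array<std::uint16_t, 4096> memory{};  // zero-initialised
};

// 16-bit add that wraps around on overflow.
std::uint16_t alu_add(std::uint16_t a, std::uint16_t b)
{
    return static_cast<std::uint16_t>(a + b);
}
```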
Please spend more time designing and thinking about the project before you code it. I found it quite a challenge, especially implementing a single-step capable debugger and GUI.
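A single-step debugger becomes much easier if the core exposes a step() that executes exactly one instruction. Here is a sketch with a hypothetical encoding (opcode 0 = HALT, 1 = load immediate, 2 = add immediate); a debugger front end can then call step() in a loop and print pc and accumulator between calls.

```cpp
#include <cstdint>
#include <vector>

struct Cpu
{
    std::uint16_t accumulator = 0;
    std::uint16_t pc = 0;
    std::vector<std::uint16_t> memory;

    // Execute one instruction; return false when the machine stops.
    bool step()
    {
        const std::uint16_t word = memory.at(pc);
        const unsigned opcode = (word >> 12) & 0xFu;
        const auto operand = static_cast<std::uint16_t>(word & 0x0FFF);
        if (opcode == 0)
            return false;  // HALT: pc stays on the HALT instruction
        ++pc;
        switch (opcode)
        {
        case 1: accumulator = operand; break;
        case 2: accumulator = static_cast<std::uint16_t>(accumulator + operand); break;
        default: --pc; return false;  // unknown opcode: leave pc on it for the debugger to report
        }
        return true;
    }
};
```

Running the program {0x1005, 0x2003, 0x0000} one step at a time shows the accumulator going 5, then 8, then the machine halting with pc at 2.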
Good Luck!