Questions Regarding the Implementation of a Simple CPU Emulator

2022-12-22 03:56

Background Information: Ultimately, I would like to write an emulator of a real machine such as the original Nintendo or Gameboy. However, I decided that I need to start somewhere much, much simpler. My computer science advisor/professor offered me the specifications for a very simple imaginary processor that he created to emulate first. There is one register (the accumulator) and 16 opcodes. Each instruction consists of 16 bits, the first 4 of which contain the opcode, the rest of which is the operand. The instructions are given as strings in binary format, e.g., "0101 0101 0000 1111".

My Question: In C++, what is the best way to parse the instructions for processing? Please keep my ultimate goal in mind. Here are some points I've considered:

I can't just process and execute the instructions as I read them because the code is self-modifying: an instruction can change a later instruction. The only way I can see to get around this would be to store all changes and, for each instruction, check whether a change needs to be applied. This could lead to a massive number of comparisons with the execution of each instruction, which isn't good. And so, I think I have to recompile the instructions into another format.
Although I could parse the opcode as a string and process it, there are instances where the instruction as a whole has to be taken as a number. The increment opcode, for example, could modify even the opcode section of an instruction.
If I were to convert the instructions to integers, I'm not sure then how I could parse just the opcode or operand section of the int. Even if I were to recompile each instruction into three parts (the whole instruction as an int, the opcode as an int, and the operand as an int), that still wouldn't solve the problem, as I might have to increment an entire instruction and later parse the affected opcode or operand. Moreover, would I have to write a function to perform this conversion, or is there some library for C++ that has a function to convert a string in "binary format" to an integer (like Integer.parseInt(str1, 2) in Java)?
Also, I would like to be able to perform operations such as shifting bits. I'm not sure how that can be achieved, but that might affect how I implement this recompilation.

Thank you for any help or advice you can offer!

Parse the original code into an array of integers. This array is your computer's memory.
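As a minimal sketch of that parsing step, assuming each instruction arrives as text like "0101 0101 0000 1111" with optional spaces between groups: the helper name parse_instruction is hypothetical, and std::stoul with base 2 stands in for Java's Integer.parseInt(str, 2).

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts one instruction written as binary text,
// e.g. "0101 0101 0000 1111", into a 16-bit word. Spaces are ignored.
std::uint16_t parse_instruction(const std::string& text)
{
    std::string bits;
    for (char c : text) {
        if (c == '0' || c == '1') {
            bits += c;
        } else if (c != ' ') {
            throw std::invalid_argument("unexpected character in instruction");
        }
    }
    if (bits.size() != 16) {
        throw std::invalid_argument("instruction must contain exactly 16 bits");
    }
    // Base 2 tells std::stoul to read the digits as binary.
    const unsigned long value = std::stoul(bits, nullptr, 2);
    assert(value <= 0xFFFFul);  // guaranteed by the 16-digit length check
    return static_cast<std::uint16_t>(value);
}
```

Calling this once per input line and pushing the results into a std::vector<std::uint16_t> would give the memory array described above.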

Use bitwise operations to extract the various fields. For instance, this:

unsigned int x = 0xfeed;
unsigned int opcode = (x >> 12) & 0xf;

will extract the topmost four bits (0xf, here) from a 16-bit value stored in an unsigned int. You can then use e.g. switch() to inspect the opcode and take the proper action:

#include <stdio.h>

enum { OP_ADD = 0 };

/* Executes the instruction at memory[pc] and returns the address of the next one. */
unsigned int execute(int *memory, unsigned int pc)
{
  const unsigned int opcode = (memory[pc++] >> 12) & 0xf;

  switch(opcode)
  {
  case OP_ADD:
    /* Do whatever the ADD instruction's definition mandates. */
    return pc;
  default:
    fprintf(stderr, "** Non-implemented opcode %x found in location %x\n", opcode, pc - 1);
  }
  return pc;
}

Modifying memory is just a case of writing into your array of integers, perhaps also using some bitwise math if needed.
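Here is a rough sketch of what that bitwise math might look like, assuming the layout from the question (top 4 bits opcode, low 12 bits operand). The helper names set_operand and increment_word are made up for illustration and are not part of any spec.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout (from the question): bits 15..12 opcode, bits 11..0 operand.
constexpr std::uint16_t OPCODE_MASK  = 0xF000;
constexpr std::uint16_t OPERAND_MASK = 0x0FFF;

// Hypothetical helper: overwrite only the operand of the word at `address`,
// leaving its opcode bits untouched.
void set_operand(std::vector<std::uint16_t>& memory, std::size_t address,
                 std::uint16_t operand)
{
    assert(address < memory.size());
    memory[address] = static_cast<std::uint16_t>(
        (memory[address] & OPCODE_MASK) | (operand & OPERAND_MASK));
}

// Hypothetical helper: treat the whole instruction as a number and add one.
// A carry out of the operand field changes the opcode; 0xFFFF wraps to 0.
void increment_word(std::vector<std::uint16_t>& memory, std::size_t address)
{
    assert(address < memory.size());
    memory[address] = static_cast<std::uint16_t>(memory[address] + 1);
}
```

Because each word is stored whole, incrementing 0x5FFF simply yields 0x6000, and the next decode of that word sees the new opcode without any extra bookkeeping.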

I think the best approach is to read the instructions, convert them to unsigned integers, and store them into memory, then execute them from memory.

Once you've parsed the instructions and stored them to memory, self-modification is much easier than storing a list of changes for each instruction. You can just change the memory at that location (assuming you don't ever need to know what the old instruction was).
Since you're converting the instructions to integers, this problem is moot.
To parse the opcode and operand sections, you'll need to use bit shifting and masking. For example, to get the opcode, shift the instruction right by 12 bits and keep the low 4 bits: (instruction >> 12) & 0xF. To get the operand, mask off the lower 12 bits: instruction & 0xFFF.
You mean your machine has instructions that shift bits? That shouldn't affect how you store the operands. When you get to executing one of those instructions, you can just use the C++ bit-shifting operators << and >>.
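The four points above could be put together roughly as follows. This is only a sketch: the opcode numbers and their meanings (OP_LOAD, OP_ADD, OP_INC, OP_SHL, OP_HALT) are invented for illustration, since the professor's actual instruction set isn't given, and the Machine struct is an assumption too.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcode numbers; the real ones come from the machine's spec.
enum Opcode : std::uint16_t { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_INC = 3, OP_SHL = 4 };

struct Machine {
    std::vector<std::uint16_t> memory;  // program and data share one array
    std::uint16_t accumulator = 0;
    std::size_t pc = 0;
};

// Executes one instruction; returns false when the machine should stop.
bool step(Machine& m)
{
    assert(m.pc < m.memory.size());
    const std::uint16_t instruction = m.memory[m.pc++];   // fetch
    const unsigned opcode  = (instruction >> 12) & 0xFu;  // decode: top 4 bits
    const unsigned operand = instruction & 0x0FFFu;       // decode: low 12 bits

    switch (opcode) {
    case OP_LOAD:  // accumulator = memory[operand]
        m.accumulator = m.memory.at(operand);
        return true;
    case OP_ADD:   // accumulator += memory[operand], wrapping at 16 bits
        m.accumulator = static_cast<std::uint16_t>(m.accumulator + m.memory.at(operand));
        return true;
    case OP_INC:   // memory[operand] += 1; this may rewrite an instruction
        m.memory.at(operand) = static_cast<std::uint16_t>(m.memory.at(operand) + 1);
        return true;
    case OP_SHL:   // shift the accumulator left by the low 4 bits of the operand
        m.accumulator = static_cast<std::uint16_t>(
            static_cast<unsigned>(m.accumulator) << (operand & 0xFu));
        return true;
    case OP_HALT:
    default:
        return false;
    }
}

// Runs until a halt (or unknown opcode) or until pc runs off the end of memory.
void run(Machine& m)
{
    while (m.pc < m.memory.size() && step(m)) {
    }
}
```

With memory {0x3001, 0x0FFF, 0x0000}, the first instruction increments the word at address 1 from 0x0FFF to 0x1000, so by the time the program counter reaches it, it decodes as OP_LOAD rather than OP_HALT. No change list is needed because every fetch reads the current contents of memory.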

Just in case it helps, here's the last CPU emulator I wrote in C++. Actually, it's the only emulator I've written in C++.

The spec's language is slightly idiosyncratic but it's a perfectly respectable, simple VM description, possibly quite similar to your prof's VM:

http://www.boundvariable.org/um-spec.txt

Here's my (somewhat over-engineered) code, which should give you some ideas. For instance it shows how to implement mathematical operators, in the Giant Switch Statement in um.cpp:

http://www.eschatonic.org/misc/um.zip

You can probably find other implementations for comparison with a web search, since plenty of people entered the contest (I wasn't one of them: I did it much later), although I'd guess not many are in C++.

If I were you, I'd only store the instructions as strings to start with, if that's the way that your virtual machine specification defines operations on them. Then convert them to integers as needed, every time you want to execute them. It'll be slow, but so what? Yours isn't a real VM that you're going to be using to run time-critical programs, and a dog-slow interpreter still illustrates the important points you need to know at this stage.
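A sketch of that strings-first approach might look like this, assuming each memory word is kept as a 16-character string of '0' and '1' with the spaces already stripped. The names TextMemory, read_word and write_word are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Memory kept as text: one 16-character string of '0'/'1' per word (no spaces).
using TextMemory = std::vector<std::string>;

// Converts the stored text to an integer every time the word is needed.
std::uint16_t read_word(const TextMemory& memory, std::size_t address)
{
    assert(address < memory.size());
    return static_cast<std::uint16_t>(std::stoul(memory[address], nullptr, 2));
}

// Converts an integer back to 16 binary digits, most significant bit first.
void write_word(TextMemory& memory, std::size_t address, std::uint16_t value)
{
    assert(address < memory.size());
    memory[address] = std::bitset<16>(value).to_string();
}
```

Every read pays for a string-to-integer conversion, which is the slowness mentioned above, but the stored program always matches the textual form in which it was supplied.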

It's possible though that the VM actually defines everything in terms of integers, and the strings are just there to describe the program when it's loaded into the machine. In that case, convert the program to integers at the start. If the VM stores programs and data together, with the same operations acting on both, then this is the way to go.

The way to choose between them is to look at the opcode which is used to modify the program. Is the new instruction supplied to it as an integer, or as a string? Whichever it is, the simplest thing to start with is probably to store the program in that format. You can always change later once it's working.

In the case of the UM described above, the machine is defined in terms of "platters" with space for 32 bits. Clearly these can be represented in C++ as 32-bit integers, so that's what my implementation does.

I created an emulator for a custom cryptographic processor. I exploited the polymorphism of C++ by creating a tree of base classes:

#include <cstddef>
#include <string>

struct Instruction  // Contains methods and data common to all instructions.
{
    virtual ~Instruction() = default;  // allows deletion through a base pointer
    virtual void execute(void) = 0;
    virtual std::size_t get_instruction_size(void) const = 0;
    virtual unsigned int get_opcode(void) const = 0;
    virtual const std::string& get_instruction_name(void) const = 0;
};

class Math_Instruction
:  public Instruction
{
  // Operations common to all math instructions;
};

class Branch_Instruction
:  public Instruction
{
  // Operations common to all branch instructions;
};

class Add_Instruction
:  public Math_Instruction
{
};

I also had a couple of factories. At least two would be useful:

Factory to create instruction from text.
Factory to create instruction from opcode

The instruction classes should have methods to load their data from an input source (e.g. std::istream) or text (std::string). The corresponding output methods should also be supported (such as instruction name and opcode).

I had the application create objects from an input file and place them into a vector of Instruction pointers (the base class is abstract, so the objects can't be stored by value). The executor method would run the execute() method of each instruction in the vector. This action trickled down to the leaf instruction object, which performed the detailed execution.
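To give a feel for the shape of this design, here is a stripped-down sketch, not the original emulator's code. It assumes C++14 (for std::make_unique), passes a hypothetical Cpu struct to execute() instead of using global objects, and invents the opcode value 0x1 for ADD and the Nop_Instruction fallback.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

struct Cpu { std::uint16_t accumulator = 0; };  // hypothetical machine state

struct Instruction {
    virtual ~Instruction() = default;
    virtual void execute(Cpu& cpu) const = 0;
};

// Hypothetical leaf instruction: adds an immediate operand to the accumulator.
class Add_Instruction : public Instruction {
public:
    explicit Add_Instruction(std::uint16_t operand) : operand_(operand) {}
    void execute(Cpu& cpu) const override {
        cpu.accumulator = static_cast<std::uint16_t>(cpu.accumulator + operand_);
    }
private:
    std::uint16_t operand_;
};

// Hypothetical fallback for opcodes this sketch does not implement.
class Nop_Instruction : public Instruction {
public:
    void execute(Cpu&) const override {}
};

// Factory that creates an instruction object from a 16-bit word
// (top 4 bits opcode, low 12 bits operand; opcode 0x1 = ADD is made up).
std::unique_ptr<Instruction> make_instruction(std::uint16_t word)
{
    const unsigned opcode = (word >> 12) & 0xFu;
    const std::uint16_t operand = static_cast<std::uint16_t>(word & 0x0FFFu);
    switch (opcode) {
    case 0x1: return std::make_unique<Add_Instruction>(operand);
    default:  return std::make_unique<Nop_Instruction>();
    }
}

// Executor: runs each instruction in order; virtual dispatch picks the leaf class.
void run(const std::vector<std::unique_ptr<Instruction>>& program, Cpu& cpu)
{
    for (const auto& instruction : program) {
        assert(instruction != nullptr);
        instruction->execute(cpu);
    }
}
```

One trade-off to keep in mind: the objects are decoded once up front, so a program that rewrites its own instructions would need the affected object rebuilt through the factory after each write.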

There are other global objects that may need emulation as well. In my case some included the data bus, registers, ALU and memory locations.

Please spend more time designing and thinking about the project before you code it. I found it quite a challenge, especially implementing a single-step capable debugger and GUI.

Good Luck!
