
Make a compiled binary run flawlessly at native speed on another system without recompiling from source?

I know that many people, at first glance, may immediately yell out "Java", but no, I already know Java's qualities. Allow me to elaborate on my question first.

Normally, when we want a program to run at native speed on a system, whether it be Windows, Mac OS X, or Linux, we need to compile it from source code for that system. If you want to run a program built for another system, you need to use a virtual machine or an emulator. While these tools let you use the program on a non-native OS, they sometimes suffer from performance problems and glitches.

We also have a newer kind of compiler, the JIT (just-in-time) compiler, which translates a bytecode program into native machine code at run time, just before it is executed. Performance can improve considerably with a JIT compiler, but it is still not the same as running a natively compiled program.

Another program on Linux, WINE, is also a good tool for running Windows programs on a Linux system. I have tried running Team Fortress 2 on it and experimented with some settings. I got ~40 fps on Windows at mid-high settings at 1280 x 1024. On Linux, I need to turn everything down to low at 1280 x 1024 to get ~40 fps. There are 2 notable things, though:

  1. Polygon model settings do not seem to affect framerate whether I set it low or high.
  2. When there are post-processing effects or some special effects that require manipulation of drawn pixels of the current frame, the framerate will drop to 10-20 fps.

From this, I can see that normal polygon rendering is just fine, but when it comes to newer rendering methods that require the graphics card to do the job, it slows down to a crawl.

Anyway, this question is rather theoretical. Is there anything we can do at all? I see that WINE can run Steam and Team Fortress 2. Although there are flaws, they can run at lower settings. Or perhaps I should ask, "Is it possible to translate a whole program from one system to another without recompiling from source and still get native speed?" I see that we also have AOT (ahead-of-time) compilers; is it possible to use one for something like this? Or are there so many constraints (such as DirectX calls or differences in software architecture) that it is impossible for a program that is not native to the system to run flawlessly at native speed?


The first step to running the same compiled body of code on multiple systems at native speed without recompiling is to choose one processor instruction set and throw out all other systems. If you pick Intel, then you must throw out ARM, MIPS, PowerPC, and so forth because the native machine code instructions for one architecture are completely unintelligible to other processors.

Ok. So now the task is to run the same body of compiled native code on multiple systems (all using the same processor architecture) at native speed without recompiling. So basically, you want to run the same code under different operating systems on the same hardware.

If the hardware is the same and the only difference is the operating system, then the trivial answer is yes, you can do it if you can write your code without making any calls to the operating system. No memory allocation. No console output. No file I/O. No network I/O. No fun.

Furthermore, your code will have to be written in such a way that the code does not require address relocation fixups, since each operating system has different ways to represent relocatable code. One way to do that is to arrange your code on disk exactly as it would appear in memory, including reserving space to use for writable data (global variables, stack, and heap). Then all you have to do to run the code is copy the file bytes into memory at a predefined base address, and jump to the starting address.

The MSDOS .com executable file format has been doing this since at least 1981, and CP/M for long before that.
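To make the load-and-jump idea concrete, here is a rough simulation in Python. Everything about the "machine" is made up for illustration (the three opcodes, the memory size, the function names); only the 0x100 load offset is borrowed from the .com convention. The point is that the image contains a hard-coded absolute address, which works only because the loader always copies the bytes to the same base address.

```python
# Toy simulation of loading a flat binary image (in the spirit of a .com file).
# The machine is hypothetical: a bytearray for memory and a tiny made-up
# instruction set, so the sketch runs anywhere.

LOAD_BASE = 0x100      # predefined base address (the offset .com files use)
MEMORY_SIZE = 0x1000

# Hypothetical opcodes for the toy machine.
OP_LOAD_IMM = 0x01     # LOAD_IMM value            -> acc = value
OP_ADD_MEM = 0x02      # ADD_MEM addr_lo addr_hi   -> acc += memory[addr]
OP_HALT = 0xFF


def load_flat_image(image: bytes) -> bytearray:
    """Copy the file bytes verbatim to LOAD_BASE; no relocation fixups."""
    memory = bytearray(MEMORY_SIZE)
    memory[LOAD_BASE:LOAD_BASE + len(image)] = image
    return memory


def run(memory: bytearray, entry: int = LOAD_BASE) -> int:
    """'Jump' to the entry point and interpret until HALT; return acc."""
    pc, acc = entry, 0
    while True:
        op = memory[pc]
        if op == OP_LOAD_IMM:
            acc = memory[pc + 1]
            pc += 2
        elif op == OP_ADD_MEM:
            addr = memory[pc + 1] | (memory[pc + 2] << 8)  # absolute address
            acc += memory[addr]
            pc += 3
        elif op == OP_HALT:
            return acc
        else:
            raise ValueError(f"unknown opcode {op:#x} at {pc:#x}")


# The image hard-codes the absolute address of its own data byte (0x0106),
# which is only valid because the loader always uses LOAD_BASE.
image = bytes([
    OP_LOAD_IMM, 40,         # 0x100: acc = 40
    OP_ADD_MEM, 0x06, 0x01,  # 0x102: acc += memory[0x0106]
    OP_HALT,                 # 0x105
    2,                       # 0x106: data byte
])

result = run(load_flat_image(image))
print(result)  # 42
```

If the loader copied the same bytes to any other base address, the hard-coded 0x0106 would point at the wrong byte. That is the problem relocation fixups solve in real executable formats.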

However, MSDOS didn't have today's virus scanners to contend with back then. Virus scanners get very excited when anyone other than the host OS loads file data into memory and attempts to execute that memory. Because, ya know, that's exactly what viruses do.

Since each OS has its own executable file format, you'll also need to figure out how to get your block of "flawless" native code into memory on all these different operating systems. You will need at a minimum one program loader compiled for each operating system you want to run your block of native code in. While you're writing a program loader for each OS you want to target, you could also define your own file I/O functions that map to the OS native equivalents so that your block of native code can do file I/O on any system. Ditto for console I/O or graphics output.

Oh wait - that's exactly what WINE does.

That's also why the frame rates you see in WINE are so much lower than the same operations on Windows - WINE is translating Win32 graphics calls (GDI, Direct3D) into something provided by the host OS (on Linux, the X Window System and OpenGL), and where there isn't an exact function match or where the semantics of an operation differ (which is frequently the case), WINE has to implement the functionality itself, sometimes at great cost.
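The per-OS loader plus portable API mapping can be sketched in a few lines of Python. This is only an illustration of the pattern, not how WINE is implemented: `HostA`, `HostB`, `make_portable_api` and `portable_program` are all hypothetical names, and the two "hosts" are plain classes standing in for operating systems with different native calls.

```python
class HostA:
    """Hypothetical host whose native API matches the portable API one-to-one."""

    def __init__(self):
        self.screen = []
        self.native_calls = 0

    def draw_text(self, text):
        self.native_calls += 1
        self.screen.append(text)


class HostB:
    """Hypothetical host that can only draw one character per native call."""

    def __init__(self):
        self.screen = []
        self.native_calls = 0

    def put_char(self, ch):
        self.native_calls += 1
        self.screen.append(ch)


def make_portable_api(host):
    """Per-host 'loader': build the function table the portable code calls."""
    if isinstance(host, HostA):
        # Exact match: forward straight to the native function.
        return {"draw_text": host.draw_text}
    if isinstance(host, HostB):
        # Semantic mismatch: emulate draw_text with many native calls.
        def draw_text(text):
            for ch in text:
                host.put_char(ch)
        return {"draw_text": draw_text}
    raise TypeError("unsupported host")


def portable_program(api):
    """Stands in for the block of native code: it only calls the portable API."""
    api["draw_text"]("hello")


host_a, host_b = HostA(), HostB()
portable_program(make_portable_api(host_a))
portable_program(make_portable_api(host_b))
print(host_a.native_calls, host_b.native_calls)  # 1 5
```

The same program costs one native call on the host with a matching function and five on the host where the operation has to be emulated, which is the kind of overhead described above.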

But given the ubiquity of standardized hardware like IDE drives, USB devices, and BIOS functions, maybe you don't need to go to all the trouble of mapping your own portable APIs onto whatever the OS has built in. Just write a little code to do file I/O to IDE devices and graphics output using VESA BIOS functions. If you abstract the code a little bit, you can support multiple kinds of hardware and pick the appropriate function pointer to use based on what hardware you find at runtime.

Then you could truly run your block of native code on any system (using one particular processor architecture) at native speed without recompiling.

Oh wait - you just wrote your own OS. ;>


Yes, it is technically possible to translate a binary executable program written for one processor architecture and operating system into a binary executable program that will run on another processor and operating system. It's also an unholy amount of work.

There is a problem with the "native code execution speed" terminology. You can compile a program to native code with optimizations disabled, and the result will be native executable code running at "native code execution speed", but it will probably run slower than the same source code compiled with optimizations enabled. Both are running at "native code execution speed", but they execute different quantities and qualities of machine code to carry out the same core algorithm.

Machine instructions are much more primitive than higher level source programming languages. When compiling source code into machine code, a lot of information is lost. Data types, for example, are usually reduced by a compiler down to a handful of machine primitives - pointer, integer, float. A string is a pointer to memory. A char is an integer. An object instance is a pointer.

When you translate one machine instruction set into another machine instruction set, you are handicapped because you don't have as much information about the data as a source code compiler has. Compiling from source code, the compiler can see relationships and optimizations in the data that would be very difficult to discover just by looking at the machine code alone.
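A small Python illustration of that information loss, using the standard `struct` module: the same four bytes in memory are a perfectly valid float, a perfectly valid integer, and four perfectly valid small integers (chars). Nothing in the bytes themselves says which one the original source code meant. The variable names are just for this sketch.

```python
import struct

# Four bytes as they might sit in memory after compilation.
raw = struct.pack("<f", 1.0)

as_float = struct.unpack("<f", raw)[0]   # read back as a 32-bit float
as_int = struct.unpack("<I", raw)[0]     # same bytes as an unsigned 32-bit int
as_chars = list(raw)                     # same bytes as four 8-bit values

print(as_float)   # 1.0
print(as_int)     # 1065353216
print(as_chars)   # [0, 0, 128, 63]
```

A compiler working from source knows which reading is intended; a translator working from machine code has to infer it from how the instructions use the bytes.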

Story time: Digital Equipment Corporation created a system called FX!32 that took native compiled Win32 Intel x86 executables, decompiled them, and translated the logic into native Alpha AXP processor instructions running Windows NT AXP. In this case, the OSes were at least cut from the same cloth, but one was 32-bit and the other was 64-bit, and at the machine code level they had radically different calling conventions. Nevertheless, it worked, and it worked remarkably well. Due to the differences in hardware, an Intel x86 app running on AXP could eventually run faster than the same app running on Intel hardware. (FX!32 used profiling to reoptimize the AXP code after the Intel app was run a few times, so performance usually started out pretty bad but improved each time you ran the app.)

However, even though everything was executing native AXP instructions, the FX!32 translated app never ran as fast as taking the source code and recompiling it specifically for the AXP instruction set. The FX!32 translated native AXP instruction stream was bulked up by the necessity to fully represent the semantics of the original Intel x86 instructions even if the (unseen) higher level algorithm didn't require all aspects of those semantics.
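Here is a toy Python sketch of where that bulk comes from. It is not a description of how FX!32 actually worked. Both instruction sets are invented for the example: the source `add` also updates a zero flag and a carry flag as a side effect, and the target has no flags register, so a faithful translator has to compute those flags explicitly even if nothing ever reads them. The function names and mnemonics are hypothetical.

```python
def translate_naive(source):
    """Translate each toy source instruction, preserving every side effect."""
    out = []
    for op, dst, src in source:
        if op == "add":
            # Source semantics: dst += src, and update the zero/carry flags.
            out.append(("addl", dst, dst, src))       # the arithmetic itself
            out.append(("cmpeq", "zf", dst, "zero"))  # zf = (dst == 0)
            out.append(("cmpult", "cf", dst, src))    # cf = (dst < src), unsigned
        elif op == "mov":
            out.append(("mov", dst, src))             # mov leaves flags alone
        else:
            raise ValueError(op)
    return out


def compile_with_source_knowledge(source):
    """What a compiler could emit if it knew the flags are never read."""
    out = []
    for op, dst, src in source:
        if op == "add":
            out.append(("addl", dst, dst, src))
        elif op == "mov":
            out.append(("mov", dst, src))
        else:
            raise ValueError(op)
    return out


program = [("mov", "r1", "r2"), ("add", "r1", "r3"), ("add", "r1", "r4")]

print(len(translate_naive(program)))                # 7
print(len(compile_with_source_knowledge(program)))  # 3
```

A real translator can analyze the code and drop flag computations it can prove are dead, but whenever it cannot prove that, it has to keep them.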

When doing machine instruction to machine instruction translation, you can see/hear every note in the symphony but you may have trouble picking out which ones define the melody.
