Why is there a difference between assembly languages on Windows and Linux?
I am relatively new to all this low-level stuff (assembly language) and want to learn more about it. Why is there a difference between the Linux and Windows assembly languages?
As I understand it, when I compile C code the operating system does not really produce pure machine or assembly code; it produces OS-dependent binary code. But why?
For example, when I use an x86 system, the CPU only understands x86 ASM, am I right? So why don't we write pure x86 assembly code, and why are there different assembly variations based on the operating system? If we wrote pure ASM, or the OS produced pure ASM, there wouldn't be binary compatibility issues between operating systems, would there?
I am really curious about all the reasons behind this. Any detailed answer, article, or book would be great. Thanks.
There is no difference. The assembly code is the same if the processor is the same. x86 code compiled on Windows is binary compatible with x86 code on Linux. The compiler does not produce OS-dependent binary code, but it may package the code in a different format (e.g. PE vs. ELF).
The difference is in which libraries are used. In order to use OS stuff (I/O for example) you must link against the operating system's libraries. Unsurprisingly, Windows system libraries are not available on a Linux machine (unless you have Wine of course) and vice-versa.
Well, you don't run straight assembly. The code has to be in some sort of executable format: Windows uses PE, and most Unices use ELF now (although there have been others, like a.out).
The base assembly instructions are the same, and the functions you create with them are the same.
The problem comes with access to other resources. The processor is really good at calculation, but on its own it can't access the hard disk, print a character to the screen, or connect to a Bluetooth phone. These elements are always in some way operating system dependent. They are implemented in terms of syscalls, where the program signals the operating system to perform a certain task. Task number 17 on Linux isn't necessarily task 17 on Windows; the two may not even have equivalents.
Since most libraries have some syscalls at their lowest levels, this is why code can't just be recompiled in every case.
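To make the "task number" point concrete, here is a small sketch. It is only an illustration: the two tables are a hand-picked subset of Linux syscall numbers for the 32-bit x86 and x86-64 ABIs, and name_for is a hypothetical helper, not part of any library. Windows numbers are left out because, as far as I know, they are not a documented stable interface. Even between two Linux ABIs the same number selects a different service:

```python
# Illustrative subset of Linux system call numbers (not a complete table).
LINUX_I386 = {"exit": 1, "read": 3, "write": 4, "open": 5}
LINUX_X86_64 = {"read": 0, "write": 1, "open": 2, "exit": 60}


def name_for(table, number):
    """Return the call that `number` selects in `table`, or None if not listed."""
    for name, value in table.items():
        if value == number:
            return name
    return None


if __name__ == "__main__":
    # The same number in the syscall register means different things.
    print(name_for(LINUX_I386, 1))    # exit
    print(name_for(LINUX_X86_64, 1))  # write
```

So code that loads 1 into the syscall-number register and traps into the kernel terminates the process under one ABI and writes to a file descriptor under the other, even though the instructions are identical.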
In addition to other answers.
The OS dictates its Application Binary Interface (ABI), which includes the format of executable objects. These are the Executable and Linkable Format (ELF) on Linux (and many other Unix-like systems) and Portable Executable (PE) on Windows. See this table for other formats.
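As a rough sketch of how different the containers are, the two formats can be told apart from their first bytes. ELF files start with the four bytes 0x7F 'E' 'L' 'F'. PE files start with a DOS stub marked "MZ", and the 32-bit little-endian value at offset 0x3C points to the "PE\0\0" signature. The function name detect_format is my own, and a real loader checks far more than this:

```python
import struct


def detect_format(data: bytes) -> str:
    """Guess the executable container from magic numbers only."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # Offset 0x3C of the DOS header stores where the PE signature lives.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "PE"
    return "unknown"
```

You could feed it the contents of a file opened in binary mode. The machine code inside either container may be byte-for-byte the same x86 instructions; what differs is the wrapping the loader expects.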
Unless you are using an embedded system development environment, you are compiling with compilers that are targeted to a particular runtime. That runtime defines the conventions for the use of the hardware: argument passing, exception handling, etc. These conventions interact with the operating system, or at least with the available runtime libraries that the program needs to link with.
Historically Linux assembly tends to be done using AT&T syntax, since this is what the GNU Assembler supports. Likewise, Windows assemblers tend to use the Intel syntax, as with MASM and NASM.
All x86 assemblers produce the same output -- that is, x86 machine code. And you can use NASM or the GNU Assembler on Linux to write in Intel syntax, and the GNU Assembler on Windows to write in AT&T syntax.
Assembly language is tied to the CPU architecture, not to the OS. However, an OS provides a set of system functions, compiled to binary, that your assembly program can invoke through interrupts, for example standard input/output operations, etc.
The OS determines two things: (1) the calling convention, which defines how parameters go on the stack and therefore impacts the assembly code, and (2) the run-time libraries that implement common functions like memory allocation, input/output, higher-level math, etc.
So while x+y compiles to the same assembly code under Windows or Linux on an x86 processor, y = sin(x) will be different due to a different calling convention and different math library.
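One place where the calling-convention difference is easy to see is 64-bit x86, which I am assuming here purely for illustration (the answer above talks about stack passing, which is the 32-bit picture). For integer and pointer arguments, the System V AMD64 convention used on Linux passes the first six in rdi, rsi, rdx, rcx, r8, r9, while the Microsoft x64 convention passes the first four in rcx, rdx, r8, r9. The helper arg_locations below is hypothetical and ignores floating-point arguments:

```python
# Integer/pointer argument registers, in order, for two x86-64 conventions.
SYSV_AMD64_INT_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
MS_X64_INT_REGS = ["rcx", "rdx", "r8", "r9"]


def arg_locations(regs, n_args):
    """Where each of the first n_args integer arguments goes: a register or "stack"."""
    return [regs[i] if i < len(regs) else "stack" for i in range(n_args)]


if __name__ == "__main__":
    # The same three-argument call needs different register setup per OS.
    print(arg_locations(SYSV_AMD64_INT_REGS, 3))  # ['rdi', 'rsi', 'rdx']
    print(arg_locations(MS_X64_INT_REGS, 3))      # ['rcx', 'rdx', 'r8']
```

The instructions available are the same on both systems; the compiler simply has to emit different moves before the call.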
Beyond that, the assembly language itself is dependent on the processor. x86, x86_64, ARM, PowerPC, each have their own assembly language.
There's no difference in the assembly languages (although there may be differences between assemblers, and hence the notations used), provided we're sticking to x86. Both Linux and Microsoft Windows do run on other architectures, more so in the case of Linux.
However, an operating system nowadays doesn't just load a program into memory and let it go. It provides a large amount of services. Since it also protects programs from each other, it imposes restrictions. To do anything other than basic computation, it is usually necessary to go through the operating system. (This was less true of older operating systems, like MS-DOS and CP/M, which could load programs that would run independently, but nowadays pretty much every non-embedded system has a modern OS.)
Nor are programs stored as plain binary blobs. It's normally necessary to link with other libraries, often as the program is loaded for execution (that's how DLLs work, for example), and it is necessary to link with the OS. There may be other information the OS requires, and therefore there has to be some sort of information about the binary blob in the executable file. This varies between OSes.
Therefore, executable files have to be in a format to be loaded into memory, and this varies from OS to OS. To do anything useful, they have to make OS calls, which are different between systems. That's why you can't take a Windows executable and associated libraries and run it on Linux.
There exist a few assemblers for various platforms which, given a source file, will produce an output binary file directly which is designed to be loaded at a particular address. Such assemblers have been popular for some small microcontrollers, or for some historical processors like the 6502 and Z80. When assembling the program, it would be necessary to know the address where it would be expected to reside; using a different address would require re-assembling the program. On the other hand, assembly in such a system was a single-step process. Run the assembler on the source code and get an executable output. In some cases, it would be possible to have the source code, assembler, and output all in memory at once (on my Commodore 64, I used an assembler which was published in Compute's Gazette magazine that worked like that).
Although reassembling everything any time its address changes might have been practical for a program that will "take over the machine", in many cases it's desirable to use a multi-step process where source files are processed into object-code files, which contain the assembled instructions but also contain various kinds of "symbolic" information about them; these files are then processed in various ways so as to either yield a memory image that may be loaded directly into memory, or else a combined relocatable object file which an operating system's loader will know how to adjust for any address to which it might be loaded.
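A toy model of that last step may help. It assumes a made-up object format in which the "symbolic" information is just a list of byte offsets whose 16-bit little-endian values were assembled relative to address 0; real formats carry much richer relocation records. The name relocate is my own:

```python
import struct


def relocate(image: bytes, fixup_offsets, load_address: int) -> bytes:
    """Add load_address to every 16-bit little-endian word listed in fixup_offsets."""
    out = bytearray(image)
    for off in fixup_offsets:
        (value,) = struct.unpack_from("<H", out, off)
        # Wrap to 16 bits, as a 16-bit address field would.
        struct.pack_into("<H", out, off, (value + load_address) & 0xFFFF)
    return bytes(out)


if __name__ == "__main__":
    # BB 10 00 encodes "mov bx,0010h"; the immediate at offset 1 is an address.
    code = b"\xBB\x10\x00"
    print(relocate(code, [1], 0x2000).hex())  # bb1020, i.e. mov bx,2010h
```

The assembler never needed to know the final address; it only had to record where the address-dependent bytes are.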
In order for an object-linking system to be useful, it must allow certain kinds of address computation to be deferred until a program is linked or loaded. Some systems only allow extremely simple computations to be performed at link/load time, while others allow for more complicated computations. The simpler schemes may be more efficient when they're workable, but their limitations may force workarounds. As an example, a routine which will be using BX to loop through a data structure with less than 256 bytes might be written as something like:
    mov bx,StartAddr
lp: mov al,[bx]
    ... do some computation
    inc bx
    cmp bl,<(StartAddr+Length) ; < prefix operator means "LSB of"
    jnz lp
It would be possible to use cmp bx,(StartAddr+Length), but if the compilation tools can support it, comparing just the low byte would be faster. On the other hand, some kinds of 16-bit assembly/linking tools might require that all address fixups be done with 16-bit addresses stored in code.
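To check that the low-byte comparison really is equivalent for short structures, here is a small simulation of the loop. It is a sketch of the control flow only (16-bit BX, test after the increment), and iterations is a hypothetical helper name:

```python
def iterations(start, length, low_byte_only):
    """Count passes through the loop until the end-of-data comparison succeeds."""
    end = (start + length) & 0xFFFF
    bx = start & 0xFFFF
    count = 0
    while True:
        count += 1                              # ... do some computation
        bx = (bx + 1) & 0xFFFF                  # inc bx
        if low_byte_only:
            done = (bx & 0xFF) == (end & 0xFF)  # cmp bl,<(StartAddr+Length)
        else:
            done = bx == end                    # cmp bx,(StartAddr+Length)
        if done:
            return count


if __name__ == "__main__":
    print(iterations(0x12F0, 200, True), iterations(0x12F0, 200, False))  # 200 200
    print(iterations(0x12F0, 300, True), iterations(0x12F0, 300, False))  # 44 300
```

With 200 bytes both forms stop after the same number of passes, even though the low byte wraps past FFh on the way. With 300 bytes the low-byte form stops early, which is why the trick is limited to small structures and why the assembler and linker must be able to express "low byte of a relocatable address" for it to be usable at all.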
Because different systems allow for different features in their object-code formats, they require different features in their assembly languages to control them. The instruction sets may be specified by the chip manufacturer, but features for expressing relocatable address computation generally are not.