Question regarding Assembly and computer programs
I read this article: http://en.wikipedia.org/wiki/Assembly_language
It says:
Take, for example, the instruction that tells an x86/IA-32 processor to move an immediate 8-bit value into a register. The binary code for this instruction is 10110 followed by a 3-bit identifier for which register to use. The identifier for the AL register is 000, so the following machine code loads the AL register with the data 01100001.[4]
10110000 01100001
It explains how it is easier to write it as:
MOV AL, 61h ; Load AL with 97 decimal (61 hex)
Now here are my question(s).
So, computer programs/executables are just binary data (0's and 1's)?
When viewed with a disassembler like OllyDbg it just tries to revert those 0's and 1's back to some Assembly (Intel?) language and the output is mostly correct?
If I have this 10110000 01100001 program on my SSD and I write a C#/PHP/wtvr application that reads the contents of the file and output them as bits, will I see these exact 10110000 01100001 figures?
How does the operating system do the actual "execution"? How does it tell the processor that "hey, take these bits and run them"? Can I do that in C#/C++ directly?
So, computer programs/executables are just binary data (0's and 1's)?
Yes, just like images, videos and other data.
When viewed with a disassembler like OllyDbg it just tries to revert those 0's and 1's back to some Assembly (Intel?) language and the output is mostly correct?
Yes, in this exact case it will always be correct, as mov al, 61h is always assembled to 0xB0 0x61 (usually written as B0 61 in the Intel 64 and IA-32 Architectures Software Developer's Manuals and elsewhere) in 16-, 32- and 64-bit mode. Note that 0xB0 0x61 = 0b10110000 0b01100001.
You can find the encoding for different instructions in Volume 2A. For example here it is "B0+ rb MOV r8, imm8 E Valid Valid Move imm8 to r8." on page 3-644.
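To make the "B0+ rb" encoding concrete, here is a minimal Python sketch that decodes just this one instruction form. The function name decode_mov_r8_imm8 and the REG8 table are made up for illustration, and the sketch assumes there is no REX prefix; it is not a general disassembler.

```python
# 8-bit register names selected by the 3-bit identifier (no REX prefix assumed).
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def decode_mov_r8_imm8(code: bytes) -> str:
    """Decode a two-byte 'B0+rb ib' instruction (MOV r8, imm8)."""
    if len(code) != 2:
        raise ValueError("expected exactly two bytes")
    opcode, imm = code
    # The top five bits must be 10110; the low three bits pick the register.
    if opcode >> 3 != 0b10110:
        raise ValueError("not a MOV r8, imm8 opcode")
    return f"mov {REG8[opcode & 0b111]}, {imm:02X}h"


print(decode_mov_r8_imm8(bytes([0b10110000, 0b01100001])))  # mov al, 61h
```

Changing the low three bits of the first byte selects another register, for example B3 05 decodes to mov bl, 05h.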
Other instructions have different meanings depending on whether they are interpreted in 16-, 32- or 64-bit mode. Consider this short sequence of bytes: 66 83 C0 04 41 80 C0 05
In 16-bit mode they mean:
00000000 6683C004 add eax,byte +0x4
00000004 41 inc cx
00000005 80C005 add al,0x5
In 32-bit mode they mean:
00000000 6683C004 add ax,byte +0x4
00000004 41 inc ecx
00000005 80C005 add al,0x5
And finally in 64-bit mode:
00000000 6683C004 add ax,byte +0x4
00000004 4180C005 add r8b,0x5
So the instructions cannot always be disassembled correctly without knowing the context (this is not even taking into account that other things than code can reside in the text segment and the code can do nasty stuff like generate code on the fly or self-modify).
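The mode dependence can be reproduced with a toy decoder. The Python sketch below is an illustration only: the function decode and its tables are hypothetical, and it understands nothing beyond the opcodes used in the example bytes (the 66 operand-size prefix, 40-47 as inc or as a REX prefix, and register-direct add with an 8-bit immediate).

```python
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG8_REX_B = ["r8b", "r9b", "r10b", "r11b", "r12b", "r13b", "r14b", "r15b"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode(code: bytes, mode: int) -> list[str]:
    """Decode the tiny opcode subset above for mode 16, 32 or 64."""
    out, i = [], 0
    while i < len(code):
        opsize32 = mode in (32, 64)      # default operand size for the mode
        rex_b = False
        if code[i] == 0x66:              # operand-size prefix flips the default
            opsize32 = not opsize32
            i += 1
        if mode == 64 and 0x40 <= code[i] <= 0x4F:
            rex_b = bool(code[i] & 1)    # in 64-bit mode 40-4F is a REX prefix
            i += 1
        op = code[i]
        if 0x40 <= op <= 0x47:           # inc r16/r32 (16/32-bit modes only)
            out.append(f"inc {(REG32 if opsize32 else REG16)[op & 7]}")
            i += 1
        elif op in (0x80, 0x83):         # group 1, imm8 operand
            modrm, imm = code[i + 1], code[i + 2]
            if modrm >> 6 != 3 or (modrm >> 3) & 7 != 0:
                raise ValueError("only register-direct ADD is supported")
            if op == 0x83 and rex_b:
                raise ValueError("REX with opcode 83 is not supported here")
            rm = modrm & 7
            if op == 0x80:
                reg = (REG8_REX_B if rex_b else REG8)[rm]
            else:
                reg = (REG32 if opsize32 else REG16)[rm]
            out.append(f"add {reg}, {imm:#x}")
            i += 3
        else:
            raise ValueError(f"unsupported opcode {op:#04x}")
    return out


code = bytes.fromhex("6683C0044180C005")
for mode in (16, 32, 64):
    print(mode, decode(code, mode))
```

The same eight bytes come out as three instructions in 16- and 32-bit mode (with the operand sizes swapped) and as two instructions in 64-bit mode, because 41 is consumed as a prefix there.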
If I have this 10110000 01100001 program on my SSD and I write a C#/PHP/wtvr application that reads the contents of the file and output them as bits, will I see these exact 10110000 01100001 figures?
Yes, in the sense that if the application contains the mov al, 61h instruction the file will contain the bytes 0xB0 and 0x61.
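This is easy to try. The Python sketch below writes those two bytes to a temporary file and prints the file contents as bits; the helper names file_as_bits and demo are made up for this example, and the file holds only the raw instruction bytes rather than a real executable.

```python
import os
import tempfile


def file_as_bits(path: str) -> str:
    """Return the file contents as space-separated 8-bit binary strings."""
    with open(path, "rb") as f:
        data = f.read()
    return " ".join(f"{byte:08b}" for byte in data)


def demo() -> str:
    """Write B0 61 to a temporary file and read it back as bits."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(bytes([0xB0, 0x61]))
        return file_as_bits(path)
    finally:
        os.remove(path)


print(demo())  # 10110000 01100001
```

Running file_as_bits on a real executable would show the same two bytes somewhere in the middle, surrounded by whatever headers and other data the file format requires.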
How does the operating system do the actual "execution"? How does it tell the processor that "hey, take these bits and run them"? Can I do that in C#/C++ directly?
After loading the code into memory (with the memory permissions set up correctly), it can just jump to or call it and have it run. One thing you have to realize is that even though the operating system is just another program, it is a special program, since it got to the processor first! It runs in a special supervisor (or hypervisor) mode that allows it to do things normal (user) programs aren't allowed to do, such as setting up preemptive multitasking, which makes sure running processes are interrupted regularly so that others get a turn.
The first processor is also responsible for waking up the other cores/processors on a multi-core/multi-processor machine. See this SO question.
Calling code you have loaded yourself directly in C++ (I don't think it is possible in C# without resorting to unsafe/native code) requires platform-specific tricks. For Windows you probably want to look at VirtualProtect, and under Linux mprotect(2). Or, perhaps more realistically, load the code from a file which is then mapped using either this process for Windows or mmap(2) for Linux.
That is a lot of questions:
Yes, computer programs/executables are just binary data 0/1s.
Yes, the disassembler tries to make sense of the 0/1s... and it uses additional knowledge about the file format (EXE usually follows the PE spec, COM is a different spec, etc.), the OS the binary is supposed to run on, the APIs available, and so on.
These two bytes (one instruction with a parameter) would read exactly like that... although it depends on the program they are part of - as mentioned, different file types follow different specifications.
Usually the OS loads the file and processes its content according to the specification - for example, it rearranges some memory areas. Then it marks the memory areas that contain executable code as executable and does a JMP or CALL to the address of the first instruction of the so-called entry-point (again, this differs depending on the file format / specification at hand).
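As a rough illustration of "processing the content according to the specification", the Python sketch below pulls the entry-point address out of a PE header. It relies on the standard PE layout (the offset of the PE signature is stored at 0x3C, followed by a 20-byte COFF header and an optional header whose AddressOfEntryPoint field sits at offset 16). The function names and the fake image are assumptions made for this example; a real loader does far more than this.

```python
import struct


def pe_entry_point_rva(image: bytes) -> int:
    """Return AddressOfEntryPoint (an RVA) from a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # The DOS header stores the file offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # Skip the 4-byte signature and the 20-byte COFF file header.
    optional_header = pe_offset + 4 + 20
    (entry_rva,) = struct.unpack_from("<I", image, optional_header + 16)
    return entry_rva


def make_fake_image(entry_rva: int) -> bytes:
    """Build a zero-filled buffer with only the fields read above set."""
    buf = bytearray(0x100)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)
    buf[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<I", buf, 0x80 + 4 + 20 + 16, entry_rva)
    return bytes(buf)


print(hex(pe_entry_point_rva(make_fake_image(0x1000))))  # 0x1000
```

The value is relative to the address the image is loaded at, so the loader adds the image base before jumping there.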
In C# you don't deal with assembly as a language but with "byte code" (IL instructions)... you can emit those or load those via Framework methods etc. In C++ you could deal directly with assembly if you really want to, but that is not portable and could get complicated... so you usually only do that when the gain is really worth it (like a needed performance boost by a factor of 10).
So, computer programs/executables are just binary data (0's and 1's)?
YES.
When viewed with a disassembler like OllyDbg it just tries to revert those 0's and 1's back to some Assembly (Intel?) language and the output is mostly correct?
YES. Except that if the binary data represents code for the CPU the disassembler is designed for, then the output will be totally correct, not just 'mostly' correct.
If I have this 10110000 01100001 program on my SSD and I write a C#/PHP/wtvr application that reads the contents of the file and output them as bits, will I see these exact 10110000 01100001 figures?
YES
How does the operating system do the actual "execution"? How does it tell the processor that "hey, take these bits and run them"?
The operating system is just a program like any other: it is instructions being executed on the processor. Simplistically, when the operating system executes the code, all it does is jump to the start address of where the code is situated, and hence the processor begins executing whatever code is at that location.
Can I do that in C#/C++ directly?
Don't forget that C is compiled down to machine code before it executes, and at the point it is executed it is no different from any other program that could run on a given CPU. Yes, you can use inline assembly, for example, to jump to a given memory location and execute the code there.