Just a quick little tutorial on getting started writing assembly language for Intel and Intel-compatible (e.g. AMD) processors. This will cover IA-32 only for now.
First thing you need to know about the processor is that you don't really have a lot to work with - you can't just create your own variables and use them at will. Fortunately, you're given some very fast little variables called registers. Some of them are general-purpose, some are dedicated to performing a specific task. There are six 32-bit registers you can use for almost anything, as well as two more that are dedicated to use on the stack (I'll get to that later):
EAX, EBX, ECX, EDX, ESI, EDI
To retain backwards compatibility with 16-bit predecessors, the above 32-bit registers have the low 16 bits aliased to these registers:
AX, BX, CX, DX, SI, DI
If you were to store 0xCAFEBABE* in EAX, then the value of AX would be 0xBABE. Changing AX to 0xDEAD means that EAX now contains the value 0xCAFEDEAD.
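If you want to check that arithmetic without an assembler, here is a rough C++ sketch that treats EAX as a plain 32-bit integer and gets at AX with masks. The helper names read_ax and write_ax are made up for illustration, and this only models the aliasing; it is not how the processor implements it.

```cpp
#include <cstdint>

// Hypothetical helpers (not part of the tutorial): EAX is modelled as a
// plain 32-bit value and AX as its low 16 bits.

// Read AX: keep only the low 16 bits of EAX.
constexpr std::uint16_t read_ax(std::uint32_t eax) {
    return static_cast<std::uint16_t>(eax & 0xFFFFu);
}

// Write AX: replace the low 16 bits and leave the high 16 bits untouched.
constexpr std::uint32_t write_ax(std::uint32_t eax, std::uint16_t ax) {
    return (eax & 0xFFFF0000u) | ax;
}

// Compile-time checks of the two claims in the paragraph above.
static_assert(read_ax(0xCAFEBABEu) == 0xBABE,
              "AX is the low word of EAX");
static_assert(write_ax(0xCAFEBABEu, 0xDEAD) == 0xCAFEDEADu,
              "writing AX leaves the high word of EAX alone");
```

The static_asserts mean the file only compiles if both values come out as described, so there is nothing to run; paste it above any main() to try other values.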
The registers AX, BX, CX, and DX are further subdivided into two registers each. So AX is composed of AH and AL, aliased to the high and low bytes of the word respectively. (E)SI and (E)DI don't do this because they're intended for use as pointer registers, but you can use them for anything, really.
Perhaps an example in C++ would help:
Code:
uint32_t eax;
//AX refers to the low word of EAX. Since Intel processors
//are little-endian, we don't have to do any pointer
//arithmetic.
uint16_t& ax = *(uint16_t *)&eax;
//same for AL
uint8_t& al = *(uint8_t *)&ax;
//AH refers to the HIGH byte, so we have to add one
//byte to get the proper offset into AX.
uint8_t& ah = *((uint8_t *)&ax + 1);
Well, this is all well and good, but how does one do anything to these registers? The most common instruction is mov. Take a look:
Code:
;eax = 0xCAFEBABE
mov eax, 0xCAFEBABE
;ecx = eax
mov ecx, eax
;edx = ax
;WRONG - DIFFERENT SIZE
mov edx, ax
;al = 0x0BAD
;WRONG - DIFFERENT SIZE
mov al, 0x0BAD
(By the way, a semicolon starts a comment, just like // in C++.)
Unlike in higher-level programming languages, mov won't let you assign a small register to a larger one, or a value that's too big to a smaller one. The source and destination must be the same size.
There is a trick to getting around this by using the movzx and movsx instructions. movzx (zero-extend) clears all the upper bits in the target register, and movsx (sign-extend) fills them with copies of the source's sign bit.
Code:
unsigned long a;
unsigned short b = 5;
//movzx a, b
//now a = 5.
signed long c;
signed short d = -4;
//movsx c, d
//now c = -4.
//if we were to use movzx then c would equal 65,532 - oops.
Right now you can't accomplish much like this, but later on I'll show you how to read and write from RAM, and do some arithmetic.
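For anyone who wants to poke at the movzx/movsx difference before then, here is a small C++ sketch of what the two instructions do to a 16-bit source. The function names zero_extend and sign_extend are invented for this example, and it works on raw bit patterns rather than real registers, so treat it as an approximation of the behaviour and nothing more.

```cpp
#include <cstdint>

// Hypothetical stand-ins (not real instructions or library calls) for what
// movzx and movsx do when widening a 16-bit source to 32 bits.

// movzx: the upper 16 bits of the result are cleared.
constexpr std::uint32_t zero_extend(std::uint16_t src) {
    return static_cast<std::uint32_t>(src);
}

// movsx: the upper 16 bits are filled with copies of bit 15 (the sign bit).
constexpr std::uint32_t sign_extend(std::uint16_t src) {
    return (src & 0x8000u) ? (0xFFFF0000u | src)
                           : static_cast<std::uint32_t>(src);
}

// -4 stored in 16 bits has the bit pattern 0xFFFC.
static_assert(zero_extend(0xFFFC) == 65532u,
              "movzx on -4 gives 65,532");
static_assert(sign_extend(0xFFFC) == static_cast<std::uint32_t>(-4),
              "movsx on -4 gives the 32-bit pattern of -4");
// For a non-negative source the two agree.
static_assert(zero_extend(5) == 5u && sign_extend(5) == 5u,
              "positive values are unchanged either way");
```

As before, the checks run at compile time. Results are kept as unsigned bit patterns on purpose, so the comparison against -4 goes through an explicit cast instead of relying on signed overflow rules.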
*The magic number for compiled Java class files.
Next In This Series
Intro to Intel Assembly Language: Part 2
Last edited by dargueta; 11-30-2010 at 12:18 PM.
sudo rm -rf /
A nice, gentle introduction! Perfect. +rep
Thanks! I've got more coming soon. I think.
Nicely done. From the bits I've seen, assembly isn't hard, but you really don't get all the comforts you're used to in "high-level" languages like C.
No, you don't, but I like to ride on the edge anyway.
speed and power. Gotta love it.
...premature hair loss and long nights in with a cup of Ramen...
Last edited by dargueta; 11-30-2010 at 12:19 PM.
I can't wait for the next one. Assembly is one of those things that has always been at the end of my to-do list.
Is Assembly different per processor?
Yes and no. The instruction set for the Intel 8086 was incredibly small, only about 256 instructions or so. The modern Intel instruction set has over 400. Changes in architecture also affect the language, e.g. adding new registers, new memory addressing modes, changing the behavior of instructions, etc. Because it's so highly reliant on the processor architecture, there can't be a single assembly language. Due to backwards compatibility and legacy emulation modes, you can write a program for an 80386 and it'll (probably) run on a Pentium 4 with no problems (disregarding the operating system, of course). The language is also typically compatible across a generation (sometimes even more), so you can write a program for a Xeon and expect it to run on a Celeron unmodified. The differences, if any, are usually minor, and usually just involve adding an instruction or three.
I would dare to say that assigning variables is the most basic and important statement you can write in C#. Therefore the fact that your tutorial is based on mov instructions seems to be... a very good idea. +rep
Do you think that writing most desktop applications and even operating systems will become a good idea? .NET and Mono assemblies are always compiled into native code eventually, with the highest (best) instruction set available.