Intro to Intel Assembly Language: Part 1

27 replies to this topic

#1
dargueta

    Writes binary right handed and hex left handed

  • Moderators
  • 4,705 posts
  • Programming Language:C, Java, C++, PHP, Python, Perl, Assembly, Bash, Others
  • Learning:JavaScript
Just a quick little tutorial on getting started writing assembly language for Intel and Intel-compatible (e.g. AMD) processors. This will cover IA-32 only for now.

The first thing you need to know about the processor is that you don't really have a lot to work with - you can't just create your own variables and use them at will. Fortunately, you're given some very fast little variables called registers. Some of them are general-purpose, and some are dedicated to performing a specific task. There are six 32-bit registers you can use for almost anything, as well as two more (ESP and EBP) that are dedicated to use on the stack (I'll get to that later):

EAX, EBX, ECX, EDX, ESI, EDI

To retain backwards compatibility with 16-bit predecessors, the above 32-bit registers have the low 16 bits aliased to these registers:

AX, BX, CX, DX, SI, DI

If you were to store 0xCAFEBABE* in EAX, then the value of AX would be 0xBABE. Changing AX to 0xDEAD means that EAX now contains the value 0xCAFEDEAD.
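To make the aliasing concrete, here is a minimal C++ sketch that models AX as the low 16 bits of a 32-bit value using masks. The helper names read_ax and write_ax are made up for this illustration; they are not real instructions or library functions.

```cpp
#include <cstdint>

// Hypothetical helper: the value you would see in AX for a given EAX.
constexpr std::uint16_t read_ax(std::uint32_t eax) {
    // Keep only the low 16 bits.
    return static_cast<std::uint16_t>(eax & 0xFFFFu);
}

// Hypothetical helper: the value of EAX after storing `value` in AX.
constexpr std::uint32_t write_ax(std::uint32_t eax, std::uint16_t value) {
    // The high 16 bits are untouched; only the low 16 bits are replaced.
    return (eax & 0xFFFF0000u) | value;
}
```

With these, read_ax(0xCAFEBABE) gives 0xBABE and write_ax(0xCAFEBABE, 0xDEAD) gives 0xCAFEDEAD, matching the example above.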

The registers AX, BX, CX, and DX are further subdivided into two registers each. So AX is composed of AH and AL, aliased to the high and low bytes of the word respectively. (E)SI and (E)DI don't do this because they're intended for use as pointer registers, but you can use them for anything, really.

Perhaps an example in C++ would help:
#include <cstdint>

uint32_t eax;

//AX refers to the low word of EAX. Since Intel processors
//are little-endian, the low word sits at the lowest address,
//so we don't have to do any pointer arithmetic.
//(Illustration only: these casts ignore C++ aliasing rules.)
uint16_t& ax = *(uint16_t *)&eax;

//same for AL
uint8_t& al = *(uint8_t *)&ax;

//AH refers to the HIGH byte, so we have to add one
//byte to get the proper offset into AX.
uint8_t& ah = *((uint8_t *)&ax + 1);
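The same high/low byte split can also be sketched with shifts and masks instead of pointers, which avoids depending on byte order. As before, read_al, read_ah and write_ah are hypothetical helper names used only for this illustration.

```cpp
#include <cstdint>

// Hypothetical helper: AL is the low byte of AX.
constexpr std::uint8_t read_al(std::uint16_t ax) {
    return static_cast<std::uint8_t>(ax & 0x00FFu);
}

// Hypothetical helper: AH is the high byte of AX.
constexpr std::uint8_t read_ah(std::uint16_t ax) {
    return static_cast<std::uint8_t>(ax >> 8);
}

// Hypothetical helper: the value of AX after storing `value` in AH.
constexpr std::uint16_t write_ah(std::uint16_t ax, std::uint8_t value) {
    // Keep AL, replace the high byte.
    return static_cast<std::uint16_t>((ax & 0x00FFu) |
                                      (static_cast<unsigned>(value) << 8));
}
```

If AX holds 0xBABE, then read_al gives 0xBE, read_ah gives 0xBA, and write_ah(0xBABE, 0x12) gives 0x12BE.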
Well, this is all well and good, but how does one do anything to these registers? The most common instruction is mov. Take a look:

(By the way, a semicolon starts a comment, just like // in C++.)
;eax = 0xCAFEBABE
mov    eax, 0xCAFEBABE

;ecx = eax
mov    ecx, eax

;edx = ax
;WRONG - DIFFERENT SIZE
mov    edx, ax

;al = 0x0BAD
;WRONG - DIFFERENT SIZE (0x0BAD doesn't fit in the 8-bit AL)
mov    al, 0x0BAD
Unlike in higher-level programming languages, mov will not let you assign a smaller register to a larger one, or a larger one to a smaller one. The source and destination must be the same size.

You can get around this with the movzx and movsx instructions, which copy a smaller source into a larger destination. movzx (zero-extend) clears all the upper bits of the destination register, while movsx (sign-extend) fills them with copies of the source's sign bit.

unsigned long a;
unsigned short b = 5;
//movzx    a, b
//now a = 5.

signed long c;
signed short d = -4;
//movsx    c, d
//now c = -4.
//if we were to use movzx then c would equal 65,532 - oops.
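For anyone who wants to see the bit-level effect, here is a rough C++ sketch of what the two instructions do when widening 16 bits to 32. The function names movzx16 and movsx16 are invented for this example, and the code works on raw bit patterns rather than on real registers.

```cpp
#include <cstdint>

// Hypothetical model of "movzx r32, r16": upper 16 bits become zero.
constexpr std::uint32_t movzx16(std::uint16_t src) {
    return static_cast<std::uint32_t>(src);
}

// Hypothetical model of "movsx r32, r16": upper 16 bits become copies
// of the source's sign bit (bit 15).
constexpr std::uint32_t movsx16(std::uint16_t src) {
    return static_cast<std::uint32_t>(
        (src & 0x8000u) ? (0xFFFF0000u | src) : src);
}
```

The 16-bit pattern for -4 is 0xFFFC. movzx16(0xFFFC) yields 0x0000FFFC (65,532), while movsx16(0xFFFC) yields 0xFFFFFFFC, which is -4 when read as a signed 32-bit value.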
Right now you can't accomplish much like this, but later on I'll show you how to read and write from RAM, and do some arithmetic.

*0xCAFEBABE is the magic number for compiled Java class files.

Next In This Series
http://forum.codecal...e-part-2-a.html

Edited by dargueta, 30 November 2010 - 12:18 PM.

sudo rm -rf /

#2
Guest_Jordan_*

  • Guests
A nice, gentle introduction! Perfect. +rep

#3
dargueta

Thanks! I've got more coming soon. I think.

#4
WingedPanther

    A spammer's worst nightmare

  • Moderators
  • 16,822 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others
Nicely done. From the bits I've seen, assembly isn't hard, but you really don't get all the comforts you're used to in "high-level" languages like C :)
Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

#5
dargueta

No, you don't, but I like to ride on the edge anyway.

#6
WingedPanther

Speed and power. Gotta love it.

#7
dargueta

...premature hair loss and long nights in with a cup of Ramen...

Edited by dargueta, 30 November 2010 - 12:19 PM.


#8
BlaineSch

    Writes binary right handed and hex left handed

  • Members
  • 2,448 posts
I can't wait for the next one. Assembly is one of those things that has always been at the end of my to-do list.

Is Assembly different per processor?

#9
dargueta

Yes and no. The instruction set for the Intel 8086 was incredibly small, only about 256 instructions or so. The modern Intel instruction set has over 400. Changes in architecture also affect the language, e.g. adding new registers, new memory addressing modes, changing the behavior of instructions, etc. Because it's so highly reliant on the processor architecture, there can't be a single assembly language. Due to backwards compatibility and legacy emulation modes, you can write a program for an 80386 and it'll (probably) run on a Pentium 4 with no problems (disregarding the operating system, of course). The language is also typically compatible across a generation (sometimes even more), so you can write a program for a Xeon and expect it to run on a Celeron unmodified. The differences, if any, are usually minor, and usually just involve adding an instruction or three.

#10
ArekBulski

    Speaks fluent binary

  • Members
  • 1,376 posts
I would dare to say that assigning variables is the most basic and important statement you can write in C#. Therefore the fact that your tutorial is based on mov instructions seems to be... a very good idea. ;) +rep

Do you think that writing most desktop applications and even operating systems will become a good idea? .NET and Mono assemblies are always compiled into native code eventually, with the highest (best) instruction set available.

#11
dargueta


Quote

Do you think that writing most desktop applications and even operating systems will become a good idea?

In assembly language? Dear God, no. You should use it for writing device drivers, self-modifying code, viruses, BIOS routines, code for embedded systems, and things that require high speed or have severe memory constraints. Writing a modern operating system in assembly language would be a daunting task, but it has been done.

MenuetOS

#12
ArekBulski

Uh-oh, sorry, I have eaten a word. :w00t:

Do you think that writing most desktop applications in managed code and even operating systems will become a good idea? .NET and Mono assemblies are always compiled into native code eventually, with the highest (best) instruction set available.

Managed code is therefore versatile while still offering the best performance, as long as memory constraints are not as limiting as in a wristwatch. :lol:



