
Intro to Intel Assembly Language: Part 1

assembly

28 replies to this topic

#1 dargueta

dargueta

    I chown trolls.

  • Moderator
  • 4854 posts
  • Programming Language:C, Java, C++, PHP, Python, JavaScript, Perl, Assembly, Bash, Others
  • Learning:Objective-C

Posted 28 August 2009 - 05:13 PM

Just a quick little tutorial on getting started writing assembly language for Intel and Intel-compatible (e.g. AMD) processors. This will cover IA-32 only for now.

The first thing you need to know about the processor is that you don't have a lot to work with: you can't just create your own variables and use them at will. Fortunately, you're given some very fast little variables called registers. Some of them are general-purpose; others are dedicated to a specific task. There are six 32-bit registers you can use for almost anything, as well as two more, ESP and EBP, that are dedicated to use on the stack (I'll get to that later):

EAX, EBX, ECX, EDX, ESI, EDI

To retain backwards compatibility with 16-bit predecessors, the above 32-bit registers have the low 16 bits aliased to these registers:

AX, BX, CX, DX, SI, DI

If you were to store 0xCAFEBABE* in EAX, then the value of AX would be 0xBABE. Changing AX to 0xDEAD means that EAX now contains the value 0xCAFEDEAD.

The registers AX, BX, CX, and DX are further subdivided into two registers each. So AX is composed of AH and AL, aliased to the high and low bytes of the word respectively. (E)SI and (E)DI don't do this because they're intended for use as pointer registers, but you can use them for anything, really.

Perhaps an example in C++ would help:
#include <cstdint>

uint32_t eax;

//AX refers to the low word of EAX. Since Intel processors
//are little-endian, the low word sits at the lowest address,
//so we don't have to do any pointer arithmetic.
uint16_t& ax = *(uint16_t *)&eax;

//same for AL
uint8_t& al = *(uint8_t *)&ax;

//AH refers to the HIGH byte, so we have to add one
//byte to get the proper offset into AX.
uint8_t& ah = *(((uint8_t *)&ax) + 1);
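
The same aliasing can also be checked numerically with shifts and masks instead of pointer casts. This is only a sketch: the `Reg32` struct and its member names are made up for illustration and are not part of any real API. It walks through the 0xCAFEBABE example from above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one general-purpose register (illustration only).
struct Reg32 {
    uint32_t e = 0;  // the full 32-bit value, e.g. EAX

    // Reads: each sub-register is just a slice of the same 32 bits.
    uint16_t x() const { return static_cast<uint16_t>(e & 0xFFFFu); }      // AX
    uint8_t  l() const { return static_cast<uint8_t>(e & 0xFFu); }         // AL
    uint8_t  h() const { return static_cast<uint8_t>((e >> 8) & 0xFFu); }  // AH

    // Writes: only the bits belonging to the sub-register change.
    void set_x(uint16_t v) { e = (e & 0xFFFF0000u) | v; }
    void set_l(uint8_t v)  { e = (e & 0xFFFFFF00u) | v; }
    void set_h(uint8_t v)  { e = (e & 0xFFFF00FFu) | (static_cast<uint32_t>(v) << 8); }
};

// Repeats the example from the post.
void demo_aliasing() {
    Reg32 eax;
    eax.e = 0xCAFEBABEu;
    assert(eax.x() == 0xBABE);  // AX is the low word
    assert(eax.h() == 0xBA);    // AH is the high byte of AX
    assert(eax.l() == 0xBE);    // AL is the low byte of AX

    eax.set_x(0xDEAD);          // like: mov ax, 0xDEAD
    assert(eax.e == 0xCAFEDEADu);
}
```

Calling `demo_aliasing()` from `main` runs the checks; the upper 16 bits (0xCAFE) are untouched by the write to AX.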
Well, this is all well and good, but how does one do anything with these registers? The most common instruction is mov, which copies its second operand (the source) into its first (the destination). Take a look:

(By the way, a semicolon starts a comment, just like // in C++.)
;eax = 0xCAFEBABE
mov    eax, 0xCAFEBABE

;ecx = eax
mov    ecx, eax

;edx = ax
;WRONG - DIFFERENT SIZE
mov    edx, ax

;al = 0x0BAD
;WRONG - DIFFERENT SIZE (0x0BAD doesn't fit in 8 bits)
mov    al, 0x0BAD
Unlike in higher-level programming languages, you are not allowed to assign a smaller register to a larger one with mov, or to load a value that doesn't fit in the destination. The source and destination must be the same size.

You can get around this with the movzx and movsx instructions. movzx (move with zero-extend) fills the upper bits of the destination register with zeros, and movsx (move with sign-extend) fills them with copies of the source's sign bit.

unsigned long a;
unsigned short b = 5;
//movzx    a, b
//now a = 5.

signed long c;
signed short d = -4;
//movsx    c, d
//now c = -4.
//if we were to use movzx then c would equal 65,532 - oops.
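
To see where the 65,532 comes from, here is a rough C++ model of what the two instructions do to the bits when widening from 16 to 32 bits. The function names `zero_extend16` and `sign_extend16` are hypothetical helpers written for this sketch, not real intrinsics.

```cpp
#include <cassert>
#include <cstdint>

// Models movzx r32, r16: the upper 16 bits of the result are zero.
uint32_t zero_extend16(uint16_t v) {
    return static_cast<uint32_t>(v);
}

// Models movsx r32, r16: the upper 16 bits are copies of bit 15 (the sign bit).
uint32_t sign_extend16(uint16_t v) {
    uint32_t wide = v;
    if (v & 0x8000u) {
        wide |= 0xFFFF0000u;
    }
    return wide;
}

void demo_extension() {
    const uint16_t d = 0xFFFC;  // bit pattern of -4 in 16-bit two's complement

    // movzx treats the bits as unsigned: 0x0000FFFC = 65,532
    assert(zero_extend16(d) == 65532u);

    // movsx keeps the value -4: 0xFFFFFFFC
    assert(sign_extend16(d) == 0xFFFFFFFCu);
    assert(sign_extend16(d) == static_cast<uint32_t>(-4));

    // For non-negative values the two agree.
    assert(zero_extend16(5) == 5u);
    assert(sign_extend16(5) == 5u);
}
```

The two only differ when the sign bit of the source is set, which is why picking the wrong one goes unnoticed until a negative number shows up.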
Right now you can't accomplish much with just this, but later on I'll show you how to read from and write to RAM, and do some arithmetic.

*The magic number for compiled Java class files.

Next In This Series
http://forum.codecal...e-part-2-a.html

Edited by dargueta, 30 November 2010 - 12:18 PM.

  • 3

sudo rm -rf / && echo $'Sanitize your inputs!'


#2 Guest_Jordan_*

Guest_Jordan_*
  • Guest

Posted 28 August 2009 - 05:15 PM

A nice, gentle introduction! Perfect. +rep
  • 0

#3 dargueta

dargueta

Posted 28 August 2009 - 05:19 PM

Thanks! I've got more coming soon. I think.
  • 0


#4 WingedPanther73

WingedPanther73

    A spammer's worst nightmare

  • Moderator
  • 17757 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others

Posted 28 August 2009 - 06:19 PM

Nicely done. From the bits I've seen, assembly isn't hard, but you really don't get all the comforts you're used to in "high-level" languages like C :)
  • 0

Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

My MineCraft server site: http://banishedwings.enjin.com/


#5 dargueta

dargueta

Posted 28 August 2009 - 06:21 PM

No, you don't, but I like to ride on the edge anyway.
  • 0


#6 WingedPanther73

WingedPanther73

Posted 28 August 2009 - 06:39 PM

Speed and power. Gotta love it.
  • 0


#7 dargueta

dargueta

Posted 28 August 2009 - 06:40 PM

...premature hair loss and long nights in with a cup of Ramen...

Edited by dargueta, 30 November 2010 - 12:19 PM.

  • 0


#8 BlaineSch

BlaineSch

    CC Leader

  • Expert Member
  • PipPipPipPipPipPipPip
  • 1559 posts

Posted 28 August 2009 - 07:00 PM

I can't wait for the next one. Assembly is one of those things that has always been at the end of my to-do list.

Is Assembly different per processor?
  • 0

#9 dargueta

dargueta

Posted 28 August 2009 - 07:07 PM

Yes and no. The instruction set for the Intel 8086 was incredibly small, only about 256 instructions or so. The modern Intel instruction set has over 400. Changes in architecture also affect the language, e.g. adding new registers, adding new memory addressing modes, changing the behavior of instructions, etc. Because it's so highly reliant on the processor architecture, there can't be a single assembly language. Thanks to backwards compatibility and legacy emulation modes, you can write a program for an 80386 and it'll (probably) run on a Pentium 4 with no problems (disregarding the operating system, of course). The language is also typically compatible across a generation (sometimes even more), so you can write a program for a Xeon and expect it to run on a Celeron unmodified. The differences, if any, are usually minor, and usually just involve adding an instruction or three.
  • 0


#10 ArekBulski

ArekBulski

    CC Devotee

  • Senior Member
  • PipPipPipPipPipPip
  • 480 posts

Posted 28 August 2009 - 08:48 PM

I would dare to say that assigning variables is the most basic and important statement you can write in C#. Therefore the fact that your tutorial is based on mov instructions seems to be... a very good idea. ;) +rep

Do you think that writing most desktop applications and even operating systems will become a good idea? .NET and Mono assemblies are always compiled into native code eventually, with the highest (best) instruction set available.
  • 0

#11 dargueta

dargueta

Posted 28 August 2009 - 09:56 PM

Do you think that writing most desktop applications and even operating systems will become a good idea?


In assembly language? Dear God, no. You should use it for writing device drivers, self-modifying code, viruses, BIOS routines, code for embedded systems, and things that require high speed or have severe memory constraints. Writing a modern operating system in assembly language would be a daunting task, but it has been done.

MenuetOS
  • 0


#12 ArekBulski

ArekBulski

Posted 28 August 2009 - 10:09 PM

Uh-oh, sorry, I have eaten a word. :w00t:

Do you think that writing most desktop applications in managed code and even operating systems will become a good idea? .NET and Mono assemblies are always compiled into native code eventually, with the highest (best) instruction set available.

Therefore managed code is versatile while also offering the best performance, as long as memory constraints are not as limiting as in a wristwatch. :lol:
  • 0




