Thread: Intro to Intel Assembly Language: Part 2

  1. #1
    Join Date
    Oct 2007
    Location
    /dev/null
    Posts
    4,513
    Blog Entries
    8
    Rep Power
    59

    Intro to Intel Assembly Language: Part 2

    With my last tutorial I showed you how to move bytes around in the registers with the mov, movzx, and movsx instructions. That's great, but it doesn't do much, especially since you can't even touch memory. In this tutorial I'm going to teach you the basic addressing modes, how to use them, and when you can't. Warning: I'm going by the IA-32 architecture here. Most of this will apply to 64-bit x86 (x86-64, which Intel calls Intel 64; not to be confused with IA-64, which is Itanium) as well, but not everything. When I remember I'll point out the differences, but we won't hit the first for another tutorial or so. Don't even try this on a 16-bit processor; it'll explode.

    Another thing I forgot to mention: I'm using Intel syntax for this. AT&T syntax reverses the order of the operands and obfuscates addressing for the programmer who uses it, in order to make life easier for the clown who writes the compiler for it.

    Memory Models
    The modern Intel processor can be programmed to view memory in several different ways - two and a half, actually. Each of these memory models dictates an algorithm for how the processor accesses memory. The two and a half models are:

    Segmented (Real Mode)
    Designed to behave like the 16-bit 8086; this is known as real mode. (Virtual 8086 mode is a related but separate thing: it emulates this environment from inside protected mode.) All Intel processors start in real mode when they power up; it's the operating system's job to switch into protected mode later. (Hold on...I'll explain in a sec.) Without getting into too much detail, real mode is quite bare as far as modern processor features go - no memory protection, paging, multitasking, not even instruction privilege levels. There's no distinction between user code and operating system code here. What does this have to do with addressing? Well, the processor behaves like an 8086 in one key way: the way addresses are calculated.
    The 8086 view of memory is segmented, i.e. memory is divided into blocks of up to 64K each. (I say up to because the programmer could mess with it to make it smaller.) Because the 8086 was a 16-bit processor, this meant that a memory pointer would consist of a 16-bit segment and a 16-bit offset. Oops, the designers thought. Put together, that'd give us access to 4GB of memory, and the chip only has 20 address lines (1MB)! (Remember, this was a few years before Bill Gates allegedly made his infamous 640K comment.) So what did they decide to do? Scrunch the 32 bits into 20. How? Like this:

    ADDRESS = (SEGMENT << 4) + OFFSET

    Immediately we have problems here...this means that 0000:1000, 0001:0FF0, 0002:0FE0, ... 0100:0000 all point to the exact same byte (physical address 0x1000). In fact, a single byte can have up to 4096 aliases! No wonder that didn't last too long as the main addressing scheme.
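
The aliasing is easy to check with a few lines of C. This is only a sketch of the formula above: the function names are made up for illustration, and it ignores what happens when the sum overflows 20 bits.

```c
#include <stdint.h>

/* Real-mode translation: ADDRESS = (SEGMENT << 4) + OFFSET.
   Wraparound past the 20th address bit is not modeled here. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Count how many segment:offset pairs resolve to one physical address
   by trying every possible segment value. */
unsigned count_aliases(uint32_t physical)
{
    unsigned count = 0;
    for (uint32_t segment = 0; segment <= 0xFFFFu; segment++) {
        uint32_t base = segment << 4;
        /* The segment works if the leftover distance fits in a 16-bit offset. */
        if (base <= physical && physical - base <= 0xFFFFu)
            count++;
    }
    return count;
}
```

Calling real_mode_address(0x0000, 0x1000), real_mode_address(0x0001, 0x0FF0) and real_mode_address(0x0002, 0x0FE0) gives the same value each time, and count_aliases() tells you how crowded a particular byte is.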

    Flat Model
    In this mode, memory is treated like one honking big array of bytes. No segments, no shifting, no duplicate addresses - just pure and simple. It requires some more work on the part of the operating system and the processor designers to get this to work on a multitasked system, though, because left as-is, application A could easily overwrite application B, maliciously or accidentally.

    Paged Model (Protected Mode)
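    (Told you I'd explain.) Protected mode is halfway between the flat model and the segmented model. Memory is still segmented, but the segment value (called a selector, still 16 bits even today) merely picks an entry in a descriptor table. That entry contains the segment's base address in memory and its limit, as well as protection bits describing what is contained in the segment (code or data), access rights (read, write, execute) and the privileges needed to access the segment. Paging, when it's turned on, is a separate layer that maps the resulting address onto physical memory; on current hardware DEP lives there, as a no-execute bit in the page tables, rather than in the segment itself. For example: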

    49C0:8598E31F --> processor looks up the descriptor table entry selected by 49C0 and finds:
    PRIVILEGE: 3 (this is a user-level segment)
    ACCESS: RW
    TYPE: DATA
    BASE: 0x2138
    LIMIT: 0x8FFFFFFF (highest valid offset)

    The offset is first checked against the limit; then the address is simply the base plus the offset, 0x00002138 + 0x8598E31F = 0x85990457. Notice that the shifting is gone: the descriptor holds a full 32-bit base, so nothing has to be scrunched. The result is still a 32-bit address, though, so segmentation by itself tops out at 4GB. Going past that, in theory up to 64GB, takes paging with PAE and an address bus wide enough to allow it. Even then, some operating systems won't be able to handle the extra memory because they use pure 32-bit pointers.
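
Here's a rough C sketch of that lookup, assuming the IA-32 rule that the descriptor's base is a full 32-bit address added directly to the offset. The struct is a simplified, hypothetical view (the real 8-byte descriptor scatters these fields around), and translate() and selector_index() are names invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical view of a segment descriptor. */
struct descriptor {
    uint32_t base;      /* address where the segment starts */
    uint32_t limit;     /* highest valid offset within the segment */
    uint8_t  privilege; /* 0 (most privileged) to 3 (user level) */
    bool     writable;
};

/* The upper 13 bits of a selector are the index into the descriptor table;
   the low 3 bits hold the table indicator and requested privilege level. */
unsigned selector_index(uint16_t selector)
{
    return selector >> 3;
}

/* Returns false where the processor would raise a fault instead. */
bool translate(const struct descriptor *d, uint32_t offset, uint32_t *address)
{
    if (offset > d->limit)
        return false;            /* offset is past the end of the segment */
    *address = d->base + offset; /* no shifting; wraps modulo 2^32 */
    return true;
}
```

A real processor also checks the privilege and access bits before letting the access through; this sketch only does the limit check and the addition.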

    Accessing Memory
    The IA32 architecture supports about eight distinct addressing modes. As you've probably guessed, these are different ways of accessing memory. You can choose to use one or another depending on what best suits your application.

    Direct Addressing
    16-bit? Yes (16-bit addresses only) | 32-bit? Yes | 64-bit? Yes (with restrictions on full 64-bit addresses)

    This is by far the simplest - moving data to and from a hard-coded address. (Because the address is baked into the instruction the way an immediate value is, you'll sometimes see this loosely called immediate addressing; Intel's manuals call the hard-coded address a displacement.) Hard-coded addresses are typically found in BIOS interrupt routines and firmware, where code and data can be relied upon to be where they need to be. For this kind of access, the default segment is specified by the DS register unless explicitly specified otherwise. This is important to remember in real and protected mode, as it can mean the difference between your application working or overwriting something it shouldn't and crashing.

    Code:
    ;  *((uint32_t *)0xDEADBEEF) = eax;
    mov    [0xDEADBEEF], eax
    
    ;  dx = *((uint16_t *)0x8003C58E);
    ; note that we're overriding the default DS with the ES segment register here.
    mov    dx, es:[0x8003C58E]
    
    ;  *((uint16_t *) 0x01234567) = 1337;
    mov    WORD PTR [0x01234567], 1337
    
    ;  1337 = *((uint16_t *) 0x76543210);
    ; don't even think of trying that; a constant can't be a destination.
    Did you notice something strange with the third example? What's with the WORD PTR stuff? This tells the assembler that we intend to represent 1337 in 16 bits, as opposed to 32, 64, or 80 bits. (Yes, 80. I'll get to that much later.) I didn't have to do this in either of the first two examples because the assembler knows the sizes of the registers; since the mov instruction requires operands to be the same size, it can figure everything out. But 1337 could be 0x0539, 0x00000539, or 0x0000000000000539, for all it knows. Hence we tell it that we want 16 bits. We just as easily could've put:

    Code:
    mov    DWORD PTR [0x01234567], 1337
    This would force 1337 to be represented as a 32-bit integer. The size directives are:

    BYTE PTR (8 bits)
    WORD PTR (16 bits)
    DWORD PTR (32 bits)
    QWORD PTR (64 bits)
    TBYTE PTR (80 bits - only used in floating-point code.)
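
To see why the size matters, here's a small C sketch of what a sized store does to memory. store_sized() is a made-up helper, not anything the processor or assembler provides; it just writes the low bytes of a value least-significant byte first, which is the order x86 uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the low `size` bytes (1 to 8) of `value` into mem starting at addr,
   least significant byte first. */
void store_sized(uint8_t *mem, size_t addr, uint64_t value, size_t size)
{
    for (size_t i = 0; i < size; i++)
        mem[addr + i] = (uint8_t)(value >> (8 * i));
}
```

With size 2 (WORD PTR), storing 1337 writes the bytes 0x39 0x05 and leaves everything after them alone. With size 4 (DWORD PTR), the same value writes 0x39 0x05 0x00 0x00, clobbering two more bytes. Same number, different footprint.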

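    *Belated side note: Instruction mnemonics and register names are case-insensitive, i.e. mov, Mov, and MOV are all the same thing. Labels and other symbols are another story; whether those are case-sensitive depends on your assembler. Also, the PTR keyword and the es:[...] form are MASM-style; NASM, for one, writes word [0x01234567] and [es:0x8003C58E] instead.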

    Register-Indirect Mode
    16-bit? Somewhat (Fewer registers than on IA-32/x86-64) | 32-bit? Yes | 64-bit? Yes

    Register-indirect mode is like using a pointer variable in C/C++; the register serves as the offset, and the corresponding default segment register is the segment, unless overridden. The default segments are:

    EAX,EBX,ECX,EDX,ESI,EDI --> default to DS
    EBP,ESP --> default to SS

    Note that on 16-bit processors, you are limited to only BX, BP, SI and DI. The default segment registers are still the same.

    Code:
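    ; eax = *((uint32_t *)((ss << 4) + ebx))
    ; (the << 4 is the real-mode rule; in protected mode it's the segment's base instead)
    mov    eax, ss:[ebx]
    
    ; *((uint8_t *)((ds << 4) + edi)) = 5
    mov    BYTE PTR [edi], 0x05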
    
    ;You can't do this!
    mov    WORD PTR [edi], [0x0001E185]
    
    ;Nor this...
    mov    WORD PTR [0x0001E185], [edi]
    
    ;And this is just as illegal.
    mov    DWORD PTR [eax], [edx]
    Wait...can't we copy data straight from one memory location to another? Unfortunately, no. Initially it was because of architecture limitations, and later because of the way the instructions are encoded: a single mov simply has no room for two memory operands. You have to copy from memory into a register, and then from that register back out to memory.

    Code:
    mov    eax, [esi]
    mov    [edi], eax
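
If it helps to think of it in C terms, the two-step copy looks roughly like the sketch below. It assumes memory is modeled as a plain byte array and the registers as ordinary variables holding offsets into it; load32(), store32() and copy_dword() are hypothetical helpers made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from the byte array at the given offset. */
uint32_t load32(const uint8_t *mem, uint32_t addr)
{
    uint32_t value;
    memcpy(&value, mem + addr, sizeof value);
    return value;
}

/* Write a 32-bit value into the byte array at the given offset. */
void store32(uint8_t *mem, uint32_t addr, uint32_t value)
{
    memcpy(mem + addr, &value, sizeof value);
}

/* Memory-to-memory copy, routed through a "register" the way mov requires. */
void copy_dword(uint8_t *mem, uint32_t esi, uint32_t edi)
{
    uint32_t eax = load32(mem, esi); /* mov eax, [esi] */
    store32(mem, edi, eax);          /* mov [edi], eax */
}
```

The local variable eax plays the part of the register: the value has to stop there on its way from one memory location to the other.
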
    I think this tutorial is long enough already (plus I'm sleepy). Next time I'll show you some more addressing modes and some new instructions to use. Right now you'll just have to content yourself with the fact that you can move bytes around in memory now.

    Next In This Series
    Intro to Intel Assembly Language: Part 3
    Last edited by dargueta; 11-30-2010 at 12:23 PM.
    sudo rm -rf /

  3. #2
    Jordan Guest

    Re: Intro to Intel Assembly Language: Part 2

    I am glad to see more assembly tutorials. I like the way you format your tutorials as well; they are very easy to read. +rep!

  4. #3

    Re: Intro to Intel Assembly Language: Part 2

    It'd be nice if I could count, though. I made like three arithmetic errors. (Fixed them, though).

  5. #4
    Join Date
    Jul 2006
    Posts
    16,491
    Blog Entries
    75
    Rep Power
    143

    Re: Intro to Intel Assembly Language: Part 2

    Nicely done. +rep
    Programming is a branch of mathematics.

  6. #5

    Re: Intro to Intel Assembly Language: Part 2

    Ooh, thanks...16 more rep and 145 more posts and I'm a Code Warrior!

  7. #6
    Join Date
    Mar 2009
    Posts
    1,375
    Rep Power
    24

    Re: Intro to Intel Assembly Language: Part 2

    Then take mine. Nice tutorial, and one of the few out there, too. +rep

  8. #7

    Re: Intro to Intel Assembly Language: Part 2

    Well thank you, ArekBulski. I'm going to continue this series until I tell everything I know or Jordan begs me to stop.

  9. #8
    Jordan Guest

    Re: Intro to Intel Assembly Language: Part 2

    That will never happen (probably for both but specifically "Jordan begs me to stop"). On with the tutorials!

  10. #9

    Re: Intro to Intel Assembly Language: Part 2

    Good, then. I'll get to work.

  11. #10
    Jordan Guest

    Re: Intro to Intel Assembly Language: Part 2

    Great! I've a lot to learn from you and I'm sure others do as well.
