
Assembly, Local Variables and Functions (Win32, NASM)


#1 RhetoricalRuvim


Posted 13 August 2011 - 10:40 PM

In my previous tutorial we went over how to use variables, call Win32 API functions, and some other stuff. But that's not enough to make bigger and more advanced programs. After the introduction to assembly language, I decided that the best topic for the next tutorial will be local variables and functions.

Note: If you're not familiar with the concepts in my previous tutorial, I recommend reading that before you continue.

Let's start out by looking at a piece of C++ code:
#include <iostream> 

using namespace std; 

int main(int argc, char * argv[]){ 
    char yourname[512]; 
    char howyoudo[512]; 
    int somenumber; 
    cout<<"Hello, please enter your name: \r\n"; 
    cin>>yourname; 
    cout<<"Hello "<<yourname<<", how are you doing today? \r\n"; 
    cin>>howyoudo; 
    cout<<"Anyway, I've got to go, now. \r\nSee you later! \r\n"; 
    for (somenumber= 0; somenumber < 10; somenumber++) cin.get(); 
}
The part we want to focus on is the function and the local variables.

A local variable is a variable that is stored on the stack.
A variable that's stored in a program's data or bss section is called a global variable.

Looking at the above C++ function, what's to be noticed concerning variables?

One thing is that the variables are defined within the function.
The other thing is that there are variables defined inside the parentheses, right after the function name; those variables are called arguments.

A Basic Function
Let's see how a basic function works:
call my_function 

;; some more code...  

my_function: 
;; .....  
ret
In the above code, CALL pushes the return address (the EIP value that points to the instruction right after the CALL) onto the stack and transfers control to my_function. Then, upon the RET instruction, the top 4 bytes are popped from the stack into the EIP register.
In other words, the code at my_function is called, and then the processor returns back to the code that called the function, when the RET instruction is executed.

Calling Conventions
(Note that arguments can also be called parameters.)

First of all, to call a function you push all the arguments onto the stack in reverse order, so the last argument is pushed first and the first argument is pushed last.

But wouldn't the stack pointer change if you push the arguments? It does, which is why either you or the function you called has to change the stack pointer back to how it was before.

How do we do that? It depends on the calling convention used. The calling convention is the method of calling functions.

Standard Calling Convention
The standard calling convention exploits the fact that the RET instruction can take an optional operand.



RET - Return
The return instruction has one, optional, operand.

Format:
RET n

Action:
POP EIP
ESP= ESP + n

If you provide an operand, n, then n will be added to the ESP register after the return address has been popped, to clean up the stack.

Otherwise only the 4-byte return address is removed from the stack, and any arguments stay where they were.

Note that n does NOT include the 4 bytes used for the return instruction pointer.


So you would use this method for calling a function:
push dword 0 
call my_function 

;; some more code...  

my_function: 
;; .....  
ret 4

Besides the standard calling convention, there's also the C calling convention.

C Calling Convention
In the C calling convention, the caller has to clean up the stack. This convention is useful when there is a variable number of arguments, because only the caller knows how many arguments it actually pushed.

So to call a function with the C calling convention, you would do something like this:
push dword 0 
call my_function 
add esp, 4 

;; some more code...  

my_function: 
;; .....  
ret
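
To double-check the bookkeeping of the two conventions, here is a rough C++ sketch of a toy stack. It is not real x86: `ToyStack`, `esp_after_stdcall`, `esp_after_cdecl`, and the return address 0x1000 are made-up names and values for illustration, and the jump itself is not modeled. The point is only that both conventions leave ESP where it started.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of a 32-bit stack: ESP is a byte offset into `mem`,
// and every slot is 4 bytes wide. Not real x86, just the bookkeeping.
struct ToyStack {
    std::vector<std::uint32_t> mem = std::vector<std::uint32_t>(64, 0);
    std::uint32_t esp = 64 * 4;  // the stack grows toward lower addresses

    void push(std::uint32_t value) {
        assert(esp >= 4);        // guard against running off the toy stack
        esp -= 4;
        mem[esp / 4] = value;
    }
    std::uint32_t pop() {
        std::uint32_t value = mem[esp / 4];
        esp += 4;
        return value;
    }

    // CALL pushes the return address; the jump itself is not modeled.
    void call(std::uint32_t return_address) { push(return_address); }

    // RET n pops the return address, then discards n more bytes.
    std::uint32_t ret(std::uint32_t n = 0) {
        std::uint32_t return_address = pop();
        esp += n;
        return return_address;
    }
};

// Standard calling convention: the callee cleans up with RET 4.
std::uint32_t esp_after_stdcall(ToyStack s) {
    s.push(0);       // push dword 0
    s.call(0x1000);  // call my_function (0x1000 is a made-up return address)
    s.ret(4);        // ret 4
    return s.esp;
}

// C calling convention: plain RET, then the caller adds 4 to ESP.
std::uint32_t esp_after_cdecl(ToyStack s) {
    s.push(0);       // push dword 0
    s.call(0x1000);  // call my_function
    s.ret();         // ret
    s.esp += 4;      // add esp, 4
    return s.esp;
}
```

Passing a fresh `ToyStack` to either function should give back the same ESP value the stack started with; the only difference is who does the final adjustment.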

Now that we can call functions, let's learn how to access arguments.

Accessing Arguments
Most Win32 API functions use the standard calling convention.

I like the standard calling convention better, and will be using it most of the time.

One thing to remember when you're accessing arguments is, NEVER use the POP instruction to get an argument, unless you REALLY know what you're doing. Even then, you'd probably prefer the easier MOV syntax.

The following code returns, in EAX, the first argument:
my_function: 
mov eax, dword [esp+4] 
ret 4

Since the value at [esp] contains the return instruction pointer, it would be wise to not mistake it for the first argument.

After the caller pushes the argument to the stack, ESP points to the argument. But when the caller calls the function, EIP (which is 4 bytes in size) is also pushed to the stack, which means that ESP now points to the instruction pointer, so we use ESP+4 to skip over the instruction pointer to the first argument.

That method is fine for small functions, but there's a better method if you have a bigger function.


Setting Up A Stack Frame
You might be wondering what this EBP register is.

See if you can kind of understand from this code:
;; the prologue: 
push ebp     ;; save ebp 
mov ebp, esp     ;; now save esp 
;; note that, at this point, previous ebp = [ ebp ] 
;; (the saved value sits right where ebp points), so the 
;; epilogue can restore it using one or two instructions. 
sub esp, 4 
;; reserve 4 bytes on the stack for local variables. 


;; let's say we call another function and that function does this: 
;; the prologue: 
push ebp ;; save ebp 
mov ebp, esp ;; save esp 
sub esp, 16  ;; reserve 16 bytes on the stack for local variables 

;; some code ...  

;; the epilogue: 
mov esp, ebp  ;; restore esp 
pop ebp         ;; restore ebp 

;; then it's time for our first function to exit 
;; the epilogue: 
mov esp, ebp  ;; restore esp 
pop ebp         ;; restore ebp
Think about the above code for a minute.

When we push EBP, we save EBP. Then we save ESP, by putting it into EBP.
Later on, we retrieve ESP again from EBP, after which we can recover the previous EBP.

The prologue is the method of setting up a new stack frame.
The epilogue is the method of switching back to the previous stack frame.

I hope that, if you thought about the above piece of code, you understood how the EBP register is used.

Setting Up A Stack Frame - Another Method
Another method uses the ENTER and LEAVE instructions.

enter 16, 0
is the same as
push ebp
mov ebp, esp
sub esp, 16


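The second operand is the lexical nesting level (0 to 31), which is meant for languages with nested procedures; with a value of 0, ENTER does exactly what the three instructions above do. You can read more about it in the Intel Architecture Software Developer's Manual volume 1 (Document Download Page).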

leave
is the same as
mov esp, ebp
pop ebp
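
If the equivalence is hard to picture, the following C++ sketch models it with a toy register file. Everything here is hypothetical (`ToyCpu`, `frames_unwind_cleanly`, the starting value 0xAAAA for EBP), only the nesting-level-0 form of ENTER is modeled, and the CALL/RET pair between the two functions is left out to keep it short.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy register file plus stack memory; hypothetical names, 4-byte slots.
struct ToyCpu {
    std::vector<std::uint32_t> mem = std::vector<std::uint32_t>(64, 0);
    std::uint32_t esp = 64 * 4;
    std::uint32_t ebp = 0xAAAA;  // stands in for the caller's frame pointer

    void push(std::uint32_t value) {
        assert(esp >= 4);        // guard against running off the toy stack
        esp -= 4;
        mem[esp / 4] = value;
    }
    std::uint32_t pop() {
        std::uint32_t value = mem[esp / 4];
        esp += 4;
        return value;
    }

    // enter n, 0  ==  push ebp / mov ebp, esp / sub esp, n
    void enter(std::uint32_t n) { push(ebp); ebp = esp; esp -= n; }

    // leave  ==  mov esp, ebp / pop ebp
    void leave() { esp = ebp; ebp = pop(); }
};

// Sets up two nested frames (4 and 16 bytes of locals), tears them down
// again, and reports whether ESP and EBP return to their starting values.
bool frames_unwind_cleanly() {
    ToyCpu cpu;
    const std::uint32_t esp0 = cpu.esp;
    const std::uint32_t ebp0 = cpu.ebp;

    cpu.enter(4);                             // outer function's prologue
    const std::uint32_t outer_ebp = cpu.ebp;
    cpu.enter(16);                            // inner function's prologue
    cpu.leave();                              // inner function's epilogue
    const bool outer_restored = (cpu.ebp == outer_ebp);
    cpu.leave();                              // outer function's epilogue

    return outer_restored && cpu.esp == esp0 && cpu.ebp == ebp0;
}
```

In this model `frames_unwind_cleanly()` returns true: each LEAVE hands back exactly the EBP that the matching ENTER saved, no matter how many bytes of locals were reserved in between.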


However, this method is not necessarily faster than the other one: ENTER in particular is usually slower than the equivalent PUSH/MOV/SUB sequence, while LEAVE generally costs about the same as MOV/POP. I like this method better anyway, because it's simpler and easier to spot in the code.

So how do we use the stack frame?


Using A Stack Frame
Knowing how to set up a stack frame wouldn't help much if we don't know how to use one.

To use a stack frame, we need to use effective addressing, with the EBP register.

When our function is called, the return instruction pointer is pushed to the stack; that's 4 bytes. Then, when we set up the stack frame, the EBP register is pushed to the stack; that's 4 more bytes.
So now we have to skip over a total of 8 bytes, to reach the first argument; but this time we use the EBP register, instead of the ESP register, as in the last example.

Let's say, as in Win32 API functions, we have 4 bytes per argument.
five_argument_function: 
enter 0, 0 

;; Get the first argument. 
mov eax, dword [ebp+8] 

;; Get the second argument. 
mov eax, dword [ebp+12] 
;; Note that the address of the second argument is 
;; equal to the address of the first argument plus the 
;; size of the first argument. 
;; The address of the first argument is EBP+8 and the 
;; size of each argument, in this case, is 4, so 
;; EBP+8 + 4 is the same as EBP+12 

;; Get the third argument. 
mov eax, dword [ebp+16] 

;; Get the fourth argument. 
mov eax, dword [ebp+20] 

;; Get the fifth argument. 
mov eax, dword [ebp+24] 

leave 
ret 20

So, from the above example, we see that the address of the nth argument is equal to EBP+8 + ( (n - 1) * 4).

Also, there are 5 arguments, with each being 4 bytes, so we free 5 * 4 = 20 bytes with the RET instruction.
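
The two rules above are simple enough to write down as formulas. Here is a small C++ sketch that does so, assuming 4-byte arguments and a standard stack frame; the function names `arg_displacement` and `ret_operand` are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Displacement from EBP of the nth argument (n starts at 1), assuming
// 4-byte arguments: skip the saved EBP (4) and the return address (4).
// Calling this with n = 0 is meaningless and is not checked.
constexpr std::uint32_t arg_displacement(std::uint32_t n) {
    return 8 + (n - 1) * 4;
}

// Operand for RET in a standard-call function that takes `count`
// 4-byte arguments.
constexpr std::uint32_t ret_operand(std::uint32_t count) {
    return count * 4;
}

// These match the five_argument_function listing.
static_assert(arg_displacement(1) == 8, "first argument is at [ebp+8]");
static_assert(arg_displacement(2) == 12, "second argument is at [ebp+12]");
static_assert(arg_displacement(5) == 24, "fifth argument is at [ebp+24]");
static_assert(ret_operand(5) == 20, "five arguments need ret 20");
```

Because the checks are `static_assert`s, the compiler itself confirms that the numbers agree with the offsets used in five_argument_function.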

We've seen how to access function arguments; but we also need to be able to use local variables.


Local Variables
To reserve space on the stack for local variables, we need to somehow subtract the number of bytes to reserve from ESP.

One way is by using the SUB instruction. But there are other ways, too, just like there are other ways of defining global variables.

So let's look at this code:
my_function: 
push ebp 
mov ebp, esp 
push dword 65 

;; some more code...  

mov esp, ebp 
pop ebp 
ret 0
What we did there is we set up a new stack frame, reserved 4 bytes on the stack, and set our local variable to 65.

If you think about that code carefully, you would figure out that it's the same as this:
my_function: 
push ebp 
mov ebp, esp 
sub esp, 4 

mov dword [ebp-4], 65 

;; some more code...  

mov esp, ebp 
pop ebp 
ret 0
If we reserve n bytes of stack space, we can access memory addresses EBP-n through EBP-1, which are considered local variables. How we manage that stack space is our choice, which is another thing I like about assembly language.
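
For anyone who wants to convince themselves that the two versions really are the same, here is a toy C++ model of both. The names (`ToyFrame`, `local_via_push`, `local_via_sub`) are invented for this sketch, and the stack is just a small array, so treat it as an illustration rather than real x86.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stack frame: addresses are byte offsets into `mem`, 4 bytes per slot.
struct ToyFrame {
    std::vector<std::uint32_t> mem = std::vector<std::uint32_t>(16, 0);
    std::uint32_t esp = 16 * 4;
    std::uint32_t ebp = 0;

    void push(std::uint32_t value) {
        assert(esp >= 4);  // guard against running off the toy stack
        esp -= 4;
        mem[esp / 4] = value;
    }
    void store(std::uint32_t address, std::uint32_t value) { mem[address / 4] = value; }
    std::uint32_t load(std::uint32_t address) const { return mem[address / 4]; }

    // push ebp / mov ebp, esp
    void prologue() { push(ebp); ebp = esp; }
};

// First version: push dword 65
ToyFrame local_via_push() {
    ToyFrame f;
    f.prologue();
    f.push(65);
    return f;
}

// Second version: sub esp, 4 / mov dword [ebp-4], 65
ToyFrame local_via_sub() {
    ToyFrame f;
    f.prologue();
    f.esp -= 4;
    f.store(f.ebp - 4, 65);
    return f;
}
```

Both functions end with the same ESP, the same EBP, and the value 65 stored at EBP-4, which is all that "the same" means here.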

For more reference on how local variables and the stack work, refer to the Intel Architecture Software Developer's Manual, volume 1 (Document Download Page).


Win32 Structures
As mentioned earlier, it's your choice how you want to use local, and even global, variables. In assembly language, getting the correct result is really what matters.

When you make a new window, you need a window class. To make a new window class, you need to fill out a WNDCLASSEX structure (WNDCLASS will probably work too, but I never used that before). When we make a WNDCLASSEX structure, we first need to know where to put it. We can save it as a local variable.

The WNDCLASSEX structure is 48 bytes in size, so we would have to reserve at least 48 bytes on the stack, for local variables.

If the start of the WNDCLASSEX structure will be at [ebp-48], then the first entry of the structure would be [ebp-48], the second would be [ebp-44], the third [ebp-40], and so on.
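
As a sanity check on those numbers, here is a C++ sketch of the 32-bit layout. `WndClassEx32` is a stand-in written for this example, not the real definition from windows.h: the member names follow WNDCLASSEX, but every member is modeled as a plain 4-byte integer, on the assumption that handles and pointers are 4 bytes wide on Win32. `ebp_displacement` is likewise a made-up helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for the 32-bit WNDCLASSEX layout (illustration only).
struct WndClassEx32 {
    std::uint32_t cbSize;
    std::uint32_t style;
    std::uint32_t lpfnWndProc;    // a function pointer on real Win32
    std::uint32_t cbClsExtra;
    std::uint32_t cbWndExtra;
    std::uint32_t hInstance;      // a handle on real Win32
    std::uint32_t hIcon;
    std::uint32_t hCursor;
    std::uint32_t hbrBackground;
    std::uint32_t lpszMenuName;   // a string pointer on real Win32
    std::uint32_t lpszClassName;
    std::uint32_t hIconSm;
};

static_assert(sizeof(WndClassEx32) == 48, "12 members of 4 bytes each");

// Displacement from EBP of a member, if the structure starts at [ebp-48].
constexpr int ebp_displacement(std::size_t member_offset) {
    return static_cast<int>(member_offset) - 48;
}
```

For example, `ebp_displacement(offsetof(WndClassEx32, lpfnWndProc))` gives -40, so the third entry is at [ebp-40], and the last entry, hIconSm, lands at [ebp-4].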

For a reference to the WNDCLASSEX structure, go to That Page.

For a reference to any Windows structure, search Google for the structure's name. You can also add "structure windows" (without the quotes) to the search, to look more specifically for a Windows structure.


LEA Instruction - Load Effective Address
The LEA instruction is pretty useful, especially when working with local variables. Up until now, to get the address of [ebp-4] we would have had to do this:
mov eax, ebp 
sub eax, 4
But with the LEA instruction, we can just use the effective address, as if we were accessing memory:
lea eax, [ebp-4]
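
In C++ terms, taking the address of a local variable is the closest thing to LEA. The sketch below (the function name `increment_local_via_address` is made up) shows the idea; an unoptimizing x86 compiler would typically compute `&local` with an LEA relative to EBP, but that is an assumption about the compiler, not something the language guarantees.

```cpp
#include <cassert>

// Takes the address of a local variable and modifies it through the pointer.
int increment_local_via_address() {
    int local = 65;         // lives in this function's stack frame
    int* address = &local;  // computes the address only; nothing is loaded yet
    *address += 1;          // now memory at that address is read and written
    return local;
}
```

The function returns 66: the pointer and the variable name refer to the same stack slot, just as EAX would hold the address of [ebp-4] after the LEA.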


Combining Everything Into A Program - The Plan
  • Call main.
  • Exit, returning 0.
main:
  • Local variable 1 = my_function
  • Display message box with msg1 as text.
  • Call [local variable 1], with msg2 as an argument.
  • Call c_function, with msg3 as an argument.
  • Return.
my_function:
  • Display message box with first argument as text.
  • Return.
c_function:
  • Reserve 4 bytes on the stack, and set local variable 1 to first argument.
  • Display message box with local variable 1 as text.
  • Return.

main and my_function are standard call functions.
c_function is a C calling convention function.

Combining Everything Into A Program - The Code
;; We define the externs. 
extern MessageBoxA                    ;; MessageBox is one of those functions that has a unicode version, so we have to use the A suffix. 
extern ExitProcess 

;; Then we have the symbol import table. 
import MessageBoxA user32.dll                ;; MessageBox is a function defined in user32.dll 
import ExitProcess kernel32.dll              ;; ExitProcess is part of kernel32.dll 

;; This is the code section; use 32-bit code. 
section .text use32 
;; This is where the program entry point is. 
..start: 

;; main(); 
call main 

;; ExitProcess(0); 
push dword 0 
call [ExitProcess] 

;; This is our main() function; though it doesn't take any arguments. 
;; We could modify the code to scan the command line and obtain the 
;; argc and argv values, but what we have is fine for now. 
main: 
    enter 4, 0 
    
    ;; [ebp-4]= my_function; 
    mov dword [ebp-4], my_function 
    
    ;; [ebp-4] is a local variable. 
    ;; At the moment, it stores the address of 
    ;; our my_function() function. 
    
    ;; MessageBoxA(0, msg1, the_title, 0); 
    push dword 0 
    push dword the_title 
    push dword msg1 
    push dword 0 
    call [MessageBoxA] 
    
    ;; [ebp-4](msg2); 
    push dword msg2 
    call [ebp-4] 
    
    ;; c_function(msg3); 
    push dword msg3 
    call c_function 
    add esp, 4 
    
    leave 
ret 0 

;; This is a standard calling convention function. 
my_function: 
    push ebp 
    mov ebp, esp 
    
    ;; [ebp+8] is the first argument. 
    
    ;; MessageBoxA(0, [ebp+8], the_title, 0); 
    push dword 0 
    push dword the_title 
    push dword [ebp+8] 
    push dword 0 
    call [MessageBoxA] 
    
    mov esp, ebp 
    pop ebp 
ret 4 

;; This is a C calling convention function. 
c_function: 
    enter 0, 0 
    push dword [ebp+8] 
    
    ;; [ebp-4] is now equal to the first argument, 
    ;; due to the PUSH DWORD [EBP+8] instruction, 
    ;; which reserves 4 bytes on the stack and, 
    ;; at the same time, sets those 4 bytes to 
    ;; the value of the first argument. 
    
    push dword [ebp-4] 
    call my_function 
    
    leave 
ret 

;; This goes into the data section. 
section .data 
;; We define the data global variables. 
the_title                                 db "Local Variables Test", 0 
msg1                                      db "Hello World! ", 13, 10, 0 
msg2                                      db "Oh, you're still there? ", 13, 10, 13, 10, "Well then, hello again.", 0 
msg3                                      db "Come on, don't be silly. ", 13, 10, "Aren't you ever going to leave? ", 13, 10, 13, 10 
db "Oh... wait... I'm the one who's not leaving. My bad. ", 13, 10, 0 

;; The following goes into the bss section. 
section .bss 


You should get 3 message boxes in a row, when you run the program. The output might look like this:
[Image: tut2_mb1_cc.PNG]



Well, this is it for now.

For reference on just about any Win32 API function or structure, you can do a Google search for "name windows function" or "name windows structure" (without the quotes), where name is the name of the function or structure you're looking for.




First Tutorial:
Intro To Win32 Assembly, Using NASM

Previous Tutorial:
Intro To Win32 Assembly, Using NASM

Next Tutorial:
Simple Window



References:
Intel Architecture Software Developer's Manual volume 1: (Document Download Page)

Edited by RhetoricalRuvim, 20 August 2011 - 06:53 PM.
