Last time I taught you how to make conditional jumps and create if-else and switch statements, as well as all kinds of loops. Today we're going to learn how to call functions and basic methods for passing arguments to these functions. Passing arguments to variadic functions and returning values from a function will be left for the next tutorial. This tutorial is going to assume you're going to integrate with other libraries or C/C++ code, so we're going to follow those standards. However, if you're doing your own project and it's all in assembly language, I'll also tell you which conventions you can violate and which you probably shouldn't.
Function Calls
There's a really simple way of calling functions: the call mnemonic. call can be used to call hard-coded functions (i.e. the same function will always be called) or variable functions (the same code can call different functions). In C, the two kinds of call look like this:
Code:
/*declare a function pointer type FUNCPTR that points to a function
  taking a single void* as an argument and returning an int*/
typedef int (*FUNCPTR)(void *);
FUNCPTR functions[4];
...
/*hard-coded function call*/
foo();
/*variable function call - a different function is called on every iteration of the loop*/
for(int i = 0; i < 4; ++i)
    functions[i](NULL);
So how would we do this in our code? Mmm...depends on what assembler you're using. Most will let you declare a function and then just use the function name for the call, like so:
Code:
call whatever
To use a function pointer, you can do something like the following:
Code:
;assume EAX points to a function pointer in memory
call [eax]
I used to use Microsoft's ancient-as-hell debug.exe, which would only allow addresses. So I had to manually calculate the addresses of my functions--a real pain in the butt because if you're off by one byte you're screwed. So my function calls looked like this:
Code:
; calling foo() w/ no args
call 08F4
I'm going to use generic syntax that you can easily adapt for use in any assembler, whether it's NASM, MASM, YASM, debug.exe, or whatever.
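If you want to play with the difference between a hard-coded call and a variable call before touching any assembly, here's a minimal C sketch. The handler_a through handler_d functions, call_fixed and call_all are names I made up for illustration; only the FUNCPTR typedef comes from the example in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Same pointer type as in the tutorial: takes a void*, returns an int. */
typedef int (*FUNCPTR)(void *);

/* Hypothetical handlers; each returns a distinct value so calls can be told apart. */
static int handler_a(void *arg) { (void)arg; return 10; }
static int handler_b(void *arg) { (void)arg; return 20; }
static int handler_c(void *arg) { (void)arg; return 30; }
static int handler_d(void *arg) { (void)arg; return 40; }

static FUNCPTR functions[4] = { handler_a, handler_b, handler_c, handler_d };

/* Hard-coded call: the target is fixed when the program is built. */
int call_fixed(void)
{
    return handler_a(NULL);
}

/* Variable call: the target is read from the table at run time,
   so each loop iteration calls a different function. */
int call_all(void)
{
    int total = 0;
    for (int i = 0; i < 4; ++i)
        total += functions[i](NULL);
    return total;
}
```

Without optimization, a compiler will typically turn call_fixed into a call with a fixed target and the loop body of call_all into a call through a register or memory operand, but that depends on your compiler and its settings.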
All functions have addresses. Otherwise the CPU wouldn't know where to get the instructions from. I'm going to assume you don't really have control over where your functions end up in memory. If you use something like debug.exe to assemble your stuff (not recommended), you can easily figure out the addresses for yourself. If not, just ask me and I'll be glad to help you.
Aha, you ask, but how does the CPU know where to resume execution once the function is done? Well, it's simple. The call instruction does two things: it pushes the return address (the address of the instruction right after the call) onto the stack, then it jumps to the location you specify. Make a note of this; it'll be important later.
Anyway, I showed you above how to call a function that returns nothing and takes no arguments. This is all well and good, but what if we want to pass in arguments? What then? The stack comes to our rescue.
The Stack
Think of the stack as a bunch of sheets of paper. If you want to remember something, just scribble something on that sheet of paper and stick it on the top of the stack. You can't sift around for what you want, though--you can only "push" a sheet on top, or "pop" the top sheet off. That means that if you want to access the third sheet from the top of the stack, you need to pop off the first and second sheets. There's a way around this, but I'll show you later. There are three registers controlling the stack in Intel CPUs: SS, SP, and BP. (In 32-bit ASM, it's SS/ESP/EBP and 64-bit ASM it's SS/RSP/RBP.)
SS - The segment of memory that the stack is in. (I will discuss segments, protected mode, real mode, etc. in later tutorials.)
SP - Points to the top of the stack in memory.
BP - Points to the bottom (base) of the part of the stack the current function is using. This is a convention, not something the CPU enforces.
One would think that the stack would grow upwards in memory (i.e. from low to high addresses), but with Intel processors it's exactly the opposite. (See footnote 1 for why.) Every push moves SP to a lower address, which means that SP should never be greater than BP. If SP equals BP, then our part of the stack is empty.
There are two main instructions for manipulating the stack, push and pop:
Code:
;push takes one argument, the data
;you want to push onto the stack.
push ebx
;is the same as (except that push leaves the flags alone)
sub esp,4
mov [esp],ebx
Code:
;pop takes one argument, the register/memory
;location that you want to pop the top of the
;stack into.
pop eax
;is the same as (except that pop leaves the flags alone)
mov eax,[esp]
add esp,4
There are a few other instructions for manipulating the stack, but they're not really necessary/pertinent for this discussion. See footnote 2 for these extra instructions. So what can you do with push and pop?
Code:
;push the contents of EAX onto the stack
push eax
;push 16 bits from where EBX points to
push WORD PTR [ebx]
;push a 32-bit value
push DWORD 0x8000F185
;ILLEGAL!
push al
;pop into a memory location
pop DWORD ES:[0xF858]
;pop into a 16-bit register
pop cx
;pop into a 32-bit register
pop edx
;ILLEGAL!
pop al
;ILLEGAL!
pop WORD 0x1234
You can't pop into a constant for obvious reasons. But why can't you push/pop bytes? This is to keep data aligned on even byte boundaries, to avoid issues with hardware that doesn't like reading multibyte data from odd byte boundaries. Keep the stack aligned to a 2-byte boundary on 16-bit systems, a 4-byte boundary on 32-bit systems, and an 8-byte boundary on 64-bit systems. If you have to waste memory...oh well.
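If the order of operations inside push and pop is hard to picture, here's a rough C model of a downward-growing stack. Everything in it (the Stack struct, stack_init, push32, pop32, the 64-byte size) is made up for illustration, and esp is just an offset into an array rather than a real register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

/* Hypothetical model: a small block of memory plus an offset standing in for ESP. */
typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;               /* offset of the current top of the stack */
} Stack;

/* An empty stack has ESP at the high end of the memory block. */
void stack_init(Stack *s)
{
    s->esp = STACK_BYTES;
}

/* push: move ESP down by 4 first, then store the value at the new top. */
void push32(Stack *s, uint32_t value)
{
    assert(s->esp >= 4);                    /* model-only overflow check */
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, 4);
}

/* pop: load from the current top first, then move ESP back up by 4. */
uint32_t pop32(Stack *s)
{
    uint32_t value;
    assert(s->esp + 4 <= STACK_BYTES);      /* model-only underflow check */
    memcpy(&value, &s->mem[s->esp], 4);
    s->esp += 4;
    return value;
}
```

Push 1 and then 2, and the first pop gives you back 2: last in, first out, with esp getting smaller on every push.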
Passing Values
But what does this all have to do with functions? Well, as you've probably guessed, arguments are passed on the stack. Now because the stack grows downward, we usually push arguments on backwards so that the first argument ends up at the top of the stack (the lowest address). The reason for this will become apparent momentarily.
Let's assume that we have a function void foo(uint32_t a, uint16_t b). We could call foo like so:
Code:
;assume eax=a, ebx=b
push ebx
push eax
call foo
Note that I stuck a 16-bit variable in a 32-bit register for passing it to a function. Again, we need to keep the stack aligned.
So how would foo access the arguments? Well, we could pop them off the stack...but the return address is sitting on top of them, and we only have a limited number of registers. What if we have a function that takes, I dunno...ten arguments? What if it takes a variable number of arguments? We'd be screwed. There's a better way of doing this: using a single register to point to a fixed spot on the stack, and just adding offsets to access each argument, sort of like an array. Most compilers/coders use bp, ebp or rbp for this for historical reasons. (See footnote 3 for why.) Continuing with our above example, here's how we would access the arguments:
Code:
;function entry point (prologue)
;save EBP before we use it for our argument pointer
push ebp
;EBP now points to the top of the stack. this is
;actually NOT the first argument, but the saved EBP followed
;by the return address of our function. assuming the
;return address is 32 bits (4 bytes) wide, our first
;argument is actually at [EBP+8], not [EBP].
mov ebp,esp
;allocate space for local variables
;(this example assumes at least 8 bytes)
sub esp,TOTAL_SIZE_OF_LOCAL_VARIABLES
....
;add 5 to A
add DWORD PTR [ebp+8],5
;subtract 10 from B
sub WORD PTR [ebp+12],10
;store B in a local variable. we use ECX because the
;usual C conventions let a function trash ECX, while EBX
;would have to be saved and restored.
mov ecx, [ebp+12]
mov [esp], ecx
;store A in a different local variable. each local gets
;its own 4-byte slot so the two don't overlap.
mov eax, [ebp+8]
mov [esp+4], eax
....
;return code (epilogue)
;deallocate space for local variables
add esp, TOTAL_SIZE_OF_LOCAL_VARIABLES
;restore EBP
pop ebp
;THIS ONLY WORKS FOR NON-VARIADIC FUNCTIONS.
;Our function takes a fixed number of arguments--we
;always know how many bytes' worth of arguments are
;passed in. Either we must clean up the arguments off
;the stack, or the function that called us. Because
;we always know how many bytes were passed in, we might
;as well clean up. Otherwise every time our function
;is called, the caller would have to clean up--which is
;lots of code duplication. If we clean up here, the same
;code is only in one place, so we save space.
;
;We take one DWORD argument and one WORD argument, for a
;total of 8 bytes. Remember we have to keep the stack aligned,
;so every argument takes up a multiple of 4 bytes.
;
;NOTE: cleaning up in the function like this is what the
;stdcall convention does. Under cdecl, the default for most
;C compilers on 32-bit x86, the caller cleans up instead,
;and the function ends with a plain ret.
ret 8
Note that you need the prologue and return code so that you don't screw up the stack alignment or lose track of where your calling function's variables are. Also note what happens if our function calls a subfunction: because ESP points below our local variables, anything pushed for the subfunction lands below them and doesn't overwrite them.
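To double-check the offsets, here's a small C simulation of the whole sequence: the caller's pushes, the return address pushed by call, the prologue, reading the arguments at ebp+8 and ebp+12, and the cleanup done by ret 8. The Machine struct and every function name here are hypothetical, and esp and ebp are offsets into a byte array, not real registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_BYTES 64

/* Hypothetical machine model: flat memory plus offsets standing in for ESP and EBP. */
typedef struct {
    uint8_t  mem[MEM_BYTES];
    uint32_t esp;
    uint32_t ebp;
} Machine;

static void push32(Machine *m, uint32_t v)
{
    assert(m->esp >= 4);
    m->esp -= 4;
    memcpy(&m->mem[m->esp], &v, 4);
}

static uint32_t pop32(Machine *m)
{
    uint32_t v;
    assert(m->esp + 4 <= MEM_BYTES);
    memcpy(&v, &m->mem[m->esp], 4);
    m->esp += 4;
    return v;
}

static uint32_t load32(const Machine *m, uint32_t addr)
{
    uint32_t v;
    assert(addr + 4 <= MEM_BYTES);
    memcpy(&v, &m->mem[addr], 4);
    return v;
}

/* Models a call to foo(a, b). Returns a + b as read through EBP and
   leaves ESP and EBP exactly where they started. */
uint32_t model_call_foo(Machine *m, uint32_t a, uint16_t b, uint32_t return_address)
{
    /* Caller: push arguments last-to-first, then "call" pushes the return address. */
    push32(m, b);                   /* b is widened to 32 bits to keep alignment */
    push32(m, a);
    push32(m, return_address);

    /* Callee prologue: push ebp / mov ebp,esp */
    push32(m, m->ebp);
    m->ebp = m->esp;

    /* [ebp] = saved EBP, [ebp+4] = return address, [ebp+8] = a, [ebp+12] = b */
    assert(load32(m, m->ebp + 4) == return_address);
    uint32_t sum = load32(m, m->ebp + 8) + (uint16_t)load32(m, m->ebp + 12);

    /* Callee epilogue: pop ebp / ret 8 */
    m->ebp = pop32(m);
    (void)pop32(m);                 /* ret pops the return address... */
    m->esp += 8;                    /* ...and the 8 discards both arguments */
    return sum;
}
```

Start with esp and ebp both set to MEM_BYTES and call model_call_foo(&m, 7, 3, 0x08F4): you get 10 back, and esp ends up at MEM_BYTES again, which is the whole point of the ret 8.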
Well, now you know how to call functions and pass in values! Next time I'll teach you how to deal with variadic functions (functions that take a variable number of arguments), and return values from your functions.
Next In This Series
Intro to Intel Assembly Language: Part 6
Footnote 1 - Why the Intel stack is upside-down
Way back in the day, a lot of programs used the tiny model for code layout, which dictated that everything must fit in one 64K segment - code, data, and stack. To minimize the chance of a stack overflow overwriting code and/or data, Intel engineers decided that the stack should start at the end of the segment and grow downward towards the code and data.
Footnote 2 - Extra instructions for manipulating the stack
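PUSHA / POPA - Push/pop all eight 16-bit general registers. (PUSHA pushes the value SP had before the instruction; POPA throws that value away instead of loading it into SP.)
PUSHAD / POPAD - Same thing for the 32-bit general registers (POPAD likewise skips ESP).
There is no 64-bit version: PUSHA/PUSHAD and POPA/POPAD are invalid in 64-bit mode, so you have to push and pop the registers one at a time.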
PUSHF / POPF - Push/pop flags register (16-bit)
PUSHFD / POPFD - Push/pop eflags register (32-bit)
PUSHFQ / POPFQ - Push/pop rflags register (64-bit)
Footnote 3 - Why EBP?
The way Intel instructions were originally encoded, 16-bit code could only use bx, bp, si, and di as address registers. Of these, only bp references the stack segment by default; the others default to the data segment, so using bx, for example, would require a segment override every single time a function tried to access an argument. Clearly this would slow things down and bloat code, so...we use bp, and ebp/rbp inherited the job.
Last edited by dargueta; 11-30-2010 at 12:47 PM. Reason: Made comments easier to read, fixed grammar
sudo rm -rf /
Very well done, Dargueta! +rep
Very nice job. +rep
Thanks!
EDIT: I fixed an offset error with the order of the arguments. The first argument is at EBP+8, not EBP+4 as I previously stated.
Last edited by dargueta; 11-14-2009 at 02:18 PM.
I enjoyed your tutorial!
But I'm still a beginner, could you help me?
I program in Delphi and wanted to call this address.
How do I do it? (TASM)
asm
push eax
mov eax, 008D0C70h
mov byte ptr [eax], 1
pop eax
end;
I think this is it...
Code:
procedure MyFunction; near;
begin
  asm
    push eax
    mov eax, 008d0c70h
    mov BYTE PTR [eax], 1
    pop eax
    ret
  end;
end;