In this tutorial, we'll take a look at some string algorithms and make functions that use those algorithms.
What Is An Algorithm?
An algorithm is a rule or set of rules specifying how to do something or solve a problem.
Making An Algorithm:
Here's what we need to do:
- Think of a task.
- Think about how you would do the task.
- Write pseudo-code.
- Write the assembly language function.
Think of a task
As with just about anything else, before you start something you have to think of what you're going to do.
Think about how you would do the task
Then you need to think how you would have done the task, if you were given the job.
Write pseudo-code
Generally, pseudo-code is code that does not really mean anything to any compiler, but is useful for writing what the computer will have to do.
This is where you go into more details about what, exactly, needs to be done.
The syntax of the pseudo-code depends on how you want to write it.
Write the assembly language function
After you have exactly what needs to be done, you can then start "compiling" your writing (pseudo-code) to assembly language code.
The Algorithms We'll Focus On
The algorithms we'll be working on here are string-handling algorithms.
We would make these four algorithms, but you can make more on your own:
- strlen - Getting the length of a string.
- strcpy - Copying a string.
- strcat - String concatenation; this is where you append one string to another.
- strcmp - Compare strings.
New Instructions
There are four new instructions that we'll use that we didn't go over yet. They're PUSHA, POPA, CBW, (CWD,) and CWDE.
PUSHA - Push All
Same as pushing the registers EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI, in that order. Does that in one instruction.
POPA - Pop All
Same as popping the registers EDI, ESI, and EBP, then adding 4 to ESP, then popping EBX, EDX, ECX, and EAX, all in that order.
CBW - Convert Byte To Word
Takes bit 7 of AL and sets all the bits in AH to that value.
CWD - Convert Word To Double-Word
Takes bit 15 of AX and sets all the bits in register DX to that value.
CWDE - Convert Word To Double-Word
Same as CWD, but changes the higher-order word of EAX, rather than changing DX.
A technique that we'll use
In assembly language, we usually use [ebp-4], etc., for local variables. But this time, we'll use [ebp-4] for our return value.
The address is [ebp-(the sum of the sizes of all the local variables)-4], if we use the PUSHA instruction right after ENTER. The sum of the sizes of all the local variables is usually equal to the number used for the first operand of the ENTER instruction. For example, if we have 10 bytes of local variables, we would use 'ENTER 10, 0' , and that number would be 10. We would then use [ebp-10-4], which is [ebp-14], to access the return value.
This works because we reserve 10 bytes on the stack with the ENTER instruction, so [ebp-10] is the first byte of our stack frame. Then we push the registers using the PUSHA instruction, which pushes EAX first, meaning that the double-word right below [ebp-10] is used for EAX. But since our return value is in EAX, and since the POPA instruction would use that double-word (the one below [ebp-10]) for restoring the EAX register, it would make sense that if we save the return value to [ebp-(number of bytes reserved on the stack)-4] then it will be copied to the EAX register upon our return.
strlen
We know that every string has a NULL (0) terminator byte at the end of it. That way, we can scan the string and stop at the NULL terminator. While we are scanning the string, we're also counting the number of characters; this does not include the NULL terminator.
The Pseudo-Code
strlen: integer c pointer b c= 0 b= address of the string loop 1: if byte [b] is 0 then go to loop 1 stop c= c + 1 b= b + 1 go to loop 1 loop 1 stop: return c
Converting The Pseudo-Code To Assembly Language
First of all, we don't have to use memory for variables. If we can, we could use registers; the program runs faster, that way.
Since EBX is the address/base register, let's use it for our 'b' variable. And since ECX is the count register, let's use it for our 'c' variable.
strlen: enter 0, 0 ;; The stack frame. pusha ;; We push EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI (in that order). mov eax, dword [ebp+8] ;; Get the address of the string; it's the first argument. mov ebx, eax ;; b= address of the string xor ecx, ecx ;; c= 0 .lp1: ;; local label loop 1 ;; if byte [b] is 0 then go to loop 1 stop cmp byte [ebx], 0 jz .lp1s inc ecx ;; c= c + 1 inc ebx ;; b= b + 1 jmp .lp1 ;; go to loop 1 .lp1s: ;; return c mov eax, ecx mov dword [ebp-4], eax popa leave ret 4
strcpy
Here, we want to copy each byte from the string at argument 2 to the corresponding byte of the string at argument 1, until after we copy the NULL terminator byte.
The Pseudo-Code
strcpy: character a pointer b pointer d b= argument 1 (destination) d= argument 2 (source) (That's how strcpy() takes arguments; strcpy(copy to, copy from)) loop 1: a= byte [d] byte [b]= a b= b + 1 d= d + 1 if a then go to loop 1 ('if a' means "if a is not 0" ; when a is 0 (the NULL terminator), the loop would end.) return
Converting Our Pseudo-Code To Assembly Language
strcpy: enter 0, 0 ;; We won't need any local variables. pusha ;; Save all the register values. mov eax, dword [ebp+08] mov ebx, eax ;; b= argument 1 mov eax, dword [ebp+12] mov edx, eax ;; d= argument 2 .lp1: mov al, byte [edx] ;; a= byte [d] (note: EDX (on 386 or later machines) can also be used for addressing) mov byte [ebx], al ;; byte [b]= a inc ebx ;; b= b + 1 inc edx ;; d= d + 1 cmp al, 0 ;; Subtract 0 from AL (don't store the result). jnz .lp1 ;; If the result is not supposed to be 0, go to loop 1. popa ;; Restore all the register values. leave ;; Switch back to the previous stack frame. ret 8 ;; Free 8 bytes after return.
strcat
No, this algorithm does not have anything to do with cats

The same applies to strcpy().
The idea, now, is to add the length of string 1 to the address of string 1, and pass that as the first argument to the strcpy() function, while using string 2 as the second argument.
The Pseudo-Code
strcat: dword a a= argument1 + strlen argument1 strcpy a, argument2 return
The Assembly Language Routine
strcat: enter 0, 0 ;; No local memory variables. pusha ;; Save the values of all the registers. ;; a= argument1 + strlen argument1 = strlen argument1 + argument1, so... push dword [ebp+8] ;; strlen argument1 call strlen ;; ... + argument1 add eax, dword [ebp+8] ;; strcpy(a, argument2) push dword [ebp+12] ;; argument2 push eax ;; a call strcpy popa ;; Restore the values of all the registers. leave ;; Switch back to the previous stack frame. ret 8 ;; Free 8 bytes after returning.
strcmp
Remember the CMP instruction? Well, the CMP instruction only works on small, static, values; it's time to make an algorithm that would do the same job but for bigger string values.
This function should return string 1 - string 2.
string 1 - string 2 - What?
string 1 - string 2 is 0 if the two strings are equal. Otherwise, it's the first unequal character of string 1 - the first unequal character of string 2. That means that if we're comparing "abcde" and "abcfe" then "abcde" - "abcfe" will be -2, because the first unequal character ('d') of string 1 - the first unequal character ('f') of string 2 is -2. 'd' is ASCII 100 and 'f' is ASCII 102, so that 100 - 102 is -2.
The Pseudo-Code
strcmp: character a integer c pointer b pointer d b= argument 1 d= argument 2 c= 0 loop 1: a= byte [b] c= a - byte [d] if not a then go to loop 1 stop if c then go to loop 1 stop b= b + 1 d= d + 1 go to loop 1 loop 1 stop: return c
The Assembly Language Code
strcmp: enter 0, 0 ;; We don't really need any memory local variables, right now. pusha ;; Save the values of all the general-purpose registers. ;; b= argument 1 mov eax, dword [ebp+08] mov ebx, eax ;; d= argument 2 mov eax, dword [ebp+12] mov edx, eax ;; We'll use EAX for both integer c and character a. xor eax, eax .lp1: mov al, byte [ebx] ;; Get byte [b] mov ah, byte [edx] ;; Get byte [d] sub al, ah ;; See if they're the same. jnz .lp1s ;; If not, the strings are not equal for sure. cmp ah, 0 ;; They're the same; but are they 0? ;; Note that we can't use AL, here, because it'll always be 0 if the processor ;; makes it to this instruction, because of the SUB instruction, above. jz .lp1s ;; If so, the strings ended already, and we need to return. inc ebx ;; b= b + 1 inc edx ;; d= d + 1 jmp .lp1 .lp1s: cbw ;; This instruction converts a byte value to a word value. ;; This helps in situations where AL is a signed integer, because ;; if you just clear AH then AX will be 256 - |the integer| for negative numbers. ;; 00000010 is binary for 2 ;; 11111110 is binary for -2 ;; 11111101 is binary for -3 ;; 11111100 is binary for -4 ;; ... and so on ... ;; But if you include AH to make AX, then it would be (if AH is 0): ;; 00000000 00000010 is binary for 2 ;; 00000000 11111110 is binary for 254 ;; 00000000 11111101 is binary for 253 ;; ... and so on ... ;; So what CBW does is it takes the left-most bit of AL and ;; it sets every bit in AH to that value. ;; 00000000 00000010 is binary for 2 ;; 11111111 11111110 is binary for -2 ;; 11111111 11111101 is binary for -3 ;; ... and so on ... cwde ;; Then we have to convert the word (in AX) to a double-word; same idea as before. mov dword [ebp-4], eax ;; return c popa ;; Restore the register values. leave ;; Switch back to the previous stack frame. ret 8 ;; Free 8 bytes after return.
Example Program
Well, what would be the point of showing these functions if I don't show an example? So let's take a look at how to use the functions...
Example Program - The Plan
- Get the length of the string "hello" and make a message box that says "hello" that many times.
- Make a message box that displays string1 (a blank string of 512 bytes).
- Copy "one" to string1; make a message box with string1.
- Copy "two" to string1; make a message box with string1.
- Append "three" to string1; make a message box with string1.
- Compare string1 to "abcd"; if equal, make a message box that says "string1 is equal to \"abcd\""
- Copy "abc" to string1, then append "123" to string1; make a message box with string1.
- Compare string1 to "abc123" and if equal, make a message box that says "string1 is equal to \"abc123\""
- Exit, returning 0.
Example Program - The Code
;; Define the externs. extern MessageBoxA extern ExitProcess ;; Construct our symbol import table. import MessageBoxA user32.dll import ExitProcess kernel32.dll ;; The following is the code section for our program. Use 32-bit code. section .text use32 ;; ..start is where our program starts running. ..start: call main ;; Exit, returning whatever main() returned. push eax call [ExitProcess] main: enter 8, 0 ;; Set up the stack frame with 8 bytes reserved for local variables. ;; MessageBox() (or MessageBoxA(), or MessageBoxW()) takes 4 arguments: ;; hwnd - The handle to the window that owns the message box. We don't have a window, currently, so this should be 0. ;; msg - A pointer to the string to output as the text for the message box. ;; ttl - A pointer to the string that has the title for the message box. ;; flags - Some flags for the message box. For a plain message box with an OK button, we use 0. ;; Part 1 - Display a message box with "hello" (strlen "hello") times. ;; Get the length of the string "hello" push dword txt1 call strlen mov dword [ebp-8], eax ;; Save the number of "hello" 's, needed to be displayed, in dword [ebp-8]. ;; [ebp-4] will be our index variable for the loop for the "hello" message boxes. xor eax, eax mov dword [ebp-4], eax .lp1: mov eax, dword [ebp-4] ;; Get the current index value (the number of "hello" 's already displayed). cmp eax, dword [ebp-8] ;; Compare it to the number of "hello" 's needed to be displayed. jnl .lp1s ;; If already displayed that many "hello" 's, go to loop 1 stop. ;; Display a message box that says "hello" push dword 0 push dword the_title push dword txt2 push dword 0 call [MessageBoxA] inc dword [ebp-4] ;; Number of "hello" 's already displayed = number of "hello" 's already displayed + 1 jmp .lp1 ;; Go to .lp1, to continue the loop. .lp1s: ;; Part 2 - Make a message box that displays string1. push dword 0 push dword the_title push dword string1 push dword 0 call [MessageBoxA] ;; Part 3 - Copy "one" to string1 and display a message box with string1. push dword txt3 push dword string1 call strcpy push dword 0 push dword the_title push dword string1 push dword 0 call [MessageBoxA] ;; Part 4 - Copy "two" to string1 and display a message box with string1. push dword txt4 push dword string1 call strcpy push dword 0 push dword the_title push dword string1 push dword 0 call [MessageBoxA] ;; Part 5 - Append "three" to string1 and display a message box with string1. push dword txt5 push dword string1 call strcat push dword 0 push dword the_title push dword string1 push dword 0 call [MessageBoxA] ;; Part 6 - Compare string1 to "abcd" and if equal then display message box with "string1 is equal to \"abcd\"" push dword txt6 push dword string1 call strcmp cmp eax, 0 ;; Compare EAX to 0. jz .lbl1 ;; If it's equal to 0, go to .lbl1 jmp .lbl1f ;; Otherwise, go to .lbl1f (label 1 finish). .lbl1: push dword 0 push dword the_title push dword txt7 push dword 0 call [MessageBoxA] jmp .lbl1f .lbl1f: ;; Part 7 - Copy "abc" to string1, then append "123" to string1; display a message box with string1. push dword txt8 push dword string1 call strcpy push dword txt9 push dword string1 call strcat push dword 0 push dword the_title push dword string1 push dword 0 call [MessageBoxA] ;; Part 8 - Compare string1 to "abc123" and if equal then display a message box with "string1 is equal to \"abc123\"" push dword txt10 push dword string1 call strcmp cmp eax, 0 ;; If EAX is 0 (meaning string1 is equal to "abc123") jz .lbl2 ;; then go to .lbl2 jmp .lbl2f ;; Else, go to .lbl2f .lbl2: push dword 0 push dword the_title push dword txt11 push dword 0 call [MessageBoxA] jmp .lbl2f .lbl2f: ;; Part 9 - Exit, returning 0. xor eax, eax ;; Return 0. XOR a, a clears a. Another way of saying MOV a, 0 leave ;; Switch back to previous stack frame. ret ;; Return to caller. ;; Our data section. section .data the_title db "AlgorithmsIntro", 0 txt1 db "hello", 0 txt2 db "hello", 0 txt3 db "one", 0 txt4 db "two", 0 txt5 db "three", 0 txt6 db "abcd", 0 txt7 db "string1 is equal to ", 34, "abcd", 34, 0 ;; 34 is ASCII for double quote. txt8 db "abc", 0 txt9 db "123", 0 txt10 db "abc123", 0 txt11 db "string1 is equal to ", 34, "abc123", 34, 0 ;; The following is part of the bss section. section .bss ;; Reserve 512 bytes for string1. string1 resb 512 ;; Tell NASM that we want to put the next things into the code section. section .text strlen: enter 0, 0 ;; The stack frame. pusha ;; We push EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI (in that order). mov eax, dword [ebp+8] ;; Get the address of the string; it's the first argument. mov ebx, eax ;; b= address of the string xor ecx, ecx ;; c= 0 .lp1: ;; local label loop 1 ;; if byte [b] is 0 then go to loop 1 stop cmp byte [ebx], 0 jz .lp1s inc ecx ;; c= c + 1 inc ebx ;; b= b + 1 jmp .lp1 ;; go to loop 1 .lp1s: ;; return c mov eax, ecx mov dword [ebp-4], eax popa leave ret 4 strcpy: enter 0, 0 ;; We won't need any local variables. pusha ;; Save all the register values. mov eax, dword [ebp+08] mov ebx, eax ;; b= argument 1 mov eax, dword [ebp+12] mov edx, eax ;; d= argument 2 .lp1: mov al, byte [edx] ;; a= byte [d] (note: EDX (on 386 or later machines) can also be used for addressing) mov byte [ebx], al ;; byte [b]= a inc ebx ;; b= b + 1 inc edx ;; d= d + 1 cmp al, 0 ;; Subtract 0 from AL (don't store the result). jnz .lp1 ;; If the result is not supposed to be 0, go to loop 1. ;; ..... popa ;; Restore all the register values. leave ;; Switch back to the previous stack frame. ret 8 ;; Free 8 bytes after return. strcat: enter 0, 0 ;; No local memory variables. pusha ;; Save the values of all the registers. ;; a= argument1 + strlen argument1 = strlen argument1 + argument1, so... push dword [ebp+8] ;; strlen argument1 call strlen ;; ... + argument1 add eax, dword [ebp+8] ;; strcpy(a, argument2) push dword [ebp+12] ;; argument2 push eax ;; a call strcpy popa ;; Restore the values of all the registers. leave ;; Switch back to the previous stack frame. ret 8 ;; Free 8 bytes after returning. strcmp: enter 0, 0 ;; We don't really need any memory local variables, right now. pusha ;; Save the values of all the general-purpose registers. ;; b= argument 1 mov eax, dword [ebp+08] mov ebx, eax ;; d= argument 2 mov eax, dword [ebp+12] mov edx, eax ;; We'll use EAX for both integer c and character a. xor eax, eax .lp1: mov al, byte [ebx] ;; Get byte [b] mov ah, byte [edx] ;; Get byte [d] sub al, ah ;; See if they're the same. jnz .lp1s ;; If not, the strings are not equal for sure. cmp ah, 0 ;; They're the same; but are they 0? ;; Note that we can't use AL, here, because it'll always be 0 if the processor ;; makes it to this instruction, because of the SUB instruction, above. jz .lp1s ;; If so, the strings ended already, and we need to return. inc ebx ;; b= b + 1 inc edx ;; d= d + 1 jmp .lp1 .lp1s: cbw ;; This instruction converts a byte value to a word value. ;; This helps in situations where AL is a signed integer, because ;; if you just clear AH then AX will be 256 - |the integer| for negative numbers. ;; 00000010 is binary for 2 ;; 11111110 is binary for -2 ;; 11111101 is binary for -3 ;; 11111100 is binary for -4 ;; ... and so on ... ;; But if you include AH to make AX, then it would be (if AH is 0): ;; 00000000 00000010 is binary for 2 ;; 00000000 11111110 is binary for 254 ;; 00000000 11111101 is binary for 253 ;; ... and so on ... ;; So what CBW does is it takes the left-most bit of AL and ;; it sets every bit in AH to that value. ;; 00000000 00000010 is binary for 2 ;; 11111111 11111110 is binary for -2 ;; 11111111 11111101 is binary for -3 ;; ... and so on ... cwde ;; Then we have to convert the word (in AX) to a double-word; same idea as before. mov dword [ebp-4], eax ;; return c popa ;; Restore the register values. leave ;; Switch back to the previous stack frame. ret 8 ;; Free 8 bytes after return.
Example Program - The Output
You should get some message boxes in a row, that say the following (a message box message per line):
hello
hello
hello
hello
hello
one
two
twothree
abc123
string1 is equal to "abc123"
http://forum.codecal...tachmentid=4138
First Tutorial:
Intro To Win32 Assembly, Using NASM
Previous Tutorial:
Simple Text
Next Tutorial:
String-Integer Algorithms
Edited by RhetoricalRuvim, 21 August 2011 - 03:54 PM.