Jump to content


Check out our Community Blogs

Register and join over 40,000 other developers!


Recent Status Updates

View All Updates

Photo
- - - - -

Assembly, String Array Sorting and Displaying Algorithms (Win32, NASM)

innerHTML string array assembly

  • Please log in to reply
No replies to this topic

#1 RhetoricalRuvim

RhetoricalRuvim

    JavaScript Programmer

  • Expert Member
  • PipPipPipPipPipPipPip
  • 1307 posts
  • Location:C:\Countries\US
  • Programming Language:C, Java, C++, PHP, Python, JavaScript

Posted 21 August 2011 - 04:11 PM

In the last tutorial, we worked on code that ordered an array of integers in ascending numerical order.

But let's say, if you go to school, your teacher gave you a big list of words that he/she told you to put in alphabetical order. The list is so big that it would take a fairly long time to do the task. Well why not just write a program that would do the ordering? After all, computers are good at doing the same (sometimes boring) thing over and over again and not getting bored.

In this tutorial, let's write a function that would organize an array of strings (string pointers) in alphabetical order.

Overview
  • Would Last Time's Algorithm Work This Time?
  • The Function We'll Use
  • Example Program





Would Last Time's Algorithm Work This Time?
The question is, can we re-use the last time's algorithm for this time? For a short answer, yes; but let's take a look at it in more depth.

Let's get back to last time's JavaScript.

The JavaScript - With Strings This Time
But this time, we want to use strings instead of numbers.

So here's the JavaScript:
<html> 
	<head> 
		<title> Integer Sort </title> 
	</head> 
	<body> 
		<h1> Integer Sort </h1> 
		<div id="txt"> 
			  
		</div> 
		<script type="text/javascript"> 
			function IntegerSort(pArray, sArray){ 
				var a; 
				var i; 
				var j; 
				for (i= 0; i < (sArray - 1); i++){ 
					for (j= 0; j < (sArray - 1); j++){ 
						if (pArray[j+0] > pArray[j+1]){ 
							a= pArray[j+0]; 
							pArray[j+0]= pArray[j+1]; 
							pArray[j+1]= a; 
						} 
					} 
				} 
			} 
		</script> 
		<script type="text/javascript"> 
			var txt= document.getElementById("txt"); 
			var some_array= ["ambitious", "alphabet", "beta", "testing", "OAKS", "Oregon", "options"]; 
			txt.innerHTML= some_array + " <BR> "; 
			IntegerSort(some_array, some_array.length); 
			txt.innerHTML += some_array + " <BR> "; 
		</script> 
	</body> 
</html>

And here's the output of the above HTML/JavaScript program:
http://forum.codecal...tachmentid=4179

It worked! Now let's try that using assembly language.

The Assembly Code - Also With Strings This Time
First of all, we'll need to change our alert_array() function.

We would change the name of the function to alert_string_array() and the new code is:
;; alert_string_array() - Display a message box with an array of strings. 
;; parameters: pArray, sArray 
alert_string_array: 
	;; We'll need enough space for a string; we'll use 2048 bytes. 
	;; We'll use 2 more bytes for our comma (,) string. 
	enter 2048, 0 
	push word 44                            ;; This will push ',\0' to the stack, which is a one-character 
	;; string with a comma (,). 
	push ebx 
	push esi 
	
	;; Initialize our string to empty (""). 
	mov byte [ebp-2048], 0 
	
	xor esi, esi                            ;; i= 0 
	.lp1: 
		mov eax, esi                        ;; i 
		cmp eax, dword [ebp+012]            ;; sArray 
		jnl .lp1s                           ;; while i < sArray 
		
		;; This part will be different from last time. 
		
		mov eax, dword [ebp+8]              ;; Get pArray 
		mov ebx, eax                        ;; Save that to the base/address register. 
		push dword [ebx+esi*4]              ;; Append *(pArray + (i*4)) = pArray[i] 
			lea eax, [ebp-2048] 
		push eax                            ;; to our local string. 
		call strcat 
		
		lea eax, [ebp-2048]                 ;; Our local string. 
		lea ebx, [ebp-2050]                 ;; The comma string. 
		push ebx 
		push eax 
		call strcat 
		
		inc esi                             ;; i++ 
		jmp .lp1 
	.lp1s: 
	
	lea ebx, [ebp-2048]                     ;; The string. 
	push ebx 
	call strlen 
	add ebx, eax                            ;; The pointer = the address of the NULL terminator of the string. 
	mov byte [ebx-1], 0                     ;; We take out everything after and including the last comma (,). 
	
	;; The last comma in the string comes right before the NULL terminator, so it would make sense 
	;; to take out the last character in the string (and everything after it). 
	
	;; Get the address of our string. 
	lea ebx, [ebp-2048] 
	push dword 0                            ;; Fourth parameter for MessageBoxA is MB_OK (0), for this. 
	call .over1                             ;; This pushes the address of the "next instruction" to the 
	;; stack and jumps to .over1 
		db "Integer Array", 0               ;; And this is the "next instruction" 
		;; We use this string for the title of our message box. 
	.over1: 
	push ebx                                ;; Then we push the address of the string to the stack. 
	push dword 0                            ;; And we don't own a window, so this parameter is NULL. 
	call [MessageBoxA] 
	
	pop esi 
	pop ebx 
	leave 
ret 8

So that this time we're just using a blank string, appending each string from the array to the blank string, and appending a comma after appending any string from the array to our local (used to be blank) string. Then we'll just have to remove the last comma from the string, just like last time.

Okay, let's try it:
;; Define the externs. 
extern MessageBoxA 
extern ExitProcess 

;; Construct our symbol import table. 
import MessageBoxA user32.dll 
import ExitProcess kernel32.dll 

;; This goes into the code section; use 32-bit code, from this point on. 
section .text use32 
;; Start running the program from here. 
..start: 

;; Call the main() function. 
call main 

;; Exit, returning whatever main() returned. 
push eax 
call [ExitProcess] 

main: 
	enter 0, 0 
	
	push dword 7                            ;; The size of the array is 14 DWORDs. 
	push dword the_array                    ;; The pointer to the array. 
	call alert_string_array                        ;; Display a message box with the array. 
	
	push dword 7                            ;; Size of array, in DWORDs. 
	push dword the_array                    ;; Address of array. 
	call IntegerSort                        ;; Sort the elements inside the array. 
	
	push dword 7 
	push dword the_array 
	call alert_string_array                        ;; Display a message box, again, with the newly-organized array. 
	
	;; Return 0. 
	xor eax, eax 
	leave 
ret 

;; IntegerSort() - Sort integers within an array. 
;;  parameters: 
;;  	A pointer to the array. 
;;  	The size of the array, in double-words. 
;;  Note that this function works with 32-bit (double-word) integers. 
IntegerSort: 
	enter 0, 0 
	;; Save the important registers. 
	push ebx 
	push esi 
	push edi 
	
	;; Set the base/address to the address of the array. 
	mov eax, dword [ebp+08] 
	mov ebx, eax 
	
	;; loop 1. 
	xor esi, esi                            ;; Clear i (for (i= 0)) 
	.lp1: 
		mov eax, dword [ebp+12]             ;; sArray 
		dec eax                             ;; - 1 
		cmp esi, eax 
		jnl .lp1s                           ;; while (i < (sArray - 1)) 
		
		;; loop 2. 
		xor edi, edi                        ;; Clear j (for (j= 0)) 
		.lp2: 
			mov eax, dword [ebp+12]         ;; sArray 
			dec eax                         ;; - 1 
			cmp edi, eax 
			jnl .lp2s                       ;; while (j < (sArray - 1)) 
			
			;; EBX is pArray 
			mov eax, dword [ebx+edi*4+0]    ;; pArray[j+0] 
			cmp eax, dword [ebx+edi*4+4]    ;; pArray[j+1]         // Each element in the array is 4 bytes; hence the 4's in the effective address. 
			jng .lp2over1                   ;; if pArray[j+0] !> pArray[j+1] then skip over this next part. 
				;; EAX is already equal to [ebx+edi*4+0] (which is pArray[j+0]) 
				xchg eax, dword [ebx+edi*4+4]         ;; Echange the value in EAX with the value at pArray[j+1]. 
				mov dword [ebx+edi*4+0], eax          ;; Save the value from pArray[j+1] to pArray[j+0]. 
			.lp2over1: 
			
			inc edi                         ;; on continue, j++ 
			jmp .lp2 
		.lp2s: 
		
		inc esi                             ;; on continue, i++ 
		jmp .lp1 
	.lp1s: 
	
	;; Restore the important registers. 
	pop edi 
	pop esi 
	pop ebx 
	leave 
ret 8 

;; alert_string_array() - Display a message box with an array of strings. 
;; parameters: pArray, sArray 
alert_string_array: 
	;; We'll need enough space for a string; we'll use 2048 bytes. 
	;; We'll use 2 more bytes for our comma (,) string. 
	enter 2048, 0 
	push word 44                            ;; This will push ',\0' to the stack, which is a one-character 
	;; string with a comma (,). 
	push ebx 
	push esi 
	
	;; Initialize our string to empty (""). 
	mov byte [ebp-2048], 0 
	
	xor esi, esi                            ;; i= 0 
	.lp1: 
		mov eax, esi                        ;; i 
		cmp eax, dword [ebp+012]            ;; sArray 
		jnl .lp1s                           ;; while i < sArray 
		
		;; This part will be different from last time. 
		
		mov eax, dword [ebp+8]              ;; Get pArray 
		mov ebx, eax                        ;; Save that to the base/address register. 
		push dword [ebx+esi*4]              ;; Append *(pArray + (i*4)) = pArray[i] 
			lea eax, [ebp-2048] 
		push eax                            ;; to our local string. 
		call strcat 
		
		lea eax, [ebp-2048]                 ;; Our local string. 
		lea ebx, [ebp-2050]                 ;; The comma string. 
		push ebx 
		push eax 
		call strcat 
		
		inc esi                             ;; i++ 
		jmp .lp1 
	.lp1s: 
	
	lea ebx, [ebp-2048]                     ;; The string. 
	push ebx 
	call strlen 
	add ebx, eax                            ;; The pointer = the address of the NULL terminator of the string. 
	mov byte [ebx-1], 0                     ;; We take out everything after and including the last comma (,). 
	
	;; The last comma in the string comes right before the NULL terminator, so it would make sense 
	;; to take out the last character in the string (and everything after it). 
	
	;; Get the address of our string. 
	lea ebx, [ebp-2048] 
	push dword 0                            ;; Fourth parameter for MessageBoxA is MB_OK (0), for this. 
	call .over1                             ;; This pushes the address of the "next instruction" to the 
	;; stack and jumps to .over1 
		db "Integer Array", 0               ;; And this is the "next instruction" 
		;; We use this string for the title of our message box. 
	.over1: 
	push ebx                                ;; Then we push the address of the string to the stack. 
	push dword 0                            ;; And we don't own a window, so this parameter is NULL. 
	call [MessageBoxA] 
	
	pop esi 
	pop ebx 
	leave 
ret 8 

%include "../inc/str.asm"                   ;; Include the file with the string functions. 

;; The following goes into the data section. 
section .data 
;; This is the array of pointers to the strings: 
the_array: 
dd s1 
dd s2 
dd s3 
dd s4 
dd s5 
dd s6 
dd s7 
dd 0                                        ;; The NULL terminator of the array. 

;; Arrays of integers can have entries with the value 0; arrays of string pointers 
;; can't. Though the strings themselves CAN be empty. 

;; Now we have to define our strings. 
s1 db "ambitious", 0 
s2 db "alphabet", 0 
s3 db "beta", 0 
s4 db "testing", 0 
s5 db "OAKS", 0 
s6 db "Oregon", 0 
s7 db "options", 0 


And the output:
http://forum.codecal...tachmentid=4180

Uh, what's wrong? The JavaScript worked; what's with the assembly code?

There's nothing wrong with the algorithm. The assembly code above sorts the elements in the array according to their memory addresses, but we were expecting it to sort things using the strings' contents.

Our algorithm uses high-level rules to say what needs to be done. The part that needs attention is the low-level assembly code. Low-level code requires more house-keeping. That means for lower-level code we have to do things in a more specific manner.

We won't have to change much in our code, but we'll have to use strcmp() (the string function), this time, instead of just CMP (the instruction); so let's get to it.





The Function We'll Use

Changes From Integer Sort
You know the way we sorted integers? We used something like this:
CMP i1, i2 where i1 is pArray[i] and i2 is pArray[i+1].

But this time, we'll have to change that part of the code a little bit:
CMP strcmp(i1, i2), 0

Not too different, don't you think? Well, we'll have to make it lower level, and obviously that syntax for assembly is not allowed, but same idea.
push dword [ebx+esi*4+4]    ;; pArray[i+1] 
push dword [ebx+esi*4]    ;; pArray[i] 
call strcmp 
cmp eax, 0

In the integer comparison function, we used this:
mov eax, dword [ebx+esi*4]     ;; pArray[i] 
cmp eax, dword [ebx+esi*4+4]        ;; pArray[i+1]

Not very different; after all, this is following the same high-level algorithm.

Let's look at the full code of StringSort(), now.

StringSort() - The Code
Here's the full code - the changes are outlined in orange:
[COLOR=#FF8C00];; StringSort() - Sort strings in an array. [/COLOR]
;;  parameters: 
;;  	A pointer to the array. 
;;  	The size of the array, in double-words. 
[COLOR=#FF8C00]StringSort: [/COLOR]
	enter 0, 0 
	;; Save the important registers. 
	push ebx 
	push esi 
	push edi 
	
	;; Set the base/address to the address of the array. 
	mov eax, dword [ebp+08] 
	mov ebx, eax 
	
	;; loop 1. 
	xor esi, esi                            ;; Clear i (for (i= 0)) 
	.lp1: 
		mov eax, dword [ebp+12]             ;; sArray 
		dec eax                             ;; - 1 
		cmp esi, eax 
		jnl .lp1s                           ;; while (i < (sArray - 1)) 
		
		;; loop 2. 
		xor edi, edi                        ;; Clear j (for (j= 0)) 
		.lp2: 
			mov eax, dword [ebp+12]         ;; sArray 
			dec eax                         ;; - 1 
			cmp edi, eax 
			jnl .lp2s                       ;; while (j < (sArray - 1)) 
			
			;; EBX is pArray 
			[COLOR=#FF8C00]push dword [ebx+edi*4+4]        ;; pArray[j+1] 
			push dword [ebx+edi*4+0]        ;; pArray[j+0] 
			call strcmp 
			cmp eax, 0 [/COLOR]
			jng .lp2over1                   ;; if pArray[j+0] !> pArray[j+1] then skip over this next part. 
				mov eax, dword [ebx+edi*4+0]          ;; pArray[j] 
				xchg eax, dword [ebx+edi*4+4]         ;; Echange the value in EAX with the value at pArray[j+1]. 
				mov dword [ebx+edi*4+0], eax          ;; Save the value from pArray[j+1] to pArray[j+0]. 
			.lp2over1: 
			
			inc edi                         ;; on continue, j++ 
			jmp .lp2 
		.lp2s: 
		
		inc esi                             ;; on continue, i++ 
		jmp .lp1 
	.lp1s: 
	
	;; Restore the important registers. 
	pop edi 
	pop esi 
	pop ebx 
	leave 
ret 8

Okay, now that we have this code ready, let's get to the example program.






Example Program
Our example program is supposed to display a message box with an array of (not alphabetically ordered) strings, order the strings in ascending alphabetical order, and display a message box with that same array again.

Example Program - The Code
;; Define the externs. 
extern MessageBoxA 
extern ExitProcess 

;; Construct our symbol import table. 
import MessageBoxA user32.dll 
import ExitProcess kernel32.dll 

;; This goes into the code section; use 32-bit code, from this point on. 
section .text use32 
;; Start running the program from here. 
..start: 

;; Call the main() function. 
call main 

;; Exit, returning whatever main() returned. 
push eax 
call [ExitProcess] 

main: 
	enter 0, 0 
	
	push dword 7                            ;; The size of the array is 14 DWORDs. 
	push dword the_array                    ;; The pointer to the array. 
	call alert_string_array                        ;; Display a message box with the array. 
	
	push dword 7                            ;; Size of array, in DWORDs. 
	push dword the_array                    ;; Address of array. 
	call StringSort                         ;; Sort the elements inside the array. 
	
	push dword 7 
	push dword the_array 
	call alert_string_array                        ;; Display a message box, again, with the newly-organized array. 
	
	;; Return 0. 
	xor eax, eax 
	leave 
ret 

;; StringSort() - Sort strings in an array. 
;;  parameters: 
;;  	A pointer to the array. 
;;  	The size of the array, in double-words. 
StringSort: 
	enter 0, 0 
	;; Save the important registers. 
	push ebx 
	push esi 
	push edi 
	
	;; Set the base/address to the address of the array. 
	mov eax, dword [ebp+08] 
	mov ebx, eax 
	
	;; loop 1. 
	xor esi, esi                            ;; Clear i (for (i= 0)) 
	.lp1: 
		mov eax, dword [ebp+12]             ;; sArray 
		dec eax                             ;; - 1 
		cmp esi, eax 
		jnl .lp1s                           ;; while (i < (sArray - 1)) 
		
		;; loop 2. 
		xor edi, edi                        ;; Clear j (for (j= 0)) 
		.lp2: 
			mov eax, dword [ebp+12]         ;; sArray 
			dec eax                         ;; - 1 
			cmp edi, eax 
			jnl .lp2s                       ;; while (j < (sArray - 1)) 
			
			;; EBX is pArray 
			push dword [ebx+edi*4+4]        ;; pArray[j+1] 
			push dword [ebx+edi*4+0]        ;; pArray[j+0] 
			call strcmp 
			cmp eax, 0 
			jng .lp2over1                   ;; if pArray[j+0] !> pArray[j+1] then skip over this next part. 
				mov eax, dword [ebx+edi*4+0]          ;; pArray[j] 
				xchg eax, dword [ebx+edi*4+4]         ;; Echange the value in EAX with the value at pArray[j+1]. 
				mov dword [ebx+edi*4+0], eax          ;; Save the value from pArray[j+1] to pArray[j+0]. 
			.lp2over1: 
			
			inc edi                         ;; on continue, j++ 
			jmp .lp2 
		.lp2s: 
		
		inc esi                             ;; on continue, i++ 
		jmp .lp1 
	.lp1s: 
	
	;; Restore the important registers. 
	pop edi 
	pop esi 
	pop ebx 
	leave 
ret 8 

;; alert_string_array() - Display a message box with an array of strings. 
;; parameters: pArray, sArray 
alert_string_array: 
	;; We'll need enough space for a string; we'll use 2048 bytes. 
	;; We'll use 2 more bytes for our comma (,) string. 
	enter 2048, 0 
	push word 44                            ;; This will push ',\0' to the stack, which is a one-character 
	;; string with a comma (,). 
	push ebx 
	push esi 
	
	;; Initialize our string to empty (""). 
	mov byte [ebp-2048], 0 
	
	xor esi, esi                            ;; i= 0 
	.lp1: 
		mov eax, esi                        ;; i 
		cmp eax, dword [ebp+012]            ;; sArray 
		jnl .lp1s                           ;; while i < sArray 
		
		;; This part will be different from last time. 
		
		mov eax, dword [ebp+8]              ;; Get pArray 
		mov ebx, eax                        ;; Save that to the base/address register. 
		push dword [ebx+esi*4]              ;; Append *(pArray + (i*4)) = pArray[i] 
			lea eax, [ebp-2048] 
		push eax                            ;; to our local string. 
		call strcat 
		
		lea eax, [ebp-2048]                 ;; Our local string. 
		lea ebx, [ebp-2050]                 ;; The comma string. 
		push ebx 
		push eax 
		call strcat 
		
		inc esi                             ;; i++ 
		jmp .lp1 
	.lp1s: 
	
	lea ebx, [ebp-2048]                     ;; The string. 
	push ebx 
	call strlen 
	add ebx, eax                            ;; The pointer = the address of the NULL terminator of the string. 
	mov byte [ebx-1], 0                     ;; We take out everything after and including the last comma (,). 
	
	;; The last comma in the string comes right before the NULL terminator, so it would make sense 
	;; to take out the last character in the string (and everything after it). 
	
	;; Get the address of our string. 
	lea ebx, [ebp-2048] 
	push dword 0                            ;; Fourth parameter for MessageBoxA is MB_OK (0), for this. 
	call .over1                             ;; This pushes the address of the "next instruction" to the 
	;; stack and jumps to .over1 
		db "Integer Array", 0               ;; And this is the "next instruction" 
		;; We use this string for the title of our message box. 
	.over1: 
	push ebx                                ;; Then we push the address of the string to the stack. 
	push dword 0                            ;; And we don't own a window, so this parameter is NULL. 
	call [MessageBoxA] 
	
	pop esi 
	pop ebx 
	leave 
ret 8 

%include "../inc/str.asm"                   ;; Include the file with the string functions. 

;; The following goes into the data section. 
section .data 
;; This is the array of pointers to the strings: 
the_array: 
dd s1 
dd s2 
dd s3 
dd s4 
dd s5 
dd s6 
dd s7 
dd 0                                        ;; The NULL terminator of the array. 

;; Arrays of integers can have entries with the value 0; arrays of string pointers 
;; can't. Though the strings themselves CAN be empty. 

;; Now we have to define our strings. 
s1 db "ambitious", 0 
s2 db "alphabet", 0 
s3 db "beta", 0 
s4 db "testing", 0 
s5 db "OAKS", 0 
s6 db "Oregon", 0 
s7 db "options", 0 


The array, this time, instead of having any type of integers, has (RVA) addresses of strings. Unlike last time, when the integers could contain any values, we can only have valid addresses; a 0 would mean a NULL pointer, which can - in some cases - tell us that it's the end of the array, already (we still use the sArray parameter, though, for consistency with the last tutorial).

Example Program - The Output
There should be two message boxes in a row, just like last time. The first message box has the array of the strings in their original order; the second message box has the strings in alphabetical, ascending, order.

Here's what the second message box might look like:
http://forum.codecal...tachmentid=4181










First Tutorial:
Intro To Win32 Assembly, Using NASM

Previous Tutorial:
Integer Array Sorting and Displaying Algorithms

Attached Thumbnails

  • StringSort_output_cc.PNG
  • IntegerSort_js_output_cc.PNG
  • StringSort_1st_output_cc.PNG

  • 0





Also tagged with one or more of these keywords: innerHTML, string, array, assembly