
Assembly, Handling Bugs In Your Programs (Win32, NASM)

assembly


#1 RhetoricalRuvim

RhetoricalRuvim

    JavaScript Programmer

  • Expert Member
  • 1307 posts
  • Location:C:\Countries\US
  • Programming Language:C, Java, C++, PHP, Python, JavaScript

Posted 19 August 2011 - 07:50 PM

Almost every time you write an intermediate or advanced program or routine, it will have bugs. No, not actual bugs with six legs, but bugs as in errors. This happens even to experienced programmers. The point is not to write programs without errors, which is nearly impossible. The point is to be able to find and fix those errors.

Have you seen the example program in the last tutorial? Nice program, huh? Well, I didn't just sit down at my computer and magically write it. When I first wrote the code, it had errors. The program failed (Windows showed a "MemoryAlphabet.exe has stopped working" dialog) with exception code C0000005. That code is an access violation: the program tried to read or write memory it isn't allowed to access.

Oh No! It's A Bug

http://forum.codecal...tachmentid=4154

The exception offset was 0x1172 (hexadecimal).

Here's the code for the program in the screenshot above, as it was before the fixes:
;; Define the externs. 
extern MessageBoxA 
extern ExitProcess 
extern GlobalAlloc 
extern GlobalFree 

;; Construct our symbol import table. 
import MessageBoxA user32.dll 
import ExitProcess kernel32.dll 
import GlobalAlloc kernel32.dll 
import GlobalFree kernel32.dll 

;; This is the code section; use 32-bit code. 
section .text use32 
;; Start execution here. 
..start: 

;; Call the main() function. 
call main 

;; Exit, returning whatever main() returned. 
push eax 
call [ExitProcess] 

main: 
	enter 4, 0 
	
	push dword 0                      ;; Don't put spaces between every letter. 
	push dword 122                    ;; Stop after 'z'. 
	push dword 65                     ;; Start at 'A'. 
	call abc 
	mov dword [ebp-4], eax            ;; Save the pointer to the string. 
	
	;; Display a message box with the new string. 
	push dword 0 
	push dword the_title 
	push dword [ebp-4] 
	push dword 0 
	call [MessageBoxA] 
	
	;; Free the buffer for the string. 
	push dword [ebp-4] 
	call [GlobalFree] 
	
	push dword 0                      ;; Don't put spaces between every letter. 
	push dword 80                     ;; Stop after 'P'. 
	push dword 70                     ;; Start at 'F'. 
	call abc 
	mov dword [ebp-4], eax            ;; Save the pointer to the string. 
	
	;; Display a message box with the new string. 
	push dword 0 
	push dword the_title 
	push eax                          ;; Since EAX is already equal to [ebp-4], why not just use EAX? 
	push dword 0 
	call [MessageBoxA] 
	
	;; Free the buffer. 
	push dword [ebp-4] 
	call [GlobalFree] 
	
	push dword 1                      ;; Put spaces between every letter. 
	push dword 122                    ;; Stop at 'z'. 
	push dword 97                     ;; Start at 'a'. 
	call abc 
	mov dword [ebp-4], eax            ;; Save the pointer to the string. 
	
	;; Display a message box with the new string. 
	push dword 0 
	push dword the_title 
	push dword [ebp-4] 
	push dword 0 
	call [MessageBoxA] 
	
	;; Free the string buffer. 
	push dword [ebp-4] 
	call [GlobalFree] 
	
	;; Return 0. 
	xor eax, eax 
	leave 
ret 

;; abc() - returns a string with the alphabet. 
;; parameters: 
;;  	the letter to start from 
;;  	the letter to stop after 
;;  	whether to separate each letter from another letter with a space 
;; return value: 
;;  	the pointer to the new string with the letters 
abc: 
	enter 4, 0 
	push dword 0 
	push ebx                          ;; Save EBX. 
	
	;; First of all, we need to run a scan of the letters, without saving the values. 
	xor ebx, ebx                      ;; This should tell the loop not to save the values. 
	call .the_loop                    ;; Call the letter scan loop. 
	
	;; The total size of the new string should be equal to the value returned 
	;; by .the_loop in ECX. 
	;; We need to save that number. 
	mov eax, ecx 
	mov dword [ebp-8], eax 
	
	;; Now we need to ask Windows to allocate some memory for us. 
	push dword [ebp-8]                ;; Pass the size number to the function. 
	inc dword [esp]                   ;; Increment the value of that number, before we call the function. 
	                                  ;; This is so that there's an extra byte, in the buffer, for a NULL terminator. 
	push dword 0                      ;; No flags, for now. 
	call [GlobalAlloc]                ;; Call the Windows global memory allocation API function. 
	mov dword [ebp-4], eax            ;; Save the pointer that GlobalAlloc() returned. 
	
	mov ebx, eax                      ;; Also use that pointer for the .the_loop function. 
	call .the_loop                    ;; Now we run the loop and save everything to the new string. 
	
	mov eax, dword [ebp-4]            ;; We would return the pointer to the new string. 
	
	jmp .finish                       ;; We'll need space to define our nested function, 
	;; so we'll have to jump over the nested function 
	;; to the .finish label. 
	
	.the_loop: 
		;; EBX is the pointer to the buffer for the new string. 
		
		push edi                      ;; Save EDI. 
		
		xor ecx, ecx                  ;; We're supposed to be counting how many characters the new string would have. 
		;; We start counting from 0, for now. 
		
		mov eax, dword [ebp+12]       ;; Get the letter to stop at. 
		cmp eax, 122 
		jng .the_loop_over1 
			;; If the letter to stop after is greater than 'z', set it to 'z'. 
			mov eax, 122 
		.the_loop_over1: 
		cmp eax, 65 
		jnl .the_loop_over2 
			;; If the letter to stop after is less than 'A', set it to 'A'. 
			mov eax, 65 
		.the_loop_over2: 
		mov edi, eax                  ;; Save that letter in EDI. 
		
		mov eax, dword [ebp+08]       ;; Get the letter to start from. 
		mov edx, eax 
		
		.the_loop1: 
			;; Check if it's time to stop the loop yet. 
			cmp edx, edi              ;; Compare the current letter to the letter to stop after. 
			jg .the_loop1s            ;; If the current letter is greater, break the loop. 
			
			cmp edx, 65 
			jnl .the_loop1over1 
				;; If the current letter is less than 'A', set it to 'A'. 
				mov edx, 65 
			.the_loop1over1: 
			
			cmp edx, 122 
			jng .the_loop1over2 
				;; If the current letter is greater than 'z', reset it to 'A'. 
				mov edx, 65 
			.the_loop1over2: 
			
			cmp ebx, 0 
			jz .the_loop1over3 
				;; If EBX is not a NULL pointer (meaning if we're supposed to save the output), 
				;; do the following: 
				
				;; Save the current character. 
				mov byte [ebx], al 
				
				;; Increment the pointer. 
				inc ebx 
				
				cmp dword [ebp+16], 0 ;; Check if the third parameter is FALSE. 
				jz .the_loop1over3    ;; If so, skip over the space-adding part of the code. 
				
				;; Otherwise, save a space to where the pointer is pointing. 
				mov byte [ebx], 32 
				
				;; And increment the pointer. 
				inc ebx 
			.the_loop1over3: 
			
			cmp dword [ebp+16], 0     ;; Check if the third parameter is FALSE. 
			jz .the_loop1over4 
			
			;; If it's TRUE, increment the count an extra time. 
			inc ecx 
			
			.the_loop1over4: 
			
			;; In any case, we'll still need to increment the count. 
			inc ecx 
			
			;; Increment the current letter. 
			inc edx 
			
			;; Continue the loop. 
			jmp .the_loop1 
		.the_loop1s: 
		
		;; If put spaces between the letters. 
		cmp dword [ebp+16], 0 
		jz .the_loop_over3 
			;; Since the last letter doesn't need a following space, we'll decrement the character count. 
			dec ecx 
			
			;; But we still put a space for the last character, so we'll have to put a NULL to that position. 
			mov byte [ebx-1], 0 
			
			;; .the_loop_over3 is for not-putting-spaces code. 
			jmp .the_loop_over4 
		.the_loop_over3: 
			;; Save a NULL character at [pointer]. 
			mov byte [ebx], 0 
		.the_loop_over4: 
		
		pop edi                       ;; Restore EDI. 
	ret 0 
	
	.finish: 
	
	pop ebx                           ;; Restore EBX. 
	leave 
ret 12 

;; The data section. 
section .data 
the_title                                             db "Memory Alphabet Example", 0 

;; We don't have to define every section in our source code; NASM would do the defining even if we don't.

I needed to know exactly where the error happened, so I inserted a call to MessageBoxA() right after these lines:
;; First of all, we need to run a scan of the letters, without saving the values. 
	xor ebx, ebx                      ;; This should tell the loop not to save the values. 
	call .the_loop                    ;; Call the letter scan loop.

I assembled and linked the program again, and the message box didn't appear. That meant the error happened before the program reached the MessageBoxA() call, most likely inside that first call to .the_loop.
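The same checkpoint trick works in other languages too. You put a visible marker at a suspected point. If the marker never shows up, the crash happened before it. Below is a minimal sketch in C. I picked C only because it runs anywhere; the tutorial's own code is NASM. `checkpoint` and `CHECKPOINT` are hypothetical names, and printing to stderr stands in for the message box.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: the C counterpart of dropping a MessageBoxA() call
   into the code as a checkpoint. */
static int checkpoints_reached = 0;

static void checkpoint(const char *file, int line, const char *label)
{
    assert(label != NULL);
    checkpoints_reached++;
    fprintf(stderr, "checkpoint %d: %s (%s:%d)\n",
            checkpoints_reached, label, file, line);
    /* Make sure the line is written out before a possible crash. */
    fflush(stderr);
}

/* Records where in the source the checkpoint was placed. */
#define CHECKPOINT(label) checkpoint(__FILE__, __LINE__, (label))
```

Place `CHECKPOINT("after first scan");` right after the suspect call. If that line never shows up in the output, the crash happened earlier.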

Locating The Bug
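As you have seen, the exception offset is 0x1172. But what is that offset measured from? Windows measures it from the base address where the .exe is loaded, so it is a relative virtual address (RVA). That alone doesn't tell us where our own code begins.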

To find that out, I inserted this line right after the ..start label:
mov cs, ax

The MOV instruction can't be used to load CS. On 32-bit x86 processors, trying to do so raises an invalid-opcode exception, so this line is guaranteed to fail the moment it runs.

http://forum.codecal...tachmentid=4155

The exception offset this time is 0x1000. We know the faulting instruction is the one at ..start, so the RVA of the ..start label must be 0x1000. Subtracting that from the original exception offset gives 0x1172 - 0x1000 = 0x172. That is the distance from ..start to the instruction that raised the original exception (not the 'mov cs, ax' one).

But where in the .exe file is that instruction? I could have worked that out from the file's PE headers, but I didn't. Instead, I replaced the 'mov cs, ax' instruction with a marker string, 'db "the code start"', assembled and linked the program, and opened the .exe file with Notepad++.
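For reference, here is a minimal C sketch of the PE-header route I skipped. It assumes a well-formed image that has already been read into memory. It uses the standard PE/COFF field offsets: e_lfanew at 0x3C, the "PE\0\0" signature, the 20-byte COFF header, and 40-byte section headers. The function and helper names are hypothetical. The code finds the section that contains the RVA and computes PointerToRawData + (RVA - VirtualAddress).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers for PE header fields. */
static uint32_t read_u16(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}

static uint32_t read_u32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Convert an RVA (such as the exception offset Windows reports) into a
   file offset by locating the section that contains it. Returns -1 if the
   image is malformed or the RVA is not backed by bytes in the file. */
static long rva_to_file_offset(const unsigned char *img, size_t len, uint32_t rva)
{
    assert(img != NULL);
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;
    uint32_t pe = read_u32(img + 0x3C);                 /* e_lfanew */
    if ((size_t)pe + 24 > len || img[pe] != 'P' || img[pe + 1] != 'E' ||
        img[pe + 2] != 0 || img[pe + 3] != 0)
        return -1;
    uint32_t nsections = read_u16(img + pe + 6);        /* NumberOfSections */
    uint32_t opt_size  = read_u16(img + pe + 20);       /* SizeOfOptionalHeader */
    size_t sec = (size_t)pe + 24 + opt_size;            /* first section header */

    for (uint32_t i = 0; i < nsections; i++, sec += 40) {
        if (sec + 40 > len)
            return -1;
        uint32_t vsize   = read_u32(img + sec + 8);     /* VirtualSize */
        uint32_t vaddr   = read_u32(img + sec + 12);    /* VirtualAddress */
        uint32_t rawsize = read_u32(img + sec + 16);    /* SizeOfRawData */
        uint32_t rawptr  = read_u32(img + sec + 20);    /* PointerToRawData */
        uint32_t span = vsize > rawsize ? vsize : rawsize;
        if (rva >= vaddr && rva - vaddr < span) {
            if (rva - vaddr >= rawsize)
                return -1;          /* zero-filled tail, not stored in the file */
            return (long)(rawptr + (rva - vaddr));
        }
    }
    return -1;
}
```

Assume ..start sits at the beginning of a .text section with RVA 0x1000 and file offset 0x400, as the numbers in this tutorial suggest. In that case an RVA of 0x1172 maps to 0x572.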

Then I pressed Ctrl+F to find text, typed "the code start" (without the quotes), and pressed Enter to search for it.

http://forum.codecal...tachmentid=4156

Okay, so there's the "the code start" text, but what now?

Then I closed the 'Find' window, pressed Ctrl+G to open the 'Go To...' window, and selected the "Offset" radio button.

http://forum.codecal...tachmentid=4157

So the file offset of ..start is 1024 (or 0x400).

You can close the 'Go To...' window now.

Then I added 0x400 to 0x172 and got 0x572, which is 1394 in decimal. That is the file offset of the instruction that raised the exception.
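Here is the whole hand calculation as one formula. It takes the file offset of ..start and adds the instruction's distance from ..start, which is the exception offset minus the RVA of ..start:

$$
\text{file offset} = \underbrace{\mathtt{0x400}}_{\text{..start in file}} + \big(\underbrace{\mathtt{0x1172}}_{\text{exception offset}} - \underbrace{\mathtt{0x1000}}_{\text{RVA of ..start}}\big) = \mathtt{0x400} + \mathtt{0x172} = \mathtt{0x572} = 1394
$$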

Fixing The Bug(s)

Then I removed the 'db "the code start"' line and assembled and linked the program again. I opened the .exe file in Notepad++ and used Ctrl+G to go to offset 1394.
I could see that the 5 bytes right before that offset all belonged to one instruction. So I deleted everything before those 5 bytes, keeping only that previous instruction in front of the one I was after. I also deleted everything beyond the next few bytes after that offset.

I saved the file and disassembled it with ndisasm, the disassembler that comes with the NASM package.

http://forum.codecal...tachmentid=4158

I kept the previous instruction when cropping the file, so the second instruction in the disassembly is the one that raised the exception. That disassembled code looked very familiar, and I recognized which part of the source it came from. Looking at the .asm file, I saw the problem: I forgot to check whether EBX is 0. After the loop, the code writes the NULL terminator through EBX. During the first (counting) pass EBX is 0, so that write goes to address 0, which causes the access violation.
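To make the first bug concrete, here is a rough C analogue of abc()'s two-pass design. It is my own sketch, not a translation given in the tutorial. `alphabet_pass` and `alphabet` are hypothetical names, and the 'A'/'z' clamping is left out. The first pass runs with a NULL buffer and only counts. The second pass writes. Every write, including the terminator, is guarded by a NULL check.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One pass over the letters. With buf == NULL it only counts (like EBX == 0
   in the assembly); otherwise it also writes. Returns the string length. */
static size_t alphabet_pass(char *buf, int first, int last, int spaces)
{
    size_t count = 0;
    for (int c = first; c <= last; c++) {
        if (buf != NULL) {
            buf[count] = (char)c;
            if (spaces)
                buf[count + 1] = ' ';
        }
        count += spaces ? 2 : 1;
    }
    if (spaces && count > 0)
        count--;                    /* no space after the last letter */
    if (buf != NULL)
        buf[count] = '\0';          /* the NULL check the assembly was missing */
    return count;
}

/* Returns a malloc'd string (caller frees), or NULL if allocation fails. */
static char *alphabet(int first, int last, int spaces)
{
    size_t len = alphabet_pass(NULL, first, last, spaces);  /* pass 1: count */
    char *buf = malloc(len + 1);                            /* +1 for '\0' */
    if (buf == NULL)
        return NULL;
    size_t written = alphabet_pass(buf, first, last, spaces); /* pass 2: fill */
    assert(written == len);         /* both passes must agree on the length */
    (void)written;
    return buf;
}
```

Remove the `buf != NULL` test around the terminator and the counting pass writes through a NULL pointer. That is the same mistake as `mov byte [ebx], 0` with EBX = 0.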

After I fixed that, another problem came up: instead of saying "FGHIJKLMNOP", it said "FFFFFFFFFFF".

After looking at the code some more, I figured out that problem too. I forgot to copy the current letter from EDX to EAX before storing AL, because I assumed EAX already held the current letter. That was only half true. EAX held the first letter of the sequence, while EDX held the current one. All I had to do to fix it was add a 'mov eax, edx' instruction before the store.
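The second bug can be reproduced in a few lines of C. In this hypothetical sketch, `first` plays the role of EAX, which is loaded once with the starting letter. `cur` plays the role of EDX, the letter that actually advances. Storing `first` instead of `cur` produces exactly the repeated-letter output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill 'out' with the letters first..last. With buggy != 0 it stores the
   starting letter every time, mimicking the missing 'mov eax, edx'. */
static void fill_letters(char *out, int first, int last, int buggy)
{
    assert(out != NULL);
    size_t i = 0;
    for (int cur = first; cur <= last; cur++) {
        int ch = buggy ? first      /* like storing AL without 'mov eax, edx' */
                       : cur;       /* like 'mov eax, edx' before the store */
        out[i++] = (char)ch;
    }
    out[i] = '\0';
}
```

Calling `fill_letters(buf, 'F', 'P', 1)` gives eleven F's, and `fill_letters(buf, 'F', 'P', 0)` gives "FGHIJKLMNOP". You can compare the results with `strcmp`.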

The New, Correct Code

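Here's what the code looked like after those changes. The additions are a 'mov eax, edx' before each character is stored, and a 'cmp ebx, 0' / 'jz .the_loop_over4' check before each NULL terminator is written: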
;; Define the externs. 
extern MessageBoxA 
extern ExitProcess 
extern GlobalAlloc 
extern GlobalFree 

;; Construct our symbol import table. 
import MessageBoxA user32.dll 
import ExitProcess kernel32.dll 
import GlobalAlloc kernel32.dll 
import GlobalFree kernel32.dll 

;; This is the code section; use 32-bit code. 
section .text use32 
;; Start execution here. 
..start: 

;; Call the main() function. 
call main 

;; Exit, returning whatever main() returned. 
push eax 
call [ExitProcess] 

main: 
	enter 4, 0 
	
	push dword 0                      ;; Don't put spaces between every letter. 
	push dword 122                    ;; Stop after 'z'. 
	push dword 65                     ;; Start at 'A'. 
	call abc 
	mov dword [ebp-4], eax            ;; Save the pointer to the string. 
	
	;; Display a message box with the new string. 
	push dword 0 
	push dword the_title 
	push dword [ebp-4] 
	push dword 0 
	call [MessageBoxA] 
	
	;; Free the buffer for the string. 
	push dword [ebp-4] 
	call [GlobalFree] 
	
	push dword 0                      ;; Don't put spaces between every letter. 
	push dword 80                     ;; Stop after 'P'. 
	push dword 70                     ;; Start at 'F'. 
	call abc 
	mov dword [ebp-4], eax            ;; Save the pointer to the string. 
	
	;; Display a message box with the new string. 
	push dword 0 
	push dword the_title 
	push eax                          ;; Since EAX is already equal to [ebp-4], why not just use EAX? 
	push dword 0 
	call [MessageBoxA] 
	
	;; Free the buffer. 
	push dword [ebp-4] 
	call [GlobalFree] 
	
	push dword 1                      ;; Put spaces between every letter. 
	push dword 122                    ;; Stop at 'z'. 
	push dword 97                     ;; Start at 'a'. 
	call abc 
	mov dword [ebp-4], eax            ;; Save the pointer to the string. 
	
	;; Display a message box with the new string. 
	push dword 0 
	push dword the_title 
	push dword [ebp-4] 
	push dword 0 
	call [MessageBoxA] 
	
	;; Free the string buffer. 
	push dword [ebp-4] 
	call [GlobalFree] 
	
	;; Return 0. 
	xor eax, eax 
	leave 
ret 

;; abc() - returns a string with the alphabet. 
;; parameters: 
;;  	the letter to start from 
;;  	the letter to stop after 
;;  	whether to separate each letter from another letter with a space 
;; return value: 
;;  	the pointer to the new string with the letters 
abc: 
	enter 4, 0 
	push dword 0 
	push ebx                          ;; Save EBX. 
	
	;; First of all, we need to run a scan of the letters, without saving the values. 
	xor ebx, ebx                      ;; This should tell the loop not to save the values. 
	call .the_loop                    ;; Call the letter scan loop. 
	
	;; The total size of the new string should be equal to the value returned 
	;; by .the_loop in ECX. 
	;; We need to save that number. 
	mov eax, ecx 
	mov dword [ebp-8], eax 
	
	;; Now we need to ask Windows to allocate some memory for us. 
	push dword [ebp-8]                ;; Pass the size number to the function. 
	inc dword [esp]                   ;; Increment the value of that number, before we call the function. 
	                                  ;; This is so that there's an extra byte, in the buffer, for a NULL terminator. 
	push dword 0                      ;; No flags, for now. 
	call [GlobalAlloc]                ;; Call the Windows global memory allocation API function. 
	mov dword [ebp-4], eax            ;; Save the pointer that GlobalAlloc() returned. 
	
	mov ebx, eax                      ;; Also use that pointer for the .the_loop function. 
	call .the_loop                    ;; Now we run the loop and save everything to the new string. 
	
	mov eax, dword [ebp-4]            ;; We would return the pointer to the new string. 
	
	jmp .finish                       ;; We'll need space to define our nested function, 
	;; so we'll have to jump over the nested function 
	;; to the .finish label. 
	
	.the_loop: 
		;; EBX is the pointer to the buffer for the new string. 
		
		push edi                      ;; Save EDI. 
		
		xor ecx, ecx                  ;; We're supposed to be counting how many characters the new string would have. 
		;; We start counting from 0, for now. 
		
		mov eax, dword [ebp+12]       ;; Get the letter to stop at. 
		cmp eax, 122 
		jng .the_loop_over1 
			;; If the letter to stop after is greater than 'z', set it to 'z'. 
			mov eax, 122 
		.the_loop_over1: 
		cmp eax, 65 
		jnl .the_loop_over2 
			;; If the letter to stop after is less than 'A', set it to 'A'. 
			mov eax, 65 
		.the_loop_over2: 
		mov edi, eax                  ;; Save that letter in EDI. 
		
		mov eax, dword [ebp+08]       ;; Get the letter to start from. 
		mov edx, eax 
		
		.the_loop1: 
			;; Check if it's time to stop the loop yet. 
			cmp edx, edi              ;; Compare the current letter to the letter to stop after. 
			jg .the_loop1s            ;; If the current letter is greater, break the loop. 
			
			cmp edx, 65 
			jnl .the_loop1over1 
				;; If the current letter is less than 'A', set it to 'A'. 
				mov edx, 65 
			.the_loop1over1: 
			
			cmp edx, 122 
			jng .the_loop1over2 
				;; If the current letter is greater than 'z', reset it to 'A'. 
				mov edx, 65 
			.the_loop1over2: 
			
			cmp ebx, 0 
			jz .the_loop1over3 
				;; If EBX is not a NULL pointer (meaning if we're supposed to save the output), 
				;; do the following: 
				
				;; Get the current character. (new)
				mov eax, edx                  ;; (new)
				
				;; Save the current character. 
				mov byte [ebx], al 
				
				;; Increment the pointer. 
				inc ebx 
				
				cmp dword [ebp+16], 0 ;; Check if the third parameter is FALSE. 
				jz .the_loop1over3    ;; If so, skip over the space-adding part of the code. 
				
				;; Otherwise, save a space to where the pointer is pointing. 
				mov byte [ebx], 32 
				
				;; And increment the pointer. 
				inc ebx 
			.the_loop1over3: 
			
			cmp dword [ebp+16], 0     ;; Check if the third parameter is FALSE. 
			jz .the_loop1over4 
			
			;; If it's TRUE, increment the count an extra time. 
			inc ecx 
			
			.the_loop1over4: 
			
			;; In any case, we'll still need to increment the count. 
			inc ecx 
			
			;; Increment the current letter. 
			inc edx 
			
			;; Continue the loop. 
			jmp .the_loop1 
		.the_loop1s: 
		
		;; If put spaces between the letters. 
		cmp dword [ebp+16], 0 
		jz .the_loop_over3 
			;; Since the last letter doesn't need a following space, we'll decrement the character count. 
			dec ecx 
			
			;; If EBX is a NULL pointer, skip over this part of the code. (new)
			cmp ebx, 0                ;; (new)
			jz .the_loop_over4        ;; (new)
			
			;; But we still put a space for the last character, so we'll have to put a NULL to that position. 
			mov byte [ebx-1], 0 
			
			;; .the_loop_over3 is for not-putting-spaces code. 
			jmp .the_loop_over4 
		.the_loop_over3: 
			;; If EBX is a NULL pointer, skip over this part of the code. (new)
			cmp ebx, 0                ;; (new)
			jz .the_loop_over4        ;; (new)
			
			;; Save a NULL character at [pointer]. 
			mov byte [ebx], 0 
		.the_loop_over4: 
		
		pop edi                       ;; Restore EDI. 
	ret 0 
	
	.finish: 
	
	pop ebx                           ;; Restore EBX. 
	leave 
ret 12 

;; The data section. 
section .data 
the_title                                             db "Memory Alphabet Example", 0 

;; We don't have to define every section in our source code; NASM would do the defining even if we don't.

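Later I made one more change: checking whether the current character is actually a letter. The range 'A' to 'z' also includes the six punctuation characters [ \ ] ^ _ and ` that sit between 'Z' and 'a'. That part isn't important here, though. What matters is that we found and fixed the bugs.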
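The tutorial doesn't show that letter check, so the following is only a guess at one way to do it, in C. It skips any code between 'A' and 'z' that isn't in 'A'..'Z' or 'a'..'z'. `is_ascii_letter` and `letters_only` are hypothetical names, and the range test assumes ASCII.

```c
#include <assert.h>
#include <stddef.h>

/* ASCII-only letter test, written out explicitly (rather than with
   isalpha()) so the result does not depend on the C locale. */
static int is_ascii_letter(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Like the alphabet routine, but skips the punctuation between 'Z' and 'a'.
   Returns the number of letters written to 'out'. */
static size_t letters_only(char *out, int first, int last)
{
    assert(out != NULL);
    size_t n = 0;
    for (int c = first; c <= last; c++)
        if (is_ascii_letter(c))
            out[n++] = (char)c;
    out[n] = '\0';
    return n;
}
```

Running over 'A'..'z' this way yields 52 letters instead of 58 characters.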

To conclude: don't be afraid of bugs in your code. Even professional programmers make mistakes. As long as you can find and fix the errors, you'll be fine.










First Tutorial:
Intro To Win32 Assembly, Using NASM

Previous Tutorial:
Using Memory Allocation and Alphabet Algorithm

Next Tutorial:
Integer Array Sorting and Displaying Algorithms


Edited by RhetoricalRuvim, 20 August 2011 - 07:02 PM.


#2 dargueta

dargueta

    I chown trolls.

  • Moderator
  • 4854 posts
  • Programming Language:C, Java, C++, PHP, Python, JavaScript, Perl, Assembly, Bash, Others
  • Learning:Objective-C

Posted 26 August 2011 - 08:59 PM

It's a lot easier if you use objdump or a similar tool. You won't have to edit binaries, count addresses, or anything like that.

sudo rm -rf / && echo $'Sanitize your inputs!'


#3 RhetoricalRuvim


Posted 26 August 2011 - 09:16 PM

A method I just discovered, a little after writing this tutorial, is using NASM's ndisasm like this:
\nasm\ndisasm -b32 -e1024 the_file.exe >> the_file_debug.asm

Here, -b32 tells ndisasm to decode 32-bit code, and -e1024 makes it skip the first 1024 bytes of the file (everything before ..start). That way ndisasm shows which instruction is at which address, and no manual file-offset arithmetic is needed. The ">> the_file_debug.asm" part redirects ndisasm's output to the_file_debug.asm instead of printing it to the screen. Note that ">>" appends to the file if it already exists, while ">" would overwrite it. The next step is to open the_file_debug.asm in the editor you use for .asm files (I use Notepad++ for just about any code file).

#4 dargueta


Posted 26 August 2011 - 10:08 PM

Might want to rewrite this tutorial in that case.



#5 RhetoricalRuvim


Posted 26 August 2011 - 10:24 PM

The method in this tutorial is how I actually handled the error while writing the program for the previous tutorial. That said, I could either rewrite this one or write a new one when I have time.

Thanks for the feedback.




