Jump to content

Check out our Community Blogs

Register and join over 40,000 other developers!

Recent Status Updates

View All Updates

- - - - -

Assembly, String-Integer Algorithms (Win32, NASM)

string assembly

  • Please log in to reply
No replies to this topic

#1 RhetoricalRuvim


    JavaScript Programmer

  • Expert Member
  • PipPipPipPipPipPipPip
  • 1311 posts
  • Location:C:\Countries\US
  • Programming Language:C, Java, C++, PHP, Python, JavaScript

Posted 17 August 2011 - 06:13 PM

Different kinds of variables have different types. A type tells about how to handle the variable. For example, an integer is handle differently from a float, which is handled differently from a string pointer.

One type of variable is an integer, which is a whole number that can be either positive or negative (whole numbers can't be negative; integers can). And for the user to understand what you're saying, when working with integers, you need to first convert the integer to a string. Same applies for your program: in order to work with the number, you need to convert it to an integer.

Of course, in assembly language, it is possible to use a pointer as an integer, an integer as a float, etc., but you should only do such a thing if you know what you're doing, or else you might get unpredictable results.

Let's take a look at how integers are represented.

Unsigned Integers
Unsigned integers are pretty simple. When looking at one (bit by bit), it's just a number, represented with binary digits.

65 would be 01000001
70 would be 01000110
24 would be 00011000

Representation Of (Signed) Integers - Two's Complement
Signed integers are similar to unsigned integers. To negate an integer (invert it's sign), you need to complement all its bits and then increment it.

For example, the integer 65 (01000001), negated, would be -65 (10111111 or ((NOT 01000001) + 1 = 10111110 + 1)).

-70 would be 10111010
-24 would be 11101110

This is for 8-bit integers; so what about for 16- or 32-bit integers?

Integer Precision - 8-Bit vs 16-Bit vs 32-Bit (vs 64-bit, ...)
In general, the formula for negating a number is the same as described above.

There is another way, but I'll explain that later.

A 16-bit 65 would be 0000000001000001;
a 32-bit 65 would be 00000000000000000000000001000001;
a 64-bit 65 would be 0000000000000000000000000000000000000000000000000000000001000001;
and so on...

You can also negate 16-bit numbers and 32-bit numbers and so on...

So a 16-bit -65 would be 1111111110111111;

There are also integers of arbitrary precision, but we won't get into those right now.

Number Characters Within Number Strings
The characters used for representing number digits are:
'0' ASCII 48 binary 00110000
'1' ASCII 49 binary 00110001
'2' ASCII 50 binary 00110010
'3' ASCII 51 binary 00110011
'4' ASCII 52 binary 00110100
'5' ASCII 53 binary 00110101
'6' ASCII 54 binary 00110110
'7' ASCII 55 binary 00110111
'8' ASCII 56 binary 00111000
'9' ASCII 57 binary 00111001

As you can see, the ASCII codes range from 48 to 57 (inclusive). We'll need this information for our integer-string algorithms.

DIV - Divide
Divides the value in the accumulator by the first operand.


One thing you have to remember is to make sure that both the high-order and the low-order values of the dividend are what you want them to be. For example, to divide whatever is in EAX by whatever is in ECX, you would use 'DIV ECX', but the DIV instruction, for a 32-bit operand, also looks at EDX, so you would have to 'XOR EDX, EDX' (or use some other method to set it to 0), for unsigned integers, or sign-extend the 64-bit integer, for signed integers, in order for the division to work properly.

I once was using the DIV instruction in one of my algorithms and I forgot to clear EDX before the division. Even worse, it was inside a loop, and the loop termination depended on the result of the division. The loop never ended, and I had to forcefully end the process. I figured out what the problem was, a little later, and fixed it. Well, we're all people and we all sometimes forget, but the important thing is that we know how to fix the mistakes.

When you are working with signed integers, you could use the CDQ instruction, which sign-extends EAX to EDX:EAX. But the other thing, if you're just done multiplying a signed integer, you don't have to do anything to EDX, because the IMUL instruction took care of the sign-extending.

IDIV - Signed Integer Divide
This instruction is the same as the DIV instruction, except it works for signed integers, also.

MUL - Multiply
This instruction is somewhat similar to the DIV instruction. It multiplies the accumulator by operand 1 and stores the result in the accumulator or and in the data register.

IMUL - Signed Integer Multiply
Same as MUL but works for signed integers, as well.

NEG - Negate
Sets operand1 to 0 - operand1.

XCHG - Exchange
Exchanges the values of the two operands. Same as operand1= operand2 and operand2= what operand1 was an instruction ago.

CDQ - Convert Double-Word To Quad-Word
Sign-extend the EAX register to get the 64-bit signed integer in EDX:EAX.

Division - The Remainder
In C/C++, the modulus - or the remainder - is noted by the % operator. In assembly language, you have to actually do division, to get the remainder.

No, you don't have to do multiplication and subtraction; it's just in Intel assembly language both the quotient (the result of the division) and the remainder are calculated with one instruction.

Well, see, with integers, you can't have decimals, so you have to round down. The remainder is what's left over after you divide. For example, if you divide 12 by 5, you get 2, but if you multiply 12 by 5 again, you get 10. The remainder is the difference between 12 and 10, in this case.

10 % 3 = 1
15 % 4 = 3
18 % 3 = 0

The remainder is actually pretty useful for some things; we will have to find the remainder in our integer-string algorithms.

Converting An Integer To A String
We'll need to find the number of digits that the integer will require, first of all. Then we would set our pointer to point to the beginning of the output string + that number; if the string is negative, we increment our pointer. We should probably initialize the first character in the string to a negative sign (-). Then we also set the character at the memory address pointed to by the pointer to 0 (for the NULL terminator for the string).

This is a loop.
We decrement the pointer.
Then we divide the integer by 10, and save the remainder + 48 at whatever the pointer points to.
The quotient result of the division is now the new integer we're working on. If that integer is not 0, continue with the loop.

And that's all we need for the integer-to-string algorithm. I'll put the assembly code for this into the example program.

Converting A String To An Integer
We get an integer and initialize it to 0.
We get a pointer and initialize it to the first number digit in the string.

This is a loop.
Take the byte at the memory location pointed to by our pointer.
Break from the loop if this value is not a valid number character (if it's less than 48 or greater than 57).
Subtract 48 from that. Then add that number to our integer.
Multiply our integer by 10.
Increment our pointer.
Continue the loop.

This is the end of the loop.

Divide our integer by 10, because the loop multiplied it by 10 one more times than needed.

Return the integer.

That's all that's needed for a string-to-integer algorithm. Watch out for these steps in the example program.

Converting A String With A Signed Number To An Integer
This process is not very different from the above process.

Just count the number of leading minus (-) signs in the string and if that number is odd then negate the result.

We'll call the integer-to-string function i2str() and the string-to-integer function str2i().

Example Program - The Plan
Here's the plan:
  • Display a message box with the string "-1234"
  • Convert that string ("-1234") to an integer, multiply that by 3, divide by 2, add 7, and convert that back to a string.
  • Display a message box with that string.

Example Program - The Code
;; Define the externs. 
extern MessageBoxA 
extern ExitProcess 

;; Construct our symbol import table. 
import MessageBoxA user32.dll 
import ExitProcess kernel32.dll 

;; The following is in the code section. Use 32-bit code. 
section .text use32 
;; ..start is where the execution of our program starts. 

;; Call the main() function. 
call main 

;; Exit the program, returning whatever main() returned in EAX. 
push eax 
call [ExitProcess] 

	enter 516, 0                    ;; We'll need a 512-byte string and a 4-byte integer. 
	;; Display a message box with the string "102" 
	push dword 0 
	push dword the_title 
	push dword txt1 
	push dword 0 
	call [MessageBoxA] 
	;; Convert that string to an integer. 
	push dword txt1 
	call str2i 
	mov dword [ebp-4], eax 
	;; Multiply the integer by 3. 
	mov eax, dword [ebp-4]            ;; EAX is already that value, but whatever. 
	mov ecx, 3                        ;; We'll be multiplying by 2. 
	imul ecx                          ;; Just in case the integer is negative, we'll use IMUL. 
	;; Then we divide that number by 2. 
	mov ecx, 2                        ;; We'll be dividing by 2. 
	;; This time we can't clear EDX because it's part of the signed integer. Though if we did 
	;; reset it now, we'd use the CDQ instruction. 
	                                  ;; contains the high-order double-word of the dividend. 
	idiv ecx                          ;; We'll use IDIV. 
	;; And then we add 7 to that number. 
	add eax, 7 
	;; And save that number in [ebp-4]. 
	mov dword [ebp-4], eax 
	;; Now we convert that number back to a string. 
	lea ebx, [ebp-516]                                   ;; Get the address of the string. 
	push ebx 
	push eax                                             ;; EAX is equal to [ebp-4], so why not use EAX? 
	call i2str 
	;; Now we display a message box with that new string. Note that EBX was preserved by the i2str function, 
	;; so we can still use it. 
	push dword 0 
	push dword the_title 
	push ebx 
	push dword 0 
	call [MessageBoxA] 
	;; And we return 0. 
	xor eax, eax 
	leave                        ;; Switch back to the previous stack frame. 
ret 0                            ;; Free 0 bytes off the stack, after returning; the 0 is not really necessary, but anyway. 

;; I think some of the things (ie ENTERs, PUSHAs, etc.) are pretty obvious, by now, and don't need to be explained as much. 

;; Now the i2str() function. 
i2str:                 ;; params:  the_integer, the_string 
	enter 4, 0 
	;; Initialize the beginning of the string to '-' 
	mov eax, dword [ebp+12] 
	mov ebx, eax 
	mov byte [ebx], 45                        ;; ASCII code for '-' 
	;; We initialize ESI to 0. 
	xor eax, eax 
	mov esi, eax 
	;; Check if the integer is negative. 
	mov eax, dword [ebp+08] 
	sub eax, 0 
	jnl .over1 
		inc esi                   ;; If the integer is negative, we would need to increment the number. 
		neg eax                   ;; Also, we make sure the number we're dealing with is positive. 
	mov dword [ebp-4], eax        ;; We save the integer to [ebp-4] 
	;; Count the number of digits required for the string. 
	mov eax, dword [ebp-4]            ;; EAX= the integer 
	xor ecx, ecx                      ;; ECX= 0 
	mov ebx, 10                       ;; EBX= 10 
		inc ecx                       ;; ECX= ECX + 1 
		xor edx, edx                  ;; EDX= 0 
		div ebx                       ;; EAX= EAX / EBX = EAX / 10 
		cmp eax, 0                    ;; If not 0 yet, 
		jnz .lp1                      ;; continue loop. 
	;; .....  
	add ecx, esi                  ;; ESI will be 1 if the number's negative and 0 if the number is positive. 
	;; That way, this number (in ECX) would be incremented if the number is negative. 
	;; Then we get the pointer. 
	mov eax, dword [ebp+12] 
	add eax, ecx                  ;; We add the offset number, that we came up with earlier, to the pointer. 
	mov ebx, eax                  ;; Now the pointer points to where the NULL terminator should be, in the future. 
	;; Set the byte at the memory address pointed to by the pointer to 0. 
	mov byte [ebx], 0 
	;; This is a loop. 
		dec ebx                      ;; We decrement the pointer. 
		mov eax, dword [ebp-4] 
		xor edx, edx 
		mov ecx, 10 
		div ecx                      ;; Divide the integer by 10. 
		xchg eax, edx                ;; We'll change this back in a moment. 
		;; EAX is now the remainder. 
		add eax, 48 
		mov byte [ebx], al           ;; Set the byte pointed to by the pointer to the remainder + 48. 
		xchg eax, edx                ;; EAX is now the integer, again. 
		mov dword [ebp-4], eax 
		sub eax, 0 
		jnz .lp2                     ;; If the integer is not 0, we continue with the loop. 
ret 8 

;; And the str2i() function. 
str2i:                           ;; params:  the_string 
	enter 0, 0 
	push dword 1         ;; Set dword [ebp-4] to 1. 
	;; Let's count the number of minus (-) signs in the string. 
	mov eax, dword [ebp+8] 
	xor ecx, ecx 
	mov ebx, eax 
		mov al, byte [ebx] 
		cmp al, 0 
		jz .lp1s 
		inc ebx 
		inc ecx 
		cmp al, 45                          ;; 45 is the ASCII code for '-' (minus) 
		jz .lp1 
		dec ecx 
		cmp al, 48                          ;; Also, leading '0' s don't count. 
		jz .lp1 
		cmp al, 32                          ;; 32 is the ASCII code for ' ' (space) 
		jz .lp1 
		cmp al, 9                           ;; 9 is the ASCII code for '	' (tab-space) 
		jz .lp1 
		dec ebx 
	;; See if the number's odd. 
	mov eax, ecx                             ;; We're looking at the number of minus (-) signs. 
	test eax, 1 
	jz .over1 
		mov dword [ebp-4], -1                ;; If odd, set dword [ebp-4] to -1. 
	;; EBX already points to the first non-space, non-tab-space, non-minus, and non-ASCII-48 character. 
	;; We initialize the integer to 0. 
	xor eax, eax 
	mov esi, eax                             ;; We'll use ESI for the integer. 
	;; This is a loop. 
		xor eax, eax                         ;; Clear EAX. 
		mov al, byte [ebx]                   ;; Take the byte from the memory location pointed to by our pointer. 
		;; Break the loop if this value is not a valid number character. 
		cmp al, 48 
		jl .lp2s 
		cmp al, 57 
		jg .lp2s 
		sub al, 48                           ;; Subtract 48 from that. 
		add esi, eax                         ;; Then add that number to our integer. 
		;; Multiply our integer by 10. 
		mov eax, esi                         ;; Multiplying the integer. 
		mov ecx, 10                          ;; Multiplying by 10. 
		mul ecx                              ;; And perform the operation. 
		mov esi, eax                         ;; Then we save the result. 
		inc ebx                              ;; Increment our pointer. 
		jmp .lp2                             ;; Continue loop. 
	;; This is the end of the loop. 
	;; We'll need to divide the number by 10, because our loop multiplies the number by 10 
	;; one more than needed times. 
	mov eax, esi 
	xor edx, edx                             ;; Clear EDX. 
	mov ecx, 10                              ;; We're going to divide by 10. 
	div ecx 
	mov esi, eax 
	;; Now we just have to apply the sign to our integer. 
	;; Since [ebp-4] is -1 if the number's supposed to be negative, 
	;; and 1 if the number's supposed to be positive, it would 
	;; make sense to just multiply our integer by [ebp-4] to 
	;; get it to be with the right sign. 
	;; Get [ebp-4] and prepare for multiplication. 
	mov eax, dword [ebp-4] 
	mov ecx, eax 
	;; Get the integer. 
	mov eax, esi 
	;; We need to use IMUL because [ebp-4] is a signed integer. 
	imul ecx 
	;; Now we have EAX= the final result; we need to return that integer. 
	mov dword [ebp-8], eax                     ;; Note that the number of bytes reserved for local variables is 4; 
	;; the return value (EAX) is at [ebp-(number of bytes reserved)-4], which is [ebp-(4)-4], which is [ebp-8]. 
ret 4 

;; The data section of our program. 
section .data 
the_title                                       db "String-Integer Algorithms Example", 0 
txt1                                            db "-1234", 0 
txt2                                            db "                         ", 0 

Example Program - The Output
You should get a message box with "-1234" and then another message box with "-1844"

Here's what the output might look like:

First Tutorial:
Intro To Win32 Assembly, Using NASM

Previous Tutorial:
Intro To Algorithms With String Functions

Next Tutorial:
Applications of Mouse and Keyboard Input

Attached Thumbnails

  • StringIntegerAlgorithms_output_cc.PNG

Edited by RhetoricalRuvim, 20 August 2011 - 06:58 PM.

  • 0

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download