How to generate Intel or AT&T assemblies with GCC/LLVM


#1
Alexander

    It's Science!

  • Moderators
  • 4,118 posts
  • Location:Vancouver, Eh! Cleverness: 200
Apart from disassembling your compiled programs (with GNU binutils tools such as objdump) to recover the instructions, compilers can usually emit the assembly they generate before it is assembled.

A simple program to demonstrate this places two integers on the stack, computes the second by adding 200 to the first, and returns 0 from main:
int main() {
  int a = 100;
  int b = 200 + a;
  return 0;
}
Normally optimization improves the code, but we can turn it off to see more clearly how the instructions correspond to our source code.

The key flags for GCC here are:

  • -S: Tell GCC to stop before assembling and to output the assembly it has generated so far.
  • -O: Set the optimization level; -O0 keeps the code from being modified beyond recognition.
gcc -O0 -S codecall.c -o codecall.s
The example may produce the following in GCC's native AT&T syntax:
    movl    $100, 12(%esp)
    movl    $200, 8(%esp)
    movl    8(%esp), %eax
    movl    12(%esp), %edx
    leal    (%edx,%eax), %eax
    leave
    ret

Some instructions may be unclear, especially which source line each one belongs to.

We could add comments to our C code by hand to label each line in the output (e.g. asm("# this line is a+b:")), although this is messy and unrelated to C; assembly comments should not live inside your code for this reason.
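As a rough sketch of the manual approach just described, assuming GCC or clang targeting x86 (where '#' begins an assembler comment). The function name add_marked and the marker text are made up for illustration:

```c
/* Hypothetical example: add_marked is not part of the tutorial's program. */
int add_marked(int a)
{
    /* Basic inline asm (a GCC/clang extension) holding only an assembler
       comment. It adds no instructions of its own, but the text is copied
       into the -S output, typically just before the next statement's code. */
    __asm__ volatile("# marker: b = 200 + a");
    int b = 200 + a;
    return b;
}
```

Compiling this with gcc -O0 -S should leave the marker text in the .s file near the addition. At higher optimization levels the marker's position relative to the surrounding instructions is not guaranteed.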

You can instead instruct GCC to annotate each instruction with comments derived from the code you have given it, which removes the need for this.

The key flag is:

  • -f: Prefix for an option with a long name, in this case verbose-asm.

gcc -O0 -S -fverbose-asm codecall.c -o codecall.s
This may generate the following assembly from the same C source above (I have removed the unneeded portions of code and the large compiler information comment that this flag adds):
    movl    $100, 12(%esp)     #, a
    movl    12(%esp), %eax     # a, tmp61
    addl    $200, %eax     #, tmp60
    movl    %eax, 8(%esp)     # tmp60, b
    movl    $0, %eax     #, D.1957
    leave
    ret
From this we can work out what the program does line by line. I would read the listing as follows:

  • store 100 (a) on the stack at an offset
  • load that value into register eax
  • eax += 200 (200 + a)
  • store eax back on the stack at a new offset (this is b)
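The four steps above can be modeled loosely in C. In this sketch the array stack stands in for the stack frame (index 3 for offset 12 and index 2 for offset 8, assuming 4-byte ints) and the variable eax stands in for the register. The name run_steps is hypothetical, and this is an illustration rather than compiler output:

```c
/* Hypothetical model of the annotated listing; not generated by a compiler. */
int run_steps(void)
{
    int stack[4] = {0};  /* stand-in for the stack frame */
    int eax;             /* stand-in for register eax */

    stack[3] = 100;      /* movl $100, 12(%esp) : a = 100 */
    eax = stack[3];      /* movl 12(%esp), %eax : load a */
    eax += 200;          /* addl $200, %eax     : 200 + a */
    stack[2] = eax;      /* movl %eax, 8(%esp)  : store as b */

    return stack[2];     /* b, returned only so the result can be checked */
}
```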
Listings like these help us understand how our code will actually run. However, you can likely see that it performs a fixed operation on 100 and 200 and then stores a value on the stack that is never read afterwards.

This is where optimization comes in: the compiler will likely just store 300 directly, or even remove this code altogether since its result is never used. This is why you must pay attention to the optimization level when generating the assembly (unless you want to see which code is redundant anyway).
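To keep the addition visible when optimization is enabled, one option is to make the input and the result observable to a caller. A minimal sketch, where add_offset is a hypothetical function that is not part of the original program:

```c
/* Hypothetical variant of the tutorial's code.
   Because a arrives as a parameter, the standalone function body cannot
   have 200 + a folded into a constant, and because the result is
   returned, the addition is not dead code. */
int add_offset(int a)
{
    int b = 200 + a;
    return b;
}
```

Compiled with -O2 -S, this would typically keep a single add (or lea) instruction in add_offset, whereas the original main is typically reduced to little more than returning 0.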

For clang (part of the LLVM compiler toolchain) we can generate assembly in much the same way, the important flags being:

  • -S: Only run preprocessor and compilation steps
  • -O0: Do not run any extra optimizations (much the same as GCC)
clang -S -O0 codecall.c -o codecall.s
This will generate similar output:
    movl    $0, %eax
    movl    $200, %ecx
    movl    $0, -4(%ebp)
    movl    $100, -8(%ebp)
    movl    -8(%ebp), %edx
    addl    %ecx, %edx
    movl    %edx, -12(%ebp)
    addl    $12, %esp
    popl    %ebp
    ret
Do I have to use AT&T style?

You may be more familiar with Intel syntax than AT&T (for example if you have read dargueta's set of Intel tutorials: http://forum.codecal...e-part-1-a.html).

GCC and LLVM compilers can both attempt* to emit another assembly syntax by direct translation from the original assembly. You may pass the following flags to the appropriate compiler:
GCC: -masm=intel

LLVM (llc static compiler): --x86-asm-syntax=intel 
GCC with the -fverbose-asm and -masm=intel flags may generate this:
    mov    DWORD PTR [esp+12], 100     # a,
    mov    eax, DWORD PTR [esp+12]     # tmp61, a
    add    eax, 200     # tmp60,
    mov    DWORD PTR [esp+8], eax     # b, tmp60
    mov    eax, 0     # D.1957,
    leave
    ret
*The Darwin version of GCC does not support the Intel syntax.

And those are just a few ways of viewing useful information about each instruction of your program, especially when learning about code you have written and how fundamental optimizations can or will be applied to it.

You may consult the man pages of both compiler toolchains to see what options you can pass to each compiler, some of which may improve the clarity or speed of specific portions of code.

Edited by Alexander, 23 June 2011 - 10:50 PM.
Added a reference

Be sure to read the updated FAQ! || Health is achieved through the same 10,000 steps.
If a suggested code/method fails, informing us is less important than telling us why or what errors occurred.

#2
dargueta

    Writes binary right handed and hex left handed

  • Moderators
  • 4,705 posts
  • Programming Language:C, Java, C++, PHP, Python, Perl, Assembly, Bash, Others
  • Learning:JavaScript
I know this isn't the subject of your tutorial, but you might want to mention objdump and how it can disassemble programs you haven't written. (Thanks for the recognition, by the way. :) )
sudo rm -rf /

#3
Alexander

Certainly welcome!

I have added a reference to objdump's man page in the first lines; it could certainly be useful to compare our method to manual disassembly.