How to generate Intel or AT&T assemblies with GCC/LLVM


3 replies to this topic

#1 Alexander

Alexander

    YOL9

  • Moderator
  • 3963 posts
  • Location:Vancouver, Eh! Cleverness: 200
  • Programming Language:C, C++, PHP, Assembly

Posted 03 June 2011 - 05:47 PM

Apart from disassembling your compiled programs to retrieve the instructions (with useful GNU binutils tools such as objdump), compilers can often emit the generated assembly before assembling it.

A simple program to demonstrate this places two integers on the stack, adding one into the other, before returning 0 from main:
int main() {
  int a = 100;
  int b = 200 + a;
  return 0;
}
Normally optimization can improve the code, but we can turn it off to make the instructions easier to relate to our C source.

The key GCC flags here are:

  • -S: Tell GCC to stop before assembling, and to output the assembly it has generated so far.
  • -O0: Optimization level; 0 disables optimization, so the code is not transformed beyond recognition.
gcc -O0 -S codecall.c -o codecall.s
The example may produce the following in GCC's native AT&T syntax:
    movl    $100, 12(%esp)
    movl    $200, 8(%esp)
    movl    8(%esp), %eax
    movl    12(%esp), %edx
    leal    (%edx,%eax), %eax
    leave
    ret
It may be unclear which instructions belong to which line of the C source.

We could add comments to our C code to manually label each line in the output (e.g. asm("# this line is a+b:");), although this is messy and unrelated to C; assembly comments should not be inside your code for this reason.
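For illustration, here is a minimal sketch of that marker approach. It assumes GCC or clang targeting x86, where the assembler treats `#` as a comment character. The file name `markers.c` and the function name `sum_with_markers` are made up for this example, and unlike the tutorial's main it returns b so the result can be checked:

```c
/* markers.c: hypothetical file name for this sketch. */

/* Each asm statement emits only an assembler comment, so it adds no
   instructions; it just labels a spot in the generated .s file. */
int sum_with_markers(void) {
    int a = 100;
    __asm__ volatile ("# marker: b = 200 + a");
    int b = 200 + a;
    __asm__ volatile ("# marker: return b");
    return b;
}
```

Compiling this with gcc -O0 -S markers.c -o markers.s should leave the marker text sitting between the instructions generated for each statement.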

You can instead instruct GCC to annotate each instruction with comments based on the code you have given it, which removes the need for this.

The key flag is:

  • -fverbose-asm: The -f prefix introduces an option with a long name; this one asks GCC to add extra commentary to the generated assembly.

gcc -O0 -S -fverbose-asm codecall.c -o codecall.s
This may generate the following assembly from the same C source above (I have removed the unneeded portions, along with the large compiler information comment that this flag adds):
    movl    $100, 12(%esp)     #, a
    movl    12(%esp), %eax     # a, tmp61
    addl    $200, %eax     #, tmp60
    movl    %eax, 8(%esp)     # tmp60, b
    movl    $0, %eax     #, D.1957
    leave
    ret
From this we can work out what the program does line by line. Based on the comments, I would read it as follows:

  • store 100 (a) on the stack at an offset
  • load a from the stack into register eax
  • eax += 200 (200 + a)
  • store eax back on the stack at a new offset (b)
  • set eax to 0, the return value of main
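The steps above can also be written out as a rough C model. This is only a reading aid, not what the compiler does internally: the `stack` array and the `eax` variable are stand-ins for the real stack frame and register, and the names `frame_result` and `model_main` are invented for this sketch.

```c
/* Values left behind once the modelled instructions have run. */
struct frame_result {
    int a;      /* slot at 12(%esp) */
    int b;      /* slot at 8(%esp)  */
    int eax;    /* register contents at the point of ret */
};

struct frame_result model_main(void) {
    int stack[4] = {0};  /* four 4-byte slots: offsets 0, 4, 8, 12 */
    int eax;

    stack[3] = 100;      /* movl $100, 12(%esp)   a = 100        */
    eax = stack[3];      /* movl 12(%esp), %eax   load a         */
    eax += 200;          /* addl $200, %eax       200 + a        */
    stack[2] = eax;      /* movl %eax, 8(%esp)    b = 200 + a    */
    eax = 0;             /* movl $0, %eax         return value 0 */

    struct frame_result r = { stack[3], stack[2], eax };
    return r;
}
```

The slot indices are simply the byte offsets divided by four, since each int occupies four bytes in this listing.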
These listings help us understand how our code will actually run. However, you can likely see that it performs a fixed calculation on 100 and 200 and then stores a value on the stack that is never read again.

This is where optimization comes in: the compiler will likely just store 300 directly, or even remove this code altogether since its result is never used. This is why you must pay attention to the optimization level when generating the assembly (unless you specifically want to see which code is redundant).
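As a sketch of how to keep the interesting code alive when optimizing: the tutorial's main never reads b, so an optimizer is free to discard it. The two functions below are hypothetical variants (the names `constant_sum` and `add_offset` are not from the original) that return their result instead. With optimization enabled, `constant_sum` would typically be reduced to returning 300 directly, while `add_offset` has to keep an addition because a is only known at run time.

```c
/* All inputs are compile-time constants, so an optimizing compiler
   can typically fold the whole body down to "return 300". */
int constant_sum(void) {
    int a = 100;
    int b = 200 + a;
    return b;
}

/* The argument is not known when this function is compiled on its own,
   so the addition has to survive in the generated assembly. */
int add_offset(int a) {
    int b = 200 + a;
    return b;
}
```

Comparing the output of gcc -O0 -S against gcc -O2 -S for a file containing these functions is one way to see the difference for yourself.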

For clang (part of the LLVM compiler toolchain) we can generate the assembly in a similar way, the important flags being:

  • -S: Only run preprocessor and compilation steps
  • -O0: Do not run any extra optimizations (much the same as GCC)
clang -S -O0 codecall.c -o codecall.s
This may generate similar output:
    movl    $0, %eax
    movl    $200, %ecx
    movl    $0, -4(%ebp)
    movl    $100, -8(%ebp)
    movl    -8(%ebp), %edx
    addl    %ecx, %edx
    movl    %edx, -12(%ebp)
    addl    $12, %esp
    popl    %ebp
    ret
Do I have to use AT&T style?

You may be more familiar with Intel syntax than AT&T (for example, if you have read dargueta's set of Intel tutorials: http://forum.codecal...e-part-1-a.html).

GCC and the LLVM compilers can both attempt* to emit a different assembly syntax, translated directly from the original assembly. Pass the following flag to the appropriate compiler:
GCC: -masm=intel
LLVM (llc static compiler): --x86-asm-syntax=intel
GCC along with the -fverbose-asm and -masm=intel flags may generate this:
    mov    DWORD PTR [esp+12], 100     # a,
    mov    eax, DWORD PTR [esp+12]     # tmp61, a
    add    eax, 200     # tmp60,
    mov    DWORD PTR [esp+8], eax     # b, tmp60
    mov    eax, 0     # D.1957,
    leave
    ret
*The Darwin version of GCC does not support the Intel syntax.

Those are just a few ways of viewing useful information about each instruction of your program, which helps when learning how code you have written is compiled and how fundamental optimizations can or will be applied to it.

You may consult the man pages of both compiler toolchains to see what options each compiler accepts, some of which may increase the clarity or speed of specific portions of code.

Edited by Alexander, 23 June 2011 - 10:50 PM.
Added a reference


All new problems require investigation, and so if errors are problems, try to learn as much as you can and report back.


#2 dargueta

dargueta

    I chown trolls.

  • Moderator
  • 4854 posts
  • Programming Language:C, Java, C++, PHP, Python, JavaScript, Perl, Assembly, Bash, Others
  • Learning:Objective-C

Posted 05 June 2011 - 10:38 AM

I know this isn't the subject of your tutorial, but you might want to mention objdump and how it can disassemble programs you haven't written. (Thanks for the recognition, by the way. :) )

sudo rm -rf / && echo $'Sanitize your inputs!'


#3 Alexander

Posted 05 June 2011 - 02:08 PM

Certainly welcome!

I have added a reference to objdump's man page in the first lines; it could certainly be useful to compare our method to manual disassemblies.



#4 AndressoureMM

AndressoureMM

    CC Lurker

  • Just Joined
  • Pip
  • 1 posts

Posted 20 December 2016 - 08:20 PM

Hi Codeman, How does Geromes solution benchmark against your initial code? FBSL is NOT VB Mike