georg's blog

Inline Assembler

Assembly instructions can be embedded in C code with the asm keyword:

asm ("mov ...");

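Multiple instructions must be separated by newline characters (\n) inside the string. Adjacent string literals are joined by the compiler, so each instruction can go on its own source line: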

asm ("mov ...\n"
     "add ...\n"
     "...");

The assembly instructions must be written in either Intel or AT&T syntax, both of which are explained below. Some compilers or C standard modes (for example gcc -std=c99 instead of the default GNU dialect) require the keyword __asm or __asm__ (with two underscore characters). The keyword volatile instructs the compiler not to remove an inline assembly block, or move it out of a loop, even if the block looks unnecessary. For example, it keeps the compiler from optimizing away a loop whose only content is the assembly:

for (int i = 0; i < 100000; i++) {
    asm volatile ("nop");
}

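Global C variables can also be used by name in the assembly code, because their symbol names are known to both the compiler and the assembler (on some platforms, such as macOS, with a leading underscore). This does not work for local variables, which live in registers or on the stack; they have to be passed in through operands (see Extended Inline Assembly below).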

GCC

GCC does not interpret the embedded assembly: apart from substituting operands (see below), it just copies the text into the assembly code it generates, which the GNU assembler then translates into machine instructions. Since GCC emits AT&T syntax by default (probably because it is easier to machine-generate), the embedded code must be written in that syntax. However, the compiler (and with it the assembler) can be switched to the Intel syntax:

$ gcc -masm=intel

This instructs the compiler to generate Intel-syntax assembly. It starts its output with a .intel_syntax noprefix directive, so the assembler reads that file, including the inline blocks, as Intel syntax too. Now the inline assembly can be written in the (to me) more natural Intel syntax.

Extended Inline Assembly

To help GCC embed the piece of assembly correctly into its own generated code, it must be given some information about the assembly:

input: C values to load into registers (or other operand locations) before the block executes
output: where the register values should go after the block executes
clobbered: other registers (plus "memory" or the flags, "cc") whose contents the block changes; the compiler then does not keep values it still needs in them across the block, saving and restoring them if necessary

The complete syntax is:

asm ("mov ..." : /*output*/ : /*input*/ : /*clobbered*/ );

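Each output and input operand is a constraint string followed by a C expression in parentheses, telling GCC where a value comes from or goes to. In the following example, "=c" means "write the new value of ECX to this variable" and "a" means "load EAX with this value before the block". The full list of constraints is in the GCC manual (section "Constraints for asm Operands").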

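uint32_t func = 1;                   /* CPUID leaf 1: processor feature flags */
uint32_t reg_eax, reg_ecx, reg_edx;
asm volatile ( "cpuid"               /* cpuid overwrites EAX too, so it is an output */
                : "=a"(reg_eax), "=c"(reg_ecx), "=d"(reg_edx)  /* output: EAX, ECX and EDX */
                : "a"(func)                                     /* input: func into EAX */
                : "ebx" );                                      /* also modified: EBX */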


Intel syntax

The Intel syntax is the natural one (used in most textbooks and in the processor manuals from Intel and AMD). Its main property is the destination, source operand order (like an assignment to a variable in C):

MOV eax, ebx        ; EAX := EBX
ADD ebx, 10         ; EBX += 10

Example program:

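#include <stdio.h>

int x = 5;  /* global, so the assembler knows it by the symbol name x */

int main(void) {
  printf("x = %d\n", x);
  asm (
    "mov eax, x\n\t"     /* load x */
    "mov ecx, eax\n\t"
    "add ecx, eax\n\t"   /* ecx = 2 * x */
    "mov x, ecx"         /* store the result back to x */
    ::: "eax", "ecx", "cc", "memory"   /* "memory": the block writes to x */
  );
  printf("x = %d\n", x);

  return 0;
}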

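This program compiles with gcc -masm=intel. Running gcc -S -o - -masm=intel filename.c prints the assembly generated by the compiler, in Intel syntax, to the screen. On x86-64 systems where GCC builds position-independent executables by default, the linker may reject the absolute reference to x; adding -no-pie avoids this.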

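Both syntaxes can also be mixed by switching the assembler with the directives .intel_syntax noprefix and .att_syntax prefix. When doing so, don't forget to restore the AT&T setting (.att_syntax prefix) at the end of every inline assembly block, and be careful with %-operands inside the Intel part, since GCC still substitutes them in AT&T form.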

AT&T syntax

GCC uses this syntax by default. Its operand order is source, destination. Further, instructions carry a suffix indicating the operand width (here: l for long = 32 bit), register names are prefixed with %, and constants must be prefixed with a $ sign.

movl %ebx, %eax     # EBX -> EAX
addl $10, %ebx      # (EBX + 10) -> EBX
