Assembly instructions can be embedded in C code with the asm
keyword:
asm ("mov ...");
Multiple instructions must be separated by newline characters (\n):
asm ("mov ...\n"
"add ...\n"
"...");
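The assembly instructions must be written in either Intel or AT&T syntax, both
of which are explained below.
Some compilers or C standards require the keyword __asm or __asm__ (with two
underscore characters on each side) instead. GCC, for example, does not treat
plain asm as a keyword in strict ISO modes such as -std=c99, but always accepts __asm__.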
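A minimal sketch of a multi-line block, assuming GCC on x86/x86-64 (the function name after_three_nops is hypothetical). It uses the __asm__ spelling so it also compiles in strict ISO modes. The adjacent string literals are joined by the C compiler into one template before GCC copies it into the generated .s file; the optional \t only indents the lines there.

```c
/* Executes three NOPs as one basic asm block, then returns v unchanged.
 * The three literals become the single string "nop\n\tnop\n\tnop". */
int after_three_nops(int v)
{
    __asm__ ("nop\n\t"
             "nop\n\t"
             "nop");
    return v;
}
```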
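The keyword volatile instructs the compiler not to delete an inline assembly block
or move it out of a loop, even when the block seems to have no effect.
GCC treats a basic asm block (one without operands) as volatile anyway, so the
keyword matters mainly for the extended form described below.
This can, for example, keep the compiler from removing an otherwise empty loop: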
for (int i = 0; i < 100000; i++) {
asm volatile ("nop");
}
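A case where volatile clearly matters is an asm block with outputs but no inputs. The sketch below assumes GCC on x86/x86-64; read_tsc is a hypothetical name, and the output constraints "=a" and "=d" are explained in the Extended Inline Assembly section. Because the block has no inputs, GCC could assume without volatile that two calls return the same value and merge them.

```c
#include <stdint.h>

/* Reads the x86 time-stamp counter. RDTSC puts the low 32 bits in EAX
 * and the high 32 bits in EDX. volatile forces a fresh read on every call. */
uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
```

Calling read_tsc() twice in a row yields two different counter values.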
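Under some circumstances, C variables can be used directly in the assembly code:
a global variable can be referenced by its name, since that symbol is known to both
the compiler and the assembler. Local variables have no such symbol and are passed
as operands instead (see Extended Inline Assembly below).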
GCC
GCC does not parse the embedded assembly; it copies the text (after substituting
any operands) into the assembly code it generates, which the GNU assembler then
translates into machine instructions.
Since GCC emits AT&T syntax by default (probably because it is easier to machine-generate),
the embedded code must be written in that syntax as well.
However, the compiler (and, in turn, the assembler) can be switched to the Intel
syntax with the -masm=intel option:
gcc -masm=intel filename.c
This instructs the compiler to generate Intel-syntax assembly; the generated file
begins with an .intel_syntax noprefix directive, so the assembler reads it accordingly.
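Now the inline assembly can be written in the (to me) more natural Intel syntax.
Note that the option applies to the whole file, so every asm block in it must then
use Intel syntax.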
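Because -masm=intel is a file-wide switch, a template written for one dialect fails under the other. GCC's asm templates accept alternatives of the form {AT&T variant|Intel variant}, and GCC emits whichever matches the -masm setting. A small sketch, assuming GCC on x86/x86-64; add_u32 is a hypothetical name, and its operands (%0, %1) are explained in the next section.

```c
#include <stdint.h>

/* Returns a + b using an ADD instruction. The braces hold two variants:
 * AT&T order (source, target) before the '|', Intel order (target, source)
 * after it. This compiles with or without -masm=intel. */
uint32_t add_u32(uint32_t a, uint32_t b)
{
    __asm__ ("{addl %1, %0|add %0, %1}"
             : "+r"(a)    /* %0: a, read and written */
             : "r"(b)     /* %1: b, read only */
             : "cc");     /* ADD modifies the flags */
    return a;
}
```

The result wraps modulo 2^32 exactly like the C expression a + b on uint32_t.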
Extended Inline Assembly
To let GCC embed the assembly correctly into its own generated code,
it must be given some information about the block:
- input: values to load into registers before the block executes
- output: where the register values should go after the block executes
- clobbered: other registers whose contents the block changes, so the compiler must
not rely on them keeping their values (it saves them or avoids using them across the block)
The complete syntax is:
asm ("mov ..." : /*output*/ : /*input*/ : /*clobbered*/ );
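The three sections can be seen together in this sketch, which computes 3 * x through a scratch register. It assumes GCC in its default AT&T mode on x86/x86-64; times3 is a hypothetical name. The constraint "r" lets GCC pick any general-purpose register. In extended asm a single % introduces an operand, so real register names are written with a doubled %%.

```c
#include <stdint.h>

/* Computes 3 * x (mod 2^32) using ECX as a scratch register. */
uint32_t times3(uint32_t x)
{
    uint32_t result;
    __asm__ ("movl %1, %%ecx\n\t"    /* ECX = x      */
             "addl %%ecx, %%ecx\n\t" /* ECX = 2 * x  */
             "addl %1, %%ecx\n\t"    /* ECX = 3 * x  */
             "movl %%ecx, %0"        /* result = ECX */
             : "=r"(result)          /* output */
             : "r"(x)                /* input */
             : "ecx", "cc");         /* clobbered: scratch register and flags */
    return result;
}
```

Listing ECX as clobbered keeps GCC from placing x or result in ECX, so the block cannot overwrite its own operands.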
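Each input and output operand consists of a constraint string, which tells GCC
where the value goes, followed by a C expression in parentheses.
In the following example, "=c" means "after the block, store the new value of ECX here"
(the = marks a write-only output), and "a" means "load EAX with this value before the block".
The full list of these constraints is in the GCC manual (section "Constraints for asm Operands").
CPUID overwrites EAX, EBX, ECX and EDX, so EAX must be declared as an output too, even
though it also carries the input; EBX, whose value is not needed here, is listed as clobbered.
#include <stdint.h>

uint32_t func = 1;
uint32_t reg_eax, reg_ecx, reg_edx;
asm volatile ( "cpuid"
    : "=a"(reg_eax), "=c"(reg_ecx), "=d"(reg_edx) /* output: EAX, ECX and EDX */
    : "a"(func)                                    /* input: func into EAX */
    : "ebx" );                                     /* also modified: EBX */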
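The CPUID example can be wrapped into callable functions. This is a sketch assuming GCC on x86/x86-64; cpuid_regs and cpu_vendor are hypothetical names. Here all four registers are outputs, and the matching constraints "0" and "2" place the inputs in the same registers as outputs 0 (EAX) and 2 (ECX). ECX is set to 0 because some leaves read a sub-leaf index from it.

```c
#include <stdint.h>
#include <string.h>

/* Runs CPUID with EAX = leaf, ECX = 0 and stores EAX, EBX, ECX, EDX
 * in regs[0..3]. */
void cpuid_regs(uint32_t leaf, uint32_t regs[4])
{
    __asm__ __volatile__ ("cpuid"
        : "=a"(regs[0]), "=b"(regs[1]), "=c"(regs[2]), "=d"(regs[3])
        : "0"(leaf), "2"(0u));
}

/* Leaf 0 returns the vendor string in EBX, EDX, ECX (in that order).
 * buf must hold at least 13 bytes. */
void cpu_vendor(char buf[13])
{
    uint32_t r[4];
    cpuid_regs(0u, r);
    memcpy(buf,     &r[1], 4);  /* EBX */
    memcpy(buf + 4, &r[3], 4);  /* EDX */
    memcpy(buf + 8, &r[2], 4);  /* ECX */
    buf[12] = '\0';
}
```

On an Intel processor cpu_vendor yields "GenuineIntel". For production code, GCC's own <cpuid.h> header (for example __get_cpuid) is usually the safer choice.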
References
Intel syntax
The Intel syntax is the natural one (used in most textbooks and in the processor manuals from Intel and AMD).
Its main property is the operand order target, source (like an assignment to a variable in C):
MOV eax, ebx ; EAX := EBX
ADD ebx, 10 ; EBX += 10
Example program:
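#include <stdio.h>

int x = 5;

int main(void) {
    printf("x = %d\n", x);
    asm (
        "mov eax, x\n"      /* EAX := x   */
        "mov ecx, eax\n"    /* ECX := EAX */
        "add ecx, eax\n"    /* ECX += EAX */
        "mov x, ecx\n"      /* x := ECX   */
        ::: "eax", "ecx", "memory"  /* "memory": x is changed behind the compiler's back */
    );
    printf("x = %d\n", x);
    return 0;
}
This program compiles with gcc -masm=intel and prints x = 5, then x = 10.
On x86-64 systems where GCC builds position-independent executables by default,
the linker may reject the absolute reference to x; adding -no-pie avoids this.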
Running gcc -S -o - -masm=intel filename.c prints the assembly generated by the
compiler, in Intel syntax, to the screen.
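In principle, both syntaxes can be mixed by switching the assembler with the
directives .intel_syntax noprefix and .att_syntax prefix. When doing so, don't
forget to restore the AT&T setting at the end of every inline assembly block, and
use .att_syntax prefix (not noprefix), since GCC's generated code uses %-prefixed
register names. This only works when the file is compiled in the default AT&T mode.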
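A sketch of such a block, assuming GCC with the GNU assembler on x86/x86-64 and a file compiled without -masm=intel; double_plus_ten is a hypothetical name. To avoid depending on how operands are printed, the "+a" constraint pins v to EAX and the template names EAX directly.

```c
#include <stdint.h>

/* Returns 2 * v + 10 (mod 2^32). The block switches to Intel syntax and
 * restores GCC's default AT&T setting before it ends. */
uint32_t double_plus_ten(uint32_t v)
{
    __asm__ (".intel_syntax noprefix\n\t"
             "add eax, eax\n\t"   /* EAX += EAX (target, source) */
             "add eax, 10\n\t"    /* EAX += 10, no $ needed      */
             ".att_syntax prefix"
             : "+a"(v)            /* v is in EAX before and after */
             :
             : "cc");
    return v;
}
```

If the restoring directive is omitted, the assembler reads GCC's AT&T output after the block as Intel syntax and assembly fails.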
AT&T syntax
GCC uses this syntax by default.
Its operand order is source, target.
Further, instructions carry a suffix indicating the operand width (here l for "long",
i.e. 32 bits), register names are prefixed with %, and immediate constants are
prefixed with a $ sign.
movl %ebx, %eax # EBX -> EAX
addl $10, %ebx # (EBX + 10) -> EBX
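The same two instructions can be written as AT&T inline assembly with operands. A sketch assuming GCC in its default AT&T mode on x86/x86-64; copy_plus_ten is a hypothetical name. GCC substitutes the registers it picks for %0 and %1, already carrying their % prefix.

```c
#include <stdint.h>

/* Copies src into dst, then adds 10 (mod 2^32).
 * AT&T order: source first, target second. */
uint32_t copy_plus_ten(uint32_t src)
{
    uint32_t dst;
    __asm__ ("movl %1, %0\n\t"   /* src -> dst        */
             "addl $10, %0"      /* (dst + 10) -> dst */
             : "=r"(dst)
             : "r"(src)
             : "cc");
    return dst;
}
```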