With LLVM's ARM backend maturing and becoming more stable, it is surprisingly hard to find information on using it instead of the well-known ARM-GCC release.
So let's start with the simplest Hello World example and compare LLVM and ARM-GCC.
Balau's post is a popular one showing a bare metal ARM Hello World tested with QEMU, so let's start with that. First, let's reproduce its compile/link steps with ARM-GCC to make sure it works.
It works just fine, so let's reproduce it using my LLVM bare metal build. I show all of the compiler options, even though my build of LLVM defaults some of them, so you can see everything that is required to turn the LLVM bitcode into a valid object file for our ARM target. (I'm using the Clang driver, but you can also run the individual LLVM tools, such as opt and llc, and pipe the bitcode through them if you want fine-grained control over the optimization phase.)
- target: the target triple you are compiling for
- mcpu: the ARM core the code will run on (the one that will be flashed)
- mfloat-abi: soft or hard, depending on whether your ARM core has an FPU. A core design that supports an FPU does not mean your vendor's part includes one; that comes down to the features and price of the chip.
Note: In both cases, I have turned off the optimizers via the compiler drivers.
Let's look at the size of the AXF (ARM Executable Format) file each toolchain produces.
There is a 10-byte difference. Interesting… let's look at that a little more closely, section by section:
LLVM section | Size (bytes) | Address (decimal) | ARM-GCC section | Size (bytes) | Address (decimal) |
---|---|---|---|---|---|
.startup | 16 | 65536 | .startup | 16 | 65536 |
.text | 108 | 65552 | .text | 104 | 65552 |
.ARM.exidx | 8 | 65660 | | | |
.rodata | 4 | 65668 | .rodata | 20 | 65656 |
.rodata.str1.1 | 14 | 65672 | | | |
.ARM.attributes | 40 | 0 | .ARM.attributes | 46 | 0 |
.comment | 19 | 0 | .comment | 112 | 0 |
Total | 209 | | Total | 298 | |
Note: I ran strip on the arm-gcc version to remove the empty debug sections that GCC inserts automatically.
The .startup sections are the same size because that code is assembly, so no code generation or optimization happens there.
Interestingly, LLVM emits a .ARM.exidx section even though this is plain C code. I'll have to check whether LLVM defaults -funwind-tables and/or -fexceptions to on, but I disassemble it below so we can take a look. Those 8 bytes account for most of the 10-byte difference in this very basic example; the rest comes from LLVM's .text being 4 bytes larger and its read-only data being 2 bytes smaller (4 + 14 = 18 bytes vs. 20).
.ARM.exidx is the section holding the exception index table used to unwind the stack.
Note: Understanding the ARM ELF format is not strictly required for bare metal programming, but understanding how your code is allocated and loaded can make a world of difference when you are writing linker definition files for different cores, so spend a few minutes and read the 46 pages :-)
First, the GCC disassembly, so we can compare the LLVM version against it:
Now the LLVM version:
We can ignore _Reset, as it is hand-coded assembly and identical in both.
c_entry is interesting: LLVM uses a mov to copy the stack pointer into fp (r11, the frame pointer), which is what I would do, whereas arm-gcc sets fp with an add (add fp, sp, #4). The #4 is an immediate constant, not register r4, so the result never depends on what a register happens to contain: fp is always sp + 4. After the usual push {fp, lr} prologue, that leaves gcc's fp pointing at the saved lr, while LLVM's fp points at the saved fp. Both are valid frame-pointer placements, so this is a convention difference rather than anything odd. The rest of this function is the same for both compilers.
The print_uart0 function is a hack: it does not implement any FIFO handling or flow control for a real UART. It simply writes to the memory address where the (discontinued) ARM Versatile PB development board has its UART, and QEMU's board emulation echoes those writes. I am not going to do a line-by-line comparison of the generated code; for unoptimized code both compilers get the job done, in slightly different ways but with almost the same number of instructions.
So we can produce a working bare metal ARM AXF with LLVM. Next time, I will spend some time on compiler optimizations to see how the two code generators/optimizers compare…