I tend to use semi-hosting via QEMU simluation and OpenJTAG/OpenOCD a lot; i.e.: for debugging, simulating sensor input and output, setting the RTC on a board for the first time and while the RMI Monitor interface is built-in to newlib stdio functions like printf, using a library like stdio is not really an option when a core only has 8k of ROM and 2k of RAM. So I need a really small printf routine to use on cores like the Kinetis KL03 (MKL03Z32CAF4R)
1 2 3 4 5 |
|
There are a lot of embedded printf routines posted with a variety of features and mine is just a collection/combo of various standard practices. The main difference of mine is it normally uses SVC/BKPT routines to perform the ‘print’ output and I try to make sure that optimizations via LLVM are taken advantage of.
So the question of how small of a routine is it as otherwise it is useless on something like the ‘world’s smallest ARM’ KL03? Lets start with a newlib stdio version that uses the default syscalls that have RMI enabled. First you have to some heap as newlib printf using malloc.
1 2 3 4 5 6 7 8 9 10 11 12 |
|
The code size of the complete program above is huge if you are trying to run it on a Cortex-M0+ that only has 8K of ROM and 2k of RAM. Over 32K of ROM and 2+K of RAM just to output two lines of code!
1 2 |
|
So lets use a printf that is self-contained and uses no heap (malloc) and update our test code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
Now that is more like it, 2k of ROM and 64 bytes of RAM: Debug code size:
1 2 |
|
In release configuration it is even better, ~1k of ROM is use, RAM is the same as expected; Release code size, LLVM compiled with -Os, linked with -O4:
1 2 |
|
Adding 10 more printf statements that each contain a different but static 10 char string only adds 210 bytes to the ROM. Removing the 100 bytes for static string allocation, that breaks down to 10 bytes for the printf call. This can be improved upon a little, but 10 bytes is acceptable for now.
1 2 |
|
A quick break down of elf size:
text: your code, vector table plus constants.
data: Initialized variables, and it counts for RAM and FLASH. The linker allocates data in FLASH which then is copied from ROM to RAM in the startup code (in startup.c via the Reset_Handler function in my case)
bss: Uninitialized data in RAM which is initialized with zero in the startup code (again see the Reset_Handler function)