This time around, let's use the CMSIS abstraction layer to access the SysTick core peripheral.
This peripheral can provide the core timer for an embedded RTOS kernel, such as FreeRTOS, or application timing events, for example to know when to read some attached sensors. In its most basic form, it provides a pollable countdown value. This value is decremented from a user-settable value (the Reload Value) on every clock tick. If it is configured to raise an interrupt, the assigned handler runs every n+1 clock ticks, where n is the Reload Value.
I used Clang/LLVM to compile a simple app that shows you how to set the reload value, read (poll) the internal SysTick value or enable it as an interrupt.
    #include <stdio.h>
    #include <stdlib.h>
    #include "CortexM3_xx.h"
    #include <core_cm3.h>
    #include <stdint.h>
    #include "svc.h"

    volatile uint32_t myTicks;

    void SysTick_Handler(void)
    {
        myTicks++;
        printf("...myTicks = %lu; SysTick->VAL = %lu\n", myTicks, SysTick->VAL);
    }

    int main(void)
    {
        printf("SysTick should not be active yet...\n");
        for (int x = 0; x < 10; x++) {
            printf("...Current value: %lu\n", SysTick->VAL);
        }

        printf("Enable SysTick and lets poll it...\n");
        volatile uint32_t clock = 10000;
        SysTick->LOAD = clock - 1;
        /*
         * SysTick_CTRL_CLKSOURCE_Msk : Use core's clock
         * SysTick_CTRL_ENABLE_Msk    : Enable SysTick
         * SysTick_CTRL_TICKINT_Msk   : Activate the SysTick interrupt on the NVIC
         */
        SysTick->CTRL = SysTick_CTRL_CLKSOURCE_Msk | SysTick_CTRL_ENABLE_Msk;
        for (int x = 0; x < 10; x++) {
            printf("...Current value: %lu\n", SysTick->VAL);
        }

        printf("Enable SysTick Interrupts and watch local var get incremented...\n");
        myTicks = 0;
        SysTick->CTRL = SysTick_CTRL_CLKSOURCE_Msk | SysTick_CTRL_ENABLE_Msk | SysTick_CTRL_TICKINT_Msk;
        while (myTicks <= 10) {
            asm("nop"); // Do nothing till SysTick_Handler has been called at least 10 times
        }
        exit(0);
    }
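A side note on the `clock - 1` in the listing: because the counter wraps every n+1 ticks, the reload value for a desired tick rate is the number of core clocks per tick minus one, and it has to fit into the 24-bit LOAD register. The helper names below (`systick_reload_for`, `systick_elapsed`) are made up for illustration; this is a host-runnable sketch of the arithmetic only, not part of CMSIS.

```c
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu /* SysTick LOAD is a 24-bit register */

/* Hypothetical helper: LOAD value for tick_hz ticks per second at core_hz.
 * Returns 0 when the request cannot be represented. */
static uint32_t systick_reload_for(uint32_t core_hz, uint32_t tick_hz)
{
    if (tick_hz == 0u) {
        return 0u;
    }
    uint32_t clocks_per_tick = core_hz / tick_hz;
    if (clocks_per_tick < 2u || clocks_per_tick - 1u > SYSTICK_MAX_RELOAD) {
        return 0u;
    }
    return clocks_per_tick - 1u; /* counter period is LOAD + 1 clocks */
}

/* Hypothetical helper: clocks elapsed between two polled VAL readings that
 * were taken less than one full period apart. VAL counts down. */
static uint32_t systick_elapsed(uint32_t start_val, uint32_t now_val, uint32_t reload)
{
    if (now_val <= start_val) {
        return start_val - now_val;             /* no wrap in between */
    }
    return start_val + (reload + 1u) - now_val; /* wrapped through 0 once */
}
```

On the target, the result of `systick_reload_for()` would be written to `SysTick->LOAD` exactly as in the listing above. CMSIS also ships `SysTick_Config(ticks)`, which takes the clocks-per-tick count and programs `ticks - 1` for you while enabling the interrupt.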
I added ARM’s CMSIS 3.01 to my LLVM project and wanted to test out the pre-compiled DSP libraries that are supplied.
I borrowed one of the cos/sin examples and added some semihosting printfs using NEWLIB and cleaned up the code a bit.
CMSIS-DSP: DSP Library Collection with over 60 Functions for various data types: fix-point (fractional q7, q15, q31) and single precision floating-point (32-bit). The library is available for Cortex-M0, Cortex-M3, and Cortex-M4. The Cortex-M4 implementation is optimized for the SIMD instruction set.
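For readers new to the fractional formats named above: q15 stores a value in the range [-1, 1) as a signed 16-bit integer scaled by 2^15 (q7 and q31 do the same with 8 and 32 bits). The conversion below is a minimal sketch of that scaling and saturation; the function and type names are hypothetical, and the library's own conversion routines may round differently.

```c
#include <stdint.h>

typedef int16_t q15_sketch_t; /* hypothetical stand-in for a q15 value */

/* Convert a float in [-1, 1) to q15, saturating out-of-range input. */
static q15_sketch_t float_to_q15_sketch(float x)
{
    float scaled = x * 32768.0f; /* 2^15 */
    if (scaled >= 32767.0f) {
        return INT16_MAX;        /* +1.0 is not representable, clamp */
    }
    if (scaled <= -32768.0f) {
        return INT16_MIN;
    }
    return (q15_sketch_t)scaled; /* truncates toward zero */
}

/* Convert a q15 value back to float. */
static float q15_to_float_sketch(q15_sketch_t q)
{
    return (float)q / 32768.0f;
}
```

With this scaling, 0.5 becomes 16384 and -1.0 becomes -32768, while +1.0 saturates to 32767.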
I updated my Makefile to include the correct CMSIS library (arm_cortexM3l_math) for ld and the correct headers for Clang/LLVM, and the result works great on Cortex-M3. I copied the project over and modified the Makefile so it picks up the correct Cortex-M0 library (arm_cortexM0l_math), and everything looks good on this core as well.
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include "arm_math.h"

    /* ----------------------------------------------------------------------
     * Defines each of the tests performed
     * ------------------------------------------------------------------- */
    #define MAX_BLOCKSIZE 32
    #define DELTA (0.000001f)

    /* ----------------------------------------------------------------------
     * Test input data for Floating point sin_cos example for 32-blockSize
     * Generated by the MATLAB randn() function
     * ------------------------------------------------------------------- */
    const float32_t testInput_f32[MAX_BLOCKSIZE] = {
        -1.244916875853235400, -4.793533929171324800,
         0.360705030233248850,  0.827929644170887320,
        -3.299532218312426900,  3.427441903227623800,
         3.422401784294607700, -0.108308165334010680,
         0.941943896490312180,  0.502609575000365850,
        -0.537345278736373500,  2.088817392965764500,
        -1.693168684143455700,  6.283185307179590700,
        -0.392545884746175080,  0.327893095115825040,
         3.070147440456292300,  0.170611405884662230,
        -0.275275082396073010, -2.395492805446796300,
         0.847311163536506600, -3.845517018083148800,
         2.055818378415868300,  4.672594161978930800,
        -1.990923030266425800,  2.469305197656249500,
         3.609002606064021000, -4.586736582331667500,
        -4.147080139136136300,  1.643756718868359500,
        -1.150866392366494800,  1.985805026477433800
    };

    const float32_t testRefOutput_f32 = 1.000000000;

    /* ----------------------------------------------------------------------
     * Declare Global variables
     * ------------------------------------------------------------------- */
    uint32_t blockSize = 32;
    float32_t testOutput;
    float32_t cosOutput;
    float32_t sinOutput;
    float32_t cosSquareOutput;
    float32_t sinSquareOutput;

    /* ----------------------------------------------------------------------
     * Max magnitude FFT Bin test
     * ------------------------------------------------------------------- */
    arm_status status;

    int32_t main(void)
    {
        float32_t diff;
        uint32_t i;

        printf("Starting Test...\n");
        for (i = 0; i < blockSize; i++) {
            cosOutput = arm_cos_f32(testInput_f32[i]);
            printf("Cos %f = %f\n", testInput_f32[i], cosOutput);
            sinOutput = arm_sin_f32(testInput_f32[i]);
            printf("Sin %f = %f\n", testInput_f32[i], sinOutput);
            arm_mult_f32(&cosOutput, &cosOutput, &cosSquareOutput, 1);
            printf("Cos squared %f = %f\n", cosOutput, cosSquareOutput);
            arm_mult_f32(&sinOutput, &sinOutput, &sinSquareOutput, 1);
            printf("Sin squared %f = %f\n", sinOutput, sinSquareOutput);
            arm_add_f32(&cosSquareOutput, &sinSquareOutput, &testOutput, 1);
            printf("Add %f and %f = %f\n", cosSquareOutput, sinSquareOutput, testOutput);

            /* absolute value of difference between ref and test */
            diff = fabsf(testRefOutput_f32 - testOutput);

            /* Comparison of sin_cos value with reference */
            if (diff > DELTA) {
                printf("Diff failure %f\n", diff);
                exit(EXIT_FAILURE); /* just for QEMU testing */
                while (1);
            }
        }
        printf("Ending Test...\n");
        exit(EXIT_SUCCESS); /* just for QEMU testing */
        while (1); /* main function does not return */
    }
One of the issues that you run into when using Clang/LLVM as your compiler for bare-metal ARM Cortex cores is that you have to use arm-none-eabi-ld directly to do your linking.
Using ld directly can be a bit nerve-racking at times, as you have to get the options correct (and the order of options does matter). Normally you just let gcc use collect2 and have it execute ld internally to perform your linking.
One area where using it directly can bite you is not linking to the proper libgcc.a for the Cortex-M that you are targeting. Look in your arm-none-eabi/lib/gcc/arm-none-eabi/X.X.X tool-chain directory and you will find multiple directories, one for each ARM architecture: armv6-m, armv7-ar, armv7-m, thumb, thumb2, etc…
Add a library search path for the architecture directory that matches the core that you compiled for and everything will be fine:
I am working on a custom NEWLIB, but first I wanted to make sure that NEWLIB compiled for ARM-NONE-EABI works out of the box with my ARM bare-metal Clang/LLVM build and QEMU.
Let's start with a simple main() that includes printf, puts and malloc. The first test is related to malloc: if your linker script is not setting up your heap properly and providing the heap “end” address as defined in NEWLIB, then not much else is going to work (i.e. printf uses malloc). If malloc works, then let's do some printfs, including one with a random string. After that, let's keep increasing the size of our mallocs till we run out of heap space.
    #include <stdio.h>  /* printf, scanf, NULL */
    #include <stdlib.h> /* malloc, free, rand */

    int main()
    {
        extern char _heap_start; /* Defined by the linker from src/cortex_M3.ld */
        extern char _heap_end;   /* Defined by the linker from src/cortex_M3.ld */
        int i, n;
        char *buffer;

        i = 43;
        buffer = (char *)malloc(i);
        if (buffer == NULL) {
            puts("Malloc failed\n");
            exit(1);
        }
        printf("Printf string\n");
        for (n = 0; n < i - 1; n++) {
            buffer[n] = rand() % 26 + 'a';
        }
        buffer[i - 1] = '\0'; /* terminator goes in the last byte of the allocation */
        printf("Random string: %s\n", buffer);

        i = 32;
        do {
            buffer = realloc(buffer, i);
            if (buffer == NULL) {
                puts("Out of memory!\n");
                exit(1);
            } else {
                printf("%d bytes @ address 0x%X (Low=0x%X:Hi=0x%X)\n",
                       i, (unsigned int)buffer,
                       (unsigned int)&_heap_start, (unsigned int)&_heap_end);
                i = i + 32;
            }
        } while (buffer != NULL);

        exit(0); /* cause qemu to exit */
        return 0;
    }
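The reason malloc depends on the linker script is that newlib grows the heap through an `_sbrk`-style hook that moves a break pointer between the heap start and end symbols. The sketch below models that idea on a host, with a static array standing in for the region between `_heap_start` and `_heap_end`. The names `sbrk_sketch` and `fake_heap` and the 256-byte size are assumptions for illustration; a real port would use the linker-provided symbols and would normally also set errno to ENOMEM on failure.

```c
#include <stddef.h>

#define FAKE_HEAP_SIZE 256 /* hypothetical size, stands in for _heap_end - _heap_start */

static unsigned char fake_heap[FAKE_HEAP_SIZE]; /* stand-in for the .heap region */
static unsigned char *heap_brk = fake_heap;     /* current end of the used heap */

/* Hypothetical bump-pointer break function in the style of _sbrk():
 * returns the previous break on success, (void *)-1 when out of range. */
static void *sbrk_sketch(ptrdiff_t incr)
{
    unsigned char *const heap_lo = fake_heap;
    unsigned char *const heap_hi = fake_heap + FAKE_HEAP_SIZE;
    unsigned char *prev = heap_brk;

    if (incr > heap_hi - prev || incr < heap_lo - prev) {
        return (void *)-1; /* request would leave the heap region */
    }
    heap_brk = prev + incr;
    return prev;
}
```

If the symbols in the linker script are wrong, this is the function that either hands out memory that overlaps the stack or fails on the very first call, which is why the malloc test comes first.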
Easy enough, so let's create a linker script that is geared for a Cortex-M3. The main section to pay attention to in this example is .heap:
OK, now that we have a linker script that defines our stack and heap properly, let's reuse our startup.c routine for the Cortex-M cores, compile it all with Clang/LLVM and link it with arm-none-eabi-ld:
I am not currently using a full IDE for my bare metal C coding on OS-X. This is mainly due to my use of an ARM-targeting Clang/LLVM build, since I am compiling to LLVM bitcode, piping it through opt and then handing the resulting object files directly to arm-none-eabi-ld. Makefile creation is the only way to get this build pipeline working, as no IDE on any OS natively supports using LLVM as a cross-compiler for bare metal ARM (yet!).
That leaves me in a term window a lot, not that I mind, but gdb (arm-none-eabi-gdb) based debugging can be a pain when you are used to working with a fully integrated IDE (I dream of Visual Studio style bare metal debugging ;-). The ‘layout asm’ and ‘layout src’ text-based GUI of gdb does help a lot, but until you learn all the commands and set up custom command sets, productivity tends to suffer…
There are several GUI-based interfaces that can ease the pain of using gdb. Eclipse has the CDT debug perspective that provides a complete wrapper to gdb MI commands, and ddd (Data Display Debugger) provides a frontend to many session-based cmd-line debuggers, including gdb. But I figured I would give Affinic Debugger a quick try to see how it works.
Using Affinic Debugger for GDB does not completely shield you from gdb: you also have access to the gdb terminal, so as you learn gdb commands you can type them instead of clicking your way through the GUI.
You can use it as a gdb learning tool, as all the GUI actions that involve gdb commands are echoed in the integrated terminal.
After you download and install it, you will need to set which gdb you are using to debug your target. I am using a version of arm-none-eabi-gdb that I built, so start the app and open the Preferences and change the “Set Debugger Path” entry to the gdb that you are using. Affinic Debugger will need to restart after that change.
Let's debug something!
Using the HelloWorld example from last time, let's recompile it with Clang/LLVM using “-g -O0” so we get the debug symbols (-g) and disable code optimizations (-O0). That keeps the generated assembly easy to follow and allows breakpoints to be set against the source code (depending upon the optimization level, your breakpoints might be limited in the source view):
Note: This is the same as if you were using gdb on the cmd-line. You can also use the Affinic menus to do this (Remote and File menus).
You will see the assembly and source tabs filled. At this point you can set breakpoints, step through your source/assembly code, view register values, etc…
So far I like the Affinic Debugger interface, but I guess time will tell if I buy the full version after the 30-day trial, use the limited light/free version or set up ddd and/or Eclipse on my MacBookPro…
What is semihosting?…Examples of these facilities include keyboard input, screen output, and disk I/O. For example, you can use this mechanism to enable functions in the C library, such as printf() and scanf(), to use the screen and keyboard of the host instead of having a screen and keyboard on the target system…
So you need to output some debug messages via your host debugging session (via JTAG or such), or you are working with QEMU to prototype some ARM code? Well, semihosting is simple to use, but it can come at a large price in memory and overhead if you use stdio to do it…
You can skip the “#include <stdio.h>” and linking the semihosting newlib library (assuming you have the syscalls implemented) and just use some simple inline assembly to get the job done.
Let's take a quick look at two of the twenty-some semihosting operations that are available through a service call (SVC): SYS_WRITEC (0x03) and SYS_WRITE0 (0x04).
* SYS_WRITEC outputs a single character. A pointer to that character is loaded in register R1, register R0 is loaded with 0x03 and then you execute a SuperVisor Call (SVC 0x00123456).
* SYS_WRITE0 outputs a null-terminated string. The string's beginning address is stored in R1, R0 is loaded with 0x04 and you execute a supervisor call again.
If we translate that knowledge into inline assembly:
main.c
    void main()
    {
        int SYS_WRITEC = 0x03;
        int SYS_WRITE0 = 0x04;
        register int reg0 asm("r0");
        register int reg1 asm("r1");
        char outchar = '_';

        // A 'NOP' so we can 'see' the start of the following svc call
        asm volatile("mov r0,r0");
        outchar = '!';
        reg0 = SYS_WRITEC;
        reg1 = (int)&outchar;
        asm("svc 0x00123456");

        // A 'NOP' so we can 'see' the start of the following svc call
        asm volatile("mov r0,r0");
        reg0 = SYS_WRITEC;
        outchar = '\n';
        reg1 = (int)&outchar;
        asm("svc 0x00123456");

        // A 'NOP' so we can 'see' the start of the following svc call
        asm volatile("mov r0, r0");
        reg0 = SYS_WRITE0;
        reg1 = (int)&"Print this to my jtag debugger\n";
        asm("svc 0x00123456");
    }
Note: This is not pretty inline styling, as it is meant to break each step down. Normally you would create a couple of functions (e.g. a ‘PutChar’ for SYS_WRITEC) and include the R0/R1 clobbers, etc…
And the output that we get:
qemu-system-arm -nographic -monitor null -serial null -semihosting -kernel main.axf
!
Print this to my jtag debugger
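As a rough idea of the wrapper functions mentioned in the note above, here is one way they could look. All names (`semihost_call`, `sh_putchar`, `sh_puts`) are hypothetical. The ARM branch assumes ARM state, matching the `svc 0x00123456` used in the listing (Thumb code uses `svc 0xAB` and Cortex-M cores use `bkpt 0xAB` instead). The non-ARM branch is only a stand-in so the sketch can be compiled and tried on a host.

```c
#include <stdio.h>

enum { SYS_WRITEC = 0x03, SYS_WRITE0 = 0x04 };

#if defined(__arm__) && !defined(__thumb__)
/* ARM state: r0 = operation, r1 = argument pointer, then SVC 0x123456.
 * r0 is an in/out operand, so the compiler knows the call overwrites it;
 * the value left in r0 is operation-specific. */
static inline int semihost_call(int op, const void *arg)
{
    register int r0 __asm__("r0") = op;
    register const void *r1 __asm__("r1") = arg;
    __asm__ volatile("svc 0x00123456" : "+r"(r0) : "r"(r1) : "memory");
    return r0;
}
#else
/* Host stand-in (an assumption for testing only): emulate the two
 * operations with stdio. Returns 0 if emulated, -1 otherwise. */
static inline int semihost_call(int op, const void *arg)
{
    if (op == SYS_WRITEC) {
        fputc(*(const char *)arg, stdout);
    } else if (op == SYS_WRITE0) {
        fputs((const char *)arg, stdout);
    } else {
        return -1;
    }
    return 0;
}
#endif

/* 'PutChar' for SYS_WRITEC: the host reads the character through a pointer. */
static inline void sh_putchar(char c)
{
    semihost_call(SYS_WRITEC, &c);
}

/* SYS_WRITE0 wrapper: s must be null-terminated. */
static inline void sh_puts(const char *s)
{
    semihost_call(SYS_WRITE0, s);
}
```

With these in place, the body of main() above would shrink to `sh_putchar('!'); sh_putchar('\n'); sh_puts("Print this to my jtag debugger\n");`.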
PS: SYS_TMPNAM and SYS_READC are not implemented in Qemu (up to and including 1.7.0), so consult the “qemu/target-arm/arm-semi.c” source if you have questions about how those SVC calls are implemented.
In my last post I did a very basic comparison of ARM code generation between the LLVM and GCC compilers, testing the AXF in Qemu. The stand-out difference was that LLVM produced a .ARM.exidx section in the AXF/ELF while arm-gcc did not. The code is very simple: one .s and one .c file, no .cpp/.h involved.
So what is a .ARM.exidx section?
The ARM ELF manual shows this under the special sections chapter:
Names beginning .ARM.exidx name sections containing index entries for section unwinding. Names beginning .ARM.extab name sections containing exception unwinding information. See [EHABI] for details.
Table 4_4 from that manual shows the Processor specific section types and our attribute is:
So the question remains: what is in the section and what is being created? Let's dump HelloWorldSimple.o and only look at that section:
Relocation section '.rel.ARM.exidx' at offset 0x580 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
00000000 00000b2a R_ARM_PREL31 00000000 .text
00000008 00000b2a R_ARM_PREL31 00000000 .text
Unwind table index '.ARM.exidx' at offset 0xcc contains 2 entries:
0x0 <print_uart0>: 0x1 [cantunwind]
0x54 <c_entry>: 0x1 [cantunwind]
So it added both functions to the table, but they are marked cantunwind. That makes sense, but since nothing in the section can be unwound, why include the section? Using gc-sections during linking does not remove it, as it has references to functions that are being used…
Let's do a quick test: add -funwind-tables, recompile, and yes, we get a fully populated unwind table. Using -fno-unwind-tables produces the results from above, so that is the default that is being used. Research is ongoing on this one…
Relocation section '.rel.ARM.exidx' at offset 0x5a4 contains 4 entries:
Offset Info Type Sym.Value Sym. Name
00000000 00000b2a R_ARM_PREL31 00000000 .text
00000000 00001600 R_ARM_NONE 00000000 __aeabi_unwind_cpp_pr0
00000008 00000b2a R_ARM_PREL31 00000000 .text
00000008 00001600 R_ARM_NONE 00000000 __aeabi_unwind_cpp_pr0
Unwind table index '.ARM.exidx' at offset 0xcc contains 2 entries:
0x0 <print_uart0>: 0x8001b0b0
Compact model index: 0
0x01 vsp = vsp + 8
0xb0 finish
0xb0 finish
0x54 <c_entry>: 0x809b8480
Compact model index: 0
0x9b vsp = r11
0x84 0x80 pop {r11, r14}
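To make the two dumps above easier to read, here is a small decoder for the second word of an index entry. It is a sketch under stated assumptions: it only handles the inline compact model with personality index 0 and the handful of unwind opcodes that appear in the dump (the encodings follow my reading of the ARM EHABI document), it summarizes the opcodes without preserving their order, and all type and function names are hypothetical.

```c
#include <stdint.h>

enum exidx_kind { EXIDX_CANTUNWIND, EXIDX_INLINE_COMPACT, EXIDX_TABLE_OFFSET };

struct unwind_ops {
    int32_t  vsp_adjust;   /* net bytes added to vsp by 00xxxxxx / 01xxxxxx */
    int      vsp_from_reg; /* register copied to vsp by 1001nnnn, or -1 */
    uint16_t pop_mask;     /* bit n set means rn is popped (r4..r15) */
};

/* Classify the second word of a .ARM.exidx entry. */
static enum exidx_kind exidx_classify(uint32_t word)
{
    if (word == 0x1u) {
        return EXIDX_CANTUNWIND;     /* shown as [cantunwind] in the dump */
    }
    if (word & 0x80000000u) {
        return EXIDX_INLINE_COMPACT; /* opcodes stored in the entry itself */
    }
    return EXIDX_TABLE_OFFSET;       /* prel31 offset into .ARM.extab */
}

/* Decode the three opcode bytes of an inline compact entry.
 * Returns 0 on success, -1 for anything this sketch does not handle. */
static int exidx_decode_compact(uint32_t word, struct unwind_ops *out)
{
    const uint8_t op[3] = { (uint8_t)(word >> 16), (uint8_t)(word >> 8), (uint8_t)word };

    if (exidx_classify(word) != EXIDX_INLINE_COMPACT || ((word >> 24) & 0x0Fu) != 0u) {
        return -1; /* not "Compact model index: 0" */
    }
    out->vsp_adjust = 0;
    out->vsp_from_reg = -1;
    out->pop_mask = 0;

    for (int i = 0; i < 3; i++) {
        uint8_t b = op[i];
        if (b == 0xB0u) {                    /* finish */
            break;
        } else if ((b & 0xC0u) == 0x00u) {   /* vsp = vsp + (x << 2) + 4 */
            out->vsp_adjust += (int32_t)(((b & 0x3Fu) << 2) + 4u);
        } else if ((b & 0xC0u) == 0x40u) {   /* vsp = vsp - (x << 2) - 4 */
            out->vsp_adjust -= (int32_t)(((b & 0x3Fu) << 2) + 4u);
        } else if ((b & 0xF0u) == 0x80u) {   /* two-byte pop of r4..r15 */
            if (i == 2) {
                return -1;                   /* second byte is missing */
            }
            uint16_t mask = (uint16_t)(((b & 0x0Fu) << 8) | op[++i]);
            if (mask == 0u) {
                return -1;                   /* 0x80 0x00: refuse to unwind */
            }
            out->pop_mask |= (uint16_t)(mask << 4); /* mask bit 0 is r4 */
        } else if ((b & 0xF0u) == 0x90u) {   /* vsp = r[nnnn] */
            out->vsp_from_reg = b & 0x0F;
        } else {
            return -1;                       /* opcode outside this sketch */
        }
    }
    return 0;
}
```

Fed the two words from the dump, 0x8001b0b0 comes out as a vsp adjustment of 8 bytes, and 0x809b8480 as vsp taken from r11 followed by a pop of r11 and r14, matching what readelf printed.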
The patch below implements linker processing of ARM unwinding tables
(SHT_ARM_EXIDX).
ARM exception index tables only define the start address of each region. This
means that code with no unwinding information is effectively covered by the
preceding unwinding table entry.
For normal exceptions that doesn’t matter so much - the user should ensure
that any code they throw exceptions through has proper unwinding information.
Just as a quick check, I grep’d some source, and the .ARM.exidx section is generated by the ARMELFStreamer:
http://llvm.org/docs/doxygen/html/Support_2ELF_8h_source.html
    // Fixme: All this is duplicated in MCSectionELF. Why??
    // Exception Index table
    SHT_ARM_EXIDX = 0x70000001U,
Even with the ever-maturing and stable ARM backend of LLVM, it is hard to find information on using it compared to the well-known ARM-GCC release.
So let's start with the simplest HelloWorld example and compare LLVM and ARM-GCC.
Balau’s post is a popular one showing an ARM bare metal Hello World tested using QEMU, so let's start with that one. First, let's reproduce the compile/link steps to make sure it works:
Works just fine, so let's reproduce that using my LLVM bare metal build. All the compiler options are shown, even though some are defaulted in my build of LLVM, so you can see everything that is required to get the LLVM bitcode conversion to produce a valid object file for our ARM target (I’m using the Clang driver, but you can use LLVM and pipe bitcode through the various tools so you can deeply control the optimization phase):
target : Option providing the triple that you are ‘targeting’
mcpu : Option providing the ARM core that will be flashed
mfloat-abi : Soft or Hard, depending upon whether your ARM core has an FPU implemented on it. A core family that can support an FPU does not mean your vendor’s core has one; it comes down to the features/price of the core.
Note: In both, I am turning off the optimizers via the compiler drivers.
Let's look at the size of the AXF (ARM Executable Format) produced by each:
text data bss dec hex filename
140 0 0 140 8c bin/HelloWorldSimple.axf_gcc
text data bss dec hex filename
150 0 0 150 96 bin/HelloWorldSimple.axf
There is a 10-byte difference, interesting… let's look at that a little more:
    llvm                                arm-gcc
    section            size    addr     section            size    addr
    .startup             16   65536     .startup             16   65536
    .text               108   65552     .text               104   65552
    .ARM.exidx            8   65660
    .rodata               4   65668     .rodata              20   65656
    .rodata.str1.1       14   65672
    .ARM.attributes      40       0     .ARM.attributes      46       0
    .comment             19       0     .comment            112       0
    Total               209             Total               298
Note: I ran strip on the arm-gcc version to remove the empty debug sections that gcc inserts automatically
The .startup sections are the same size, since this code is assembly and no codegen or optimization happens there.
It is interesting that LLVM inserts a .ARM.exidx section even though this is only .c code. I’ll have to look at LLVM to see if -funwind-tables and/or -fexceptions are defaulted to on, but I disassemble it below so we can look at it, as that section is 8 bytes and accounts for most of the size difference in this really basic example.
.ARM.exidx is the section containing information for unwinding the stack
Note: Understanding the ARM ELF format is not really required to do bare metal programming, but understanding how your code is allocated and loaded can make a world of difference when you are writing linker definition files for different cores, so spend a few minutes and read the 46 pages :-)
First the gcc disassembly so we can compare the LLVM version to it:
We can ignore the _Reset section as that is hand coded assembly and the same for both.
The c_entry function is interesting, as LLVM uses a move to copy the stack pointer to fp (r11 = frame pointer), which is what I would do, but arm-gcc does an "add" to set up fp from sp, and does that by adding register #4(?). This is flagged as a general variable for gcc… I am slightly confused by gcc’s choice to do that; the question now is, when would #4 not contain zero? The rest of this function is the same between the two compilers.
The print_uart0 function is a hack, as it does not implement FIFO/flow-control to an actual UART. In this case it points to a memory address where the discontinued ARM Versatile PB dev-board does have a UART, and the QEMU board simulation echoes those writes. I am not going to do a line-by-line comparison of the generated code, as for un-optimized code they are both getting the job done, in slightly different ways but in almost the same number of instructions.
So we are able to produce a working bare metal ARM AXF from LLVM, and next time I will spend a little time on compiler optimizations to see how the two code generators/optimizers compare…
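For reference, a function of this kind is only a few lines long. The sketch below is my own approximation rather than the exact code from Balau's post: the name `uart_write_sketch` is hypothetical, the data register is passed in as a parameter so the loop can also be exercised on a host, and the UART0 address of 0x101f1000 for the Versatile PB board is an assumption to check against the board documentation.

```c
#include <stdint.h>

/* Assumed address of the UART0 data register on the Versatile PB board. */
#define VERSATILEPB_UART0_DR ((volatile uint32_t *)(uintptr_t)0x101f1000u)

/* Write each character of s to the data register, with no FIFO or
 * flow-control checks at all; that is why it only suits an emulated UART. */
static void uart_write_sketch(volatile uint32_t *data_reg, const char *s)
{
    while (*s != '\0') {
        *data_reg = (uint32_t)(unsigned char)*s; /* one store per character */
        s++;
    }
}
```

On the emulated board the call would look like `uart_write_sketch(VERSATILEPB_UART0_DR, "Hello world!\n");`, and every store shows up as a character on QEMU's serial output.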
I tend to stay on the cmd line as much as possible, but for visual diffs, an ncurses console diff tool just does not cut it for me. Beyond Compare Pro by Scooter on Windows is one of the best that I have ever used, and with licenses at work I never had to worry about not having it on a work desktop or laptop.
But on OS-X at home, Beyond Compare was not available. There is a 4.0 release in the works (beta now), but at $50.00 USD for a personal-use copy on OS-X, and without feature parity with the Windows Pro features(?), I just can not pull the trigger on that purchase when there are other (cheaper) options that work just as well for personal development.
So, normally for a free visual diff, you can not beat meld. It is a great open-source tool, but on OS-X it fires up X (Quartz for me) and it is getting long in the tooth in terms of the GUI’s human factors (the feature set is still great). If there was a Qt version of this, the search would be over… free or not!
So some searching landed me on an old post by Todd Huss about using DiffMerge as your visual diff/merge for git, and it was actually what I was looking for, well almost ;-) It is missing a few features, but they have a free version and it works really well and has a great OS-X interface… search is over for now…
SourceGear has a $19.00 USD version that includes file export with HTML formatting, and if I could see an example of the HTML code that it produces, I would pay for that feature in a heartbeat, but the feature is completely locked out till you actually register, bummer…
Todd recommends using the DiffMerge installer version vs. the dmg version; I go the other way on that. Download the dmg version, open it and drag/drop the app to your Applications. Then in a term window you can copy the Extras/diffmerge.sh to your /usr/local/bin directory (the execute attribute is already set, so no chmod needed), but I copied it as just vdiff as that is quicker to type. No admin rights are needed to install it that way, and that makes me happy… I can vdiff file1.c file2.c on the cmd line to pop the GUI open and populate it.
I then used the git setup he has listed and everything is working great so far. Click on the image above to see it comparing the disassembly of LLVM vs. GCC code generation for bare metal ARM development.
    # diff the local file.m against the checked-in version
    git difftool file.m

    # diff the local file.m against the version in some-feature-branch
    git difftool some-feature-branch file.m

    # diff the file.m from the Build-54 tag to the Build-55 tag
    git difftool Build-54..Build-55 file.m

    # To resolve merge conflicts, just run git mergetool:
    git mergetool
I am building a bare-metal ARM Clang/LLVM cross-compiler for my ARM Cortex-M LLVM vs. arm-gcc experiments and was looking for the complete list of available ARM cores.
LLVM makes it soooo easy to get that information from the LLVM static compiler binary (llc): just pass it a generic ARM triple. Here is the list from my build: