Cortex-M0 & M3 SysTick: Polling vs. Interrupt driven

Mar 9th, 2014 7:24 pm

This time around, let's use the CMSIS abstraction layer to access the SysTick core peripheral.

This peripheral can provide the core timer for an embedded RTOS kernel, such as FreeRTOS, or application timing events, for example to know when to read attached sensors. In its most basic form, it provides a pollable countdown value. The counter starts from a user-settable Reload Value and decrements on every clock tick. If the SysTick interrupt is enabled, the assigned handler fires every n+1 clock ticks, where n is the reload value.
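As a rough sketch of the reload arithmetic described above: the helper name and the "return 0 when it does not fit" convention are assumptions of mine, not part of CMSIS. It relies on the reload register being 24 bits wide, and on the counter needing n+1 clocks per period.

```c
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu /* the reload register is 24 bits wide */

/*
 * Hypothetical helper: compute the reload value that yields tick_hz
 * interrupts per second from a core clock of core_clock_hz.
 * Returns 0 when the request cannot be represented.
 */
uint32_t systick_reload_for(uint32_t core_clock_hz, uint32_t tick_hz)
{
    if (tick_hz == 0u) {
        return 0u;
    }
    uint32_t clocks_per_tick = core_clock_hz / tick_hz;
    if (clocks_per_tick < 2u) {
        return 0u; /* a reload value of 0 would not give a useful period */
    }
    uint32_t reload = clocks_per_tick - 1u; /* the period is reload + 1 clocks */
    if (reload > SYSTICK_MAX_RELOAD) {
        return 0u; /* does not fit in the 24-bit register */
    }
    return reload;
}
```

With a 12 MHz core clock and a 1 kHz tick this gives 11999, the same "clock - 1" pattern used in the listing further down. CMSIS also ships a SysTick_Config() helper in the core header that performs a similar calculation and enables the interrupt in one call.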

I used Clang/LLVM to compile a simple app that shows you how to set the reload value, read (poll) the internal SysTick value or enable it as an interrupt.

The semihosting output of this app (via QEMU):

qemu-system-arm -M lm3s811evb -cpu cortex-m3 -semihosting -kernel  bin/main.axf
SysTick should not be active yet...
...Current value: 0
...Current value: 0
...Current value: 0
...Current value: 0
...Current value: 0
...Current value: 0
...Current value: 0
...Current value: 0
...Current value: 0
...Current value: 0
Enable SysTick and lets poll it...
...Current value: 6913
...Current value: 2825
...Current value: 2450
...Current value: 2138
...Current value: 1825
...Current value: 1525
...Current value: 1225
...Current value: 913
...Current value: 613
...Current value: 313
Enable SysTick Interrupts and watch local var get incremented...
...myTicks = 1; SysTick->VAL = 0
...myTicks = 2; SysTick->VAL = 3425
...myTicks = 3; SysTick->VAL = 8725
...myTicks = 4; SysTick->VAL = 2938
...myTicks = 5; SysTick->VAL = 8113
...myTicks = 6; SysTick->VAL = 2550
...myTicks = 7; SysTick->VAL = 7725
...myTicks = 8; SysTick->VAL = 2938
...myTicks = 9; SysTick->VAL = 8125
...myTicks = 10; SysTick->VAL = 2563
...myTicks = 11; SysTick->VAL = 8100
...myTicks = 12; SysTick->VAL = 3038

#include <stdio.h>
#include <stdlib.h>

#include "CortexM3_xx.h"
#include <core_cm3.h>
#include <stdint.h> 
#include "svc.h"

volatile uint32_t myTicks;

void SysTick_Handler(void) {
  myTicks++;
  printf("...myTicks = %lu; SysTick->VAL = %lu\n", myTicks, SysTick->VAL);
}

int main(void) {
  printf("SysTick should not be active yet...\n");
  for (int x=0; x<10; x++) {
      printf("...Current value: %lu\n", SysTick->VAL);
  }
  printf("Enable SysTick and lets poll it...\n");
  
  volatile uint32_t clock = 10000;
  SysTick->LOAD = clock - 1;
  /*
   * SysTick_CTRL_CLKSOURCE_Msk : Use core's clock
   * SysTick_CTRL_ENABLE_Msk    : Enable SysTick
   * SysTick_CTRL_TICKINT_Msk   : Raise the SysTick exception when the counter reaches 0
   */
  SysTick->CTRL = SysTick_CTRL_CLKSOURCE_Msk | SysTick_CTRL_ENABLE_Msk;
  for (int x=0; x<10; x++) {
      printf("...Current value: %lu\n", SysTick->VAL);
  }

  printf("Enable SysTick Interrupts and watch local var get incremented...\n");
  myTicks = 0;
  SysTick->CTRL = SysTick_CTRL_CLKSOURCE_Msk |  SysTick_CTRL_ENABLE_Msk | SysTick_CTRL_TICKINT_Msk;
  
  while(myTicks <= 10) {
      asm("nop"); // Do nothing until SysTick_Handler has been called more than 10 times
  }
  exit(0);
}

LLVM, CMSIS DSP and Cortex-M3 & M0

Mar 5th, 2014 10:34 pm

I added ARM’s CMSIS 3.01 to my LLVM project and wanted to test out the pre-compiled DSP libraries that are supplied.

I borrowed one of the cos/sin examples and added some semihosting printfs using NEWLIB and cleaned up the code a bit.

CMSIS-DSP: DSP Library Collection with over 60 Functions for various data types: fix-point (fractional q7, q15, q31) and single precision floating-point (32-bit). The library is available for Cortex-M0, Cortex-M3, and Cortex-M4. The Cortex-M4 implementation is optimized for the SIMD instruction set.
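To make the fractional formats in that description a bit more concrete, here is a minimal sketch of how a float in [-1, 1) maps onto q15 (a signed 16-bit value scaled by 2^15). The function names are hypothetical; CMSIS-DSP provides its own conversion routines (for example arm_float_to_q15), whose rounding behavior may differ from this truncating version.

```c
#include <stdint.h>

/* Hypothetical helper: convert a float to q15 with saturation.
 * Truncates toward zero; NaN inputs are not handled. */
int16_t float_to_q15_sketch(float x)
{
    float scaled = x * 32768.0f; /* q15 has 15 fractional bits */
    if (scaled >= 32767.0f) {
        return INT16_MAX; /* +1.0 is not representable, saturate */
    }
    if (scaled <= -32768.0f) {
        return INT16_MIN;
    }
    return (int16_t)scaled;
}

/* Hypothetical helper: convert q15 back to float. */
float q15_to_float_sketch(int16_t q)
{
    return (float)q / 32768.0f;
}
```

The same idea applies to q7 (scale by 2^7) and q31 (scale by 2^31), only with different widths.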

I updated my Makefile to pass the correct CMSIS library (arm_cortexM3l_math) to ld and the correct headers to Clang/LLVM, and the result works great on the Cortex-M3. I then copied the project and modified the Makefile so it picks up the correct Cortex-M0 library (arm_cortexM0l_math), and everything looks good on this core as well.

Clang/LLVM compile and link:

clang -Os  -nostdlib -ffreestanding   -target arm-none-eabi  -mcpu=cortex-m0   -mfloat-abi=soft  -mthumb  -DARM_MATH_CM3 -I/Users/administrator/Code/llvm_superproject/install/arm-none-eabi/newlib-syscalls/arm-none-eabi/include -I/Users/administrator/Code/llvm_superproject/install/arm-none-eabi/arm-none-eabi/include -I/Users/administrator/Code/llvm_superproject/install/arm-none-eabi/CMSIS/Include   -o obj/arm_sin_cos_example_f32.o -c src/arm_sin_cos_example_f32.c
clang -Os  -nostdlib -ffreestanding   -target arm-none-eabi  -mcpu=cortex-m0   -mfloat-abi=soft  -mthumb  -DARM_MATH_CM3 -I/Users/administrator/Code/llvm_superproject/install/arm-none-eabi/newlib-syscalls/arm-none-eabi/include -I/Users/administrator/Code/llvm_superproject/install/arm-none-eabi/arm-none-eabi/include -I/Users/administrator/Code/llvm_superproject/install/arm-none-eabi/CMSIS/Include   -o obj/startup.o -c src/startup.c
arm-none-eabi-ld -nostartfiles   -nostdlib -nostartupfiles  --gc-sections  --print-gc-sections  -Map bin/main.axf.map  -T src/cortex_M0.ld  --library-path /Users/administrator/Code/llvm_superproject/install/arm-none-eabi/newlib-syscalls/arm-none-eabi/lib/thumb/thumb2 --library-path /Users/administrator/Code/llvm_superproject/install/arm-none-eabi/lib/gcc/arm-none-eabi/4.8.3/armv7-m --library-path /Users/administrator/Code/llvm_superproject/install/arm-none-eabi/CMSIS/LIB/GCC    obj/arm_sin_cos_example_f32.o obj/startup.o --start-group --library=gcc --library=c --library=m --library=arm_cortexM0l_math --end-group -o bin/main.axf

Sample semihosting output from a Cortex-M3:

qemu-system-arm -cpu cortex-m3  -semihosting -nographic -kernel  bin/main.axf
Starting Test...
Cos -1.244917 = 0.320142
Sin -1.244917 = -0.947370
Cos squared 0.320142 = 0.102491
Sin squared -0.947370 = 0.897509
Add 0.102491 and 0.897509 = 1.000000
Cos -4.793534 = 0.081056
Sin -4.793534 = 0.996710
Cos squared 0.081056 = 0.006570
Sin squared 0.996710 = 0.993430
Add 0.006570 and 0.993430 = 1.000000
...
...
Cos 1.985805 = -0.403198
Sin 1.985805 = 0.915113
Cos squared -0.403198 = 0.162568
Sin squared 0.915113 = 0.837431
Add 0.162568 and 0.837431 = 1.000000
Ending Test...

https://github.com/sushihangover/llvm_baremetal

#include <stdio.h>
#include <stdlib.h>
#include <math.h>     
#include "arm_math.h" 

/* ---------------------------------------------------------------------- 
* Defines each of the tests performed 
* ------------------------------------------------------------------- */
#define MAX_BLOCKSIZE    32 
#define DELTA           (0.000001f) 


/* ---------------------------------------------------------------------- 
* Test input data for Floating point sin_cos example for 32-blockSize 
* Generated by the MATLAB randn() function 
* ------------------------------------------------------------------- */

const float32_t testInput_f32[MAX_BLOCKSIZE] =
{
 -1.244916875853235400, -4.793533929171324800,  0.360705030233248850,
  0.827929644170887320, -3.299532218312426900,  3.427441903227623800, 
  3.422401784294607700,    -0.108308165334010680,  0.941943896490312180,
  0.502609575000365850,    -0.537345278736373500,  2.088817392965764500,
 -1.693168684143455700,  6.283185307179590700, -0.392545884746175080,
  0.327893095115825040,     3.070147440456292300,  0.170611405884662230,
 -0.275275082396073010, -2.395492805446796300,  0.847311163536506600,
 -3.845517018083148800,  2.055818378415868300,  4.672594161978930800,
 -1.990923030266425800,  2.469305197656249500,  3.609002606064021000,
 -4.586736582331667500, -4.147080139136136300,  1.643756718868359500,
 -1.150866392366494800,  1.985805026477433800
};

const float32_t testRefOutput_f32 = 1.000000000;

/* ---------------------------------------------------------------------- 
* Declare Global variables  
* ------------------------------------------------------------------- */
uint32_t blockSize = 32;
float32_t  testOutput;
float32_t  cosOutput;
float32_t  sinOutput;
float32_t  cosSquareOutput;
float32_t  sinSquareOutput;

/* ---------------------------------------------------------------------- 
* Test status 
* ------------------------------------------------------------------- */

arm_status status;

int32_t main(void)
{
   float32_t diff;
   uint32_t i;

   printf("Starting Test...\n");
   for (i=0; i < blockSize; i++)
   {
      cosOutput = arm_cos_f32(testInput_f32[i]);
      printf("Cos %f = %f\n", testInput_f32[i], cosOutput);

      sinOutput = arm_sin_f32(testInput_f32[i]);
      printf("Sin %f = %f\n", testInput_f32[i], sinOutput);

      arm_mult_f32(&cosOutput, &cosOutput, &cosSquareOutput, 1);
      printf("Cos squared %f = %f\n", cosOutput, cosSquareOutput);

      arm_mult_f32(&sinOutput, &sinOutput, &sinSquareOutput, 1);
      printf("Sin squared %f = %f\n", sinOutput, sinSquareOutput);

      arm_add_f32(&cosSquareOutput, &sinSquareOutput, &testOutput, 1);
      printf("Add %f and %f = %f\n", cosSquareOutput, sinSquareOutput, testOutput);

      /* absolute value of difference between ref and test */
      diff = fabsf(testRefOutput_f32 - testOutput);
      /* Comparison of sin_cos value with reference */
      if (diff > DELTA)
      {
         printf("Diff failure %f\n", diff);
         exit(EXIT_FAILURE); /* just for QEMU testing */
         while(1);
      }
   }
   printf("Ending Test...\n");
   exit(EXIT_SUCCESS); /* just for QEMU testing */
   while(1); /* main function does not return */
}

Cortex-M0 vs. M3 : LLVM and LD

Mar 5th, 2014 6:30 am

One of the issues that you run into when using Clang/LLVM as your compiler for bare-metal ARM Cortex cores is that you have to use arm-none-eabi-ld directly to do your linking.

Using ld directly can be a bit nerve-racking at times, as you have to get the options correct (and the order of options does matter); normally you just let gcc use collect2 and have it internally execute ld to perform your linking.

One area where using it directly can bite you is not linking to the proper libgcc.a for the Cortex-M that you are targeting. Look into your arm-none-eabi/lib/gcc/arm-none-eabi/X.X.X tool-chain directory and you will find multiple directories, one for each ARM architecture: armv6-m, armv7-ar, armv7-m, thumb, thumb2, etc…

Add a library path for the architecture directory that matches the core that you compiled against and everything will be fine:

Cortex M3 example:

arm-none-eabi-ld -Map bin/main.axf.map -T src/cortex_M3.ld --library-path /Users/administrator/Code/llvm_superproject/install/arm-none-eabi/newlib-syscalls/arm-none-eabi/lib/thumb/thumb2 --library-path /Users/administrator/Code/llvm_superproject/install/arm-none-eabi/newlib-syscalls/arm-none-eabi/lib/thumb  --library-path /Users/administrator/Code/llvm_superproject/install/arm-none-eabi/newlib-syscalls/arm-none-eabi/lib  --library-path /Users/administrator/Code/llvm_superproject/install/arm-none-eabi/lib/gcc/arm-none-eabi/4.8.3/armv7-m -g   obj/printf_with_malloc.o obj/startup.o --start-group -lgcc -lc --end-group -o bin/main.axf

Cortex M0+ example:

arm-none-eabi-ld -Map bin/main.axf.map -T src/cortex_M0.ld --library-path /Users/administrator/Code/llvm_superproject/install/arm-none-eabi/newlib-syscalls/arm-none-eabi/lib/thumb/thumb2 --library-path /Users/administrator/Code/llvm_superproject/install/arm-none-eabi/lib/gcc/arm-none-eabi/4.8.3/armv6-m  --gc-sections --print-gc-sections  obj/printf_with_malloc.o obj/startup.o --start-group -lgcc -lc --end-group -o bin/main.axf

ARM Cortex-M instruction sets

ARM Cortex-M	Thumb	Thumb-2	Hardware multiply	Hardware divide	Saturated math	DSP extensions	Floating-point	ARM architecture	Core architecture
Cortex-M0	Most	Subset	1 or 32 cycle	No	No	No	No	ARMv6-M	Von Neumann
Cortex-M0+	Most	Subset	1 or 32 cycle	No	No	No	No	ARMv6-M	Von Neumann
Cortex-M1	Most	Subset	3 or 33 cycle	No	No	No	No	ARMv6-M	Von Neumann
Cortex-M3	Entire	Entire	1 cycle	Yes	Yes	No	No	ARMv7-M	Harvard
Cortex-M4	Entire	Entire	1 cycle	Yes	Yes	Yes	Optional	ARMv7E-M	Harvard
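The "ARM architecture" column is what decides which libgcc directory to put on the library path. A small lookup can make that choice explicit, for example in a build helper. This is only a sketch: the function name is hypothetical, and it covers just the cores whose directories (armv6-m, armv7-m) appear in the link lines above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: map a -mcpu value to the libgcc architecture
 * directory used in the link commands above. Returns NULL for cores
 * this sketch does not know about.
 */
const char *libgcc_arch_dir(const char *mcpu)
{
    static const struct {
        const char *mcpu;
        const char *dir;
    } map[] = {
        { "cortex-m0",     "armv6-m" },
        { "cortex-m0plus", "armv6-m" },
        { "cortex-m1",     "armv6-m" },
        { "cortex-m3",     "armv7-m" },
    };

    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++) {
        if (strcmp(mcpu, map[i].mcpu) == 0) {
            return map[i].dir;
        }
    }
    return NULL;
}
```

Returning NULL for unknown cores is deliberate: failing the build early is preferable to silently linking a libgcc.a built for a different instruction set.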

ARM Cortex-M3 Bare-metal with NEWLIB

Mar 4th, 2014 6:18 pm

I am working on a custom NEWLIB but first I wanted to make sure that NEWLIB compiled for ARM-NONE-EABI works out of the box with my ARM bare-metal Clang/LLVM build and Qemu.

Let's start with a simple main() that includes printf, puts and malloc. The first test is related to malloc: if your linker script does not set up your heap properly and provide the heap “end” address as defined in NEWLIB, then not much else is going to work (e.g. printf uses malloc). If malloc works, then let's do some printfs, including one with a random string. After that, let's keep increasing the size of our mallocs until we run out of heap space.

#include <stdio.h>      /* printf, scanf, NULL */
#include <stdlib.h>     /* malloc, free, rand */

int main ()
{
  extern char _heap_start; /* Defined by the linker from src/cortex_M3.ld */
  extern char _heap_end; /* Defined by the linker from src/cortex_M3.ld */
  int i,n;
  char * buffer;

  i = 43;
  buffer = (char*) malloc (i+1); /* one extra byte for the terminating '\0' */
  if (buffer==NULL)
  {
     puts ("Malloc failed\n");
     exit (1);
  }

  printf ("Printf string\n");
  for (n=0; n<i; n++)
  {
    buffer[n]=rand()%26+'a';
  }
  buffer[i]='\0';
  printf ("Random string: %s\n",buffer);

  i = 32;
  do
  {
     buffer = realloc(buffer, i);
     if (buffer == NULL)
     {
        puts("Out of memory!\n");
        exit (1);
     } else {
        printf("%d bytes @ address 0x%X (Low=0x%X:Hi=0x%X)\n",
           i,
           (unsigned int)buffer,
           (unsigned int)&_heap_start,
           (unsigned int)&_heap_end
       );
       i = i + 32;
     }
  } while (buffer != NULL);

  exit(0); /* cause qemu to exit */
  return 0;
}
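Why malloc depends on the heap symbols: newlib's allocator obtains memory through an sbrk-style syscall that hands out addresses starting at the heap start and fails once the heap end is reached. The following is a host-side model of that idea only. A static array stands in for the heap region, and the names and the 2K size are assumptions; the real syscall in your newlib build may use a different upper bound.

```c
#include <stddef.h>

#define HEAP_SIZE 0x800u /* assumed 2K heap for this model */

static char heap_area[HEAP_SIZE];  /* stand-in for the linker-defined heap */
static char *heap_brk = heap_area; /* current "program break" */

/*
 * Hypothetical model of an sbrk-style syscall: move the break by incr
 * bytes and return the previous break, or (void *)-1 when the request
 * would leave the heap region.
 */
void *sbrk_sketch(ptrdiff_t incr)
{
    char *prev = heap_brk;
    ptrdiff_t used = heap_brk - heap_area;

    if (incr > (ptrdiff_t)HEAP_SIZE - used || incr < -used) {
        return (void *)-1; /* out of heap: malloc will then return NULL */
    }
    heap_brk += incr;
    return prev;
}
```

If the start symbol is missing or points at the wrong place, the very first request already returns garbage, which is why the malloc check comes before any printf in the test program.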

Easy enough, so let's create a linker script that is geared for a Cortex-M3. The main section to pay attention to in this example is .heap:

OUTPUT_FORMAT ("elf32-littlearm", "elf32-bigarm", "elf32-littlearm")

ENTRY(Reset_Handler)

/* Specify the memory areas */
MEMORY
{
  FLASH (rx)      : ORIGIN = 0x00000000, LENGTH = 0x10000 /* 64K */
  RAM (xrw)       : ORIGIN = 0x00020000, LENGTH = 0x04000 /* 16K */
}

heap_size = 0x800; /* 2K */

SECTIONS {
    . = ORIGIN(FLASH);

    .vectors :
    {
        . = ALIGN(4);
        KEEP(*(.vectors)) /* Startup code */
        . = ALIGN(4);
    } >FLASH

    .text :
    {
        . = ALIGN(4);
        _start_text = .;
        *(.text)
        *(.text*)
        *(.rodata)
        *(.rodata*)
        _end_text = .;
    } >FLASH

        .ARM.extab :
        {
                *(.ARM.extab* .gnu.linkonce.armextab.*)
        } > FLASH

        __exidx_start = .;
        .ARM.exidx :
        {
                *(.ARM.exidx* .gnu.linkonce.armexidx.*)
        } > FLASH
        __exidx_end = .;

    _end_text = .;

    .data : AT (_end_text)
    {
        _start_data = .;
        *(.data)
        *(.data*)
        . = ALIGN(4);
        _end_data = .;
    } >RAM

    .bss :
    {
         . = ALIGN(4);
        _start_bss = .;
        *(.bss)
        *(.bss*)
        *(COMMON)
        . = ALIGN(4);
        _end_bss = .;
    } >RAM

    . = ALIGN(4);
    .heap :
    {
        __end__ = .;
        /* _heap_start = .; */
        /* "end" is used by newlib's syscalls!!! */
        PROVIDE(end = .);
        PROVIDE(_heap_start = end );
        . = . + heap_size;
        PROVIDE(_heap_end = .);
    } >RAM

    .ARM.attributes 0 : { *(.ARM.attributes) }

    .stack_dummy (COPY):
    {
        _end_stack = .;
        *(.stack*)
    } > RAM

    /* Set stack top to end of RAM, and stack limit move down by
     * size of stack_dummy section */
    _start_stack = ORIGIN(RAM) + LENGTH(RAM);
    _size_stack = _start_stack - SIZEOF(.stack_dummy);
    PROVIDE(__stack = _start_stack);

    /* Check if data + heap + stack exceeds RAM limit */
    ASSERT(_size_stack >= _heap_end, "region RAM overflowed with stack")
}
_end = .;
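The script places .data in RAM but loads it at _end_text in flash, and it brackets .bss with _start_bss and _end_bss. Startup code typically uses those symbols to copy the initialized data and zero the rest before calling main(). The sketch below shows that usual pattern with the addresses passed in as parameters so it can be tried on a host; the function name is hypothetical and this is not necessarily what the startup.c used here does.

```c
#include <stdint.h>

/*
 * Hypothetical RAM initialization, word by word:
 *  - copy the .data image from its load address (flash) to RAM
 *  - zero the .bss region
 * On target the arguments would be the addresses of _end_text,
 * _start_data, _end_data, _start_bss and _end_bss.
 */
void init_ram_sketch(const uint32_t *load_addr,
                     uint32_t *data_start, uint32_t *data_end,
                     uint32_t *bss_start, uint32_t *bss_end)
{
    for (uint32_t *dst = data_start; dst < data_end; dst++) {
        *dst = *load_addr++;
    }
    for (uint32_t *dst = bss_start; dst < bss_end; dst++) {
        *dst = 0u;
    }
}
```

The word-wise loops rely on the ALIGN(4) statements in the script, which keep the section boundaries on 4-byte addresses.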

OK, now that we have a linker script that defines our stack and heap properly, let's reuse our startup.c routine for the Cortex-M cores, compile it all with Clang/LLVM and link it with arm-none-eabi-ld:

clang -g -nostdlib -ffreestanding  -O0  -target arm-none-eabi -mcpu=cortex-m3  -mfloat-abi=soft -mthumb -I/Users/administrator/Code/llvm_superproject/install/arm-none-eabi/newlib-syscalls/arm-none-eabi/include -I/Users/administrator/Code/llvm_superproject/install/arm-none-eabi/arm-none-eabi/include  -o obj/printf_with_malloc.o -c src/printf_with_malloc.c
clang -g -nostdlib -ffreestanding  -O0  -target arm-none-eabi -mcpu=cortex-m3  -mfloat-abi=soft -mthumb -I/Users/administrator/Code/llvm_superproject/install/arm-none-eabi/newlib-syscalls/arm-none-eabi/include -I/Users/administrator/Code/llvm_superproject/install/arm-none-eabi/arm-none-eabi/include  -o obj/startup.o -c src/startup.c
arm-none-eabi-ld -Map bin/main.axf.map -T src/cortex_M3.ld --library-path /Users/administrator/Code/llvm_superproject/install/arm-none-eabi/newlib-syscalls/arm-none-eabi/lib/thumb/thumb2 --library-path /Users/administrator/Code/llvm_superproject/install/arm-none-eabi/newlib-syscalls/arm-none-eabi/lib/thumb  --library-path /Users/administrator/Code/llvm_superproject/install/arm-none-eabi/newlib-syscalls/arm-none-eabi/lib  --library-path /Users/administrator/Code/llvm_superproject/install/arm-none-eabi/lib/gcc/arm-none-eabi/4.8.3/thumb -g   obj/printf_with_malloc.o obj/startup.o --start-group -lgcc -lc --end-group -o bin/main.axf

And now we can run a simulation of it with QEMU:

qemu-system-arm -cpu cortex-m3  -semihosting -nographic -kernel  bin/main.axf
Puts string
Printf string
Random string: lvqdyoqykfdbxnqdquhydjaeebzqmtblcabwgmscrno
bytes @ address 0x209C0 (Low=0x209B4:Hi=0x211B4)
bytes @ address 0x20DF8 (Low=0x209B4:Hi=0x211B4)
bytes @ address 0x20DF8 (Low=0x209B4:Hi=0x211B4)
bytes @ address 0x20DF8 (Low=0x209B4:Hi=0x211B4)
bytes @ address 0x20DF8 (Low=0x209B4:Hi=0x211B4)
bytes @ address 0x20DF8 (Low=0x209B4:Hi=0x211B4)
bytes @ address 0x20DF8 (Low=0x209B4:Hi=0x211B4)
bytes @ address 0x20DF8 (Low=0x209B4:Hi=0x211B4)
bytes @ address 0x20DF8 (Low=0x209B4:Hi=0x211B4)
bytes @ address 0x20DF8 (Low=0x209B4:Hi=0x211B4)
bytes @ address 0x20DF8 (Low=0x209B4:Hi=0x211B4)
bytes @ address 0x20DF8 (Low=0x209B4:Hi=0x211B4)
bytes @ address 0x20DF8 (Low=0x209B4:Hi=0x211B4)
bytes @ address 0x20DF8 (Low=0x209B4:Hi=0x211B4)
Out of memory!

Bare metal debugging with Affinic Debugger

Feb 26th, 2014 11:19 pm

I am not currently using a full IDE for my bare metal C coding on OS-X. This is mainly due to my usage of an ARM-targeting Clang/LLVM build, since I am compiling to LLVM bitcode, piping it through opt and then handing the resulting object files directly to arm-none-eabi-ld. Makefile creation is the only way to get this build pipeline working, as no IDE on any OS natively supports using LLVM as a cross-compiler for bare metal ARM (yet!).

That leaves me in a terminal window a lot, not that I mind, but gdb (arm-none-eabi-gdb) based debugging can be a pain when you are used to working with a fully integrated IDE (I dream of Visual Studio style bare metal debugging ;-). The ‘layout asm’ and ‘layout src’ text-based GUI of gdb does help a lot, but until you learn all the commands and set up custom command sets, productivity tends to suffer…

There are several GUI-based interfaces that can ease the pain of using gdb. Eclipse has the CDT debug perspective that provides a complete wrapper to gdb MI commands, and ddd (Data Display Debugger) provides a frontend to many session-based command-line debuggers, including gdb. But I figured I would give Affinic Debugger a quick try to see how it works.

Using Affinic Debugger for GDB does not completely shield you from gdb; you also have access to the gdb terminal, so as you learn gdb commands you can type them instead of clicking your way through the GUI.

You can use it as a gdb learning tool, as all the GUI actions that involve gdb commands are echoed in the integrated terminal.

After you download and install it, you will need to set which gdb you are using to debug your target. I am using a version of arm-none-eabi-gdb that I built, so start the app and open the Preferences and change the “Set Debugger Path” entry to the gdb that you are using. Affinic Debugger will need to restart after that change.

Let's debug something!

Using the HelloWorld example from last time, let's recompile it with Clang/LLVM using “-g -O0” so we get the debug symbols (-g) and remove any code optimizations (-O0). That keeps the generated assembly easy to follow and allows breakpoints to be set from the source code (depending upon the optimization level, your breakpoints might be limited in the source view):

clang -g -O0 -target arm-none-eabi -mcpu=arm926ej-s -mfloat-abi=soft -o obj/startup.o -c src/startup.s
clang -g -O0 -target arm-none-eabi -mcpu=arm926ej-s -mfloat-abi=soft -o obj/HelloWorldSimple.o -c src/HelloWorldSimple.c
arm-none-eabi-ld -Lobj --gc-sections --print-gc-sections  -T src/HelloWorldSimple.ld obj/startup.o obj/HelloWorldSimple.o -o bin/HelloWorldSimple.axf
arm-none-eabi-size bin/HelloWorldSimple.axf

Let's start up QEMU, as we will use it as our remote gdb debugging target.

qemu-system-arm -M versatilepb -m 128M -nographic -kernel  bin/HelloWorldSimple.axf -s -S

Note: We are using the two following additional options in order to remotely debug our HelloWorldSimple.axf program:

* -s shorthand for -gdb tcp::1234

* -S freeze CPU at startup

Now start Affinic and connect to the QEMU gdb remote debugging server that is running. Enter the following into the “Command:” text field:

target remote localhost:1234
file bin/HelloWorldSimple.axf

Note: This is the same as if you were using gdb on the command line. You can also use the Affinic menus to do this (Remote and File menus).

You will see the assembly and source tabs filled. At this point you can set breakpoints, step through your source/assembly code, view register values, etc…

So far I like the Affinic Debugger interface, but I guess time will tell if I buy the full version after the 30-day trial, use the limited light/free version, or set up ddd and/or Eclipse on my MacBook Pro…

ARM Cortex-M Semihosting

Feb 24th, 2014 9:02 pm

What is semihosting? …Examples of these facilities include keyboard input, screen output, and disk I/O. For example, you can use this mechanism to enable functions in the C library, such as printf() and scanf(), to use the screen and keyboard of the host instead of having a screen and keyboard on the target system…

So you need to output some debug messages via your host debugging session (via JTAG or such), or you are working with QEMU to prototype some ARM code? Well, semihosting is simple to use, but it can come at a large price in memory and overhead if you use stdio to do it…

You can skip the “#include <stdio.h>” and the linking of the semihosting newlib library (assuming you have the syscalls implemented) and just use some simple inline assembly to get the job done.

Let's take a quick look at two of the twenty-some service calls (SVC) that are available, SYS_WRITEC (0x03) and SYS_WRITE0 (0x04).

* SYS_WRITEC outputs a single character. A pointer to that character is loaded into register R1, register R0 is loaded with 0x03, and then you execute a SuperVisor Call (SVC 0x00123456).

* SYS_WRITE0 outputs a null-terminated string. The string’s beginning address is stored in R1, R0 is loaded with 0x04 and you execute a supervisor call again.

If we translate that knowledge into inline assembly:

main.c

void main() {
  int SYS_WRITEC = 0x03;
  int SYS_WRITE0 = 0x04;
  register int reg0 asm("r0");
  register int reg1 asm("r1");
  char outchar = '_';

  // A 'NOP' so we can 'see' the start of the following svc call
  asm volatile("mov r0,r0");

  outchar = '!';
  reg0 = SYS_WRITEC;
  reg1 = (int)&outchar;
  asm("svc 0x00123456");

  // A 'NOP' so we can 'see' the start of the following svc call
  asm volatile("mov r0,r0");
  reg0 = SYS_WRITEC;
  outchar = '\n';
  reg1 = (int)&outchar;
  asm("svc 0x00123456");

  // A 'NOP' so we can 'see' the start of the following svc call
  asm volatile("mov r0, r0");

  reg0 = SYS_WRITE0;
  reg1 = (int)&"Print this to my jtag debugger\n";
  asm("svc 0x00123456");
}

Note: This is not pretty inline styling as it is meant to break each step down. Normally you would create a couple of functions (i.e: a ‘PutChar’ for SYS_WRITEC) and include the R0/R1 clobbers, etc…
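A minimal sketch of what such wrapper functions might look like, with the registers bound as asm operands and the clobbers spelled out. The function names are hypothetical. The SVC path only applies when compiling for ARM state, as in this post; on any other target the sketch falls back to stdio so it can be tried on a host, and that fallback is purely an assumption for experimentation.

```c
#include <stdio.h>

enum { SYS_WRITEC = 0x03, SYS_WRITE0 = 0x04 };

/* Issue one semihosting request: operation number in R0, parameter in R1. */
static void semihost_request(int op, const void *arg)
{
#if defined(__arm__) && !defined(__thumb__)
    register int r0 __asm__("r0") = op;
    register const void *r1 __asm__("r1") = arg;
    /* "memory": the host reads the data that r1 points to.
     * "lr": an SVC taken in Supervisor mode overwrites the link register. */
    __asm__ volatile("svc 0x00123456"
                     : "+r"(r0)
                     : "r"(r1)
                     : "memory", "lr");
#else
    /* Host fallback (assumption): emulate the two calls with stdio. */
    if (op == SYS_WRITEC) {
        putchar(*(const char *)arg);
    } else if (op == SYS_WRITE0) {
        fputs((const char *)arg, stdout);
    }
#endif
}

/* Hypothetical 'PutChar': SYS_WRITEC takes a pointer to the character. */
void put_char(char c)
{
    semihost_request(SYS_WRITEC, &c);
}

/* Hypothetical 'PutString': SYS_WRITE0 takes a null-terminated string. */
void put_string(const char *s)
{
    semihost_request(SYS_WRITE0, s);
}
```

Binding r0 and r1 as operands tells the compiler which registers the call uses, so it does not have to guess from the surrounding assignments as in the step-by-step version above.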

And the output that we get:

qemu-system-arm -nographic -monitor null -serial null -semihosting -kernel main.axf 
!
Print this to my jtag debugger

main.o: file format elf32-littlearm

00000000 <main>:
   0:  e52db004     push {fp}       ; (str fp, [sp, #-4]!)
   4:  e28db000     add  fp, sp, #0
   8:  e24dd014     sub  sp, sp, #20
   c:  e3a03003     mov  r3, #3
  10:  e50b3008     str  r3, [fp, #-8]
  14:  e3a03004     mov  r3, #4
  18:  e50b300c     str  r3, [fp, #-12]
  1c:  e3a0305f     mov  r3, #95  ; 0x5f
  20:  e54b300d     strb r3, [fp, #-13]
  24:  e1a00000     nop          ; (mov r0, r0)
  28:  e3a03021     mov  r3, #33  ; 0x21
  2c:  e54b300d     strb r3, [fp, #-13]
  30:  e51b0008     ldr  r0, [fp, #-8]
  34:  e24b300d     sub  r3, fp, #13
  38:  e1a01003     mov  r1, r3
  3c:  ef123456     svc  0x00123456
  40:  e1a00000     nop          ; (mov r0, r0)
  44:  e51b0008     ldr  r0, [fp, #-8]
  48:  e3a0300a     mov  r3, #10
  4c:  e54b300d     strb r3, [fp, #-13]
  50:  e24b300d     sub  r3, fp, #13
  54:  e1a01003     mov  r1, r3
  58:  ef123456     svc  0x00123456
  5c:  e1a00000     nop          ; (mov r0, r0)
  60:  e51b000c     ldr  r0, [fp, #-12]
  64:  e59f3010     ldr  r3, [pc, #16]    ; 7c <main+0x7c>
  68:  e1a01003     mov  r1, r3
  6c:  ef123456     svc  0x00123456
  70:  e28bd000     add  sp, fp, #0
  74:  e8bd0800     ldmfd    sp!, {fp}
  78:  e12fff1e     bx   lr
  7c:  00000000    .word 0x00000000

PS: SYS_TMPNAM and SYS_READC are not implemented in QEMU (up to and including 1.7.0), so consult the “qemu/target-arm/arm-semi.c” source if you have questions about how those SVC calls are implemented.

LLVM and the ARM ELF .ARM.exidx* section

Feb 23rd, 2014 6:39 am

In my last post I did a very basic comparison of ARM code generation between the LLVM and GCC compilers, testing the AXF in QEMU. The standout difference was that LLVM produced a .ARM.exidx* section in the AXF/ELF while arm-gcc did not. The code is very simple: one .s and one .c file, no .cpp/.h involved.

So what is a .ARM.exidx section?

The ARM ELF manual shows this under the special sections chapter:

Names beginning .ARM.exidx name sections containing index entries for section unwinding. Names beginning .ARM.extab name sections containing exception unwinding information. See [EHABI] for details.

Table 4-4 from that manual shows the processor-specific section types, and our section type is:

| Name | Value | Comment |
| - | - | - |
| SHT_ARM_EXIDX | 0x70000001 | |

So the question remains: what is in the section and what is being created? Let's dump HelloWorldSimple.o and only look at that section:

Relocation section '.rel.ARM.exidx' at offset 0x580 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000000  00000b2a R_ARM_PREL31      00000000   .text
00000008  00000b2a R_ARM_PREL31      00000000   .text
Unwind table index '.ARM.exidx' at offset 0xcc contains 2 entries:
0x0 <print_uart0>: 0x1 [cantunwind]
0x54 <c_entry>: 0x1 [cantunwind]

So it added both functions to the table, but they are marked cantunwind. That makes sense, but since nothing in the section can be unwound, why include the section? Using gc-sections during linking does not remove it, as it has references to functions that are being used…

Let's do a quick test and add -funwind-tables: recompile and yes, we get a fully populated unwind table. Using -fno-unwind-tables produces the results from above, so that is the default that is being used. Research is ongoing on this one…

Relocation section '.rel.ARM.exidx' at offset 0x5a4 contains 4 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000000  00000b2a R_ARM_PREL31      00000000   .text
00000000  00001600 R_ARM_NONE        00000000   __aeabi_unwind_cpp_pr0
00000008  00000b2a R_ARM_PREL31      00000000   .text
00000008  00001600 R_ARM_NONE        00000000   __aeabi_unwind_cpp_pr0
Unwind table index '.ARM.exidx' at offset 0xcc contains 2 entries:
0x0 <print_uart0>: 0x8001b0b0
  Compact model index: 0
  0x01      vsp = vsp + 8
  0xb0      finish
  0xb0      finish
0x54 <c_entry>: 0x809b8480
  Compact model index: 0
  0x9b      vsp = r11
  0x84 0x80 pop {r11, r14}
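To read the two dumps side by side, it helps to know how the second word of each index entry is encoded according to the EHABI: the value 0x1 means cantunwind, a word with bit 31 set carries the unwind opcodes inline (compact model), and anything else is an offset to an entry in .ARM.extab. A small sketch of that classification follows; the type and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    EXIDX_CANTUNWIND,  /* second word == 0x1 */
    EXIDX_INLINE,      /* bit 31 set: compact model, opcodes inline */
    EXIDX_TABLE_OFFSET /* bit 31 clear: prel31 offset into .ARM.extab */
} exidx_kind;

/* Classify the second word of a .ARM.exidx entry. */
exidx_kind exidx_classify(uint32_t word)
{
    if (word == 0x1u) {
        return EXIDX_CANTUNWIND;
    }
    if (word & 0x80000000u) {
        return EXIDX_INLINE;
    }
    return EXIDX_TABLE_OFFSET;
}

/* For inline entries: bits 27..24 hold the compact model index. */
unsigned exidx_inline_index(uint32_t word)
{
    return (unsigned)((word >> 24) & 0x0Fu);
}

/* For inline entries: the low three bytes are the unwind opcodes. */
void exidx_inline_opcodes(uint32_t word, uint8_t out[3])
{
    out[0] = (uint8_t)(word >> 16);
    out[1] = (uint8_t)(word >> 8);
    out[2] = (uint8_t)word;
}
```

Applied to the dump above, 0x8001b0b0 is an inline entry with index 0 and opcodes 0x01, 0xb0, 0xb0, which matches the "vsp = vsp + 8, finish, finish" lines that readelf printed.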

Additional Reading: ARM unwind table linker processing; this concerns a patch to binutils/ld:

The patch below implements linker processing of ARM unwinding tables (SHT_ARM_EXIDX).

ARM exception index tables only define the start address of each region. This means that code with no unwinding information is effectively covered by the preceding unwinding table entry.

For normal exceptions that doesn’t matter so much - the user should ensure that any code they throw exceptions through has proper unwinding information.

Just as a quick check, I grep’d some source and the .ARM.exidx* section is generated by the ARMELFStreamer:

http://llvm.org/docs/doxygen/html/ARMELFStreamer_8cpp_source.html
inline void ARMELFStreamer::SwitchToExIdxSection(const MCSymbol &FnStart) {
 SwitchToEHSection(".ARM.exidx",
                   ELF::SHT_ARM_EXIDX,
                   ELF::SHF_ALLOC | ELF::SHF_LINK_ORDER,
                   SectionKind::getDataRel(),
                   FnStart);
}

http://llvm.org/docs/doxygen/html/Support_2ELF_8h_source.html
  // Fixme: All this is duplicated in MCSectionELF. Why??
  // Exception Index table
  SHT_ARM_EXIDX           = 0x70000001U,

ARM Bare Metal Hello World: Comparing LLVM & ARM-GCC

Feb 22nd, 2014 8:43 pm

Even with the ever-maturing and stable ARM backend of LLVM, it is hard to find information on using it compared with the well-known ARM-GCC release.

So let's start with the simplest HelloWorld example and compare LLVM and ARM-GCC.

Balau’s post is a popular one showing an ARM bare metal Hello World and a test using QEMU, so let's start with that one. First, let's reproduce the compile/link steps to make sure it works:

arm-none-eabi-as -mcpu=arm926ej-s src/startup.s -o obj/startup.o
arm-none-eabi-gcc -c -mcpu=arm926ej-s -O0 src/HelloWorldSimple.c -o obj/HelloWorldSimple.o
arm-none-eabi-ld -T src/HelloWorldSimple.ld obj/HelloWorldSimple.o obj/startup.o -o bin/HelloWorldSimple.axf_gcc
arm-none-eabi-size bin/HelloWorldSimple.axf_gcc
qemu-system-arm -M versatilepb -m 128M -nographic -kernel bin/HelloWorldSimple.axf_gcc
Hello world!
QEMU: Terminated

Works just fine, so let's reproduce that using my LLVM bare metal build. All the compiler options are shown, even though some are defaulted in my build of LLVM, so you can see everything that is required to get the LLVM bitcode conversion to produce a valid object file for our ARM target (I’m using the Clang driver, but you can use LLVM and pipe bitcode through the various tools so you can deeply control the optimization phase):

clang -c -target arm-none-eabi -mcpu=arm926ej-s -O0 -mfloat-abi=soft -g startup.s -o startup.o
clang -c -target arm-none-eabi -mcpu=arm926ej-s -O0 -mfloat-abi=soft -g HelloWorldSimple.c -o main.o
arm-none-eabi-ld -T HelloWorldSimple.ld main.o startup.o -o main.axf_llvm
qemu-system-arm -M versatilepb -m 128M -nographic -kernel main.axf_llvm
Hello world!
QEMU: Terminated

target : Option providing the triple that you are ‘targeting’
mcpu : Option providing the ARM core that will be flashed
mfloat-abi : Soft or hard, depending upon whether your ARM core has an FPU implemented on it. A core design that can support an FPU does not mean your vendor’s core has one; it comes down to the features/price of the core.

Note: In both, I am turning off the optimizers via the compile drivers.

Let's look at the size of the AXF (ARM Executable Format) produced by each:

   text     data     bss     dec     hex filename
    140         0       0     140      8c bin/HelloWorldSimple.axf_gcc
    
   text      data     bss     dec     hex filename
    150         0       0     150      96 bin/HelloWorldSimple.axf

There is a 10 byte difference, interesting… let’s look at that a little more:

                 llvm                    arm-gcc
section          size   addr     section          size   addr
.startup           16  65536     .startup           16  65536
.text             108  65552     .text             104  65552
.ARM.exidx          8  65660
.rodata             4  65668     .rodata            20  65656
.rodata.str1.1     14  65672
.ARM.attributes    40      0     .ARM.attributes    46      0
.comment           19      0     .comment          112      0
Total             209            Total             298

Note: I ran strip on the arm-gcc version to remove the empty debug sections that gcc inserts automatically
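
As a quick cross-check of the two tables, here is a minimal C sketch that adds up the loadable sections for each toolchain. It assumes that the text column printed by arm-none-eabi-size counts every allocated read-only section (code, read-only data and the unwind index), which matches the numbers above. The array and function names are made up for this illustration.

```c
#include <stddef.h>

/* Loadable section sizes in bytes, copied from the section table.
 * llvm: .startup .text .ARM.exidx .rodata .rodata.str1.1
 * gcc : .startup .text .rodata */
static const unsigned llvm_sections[] = { 16, 108, 8, 4, 14 };
static const unsigned gcc_sections[]  = { 16, 104, 20 };

/* Add up n section sizes. */
static unsigned sum_sections(const unsigned *sizes, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += sizes[i];
    return total;
}
```

Summing gives 150 for the LLVM build and 140 for arm-gcc, the same 10 byte gap that size reports: 4 more bytes of .text, 8 bytes of .ARM.exidx, and 2 fewer bytes of read-only data.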

The .startup sections are the same size since this code is assembly and no codegen or optimization happens there.

It is interesting that LLVM inserts a .ARM.exidx section even though this is only C code. I’ll have to look at LLVM to see if -funwind-tables and/or -fexceptions default to on. That section is 8 bytes and accounts for most of the 10 byte size difference in this really basic example; the disassembly of both builds is below so we can compare the rest.

.ARM.exidx is the section containing information for unwinding the stack
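
For reference, here is a rough sketch of what one entry in that section looks like, based on my reading of the ARM exception handling ABI (EHABI): each index entry is two 32-bit words, which is where the 8 bytes come from. The struct, field and function names below are invented for illustration and are not taken from any toolchain header.

```c
#include <stdint.h>

/* One .ARM.exidx index entry: two 32-bit words = 8 bytes. */
struct exidx_entry {
    uint32_t fn_offset; /* prel31 offset from this word to the function start */
    uint32_t content;   /* inline unwind data, a prel31 offset into .ARM.extab,
                           or the EXIDX_CANTUNWIND marker */
};

#define EXIDX_CANTUNWIND 0x1u /* marker value: this frame cannot be unwound */

/* Decode a prel31 field: sign-extend the low 31 bits into a byte offset. */
static int32_t prel31_offset(uint32_t word)
{
    uint32_t v = word & 0x7fffffffu; /* drop bit 31 */
    if (v & 0x40000000u)             /* bit 30 is the sign bit */
        v |= 0x80000000u;
    return (int32_t)v;
}

/* Returns nonzero when the entry is marked as not unwindable. */
static int entry_cannot_unwind(const struct exidx_entry *e)
{
    return e->content == EXIDX_CANTUNWIND;
}
```

Whether the single entry in this binary is a can't-unwind marker or carries real unwind data is something the section dump would have to confirm; the sketch only shows the layout.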

Note: Understanding the ARM ELF format is not really required to do bare metal programming, but understanding how your code is allocated and loaded can make a world of difference when you are writing linker definition files for different cores, so spend a few minutes and read the 46 pages :-)

First the gcc disassembly so we can compare the LLVM version to it:

bin/HelloWorldSimple.axf_gcc:     file format elf32-littlearm
Disassembly of section .startup:
00010000 <_Reset>:
   10000: e59fd004    ldr sp, [pc, #4]    ; 1000c <_Reset+0xc>
   10004: eb000015    bl  10060 <c_entry>
   10008: eafffffe    b   10008 <_Reset+0x8>
   1000c: 00011090    .word   0x00011090
Disassembly of section .text:
00010010 <print_uart0>:
   10010: e52db004    push    {fp}        ; (str fp, [sp, #-4]!)
   10014: e28db000    add fp, sp, #0
   10018: e24dd00c    sub sp, sp, #12
   1001c: e50b0008    str r0, [fp, #-8]
   10020: ea000006    b   10040 <print_uart0+0x30>
   10024: e59f3030    ldr r3, [pc, #48]   ; 1005c <print_uart0+0x4c>
   10028: e51b2008    ldr r2, [fp, #-8]
   1002c: e5d22000    ldrb    r2, [r2]
   10030: e5832000    str r2, [r3]
   10034: e51b3008    ldr r3, [fp, #-8]
   10038: e2833001    add r3, r3, #1
   1003c: e50b3008    str r3, [fp, #-8]
   10040: e51b3008    ldr r3, [fp, #-8]
   10044: e5d33000    ldrb    r3, [r3]
   10048: e3530000    cmp r3, #0
   1004c: 1afffff4    bne 10024 <print_uart0+0x14>
   10050: e24bd000    sub sp, fp, #0
   10054: e49db004    pop {fp}        ; (ldr fp, [sp], #4)
   10058: e12fff1e    bx  lr
   1005c: 101f1000    .word   0x101f1000
00010060 <c_entry>:
   10060: e92d4800    push    {fp, lr}
   10064: e28db004    add fp, sp, #4
   10068: e59f0004    ldr r0, [pc, #4]    ; 10074 <c_entry+0x14>
   1006c: ebffffe7    bl  10010 <print_uart0>
   10070: e8bd8800    pop {fp, pc}
   10074: 0001007c    .word   0x0001007c

Now the LLVM version:

bin/HelloWorldSimple.axf:     file format elf32-littlearm
Disassembly of section .startup:
00010000 <_Reset>:
   10000: e59fd004    ldr sp, [pc, #4]    ; 1000c <_Reset+0xc>
   10004: eb000016    bl  10064 <c_entry>
   10008: eafffffe    b   10008 <_Reset+0x8>
   1000c: 00011098    .word   0x00011098
Disassembly of section .text:
00010010 <print_uart0>:
   10010: e24dd008    sub sp, sp, #8
   10014: e1a01000    mov r1, r0
   10018: e58d0004    str r0, [sp, #4]
   1001c: e58d1000    str r1, [sp]
   10020: e59d0004    ldr r0, [sp, #4]
   10024: e5d00000    ldrb    r0, [r0]
   10028: e3500000    cmp r0, #0
   1002c: 0a000009    beq 10058 <print_uart0+0x48>
   10030: eaffffff    b   10034 <print_uart0+0x24>
   10034: e59d0004    ldr r0, [sp, #4]
   10038: e5d00000    ldrb    r0, [r0]
   1003c: e59f101c    ldr r1, [pc, #28]   ; 10060 <print_uart0+0x50>
   10040: e5911000    ldr r1, [r1]
   10044: e5810000    str r0, [r1]
   10048: e59d0004    ldr r0, [sp, #4]
   1004c: e2800001    add r0, r0, #1
   10050: e58d0004    str r0, [sp, #4]
   10054: eafffff1    b   10020 <print_uart0+0x10>
   10058: e28dd008    add sp, sp, #8
   1005c: e12fff1e    bx  lr
   10060: 00010084    .word   0x00010084
00010064 <c_entry>:
   10064: e92d4800    push    {fp, lr}
   10068: e1a0b00d    mov fp, sp
   1006c: e59f0004    ldr r0, [pc, #4]    ; 10078 <c_entry+0x14>
   10070: ebffffe6    bl  10010 <print_uart0>
   10074: e8bd8800    pop {fp, pc}
   10078: 00010088    .word   0x00010088

We can ignore the _Reset section as that is hand coded assembly and the same for both.

The c_entry function is interesting: LLVM uses a mov to copy the stack pointer into fp (r11 = frame pointer), which is what I would do, while arm-gcc uses an add (add fp, sp, #4). The #4 there is an immediate constant, not a register, so gcc sets fp = sp + 4. After the push {fp, lr}, that leaves gcc’s frame pointer pointing at the saved lr slot, whereas LLVM’s points at the saved fp slot: two different frame layout conventions for the same job. The rest of this function is the same between the two compilers.

The print_uart0 function is a hack function, as it does not implement the FIFO checks or flow control you would need for an actual UART; in this case it writes to the memory address where the discontinued ARM Versatile PB dev board has a UART, and QEMU’s board simulation echoes those writes. I am not going to do a line-by-line comparison of the generated code; for un-optimized code they are both getting the job done, in slightly different ways and in almost the same number of instructions.
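
For readers who do not have Balau’s source open, the function being disassembled is roughly the following. This is a sketch reconstructed from the disassembly and the 0x101f1000 literal visible in it, not a copy of the original file. To keep it runnable on a desktop, the data register is passed in as a pointer; the uart_puts name, the UART0_DR_ADDR macro and that parameter are my own additions.

```c
#include <stdint.h>

/* UART0 data register on the Versatile PB board (the literal seen in the
 * gcc disassembly). Only meaningful on the target or under QEMU. */
#define UART0_DR_ADDR ((volatile uint32_t *)(uintptr_t)0x101f1000u)

/*
 * Write a NUL-terminated string, one character per store, to a UART data
 * register. There is no FIFO-full check and no flow control, which is
 * good enough for QEMU but not for real hardware.
 */
static void uart_puts(volatile uint32_t *data_reg, const char *s)
{
    while (*s != '\0') {
        *data_reg = (uint32_t)(unsigned char)*s; /* each store emits one character */
        s++;
    }
}
```

On the target, c_entry would then just call uart_puts(UART0_DR_ADDR, "Hello world!\n"); on a desktop you can point data_reg at an ordinary variable to watch the stores.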

So we are able to produce a working bare metal ARM AXF from LLVM. Next time, I will spend a little time on compiler optimizations to see how the two code generators/optimizers compare…

OS-X : Using DiffMerge as your Git visual merge and diff tool

Feb 19th, 2014 8:56 pm

I tend to stay on the cmd line as much as possible, but for visual diffs, an ncurses console diff tool just does not cut it for me. Beyond Compare Pro by Scooter on Windows is one of the best that I have ever used, and with licenses at work I never had to worry about not having it on a work desktop or laptop.

But on OS-X at home, Beyond Compare was not available. There is a 4.0 release in the works (beta now), but at $50.00 USD for a personal-use copy on OS-X, and possibly without feature parity with the Windows Pro edition(?), I just cannot pull the trigger on that purchase when there are other (cheaper) options that work just as well for personal development.

So, normally for a free visual diff, you cannot beat meld. It is a great open-source tool, but on OS-X it fires up X (Quartz for me) and it is getting long in the tooth in terms of the GUI’s human factors (the feature set is still great). If there was a Qt version of this, the search would be over… free or not!

So some searching landed me on an old post by Todd Huss about using DiffMerge as your visual diff/merge for git and it was actually what I was looking for, well almost ;-) It is missing a few features, but they have a free version and it works really well and has a great OS-X interface… search is over for now…

SourceGear has a $19.00 USD version that includes file export with HTML formatting, and if I could see an example of the HTML it produces, I would pay for that feature in a heartbeat, but the feature is completely locked out until you actually register, bummer…

Todd recommends using the DiffMerge installer version vs. the dmg version; I go the other way on that. Download the dmg version, open it and drag/drop the app to your Applications. Then in a terminal window you can copy Extras/diffmerge.sh to your /usr/local/bin directory (the execute attribute is already set, so no chmod needed), but I copied it as just vdiff as that is quicker to type. No admin rights are needed to install it that way and that makes me happy… I can run vdiff file1.c file2.c on the cmd line to pop the GUI open and populate it.

I then used the git setup he has listed and everything is working great so far. Click on the image above to see it comparing the disassembly of LLVM vs. GCC code generation for bare metal ARM development.

Your git setup is:

git config --global diff.tool diffmerge
git config --global difftool.diffmerge.cmd 'diffmerge "$LOCAL" "$REMOTE"'
git config --global merge.tool diffmerge
git config --global mergetool.diffmerge.cmd 'diffmerge --merge --result="$MERGED" "$LOCAL" "$(if test -f "$BASE"; then echo "$BASE"; else echo "$LOCAL"; fi)" "$REMOTE"'
git config --global mergetool.diffmerge.trustExitCode true

Your git shortcuts are:

# diff the local file.m against the checked-in version
git difftool file.m
# diff the local file.m against the version in some-feature-branch
git difftool some-feature-branch file.m
# diff the file.m from the Build-54 tag to the Build-55 tag
git difftool Build-54..Build-55 file.m
# to resolve merge conflicts, just run git mergetool
git mergetool

Thanks Todd, works great.

LLVM ARM triple CPU targets

Feb 19th, 2014 6:36 am

I am building a bare-metal ARM Clang/LLVM cross-compiler for my ARM Cortex-M LLVM vs. arm-gcc experiments and was looking for the complete list of ARM cores available.

LLVM makes it soooo easy to get that information from the LLVM static compiler binary (llc): just pass it a generic ARM triple. Here is the list from my build:

llc -mtriple=arm-none-eabi -mcpu=help

Available CPUs for this target:

  arm1020e      - Select the arm1020e processor.
  arm1020t      - Select the arm1020t processor.
  arm1022e      - Select the arm1022e processor.
  arm10e        - Select the arm10e processor.
  arm10tdmi     - Select the arm10tdmi processor.
  arm1136j-s    - Select the arm1136j-s processor.
  arm1136jf-s   - Select the arm1136jf-s processor.
  arm1156t2-s   - Select the arm1156t2-s processor.
  arm1156t2f-s  - Select the arm1156t2f-s processor.
  arm1176jz-s   - Select the arm1176jz-s processor.
  arm1176jzf-s  - Select the arm1176jzf-s processor.
  arm710t       - Select the arm710t processor.
  arm720t       - Select the arm720t processor.
  arm7tdmi      - Select the arm7tdmi processor.
  arm7tdmi-s    - Select the arm7tdmi-s processor.
  arm8          - Select the arm8 processor.
  arm810        - Select the arm810 processor.
  arm9          - Select the arm9 processor.
  arm920        - Select the arm920 processor.
  arm920t       - Select the arm920t processor.
  arm922t       - Select the arm922t processor.
  arm926ej-s    - Select the arm926ej-s processor.
  arm940t       - Select the arm940t processor.
  arm946e-s     - Select the arm946e-s processor.
  arm966e-s     - Select the arm966e-s processor.
  arm968e-s     - Select the arm968e-s processor.
  arm9e         - Select the arm9e processor.
  arm9tdmi      - Select the arm9tdmi processor.
  cortex-a12    - Select the cortex-a12 processor.
  cortex-a15    - Select the cortex-a15 processor.
  cortex-a5     - Select the cortex-a5 processor.
  cortex-a53    - Select the cortex-a53 processor.
  cortex-a57    - Select the cortex-a57 processor.
  cortex-a7     - Select the cortex-a7 processor.
  cortex-a8     - Select the cortex-a8 processor.
  cortex-a9     - Select the cortex-a9 processor.
  cortex-a9-mp  - Select the cortex-a9-mp processor.
  cortex-m0     - Select the cortex-m0 processor.
  cortex-m3     - Select the cortex-m3 processor.
  cortex-m4     - Select the cortex-m4 processor.
  cortex-r5     - Select the cortex-r5 processor.
  ep9312        - Select the ep9312 processor.
  generic       - Select the generic processor.
  iwmmxt        - Select the iwmmxt processor.
  krait         - Select the krait processor.
  mpcore        - Select the mpcore processor.
  mpcorenovfp   - Select the mpcorenovfp processor.
  strongarm     - Select the strongarm processor.
  strongarm110  - Select the strongarm110 processor.
  strongarm1100 - Select the strongarm1100 processor.
  strongarm1110 - Select the strongarm1110 processor.
  swift         - Select the swift processor.
  xscale        - Select the xscale processor.

