While I was waiting for my SpokePOV to come in the mail, I was looking at the firmware to see how much I could tweak it. So I load up the code in AVR Studio, and what do I find for the code size:
Program: 1966 bytes (96.0% Full)
built with these compiler flags:
-Wall -gdwarf-2 -std=gnu99 -Os -funsigned-char -funsigned-bitfields -fpack-struct -fshort-enums -DF_CPU=8000000UL
I looked around and found a list of size optimizations here.
Results, in the order applied (each change stays in for the steps after it; I did not test every combination of flags):
- Added -Wl,--relax: 1964 bytes (95.9% Full)
- Added -ffreestanding and declared main as void main() __attribute__ ((noreturn)): 1942 bytes (94.8% Full). Dropped 22 bytes!
- Added -fno-tree-scev-cprop: no effect, still 1942 bytes (94.8% Full)
- Added -ffunction-sections, -fdata-sections, -Wl,--gc-sections: 1910 bytes (93.3% Full). Dropped 32 bytes!
At this point I started looking in the code for repeated structures. I found "NOP; NOP; NOP; NOP;" blocks (NOP4 for short) throughout the code, almost always with the comment "// wait 500 ns". The delay comes from waiting 4 CPU cycles before starting the next operation: at 8 MHz one cycle is 125 ns, so 4 cycles is 500 ns. In theory, then, any instruction sequence that takes 4 CPU cycles without changing the CPU's state is an equivalent replacement. According to the source listing, each NOP4 takes 8 bytes (2 bytes per NOP × 4), so any equivalent sequence shorter than 8 bytes is a net gain at every use.
SpokePOV uses NOP4 9 times, or 72 bytes of timing code that does nothing else.
My first step was to replace "NOP; NOP; NOP; NOP;" with "wait500ns();", where wait500ns() is defined as:
/* 4 x NOP (1 cycle each) = 4 cycles = 500 ns at 8 MHz, 8 bytes of flash */
#define wait500ns() do { NOP; NOP; NOP; NOP; } while (0)
Looking at the instruction chart located here, I found an instruction that could work: RJMP. RJMP changes the PC (program counter) to PC = PC + k + 1, and it takes two CPU cycles. Normally the PC just steps to the next instruction after each one executes (one 16-bit word for single-word instructions, ignoring interrupts and the like). So with k = 0, "rjmp .+0" lands on exactly the instruction that would have run next anyway: it is a NOP that takes two cycles! It is also encoded in 2 bytes, so wait500ns() can be redefined as:
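/* Relative jump to the very next instruction: 2 cycles, 2 bytes, no flags changed */
#define JMP_P1 asm volatile("rjmp .+0")
/* 2 x 2 cycles = 4 cycles = 500 ns at 8 MHz, in 4 bytes instead of 8 */
#define wait500ns() do { JMP_P1; JMP_P1; } while (0)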
Result: another 36 bytes gone (4 bytes saved at each of the 9 uses):
Program: 1874 bytes (91.5% Full)
tl;dr Final results: 92 bytes (~46 instructions, since most AVR instructions are 2 bytes) recovered while learning avr-gcc flags and AVR assembly, hopefully with no effect on execution.
Can anyone else squeeze the SpokePOV firmware any further?