Unroll some more loops in the fallback code; the unrolled version seems to work fine on ARM. Add some simple ARM optimizations taken from speex.
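
For illustration only, the kind of unrolling meant here might look like the sketch below. The function name `inner_prod_unrolled`, the 16-bit fixed-point inner product, and the unroll factor of four are all assumptions made for the example; they are not taken from the code in this change or from speex.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from this commit): inner product of two
 * 16-bit vectors, with the main loop unrolled four times.
 * The 32-bit accumulator is assumed to be wide enough for the
 * vector lengths in use; very long inputs could overflow it. */
static int32_t inner_prod_unrolled(const int16_t *x, const int16_t *y, size_t len)
{
    int32_t sum = 0;
    size_t i = 0;

    /* Main loop: four multiply-accumulates per iteration. */
    for (; i + 4 <= len; i += 4) {
        sum += (int32_t)x[i]     * y[i];
        sum += (int32_t)x[i + 1] * y[i + 1];
        sum += (int32_t)x[i + 2] * y[i + 2];
        sum += (int32_t)x[i + 3] * y[i + 3];
    }

    /* Tail loop: handle the remaining 0 to 3 elements. */
    for (; i < len; i++)
        sum += (int32_t)x[i] * y[i];

    return sum;
}
```

Keeping a separate tail loop means the unrolled version gives the same result as the plain loop for any length, not only for multiples of four.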