Whenever we see a gap, we flush the temporary packets (but not the adapter). If
we had some data temporarily stored, it will be output. The sound will be a bit
garbled, but that's how it sounds on Mac OS X too :)
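
The gap handling described above can be sketched roughly as follows. This is an
illustrative sketch only, not the actual element code: the `Depay` struct, the
buffer sizes, and the `depay_*` function names are all hypothetical. It also
assumes that a "gap" means a break in the 16-bit RTP sequence numbers and that
"flush" means pushing the temporarily stored bytes to the output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEPAY_MAX_BYTES 256  /* arbitrary size chosen for the sketch */

/* Hypothetical depayloader state; every name here is an assumption. */
typedef struct {
    uint8_t  pending[DEPAY_MAX_BYTES]; /* temporarily stored packet data */
    size_t   pending_len;
    size_t   adapter_len;              /* stands in for the adapter's contents */
    uint8_t  output[DEPAY_MAX_BYTES];  /* bytes pushed downstream */
    size_t   output_len;
    uint16_t next_seq;                 /* expected next sequence number */
    bool     have_seq;                 /* false until the first packet arrives */
} Depay;

static void depay_init(Depay *d)
{
    assert(d != NULL);
    memset(d, 0, sizeof(*d));
}

/* Push the pending bytes to the output and leave the adapter untouched.
 * Returns the number of bytes that were output. */
static size_t depay_flush_pending(Depay *d)
{
    assert(d != NULL);
    size_t room = DEPAY_MAX_BYTES - d->output_len;
    size_t n = d->pending_len < room ? d->pending_len : room;

    memcpy(d->output + d->output_len, d->pending, n);
    d->output_len += n;
    d->pending_len = 0;   /* only the temporary store is reset */
    return n;
}

/* Accept one packet. On a sequence gap the pending bytes are flushed first.
 * Returns the number of bytes flushed, or -1 if the payload does not fit. */
static int depay_push(Depay *d, uint16_t seq, const uint8_t *data, size_t len)
{
    assert(d != NULL && data != NULL);
    int flushed = 0;

    if (d->have_seq && seq != d->next_seq)
        flushed = (int) depay_flush_pending(d);

    if (len > DEPAY_MAX_BYTES - d->pending_len)
        return -1;

    memcpy(d->pending + d->pending_len, data, len);
    d->pending_len += len;
    d->next_seq = (uint16_t) (seq + 1);  /* wraps from 65535 to 0 */
    d->have_seq = true;
    return flushed;
}
```

Whatever was buffered before the gap is emitted as-is, even if it is an
incomplete unit, which would explain the garbled sound mentioned above.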
Reverse-engineered by comparing:
* An RTP-hinted file provided by DarwinStreamingServer (DSS)
* The output produced by DSS for that same file
Also used various streaming sources available on the internet to fine-tune
the code.
The header/codec_data extraction methods are from FFmpeg (LGPL).