The first buffer does not contain more garbage than any other MP3 decoder
outputs and we don't really know how much we have to drop or not.
After this change the output has the same duration as with mad.
Some audio decoders (at least the MP3 decoder on MTK based devices) outputs
raw audio in batches of multiple audio frames. We need to handle that
properly, otherwise the base class will be kind of unhappy.