Contrary to what one might expect, this actually reduces the size of
the structs due to alignment constraints. On Linux x86-64 with
clang/gcc it reduces the size of the caption_frame_t struct from 7760
bytes to 6800 bytes, and on Windows x86-64 with MSVC from 11600 bytes
to 6800 bytes.
It also lets the compiler generate simpler and potentially faster
assembly, as the values can be accessed directly as uint8_t instead of
having to be extracted with bitwise operations.
It also gives us the same ABI with clang/gcc and MSVC.
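
A minimal sketch of the alignment effect, not the actual struct
definitions: cell_bitfield and cell_plain are hypothetical stand-ins,
and the field names and widths are assumptions chosen for illustration.
A bit-field declared as unsigned int raises the struct's alignment to
that of unsigned int, so the compiler pads the struct to a multiple of
4 bytes, while plain uint8_t fields need no padding at all.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical "before" layout: flags stored as unsigned int bit-fields.
 * The bit-field type forces 4-byte struct alignment on x86-64. */
struct cell_bitfield {
    unsigned int underline : 1;
    unsigned int style : 3;
    char data[5];
};

/* Hypothetical "after" layout: one uint8_t per flag.
 * Every member has 1-byte alignment, so there is no padding. */
struct cell_plain {
    uint8_t underline;
    uint8_t style;
    char data[5];
};

/* Bytes saved per cell by switching from bit-fields to uint8_t. */
size_t bytes_saved_per_cell(void)
{
    return sizeof(struct cell_bitfield) - sizeof(struct cell_plain);
}

/* Prints both sizes; the bit-field size differs between compilers. */
void print_cell_sizes(void)
{
    printf("bit-field: %zu bytes, uint8_t: %zu bytes\n",
           sizeof(struct cell_bitfield), sizeof(struct cell_plain));
}
```

The uint8_t variant is 7 bytes with every compiler, which is what makes
the ABI identical. The bit-field variant depends on how each compiler
allocates bit-field storage units, so its size can differ between
clang/gcc and MSVC. Multiplied over a large array of cells, the
per-cell padding adds up.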

Some broken software out there does not insert the empty lines, and we
don't really need them for proper parsing. Only require an empty line
between the header and the first caption line.
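
The relaxed rule could be sketched roughly as follows. This is not the
parser's actual code: count_caption_lines and line_is_empty are
hypothetical helpers, and the input is assumed to be already split into
lines, with the header on the first line.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the line holds only whitespace. */
static bool line_is_empty(const char *line)
{
    return line[strspn(line, " \t\r\n")] == '\0';
}

/*
 * Counts the caption lines in `lines` (an array of `n` strings).
 * lines[0] is assumed to be the header and lines[1] must be empty.
 * Returns -1 if there are fewer than two lines, the header is empty,
 * or the one required empty line after the header is missing.
 */
int count_caption_lines(const char *const *lines, size_t n)
{
    if (n < 2 || line_is_empty(lines[0]) || !line_is_empty(lines[1]))
        return -1;

    int count = 0;
    for (size_t i = 2; i < n; i++) {
        if (line_is_empty(lines[i]))
            continue; /* optional separator: tolerated, not required */
        count++;
    }
    return count;
}
```

Empty lines between captions are skipped when present, so input from
well-behaved and broken writers is handled the same way.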