Heuristic: if the second start-code is available, check whether that
one marks the start of a new frame because e.g. this is a sequence
or picture header. This doesn't save much, since we already cache the
results.
Accelerate scan for start codes by skipping up to 3 bytes per iteration.
A start code prefix is defined by the following bytes: 00 00 01. Thus,
for any group of 3 bytes (xx yy zz), we have the following possible cases:
1. If zz != 1, this cannot be a start code, then skip 3 bytes;
2. If yy != 0, this cannot be a start code, then skip 2 bytes;
3. If xx != 0 or zz != 1, this cannot be a start code, then skip 1 byte;
4. xx == 00, yy == 00, zz == 1, we have match!
This algorithm requires to peek bytes from the adapter. This increases the
amount of bytes copied to a temporary buffer, but this process is much faster
than scanning for all the bytes and using shift/masks. So, overall, this is
a win.
Move parsing back to decoding step, but keep functions separate for now.
This is needed for future optimizations that may introduce some meta data
for parsed info attached to codec frames.
Allocate decoder unit earlier in the main parse() function and don't
delegate this task to derived classes. The ultimate purpose is to get
rid of dynamic allocation of decoder units.
Avoid creating a GstBuffer for slice data. Rather, directly use the codec
frame input buffer data. This is possible because the codec frame is valid
until end_frame() where we submit the VA buffers for decoding. Anyway, the
slice data buffer is copied into the VA buffer when it is created.
Implement GstVaapiDecoder.start_frame() and end_frame() semantics so
that to create new VA context earlier and submit VA pictures to the
HW for decoding as soon as possible. i.e. don't wait for the next
frame to start decoding the previous one.
Parse slice() header and first macroblock position earlier in _parse()
function instead of waiting for the _decode() stage. This doesn't change
anything but readability.
Introduce new GstVaapiDecoderUnitMpeg2 object, which holds the standard
GstMpegVideoPacket and additional parsed header info. Besides, we now
parse as early as in the _parse() function so that to avoid un-necessary
creation of sub-buffers in _decode() for video packets that are not slices.
Invoke gst_mpeg_video_finalise_mpeg2_sequence_header() to get the
correct PAR values. While doing so, require a newer version of the
bitstream parser library.
Note: it may be necessary to also parse the Sequence_Display_Extension()
header.
Signed-off-by: Sreerenj Balachandran <sreerenj.balachandran@intel.com>
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Split decode_buffer() into the core infrastructure that determines
the packets contained in the adapter and the actual function that
decodes the packet data.
Fix memory leakage of empty packets, i.e. packets that only contain
the start code prefix. In particular, free empty user-data packets.
Besides, the codec parser will already fail gracefully if the packet
to parse does not have the minimum required size. So, we can also
completely drop the block of code that used to handle packets of size 4
(including the start code).
Fix return value when the second scan for start code fails. This means
there is not enough data to determine the full extents of the current
packet and the function shall return GST_VAAPI_DECODER_STATUS_ERROR_NO_DATA
in this case, instead of GST_VAAPI_DECODER_STATUS_SUCCESS.
Integrate the start code prefix in the slice data buffer that is submitted
to the hardware. VA-API specifies that slice_data_offset is the offset to
the first byte of slice data. And, for MPEG-2, slice() data begins with
the slice_start_code. Some VA driver implementations (EMGD) expect this.
Allow MPEG-2 High profile streams only if the HW supports that profile
or no High profile specific bits are used, and thus Main profile could
be used instead. i.e. chroma_format is 4:2:0, intra_dc_precision is not
set to 11 and no sequence_scalable_extension() was parsed.
In P-pictures, prediction shall be made from the two most recently
decoded reference fields. However, when the first I-frame is a field,
the next field of the current picture could be a P-picture but only a
single field was decoded so far. In this case, create a dummy picture
with POC = -1 that will be used as reference.
Some VA drivers would error out if P-pictures don't have a forward
reference picture. This is true in general but not in this very specific
initial case.
Allow fallback from simple to main profile when the HW decoder does
not support the former profile and that no sequence_header_extension()
is available to point out this.
Introduce a POC field in GstVaapiPicture so that to store simpler sequential
numbers. A signed 32-bit integer should be enough for 1 year of continuous
video streaming at 60 Hz.
Use this new POC value to maintain the DPB, instead of 64-bit timestamps.
This also aligns with H.264 that will be migrated to GstVaapiDpb infrastructure.
Always prefer PTS from the demuxer layer for GOP times. If this is invalid,
i.e. demuxer could not determine the PTS or the generated PTS is lower than
max PTS from past pictures, then try to fix it up based on the duration of
a frame.
For picture PTS, simply use the GOP PTS formerly computed then use TSN to
reconstruct a current time. Also now handle wrapped TSN correctly.
Some streams, badly constructed, could have signaled an interlaced
frame while the sequence was meant to be progressive. Warn and force
frame to be progressive in this case.
Add first-field (FF) flag to GstVaapiPicture, thus not requiring is_first_field
member in each decoder. Rather, when a GstVaapiPicture is created, it is considered
as the first field. Any subsequent allocated field will become the second field.
Add top-field-first (TFF) and interlaced flags to GstVaapiPicture so they
could be propagated to the surface proxy when it is pushed for rendering.
Besides, top and bottom fields are now expressed with picture structure flags
from GstVaapiSurfaceRenderFlags.
Some streams have incorrect GOP timestamps, or nothing set at all.
i.e. GOP time is 00:00:00 for all GOPs. Try to recover in this case
from demuxer timestamps, which are monotonic.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Skip all pictures prior to the first sequence_header(). Besides,
skip all picture_data() if there was no prior picture_header().
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
VA-API expects slice_vertical_position as the initial position from the
bitstream. i.e. the direct slice() information. VA drivers will be fixed
accordingly.
Original values from sequence_header() are 12-bit and the remaining
2 most significant bits are coming from sequence_extension().
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
6.3.15 says that "some slices may have the same slice_vertical_position,
since slices may start and finish anywhere". So, we can't submit the current
picture to the HW right away since subsequent slices would be missing.