Don't create temporary GstBuffers for all decoder units, even if they
are lightweight "sub-buffers", since it is not really necessary to keep
the buffer data around.
Implement GstVaapiDecoder.start_frame() and end_frame() semantics so
that to create new VA context earlier and submit VA pictures to the
HW for decoding as soon as possible. i.e. don't wait for the next
frame to start decoding the previous one.
Use GstVaapiDpb interface instead of maintaining our own prev and next
picture pointers. While doing so, try to derive a sensible POC value.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Avoid usage of goto. Simplify decode_step() process to first accumulate all
pending buffers into the GstAdapter, and then parse and decode units from
that input adapter. Stop the process once a frame is fully decoded or an
error occurred.
Make sure we always have a free surface left to use for decoding the
current frame. This means that decode_step() has to return once a frame
gets decoded. If the current adapter contains more buffers with valid
frames, they will get parsed and decoded on subsequent iterations.
Keep only one DPB interface and rename gst_vaapi_dpb2_get_references()
to gst_vaapi_dpb_get_neighbours() so that to retrieve pictures in DPB
around the specified picture POC.
Move GstVaapiDpbMpeg2 API to a more generic version that could also be
useful to other decoders that require 2 reference pictures, e.g. VC-1.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Fix support for global-alpha subpictures. The previous changes brought
the ability to check for GstVideoOverlayRectangle changes by comparing
the underlying pixel buffer pointers. If sequence number and pixel data
did not change, then this is an indication that only the global-alpha
value changed. Now, try to update the underlying VA subpicture global-alpha
value.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Don't re-upload VA subpicture if only the render rectangle changed.
Rather deassociate the subpicture and re-associate it with the new
render rectangle.
A GstVideoOverlayRectangle is created whenever the underlying pixels data
change. However, when global-alpha is supported, it is possible to re-use
the same GstVideoOverlayRectangle but with a change to the global-alpha
value. This process causes a change of sequence number, so we can no longer
check for that.
Still, if sequence numbers did not change, then there was no change in
global-alpha either. So, we need a way to compare the underlying GstBuffer
pointers. There is no API to retrieve the original pixels buffer from
a GstVideoOverlayRectangle. So, we use the following heuristics:
1. Use gst_video_overlay_rectangle_get_pixels_unscaled_argb() with the same
format flags from which the GstVideoOverlayRectangle was created. This
will work if there was no prior consumer of the GstVideoOverlayRectangle
with alternate (non-"native") format flags.
2. In overlay_rectangle_has_changed_pixels(), we have to use the same
gst_video_overlay_rectangle_get_pixels_unscaled_argb() function but
with flags that match the subpicture. This is needed to cope with
platforms that don't support global-alpha in HW, so the gst-video
layer takes care of that and fixes this up with a possibly new
GstBuffer, and hence pixels data (or) in-place by caching the current
global-alpha value applied. So we have to determine the rectangle
was previously used, based on what previous flags were used to
retrieve the ARGB pixels buffer.
We previously assumed that an overlay composition changed if the number
of overlay rectangles in there actually changed, or that the rectangle
was updated, and thus its seqnum was also updated.
Now, we can cope with cases where the GstVideoOverlayComposition grew
by one or a few more overlay rectangles, and the initial overlay rectangles
are kept as is.
Create the GPtrArray once in the _init() function and destroy it only
in the _finalize() function. Then use overlay_clear() to remove all
subpicture associations for intermediate updates, don't recreate the
GPtrArray.
Make GstVaapiOverlayRectangle a reference counted object. Also make
sure that overlay_rectangle_new() actually creates and associates the
VA subpicture.
Handle global-alpha from GstVideoOverlayComposition API. Likewise,
the same code path could also work for premultiplied-alpha but this
was not tested.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Add the necessary helpers in GstVaapiDisplay to determine whether subpictures
with global alpha are supported or not. Also add accessors in GstVaapiSubpicture
to address this feature.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Add premultiplied-alpha and global-alpha feature flags, along with converters
between VA-API and gstreamer-vaapi definitions. Another round of helpers is
also necessary for GstVideoOverlayComposition API.
Use GPOINTER_TO_SIZE() instead of GPOINTER_TO_UINT() while manipulating
pointers. The latter is meant to be 32-bit only, not uintptr_t like size.
Only a gsize can hold all bits of a pointer.
Thanks to Ouping Zhang for spotting this error.
Heuristic: if the second start-code is available, check whether that
one marks the start of a new frame because e.g. this is a sequence
or picture header. This doesn't save much, since we already cache the
results.
Accelerate scan for start codes by skipping up to 3 bytes per iteration.
A start code prefix is defined by the following bytes: 00 00 01. Thus,
for any group of 3 bytes (xx yy zz), we have the following possible cases:
1. If zz != 1, this cannot be a start code, then skip 3 bytes;
2. If yy != 0, this cannot be a start code, then skip 2 bytes;
3. If xx != 0 or zz != 1, this cannot be a start code, then skip 1 byte;
4. xx == 00, yy == 00, zz == 1, we have match!
This algorithm requires to peek bytes from the adapter. This increases the
amount of bytes copied to a temporary buffer, but this process is much faster
than scanning for all the bytes and using shift/masks. So, overall, this is
a win.
Move parsing back to decoding step, but keep functions separate for now.
This is needed for future optimizations that may introduce some meta data
for parsed info attached to codec frames.
Optimize pre-allocation of decoder units, thus avoiding un-necessary
memory reallocations. The heuristic used is that we could have around
one slice unit per macroblock line.
Use a GArray to hold decoder units in a frame, instead of a single-linked
list. This makes 'append' calls faster, but not that much. At least, this
makes things clearer.
Allocate decoder unit earlier in the main parse() function and don't
delegate this task to derived classes. The ultimate purpose is to get
rid of dynamic allocation of decoder units.
The SPS, PPS and slice headers are not fully zero-initialized in the
codecparsers/ library. Rather, the standard upstream behaviour is to
initialize only certain syntax elements with some inferred values if
they are not present in the bitstream.
At the gstreamer-vaapi decoder level, we need to further initialize
certain syntax elements with some sensible default values so that to
not complicate VA drivers that just pass those verbatim to the HW,
and also avoid an memset() of the whole decoder unit.
Signed-off-by: Sreerenj Balachandran <sreerenj.balachandran@intel.com>
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Make GstVaapiVideoBuffer a simple wrapper for video meta. This buffer is
no longer necessary but for compatibility with GStreamer 0.10 APIs or users
expecting a GstSurfaceBuffer like Clutter.
Use PTS value computed by the decoder, which could also be derived from
the GstVideoCodecFrame PTS. This makes it possible to fix up the PTS if
the original one was miscomputed or only represented a DTS instead.
Create a new VA context if the encoded surface size changes because we
need to keep the underlying surface pool until the last one was released.
Otherwise, either of the following cases could have happened: (i) release
a VA surface to an inexistent pool, or (ii) release VA surface to an
existing surface pool, but with different size.
Avoid creating a GstBuffer for slice data. Rather, directly use the codec
frame input buffer data. This is possible because the codec frame is valid
until end_frame() where we submit the VA buffers for decoding. Anyway, the
slice data buffer is copied into the VA buffer when it is created.
Implement GstVaapiDecoder.start_frame() and end_frame() semantics so
that to create new VA context earlier and submit VA pictures to the
HW for decoding as soon as possible. i.e. don't wait for the next
frame to start decoding the previous one.
Parse slice() header and first macroblock position earlier in _parse()
function instead of waiting for the _decode() stage. This doesn't change
anything but readability.
Introduce new GstVaapiDecoderUnitMpeg2 object, which holds the standard
GstMpegVideoPacket and additional parsed header info. Besides, we now
parse as early as in the _parse() function so that to avoid un-necessary
creation of sub-buffers in _decode() for video packets that are not slices.
Theory of operations: all units marked as "slice" are moved to the "units"
list. Since this list only contains slice data units, the prev_slice pointer
was removed. Besides, we now maintain two extra lists of units to be decoded
before or after slice data units.
In particular, all units in the "pre_units" list will be decoded before
GstVaapiDecoder::start_frame() is called and units in the "post_units"
list will be decoded after GstVaapiDecoder::end_frame() is called.
Make GstVaapiSurfaceProxy only a thin wrapper around a VA context and a
VA surface. i.e. drop any other attribute like timestamp, duration,
interlaced or top-field-first.
Maintain decoded surfaces as GstVideoCodecFrame objects instead of
GstVaapiSurfaceProxy objects. The latter will tend to be reduced to
the strict minimum: a context and a surface.
Add new gst_vaapi_decoder_get_frame() function meant to be used with
gst_vaapi_decoder_decode(). The purpose is to return the next decoded
frame as a GstVideoCodecFrame and the associated GstVaapiSurfaceProxy
as the user-data object.
Decoder units were zero-initialized, including the SPS/PPS/slice headers.
The latter don't require zero-initialization since the codecparsers/ lib
will do so for key variables already. This is not a great value per se but
at least it makes it possible to check whether the default initialization
decisions made in the codecparsers/ lib were right or not.
This can be reverted if this exposes too many issues.
Drop explicit initialization of most fields that are implicitly set to
zero. Drop helper macros for casting to GstVaapiPictureH264 or
GstVaapiFrameStore. Also remove some useless checks for NULL pointers.
Implement GstVaapiDecoder.start_frame() and end_frame() semantics so
that to create new VA context earlier and submit VA pictures to the
HW for decoding as soon as possible. i.e. don't wait for the next
frame to start decoding the previous one.
Introduce new GstVaapiDecoderUnitH264 object, which holds the standard
NAL unit header (GstH264NalUnit) and additional parsed header info.
Besides, we now parse headers as early as in the _parse() function so
that to avoid un-necessary creation of sub-buffers in _decode() for
NAL units that are not slices.
This is a performance win by ~+1.1% only.
Use standard GstVideoCodecState throughout GstVaapiDecoder and expose
it with a new gst_vaapi_decoder_get_codec_state() function. This makes
it possible to drop picture size (width, height) information, framerate
(fps_n, fps_d) information, pixel aspect ratio (par_n, par_d) information,
and interlace mode (is_interlaced field).
This is a new API with backwards compatibility maintained. In particular,
gst_vaapi_decoder_get_caps() is still available.
Signed-off-by: Sreerenj Balachandran <sreerenj.balachandran@intel.com>
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Align gst_vaapi_decoder_get_surface() semantics with the rest of the
API. That is, return a GstVaapiDecoderStatus and the decoded surface
as a handle to GstVaapiSurfaceProxy in parameter.
This is an API/ABI change.
Introduce new decoding process whereby a GstVideoCodecFrame is created
first. Next, input stream buffers are accumulated into a GstAdapter,
that is then passed to the _parse() function. The GstVaapiDecoder object
accumulates all parsed units and when a complete frame or field is
detected, that GstVideoCodecFrame is passed to the _decode() function.
Ultimately, the caller receives a GstVaapiSurfaceProxy if decoding
process was successful.
The start_frame() hook is called prior to traversing all decode-units
for decoding. The unit argument represents the first slice in the frame.
Some codecs (e.g. H.264) need to wait for the first slice in order to
determine the actual VA context parameters.
Split decoding process into two steps: (i) parse incoming bitstreams
into simple decoder-units until the frame or field is complete; and
(ii) decode the whole frame or field at once.
This is an ABI change.
Introduce a new GstVaapiDecoderFrame that is just a list of decoder units
(GstVaapiDecoderUnit objects) that constitute a frame. This object is just
an extension to GstVideoCodecFrame for VA decoder purposes. It is available
as the user-data member element.
This is a libgstvaapi internal object.
Introduce GstVaapiDecoderUnit which represents a fragment of the source
stream to be decoded. For instance, a decode-unit will be a NAL unit for
H.264 streams, an EBDU for VC-1 streams, and a video packet for MPEG-2
streams.
This is a libgstvaapi internal object.
GstVaapiSurfaceProxy does not use any particular functionality from
GObject. Actually, it only needs a basic object type with reference
counting.
This is an API and ABI change.
Introduce a new reference counted object that is very lightweight and
also provides flags and user-data functionalities. Initialization and
finalization times are reduced by up to a factor 5x vs GstMiniObject
from GStreamer 0.10 stack.
This is a libgstvaapi internal object.
Fix decode_slice() to ensure a VA context exists prior to creating a
new GstVaapiSliceH264, which invokes vaCreateBuffer() with some VA
context ID. i.e. the latter was not initialized, thus causing failures
on Cedar Trail for example.
Use glib >= 2.32 semantics for GMutex and GRecMutex wrt. initialization
and termination. Basically, the new mutex objects can be used as static
mutex objects from the deprecated APIs, e.g. GStaticMutex and GStaticRecMutex.
This fixes symbol clashes between the gst-vaapi built-in codecparsers/
library and the system-provided one, mainly used by videoparses/. Now,
only symbols with the gst_vaapi_* prefix will be exported, if they are
not marked as "hidden" to libgstvaapi.
Fix gst_vaapi_image_map() to return TRUE and the GstVaapiImageRaw
structure correctly filled in if the image was already mapped.
Likewise, make gst_vaapi_image_unmap() return TRUE if the image
was already unmapped.
Fix reference leak of surface and image in GstVaapiVideoBuffer wrapper,
thus resulting on actual memory leak of GstVaapiImage when using them
for downloads/uploads from VA surfaces and more specifically surfaces
when the pipeline is shutdown. i.e. vaTerminate() was never called
because the resources were not unreferenced, and thus not deallocated
in the end.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
The picture size signalled by sps->{width,height} is the actual size with
cropping applied, not the original size derived from pic_width_in_mbs_minus1
and pic_height_in_map_units_minus1. VA driver expects that original size,
uncropped.
There is another issue pending: frame cropping information needs to be
taken care of.
Fix commit 18245b4 so that to link and build parserutils.[ch] first.
This is needed since that's the common dependency for actual codec
parsers (gstvc1parser.c for instance).
... this is useful to make sure pixel-aspect-ratio and framerate
information are correctly parsed since we have no means to detect
that at configure time.
Invoke gst_mpeg_video_finalise_mpeg2_sequence_header() to get the
correct PAR values. While doing so, require a newer version of the
bitstream parser library.
Note: it may be necessary to also parse the Sequence_Display_Extension()
header.
Signed-off-by: Sreerenj Balachandran <sreerenj.balachandran@intel.com>
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>