Implement the new render-to-pixmap API. The only supported pixmap format
that will work is xRGB, with native byte ordering. Others might work but
they were not tested.
Add API to transfer VA urfaces to native pixmaps. Also add an API to
render a native pixmap, for completeness. In general, rendering to
pixmap would only be useful to certain VA drivers and use cases on
X11 display servers. e.g. GLX_EXT_texture_from_pixmap (TFP) handled
in an upper layer.
Mark dummy pictures as output already so that we don't try to submit
them to the upper layer since this is purely internal / temporary
picture for helping the decoder.
Once the picture was output, it is no longer necessary to keep an extra
reference to the underlying GstVideoCodecFrame. So, we can release it
earlier, and maybe subsequently release the associate surface proxy
earlier.
Fix new internal video format API, based on GstVideoFormat, to not
clobber with system symbols. So replace the gst_video_format_* prefix
with gst_vaapi_video_format_ prefix, even if the format type remains
GstVideoFormat.
Fix memory leak when processing interlaced pictures and that occurs
because the first field, represented as a GstVideoCodecFrame, never
gets released. i.e. when the picture is completed, this is generally
the case when the second field is successfully decoded, we need to
propagate the GstVideoCodecFrame of the first field to the original
GstVideoDecoder so that it could reclaim memory.
Otherwise, we keep accumulating the first fields into GstVideoDecoder
private frames list until the end-of-stream is reached. The frames
are eventually released there, but too late, i.e. too much memory
may have been consumed.
https://bugzilla.gnome.org/show_bug.cgi?id=701257
The queue of free objects to used was deallocated with g_queue_free_full().
However, this convenience function shall only be used if the original queue
was allocated with g_queue_new(). This caused memory corruption, eventually
leading to a crash.
The correct solution is to pair the g_queue_init() with the corresponding
g_queue_clear(), while iterating over all free objects to deallocate them.
Fix creation of surface pool objects to honour explicit pixel format
specification. If this operation is not supported, then fallback to
the older interface with chroma format.
If a VA surface was allocated with the chroma-format interface, try to
determine the underlying pixel format on gst_vaapi_surface_get_format(),
or return GST_VIDEO_FORMAT_ENCODED if this is not a supported operation.
Make it possible to create VA surfaces with a specific pixel format.
This is a new capability brought in by VA-API >= 0.34.0. If that
capability is not built-in (e.g. using VA-API < 0.34.0), then
gst_vaapi_surface_new_with_format() will return NULL.
Add gst_video_format_get_chroma_type() helper function to determine
the GstVaapiChromaType from a standard GStreamer video format. It is
possible to reconstruct that from GstVideoFormatInfo but it is much
simpler (and faster?) to use the local GstVideoFormatMap table.
Add new chroma formats available with VA-API >= 0.34.0. In particular,
this includes "RGB" chroma formats, and more YUV subsampled formats.
Also add a new from_GstVaapiChromaType() helper function to convert
libgstvaapi chroma type to VA chroma format.
Make gst_vaapi_image_pool_new() succeed, and thus returning a valid
image pool object, only if the underlying VA display does support the
requested VA image format.
Get rid of GstCaps to create surface/image pool, and use GstVideoInfo
structures instead. Those are smaller, and allows for streamlining
libgstvaapi more.
Add new video format mappings to VA image formats:
- YUV: packed YUV (YUY2, UYVY), grayscale (Y800) ;
- RGB: 32-bit RGB without alpha channel (XRGB, XBGR, RGBX, BGRX).
Fix debug message string with image format expressed with GstVideoFormat
instead of the obsolete format that turned out to be a fourcc.
This is a regression from git commit e61c5fc.
In particular, use gst_video_info_from_caps() helper function in VA image
for implementating gst_vaapi_image_get_buffer() [vaapidownload] and
gst_vaapi_image_update_from_buffer() [subpictures] in GStreamer 0.10 builds.
Drop GstVaapiImageFormat helpers since everything was moved to the new
GstVideoFormat based API. Don't bother with backwards compatibility and
just bump the library major version afterwards.
If the stream has a sequence_display_extenion, then attach the
display_horizontal/display_vertical dimension as the cropping
rectangle width/height to the GstVaapiPicture.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
If the Advanced profile has display_extension fields, then set the display
width/height dimension as cropping rectangle to the GstVaapiPicture.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
If the encoded stream has the frame_cropping_flag set, then associate
the cropping rectangle to GstVaapiPicture.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Make it possible associate an empty cropping rectangle to the surface
proxy, thus resetting any cropping rectangle that was previously set.
This allows for returning plain NULL when no cropping rectangle was
initially set up to the surface proxy, or if it was reset to defaults.
Add helper macros to retrieve the VA surface information like size
(width, height) or chroma type. This is a micro-optimization to avoid
useless function calls and NULL pointer re-checks in internal routines.
Add gst_vaapi_picture_set_crop_rect() helper function to copy the video
cropping information from raw bitstreams to each picture being decoded.
Also add helper function to surface proxy to propagate that information
outside of libgstvaapi. e.g. plug-in elements or standalone applications.
Signed-off-by: Sreerenj Balachandran <sreerenj.balachandran@intel.com>
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
The MPEG-2 standard specifies (6.3.7) that all quantisation matrices
shall be reset to their default values when a Sequence_Header() is
decoded.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Break the circular references between GstVaapiContext and its children
GstVaapiSurfaces. Since the VA surfaces held an extra reference to the
context, which holds a reference to its VA surfaces, then none of those
were released.
How does this impact support for subpictures?
The only situation when the parent context needs to disappear is when
it is replaced with another one because of a resolution change in the
video stream for instance, or a normal destroy. In this case, it does
not really matter to apply subpictures to the peer surfaces since they
are either gone, or those that are left in the pipe can probably bear
a reinstantiation of the subpictures for it.
So, parent_context is set to NULL when the parent context is destroyed,
other VA surfaces can still get subpictures attached to them, individually
not as a whole. i.e. subpictures for surface S1 will be created from
active composition buffers and associated to S1, subpictures for S2 will
be created from the next active composition buffers, etc. We don't try
to cache the subpictures in those cases (pending surfaces until EOS
is reached, or pending surfaces until new surfaces matching new VA context
get to be used instead).
Make sure to create the GLX context once the window object has completed
its creation. Since gl_resize() relies on the newly created window size,
then we cannot simply overload the GstVaapiWindowClass::create() hook.
So, we just call into gst_vaapi_window_glx_ensure_context() once the
window object is created in the gst_vaapi_window_glx_new*() functions.
Make it possible to add extra an extra filter to most of display cache
lookup functions so that the GstVaapiDisplay instance can really match
a compatible and existing display by type, instead of relying on extra
string tags (e.g. "X11:" prefix, etc.).
Drop obsolete GST_VAAPI_IS_xxx() helper macros since we are no longer
deriving from GObject and so those were only checking for whether the
argument was NULL or not. This is now irrelevant, and even confusing
to some extent, because we no longer have type checking.
Note: this incurs more type checking (review) but the libgstvaapi is
rather small, so this is manageable.
Make GstVaapiID a gsize instead of guessing an underlying integer large
enough to hold all bits of a pointer. Also drop GST_VAAPI_ID_NONE since
this is plain zero and that it is no longer passed as varargs.
Port GstVaapiDecoder and GstVaapiDecoder{MPEG2,MPEG4,JPEG,H264,VC1} to
GstVaapiMiniObject. Add gst_vaapi_decoder_set_codec_state_changed_func()
helper function to let the user add a callback to a function triggered
whenever the codec state (e.g. caps) changes.
Port GstVaapiVideoPool, GstVaapiSurfacePool and GstVaapiImagePool to
GstVaapiMiniObject. Drop gst_vaapi_video_pool_get_caps() since it was
no longer used for a long time. Make object allocators static, i.e.
local to the shared library.
Drop support for user-defined data since this capability was not used
so far and GstVaapiMiniObject represents the smallest reference counted
object type. Add missing GST_VAAPI_MINI_OBJECT_CLASS() helper macro.
Besides, since GstVaapiMiniObject is a libgstvaapi internal object, it
is also possible to further simplify the layout of the object. i.e. merge
GstVaapiMiniObjectBase into GstVaapiMiniObject.
Propagate the picture size from the bitstream to the GstVaapiDecoder,
and subsequent user who installed a signal on notify::caps. This fixes
decoding of TS streams when the demuxer failed to extract the required
information.
Add gst_vaapi_decoder_get_frame_with_timeout() helper function that will
wait for a frame to be decoded, until the specified timeout in microseconds,
prior to returning to the caller.
This is a fix to performance regression from 851cc0, whereby the vaapidecode
loop executed on the srcpad task was called to often, thus starving all CPU
resources.
Rework GstVideoDecoder::handle_frame() to decode the current frame,
while possibly waiting for a free surface, and separately submit all
decoded frames from a task. This makes it possible to pop and render
decoded frames as soon as possible.
Add support for interlaced streams with GStreamer 1.0 too. Basically,
this enables vaapipostproc, though it is not auto-plugged yet. We also
make sure to reply to CAPS queries, and happily handle CAPS events.
Fix support for interlaced contents with GStreamer 0.10. In particular,
propagate GstVaapiSurfaceProxy frame flags to GstVideoCodecFrame flags
correctly.
This is a regression from commit 87e5717.
Rename GstVaapiDecoderFrame to GstVaapiParserFrame because this data
structure was only useful to parsing and a proper GstvaapiDecoderFrame
instance will be created instead.
Fix regression from 0.4-branch whereby GstVaapiSurfaceProxy no longer
held any information about the expected presentation timestamp, frame
duration or additional flags like interlaced or top-field-first.
Use new GstVaapiSurfaceProxy internal helper functions to propagate the
necessary GstVideoCodecFrame flags to vaapidecode (GStreamer 0.10).
Also make GstVaapiDecoder push_frame() operate similarly to drop_frame().
i.e. increase the GstVideoCodecFrame reference count in push_frame rather
than gst_vaapi_picture_output().
Add more attributes for raw decoding modes, i.e. directly through the
libgstvaapi helper library. In particular, add presentation timestamp,
duration and a couple of flags (interlaced, TFF, RFF, one-field).
Ensure libgstvaapi-glx*.so builds against libdl since dlsym() is used
to resolve glXGetProcAddress() from GLX libraries. This fix builds on
Fedora 17.
https://bugzilla.gnome.org/show_bug.cgi?id=698046
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Fix previous commit whereby gst_vaapi_decoder_get_codec_state() was
supposed to make GstVaapiDecoder own the return GstVideoCodecState
object. Only comment was updated, not the actual code.
Make gst_vaapi_decoder_get_codec_state() return the original codec state,
i.e. make the GstVaapiDecoder object own the return state so that callers
that want an extra reference to it would just gst_video_codec_state_ref()
it before usage. This aligns the behaviour with what we had before with
gst_vaapi_decoder_get_caps().
This is an ABI incompatible change, library major version was bumped from
previous release (0.5.2).
Fix the name of the plug-in element reported to gst-inspect-1.0. i.e. we
need an explicit definition for GStreamer >= 1.0 because the GST_PLUGIN_DEFINE
incorrectly uses #name for creating the plug-in name, instead of using macro
expansion (and let further expansion of macros) through e.g. G_STRINGIFY().
Fix make dist to allow build for either GStreamer 0.10 or 1.0. i.e. make
sure to include all source files in either case while generating source
tarballs.
Introduce a new configure option --with-gstreamer-api that determines
the desired GStreamer API to use. By default, GStreamer 1.0 is selected.
Also integrate more compatibility glue into gstcompat.h and plugins.
This integrates support for GStreamer API >= 1.0 only in the libgstvaapi
core decoding library. The changes are kept rather minimal here so that
the library retains as little dependency as possible on core GStreamer
functionality.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Drop the following functions that are now obsolete:
- gst_vaapi_context_get_surface()
- gst_vaapi_context_put_surface()
- gst_vaapi_context_find_surface_by_id()
- gst_vaapi_surface_proxy_new()
- gst_vaapi_surface_proxy_get_context()
- gst_vaapi_surface_proxy_set_context()
- gst_vaapi_surface_proxy_set_surface()
This is an API change.
Now that the surface pool is reference counted in the surface proxy wrapper,
we can safely ignore surface size checks in gst_vaapi_decoder_ensure_context().
Besides, this check is already performed in gst_vaapi_context_reset_full().
Introduce gst_vaapi_surface_proxy_new_from_pool() to allocate a new surface
proxy from the context surface pool. This change also makes sure to retain
the parent surface pool in the proxy.
Besides, it was also totally useless to attach/detach parent context to
VA surface each time we acquire/release it. Since the whole context owns
all associated VA surfaces, we can mark this as such only once and for all.
Move GstVaapiVideoMeta from core libgstvaapi decoding library to the
actual plugin elements. That's only useful there. Also inline reference
counting code from GstVaapiMiniObject.
Make sure libgstvaapi core decoding library doesn't include un-needed
dependencies. So, move out GstVaapiVideoConverterGLX to plugins instead.
Besides, even if the vaapisink element is not used, we are bound to have
a correctly populated GstSurfaceBuffer from vaapidecode.
Also clean-up the file along the way.
In decode_codec_data(), force initialization of format to zero so that
we can catch up cases where codec-data has neither "format" nor "wmvversion"
fields, thus making it possible to gracefully fail in this case.
Add a new GstVaapiDecoder::decode_codec_data() hook to actually decode
codec-data in the decoder sub-class. Provide a common shared helper
function to do the actual work and delegating further to the sub-class.
Drop GstVaapiDecoderUnit buffer field (GstBuffer) since it's totally
useless nowadays as creating sub-buffers doesn't bring any value. It
actually means more memory allocations. We can't do without that in
JPEG and MPEG-4:2 decoders.
Use gst_element_class_set_static_metadata() from GStreamer 1.0, which
basically is the same as gst_element_class_set_details_simple() in
GStreamer 0.10 context.
Use newer gst_video_overlay_rectangle_get_pixels_unscaled_raw() helper
function with GStreamer 0.10 compatible semantics, or that tries to
approach the current meaning. Basically, this is also just about moving
the helper to gstcompat.h.
Force luma_log2_weight_denom and chroma_log2_weight_denom to zero if
there is no pred_weight_table() that was parsed.
This is a workaround for the VA intel-driver on Ivy Bridge.
g_async_queue_timeout_pop() appeared in glib 2.31.18. Implement it as
g_async_queue_timed_pop() with a GTimeVal as the final time to wait for
new data to arrive in the queue.
There shall be only one place to call decode_current_picture(), and this
is in the end_frame() hook. The EOS unit is processed after end_frame()
so this means we cannot have a valid picture to decode/output at this
point.
Improve robustness when some expected packets where not received yet
or that were not correctly decoded. For example, don't try to decode
a picture if there was no valid sequence or picture headers.
Fix gst_vaapi_decoder_get_surface() to only return frames with a valid
surface proxy, i.e. with a valid VA surface. This means that any frame
marked as decode-only is simply skipped.
If the decoder was not able to decode a frame because insufficient
information was available, e.g. missing sequence or picture header,
then allow the frame to be gracefully dropped without generating
any error.
It is also possible that a frame is not meant to be displayed but
only used as a reference, so dropping that frame is also a valid
operation since GstVideoDecoder base class has extra references to
that GstVideoCodecFrame that needs to be released.
The Wayland API is not fully thread-safe and client applications shall
perform locking themselves on key functions. Besides, make sure to
release the lock if the _render() function fails.
Introduce gst_vaapi_window_wayland_sync() helper function to wait for
the completion of the redraw request. Use it in _render() function to
actually block until the previous draw request is completed.
The redraw callback needs to be attached to the surface prior to the
commit. Otherwise, the callback notifies the next surface repaint,
which is not the desired behaviour. i.e. we want to be notified for
the surface we have just filled.
Another isse was the redraw_pending was reset before the actual completion
of the frame redraw callback function, thus causing concurrency issues.
e.g. the callback could have been called again, but with a NULL buffer.
When the Wayland display is shared, we still have to create our own local
shell and compositor objects, since they are not propagated from the cache.
Likewise, we also need to determine the display size or vaapisink would
fail to account for the display aspect ratio, and will try to create a 0x0
window.
Fix build with newer VC-1 codecparser where dqsbedge was renamed to
dqbedge, and now represents either DQSBEDGE or DQDBEDGE depending on
the actual value of DQPROFILE.
Fix size of encapsulated BDUs since GstVC1BDU.size actually represents
the size of the BDU data, starting from offset, i.e. after any start
code is parsed.
This fixes a buffer overflow during the unescaping process.
The AVI demuxer (avidemux) does not set a proper "format" attribute
to the generated caps. So, try to recover the video codec format from
the "wmvversion" property instead.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Don't create temporary GstBuffers for all decoder units, even if they
are lightweight "sub-buffers", since it is not really necessary to keep
the buffer data around.
Implement GstVaapiDecoder.start_frame() and end_frame() semantics so
that to create new VA context earlier and submit VA pictures to the
HW for decoding as soon as possible. i.e. don't wait for the next
frame to start decoding the previous one.
Use GstVaapiDpb interface instead of maintaining our own prev and next
picture pointers. While doing so, try to derive a sensible POC value.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Avoid usage of goto. Simplify decode_step() process to first accumulate all
pending buffers into the GstAdapter, and then parse and decode units from
that input adapter. Stop the process once a frame is fully decoded or an
error occurred.
Make sure we always have a free surface left to use for decoding the
current frame. This means that decode_step() has to return once a frame
gets decoded. If the current adapter contains more buffers with valid
frames, they will get parsed and decoded on subsequent iterations.
Keep only one DPB interface and rename gst_vaapi_dpb2_get_references()
to gst_vaapi_dpb_get_neighbours() so that to retrieve pictures in DPB
around the specified picture POC.
Move GstVaapiDpbMpeg2 API to a more generic version that could also be
useful to other decoders that require 2 reference pictures, e.g. VC-1.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Fix support for global-alpha subpictures. The previous changes brought
the ability to check for GstVideoOverlayRectangle changes by comparing
the underlying pixel buffer pointers. If sequence number and pixel data
did not change, then this is an indication that only the global-alpha
value changed. Now, try to update the underlying VA subpicture global-alpha
value.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Don't re-upload VA subpicture if only the render rectangle changed.
Rather deassociate the subpicture and re-associate it with the new
render rectangle.
A GstVideoOverlayRectangle is created whenever the underlying pixels data
change. However, when global-alpha is supported, it is possible to re-use
the same GstVideoOverlayRectangle but with a change to the global-alpha
value. This process causes a change of sequence number, so we can no longer
check for that.
Still, if sequence numbers did not change, then there was no change in
global-alpha either. So, we need a way to compare the underlying GstBuffer
pointers. There is no API to retrieve the original pixels buffer from
a GstVideoOverlayRectangle. So, we use the following heuristics:
1. Use gst_video_overlay_rectangle_get_pixels_unscaled_argb() with the same
format flags from which the GstVideoOverlayRectangle was created. This
will work if there was no prior consumer of the GstVideoOverlayRectangle
with alternate (non-"native") format flags.
2. In overlay_rectangle_has_changed_pixels(), we have to use the same
gst_video_overlay_rectangle_get_pixels_unscaled_argb() function but
with flags that match the subpicture. This is needed to cope with
platforms that don't support global-alpha in HW, so the gst-video
layer takes care of that and fixes this up with a possibly new
GstBuffer, and hence pixels data (or) in-place by caching the current
global-alpha value applied. So we have to determine the rectangle
was previously used, based on what previous flags were used to
retrieve the ARGB pixels buffer.
We previously assumed that an overlay composition changed if the number
of overlay rectangles in there actually changed, or that the rectangle
was updated, and thus its seqnum was also updated.
Now, we can cope with cases where the GstVideoOverlayComposition grew
by one or a few more overlay rectangles, and the initial overlay rectangles
are kept as is.
Create the GPtrArray once in the _init() function and destroy it only
in the _finalize() function. Then use overlay_clear() to remove all
subpicture associations for intermediate updates, don't recreate the
GPtrArray.
Make GstVaapiOverlayRectangle a reference counted object. Also make
sure that overlay_rectangle_new() actually creates and associates the
VA subpicture.
Handle global-alpha from GstVideoOverlayComposition API. Likewise,
the same code path could also work for premultiplied-alpha but this
was not tested.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Add the necessary helpers in GstVaapiDisplay to determine whether subpictures
with global alpha are supported or not. Also add accessors in GstVaapiSubpicture
to address this feature.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Add premultiplied-alpha and global-alpha feature flags, along with converters
between VA-API and gstreamer-vaapi definitions. Another round of helpers is
also necessary for GstVideoOverlayComposition API.
Use GPOINTER_TO_SIZE() instead of GPOINTER_TO_UINT() while manipulating
pointers. The latter is meant to be 32-bit only, not uintptr_t like size.
Only a gsize can hold all bits of a pointer.
Thanks to Ouping Zhang for spotting this error.
Heuristic: if the second start-code is available, check whether that
one marks the start of a new frame because e.g. this is a sequence
or picture header. This doesn't save much, since we already cache the
results.
Accelerate scan for start codes by skipping up to 3 bytes per iteration.
A start code prefix is defined by the following bytes: 00 00 01. Thus,
for any group of 3 bytes (xx yy zz), we have the following possible cases:
1. If zz != 1, this cannot be a start code, then skip 3 bytes;
2. If yy != 0, this cannot be a start code, then skip 2 bytes;
3. If xx != 0 or zz != 1, this cannot be a start code, then skip 1 byte;
4. xx == 00, yy == 00, zz == 1, we have match!
This algorithm requires to peek bytes from the adapter. This increases the
amount of bytes copied to a temporary buffer, but this process is much faster
than scanning for all the bytes and using shift/masks. So, overall, this is
a win.
Move parsing back to decoding step, but keep functions separate for now.
This is needed for future optimizations that may introduce some meta data
for parsed info attached to codec frames.
Optimize pre-allocation of decoder units, thus avoiding un-necessary
memory reallocations. The heuristic used is that we could have around
one slice unit per macroblock line.
Use a GArray to hold decoder units in a frame, instead of a single-linked
list. This makes 'append' calls faster, but not that much. At least, this
makes things clearer.
Allocate decoder unit earlier in the main parse() function and don't
delegate this task to derived classes. The ultimate purpose is to get
rid of dynamic allocation of decoder units.
The SPS, PPS and slice headers are not fully zero-initialized in the
codecparsers/ library. Rather, the standard upstream behaviour is to
initialize only certain syntax elements with some inferred values if
they are not present in the bitstream.
At the gstreamer-vaapi decoder level, we need to further initialize
certain syntax elements with some sensible default values so that to
not complicate VA drivers that just pass those verbatim to the HW,
and also avoid an memset() of the whole decoder unit.
Signed-off-by: Sreerenj Balachandran <sreerenj.balachandran@intel.com>
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Make GstVaapiVideoBuffer a simple wrapper for video meta. This buffer is
no longer necessary but for compatibility with GStreamer 0.10 APIs or users
expecting a GstSurfaceBuffer like Clutter.
Use PTS value computed by the decoder, which could also be derived from
the GstVideoCodecFrame PTS. This makes it possible to fix up the PTS if
the original one was miscomputed or only represented a DTS instead.
Create a new VA context if the encoded surface size changes because we
need to keep the underlying surface pool until the last one was released.
Otherwise, either of the following cases could have happened: (i) release
a VA surface to an inexistent pool, or (ii) release VA surface to an
existing surface pool, but with different size.
Avoid creating a GstBuffer for slice data. Rather, directly use the codec
frame input buffer data. This is possible because the codec frame is valid
until end_frame() where we submit the VA buffers for decoding. Anyway, the
slice data buffer is copied into the VA buffer when it is created.
Implement GstVaapiDecoder.start_frame() and end_frame() semantics so
that to create new VA context earlier and submit VA pictures to the
HW for decoding as soon as possible. i.e. don't wait for the next
frame to start decoding the previous one.
Parse slice() header and first macroblock position earlier in _parse()
function instead of waiting for the _decode() stage. This doesn't change
anything but readability.
Introduce new GstVaapiDecoderUnitMpeg2 object, which holds the standard
GstMpegVideoPacket and additional parsed header info. Besides, we now
parse as early as in the _parse() function so that to avoid un-necessary
creation of sub-buffers in _decode() for video packets that are not slices.
Theory of operations: all units marked as "slice" are moved to the "units"
list. Since this list only contains slice data units, the prev_slice pointer
was removed. Besides, we now maintain two extra lists of units to be decoded
before or after slice data units.
In particular, all units in the "pre_units" list will be decoded before
GstVaapiDecoder::start_frame() is called and units in the "post_units"
list will be decoded after GstVaapiDecoder::end_frame() is called.
Make GstVaapiSurfaceProxy only a thin wrapper around a VA context and a
VA surface. i.e. drop any other attribute like timestamp, duration,
interlaced or top-field-first.
Maintain decoded surfaces as GstVideoCodecFrame objects instead of
GstVaapiSurfaceProxy objects. The latter will tend to be reduced to
the strict minimum: a context and a surface.
Add new gst_vaapi_decoder_get_frame() function meant to be used with
gst_vaapi_decoder_decode(). The purpose is to return the next decoded
frame as a GstVideoCodecFrame and the associated GstVaapiSurfaceProxy
as the user-data object.
Decoder units were zero-initialized, including the SPS/PPS/slice headers.
The latter don't require zero-initialization since the codecparsers/ lib
will do so for key variables already. This is not a great value per se but
at least it makes it possible to check whether the default initialization
decisions made in the codecparsers/ lib were right or not.
This can be reverted if this exposes too many issues.
Drop explicit initialization of most fields that are implicitly set to
zero. Drop helper macros for casting to GstVaapiPictureH264 or
GstVaapiFrameStore. Also remove some useless checks for NULL pointers.
Implement GstVaapiDecoder.start_frame() and end_frame() semantics so
that to create new VA context earlier and submit VA pictures to the
HW for decoding as soon as possible. i.e. don't wait for the next
frame to start decoding the previous one.
Introduce new GstVaapiDecoderUnitH264 object, which holds the standard
NAL unit header (GstH264NalUnit) and additional parsed header info.
Besides, we now parse headers as early as in the _parse() function so
that to avoid un-necessary creation of sub-buffers in _decode() for
NAL units that are not slices.
This is a performance win by ~+1.1% only.
Use standard GstVideoCodecState throughout GstVaapiDecoder and expose
it with a new gst_vaapi_decoder_get_codec_state() function. This makes
it possible to drop picture size (width, height) information, framerate
(fps_n, fps_d) information, pixel aspect ratio (par_n, par_d) information,
and interlace mode (is_interlaced field).
This is a new API with backwards compatibility maintained. In particular,
gst_vaapi_decoder_get_caps() is still available.
Signed-off-by: Sreerenj Balachandran <sreerenj.balachandran@intel.com>
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Align gst_vaapi_decoder_get_surface() semantics with the rest of the
API. That is, return a GstVaapiDecoderStatus and the decoded surface
as a handle to GstVaapiSurfaceProxy in parameter.
This is an API/ABI change.
Introduce new decoding process whereby a GstVideoCodecFrame is created
first. Next, input stream buffers are accumulated into a GstAdapter,
that is then passed to the _parse() function. The GstVaapiDecoder object
accumulates all parsed units and when a complete frame or field is
detected, that GstVideoCodecFrame is passed to the _decode() function.
Ultimately, the caller receives a GstVaapiSurfaceProxy if decoding
process was successful.
The start_frame() hook is called prior to traversing all decode-units
for decoding. The unit argument represents the first slice in the frame.
Some codecs (e.g. H.264) need to wait for the first slice in order to
determine the actual VA context parameters.
Split decoding process into two steps: (i) parse incoming bitstreams
into simple decoder-units until the frame or field is complete; and
(ii) decode the whole frame or field at once.
This is an ABI change.
Introduce a new GstVaapiDecoderFrame that is just a list of decoder units
(GstVaapiDecoderUnit objects) that constitute a frame. This object is just
an extension to GstVideoCodecFrame for VA decoder purposes. It is available
as the user-data member element.
This is a libgstvaapi internal object.
Introduce GstVaapiDecoderUnit which represents a fragment of the source
stream to be decoded. For instance, a decode-unit will be a NAL unit for
H.264 streams, an EBDU for VC-1 streams, and a video packet for MPEG-2
streams.
This is a libgstvaapi internal object.
GstVaapiSurfaceProxy does not use any particular functionality from
GObject. Actually, it only needs a basic object type with reference
counting.
This is an API and ABI change.
Introduce a new reference counted object that is very lightweight and
also provides flags and user-data functionalities. Initialization and
finalization times are reduced by up to a factor 5x vs GstMiniObject
from GStreamer 0.10 stack.
This is a libgstvaapi internal object.
Fix decode_slice() to ensure a VA context exists prior to creating a
new GstVaapiSliceH264, which invokes vaCreateBuffer() with some VA
context ID. i.e. the latter was not initialized, thus causing failures
on Cedar Trail for example.
Use glib >= 2.32 semantics for GMutex and GRecMutex wrt. initialization
and termination. Basically, the new mutex objects can be used as static
mutex objects from the deprecated APIs, e.g. GStaticMutex and GStaticRecMutex.
This fixes symbol clashes between the gst-vaapi built-in codecparsers/
library and the system-provided one, mainly used by videoparses/. Now,
only symbols with the gst_vaapi_* prefix will be exported, if they are
not marked as "hidden" to libgstvaapi.
Fix gst_vaapi_image_map() to return TRUE and the GstVaapiImageRaw
structure correctly filled in if the image was already mapped.
Likewise, make gst_vaapi_image_unmap() return TRUE if the image
was already unmapped.
Fix reference leak of surface and image in GstVaapiVideoBuffer wrapper,
thus resulting on actual memory leak of GstVaapiImage when using them
for downloads/uploads from VA surfaces and more specifically surfaces
when the pipeline is shutdown. i.e. vaTerminate() was never called
because the resources were not unreferenced, and thus not deallocated
in the end.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
The picture size signalled by sps->{width,height} is the actual size with
cropping applied, not the original size derived from pic_width_in_mbs_minus1
and pic_height_in_map_units_minus1. VA driver expects that original size,
uncropped.
There is another issue pending: frame cropping information needs to be
taken care of.
Invoke gst_mpeg_video_finalise_mpeg2_sequence_header() to get the
correct PAR values. While doing so, require a newer version of the
bitstream parser library.
Note: it may be necessary to also parse the Sequence_Display_Extension()
header.
Signed-off-by: Sreerenj Balachandran <sreerenj.balachandran@intel.com>
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
This patch updates to relect the 1.0 version of the protocol. The main
changes are the switch to wl_registry for global object notifications
and the way that the event queue and file descriptor is processed.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
git am got confused somehow, though the end result doesn't change at
all since we require both SPS and PPS to be parsed prior to decoding
the first slice.
Only start decoding slices when at least one SPS and PPS got activated.
This fixes cases when a source represents a substream of another stream
and no SPS and PPS was inserted before the first slice of the generated
substream.
... for interlaced streams. The short_ref[] and long_ref[] arrays may
contain up to 32 fields but VA ReferenceFrames[] array expects up to
16 reference frames, thus including both fields.
Fix decoding of interlaced streams when adaptive_ref_pic_marking_mode_flag
is equal to 1, i.e. when memory management control operations are used. In
particular, when field_pic_flag is set to 0, the new reference flags shall
be applied to both fields.
Decoded frames are only output when they are complete, i.e. when both
fields are decoded. This also means that the "interlaced" caps is not
propagated to vaapipostproc or vaapisink elements. Another limitation
is that interlaced bitstreams with MMCO are unlikely to work.
Split remove_reference_at() into a function that actually removes the
specified entry from the short-term or long-term reference picture array,
and a function that sets reference flags to the desired value, possibly
zero. The latters marks the picture as "unused for reference".
Fix gst_vaapi_picture_new_field() to preserve the original picture type.
e.g. gst_vaapi_picture_new_field() with a GstVaapiPictureH264 argument
shall generate a GstVaapiPictureH264 object.
Introduce new `structure' field to the H.264 specific picture structure
so that to simplify the reference picture marking process. That local
picture structure is derived from the original picture structure, as
defined by the syntax elements field_pic_flag and bottom_field_flag.
Move DPB flush up if the current picture to decode is an IDR. Besides,
don't bother to check for IDR pictures in dpb_add() function since an
explicit DPB flush was already performed in this case.
... to build the short_ref[] and long_ref[] lists from the DPB, instead
of maintaining them separately. This avoids refs/unrefs while making it
possible to generate the list based on the actual picture structure.
This also ensures that the list of generated ReferenceFrames[] actually
matches what reference frames are available in the DPB. i.e. short_ref[]
and long_ref[] entries are implied from the DPB, so there is no risk of
having "dangling" references.
Use the POC member available in the GstVaapiPicture base class and
get rid of the dependency on the local VAPictureH264 TopFieldOrderCnt
and BottomFieldOrderCnt. Rather, use a simple field_poc[] array
initialized to INT_MAX, so that to simplify picture POC calculation
for non frame pictures.
Further get rid of GstVaapiPictureH264-local VAPictureH264.flags for
reference bits, thus simplifying the reference picture marking process
to only track a single set of reference flags. Also introduce a new
long_term_frame_idx member.
Add vaapi_fill_picture() helper function to convert GstVaapiPictureH264
to VAPictureH264 structure. This is preparatory work to get rid of the
local VAPictureH264 member in GstVaapiPictureH264.
Delay ensure_context() until we actually need a VA context for allocating
new VA surfaces, and then GstVaapiPictures, but also when a real activation
of a new picture parameter set occurs, thus also implying an activation
of the related sequence parameter set.
The most important thing was to drop the global pps and sps pointers since
they may not have matched the currently activated picture parameter or
sequence parameter sets at the specified decode point.
Anoter positive side-effect is that this cleans up all occurrences of
decode_current_picture() to only keep those useful in decode_picture(),
before a new picture is allocated, or in decode_sequence_end() when
an end-of-stream or end-of-sequence condition occurred.
... aka fix regression from efaab79. In particular, ScalingList8x8[]
array was partially copied to the VAIQMatrixBufferH264. While we are
at it, also improve bounds checking and avoid copying 8x8 scaling
lists if transform_8x8_mode_flag is set to 0.
Don't copy scaling lists twice to an intermediate state. Rather, directly
use the scaling lists from GstH264PPS since they would match those provided
by SPS header, if necessary. i.e. if PPS-specific scaling lists are not
available in the bitstream.
Remove exit_picture() and exit_picture_poc() since PicOrderCnt(CurrPic)
is now updated accordingly to the standard. Besides, MMCO = 5 specific
operations are moved up to exec_ref_pic_marking_adaptive_mmco_5().
Fix adaptive memory control decoded reference picture marking process
implementation for operations 2 to 6, thus also fixing support for
long-term reference pictures.
This change only splits each individual MMCO handler into several functions
dedicated for each operation. This is needed to perform further work later
on.
Allow build with strict DPB ordering mode whereby evicted entries
are replaced by the next entries, in order instead of optimizing
it away with the last entry in the DPB.
This is only useful for debugging purpose, against a reference SW
decoder for example.
GstH264SliceHdr.n_emulation_prevention_bytes is bound to exist now that
a newer version of codecparsers/ are used if the system provided one is
now recent enough to have those required extensions.
Propagate pixel-aspect-ratio determined by the GStreamer codecparser
from the sequence headers.
Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Split decode_buffer() into the core infrastructure that determines
the NAL units contained in the adapter and the actual function that
decodes the NAL unit.