Exclusively use VA drivers that support raw packed headers for encoding.
i.e. simply submit packed headers Subset SPS and Prefix NAL units. This
provides for better compatibility accross the various VA drivers and HW
generations since no particular API is needed beyond what readily exists.
Since we are encoding each view independently from each other, we
need a higher number of pre-allocated surfaces to be used as the
reconstructed frames. For Stereo High profile encoding, this means
to effectively double the number of frames to be stored in the DPB.
Add initial support for Subset SPS, Prefix NAL and Slice Extension NAL
for non-base-view streams encoding, and the usual SPS, PPS and Slice
NALs for base-view encoding.
The H.264 Stereo High profile encoding mode will be turned on when the
"num-views" parameter is set to 2. The source (raw) YUV frames will be
considered as Left/Right view, alternatively.
Each of the two views has its own frames reordering pool and reference
frames list management system. Inter-view references are not supported
yet, so the views are encoded independently from each other.
Signed-off-by: Li Xiaowei <xiaowei.a.li@intel.com>
[limited to Stereo High profile per the definition of MAX_NUM_VIEWS]
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Create structures to maintain the reference frames list (RefPool) and
frames reordering (ReorderPool) logic.
This is a prerequisite for H.264 MVC support.
Signed-off-by: Li Xiaowei <xiaowei.a.li@intel.com>
Add provisions to write subset SPS headers to the bitstream in view
to supporting the H.264 MVC specification.
This assumes the libva "staging" branch is in use.
Signed-off-by: Li Xiaowei <xiaowei.a.li@intel.com>
Optimize lookups of view ids / view order indices by caching the result
of the calculatiosn right into the GstVaapiParserInfoH264 struct. This
terribly simplifies is_new_access_unit() and find_first_field() functions.
Add safe fallbacks for MVC profiles:
- all MultiView High profile streams with 2 views at most can be decoded
with a Stereo High profile compliant decoder ;
- all Stereo High profile streams with only progressive views can be
decoded with a MultiView High profile compliant decoder ;
- all drivers that support slice-level decoding could normally support
MVC profiles when the DPB holds at most 16 frames.
In order to have a stricter conforming implementation, we need to carefully
detect access unit boundaries. Additional operations could be necessary to
perform at those boundaries.
Detect the first VCL NAL unit of a picture for MVC, based on the
view_id as per H.7.4.1.2.4. Note that we only need to detect new
view components.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Always cache the previous NAL unit so that we could check whether
there is a Prefix NAL unit immediately preceding the current slice
or IDR NAL unit. In that case, the NAL unit metadata is copied into
the current NAL unit. Otherwise, some default values are inferred,
tentatively. e.g. view_id shall be set to 0 and inter_view_flag to 1.
[infer default values for slice if previous NAL was not a Prefix]
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Allow decoding for base views of MVC encoded streams. For now, just skip
the slice extension and prefix NAL units, and skip non-base view frames.
Signed-off-by: Xiaowei Li <xiaowei.a.li@intel.com>
[fixed memory leak, improved check for MVC NAL units]
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Factor out process by which the decoded picture with the lowest POC
is found, and possibly output. Likewise, the storage and marking of
a reference decoded, or non-reference decoded picture, into the DPB
could also be simplified as they mostly share the same operations.
Make init_picture_ref_lists() more consistent with other functions
related to the reference marking process by supplying the current
picture as argument.
Add gst_vaapi_display_get_vendor_string() helper function to query
the underlying VA driver name. The display object owns the resulting
string, so it shall not be deallocated.
That function is thread-safe. It could be used for debugging purposes,
for instance.
Make sure to initialize one GstVaapiDisplay at a time, even in threaded
environments. This makes sure the display cache is also consistent
during the whole display creation process. In the former implementation,
there were risks that display cache got updated in another thread.
Add support for dynamic growth of the VA surfaces pool. For decoding,
this implies the recreation of the underlying VA context, as per the
requirement from VA-API. Besides, only increases are supported, not
shrinks.
It is a requirement from VA-API specification that the VA context got
from vaCreateContext(), for decoding purposes, binds the supplied set
of VA surfaces. This means that if the set of VA surfaces is to be
changed for the current decode session, then the VA context needs to
be recreated with the new set of VA surfaces.
Complement fix committed as e95a42e.
The H.264 AVC standard has to say: if the field is part of a reference
frame or a complementary reference field pair, and the other field of
the same reference frame or complementary reference field pair is also
marked as "used for long-term reference", the reference frame or
complementary reference field pair is also marked as "used for long-term
reference" and assigned LongTermFrameIdx equal to long_term_frame_idx.
This fixes decoding of MR9_BT_B in strict mode.
https://bugs.freedesktop.org/show_bug.cgi?id=64624https://bugzilla.gnome.org/show_bug.cgi?id=724518
Request the correct chroma format for decoding grayscale streams.
i.e. make lookups of the VA chroma format more generic, thus possibly
supporting more formats in the future.
This means that, if a VA driver doesn't support grayscale formats,
it is now going to fail. We cannot safely assume that maybe grayscale
was implemented on top of some YUV 4:2:0 with the chroma components
all set to 0x80.
Fix reference picture marking process with memory_management_control_op
set to 3 and 6, i.e. assign LongTermFrameIdx to a short-term reference
picture, or the current picture.
This fixes decoding of FRExt_MMCO4_Sony_B.
https://bugs.freedesktop.org/show_bug.cgi?id=64624https://bugzilla.gnome.org/show_bug.cgi?id=724518
[squashed, edited to use GST_VAAPI_PICTURE_IS_COMPLETE() macro]
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
The initialization of reference picture lists (8.2.4.2) applies to all
slices. So, the RefPicList0/1 lists need to be constructed prior to
each slice submission to the HW decoder.
This fixes decoding of video sequences where frames are encoded with
multiple slices of different types, e.g. 4 slices in this order I, P,
I, and P. More precisely, CABAST3_Sony_E and CABASTBR3_Sony_B.
https://bugzilla.gnome.org/show_bug.cgi?id=724518
When NAL units of type 13 (SPS extension) or type 19 (auxiliary slice)
are present in a video, decoders shall perform the (optional) decoding
process specified for these NAL units or shall ignore them (7.4.1).
Implement option 2 (skip) for now, as alpha composition is not
supported yet during the decoding process.
This fixes decoding of the primary coded video in alphaconformanceG.
https://bugzilla.gnome.org/show_bug.cgi?id=703928https://bugzilla.gnome.org/show_bug.cgi?id=728869https://bugzilla.gnome.org/show_bug.cgi?id=724518
[skip NAL units earlier, i.e. at parsing time]
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
When MVC slice NAL units (coded slice extension and prefix NAL) are
present, the number of NAL header bytes is 3, not 1 as usual.
Signed-off-by: Li Xiaowei <xiaowei.a.li@intel.com>
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
At the time the first VCL NAL unit of a primary coded picture is found,
and if that NAL unit was parsed to be an SPS or PPS, then the entries
in the parser may have been overriden. This means that, when the picture
is to be decoded, slice_hdr->pps could point to an invalid (the next)
PPS entry.
So, one way to solve this problem is to not use the parser PPS and
SPS info but rather maintain our own activation chain in the decoder.
https://bugzilla.gnome.org/show_bug.cgi?id=724519https://bugzilla.gnome.org/show_bug.cgi?id=724518
Retain the SEI messages that were parsed from the access unit until we
have completely decoded the current frame. This is done so that we can
peek at that data whenever necessary during decoding. e.g. for exposing
3D stereoscopic information at a later stage.
Fix support for grayscale encoded video clips, and possibly others if
the underlying driver supports the non-YUV 4:2:0 formats. i.e. defer
the decision that a surface with the desired chroma format is not
supported to the actual VA driver implementation.
https://bugzilla.gnome.org/show_bug.cgi?id=728144
Don't force allocation of VA surfaces in YUV 4:2:0 format. Rather, allow
for the upper layer to specify the desired chroma type. If the chroma
type field is not set (or yields zero), then YUV 4:2:0 format is used
by default.
Fix possible bug when a per-segment deblocking filter level value
needs to be set in non-absolute mode, i.e. when the loop filter update
value is negative in delta mode.
Also clamp the resulting filter level value to 0..63 range.
Improve condition to disable the loop filter. The previous heuristic
used to check all filter levels, for all segments. It turns out that
only the base filter_level value defined in the frame header needs
to be checked.
This fixes 00-comprehensive-013.
Fix generation of source tarballs when certain conditionals are not
met. e.g. always include all buildable codecparsers sources in the
distribution tarball, fix plug-in element sources set to include X11
and encoder bits.
The built-in libvpx serves multiple purposes, among which the most
important ones could be: track the most up-to-date, and optimized,
range decoder; allow for future hybrid implementations (non-VLD);
and have a completely independent range decoder implementation.
Apply correct patch from fd.o #722760 to fix several issues: update the
license terms to LGPLv2.1+, fix dependencies to built-in libvpx and fix
make dist.
Add libvpx submodule that tracks the upstream version 1.3.0. This is
needed to build a libgstcodecparsers_vpx.so library with all symbols
placed into the GSTREAMER namespace.
The gst_h264_parse_parse_sei() function now returns an array of SEI
messages, instead of a single SEI message. Reason: it is allowed to
have several SEI messages packed into a single SEI NAL unit, instead
of multiple NAL units.
8fadf40 h264: Fix multiple SEI messages in one SEI RBSP parsing.
644825f h265: remove trailling 0x00 bytes as the spec doesn't allow them
95f9f0f h264: remove trailling 0x00 bytes as the spec doesn't allow them
766007b h265: Initialize pointer correctly that is never assigned but freed in error cases
8ec5816 h265: Fix segfault when parsing HRD parameter
5b1730f h265: Fix segfault when parsing VPS
983b7f7 h265: prevent to overrun chroma_weight_l0_flag
7ba641d h265: Fix debug output
d9f9f9b h264: not all startcodes should have 3-byte 0 prefix
Fix parser and decoder state to sync at the right locations. This is
because we could reset the parser state, while the decoder state was
not copied yet, e.g. when parsing several NAL units from multiple frames
whereas the current frame was not decoded yet.
This is a regression brought in by commit 6fe5496.
Fix gstreamer-vaapi includedir for GStreamer 1.2 setups. i.e. use
the pkgconfig version (1.0) instead of the intended API version (1.2).
libgstvaapi1.0-dev and libgstvaapi1.2-dev packages will now conflict,
as would core GStreamer 1.0 and GStreamer 1.2 dev packages anyway.
Add gst_vaapi_get_config_attribute() helper function that takes a
GstVaapiDisplay and the rest of the arguments with VA types. The aim
is to have thread-safe VA helpers by default.
Make sure to configure the encoder with the set of packed headers we
intend to generate and submit. i.e. make selection of packed headers
to submit more robust.
Cache the first compatible GstVaapiProfile found if the encoder is not
configured yet. Next, factor out the code to check for the supported
rate-control modes by moving out vaGetConfigAttributes() to a separate
function, while also making sure that the attribute type is actually
supported by the encoder.
Also fix the default set of supported rate control modes to not the
"none" variant. It's totally useless to expose it at this point.
Introduce GstVaapiContextUsage so that to explicitly determine the
usage of a VA context. This is useful in view to simplifying the
creation of VA context for VPP too.
Unknown attributes, or attributes that are not supported for the given
profile/entrypoint pair have a return value of VA_ATTRIB_NOT_SUPPORTED.
So, return failure in this case.
Move GstVideoOverlayComposition handling to separate source files.
This helps keeing GstVaapiContext core implementation to the bare
minimal, i.e. simpy helpers to create a VA context and handle pool
of associated VA surfaces.
Improve documentation and debug messages. Clean-up APIs, i.e. strip
them down to the minimal set of interfaces. They are private, so no
need expose getters for instance.
Make sure that libgstvaapi private headers remain internally used to
build libgstvaapi libraries only. All header dependencies were reviewed
and checks for IN_LIBGSTVAAPI definition were added accordingly.
Also rename GST_VAAPI_CORE definition to IN_LIBGSTVAAPI_CORE to keep
consistency.
Set a valid GstVideoCodecFrame.system_frame_number when decoding a
stream in standalone mode. While we are at it, improve the debugging
messages to also include that frame number.
When decoding failed, or that the frame was dropped, the associated
surface proxy is not guaranteed to be present. Thus, the GST_DEBUG()
message needs to check whether the proxy is actually present or not.
https://bugzilla.gnome.org/show_bug.cgi?id=722403
[fixed gst_vaapi_surface_proxy_get_surface_id() to return VA_INVALID_ID]
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Don't emit NAL HRD parameters for now in the SPS headers because the
SEI buffering_period() and picture_timing() messages are not handled
yet. Some additional changes are necessary to get it right.
https://bugzilla.gnome.org/show_bug.cgi?id=722734
Fix default CPB buffer size to something more reasonable (1500 ms)
and that still fits the level limits. This is a non configurable
property for now. The initial CPB removal delay is also fixed to
750 ms.
https://bugzilla.gnome.org/show_bug.cgi?id=722087
Round down the calculated, or supplied, bitrate (kbps) into a multiple
of the HRD bitrate scale factor. Use a bitrate scale factor of 64 so
that to have less losses in precision. Likewise, don't round up because
that could be a strict constraint imposed by the user.
Fix the level calculation involving bitrate limits. Since we are
targetting NAL HRD conformance, the check against MaxBR from the
Table A-1 limits shall involve cpbBrNalFactor depending on the
active profile.
Submit sequence parameter buffers only once, or when the bitstream
was reconfigured in a way that requires such. Always submit packed
sequence parameter buffers at I-frame period, if the VA driver needs
those.
https://bugzilla.gnome.org/show_bug.cgi?id=722737
Make sure to submit the packed headers only if the underlying VA driver
requires those. Currently, only handle packed sequence and picture
headers.
https://bugzilla.gnome.org/show_bug.cgi?id=722737
The VAEncSequenceParameterBuffer.ip_period value reprents the distance
between the I-frame and the next P-frame. So, this also accounts for
any additional B-frame in the middle of it.
This fixes rate control heuristics for certain VA drivers.
https://bugzilla.gnome.org/show_bug.cgi?id=722735
Fix level characterisation when the bitrate is automatically computed
from the active coding tools. i.e. ensure the bitrate once the profile
is completely characterized but before the level calculation process.
Document and rename a few functions here and there. Drop code that
caps num_bframes variable in reset_properties() since they shall
have been checked beforehand, during properties initialization.
Clean-up GstBitWriter related utility functions and simplify notations.
While we are at it, also make bitstream writing more robust should an
overflow occur. We could later optimize for writing headers capped to
their maximum possible size by using the _unchecked() helper variants.
Drop private header since it was originally used to expose internals
to the plugin element. The proper interface is now the properties API,
thus rendering private headers totally obsolete.
Add support for hue, saturation, brightness and constrat adjustments.
Also fix cap info local copy to match the really expected cap subtype
of interest.
https://bugzilla.gnome.org/show_bug.cgi?id=720376
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Change the submission order of VA objects so that to make that process
more logical. i.e. submit sequence parameter first, if any; next the
packed headers associated to sequece, picture or slices; and finally
the actual picture and associated slices.
Various clean-ups to improve consistency and readability: rename some
variables, drop unused macro definitions, drop initialization of vars
that are zero-initialized from the base class, drop un-necessary casts,
allocate GPtrArrays with a destroy function.
Fix frame cropping rectangle calculation to handle horizontal resolutions
that don't match a multiple of 16 pixels, but also the vertical resolution
that was incorrectly computed for progressive sequences too.
https://bugzilla.gnome.org/show_bug.cgi?id=722089
For non "Constant-QP" modes, we could provide more reasonable heuristics
for the target bitrate. In general, 48 bits per macroblock with all the
useful coding tools enable looks safe enough. Then, this rate is raised
by +10% to +15% for each coding tool that is disabled.
https://bugzilla.gnome.org/show_bug.cgi?id=719699
Add support for "high-compression" tuning option. First, determine the
largest supported profile by the hardware. Next, check any target limit
set by the user. Then, enable each individual coding tool based on the
resulting profile_idc value to use.
https://bugzilla.gnome.org/show_bug.cgi?id=719696
Allow user to precise the largest profile to use for encoding due
to target decoder constraints. For instance, if CABAC entropy coding
mode is requested by "constrained-baseline" profile only is desired,
then an error is returned during codec configuration.
Also make sure that the suitable profile we derived actually matches
what the HW can cope with.
https://bugzilla.gnome.org/show_bug.cgi?id=719694
Refine the heuristic to determine the maximum size of a coded buffer
to account for the exact number of slices. set_context_info() is the
last step during codec reconfiguration, no additional change is done
afterwards, so re-using the num_slices field here is fine.
https://bugzilla.gnome.org/show_bug.cgi?id=719953
Automatically derive the minimum profile and level to be used for
encoding, based on the activated coding tools. The encoder will
be trying to generate a bitstream that has the best chances to be
decoded on most platforms by default.
Also change the default profile to "constrained-baseline" so that
to ensure maximum compatibility when the stream is decoded.
https://bugzilla.gnome.org/show_bug.cgi?id=719691
Fix lookup for a suitable HW profile, as to be used by the underlying
hardware, based on heuristics that lead to characterize the SW profile,
i.e. the one used by the SW level encoding logic.
Also fix constraint_set0_flag (A.2.1) and constraint_set1_flag (A.2.2)
as they should respectively match the baseline and main profile.
https://bugzilla.gnome.org/show_bug.cgi?id=719827
The libgstvaapi core encoders are meant to support raw bitstreams only.
Henceforth, we are always producing a stream in "byte-stream" format.
However, the "codec-data" buffer which holds SPS and PPS headers is
always available. The "lengthSizeMinusOne" field is always set to 3
so that in-place "byte-stream" format to "avc" format conversion could
be performed.
Various clean-ups to improve consistency and readability: rename some
variables, drop unused macro definitions, drop initialization of vars
that are zero-initialized from the base class, drop un-necessary casts.
Fix lookup for a suitable HW profile, as to be used by the underlying
hardware, based on heuristics that lead to characterize the SW profile,
i.e. the one used by the SW level encoding logic.