Use GPOINTER_TO_SIZE() instead of GPOINTER_TO_UINT() while manipulating
pointers. The latter is meant to be 32-bit only, not uintptr_t like size.
Only a gsize can hold all bits of a pointer.
Thanks to Ouping Zhang for spotting this error.
Heuristic: if the second start-code is available, check whether that
one marks the start of a new frame because e.g. this is a sequence
or picture header. This doesn't save much, since we already cache the
results.
Accelerate scan for start codes by skipping up to 3 bytes per iteration.
A start code prefix is defined by the following bytes: 00 00 01. Thus,
for any group of 3 bytes (xx yy zz), we have the following possible cases:
1. If zz != 1, this cannot be a start code, then skip 3 bytes;
2. If yy != 0, this cannot be a start code, then skip 2 bytes;
3. If xx != 0 or zz != 1, this cannot be a start code, then skip 1 byte;
4. xx == 00, yy == 00, zz == 1, we have match!
This algorithm requires to peek bytes from the adapter. This increases the
amount of bytes copied to a temporary buffer, but this process is much faster
than scanning for all the bytes and using shift/masks. So, overall, this is
a win.
Move parsing back to decoding step, but keep functions separate for now.
This is needed for future optimizations that may introduce some meta data
for parsed info attached to codec frames.
Optimize pre-allocation of decoder units, thus avoiding un-necessary
memory reallocations. The heuristic used is that we could have around
one slice unit per macroblock line.
Use a GArray to hold decoder units in a frame, instead of a single-linked
list. This makes 'append' calls faster, but not that much. At least, this
makes things clearer.
Allocate decoder unit earlier in the main parse() function and don't
delegate this task to derived classes. The ultimate purpose is to get
rid of dynamic allocation of decoder units.
The SPS, PPS and slice headers are not fully zero-initialized in the
codecparsers/ library. Rather, the standard upstream behaviour is to
initialize only certain syntax elements with some inferred values if
they are not present in the bitstream.
At the gstreamer-vaapi decoder level, we need to further initialize
certain syntax elements with some sensible default values so that to
not complicate VA drivers that just pass those verbatim to the HW,
and also avoid an memset() of the whole decoder unit.
Signed-off-by: Sreerenj Balachandran <sreerenj.balachandran@intel.com>
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Make GstVaapiVideoBuffer a simple wrapper for video meta. This buffer is
no longer necessary but for compatibility with GStreamer 0.10 APIs or users
expecting a GstSurfaceBuffer like Clutter.
Use PTS value computed by the decoder, which could also be derived from
the GstVideoCodecFrame PTS. This makes it possible to fix up the PTS if
the original one was miscomputed or only represented a DTS instead.
Create a new VA context if the encoded surface size changes because we
need to keep the underlying surface pool until the last one was released.
Otherwise, either of the following cases could have happened: (i) release
a VA surface to an inexistent pool, or (ii) release VA surface to an
existing surface pool, but with different size.
Avoid creating a GstBuffer for slice data. Rather, directly use the codec
frame input buffer data. This is possible because the codec frame is valid
until end_frame() where we submit the VA buffers for decoding. Anyway, the
slice data buffer is copied into the VA buffer when it is created.
Implement GstVaapiDecoder.start_frame() and end_frame() semantics so
that to create new VA context earlier and submit VA pictures to the
HW for decoding as soon as possible. i.e. don't wait for the next
frame to start decoding the previous one.
Parse slice() header and first macroblock position earlier in _parse()
function instead of waiting for the _decode() stage. This doesn't change
anything but readability.
Introduce new GstVaapiDecoderUnitMpeg2 object, which holds the standard
GstMpegVideoPacket and additional parsed header info. Besides, we now
parse as early as in the _parse() function so that to avoid un-necessary
creation of sub-buffers in _decode() for video packets that are not slices.
Theory of operations: all units marked as "slice" are moved to the "units"
list. Since this list only contains slice data units, the prev_slice pointer
was removed. Besides, we now maintain two extra lists of units to be decoded
before or after slice data units.
In particular, all units in the "pre_units" list will be decoded before
GstVaapiDecoder::start_frame() is called and units in the "post_units"
list will be decoded after GstVaapiDecoder::end_frame() is called.
Make GstVaapiSurfaceProxy only a thin wrapper around a VA context and a
VA surface. i.e. drop any other attribute like timestamp, duration,
interlaced or top-field-first.
Maintain decoded surfaces as GstVideoCodecFrame objects instead of
GstVaapiSurfaceProxy objects. The latter will tend to be reduced to
the strict minimum: a context and a surface.
Add new gst_vaapi_decoder_get_frame() function meant to be used with
gst_vaapi_decoder_decode(). The purpose is to return the next decoded
frame as a GstVideoCodecFrame and the associated GstVaapiSurfaceProxy
as the user-data object.
Decoder units were zero-initialized, including the SPS/PPS/slice headers.
The latter don't require zero-initialization since the codecparsers/ lib
will do so for key variables already. This is not a great value per se but
at least it makes it possible to check whether the default initialization
decisions made in the codecparsers/ lib were right or not.
This can be reverted if this exposes too many issues.
Drop explicit initialization of most fields that are implicitly set to
zero. Drop helper macros for casting to GstVaapiPictureH264 or
GstVaapiFrameStore. Also remove some useless checks for NULL pointers.
Implement GstVaapiDecoder.start_frame() and end_frame() semantics so
that to create new VA context earlier and submit VA pictures to the
HW for decoding as soon as possible. i.e. don't wait for the next
frame to start decoding the previous one.
Introduce new GstVaapiDecoderUnitH264 object, which holds the standard
NAL unit header (GstH264NalUnit) and additional parsed header info.
Besides, we now parse headers as early as in the _parse() function so
that to avoid un-necessary creation of sub-buffers in _decode() for
NAL units that are not slices.
This is a performance win by ~+1.1% only.
Use standard GstVideoCodecState throughout GstVaapiDecoder and expose
it with a new gst_vaapi_decoder_get_codec_state() function. This makes
it possible to drop picture size (width, height) information, framerate
(fps_n, fps_d) information, pixel aspect ratio (par_n, par_d) information,
and interlace mode (is_interlaced field).
This is a new API with backwards compatibility maintained. In particular,
gst_vaapi_decoder_get_caps() is still available.
Signed-off-by: Sreerenj Balachandran <sreerenj.balachandran@intel.com>
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Align gst_vaapi_decoder_get_surface() semantics with the rest of the
API. That is, return a GstVaapiDecoderStatus and the decoded surface
as a handle to GstVaapiSurfaceProxy in parameter.
This is an API/ABI change.
Introduce new decoding process whereby a GstVideoCodecFrame is created
first. Next, input stream buffers are accumulated into a GstAdapter,
that is then passed to the _parse() function. The GstVaapiDecoder object
accumulates all parsed units and when a complete frame or field is
detected, that GstVideoCodecFrame is passed to the _decode() function.
Ultimately, the caller receives a GstVaapiSurfaceProxy if decoding
process was successful.
The start_frame() hook is called prior to traversing all decode-units
for decoding. The unit argument represents the first slice in the frame.
Some codecs (e.g. H.264) need to wait for the first slice in order to
determine the actual VA context parameters.
Split decoding process into two steps: (i) parse incoming bitstreams
into simple decoder-units until the frame or field is complete; and
(ii) decode the whole frame or field at once.
This is an ABI change.
Introduce a new GstVaapiDecoderFrame that is just a list of decoder units
(GstVaapiDecoderUnit objects) that constitute a frame. This object is just
an extension to GstVideoCodecFrame for VA decoder purposes. It is available
as the user-data member element.
This is a libgstvaapi internal object.
Introduce GstVaapiDecoderUnit which represents a fragment of the source
stream to be decoded. For instance, a decode-unit will be a NAL unit for
H.264 streams, an EBDU for VC-1 streams, and a video packet for MPEG-2
streams.
This is a libgstvaapi internal object.
GstVaapiSurfaceProxy does not use any particular functionality from
GObject. Actually, it only needs a basic object type with reference
counting.
This is an API and ABI change.
Introduce a new reference counted object that is very lightweight and
also provides flags and user-data functionalities. Initialization and
finalization times are reduced by up to a factor 5x vs GstMiniObject
from GStreamer 0.10 stack.
This is a libgstvaapi internal object.
Fix decode_slice() to ensure a VA context exists prior to creating a
new GstVaapiSliceH264, which invokes vaCreateBuffer() with some VA
context ID. i.e. the latter was not initialized, thus causing failures
on Cedar Trail for example.
Use glib >= 2.32 semantics for GMutex and GRecMutex wrt. initialization
and termination. Basically, the new mutex objects can be used as static
mutex objects from the deprecated APIs, e.g. GStaticMutex and GStaticRecMutex.
This fixes symbol clashes between the gst-vaapi built-in codecparsers/
library and the system-provided one, mainly used by videoparses/. Now,
only symbols with the gst_vaapi_* prefix will be exported, if they are
not marked as "hidden" to libgstvaapi.
Fix gst_vaapi_image_map() to return TRUE and the GstVaapiImageRaw
structure correctly filled in if the image was already mapped.
Likewise, make gst_vaapi_image_unmap() return TRUE if the image
was already unmapped.
Fix reference leak of surface and image in GstVaapiVideoBuffer wrapper,
thus resulting on actual memory leak of GstVaapiImage when using them
for downloads/uploads from VA surfaces and more specifically surfaces
when the pipeline is shutdown. i.e. vaTerminate() was never called
because the resources were not unreferenced, and thus not deallocated
in the end.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
The picture size signalled by sps->{width,height} is the actual size with
cropping applied, not the original size derived from pic_width_in_mbs_minus1
and pic_height_in_map_units_minus1. VA driver expects that original size,
uncropped.
There is another issue pending: frame cropping information needs to be
taken care of.
Fix commit 18245b4 so that to link and build parserutils.[ch] first.
This is needed since that's the common dependency for actual codec
parsers (gstvc1parser.c for instance).
... this is useful to make sure pixel-aspect-ratio and framerate
information are correctly parsed since we have no means to detect
that at configure time.
Invoke gst_mpeg_video_finalise_mpeg2_sequence_header() to get the
correct PAR values. While doing so, require a newer version of the
bitstream parser library.
Note: it may be necessary to also parse the Sequence_Display_Extension()
header.
Signed-off-by: Sreerenj Balachandran <sreerenj.balachandran@intel.com>
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
This patch updates to relect the 1.0 version of the protocol. The main
changes are the switch to wl_registry for global object notifications
and the way that the event queue and file descriptor is processed.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
git am got confused somehow, though the end result doesn't change at
all since we require both SPS and PPS to be parsed prior to decoding
the first slice.
Only start decoding slices when at least one SPS and PPS got activated.
This fixes cases when a source represents a substream of another stream
and no SPS and PPS was inserted before the first slice of the generated
substream.
... for interlaced streams. The short_ref[] and long_ref[] arrays may
contain up to 32 fields but VA ReferenceFrames[] array expects up to
16 reference frames, thus including both fields.
Fix decoding of interlaced streams when adaptive_ref_pic_marking_mode_flag
is equal to 1, i.e. when memory management control operations are used. In
particular, when field_pic_flag is set to 0, the new reference flags shall
be applied to both fields.
Decoded frames are only output when they are complete, i.e. when both
fields are decoded. This also means that the "interlaced" caps is not
propagated to vaapipostproc or vaapisink elements. Another limitation
is that interlaced bitstreams with MMCO are unlikely to work.
Split remove_reference_at() into a function that actually removes the
specified entry from the short-term or long-term reference picture array,
and a function that sets reference flags to the desired value, possibly
zero. The latters marks the picture as "unused for reference".
Fix gst_vaapi_picture_new_field() to preserve the original picture type.
e.g. gst_vaapi_picture_new_field() with a GstVaapiPictureH264 argument
shall generate a GstVaapiPictureH264 object.
Introduce new `structure' field to the H.264 specific picture structure
so that to simplify the reference picture marking process. That local
picture structure is derived from the original picture structure, as
defined by the syntax elements field_pic_flag and bottom_field_flag.
Move DPB flush up if the current picture to decode is an IDR. Besides,
don't bother to check for IDR pictures in dpb_add() function since an
explicit DPB flush was already performed in this case.
... to build the short_ref[] and long_ref[] lists from the DPB, instead
of maintaining them separately. This avoids refs/unrefs while making it
possible to generate the list based on the actual picture structure.
This also ensures that the list of generated ReferenceFrames[] actually
matches what reference frames are available in the DPB. i.e. short_ref[]
and long_ref[] entries are implied from the DPB, so there is no risk of
having "dangling" references.
Use the POC member available in the GstVaapiPicture base class and
get rid of the dependency on the local VAPictureH264 TopFieldOrderCnt
and BottomFieldOrderCnt. Rather, use a simple field_poc[] array
initialized to INT_MAX, so that to simplify picture POC calculation
for non frame pictures.
Further get rid of GstVaapiPictureH264-local VAPictureH264.flags for
reference bits, thus simplifying the reference picture marking process
to only track a single set of reference flags. Also introduce a new
long_term_frame_idx member.
Add vaapi_fill_picture() helper function to convert GstVaapiPictureH264
to VAPictureH264 structure. This is preparatory work to get rid of the
local VAPictureH264 member in GstVaapiPictureH264.
Delay ensure_context() until we actually need a VA context for allocating
new VA surfaces, and then GstVaapiPictures, but also when a real activation
of a new picture parameter set occurs, thus also implying an activation
of the related sequence parameter set.
The most important thing was to drop the global pps and sps pointers since
they may not have matched the currently activated picture parameter or
sequence parameter sets at the specified decode point.
Anoter positive side-effect is that this cleans up all occurrences of
decode_current_picture() to only keep those useful in decode_picture(),
before a new picture is allocated, or in decode_sequence_end() when
an end-of-stream or end-of-sequence condition occurred.
... aka fix regression from efaab79. In particular, ScalingList8x8[]
array was partially copied to the VAIQMatrixBufferH264. While we are
at it, also improve bounds checking and avoid copying 8x8 scaling
lists if transform_8x8_mode_flag is set to 0.
Don't copy scaling lists twice to an intermediate state. Rather, directly
use the scaling lists from GstH264PPS since they would match those provided
by SPS header, if necessary. i.e. if PPS-specific scaling lists are not
available in the bitstream.
Remove exit_picture() and exit_picture_poc() since PicOrderCnt(CurrPic)
is now updated accordingly to the standard. Besides, MMCO = 5 specific
operations are moved up to exec_ref_pic_marking_adaptive_mmco_5().
Fix adaptive memory control decoded reference picture marking process
implementation for operations 2 to 6, thus also fixing support for
long-term reference pictures.
This change only splits each individual MMCO handler into several functions
dedicated for each operation. This is needed to perform further work later
on.
Allow build with strict DPB ordering mode whereby evicted entries
are replaced by the next entries, in order instead of optimizing
it away with the last entry in the DPB.
This is only useful for debugging purpose, against a reference SW
decoder for example.
GstH264SliceHdr.n_emulation_prevention_bytes is bound to exist now that
a newer version of codecparsers/ are used if the system provided one is
now recent enough to have those required extensions.
Use newer sources from the codecparsers/ submodule for
- GstH264SliceHdr.n_emulation_prevention_bytes: EPBs;
- GstH264VUIParams.{par_n,par_d}: pixel-aspect-ratio.
Propagate pixel-aspect-ratio determined by the GStreamer codecparser
from the sequence headers.
Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Split decode_buffer() into the core infrastructure that determines
the NAL units contained in the adapter and the actual function that
decodes the NAL unit.
Split decode_buffer() into the core infrastructure that determines
the packets contained in the adapter and the actual function that
decodes the packet data.
Fix memory leakage of empty packets, i.e. packets that only contain
the start code prefix. In particular, free empty user-data packets.
Besides, the codec parser will already fail gracefully if the packet
to parse does not have the minimum required size. So, we can also
completely drop the block of code that used to handle packets of size 4
(including the start code).
Fix return value when the second scan for start code fails. This means
there is not enough data to determine the full extents of the current
packet and the function shall return GST_VAAPI_DECODER_STATUS_ERROR_NO_DATA
in this case, instead of GST_VAAPI_DECODER_STATUS_SUCCESS.
Improve the semantics for gst_vaapi_decoder_put_buffer() when an empty
buffer is passed on. An empty buffer is a buffer with a NULL data pointer
or with a size equals to zero. In this case, that buffer is simply
skipped and the function returns TRUE. A NULL buffer argument still
marks the end-of-stream.
Rather than always making the surface fullscreen instead implement the
set_fullscreen vfunc on GstVaapiWindow and then set the shell surface
fullscreen on not depending on that.
Reviewed-by: Joe Konno <joe.konno@intel.com>
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Mesa recently updated the <GL/glext.h> header version to Khronos version 85.
This caused the PFNGLMULTITEXCOORD2FPROC definition to be moved out of the
GL_VERSION_1_3_DEPRECATED block. However, since <GL/gl.h> also defines
GL_VERSION_1_3 to 1, the definitions in <GL/glext.h> are then not enabled,
thus leaving PFNGLMULTITEXCOORD2FPROC undefined as well.
Provide a PFNGLMULTITEXCOORD2FPROC replacement as an interim solution for
newer versions of the <GL/glext.h> header.
Maintaining the sub-buffer is rather suboptimal especially since we
were also maintaining a GstAdapter. Now, we only use the GstAdapter
thus requiring minor extra parsing when receiving avcC buffers.
This allows the compositor to optimize redraws and cull away changes
obscured by the video surface.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Reset, i.e. destroy then create, the decoder in _setcaps() handler only
if the underlying codec type actually changed. This makes it possible
to be more tolerant with certain MPEG-2 streams that get parsed to
form caps that are compatible with the previous state but minor changes
to "codec-data".
Make it possible to specify the maximum number of references to use within
a single VA context. This helps reducing GPU memory allocations to the useful
number of references to be used.
Forward declaring enums is not allowed by the C standard and aborts
compilation if the header file is included in a C++ project.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Some VA drivers (e.g. EMGD) can have completely random values for initial
display attributes. So, try to improve the discovery process to check the
initial display attribute values actually fall within valid bounds. If not,
try to reset those to some sensible values like the default value reported
through vaQueryDisplayAttributes().
Use g_object_class_install_properties() to install GstVaapiDisplay properties.
It is useful to maintain properties as GParamSpec so that to be able to raise
"notify" signals by id instead of by name in the future.
A rendering mode can be "overlay" or "texture"'ed blit.
The former mode implies that a VA surface used for rendering can't be
re-used right away for decoding, so the sink shall make provisions to
retain the associated surface proxy until the next surface is to be
displayed.
The latter mode implies that the VA surface is implicitly copied to an
intermediate backing store, or back buffer of a frame buffer, so the
associated surface proxy can be disposed right away.
The VA display attributes are mapped to properties so that to maintain the
GStreamer terminology. Properties are to be identified by name, but internal
functions are available to lookup the property by the actual VA display
attribute type.
decode_current_picture() was converted to return a gboolean instead
of a GstVaapiDecoderStatus, so we were not getting out of the decode
loop as expected, or could cause an error instead.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Integrate the start code prefix in the slice data buffer that is submitted
to the hardware. VA-API specifies that slice_data_offset is the offset to
the first byte of slice data. And, for MPEG-2, slice() data begins with
the slice_start_code. Some VA driver implementations (EMGD) expect this.
Use g_object_notify_by_pspec() instead of g_object_notify() so that to
avoid a property name lookup. i.e. this makes notifications faster to
the `vaapidecode' element.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Two elements in the luminance quantization table were wrong. So,
gst_jpeg_get_default_quantization_tables() now reconstructs tables
in zig-zag order from the standard ones (Tables K.1 and K.2).
... instead of having them pre-calculated. This saves around 1.5 KB
of data in the DSO but requires gst_jpeg_get_default_huffman_tables()
to do more work. Though, the client application may have to call that
function at most once, only.
Move display types from gstvaapipluginutil.* to gstvaapidisplay.* so that
we could simplify characterization of a GstVaapiDisplay. Also rename "auto"
type to "any", and add a "display-type" attribute.
This improves display name comparisons by always allocating a valid display
name. This also helps to disambiguate lookups by name in the global display
cache, should a new backend be implemented.
The vdeo buffer creation routines shall actually be internal to gstreamer-vaapi
plugin elements. So deprecate any explicit creation routines that are not the
new *_typed_new*() variants.
Introduce new typed constructors internal to gstreamer-vaapi plugin elements.
This avoids duplication of code, and makes it possible to further implement
generic video buffer creation routines that automatically map to base or GLX
variants.
If GLX window was created from a foreign Display, then that same Display shall
be used for subsequent glXMakeCurrent(). This means that gl_create_context()
will now use the same Display that the parent, if available.
This fixes cluttersink with the Intel GenX VA driver.
This flag is obsolete. It was meant to explicitly enable/disable VA/GLX API
support, or fallback to TFP+FBO if this API is not found. Now, we check for
the VA/GLX API by default if --enable-glx is set. If this API is not found,
we now default to use TFP+FBO.
Note: TFP+FBO, i.e. using vaPutSurface() is now also a deprecated usage and
will be removed in the future. If GLX rendering is requested, then the VA/GLX
API shall be used as it covers most usages. e.g. AMD driver can't render to
an X pixmap yet.
GStreamer -base plugins >= 0.10.31 are now required, so the checks for
new APIs like GstXOverlay::set_window_handle() and ::set_render_rectangle()
are no longer necessary.
GStreamer codecparsers-based decoders are the only supported decoders now.
Though, FFmpeg decoders are still available in gstreamer-vaapi 0.3.x series.
This is a preferred thread-safe version. Also add an inline version of
g_clear_object() if compiling with glib < 2.28.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Add valid flag to GstJpegQuantTable and GstJpegHuffmanTable so that
to determine whether a table actually changed since the last user
synchronization point. That way, this makes it possible for some
hardware accelerated decoding solution to upload only those tables
that changed.
Add new GstJpegHuffmanTables helper structure to hold all possible
AC/DC Huffman tables available to all components.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
gst_jpeg_parse() now gathers all scans available in the supplied
buffer. A scan comprises of the scan header and any entropy-coded
segments or restart marker following it. The size and offset to
the associated data (ECS + RST segments) are append to a new
GstJpegScanOffsetSize structure.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Improve generation of presentation timestamps to be less sensitive
to input stream errors. In practise, GOP is also a synchronization
point for PTS calculation.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
TRD and TRB fields are not large enough to hold the difference of PTS
expressed with nanosecond resolution. So, compute them from the original
VOP info.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Allow MPEG-2 High profile streams only if the HW supports that profile
or no High profile specific bits are used, and thus Main profile could
be used instead. i.e. chroma_format is 4:2:0, intra_dc_precision is not
set to 11 and no sequence_scalable_extension() was parsed.
In P-pictures, prediction shall be made from the two most recently
decoded reference fields. However, when the first I-frame is a field,
the next field of the current picture could be a P-picture but only a
single field was decoded so far. In this case, create a dummy picture
with POC = -1 that will be used as reference.
Some VA drivers would error out if P-pictures don't have a forward
reference picture. This is true in general but not in this very specific
initial case.
Allow fallback from simple to main profile when the HW decoder does
not support the former profile and that no sequence_header_extension()
is available to point out this.
decode_picture() could return an error when an MPEG-4 profile is not
supported for example. In this case, the underlying VA context is not
allocated and no other proper action can be taken. Likewise on exit
from decode_slice().
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Introduce a POC field in GstVaapiPicture so that to store simpler sequential
numbers. A signed 32-bit integer should be enough for 1 year of continuous
video streaming at 60 Hz.
Use this new POC value to maintain the DPB, instead of 64-bit timestamps.
This also aligns with H.264 that will be migrated to GstVaapiDpb infrastructure.
Always prefer PTS from the demuxer layer for GOP times. If this is invalid,
i.e. demuxer could not determine the PTS or the generated PTS is lower than
max PTS from past pictures, then try to fix it up based on the duration of
a frame.
For picture PTS, simply use the GOP PTS formerly computed then use TSN to
reconstruct a current time. Also now handle wrapped TSN correctly.
Some streams, badly constructed, could have signaled an interlaced
frame while the sequence was meant to be progressive. Warn and force
frame to be progressive in this case.
Add first-field (FF) flag to GstVaapiPicture, thus not requiring is_first_field
member in each decoder. Rather, when a GstVaapiPicture is created, it is considered
as the first field. Any subsequent allocated field will become the second field.
Add gst_vaapi_picture_new_field() function that clones a picture, while
preserving the parent picture surface. i.e. the surface proxy reference
count is increased and other fields copied as is. Besides, the picture
is reset into a "non-output" mode.
Add top-field-first (TFF) and interlaced flags to GstVaapiPicture so they
could be propagated to the surface proxy when it is pushed for rendering.
Besides, top and bottom fields are now expressed with picture structure flags
from GstVaapiSurfaceRenderFlags.
If GstVaapiPicture has flag SKIPPED set, this means gst_vaapi_picture_output()
will not push the underlying surface for rendering. Besides, VC-1 skipped P-frame
has nothing to do with rendering. This only means that the currently decoded
picture is just a copy of its reference picture.
Add new "interlaced" attribute to GstVaapiSurfaceProxy. Use this in
vaapipostproc so that to handles cases where bitstream is interlaced
but almost only frame pictures are generated. In this case, we should
not be alternating between top/bottom fields.
Allow rendering flags, as a combination of GstVaapiSurfaceRenderFlags,
to be set to the video buffer. In particular, this is mostly useful for
basic deinterlacing.
Some streams have incorrect GOP timestamps, or nothing set at all.
i.e. GOP time is 00:00:00 for all GOPs. Try to recover in this case
from demuxer timestamps, which are monotonic.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Skip all pictures prior to the first sequence_header(). Besides,
skip all picture_data() if there was no prior picture_header().
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
VA-API expects slice_vertical_position as the initial position from the
bitstream. i.e. the direct slice() information. VA drivers will be fixed
accordingly.
Unlike what VA-API documentation defines, the slice_data_bit_offset
represents the offset to the first macroblock in the slice data, minus
any emulation prevention bytes in the slice_header().
This fix copes with binary-only VA drivers that won't be fixed any
time soon. Besides, this aligns with the current FFmpeg behaviour
that was based on those proprietary drivers implementing the API
incorrectly.
Original values from sequence_header() are 12-bit and the remaining
2 most significant bits are coming from sequence_extension().
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
6.3.15 says that "some slices may have the same slice_vertical_position,
since slices may start and finish anywhere". So, we can't submit the current
picture to the HW right away since subsequent slices would be missing.
vaRenderPicture() implicitly disposes VA buffers. Some VA drivers would
push the VA buffer object into a list of free buffers to be re-used. However,
reference pictures (and data) that was kept would explicitly release the VA
buffer object later on, thus possibly destroying a valid (re-used) object.
Besides, some other VA drivers don't support correctly the vaRenderPicture()
semantics for VA buffers disposal and would leak memory if there is no explicit
vaDestroyBuffer(). The temporary workaround is to explcitily destroy VA buffers
right after vaRenderPicture(). All VA drivers need to be aligned.
This ensures the VA context is clear when the encoded resolution
changes. i.e. make sure older picture is decoded with the older
VA context before it changes.
On sequence end, if the last decoded picture is not output for rendering,
then the proxy surface is not created. In this case, the original surface
must be released explicitly to the context.
VA drivers may have a faster means to transfer user buffers to GPU
buffers than using memcpy(). In particular, on Intel Gen graphics, we
can use pwrite(). This provides for faster upload of bitstream and can
help higher bitrates.
vaapi_create_buffer() helper function was also updated to allow for
un-mapped buffers and pre-initialized data for buffers.
Only the explicit pred_weight_table(), possibly with the inferred default
values, shall be required. e.g. don't fill in the table if weighted_pred_flag
is not set for P/SP slices.
Keep a valid reference to the proxy in GstVaapiPicture so that frames
marked as "used for reference" could be kept during the lifetime of the
picture. i.e. don't release them too soon as they could be re-used right
away.
Drop obsolete gst_vaapi_decoder_push_surface() that was no longer used.
Change gst_vaapi_decoder_push_surface_proxy() semantics to assume PTS
is already set correctly and reference count increased, if necessary.
The new API simplifies a lot reference counting and makes it more
flexible for future additions/changes. The GstVaapiCodecInfo is
also gone. Rather, new helper macros are provided to allocate
picture, slice and quantization matrix parameter buffers.
HACK: qtdemux does not report profiles for H.263. So, assume plain
"video/x-h263" is H.263 Baseline profile.
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
gst_vaapi_surface_get_parent_context() was not meant to be exposed globally.
It's just an internal helper function. However, it's still possible to get
the parent context through the "parent-context" property.