Some GPUs (especially NVIDIA) are complaining that GPU is still busy
even we did 50 times of retry with 1ms sleep per failure.
Because DXVA/D3D11 doesn't provide API for "GPU-IS-READY-TO-DECODE"
like signal, there seems to be still no better solution other than sleep.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1913>
The vabasedec's display and decoder are created/destroyed between
the gst_va_base_dec_open/close pair. All the data and event handling
functions are between this pair and so the accessing to these pointers
are safe. But the query function can be called anytime. So we need to:
1. Make these pointers operation in open/close and query atomic.
2. Hold an extra ref during query function to avoid it destroyed.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1957>
Initial support for d3d11 texture so that encoder can copy
upstream d3d11 texture into encoder's own texture pool without
downloading memory.
This implementation requires MFTEnum2() API for creating
MFT (Media Foundation Transform) object for specific GPU but
the API is Windows 10 desktop only. So UWP is not target
of this change.
See also https://docs.microsoft.com/en-us/windows/win32/api/mfapi/nf-mfapi-mftenum2
Note that, for MF plugin to be able to support old OS versions
without breakage, this commit will load MFTEnum2() symbol
by using g_module_open()
Summary of required system environment:
- Needs Windows 10 (probably at least RS 1 update)
- GPU should support ExtendedNV12SharedTextureSupported feature
- Desktop application only (UWP is not supported yet)
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1903>
Move d3d11 device, memory, buffer pool and minimal method
to gst-libs so that other plugins can access d3d11 resource.
Since Direct3D is primary graphics API on Windows, we need
this infrastructure for various plugins can share GPU resource
without downloading GPU memory.
Note that this implementation is public only for -bad scope
for now.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/464>
Add the helper function _get_surface_id() which extracts the
VASurfaceID from the passed picture. This function gets the surface of
the next and previous reference picture.
Instead of if-statements, this refactor uses a switch-statement with a
fall-through, for P-type pictures, making the code a bit more readable.
Also it adds quirks for gallium driver, which cannot handle invalid
surfaces as forwarding nor backwarding references, so the function fails.
Also iHD cannot handle them, but to avoid failing, the current picture
is used as self-reference.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1939>
When missing the reference frames, we should not just discard the current
frame. Some streams have group of picture header. It is an optional header
that can be used immediately before a coded I-frame to indicate to the decoder
if the first consecutive B-pictures immediately following the coded I-frame can
be reconstructed properly in the case of a random access.
In that case, the B frames may miss the previous reference and can still be
correctly decoded. We also notice that the second field of the I frame may
be set to P type, and it only ref its first field.
We should not skip all those frames, and even the frame really misses the
reference frame, some manner such as inserting grey picture should be used
to handle these cases.
The driver crashes when it needs to access the reference picture while we set
forward_reference_picture or backward_reference_picture to VA_INVALID_ID. We
now set it to current picture to avoid this. This is just a temp manner.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1929>
The behavior for zero AVEncMPVGOPSize value would be
varying depending on GPU vendor implementation and some
GPU will produce keyframe only once at the beginning of encoding.
That's unlikely expected result for users.
To make this property behave consistently among various GPUs,
this commit will change default value of "gop-size" property to -1
which means "auto". When "gop-size" is unspecified, then
mfvideoenc will calculate GOP size based on framerate
like that of our x264enc implementation.
See also
https://docs.microsoft.com/en-us/windows/win32/directshow/avencmpvgopsize-property
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1911>
Add a new video source element "d3d11desktopdupsrc" for capturing desktop image
via Desktop Duplication based on Microsoft's Desktop Duplication sample available at
https://github.com/microsoft/Windows-classic-samples/tree/master/Samples/DXGIDesktopDuplication
This element is expected to be a replacement of existing dxgiscreencapsrc
element in winscreencap plugin.
Currently this element can support (but dxgiscreencapsrc cannot)
- Copying captured D3D11 texture to output buffer without download
- Support desktop session transition
e.g., can capture desktop without error even in case that
"Lock desktop" and "Permission dialog"
- Multiple d3d11desktopdupsrc elements can capture the same monitor
Not yet implemented features
- Cropping rect is not implemented, but that can be handled by downstream
- Mult-monitor is not supported. But that is also can be implemented by
downstream element for example via multiple d3d11desktopdup elements
with d3d11compositor
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1855>
Hide most of symbols of GstD3D11Memory object.
GstD3D11Memory is one of primary resource for imcoming d3d11 library
and it's expected to be a extensible feature.
Hiding implementation detail would be helpful for later use case.
Summary of this commit:
* Now all native Direct3D11 resources are private of GstD3D11Memory.
To access native resources, getter methods need to be used
or generic map (e.g., gst_memory_map) API should be called
apart from some exceptional case such as d3d11decoder case.
* Various helper methods are added for GstBuffer related operations
and in order to remove duplicated code.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1892>
... instead of READY state. READY state is too early for setting
overlay window handle especially playbin/playsink scenario
since playsink will set given overlay handle on videosink once
READY state change of videosink is ensured.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1893>
Unlike software MFT (Media Foundation Transform) which is synchronous
in terms of processing input and output data, hardware MFT works
in asynchronous mode. output data might not be available right after
we pushed one input data into MFT.
Note that async MFT will fire two events, one is "METransformNeedInput"
which happens when MFT can accept more input data,
and the other is "METransformHaveOutput", that's for signaling
there's pending data which can be outputted immediately.
To listen the events, we can wait synchronously via
IMFMediaEventGenerator::GetEvent() or make use of IMFAsyncCallback
object which is asynchronous way and the event will be notified
from Media Foundation's internal worker queue thread.
To handle such asynchronous operation, previous working flow was
as follows (IMFMediaEventGenerator::GetEvent() was used for now)
- Check if there is pending output data and push the data toward downstream.
- Pulling events (from streaming thread) until there's at least
one pending "METransformNeedInput" event
- Then, push one data into MFT from streaming thread
- Check if there is pending "METransformHaveOutput" again.
If there is, push new output data to downstream
(unlikely there is pending output data at this moment)
Above flow was processed from upstream streaming thread. That means
even if there's available output data, it could be outputted later
when the next buffer is pushed from upstream streaming thread.
It would introduce at least one frame latency in case of live stream.
To reduce such latency, this commit modifies the flow to be fully
asynchronous like hardware MFT was designed and to be able to
output encoded data whenever it's available. More specifically,
IMFAsyncCallback object will be used for handling
"METransformNeedInput" and "METransformHaveOutput" events from
Media Foundation's internal thread, and new output data will be
also outputted from the Media Foundation's thread.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1520>
Add a new property "render-stats" to allow rendering statistics
data on window for debugging and/or development purpose.
Text rendering will be accelerated by GPU since this implementation
uses Direct2D/DirectWrite API and Direct3D inter-op for minimal overhead.
Specifically, text data will be rendered on swapchain backbuffer
directly without any copy/allocation of extra texture.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1830>
Since GstVaDecodePicture is destroyed completely with its free() function and
it's used as destroy notify by codecs picture, there's no need to call
gst_va_decoder_destroy_buffers() externally, since the codecs base classes
destroy the codec picture when it's required.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1841>
The current way of GstVaDecodePicture's finalize will leak some
resource such as parameter buffers and slice data.
The current way deliberately leaves these resource releasing logic
to va decoder related function and trigger a warning if we free the
GstVaDecodePicture without releasing these resources.
But in practice, sometimes, you do not have the chance to release
these resource before picture is freed. For example, H264/Mpeg2
support multi slice NALs/Packets for one frame. It is possible that
we already succeed to parse and generate the first several slices
data by _decode_slice(), but then we get a wrong slice NAL/packet
and fail to parse it. We decide to discard the whole frame in the
decoder's base class, it just free the current picture and does not
trigger sub class's function again. In this kind of cases, we do
not have the chance to cleanup the resource, and the resource will
be leaked.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1841>
Even if resolution and/or bitdepth is not updated, required
DPB size can be changed per SPS update and it could be even
larger than previously configured size of DPB. If so, we need
to reconfigure DPB d3d11 texture pool again.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1839>
In order to honor GST_BUFFER_POOL_ACQUIRE_FLAG_DONTWAIT in VA pool, allocators'
wait_for_memory() has to be decoupled from their prepare_buffer() so it could be
called in pools' acquire_buffer() if the flag is not set.
wait_for_memory() functions are blocking so the received memories are assigned
to the fist requested buffer, if multithreaded calls. For this a new mutex were
added.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1815>
An application, using for example appsink, can hold buffers from any
va allocator after setting the pipeline to NULL. We need to destroy
the allocator when that memory is unrefed.
This patch juggles a bit with the allocator reference count in
memories in order to achieve this:
1. When memory is created no alloc ref is modified
2. When memory is released, alloc ref is decreased
3. When memory is reassiged to a buffer, alloc ref is increased
4. When memory is flushed, alloc ref is increased becase it is going
to be decreased in gst_memory_unref()
Also this patch moves the deallocation of member variables to
finalize() rather than dispose()
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1815>
In case that "pic_order_cnt_type" is equal to zero, ref picture
list for B slice should not include non-existing picture
as per spec 8.2.4.2.3. And, the second field is not needed
for the process of frame picture reference list construction
since it needs to be frame unit, not field picture in that case.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1812>
force_videometa should mean that the buffer must use video meta to
map correctly. When the stride or the offset of the alloc_info is
different from the src caps, the downstream must use video meta.
So this flag should not link with the RAW caps only. All kinds of
caps(memory:VAMemory, memory:DMABuf) should have this flag.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1711>
When the downstream element reports an ANY caps, and it also fails to
support VideoMeta, we should fallback to the system memory.
Note: the basetransform kind elements never return valid allocation
query before set_caps(). So, if a basetransform return an ANY sink
caps, we always fallback to system memory for it.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1711>
Allocate a GArray which is used to fill
VAPictureParameterBufferH264.ReferenceFrames (called per frame),
instead of alloc/free per frame.
Also this commit is to fix the condition where long-term reference
picture is needed for VAPictureParameterBufferH264.ReferenceFrames
entry.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1813>
We don't need to preserve input color range for transformed target
color space. Also some GPUs doesn't seem to be happy with 16-235
color range for RGB color space.
Also, since our default display target color space is
DXGI_COLOR_SPACE_RGB_FULL_G22_NONE_P709, choosing full color range
would make more sense.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1814>
When gst_va_dmabuf_allocator_setup_buffer_full() receives info (not NULL) it is
supposed that this buffer is not part of the allocator pool, so it has to be
de-allocated as soon it is freed.
This patch sets the destroy notify of the assigned GstVaBufferSurface if info is
not NULL.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1811>
Managing reference picture type by using two variables
(ref and long_term) seems to be redundant and that can be
represented by using a single enum value.
This is to sync this implementation with gstreamer-vaapi so that
make comparison between this and gstreamer-vaapi easier and also
in order to minimize the change required for subclass to be able
to support interlaced.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1534>
As per spec 7.4.3 Slice header semantics, the flag value is derived as
MbaffFrameFlag = (mb_adaptive_frame_field_flag && !field_pic_flag)
and DXVA uses the value.
Regarding FrameNumList, in case of long-term ref, FrameNumList[i]
value should be long_term_frame_idx not long_term_pic_num.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1780>
This add HEVC decoding support into the new VA plugin. This implementation has
been tested using the ITU comformance test (through fluster). It fails all
MAIN10 tests, as this is not implemented yet along with the following:
CONFWIN_A_Sony_1 (looks fine, but md5sum is incorrect)
PICSIZE_A_Bossen_1 (height too high)
PICSIZE_B_Bossen_1 (same)
VPSSPSPPS_A_MainConcept_1 (parser issue)
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1714>
This causes no changes to the profile but keeps the existing settings.
The profile can also be changed from e.g. the card's configuration
application and in that case probably should be left alone.
The default is the new value as it keeps the profile setting as it is,
which is consistent with the previous behaviour in 1.18.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1721>
In our va implemenation, we just use image's info to map the buffer.
The padding info just plays a role as a place holder to expand the
allocation size in caps when decoding size is bigger than display
size. So the padding_right or padding_left does not change the result.
But we find if using padding_left, it is hard to meet the requirement
of gst_video_meta_validate_alignment(), when the video meta's stride
is different from the allocation width.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1698>
We have already done the jobs in gst_va_base_dec_decide_allocation()
and no need to call base class' decide_allocation() again. The base
class' decide_allocation() will set_format() again and let use do the
image/surface testing again, which is low performance and no needed.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1698>
Use this standalone function to update the allocator info and make
all ensure_image() and mem_alloc() API clean.
We also change the default way of using image. We now set the non
derive manner as the default manner, and if it fails, then fallback
to the derived image manner.
On a lot of platforms, the derived image does not have caches, so the
read and write operations have very low performance.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1698>
Moving the parameters testing and setting from the allocator_alloc_full()
to the allocator_try(). The allocator_alloc_full() will be called every
time when we need to allocate a new memory. But all these parameters such
as the surface and the image format, rt_format, etc, are unchanged during
the whole allocator lifetime. Just setting them in set_format() is enough.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1698>
Output texture of d3d11 decoder cannot have the bind flag
D3D11_BIND_SHADER_RESOURCE (meaning that it cannot be used for shader
input resource). So d3d11convert (and it's subclasses) was copying
texture into another internal texture to use d3d11 shader.
It's obviously overhead and we can avoid texture copy for
colorspace conversion or resizing via ID3D11VideoProcessor
as it supports decoder output texture.
This commit would be a visible optimization for d3d11 decoder with
d3d11compositor use case because we can avoid texture copy per frame.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1718>
GstMemory object could be disposed if GstBuffer is not allocated
by GstD3D11BufferPool such as via gst_buffer_copy() and/or
gst_buffer_make_writable(). So attaching qdata on GstMemory
object would cause unnecessary view alloc/free.
By using view pool which is implemented in GstD3D11Allocator,
we can avoid redundant view alloc/free.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1716>
In order to know the chroma format, besides profile, subsampling_x and
subsampling_y are needed (Spec 7.2.2 Color config semantics). These values are
in GstVp9Parser but not in GstVp9Framehdr.
Also, bit_depth is available in parser but not frame header. Evenmore, those
values are copied to picture structure later.
In case of VA-API, to configure the pipeline, it is require to know the chroma
format and depth.
It is possible to know chroma and depth through caps coming from vp9parser, but
it requires string parsing. It would be less error prone to get these values
through the parser structure at new_sequence() virtual method.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1700>
Add new video composition element which is equivalent to compositor
and glvideomixer elements. When d3d11 decoder elements are used,
d3d11compositor can do efficient graphics memory handling
(zero copying or at least copying memory on GPU memory space).
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1323>
New d3d11colorconvert and d3d11scale elements will perform only
colorspace conversion and rescale, respectively. Those new elements
would be useful when only colorspace conversion or rescale is required
and the other part should be done by another elements.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1323>
Add util functions for runtime CUDA kernel source compilation
using NVRTC library. Like other nvcodec dependent libraries,
NVRTC library will be loaded via g_module_open.
Note that the NVRTC library naming is not g_module_open friendly
on Windows.
(i.e., nvrtc64_{CUDA major version}{CUDA minor version}.dll).
So users can specify the dll name using GST_NVCODEC_NVRTC_LIBNAME
environment.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1633>