We have already done the jobs in gst_va_base_dec_decide_allocation()
and no need to call base class' decide_allocation() again. The base
class' decide_allocation() will set_format() again and let use do the
image/surface testing again, which is low performance and no needed.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1698>
Use this standalone function to update the allocator info and make
all ensure_image() and mem_alloc() API clean.
We also change the default way of using image. We now set the non
derive manner as the default manner, and if it fails, then fallback
to the derived image manner.
On a lot of platforms, the derived image does not have caches, so the
read and write operations have very low performance.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1698>
Moving the parameters testing and setting from the allocator_alloc_full()
to the allocator_try(). The allocator_alloc_full() will be called every
time when we need to allocate a new memory. But all these parameters such
as the surface and the image format, rt_format, etc, are unchanged during
the whole allocator lifetime. Just setting them in set_format() is enough.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1698>
Output texture of d3d11 decoder cannot have the bind flag
D3D11_BIND_SHADER_RESOURCE (meaning that it cannot be used for shader
input resource). So d3d11convert (and it's subclasses) was copying
texture into another internal texture to use d3d11 shader.
It's obviously overhead and we can avoid texture copy for
colorspace conversion or resizing via ID3D11VideoProcessor
as it supports decoder output texture.
This commit would be a visible optimization for d3d11 decoder with
d3d11compositor use case because we can avoid texture copy per frame.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1718>
GstMemory object could be disposed if GstBuffer is not allocated
by GstD3D11BufferPool such as via gst_buffer_copy() and/or
gst_buffer_make_writable(). So attaching qdata on GstMemory
object would cause unnecessary view alloc/free.
By using view pool which is implemented in GstD3D11Allocator,
we can avoid redundant view alloc/free.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1716>
In order to know the chroma format, besides profile, subsampling_x and
subsampling_y are needed (Spec 7.2.2 Color config semantics). These values are
in GstVp9Parser but not in GstVp9Framehdr.
Also, bit_depth is available in parser but not frame header. Evenmore, those
values are copied to picture structure later.
In case of VA-API, to configure the pipeline, it is require to know the chroma
format and depth.
It is possible to know chroma and depth through caps coming from vp9parser, but
it requires string parsing. It would be less error prone to get these values
through the parser structure at new_sequence() virtual method.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1700>
Add new video composition element which is equivalent to compositor
and glvideomixer elements. When d3d11 decoder elements are used,
d3d11compositor can do efficient graphics memory handling
(zero copying or at least copying memory on GPU memory space).
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1323>
New d3d11colorconvert and d3d11scale elements will perform only
colorspace conversion and rescale, respectively. Those new elements
would be useful when only colorspace conversion or rescale is required
and the other part should be done by another elements.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1323>
Add util functions for runtime CUDA kernel source compilation
using NVRTC library. Like other nvcodec dependent libraries,
NVRTC library will be loaded via g_module_open.
Note that the NVRTC library naming is not g_module_open friendly
on Windows.
(i.e., nvrtc64_{CUDA major version}{CUDA minor version}.dll).
So users can specify the dll name using GST_NVCODEC_NVRTC_LIBNAME
environment.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1633>
Introducing CUDA buffer pool with generic CUDA memory support.
Likewise GL memory, any elements which are able to access CUDA device
memory directly can map this CUDA memory without upload/download
overhead via the "GST_MAP_CUDA" map flag.
Also usual GstMemory map/unmap is also possible with internal staging memory.
For staging, CUDA Host allocated memory is used (see CuMemAllocHost API).
The memory is allowing system access but has lower overhead
during GPU upload/download than normal system memory.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1633>
Fix a bug in _create_other_pool(). The old way of checking the
base->other_pool make that other_pool never be changed until the
gst_va_base_dec_stop() to stop the current decoding context.
But in some stream, the resolution may change during the decoding
process, and we need to re-negotiate the buffer pool. Then, the
old other_pool can not be clean correctly and the new correct one
can not be created.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1692>
The current bufferpool wastes all pre-allocate buffers when the
buffer pool is actived.
The pool->priv->size is 0 for va buffer pool. And every time, the
reset_buffer() will clean all mem and make the buffer size 0, that
can cache the gst_buffer in the buffer pool.
But when the buffer pool is activing, the default_start() just
allocate the buffer and release_buffer() immediately, all the pre
allocated buffers and surfaces are destroyed because of
gst_buffer_get_size (buffer) != pool->priv->size.
We need to use release_buffer() to do the clean job at the pool
start time.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1686>
Since allocators keep an available memory queue to reuse, video format and usage
hint are now persistant while allocator's memories are around.
This patch adds _set_format() and _get_format() for both VA allocators.
_set_format() validates if given format can be used or reused. If no allocated
surface previously it creates a dummy one to fetch its offsets and
strides. Updated info is returned to callee.
GstVaPool uses _set_format() at config to verify the allocator capacity and to
get the surfaces offsets and strides, which are going to be used by the video
meta.
Allocator extracted caps are compared with caps from config and if they have
different strides or offsets, force_videometa is set.
A new bufferpool method gst_va_pool_requires_video_meta() is added return the
value of force_videometa. This value is checked in order to know if decoders
need to copy the surface if downstream doesn't announce video meta support.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1667>
Without preallocating buffers and memories a deadlock in pool allocator is
highly probably since it might hit the case were buffer is returned to the pool
but their memories are still hold by a copy downstream, without other
preallocated buffers available.
This kind of a hack, where buffer_reset() follow the normal path if it's called
from start().
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1667>
Every time a new surface is created the counter increases by one, and when it is
destroyed (or will be destroyed in case of GstVaAllocator), the counter is
decreased. Then, at allocator dispose, it is warning if the counter is not zero.
This counter will be also used to check if the allocator can change its
configuration if the counter is zero or can not.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1667>
Updated Decklink SDK to version 11.2 in order to support newer cards like the Decklink 8K Pro.
This required to replace the duplex property by a profile property.
Profile values can be the following:
- bmdProfileOneSubDeviceFullDuplex
- bmdProfileOneSubDeviceHalfDuplex
- bmdProfileTwoSubDevicesFullDuplex
- bmdProfileTwoSubDevicesHalfDuplex
- bmdProfileFourSubDevicesHalfDuplex
Fixes#987
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1665>
1. Allocators don't implement memory free() methods since all the memories will
implement dispose() returning FALSE
2. Memory/miniobject dispose() will act as memory release, enqueueing the
release memory
3. A new allocator's method prepare_buffer() which queries the released memory
queue and will add the requiered memories to the buffer.
4. Allocators added a GCond to synchronize dispose() and prepare_buffer()
5. A new allocator's method flush() which will free for real the memories.
While the bufferpool will
1. Remove all the memories at reset_buffer()
2. Implement acquire_buffer() calling allocator's prepare_buffer()
3. Implement flush_start() calling allocator's flush()
4. start() is disabled since it pre-allocs buffers but also calls
our reset_buffer() which will drop the memories and later the
buffers are ditched, something we don't want. This approach avoids
buffer pre-allocation.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1626>
Renamed the first variable member of GstVaMemory from parent to mem in
order to avoid confusion with GstMemory's parent.
When freeing the structure, memory's parent is check in order to
decide if surfaces has to be destroyed or not, since only the parent
class have to destroy it.
Removed GST_MEMORY_FLAG_NO_SHARE in memory initialization, since it is
deprecated.
Implemented allocator's share virtual method which creates a new
shallow GstVaMemory structure based on the passed one which will be
it's parent.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1626>
Staging texture is used for memory transfer between system and
gpu memory. Apart from d3d11{upload,download} elements, however,
it should happen very rarely.
Before this commit, d3d11bufferpool was allocating at least one
staging texture in order to calculate cpu accessible memory size,
and it wasn't freed for later use of the texture unconditionally.
But it will increase system memory usage. Although GstD3D11memory
object is implemented so that support CPU access, most memory
transfer will happen in d3d11{upload,download} elements.
By this commit, the initial staging texture will be freed immediately
once cpu accessible memory size is calculated.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1627>
These function were repeated in the different implemented
elements. This patch centralize them.
The side effect is dmabuf memory type is no longer checked with the
current VAContext, but assuming that dmabuf is a consequence of caps
negotiation from dynamic generated caps templates, where the context's
memory types are validated, there's no need to validate them twice.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1644>
An oddness of wasapi loopback feature is that capture client will not
produce any data if there's no outputting sound to corresponding
render client. In other words, if there's no sound to render,
capture task will stall. As an option to solve such issue, we can
add timeout to wake up from capture thread if there's no incoming data
within given time interval. But it seems to be glitch prone.
Another approach is that we can keep pushing silence data into
render client so that capture client can keep capturing data
(even if it's just silence).
This patch will choose the latter one because it's more straightforward
way and it's likely produce glitchless sound than former approach.
A bonus point of this approach is that loopback capture on Windows7/8
will work with this patch. Note that there's an OS bug prior to Windows10
when loopback capture client is running with event-driven mode.
To work around the bug, event signalling should be handled manually
for read thread to wake up.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1588>
Add a global mutex to exclusive access to shared stream buffers, such
as DMABufs or VASurfaces after a tee:
LIBVA_DRIVER_NAME=iHD \
gst-launch-1.0 v4l2src ! tee name=t t. ! queue ! \
vapostproc skin-tone=9 ! xvimagesink \
t. ! queue ! vapostproc ! xvimagesink
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1529>
This function will take an array of DMABuf GstMemory and an array of
fd, and create a VASurfaceID with those fds. Later that VASurfaceID is
attached to each DMABuf through GstVaBufferSurface.
In order to free the surface GstVaBufferSurface now have GstVaDisplay
member, and _buffer_surface_unref() were added.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1529>
There are, in VPP, surfaces that doesn't support 4:2:2 fourccs but it
supports the chroma. So this patch gives that opportunity to the
driver.
This patch also simplifiies
gst_va_video_surface_format_from_image_format() to just an iterator
for surfaces available formats.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1529>
Add a new parameter to _create_surfaces(): a pointer to
VASurfaceAttribExternalBuffers.
If it's defined the memory type is changed to DRM_PRIME, also a new item is
added to the VASurfaceAttrib array with
VASurfaceAttribExternalBufferDescriptor.
Also, the VASurfaceAttrib for pixel format is not mandatory anymore. If fourcc
parameter is 0, is not added in the array, relying on the chroma. This is
useful when creating surfaces for uploading or downloading images.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1529>
The width/height from the video meta can be padded width, height. But when
sourcing from padded buffer, we only want to use the valid pixels. This
rectangle is from the crop meta, orther it is deduces from the caps. The width
and height from the caps is save in the parent class, use these instead of the
GstVideoInfo when settting the src rectangle.
This fixes an issue with 1080p video displaying repeated or green at the
padded bottom 8 lines (seen with v4l2codecs).
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1580>
Note that newly added formats (YUY2, UYVY, and VYUY) are not supported
render target view formats. So such formats can be only input of d3d11convert
or d3d11videosink. Another note is that YUY2 format is a very common
format for hardware en/decoders on Windows.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1581>
Add custom IMFMediaBuffer and IMF2DBuffer implementation in order to
keep track of lifecycle of Media Foundation memory object.
By this new implementation, we can pass raw memory of upstream buffer
to Media Foundation without copy.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1518>
The min latency is calculated with the maximum number of frames that
precede any frame, if available, and it is lower than the maximum
number of frames in DBP.
The max latency is calculated with the maxium size of frames in DBP.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1500>
For the allocator to create surfaces with the correct chroma an
fourcc, it should use a surface format, not necessarily the negotiated
format.
Instead of the previous arbitrary extra formats list, the decoder
extracts the valid pixel formats from the given VA config, and pass
that list to the allocator which stores it (full transfer).
Then, when the allocator allocates a new surface, it looks for a
surface color format chroma-compatible with the negotiated image color
format.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1483>
Instead of adding a list of ad-hoc formats for raw caps (I420 and
YV12), the display queries the available image formats and we assume
that driver can download frames in that available format with same
chroma of the config surfaces chroma.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1483>
Some devices (e.g., Surface Book 2, Surface Pro X) will expose
both MediaStreamType_VideoPreview and MediaStreamType_VideoRecord types
for a logical device. And for some reason, MediaStreamType_VideoPreview
seems to be selected between them while initiailzing device.
But I cannot find any documentation for the decision rule.
To be safe, we will select common formats between them.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1478>
Microsoft defines two I420 formats, one is I420, and the other is
IYUV (but both are same, just names are different).
Since both will be converted to GST_VIDEO_FORMAT_I420,
we should check both I420 and IYUV subtypes during
GstVideoFormat to Microsoft's format conversion.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1478>
In case of UWP, documentation from MS is saying that
ActivateAudioInterfaceAsync() method should be called from UI thread.
And the resulting callback might not happen until user interaction
has been made.
So we cannot wait the activation result on constructed() method.
and therefore we should return gst_wasapi2_client_new()
immediately without waiting the result if wasapi2 elements are
running on UWP application.
In addition to async operation fix, this commit includes COM object
reference counting issue around ActivateAudioInterfaceAsync() call.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1466>
Since the commit c29c71ae9d,
device activation method will be called from an internal thread.
A problem is that, CoreApplication::GetCurrentView()
method will return nullptr if it was called from non-UI thread,
and as a result, currently implemented method for accessing ICoreDispatcher
will not work in any case. There seems to be no robust way for
accessing ICoreDispatcher other then setting it by user.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1466>
If the run_async() method is expected to be called from streaming
thread and not from application thread, use INFINITE as timeout value
so that d3d11window can wait UI dispatcher thread in any case.
There is no way to get a robust timeout value from library side.
So the fixed timeout value might not be optimal and therefore
we should avoid it as much as possible.
Rule whether a timeout value can be INFINITE or not is,
* If the waiting can be cancelled by GstBaseSink:unlock(), use INFINITE.
GstD3D11Window:on_resize() is one case for example.
* Otherwise, use timeout value
Some details are, GstBaseSink:start() and GstBaseSink:stop() will be called
when NULL to READY or READY to NULL state change, so there will be no
chance for GstBaseSink:unlock() and GstBaseSink:unlock_stop()
to be called around them. So there is no other way then timeout way.
GstD3D11Window:consturcted() and GstD3D11Window:unprepare() are the case.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1461>
All subclasses are retrieving list to get target output frame, which
can be done by baseclass. And pass the ownership of the GstH264Picture
to subclass so that subclass can clear implementation dependent resources
before finishing the frame.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1449>
While retrieving supported formats by device, the last return might
not be S_OK in case that it's not supported one by us (e.g., H264, JPEG or so).
But if we've found at least one supported raw video format,
we can keep going on.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1451>
We don't need to duplicate a method for HRESULT error code to string
conversion. This patch is intended to
* Remove duplicated code
* Ensure FormatMessageW (Unicode version) and avoid FormatMessageA
(ANSI version), as the ANSI format is not portable at all.
Note that if "UNICODE" is not defined, FormatMessageA will be aliased
as FormatMessage by default.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1442>
... if target OS version was specified as Windows 10.
When enabled, desktop application can select target capture
implementation between WinRT and Win32
via GST_USE_MF_WINRT_CAPTURE environment
(e,g., GST_USE_MF_WINRT_CAPTURE=1 for WinRT impl.).
Default is Win32 implementation in case of desktop target.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1434>
Because the valid input formats for screen content coding extension is
a subset of input formats for range extension, user must specify the
profile for screen content coding extension in the caps filter
Example:
gst-launch-1.0 videotestsrc ! video/x-raw,format=NV12 ! msdkh265enc
low-power=1 ! video/x-h265,profile=screen-extended-main ! fakesink
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1389>
With the asynchronous slice decoding, we only queue up to 2 slices
per frames. That side effect is that now we are dequeuing bitstream
buffers in both decoding and presentation order. This would lead to
a bitstream buffer from a previous frame being dequeued instead of
the expected last slice buffer and lead to us trying to queue an
already queued bitstream buffer.
We now fix this by tracking pending requests. As request are executed
in decoding order, we marking a request done, we can effectively
dequeue bitstream buffer from all previous request, as they have been
executed already.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1395>
The decoder is not being access from multiple threads, instead it is
always protected by the streaming lock. For this reason, a
GstAtomicQueue for the request pool is overkill and may even introduce
unneeded overhead. Use a GstQueueArray in replacement, the
GstQueueArray is a good fit since the number of item is predictable and
unlikely to vary at run-time.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1395>
In slice mode, we'll do one request per slice. In order to recycle
bitstream buffer, and not run-out, wait for the last pending
request to complete and mark it done.
We only wait after having queued the current slice in order to reduce
that potential driver starvation and maintain performance (using dual
buffering).
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1395>
New plugin with an element for H.264 decoding with VA-API. This novel
approach, different from gstreamer-vaapi, uses gstcodecs library for
state handling.
The code is expected to looks cleaner because it uses VA-API without
further layers or wrappers.
* It uses the first supported DRM device as default VA display (other
displays will be supported through user's GstContext)
* Requires libva >= 1.6
* No multiview/stereo profiles neither interlaced streams because
gstcodecs doesn't handle them yet
* It is incompatible with gstreamer-vaapi
* Even if memory:VAMemory is exposed, it is not handled yet by any
other element
* Caps templates are generated dynamically querying VAAPI, but YV12
and I420 are added for system memory caps because they seem to be
supported for all the drivers when downloading frames onto main
memory, as they are used by xvimagesink and others, avoiding color
conversion.
* Surfaces aren't bounded to context, so they can grow beyond the DBP
size, allowing smooth reverse playback.
* There isn't yet error handling and recovery.
* 10-bit H.264 streams aren't supported by libva.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1379>
Current shader code is not compatible with HLSL profile "ps_4_0_level_9_3"
or lower. So d3dcompiler cannot compile our shader code in that case.
Note that VirtualBox is one known driver which doesn't support currently
implemented shader code.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1343>