DMABuf allocator already implements DMABuf Sync, meaning that doing
mmap/munmap (unless the mode have changed) is not required. In fact, on
systems with IOMMU it makes the kernel redo the mmu table which is visible
in the CPU usage.
This change reduces CPU usage when decoding
bbb_sunflower_2160p_60fps_normal.mp4 on RK3399 SoC from over 30% to
around 15%.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2152>
Implementation of mem_copy() virtual method for GstVaAllocator.
It's a deep copy where a new VA memory is popped out from the pool or,
if pool is empty, a new memory is allocated. The original memory is
mapped to read, and if its VAImage is not derived and size to copy is
the whole surface, the mapped VAImage of the original memory is put in
the new memory. Otherwise a slow memcpy is done between both memories.
Fixes: #1568
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2136>
The picture buffer (V4L2 CAPTURE buffer) was being released immediatly
when the request was done. This was problematic since even after the
request is done, the picture buffer might still be used as a reference
and should not be reused for further decoding yet.
This change effectively bind the picture buffer lifetime to the request.
So that if the picture is never showned (decode only frame) or the request
queue is full before the buffer is displayed, the picture buffer will
remain alive.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2142>
Different VA drives might have different definitions for RGB32 color
formats because different bit interpretation. Sadly the specification
doesn't clarify these interpretations. So VA users have to figure out
what's the correct mapping with it's rendering color format
definition.
This patch aims to fix the static map structure after the
VAImageFormats are queried. There is another static map with the
different interpretations of the RGB32 formats, and compare them with
the given VAImageFormat, then with the GStreamer color format, update
the mapping table.
Finally, some RGB32 color formats were added.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2129>
Add missing include for sys/ioctl.h so that these warnings dissapear
when compiling:
../subprojects/gst-plugins-bad/sys/v4l2codecs/gstv4l2decoder.c:179:9:
warning: implicit declaration of function ‘ioctl’
[-Wimplicit-function-declaration]
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2140>
If the application did not define one yet, define our own
VK_DEFINE_NON_DISPATCHABLE_HANDLE that is independent of the
architecture.
Vulkan, by default, provides a define that depends on the architecture,
which causes the symbol type to be different. This causes an
architecture dependent .gir file, which then causes multilib
installation problems because the .gir files can't be shared.
Make it possible to override the format specifier and provide
a default one that is compatible with the default non dispatchable
handle.
Return VK_NULL_HANDLE from functions that return a non-dispatchable
handle.
Fixes#1566
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2130>
vapostproc tries to be in passthrough mode as much as possible. But
they might be situations where the user might force to process the
frames. For example, when upstream sets the crop meta and the user
wants VA do that cropping, rather than downstream.
For those situations this property will disable the passthrough mode,
if it's enabled.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2058>
If incoming buffers have crop meta it's done by vapostproc, iif
vapostproc is not in passthrough mode and downstream doesn't handle
it.
This patch announces the crop meta API in proposed bufferpool, while
it stops filtering meta APIs, since it was only filter crop api.
Also if downstream supports crop and video metas, vapostporoc
announces both meta APIs in upstream bufferpool.
Finally, the meta is removed from the buffer if the crop is enabled.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2058>
This new struct describes the input and output GstBuffers to
post-process, including VA flags. It also contains the VASurfaceID and
VARectangle, but those are private, completed inside GstVaFilter.
It is used for pass arguments to gst_va_filter_convert_surface() function.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2058>
When an input buffer needs to be copied into a VA memory, it's
required to create a buffer pool. This patch uses the
propose_allocation() caps to instantiate the allocator and pool,
instead of the negotiated caps, which rather represents the resolution
to display.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2058>
Derived images are direct maps to surfaces bits, but in Intel Gen7 to
Gen9, that memory is not cachable, thus reading can be very slow (it
might produce timeout is tests such as fluster).
This patch tries first to define if derived images are possible, and
later use them only if mapping is not for reading.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2128>
This plugin, for decoders more concretely, assumes that a VA config
can do certain color conversions when mapping frames onto CPU's
memory.
This assumption was valid for i965 and Gallium drivers which generates
valid outputs in bitstreams testers (v.gr. fluster). Nonetheless, iHD,
even when it generates acceptable rendered frames, output's MD5 of
tests weren't valid.
This patch append the image formats, for color conversion when mapping
to memory, for non-iHD drivers.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2128>
Enable zero-copy if downstream proposed pool and therefore decoder
can know the amount of buffer required by downstream.
Otherwise decoder will copy when our DPB pool has no sufficient
buffers for later decoding operation.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2097>
Major changes:
* GstD3D11Allocator: This allocator is now device-independent object
which can allocate GstD3D11Memory object for any GstD3D11Device.
User can get this object via gst_allocator_find(GST_D3D11_MEMORY_NAME)
* GstD3D11PoolAllocator: A new allocator implementation for texture pool.
From now on GstD3D11BufferPool will make use of this memory pool allocator
to avoid frequent texture reallocation. That usually happens because
of buffer copy (gst_buffer_make_writable for example)
In addition to that, GstD3D11BufferPool will provide GstBuffer with
GstVideoMeta, because CPU access to a GstD3D11Memory without GstVideoMeta
is almost impossible since GPU drivers needs padding for stride alignment.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2097>
We've been doing retry with 1ms sleep if DecoderBeginFrame()
returned E_PENDING which means application should call
DecoderBeginFrame() again because GPU is busy.
The 1ms sleep() during retry would result in usually about 15ms delay
in reality because of bad clock precision on Windows.
To improve throughput performance, this commit will enable
high precision clock only for NVIDIA platform since
DecoderBeginFrame() call on the other GPU vendors seems to
succeed without retry.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2099>
After the VA filter creation, when changing the element's state from NULL
to READY, immediatly checks for any filter operation requested by the user.
If any, the passthrough mode is disabled early, so there's no need for a
future renegotiation.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2094>
When we transform the caps from the sink to src, or vice versa, the
"caps" passed to us may only contain parts of the features. Which
makes our vpp lose some feature in caps and get a negotiation error.
The correct way should be:
Cleaning the format and resolution of that caps, but adding all VA,
DMA features to it, making it a full feature caps. Then, clipping it
with the pad template.
fixes: #1551
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2081>
For reverse playback, we are always copying decoded
frame to downstream buffer. So the pool size can be
and need to be large enough.
In case that forward playback, however, we need to restrict
the max pool size for performance reason. Otherwise decoder
will keep copying decoded texture to downstream buffer pool
if decoding is faster than downstream throughput
performance and also there are queue element between them.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2083>
Decoder might be able to copy decoded texture to the other buffer pool
during playback depending on context. In that case, copied one
has no D3D11_BIND_DECODER bind flag.
If we used ID3D11VideoProcessor previously for decoder texture,
and incoming texture supports ID3D11VideoProcessor as well even if it has no
D3D11_BIND_DECODER flag (having D3D11_BIND_RENDER_TARGET for example),
allow zero-copying instead of using our fallback texture.
Frequent conversion tool change (between ID3D11VideoProcessor and generic shader)
might result in inconsistent image quality.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2084>
... instead of QueryInterface-ing per elements. Note that
ID3D11VideoDevice and ID3D11VideoContext objects might not be available
if device doesn't support video interface.
So GstD3D11Device object will create those objects only when requested.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2079>
Direct3D11 objects are COM, and most COM C APIs are verbose
(C++ is a little better). So, by using C++ APIs, we can make code
shorter and more readable.
Moreover, "ComPtr" helper class (which is C++ only) can be
utilized, that is very helpful for avoiding error-prone COM refcounting
issue/leak.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2077>
Added helper function _update_passthrough() which will define and set
the pass-through mode of the filter, and it'll either reconfigure both
pads or it will just mark the src pad for renegotiation or nothing at
all.
There are cases where both pads have to be reconfigured (direction
changed, for example), other when just src pad has to (filters
updated) or none (changing to ready state).
The requirement of renegotiation depends on the need to enable/disable
its VA buffer pools.
This patch sets pass-through mode by default, so the buffer pools
aren't allocated if no filtering/direction operations are defined,
which is the correct behavior.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2074>
... instead of the largest we ever seen.
Note that d3d11h264dec element holds previously configured DPB size
for later decoder object re-open decision.
This is to fix below case:
1) Initial SPS, required DPB size is 6
- decoder object is opened with DPB size 6
- max_dpb_size is now 6
2) SPS update with resolution change, required DPB size is 1
- decoder object is re-opened with DPB size 1
- max_dpb_size should be updated to 1, but it didn't happen (BUG)
3) SPS update without resolution change, only required DPB size is updated to 6
- decoder object should be re-opened but didn't happen
because we didn't update max_dpb_size at 2).
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2056>
The new H.264 uAPI requires that all drivers support
scaling matrix only as an option, when a non-flat
scaling matrix is provided in the bitstream headers.
Take advantage of this and avoid passing the scaling
matrix if not needed.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1624>
Frame-based decoding mode doesn't require SLICE_PARAMS and
PRED_WEIGHTS controls.
Moreover, if the driver doesn't support these two controls, trying
to set them will fail. Fix this by only setting these on
slice-based decoding mode.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1624>
To convert decoded texture into other format, downstream would use
video processor instead of shader. In order for downstream to
be able to use video processor even if we copied decoded texture
into downstream pool, we should set this bind flag. Otherwise,
downstream would keep switching video processor and shader
to convert format which would result in inconsistent image quality.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2051>
The AV1 codec needs to support the film grain feature. When the film
grain feature is enabled, we need two surfaces as the output of the
decoded picture, one without film grain effect and the other one with
it. The first one acts as the reference and is needed for later pictures'
reconstruction, and the second one is the real display output.
So we need to attach another aux surface to the gst buffer/mem and make
that aux surface as the target of vaBeginPicture.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1636>
AS-IS:
D3D11Convert class is baseclass of D3D11ColorConvert and D3D11Scale
* GstD3D11Convert
|_ GstD3D11ColorConvert
|_ GstD3D11Scale
TO-BE:
Introducing a new base class for color conversion and/or rescale elements
* GstD3D11BaseConvert
|_ GstD3D11Convert
|_ GstD3D11ColorConvert
|_ GstD3D11Scale
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2029>
In 6ae24948 the pipeline buffer destroy were removing assuming it
wasn't required. Nonetheless, debugging the code it looks like a
buffer leak in iHD driver since the ID of the buffer kept increasing.
The difference now is that first the filter buffers are destroy first
and later the pipeline buffer.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2023>
Just like the decoder, the vapostproc also needs to copy the output
buffer to raw buffer if downstream elements only supports raw caps
and does not support the video meta.
The pipeline like:
gst-launch-1.0 filesrc location=xxxx ! h264parse ! vah264dec ! \
vapostproc ! capsfilter caps=video/x-raw,width=55,height=128 ! \
filesink location=xxx
needs this logic to dump the data correctly.
fixes: #1523
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2026>
Both wasapi2 and wasapi plugins use WASAPI API. So "device.api=wasapi"
would make sense for the wasapi2 plugin as well. But people would be
confused by the identical "device.api=wasapi" property if intended
plugin is wasapi, not wasapi2. This change will make them distinguishable
by using "device.api" device property.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2024>
One problem that va dmabuf allocator had is when preparing a buffer from
dmabuf memories in the allocator pool, specially when a buffer is composed by
several memories. This memories have to be by certain number and in certain
order.
This patch stores the number of memories and their address in order when a
dmabuf-based buffer is created and when preparing a buffer, it is reconstructed
with this info.
Finally, instead of pushing the memories as soon as they are unrefed, they are
hold until GstVaBufferSurface's ref_mems_count reaches zero (all the memories
related with that buffer/surface are unrefed). Until that happen, all the
memories are pushed back into the queue, locked, assuring that all the memories
related with a single buffer (with the same surface) remain contiguous, so the
buffer reconstruction is assured.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2013>
Instead of removing memories from buffers at reset_buffer()/release_buffer() the
bufferpool operation is kept as originally designed, still the allocator pool is
used too. Thus, this patch restores the buffer size configuration while removing
release_buffer(), reset_buffer() and acquire_buffer() vmethods overloads.
Then, when the bufferpool base class decides to discard a buffer, the VA
surface-based memory is returned to the allocator pool when its last reference
is freed, and later reused if a new buffer is allocated again.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2013>
Add a new element d3d11deinterlace to support deinterlacing.
Similar to d3d11videosink and d3d11compositor, this element is
a wrapper bin of set of child elements including helpful
conversion elements (upload/download and color convert)
to make this element configurable between non-d3d11 elements.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2016>
Since our decoder DPB texture pool cannot be grown once it's
configured, we should pre-allocate sufficient number of textures
for zero-copy playback (but not too many).
The "min buffers" allocation query parameter can be a hint for
the number of required textures in addition to DPB size.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2017>
There were two problems with frame copy:
1. The input video info are from the format color, not form the allocated VA
surface, it's needed to update the sink video info according with the
allocator's data.
2. The parameters of `gst_video_frame_copy()` were backwards.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2007>
transform_size() basetransform vmethod is used when there's no output buffer
pool and allocates a system memory buffer. With VA this cannot be allowed, since
it needs VASurfaces to process.
Thus transform_size() is not required, but to play safe let's return FALSE.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2007>
Unlike other stateless decoder implementations (e.g., VA),
our DPB pool cannot be grown since we are using
texture array (pre-allocated, fixed-size d3d11 texture pool).
So, if there's no more available texture to use,
there's no way other than copying it to downstream's
d3d11 buffer pool. Otherwise deadlock will happen.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2003>
This adds a non-thread safe refcount to the GstV4l2Request. This will
allow holding on more then one request in order to implement render
delay. This is made non-thread safe for speed as we know this will all
happen on the same streaming thread.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1881>
Starting from this patch, all queue and dequeue operation happening
on V4L2 is now abstracted with the request. Buffers are dequeued
automatically when pending requests are marked done and only 1 in-flight
request is now used.
Along with fixing issues with request not being reused with slice
decoders, this change reduces the memory footprint by allocating only
two bitstream buffers.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1881>
* Don't warn for live object, since ID3D11Debug itself seems to be
holding refcount of ID3D11Device at the moment we called
ID3D11Debug::ReportLiveDeviceObjects(). It would report live object
always
* Device might not be able to support some formats (e.g., P010)
especially in case of WARP device. We don't need to warn about that.
* gst_d3d11_device_new() can be used for device enumeration. Don't warn
even if we cannot create D3D11 device with given adapter index therefore.
* Don't warn for HLSL compiler warning. It's just noise and
should not be critical thing at all
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1986>
Add a way to support drawing on application's texture instead of
usual window handle.
To make use of this new feature, application should follow below step.
1) Enable this feature by using "draw-on-shared-texture" property
2) Watch "begin-draw" signal
3) On "begin-draw" signal handler, application can request drawing
by using "draw" signal action. Note that "draw" signal action
should be happen before "begin-draw" signal handler is returned
NOTE 1) For texture sharing, creating a texture with
D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX flag is strongly recommend
if possible because we cannot ensure sync a texture
which was created with D3D11_RESOURCE_MISC_SHARED
and it would cause glitch with ID3D11VideoProcessor use case.
NOTE 2) Direct9Ex doesn't support texture sharing which was
created with D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX. In other words,
D3D11_RESOURCE_MISC_SHARED is the only option for Direct3D11/Direct9Ex interop.
NOTE 3) Because of missing synchronization around ID3D11VideoProcessor,
If shared texture was created with D3D11_RESOURCE_MISC_SHARED,
d3d11videosink might use fallback texture to convert DXVA texture
to normal Direct3D texture. Then converted texture will be
copied to user-provided shared texture.
* Why not use generic appsink approach?
In order for application to be able to store video data
which was produced by GStreamer in application's own texture,
there would be two possible approaches,
one is copying our texture into application's own texture,
and the other is drawing on application's own texture directly.
The former (appsink way) cannot be a zero-copy by nature.
In order to support zero-copy processing, we need to draw on
application's own texture directly.
For example, assume that application wants RGBA texture.
Then we can imagine following case.
"d3d11h264dec ! d3d11convert ! video/x-raw(memory:D3D11Memory),format=RGBA ! appsink"
^
|_ allocate new Direct3D texture for RGBA format
In above case, d3d11convert will allocate new texture(s) for RGBA format
and then application will copy again the our RGBA texutre into
application's own texture. One texture allocation plus per frame GPU copy will hanppen
in that case therefore.
Moreover, in order for application to be able to access
our texture, we need to allocate texture with additional flags for
application's Direct3D11 device to be able to read texture data.
That would be another implementation burden on our side
But with this MR, we can configure pipeline in this way
"d3d11h264dec ! d3d11videosink".
In that way, we can save at least one texture allocation and
per frame texutre copy since d3d11videosink will convert incoming texture
into application's texture format directly without copy.
* What if we expose texture without conversion and application does
conversion by itself?
As mentioned above, for application to be able to access our texture
from application's Direct3D11 device, we need to allocate texture
in a special form. But in some case, that might not be possible.
Also, if a texture belongs to decoder DPB, exposing such texture
to application is unsafe and usual Direct3D11 shader cannot handle
such texture. To convert format, ID3D11VideoProcessor API needs to
be used but that would be a implementation burden for application.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1873>
gstd3d11window_corewindow.cpp(408): warning C4189:
'storage': local variable is initialized but not referenced
gstd3d11window_corewindow.cpp(490): warning C4189:
'self': local variable is initialized but not referenced
gstd3d11window_swapchainpanel.cpp(481): warning C4189:
'self': local variable is initialized but not referenced
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1962>