Observed Intel GPU driver crash when multiple decoders are
configured in a process. It might be because of frequent
command queue alloc/free or too many in-flight decoding commands.
In order to make command queue persistent and limit the number of
in-flight command lists, holds global decoding command queue.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/7019>
This was already being used in handle_frame() for errors that happen when queueing a frame for decoding,
let's do the same when a frame is flagged with an error in the output callback.
From quick testing, this makes seeking more reliable (previously, it would sometimes cause a decoding error
and shut the whole decoder down due to GST_FLOW_ERROR).
Also manually sets the max error count to actually stop processing if too many errors occur.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6446>
ReferenceMissingErr is not critical and the simplest solution is to just ignore it. The frame has
the FrameDropped flag set when it occurs, so we can just drop it as usual.
BadDataErr is also not immediately critical, but in its case let's set the ERROR flag,
so the output loop can use GST_VIDEO_DECODER_ERROR to count and error out if it happens too many times.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6446>
Musl does not implement GNU basename and have fixed a bug where the
prototype was leaked into string.h [1], which resullts in compile errors
with GCC-14 and Clang-17+
| sys/uvcgadget/configfs.c:262:21: error: call to undeclared function 'basename'
ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
| 262 | const char *v = basename (globbuf.gl_pathv[i]);
| | ^
Use glib function instead makes it portable across musl and glibc on
linux
[1] https://git.musl-libc.org/cgit/musl/commit/?id=725e17ed6dff4d0cd22487bb64470881e86a92e7a
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/7006>
It can be seen as a WA in the case of multi-channel transcoding (like
decoder output to two channels, one for encoder and one for vpp).
Normally, encoder sets min pts of a huge value to avoid negative dts,
while vpp set pts without this addtional huge value, which are likely to
cause input surface pts does not fit with encoder (since both encoder
and vpp accept the same buffer from decoder, means they modify the timestamp
of one mfx surface). So we add this huge value to vpp to ensure enc and
vpp set the same value to input mfx surface meanwhile does not break
encoder's setting min pts for dts protection.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6971>
Adding a property to control error reporting behavior when output
window is closed in playing or paused state. This can be useful
for apps where an app wants to close window even if it's playing
a stream, and the closed window is expected.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6939>
drm_fourcc.h should be picked up via the pkgconfig include, not the
system includedir directly.
Also consolidate the libdrm usage in va and msdk.
All this allows it to be picked up consistently (via the subproject,
for example).
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6932>
This patch fixes this critical warning when registering MSDK:
_dma_fmt_to_dma_drm_fmts: assertion 'fmt != GST_VIDEO_FORMAT_UNKNOWN' failed
It was because the HEVC string with possible output formats has an extra space
that could not be parsed correctly.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6853>
Adds a separate vtenc_h265a element (with a _hw variant as usual) for the HEVCWithAlpha codec type.
Decided to go with a separate element to not break existing uses of the normal HEVC encoder.
The preserve_alpha property is still only used for ProRes, no need for it here because we explicitly say we want alpha
when using the new element.
For now, the HEVCWithAlpha has an issue where it does not throttle the amount of input frames queued internally.
I added a quick workaround where encode_frame() will block until enqueue_frame() callback notifies it that some space
has been freed up in the internal queue. The limit was set to 5, which should be enough I guess? Hopefully this is not
too prone to race conditions.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6664>
Conceptually identical to the present signal of d3d11videosink.
This signal will be emitted with current render target
(i.e., swapchain backbuffer) and command queue. Signal handler
can record GPU commands for an overlay image or to blend
an image to the render target.
In addition to d3d12 resources, videosink will send
d3d11 and d2d resources depending on "overlay-mode"
property, so that signal handler can render by using
preferred/required DirectX API.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6838>
The idea of using separate command queue per videosink was that
swapchain is bound to a command queue and we need to flush the
command queue when window size is changed. But the separate
queue does not seem to improve performance a lot.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6838>
The prime_fds for multi planes may be the same. For example, on Intel's
platform, the NV12 surface may have the same FD for the plane0 and the
plane1. Then, the DRM_IOCTL_GEM_CLOSE will close the same handle twice
and get an "Invalid argument 22" error the second time.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6914>
The encoder can specify the a preferred_output_delay value to get better throughput
performance. The higher delay may get better HW performance, but it may increases
the encoder and pipeline latency.
When the output queue length is smaller than preferred_output_delay, the encoder
will not block to waiting for the encoding output. It will continue to prepare and
send more commands to GPU, which may improve the encoder throughput performance.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/4359>
The FORCE_KEYFRAME frame which has GST_VIDEO_CODEC_FRAME_FLAG_FORCE_KEYFRAME
bit set should be the sync point. So we should let it be an IDR frame to begin
a new GOP, rather than just promote it to an I frame.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6619>
The FORCE_KEYFRAME frame which has GST_VIDEO_CODEC_FRAME_FLAG_FORCE_KEYFRAME
bit set should be the sync point. So we should let it be an IDR frame to begin
a new GOP, rather than just promote it to an I frame.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6619>
Set the P and B frame qp to I frame value to avoid generating delta
QP between different frame types. For ICQ and QVBR modes, we can
only set the qpi value, so the qpp and qpb values should be set to
the same value as the qpi.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6841>
Address below message reported by SDK debug layer.
ID3D12Device::CheckFeatureSupport: Unsupported Decode Profile Specified.
Use ID3D12VideoDevice::CheckFeatureSupport with D3D12_FEATURE_VIDEO_DECODE_PROFILES
to retrieve a list of supported profiles
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6839>
Certain V4L2 fourccs don't (yet) have DRM counter parts, in which case
we can't create DMA_DRM caps for them. This is usually the case for
specific tilings, which are represented as modifiers for DMA formats.
While using these tilings is generally preferable - because of e.g.
lower memory usage - it can result in additional conversion steps when
interacting with DMA based APIs such as GL, Vulkan or KMS. In such cases
using a DMA compatible format usually ends up being the better option.
Before the addition of DMA_DRM caps, this was what playbin3 ended up
requesting in various cases - e.g. prefering NV12 over NV12_4L4 - but
the addition of DMA_DRM caps seems to confuse the selection logic.
As a simple and quite robust solution, assume that peers supporting
DMA_DRM caps always prefer these and reorder the caps accordingly.
In the future we plan to have a translation layer for cases where
there is a matching fourcc+modifier pair for a V4L2 fourcc, ensuring
optimal results.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6645>
1. The PTS of all frames should not be changed.
2. Just update the DTS based on the PTS. For the frame which is not
reordered, the DTS is equal to PTS. For frame which is reordered,
the DTS is equal to previous DTS. For example:
Input: F0[D0, P0] -- F1[D1, P1] -- F2[D2, P2] -- F3[D3, P3]
Output: F0[I, D0, P0] -- F3[P, D0, P3] -- F1[B, D1, P1] -- F2[B, D2, P2]
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6703>
1. The PTS of all frames should not be changed.
2. Just update the DTS based on the PTS. For the frame which is not
reordered, the DTS is equal to PTS. For frame which is reordered,
the DTS is equal to previous DTS. For example:
Input: F0[D0, P0] -- F1[D1, P1] -- F2[D2, P2] -- F3[D3, P3]
Output: F0[I, D0, P0] -- F3[P, D0, P3] -- F1[B, D1, P1] -- F2[B, D2, P2]
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6703>
Fixing deadlock in below case
* GC lock is taken by background thread, and the background thread calls
gst_d3d12_ipc_client_release_imported_data() which takes ipc lock
* ipc lock is already taken in ipc thread and trying to pushing GC data
via gst_d3d12_command_queue_set_notify()
* gst_d3d12_command_queue_set_notify() is trying to take GC lock
but it's already taken by background thread
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6749>
Ideally, GPU waiting should be scheduled just before executing command list.
But handling the case outside of converter is a bit complicated.
Under an assumption that constructed command list will be executed
immediately, schedules GPU-side waiting inside of conversion method
to simplify the flow.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6749>
A DPB buffer held by codec picture object may not be writable
at the moment, then gst_buffer_make_writable() will unref passed buffer.
Specifically, the use after free or double free can happen if:
* Crop meta of buffer copy is required because of non-zero
top-left crop position
* zero-copy is possible with crop meta
* A picture was duplicated, interlaced h264 stream for example
Interlaced h264 stream with non-zero top-left crop position
is not very common but it's possible configuration in theory.
Thus gst_buffer_make_writable() should be called with
GstVideoCodecFrame.output_buffer directly.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6706>
A DPB buffer held by codec picture object may not be writable
at the moment, then gst_buffer_make_writable() will unref passed buffer.
Specifically, the use after free or double free can happen if:
* Crop meta of buffer copy is required because of non-zero
top-left crop position
* zero-copy is possible with crop meta
* A picture was duplicated, interlaced h264 stream for example
Interlaced h264 stream with non-zero top-left crop position
is not very common but it's possible configuration in theory.
Thus gst_buffer_make_writable() should be called with
GstVideoCodecFrame.output_buffer directly.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6706>
The caps obtained from parsing the allocation query is borrowed and
should not be unreffed. This fixes criticals assertion introduced in
1.24.1.
(gst-launch-1.0:242): GStreamer-CRITICAL **: 19:48:02.667:
gst_mini_object_unref: assertion 'GST_MINI_OBJECT_REFCOUNT_VALUE (mini_object) > 0' failed
Fixes: 5189e8b956 ("v4l2codecs: decoders: Add DMA_DRM caps support")
Closes#3462
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6679>
GstD3D12Device objetct's internal resources are singletons per adapter
already though, the object itself is not a singleton.
Due to the singleton design (unlike other APIs such as d3d11),
d3d12 device context sharing is not a strict requirement
for zero-copy, but handles context ones to make things less noisy.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6513>
Adding RGBA, RGBx, BGRA, BGRx, VUYA and RGB10A2_LE format support for performance.
However, these formats are not still recommended if upstream can support
native YUV formats (e.g., NV12, P010) since NVENC does not expose
conversion related optiones. Note that VUYA format is 4:4:4 YUV format
already but NVENC runtime will convert it to 4:2:0 format internally
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6417>
If parameters remain similar enough to avoid either encoder reopening
or downstream renegotiation, avoid it.
This is going to be useful for dynamic parameters setting.
To check if the stream parameters changed, so the internal encoder has
to be closed and opened again, are required two steps:
1. If input caps, format, profile, chroma or rate control mode have changed.
2. If any of the calculated variables and element properties have changed.
Later on, only if the output caps also changed, the pipeline
is renegotiated.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6441>
If parameters remain similar enough to avoid either encoder reopening
or downstream renegotiation, avoid it.
This is going to be useful for dynamic parameters setting.
To check if the stream parameters changed, so the internal encoder has
to be closed and opened again, are required two steps:
1. If input caps, format, profile, chroma or rate control mode have changed.
2. If any of the calculated variables and element properties have changed.
Later on, only if the output caps also changed, the pipeline
is renegotiated.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6441>
The last frame which has the smallest diff should be consider as
the first choice rather than the golden frame. Especially when only
one reference available, this way can improve the BD rate about 5
percentage.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6379>
Some extreme case such as "videotestsrc pattern=1" can generate pure
white noise videoes, for which encoder may generate too big output
for current coded buffer size. We now consider the qindex and bitrate
to avoid that.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6483>
It might happen that the key event arrives when the d3d11videosink
is stopping. In case of GstD3D11WindowWin32 it can raise a
navigation event even when the sink is already freed, because the
window object's refcount may reach 0 in the window thread. In
other words sometimes the GstD3D11WindowWin32 lives few ms more
then the GstD3D11VideoSink, because it's freed asynchronously.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6476>
In the case of multi-channels transcoding, a context with child
sesseion can be parent for others, so we need to check if the
msdkcontext has any child session in the list to avoid session
leaks. Otherwise, we will see the failure of closing a parent
session because one of its child's child session not released.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6259>
Calling gst_pad_peer_query_caps() without a filter can give us EMPTY caps, whereas all the code below
assumes that's not the case. Replacing query+intersect with a filtered query ensures we always get a subset
of the template caps back.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6429>
In an early non-linked scenario, this was causing a ton of criticals about the queue array,
because the output callback would still fire for leftover frames that were still being processed by VT
at the time the output loop stopped. This makes sure they're flushed correctly as well.
Also renames gst_vtdec_loop to gst_vtdec_output_loop for consistency with related functions.
wip
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6397>
Sometimes a call to negotiate (and thus drain) can happen from the output loop
(via finish_frame()), which will tell VT to output all internal frames, but that won't succeed
if we happen to decide to wait for the queue to empty (because the loop is waiting for draining to finish and
will not make space in the queue!). This commit adds an override for the queue size limit if we're draining/flushing.
This bug could happen for any formats, but was especially obvious for ProRes, which has dpb_size of 0.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6397>
Because ID3D12Device objects are singletons per adapter,
GstD3D12Device was following the API design, that is, keep track
of global GstD3D12Device objects and reuses it.
That means ID3D12Device object can be released at the time
when GstD3D12Device is destroyed.
But exetrnal APIs such as NVENC does not seem to be happy
with the released ID3D12Device, that could be a driver bug though.
Let's hold already opened ID3D12Device permanently without releasing
it for now.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6395>
In order to simplify caps negotiations for clients and, notably, be more
compatible with va* decoders.
Crucially this allows clients to know ahead of time whether buffers will
actually be DMABufs.
Similar to GstVaBaseDec we only announce system memory caps if the peer
has ANY caps. Further more, and again like va decoders, we fail in
`decide_allocation()` if DMA_DRM caps are used without VideoMeta.
Apart from buggy peers this can happen e.g. when a peer with ANY caps
is used in combination with caps filters.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5890>
Most importantly rely on video info helpers instead of manual parsing
of caps, which will allow us to use additional helpers in the future.
While on it, tighen the check for supported formats - failing that
indicates a bug in caps negotiation - and make some style changes.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5890>
This ensures we don't create filter caps that are not supported by the
individual codec implementations, as well as that the resulting caps
have the required fields so they can be turned into a GstVideoFormat.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5890>
Do not chain up to parent's GstBufferPool::start() which will do
preallocation. We don't want it to be preallocated
since there are various cases where negotiated downstream buffer pool is
not used at all (e.g., zero-copy decoding, IPC elements).
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6326>
This fixes a crash in `gst_va_h264_enc_class_init` and `gst_va_h265_enc_class_init`
(and probably also in gst_va_av1_enc_class_init) when calling
`g_object_class_install_properties (object_class, n_props, properties);`
When rate_control_type is 0, the following code is executed in :
```
} else {
n_props--;
properties[PROP_RATE_CONTROL] = NULL;
}
```
n_props has initially a value of N_PROPERTIES but PROP_RATE_CONTROL
is not the last element in the array, so it's making
g_object_class_install_properties fail to iterate over the
properties array.
This applies the same fix to gstvah264enc.c, gstvah265enc.c and
gstvaav1enc.c.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6319>
osxaudio has a few helper methods potentially useful in atdec (or future atenc), like GStreamer -> CoreAudio
channel mapping. Doesn't make sense to duplicate them in applemedia, and atdec is the only audio-oriented
element there anyway.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6223>
Provide a clock from the source that is a monotonic system clock with
the rate corrected based on the measured and ideal capture rate of the
frames.
If this clock is selected as pipeline clock, then provide perfect
timestamps to downstream.
Otherwise, if the pipeline clock is the monotonic system clock, use the
internal clock for converting back to the monotonic system clock.
Otherwise, use the monotonic system clock time calculated in the above
case and convert that to the pipeline clock.
In all cases this will give a smoother time than the previous code,
which simply took the difference between the driver provided capture
time and the current real-time clock time, and applied that to the
current pipeline clock time.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6208>
Otherwise there's a small window between querying the state and doing
the transfer in which a frame could be dropped, and we would then output
the frame right after the dropped one as if it was the dropped frame.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6208>
In low delay B mode, the P frame is converted as B frame with forward
references. For example, One P frame may refers to P-1, P-2 and P-3 in
list0 and refers to P-3, P-2 and P-1 in list1.
So the num in list0 and list1 does not reflect the forward_num and
backward_num. The vaapi does not provide ref num for forward or backward
so far. In this case, we just consider the backward_num to be 1 conservatively.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6249>