GstD3D12Device objetct's internal resources are singletons per adapter
already though, the object itself is not a singleton.
Due to the singleton design (unlike other APIs such as d3d11),
d3d12 device context sharing is not a strict requirement
for zero-copy, but handles context ones to make things less noisy.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6513>
Adding RGBA, RGBx, BGRA, BGRx, VUYA and RGB10A2_LE format support for performance.
However, these formats are not still recommended if upstream can support
native YUV formats (e.g., NV12, P010) since NVENC does not expose
conversion related optiones. Note that VUYA format is 4:4:4 YUV format
already but NVENC runtime will convert it to 4:2:0 format internally
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6417>
If parameters remain similar enough to avoid either encoder reopening
or downstream renegotiation, avoid it.
This is going to be useful for dynamic parameters setting.
To check if the stream parameters changed, so the internal encoder has
to be closed and opened again, are required two steps:
1. If input caps, format, profile, chroma or rate control mode have changed.
2. If any of the calculated variables and element properties have changed.
Later on, only if the output caps also changed, the pipeline
is renegotiated.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6441>
If parameters remain similar enough to avoid either encoder reopening
or downstream renegotiation, avoid it.
This is going to be useful for dynamic parameters setting.
To check if the stream parameters changed, so the internal encoder has
to be closed and opened again, are required two steps:
1. If input caps, format, profile, chroma or rate control mode have changed.
2. If any of the calculated variables and element properties have changed.
Later on, only if the output caps also changed, the pipeline
is renegotiated.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6441>
The last frame which has the smallest diff should be consider as
the first choice rather than the golden frame. Especially when only
one reference available, this way can improve the BD rate about 5
percentage.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6379>
Some extreme case such as "videotestsrc pattern=1" can generate pure
white noise videoes, for which encoder may generate too big output
for current coded buffer size. We now consider the qindex and bitrate
to avoid that.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6483>
It might happen that the key event arrives when the d3d11videosink
is stopping. In case of GstD3D11WindowWin32 it can raise a
navigation event even when the sink is already freed, because the
window object's refcount may reach 0 in the window thread. In
other words sometimes the GstD3D11WindowWin32 lives few ms more
then the GstD3D11VideoSink, because it's freed asynchronously.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6476>
In the case of multi-channels transcoding, a context with child
sesseion can be parent for others, so we need to check if the
msdkcontext has any child session in the list to avoid session
leaks. Otherwise, we will see the failure of closing a parent
session because one of its child's child session not released.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6259>
Calling gst_pad_peer_query_caps() without a filter can give us EMPTY caps, whereas all the code below
assumes that's not the case. Replacing query+intersect with a filtered query ensures we always get a subset
of the template caps back.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6429>
In an early non-linked scenario, this was causing a ton of criticals about the queue array,
because the output callback would still fire for leftover frames that were still being processed by VT
at the time the output loop stopped. This makes sure they're flushed correctly as well.
Also renames gst_vtdec_loop to gst_vtdec_output_loop for consistency with related functions.
wip
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6397>
Sometimes a call to negotiate (and thus drain) can happen from the output loop
(via finish_frame()), which will tell VT to output all internal frames, but that won't succeed
if we happen to decide to wait for the queue to empty (because the loop is waiting for draining to finish and
will not make space in the queue!). This commit adds an override for the queue size limit if we're draining/flushing.
This bug could happen for any formats, but was especially obvious for ProRes, which has dpb_size of 0.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6397>
Because ID3D12Device objects are singletons per adapter,
GstD3D12Device was following the API design, that is, keep track
of global GstD3D12Device objects and reuses it.
That means ID3D12Device object can be released at the time
when GstD3D12Device is destroyed.
But exetrnal APIs such as NVENC does not seem to be happy
with the released ID3D12Device, that could be a driver bug though.
Let's hold already opened ID3D12Device permanently without releasing
it for now.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6395>
In order to simplify caps negotiations for clients and, notably, be more
compatible with va* decoders.
Crucially this allows clients to know ahead of time whether buffers will
actually be DMABufs.
Similar to GstVaBaseDec we only announce system memory caps if the peer
has ANY caps. Further more, and again like va decoders, we fail in
`decide_allocation()` if DMA_DRM caps are used without VideoMeta.
Apart from buggy peers this can happen e.g. when a peer with ANY caps
is used in combination with caps filters.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5890>
Most importantly rely on video info helpers instead of manual parsing
of caps, which will allow us to use additional helpers in the future.
While on it, tighen the check for supported formats - failing that
indicates a bug in caps negotiation - and make some style changes.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5890>
This ensures we don't create filter caps that are not supported by the
individual codec implementations, as well as that the resulting caps
have the required fields so they can be turned into a GstVideoFormat.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5890>
Do not chain up to parent's GstBufferPool::start() which will do
preallocation. We don't want it to be preallocated
since there are various cases where negotiated downstream buffer pool is
not used at all (e.g., zero-copy decoding, IPC elements).
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6326>
This fixes a crash in `gst_va_h264_enc_class_init` and `gst_va_h265_enc_class_init`
(and probably also in gst_va_av1_enc_class_init) when calling
`g_object_class_install_properties (object_class, n_props, properties);`
When rate_control_type is 0, the following code is executed in :
```
} else {
n_props--;
properties[PROP_RATE_CONTROL] = NULL;
}
```
n_props has initially a value of N_PROPERTIES but PROP_RATE_CONTROL
is not the last element in the array, so it's making
g_object_class_install_properties fail to iterate over the
properties array.
This applies the same fix to gstvah264enc.c, gstvah265enc.c and
gstvaav1enc.c.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6319>
osxaudio has a few helper methods potentially useful in atdec (or future atenc), like GStreamer -> CoreAudio
channel mapping. Doesn't make sense to duplicate them in applemedia, and atdec is the only audio-oriented
element there anyway.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6223>
Provide a clock from the source that is a monotonic system clock with
the rate corrected based on the measured and ideal capture rate of the
frames.
If this clock is selected as pipeline clock, then provide perfect
timestamps to downstream.
Otherwise, if the pipeline clock is the monotonic system clock, use the
internal clock for converting back to the monotonic system clock.
Otherwise, use the monotonic system clock time calculated in the above
case and convert that to the pipeline clock.
In all cases this will give a smoother time than the previous code,
which simply took the difference between the driver provided capture
time and the current real-time clock time, and applied that to the
current pipeline clock time.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6208>
Otherwise there's a small window between querying the state and doing
the transfer in which a frame could be dropped, and we would then output
the frame right after the dropped one as if it was the dropped frame.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6208>
In low delay B mode, the P frame is converted as B frame with forward
references. For example, One P frame may refers to P-1, P-2 and P-3 in
list0 and refers to P-3, P-2 and P-1 in list1.
So the num in list0 and list1 does not reflect the forward_num and
backward_num. The vaapi does not provide ref num for forward or backward
so far. In this case, we just consider the backward_num to be 1 conservatively.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6249>
In b_pyramid mode, B frames can be ref and prevPicOrderCntLsb can
be the B frame POC which is smaller than the P frame. This can cause
POC diff bigger than MaxPicOrderCntLsb/2 and generate wrong POC value.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6249>
Gets being released memory back to queue even if allocator is flushing
in order to count the number of outstanding memory objects.
Also, clear queue if there's no outstanding memory object and
allocator is flushing
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6240>
Unprepare method posts WM_GST_D3D11_DESTROY_INTERNAL_WINDOW
command to the window queue, and from that moment considers
internal_hwnd to be released, and so it sets it to null.
The problem is that it's possible that right at that moment
the window thread might be already processing some other
command, or just another command might be already in the queue.
On practice we met a crash when WM_PAINT got processed in between
(unprepare already finished and WM_GST_D3D11_DESTROY_INTERNAL_WINDOW
was not handled yet)
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6187>
When the conversion is only caps feature from memory:VAMemory to system memory,
it's possible to optimize by doing a pseudo pass-through since the va-backed
buffers are the same for system memory buffers.
This change will also mitigates #2940
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6174>
If the allocation query received from downstream doesn't handle GstVideoMeta but
it requests memory:DMABuf caps feature, it's incomplete, so we rather reject the
negotiation.
Both in base decoder, base transform and compositor.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6155>
This is a simplification of the venerable
gst_va_base_dec_get_preferred_format_and_caps_features() function, which
predates since gstreamer-vaapi. It's used to select the format and the
capsfeature to use when setting the output state. It was complex and hard to
follow. This refactor simplifies a lot the algorithm.
The first thing to remove _downstream_has_video_meta() since, most of the time
it will be called before the caps negotiation, and allocation queries make sense
only after caps negotiation. It might work during renegotiation but, in that
case, caps feature change is uncommon. Better a simple and common approach.
Also, for performance, instead of dealing with caps features as strings, GQuarks
are used.
The refactor works like this:
1. If peer pad returns any caps, the returned caps feature is system memory and
looks for a proper format in the allowed caps.
2. The allowed caps are traversed at most 3 times: one per each valid caps
feature. First VAMemory, later DMABuf, and last system memory. The first to
match in allowed caps is picked, and the first format matching with the
chroma is picked too.
Notice that, right now, using playbin videoconvert never return any.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6154>
This reverts questionable commit 009bc15f33
which looks completely wrong.
The GstWasapi2RingBuffer:buffer_size variable is used to
calculate available buffer size we can write
(i.e., available size = buffer_size - padding_size).
But the commit makes the size to be exactly same as buffer period.
Then, it can confuse this element as if the endpoint buffer is full on
I/O event callback (if padding size is equal to buffer period)
but it's not true.
Fixes: https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/2870
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6132>
The global semaphore was never closed/unlinked, causing permission
denied issue if the device is later used by another user. Properly
removing the semaphore when stopping the pipeline would still leave it
open in case of a crash.
With a GStreamer specific name, it was also not preventing other apps to access
the device concurrently.
Finally, if the system has multiple cards, the lock should be per card
and not global (to be confirmed).
Fixes: #3283.
Sponsored-by: Netflix Inc.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6117>
According to recommendation from MS, IDXGIOutputDuplication::ReleaseFrame()
needs to be called just before IDXGIOutputDuplication::AcquireNextFrame()
for performance reasons, so that driver can accumulate dirty rects
and update texture at once. But it seems to cause choppy output.
Do release acquired frame immediately once processing done,
like d3d11 implementation does.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6092>
Fence data could hold GstD3D12Device directly or indirectly.
Then if it's holding last refcount, the device object will
be released from the device object's internal thread,
and will try join self thread.
Delegates it to other global background thread to avoid
self thread joining.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6042>
The output of VP9 and AV1 encoder is a little different from the H264
and H265 encoder, it may contain repeat frames and so the output frame
number may be more than the input. We need to call finish_subframe()
when some frame will be repeated later. So we need to extend the
current prepare_output() virtual function.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/3015>
Explicitly calls gst_vtenc_pause_output_loop when going PAUSED->READY to make sure GST_PAD_STREAM_LOCK is not taken.
Before this change, a deadlock would occur if pipeline got stopped right after one output buffer was generated by vtenc.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5933>
Modify the fix_output_format in vpp to directly generate caps with
negotiated src caps, and we have the correct dma caps negotiation in
fix_output_format function. And thus, we can remove the redundant
negotiation of using function pad_accept_memory in vpp.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5845>
The pool currently defaults to performing a layout transition to
VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, with some special exceptions for
video usages. This may not be a legal transition depending on the usage.
Provide an API to explicitly control the initial image layout.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5881>
When implementing NDK media support, it would be useful to also have JNI
implementation in the same binary as NDK media compatibility is lower.
As such, implement a rudimentary vtable system for gstamc-codec and
gstamc-format, and allow choosing the implementation at static_init()
time.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/4115>
This allows the implementations to do custom logic behind the hood. For
example, when NDK implementation is added, the entrypoint can chooses to
statically initialize the NDK implementations or the JNI one.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/4115>
With this patch, the caps is registered in the order of memory features
as: VAMemory, DMABuf then raw caps in linux path, and D3D11Memory then
raw caps in windows path. It helps to prioritize the video memory for all
msdk elements when doing negotiation.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5898>
Fixing below debug layer report
ID3D12Device::CreateCommittedResource: Ignoring InitialState D3D12_RESOURCE_STATE_COPY_DEST.
Buffers are effectively created in state D3D12_RESOURCE_STATE_COMMON.
Buffer resource will be automatically promoted to D3D12_RESOURCE_STATE_COPY_DEST
at the very first COPY operation time.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5895>
Since DXGI desktop duplication API does not work with Direct3D12 device,
this element will use Direct3D11 device to acquire frame.
Then other rendering operations (e.g., texture copy, render pipeline) will
happen using Direct3D12 API
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5883>
In case of tier 1 decoder, always use reference-only picture to avoid
fixed-size pool limitation and output decoded picture without
copy even for negative rate. Also do not use copy queue for GPU to GPU
copy. Copy queue is specialized for upload/download and may occupy
PCIE bandwidth. Use direct queue as recommended by vendors.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5877>
On flush event, baseclass will discard all pictures from DPB
but there can be still in-flight commands not finished yet.
Use our command queue, allocator and fence data helper objects
to keep resource available during command execution.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5877>
Shader visible descriptors occupy GPU resource and there are hardware
limits. Thus, in order to minimize the amount of shader visible heaps,
only non shader visible descriptor heap (staging) will be held by d3d12memory.
Then converter will copy the staging descriptor to shader visible
descriptor heap per draw.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5875>
Zero initialization would have overhead and it's not required
most cases except for textures. Use CREATE_NOT_ZEROED flag
in case of buffer resource or if a texture will be rendered without any
prior read operation.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5875>
Conversion will happen when constructed command list is executed,
not by converter element. Thus this object should not map output buffer
with write flag which will result in error if multiple threads
are building commands for the same output target frame.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5875>
Add macro which converts picture frame number to suitable timestamp in
nanoseconds for use in V4L2 VB2 buffer lookup. Since multiple codecs do
the same operation and almost all got it wrong, do it in one place so it
can be fixed in one place again, if needed.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5791>
The GstMpeg2Picture system_frame_number is guint32, constant 1000 is guint32,
GstV4l2CodecMpeg2Dec *_ref_ts multiplication result is u64 .
```
u64 result = (u32)((u32)system_frame_number * (u32)1000);
```
behaves the same as
```
u64 result = (u32)(((u32)system_frame_number * (u32)1000) & 0xffffffff);
```
so in case `system_frame_number > 4294967295 / 1000`, the `result` will
wrap around. Since the `result` is really used as a cookie used to look
up V4L2 buffers related to the currently decoded frame, this wraparound
leads to visible corruption during MPEG2 decoding. At 30 FPS this occurs
after cca. 40 hours of playback .
Fix this by changing the 1000 from u32 to u64, i.e.:
```
u64 result = (u64)((u32)system_frame_number * (u64)1000ULL);
```
this way, the wraparound is prevented and the correct cookie is used.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5791>
The GstAV1Picture system_frame_number is guint32, constant 1000 is guint32,
GstV4l2CodecAV1Dec v4l2_av1_frame.*_frame_ts multiplication result is u64 .
```
u64 result = (u32)((u32)system_frame_number * (u32)1000);
```
behaves the same as
```
u64 result = (u32)(((u32)system_frame_number * (u32)1000) & 0xffffffff);
```
so in case `system_frame_number > 4294967295 / 1000`, the `result` will
wrap around. Since the `result` is really used as a cookie used to look
up V4L2 buffers related to the currently decoded frame, this wraparound
leads to visible corruption during AV1 decoding. At 30 FPS this occurs
after cca. 40 hours of playback .
Fix this by changing the 1000 from u32 to u64, i.e.:
```
u64 result = (u64)((u32)system_frame_number * (u64)1000ULL);
```
this way, the wraparound is prevented and the correct cookie is used.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5791>
The GstVp9Picture system_frame_number is guint32, constant 1000 is guint32,
GstV4l2CodecVp9Dec v4l2_vp9_frame.*_frame_ts multiplication result is u64 .
```
u64 result = (u32)((u32)system_frame_number * (u32)1000);
```
behaves the same as
```
u64 result = (u32)(((u32)system_frame_number * (u32)1000) & 0xffffffff);
```
so in case `system_frame_number > 4294967295 / 1000`, the `result` will
wrap around. Since the `result` is really used as a cookie used to look
up V4L2 buffers related to the currently decoded frame, this wraparound
leads to visible corruption during VP9 decoding. At 30 FPS this occurs
after cca. 40 hours of playback .
Fix this by changing the 1000 from u32 to u64, i.e.:
```
u64 result = (u64)((u32)system_frame_number * (u64)1000ULL);
```
this way, the wraparound is prevented and the correct cookie is used.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5791>
The GstVp8Picture system_frame_number is guint32, constant 1000 is guint32,
GstV4l2CodecVp8Dec v4l2_vp8_frame.*_frame_ts multiplication result is u64 .
```
u64 result = (u32)((u32)system_frame_number * (u32)1000);
```
behaves the same as
```
u64 result = (u32)(((u32)system_frame_number * (u32)1000) & 0xffffffff);
```
so in case `system_frame_number > 4294967295 / 1000`, the `result` will
wrap around. Since the `result` is really used as a cookie used to look
up V4L2 buffers related to the currently decoded frame, this wraparound
leads to visible corruption during VP8 decoding. At 30 FPS this occurs
after cca. 40 hours of playback .
Fix this by changing the 1000 from u32 to u64, i.e.:
```
u64 result = (u64)((u32)system_frame_number * (u64)1000ULL);
```
this way, the wraparound is prevented and the correct cookie is used.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5791>
The input of the vacompositor may be DMA buffers. And in this case, the input
caps has the format=DMA_DRM, which can not be recognized by base video
aggregator class' find_best_format() function. So we need to override the
update_caps() virtual function.
Also we consider the DMA kind caps in negotiated_src_caps() for output.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5160>
To achieve maximum throughput, waiting on command commit thread
is not ideal. And render-delay will introduce unwanted latency.
Best is to split thread and wait finished decoding job in a dedicated
output thread
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5812>
When creating a new VA pool set config size to zero because it's not used.
Also, given the potential different sizes from software buffer pools and VA
buffer pools, this patch handle that potential different values.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5805>
When creating a new VA pool set config size to zero because it's not used.
Also, given the potential different sizes from software buffer pools and VA
buffer pools, this patch handle that potential different values.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5805>
When creating a new VA pool set config size to zero because it's not used.
Also, given the potential different sizes from software buffer pools and VA
buffer pools, this patch handle that potential different values.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5805>
VA drivers allocate surfaces given their properties, so there's no need to
provide a buffer size to the VA pool.
Though, the buffer size is provided by the driver, or the canonical size
is used for single planed surfaces.
This patch removes the need to provide a size for the function
gst_va_pool_new_with_config() and adds a helper method to retrieve the surface
size, gst_va_pool_get_buffer_size(). Also change the callers accordingly.
Changes for custom VA pool creation will be addressed in the following commits.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5805>