User is seeing corrupted display when running `videotestsrc !
video/x-raw,format=NV12,width=xxx,height=xxx ! msdkh265enc ! msdkh265dec
! glimagesink` with changed frame size, e.g. from 1920x1080 to 1920x240
The root cause is a same dmabuf fd is used for frames with
different size, which causes some unexpected result. This patch requires
cached response is used for frames with same size only for DMABuf, so a
dmabuf fd can't be used for frames with different size any more.
Don't specify the resolution of backbuffer. Then dxgi will let us know the
actual client area. When upstream resolution is chagned, updating the size
of backbuffer without the consideration for client size would cause mismatch
between them.
Setting the CUVID_PKT_DISCONTINUITY implies clearing any past information
about the stream in the decoder. The GStreamer discont flag is used for
discontinuity caused by a seek, for first buffer and if a buffer was
dropped. In the first two cases, the parsers and demuxers should ensure we
start from a synchronization point, so it's unlikely that delta will be
matched against the wrong state.
For packet lost, the discontinuity flag will prevent the decoder from doing
any concealment, with a result that ca be much worst visually, or freeze the
playback until an IDR is met. It's better to let the decoder handle that for
us.
Removing this flag, also workaround a but in NVidia parser that makes it
ignore our ENDOFFRAME flag and increase the latency by one frame.
This sets the CUVID_PKT_ENDOFPICTURE flag in order to inform the decoder that
we have a complete picture. This should remove one frame latency otherwise
introduce by NVidia parser.
This patch fixed compiler warning below:
[1/4] Compiling C object 'sys/msdk/dc44ea0@@gstmsdk@sha/gstmsdkvpp.c.o'.
../../gst-plugins-bad/sys/msdk/gstmsdkvpp.c: In function
‘gst_msdkvpp_context_prepare’:
../../gst-plugins-bad/sys/msdk/gstmsdkvpp.c:214:7: warning: suggest
parentheses around operand of ‘!’ or change ‘&’ to ‘&&’ or ‘!’ to ‘~’
[-Wparentheses]
Our context is non-persistent, and we propagate it throughout the
pipeline. This means that if we try to reuse any gstmsdk element by
removing it from the pipeline and then re-adding it, we'll clone the
mfxSession and create a new gstmsdk context as a child of the old one
inside `gst_msdk_context_new_with_parent()`.
Normally this only allocates a few KB inside the driver, but on
Windows it seems to allocate tens of MBs which leads to linearly
increasing memory usage for each PLAYING->NULL->PLAYING state cycle
for the process. The contexts will only be freed when the pipeline
itself goes to `NULL`, which would defeat the purpose of dynamic
pipelines.
Essentially, we need to optimize the case in which the element is
removed from the pipeline and re-added and the same context is re-set
on it. To detect that case, we set the context on `old_context`, and
compare it to the new one when preparing the context. If they're the
same, we don't need to do anything.
Fixes https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/946
Split it out into a separate function with early exits to make the
flow clearer, and document what the function is doing clearly.
No functional changes.
We weren't using the correct calling convention when calling CUDA and
CUVID APIs. `CUDAAPI` is `__stdcall` on Windows. This was working fine
on x64 because `__stdcall` is ignored and there's no special calling
convention. However, on x86, we need to use `__stdcall`.
Before this change decoder used the oldest frame in the list to pair it
with the decoded surface. This only works when there's a perfect stream
like HEADERS,SYNCPOINT,DELTA...
When playing RTSP streams we can get imperfect streams like HEADERS,
DELTA,SYNCPOINT,DELTA... In this case decoder drops the frames
between HEADERS and SYNCPOINT which leads into using wrong PTS on
the output frames.
With this change we inject the input PTS into the bitstream and use it
to align the internal frame list with the actually decoded position.
Fixes playback with:
```
gst-launch-1.0 rtspsrc location=... latency=0 drop-on-latency=1 ! ...
```
Hard-coded 16x16 resolution is likely to differ from the device's support
in most cases. If we can use NV_ENC_CAPS_WIDTH_MIN and NV_ENC_CAPS_HEIGHT_MIN,
update pad template with returned value.
need_reconfig is added to allow sub class requires a reconfig when
the input frame or the MetaData (e.g. GstVideoRegionOfInterestMeta)
attached to the input frame is changed.
`pipe()` isn't used since 15927b6511,
and `socketpair()` from `#include <sys/socket.h>` is used only in the
examples. In practice, you can use probably also use anything that
allows you to create fd pairs, such as named pipes or anonymous pipes.
We use the cross-platform GstPollFD API in the plugin.
Use consistent memory layout between dxva and other shader use case.
For example, use DXGI_FORMAT_NV12 texture format instead of
two textures with DXGI_FORMAT_R8_UNORM and DXGI_FORMAT_R8G8_UNORM.
This reverts commit ddd13fc7c0
Dynamic usage can reduce the number of copy per frame but make
things complicated and the benefit seems to not significant.
Also since we don't provide _map() method for the dynamic usage,
application cannot read buffers which make "last-sample" property
unusable in case of d3d11videosink.
Commit a1584b6 caused big performance drop if the downstream element
is not a msdk element because it is very slow to read data from video
memory directly.
This reverts commit a1584b6f99.
If 8 bit are required by the device/mode then it will be converted internally
by the SDK, but the SDK won't automatically convert from 8 to 10 bit. As
such, always use 10 bit VANC.
Some devices require configuring also a 10 bit video format when using
10 bit VANC is required but those would fail regardless and the
application would have to configure the correct video format.
With newer versions of the SDK this information can be retrieved via the
BMDDeckLinkVANCRequires10BitYUVVideoFrames flag but we don't use a new
enough SDK version yet to extract this information.
Although the target platform of D3D11 decoding API are both desktop and UWP app,
DXVA header is blocked by "WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP)"
which is meaning that that's only for desktop app.
To workaround this inconsistent annoyingness, we need to define WINAPI_PARTITION_DESKTOP
regardless of target WinAPI partition.
The codec profile should be consistent with the frame fourcc code, this
fixes pipeline below:
gst-launch-1.0 videotestsrc ! \
video/x-raw,width=320,height=240,format=P010_10LE ! msdkvp9enc ! \
fakesink
The frame width and height is rounded up to 128 and 32 since commit
8daac1c, so the width, height for initialization should be rounded up to
128 and 32 too because the MSDK VP9 encoder will do some check on width
and height.
Sample pipeline:
gst-launch-1.0 videotestsrc ! \
video/x-raw,width=320,height=240,format=NV12 ! msdkvp9enc ! fakesink
Renegotiation was implemented for bitrate change. We can re-use
the same sequence when video info changes except that this can be
executed right away when receiving the new input format. I.e. no
need to wait for the next call to handle_frame.
The block that sets use_video_memory flag is after the
the condition `if gst_msdk_context_prepare` but it
always returns false when there is no other msdk elements.
So the decoder ends up with use_video_memory as FALSE.
Note that msdkvpp always set use_video_memory as TRUE.
When use_video_memory is FALSE then the msdkdec allocates
the output frames with posix_memalign (see gstmsdksystemmemory.c).
The result is then copied back to the GstVideoPool's buffers
(or to the downstream pool's buffers if any).
When use_video_memory is TRUE then the msdkdec uses vaCreateSurfaces
to create vaapi surfaces for the hw decoder to decode into
(see gstmsdkvideomemory.c). The result is then copied to either
the internal GstVideoPool and to the downstream pool if any.
(vaDeriveImage/vaMapBuffer is used in order to read the surfaces)
Use boolean instead of GstFlowReturn as declared.
Note that since base class does not check return value of GstVideoDecoder::flush(),
this would not cause any change of behavior.
https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/merge_requests/924
is trying to use video memory for decoding on Linux, which reveals a
hidden bug in msdkdec.
For video memory, it is possible that a locked mfx surface is not used
indeed and it will be un-locked later in MSDK, so we have to check the
associated MSDK surface to find out and free un-used surfaces, otherwise
it is easy to exhaust all pre-allocated mfx surfaces and get errors below:
0:00:00.777324879 27290 0x564b65a510a0 ERROR default
gstmsdkvideomemory.c:77:gst_msdk_video_allocator_get_surface: failed to
get surface available
0:00:00.777429079 27290 0x564b65a510a0 ERROR msdkbufferpool
gstmsdkbufferpool.c:260:gst_msdk_buffer_pool_alloc_buffer:<msdkbufferpool0>
failed to create new MSDK memory
Note the sample code in MSDK does similar thing in
CBuffering::SyncFrameSurfaces()
A ID3D11Texture2D memory can consist of multiple planes with array.
For array typed memory, GstD3D11Allocator will allocate new GstD3D11Memory
with increased reference count to the ID3D11Texture2D but different array index.
Even if one of downstream d3d11 elements can support dynamic-usage memory,
another one might not support it. Also, to support dynamic-usage,
both upstream and downstream d3d11device must be the same object.
If d3d11colorconvert element is configured, do color space conversion
regardless of the device type whether it's S/W emulation or real H/W.
Since d3d11colorconvert is no more a child of d3d11videosinkbin,
we don't need this behavior. Note that previous code was added to
avoid color space conversion from d3d11videosink if no hardware
device is available (S/W emulation of d3d11 is too slow).
d3d11upload should be able to support upstream d3d11 memory, not only system memory.
Fix for following pipeline
d3d11upload ! "video/x-raw(memory:D3D11Memory)" ! d3d11videosink
borderless top-most style full screen mode support.
Basically fullscreen toggle mode is disabled by default. To enable it
use "fullscreen-toggle-mode" property to allow fullscreen mode change
by user input and/or property.
In some cases, rendering and dxgi (e.g., swapchain) APIs should be
called from window message pump thread, but current design (dedicated d3d11 thread)
make it impossible. To solve it, change concurrency model to locking based one
from single-thread model.
In earlier implementation of d3d11videosink where no shader was implemented,
the aspect ratio and render size were adjusted by manipulating the backbuffer size
with unintuitive formula. Since now we do color conversion and resize using
shader, we can remove the hack.
window event queue now does not lock on the class lock, so we can now shut
it down without releasing the class lock, thus avoiding a potential race when
stopping the sink.
... and use SetParent() WIN32 API when external window is used.
Depending on DXGI swap effect, the external window might not be
reusable by another backend. To preserve the external window's property
and setting, drawing to internal window seems to be safer way.
Otherwise GstVideoDecoder is not finalized and
resources are leaked.
Somehow GST_TRACERS="leaks" GST_DEBUG="GST_TRACER:7" did not catch it.
Valgrind output:
==31645== 22,480 (1,400 direct, 21,080 indirect) bytes in 5 blocks are definitely lost in loss record 5,042 of 5,049
==31645== at 0x4C2FB0F: malloc
==31645== by 0x51D9E88: g_malloc
==31645== by 0x51FA7B5: g_slice_alloc
==31645== by 0x51FAC68: g_slice_alloc0
==31645== by 0x58D9984: g_type_create_instance
==31645== by 0x58BA344: g_object_new_with_properties
==31645== by 0x58BADA0: g_object_new
==31645== by 0x8ECA966: gst_video_decoder_init
==31645== by 0x58D99E7: g_type_create_instance
==31645== by 0x58BA344: g_object_new_with_properties
An issue can be seen when using msdkh265enc with bitrate change in
playing state. The root cause is the corresponding plugin is loaded
again.
Returning MFX_ERR_UNDEFINED_BEHAVIOR from MSDK just means the plugin has
been loaded, so we may ignore this error when doing configuation again
in the sub class, otherwise the pipeline will be interrupted
If d3d11window does not convert format internally, shader resource view
is not required. Note that shader resource view is used for
color conversion using shader but when conversion is not required,
we just copy input input texture to backbuffer.
In theory it should not happen but it happened to me
in some cases where it failed to allocate some video
buffers so this was a consequence of a corner case.
Better to be safe than sorry.
Can happen if the oldest frame is the current frame
and if gst_video_decoder_finish_frame failed in which
case the current is unref and then drop instead of
just drop.
This patch also removes some assumptions, it was strange
to call unref and finish_frame in gst_msdkdec_finish_task.
In principle when owning a frame, the code should either
unref, or drop or finish.
D3D11 dynamic texture is a special memory type, which is mainly used for
frequent CPU write access to the texture. For now, this texture type
does not support gst_memory_{map,unmap}
* Create staging texture only when the CPU access is requested.
Note that we should avoid the CPU access to d3d11 memory as mush as possible.
Incoming d3d11upload and d3d11download will take this GPU memory upload/download.
* Upload/Download texture memory from/to staging only if it needed, similar to
GstGL PBO implementation.
* Define more dxgi formats for future usage (e.g., color conversion, dxva2 decoder).
Because I420_* formats are not supported formats by dxgi, each plane should
be handled likewise GstGL separately, but NV12/P10 formats might be supported ones.
So we decide the number of d3d11memory per GstBuffer for video memory depending on
OS version and dxgi format. For instance, if NV12 is supported by OS,
only one d3d11memory with DXGI_FORMAT_NV12 texture can be allocated by this commit.
One use case of such texture is DXVA. In case DXVA decoder, it might need to produce decoded data
to one DXGI_FORMAT_NV12 instead of seperate Y and UV planes.
Such behavior will be controlled via configuration of GstD3D11BufferPool and
default configuration is separate resources per plane.
Depending on selected feature level, d3d11 API usage can be different.
Instead of querying the selected feature level by user whenever required,
store it once by d3d11device.
Do not accept any GstD3D11Device context which has different adapter
index from the required one. For example, if a d3d11 element is expecting
d3d11 device with adapter 1 (i.e., the second GPU), any d3d11 device
context having different adapter could not be shared with
the d3d11 element.
Make them consistent with cuda context utils functions.
Put in-only parameter before all in-out parameters, and add _handle()
suffix to native handle getter functions.
In certain cases, the sink's buffer pool will not call the parent's
release_buffer method, so the pool does not clean up properly
after the buffer is released.
Since macOS Mojave (10.14), video permissions have to be explicitly
granted by a user in order to open a video device such as a camera.
This commit adds a check for the current permission status, and tries
to request for permission if applicable.
The whole `src_read()` function is a hot loop since the ringbuffer
thread is waiting on us, and printing to the console from inside it
can easily cause us to miss our deadline.
F.ex., if you had GST_DEBUG=3 and we accidentally missed a device
period, we'd trigger the "reported glitch" warning, which would cause
us to miss another device period, and so on. Let's reduce the log
level so that GST_DEBUG=3 is more usable, and only print buffer flag
info when it's actually relevant.