Because the size of texture array cannot be updated dynamically,
allocator should block the allocation request. This cannot be
done at buffer pool side if this d3d11 memory is shared among
multiple buffer objects. Note that setting NO_SHARE flag to
d3d11 memory is very inefficient. It would cause most likey
copy of the d3d11 texture.
...for color space conversion if available
ID3D11VideoProcessor is equivalent to DXVA-HD video processor
which might use specialized blocks for video processing
instead of general GPU resource. In addition to that feature,
we need to use this API for color space conversion of DXVA2 decoder
output memory, because any d3d11 texture arrays that were
created with D3D11_BIND_DECODER cannot be used for shader resource.
This is prework for d3d11decoder zero-copy rendering and also
for conditional HDR tone-map support.
Note that some Intel platform is known to support tone-mapping
at the driver level using this API on Windows 10.
We've been using NvEncodeAPICreateInstance method to find the supported API
version, but that seems to be insufficient since there is a case
where plugin failed in opening encoding session even if NvEncodeAPICreateInstance
succeeded. Asking driver about the version would be the most certain way.
User is seeing corrupted display when running `videotestsrc !
video/x-raw,format=NV12,width=xxx,height=xxx ! msdkh265enc ! msdkh265dec
! glimagesink` with changed frame size, e.g. from 1920x1080 to 1920x240
The root cause is a same dmabuf fd is used for frames with
different size, which causes some unexpected result. This patch requires
cached response is used for frames with same size only for DMABuf, so a
dmabuf fd can't be used for frames with different size any more.
Don't specify the resolution of backbuffer. Then dxgi will let us know the
actual client area. When upstream resolution is chagned, updating the size
of backbuffer without the consideration for client size would cause mismatch
between them.
Setting the CUVID_PKT_DISCONTINUITY implies clearing any past information
about the stream in the decoder. The GStreamer discont flag is used for
discontinuity caused by a seek, for first buffer and if a buffer was
dropped. In the first two cases, the parsers and demuxers should ensure we
start from a synchronization point, so it's unlikely that delta will be
matched against the wrong state.
For packet lost, the discontinuity flag will prevent the decoder from doing
any concealment, with a result that ca be much worst visually, or freeze the
playback until an IDR is met. It's better to let the decoder handle that for
us.
Removing this flag, also workaround a but in NVidia parser that makes it
ignore our ENDOFFRAME flag and increase the latency by one frame.
This sets the CUVID_PKT_ENDOFPICTURE flag in order to inform the decoder that
we have a complete picture. This should remove one frame latency otherwise
introduce by NVidia parser.
This patch fixed compiler warning below:
[1/4] Compiling C object 'sys/msdk/dc44ea0@@gstmsdk@sha/gstmsdkvpp.c.o'.
../../gst-plugins-bad/sys/msdk/gstmsdkvpp.c: In function
‘gst_msdkvpp_context_prepare’:
../../gst-plugins-bad/sys/msdk/gstmsdkvpp.c:214:7: warning: suggest
parentheses around operand of ‘!’ or change ‘&’ to ‘&&’ or ‘!’ to ‘~’
[-Wparentheses]
Our context is non-persistent, and we propagate it throughout the
pipeline. This means that if we try to reuse any gstmsdk element by
removing it from the pipeline and then re-adding it, we'll clone the
mfxSession and create a new gstmsdk context as a child of the old one
inside `gst_msdk_context_new_with_parent()`.
Normally this only allocates a few KB inside the driver, but on
Windows it seems to allocate tens of MBs which leads to linearly
increasing memory usage for each PLAYING->NULL->PLAYING state cycle
for the process. The contexts will only be freed when the pipeline
itself goes to `NULL`, which would defeat the purpose of dynamic
pipelines.
Essentially, we need to optimize the case in which the element is
removed from the pipeline and re-added and the same context is re-set
on it. To detect that case, we set the context on `old_context`, and
compare it to the new one when preparing the context. If they're the
same, we don't need to do anything.
Fixes https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/946
Split it out into a separate function with early exits to make the
flow clearer, and document what the function is doing clearly.
No functional changes.
We weren't using the correct calling convention when calling CUDA and
CUVID APIs. `CUDAAPI` is `__stdcall` on Windows. This was working fine
on x64 because `__stdcall` is ignored and there's no special calling
convention. However, on x86, we need to use `__stdcall`.
Before this change decoder used the oldest frame in the list to pair it
with the decoded surface. This only works when there's a perfect stream
like HEADERS,SYNCPOINT,DELTA...
When playing RTSP streams we can get imperfect streams like HEADERS,
DELTA,SYNCPOINT,DELTA... In this case decoder drops the frames
between HEADERS and SYNCPOINT which leads into using wrong PTS on
the output frames.
With this change we inject the input PTS into the bitstream and use it
to align the internal frame list with the actually decoded position.
Fixes playback with:
```
gst-launch-1.0 rtspsrc location=... latency=0 drop-on-latency=1 ! ...
```
Hard-coded 16x16 resolution is likely to differ from the device's support
in most cases. If we can use NV_ENC_CAPS_WIDTH_MIN and NV_ENC_CAPS_HEIGHT_MIN,
update pad template with returned value.
need_reconfig is added to allow sub class requires a reconfig when
the input frame or the MetaData (e.g. GstVideoRegionOfInterestMeta)
attached to the input frame is changed.
`pipe()` isn't used since 15927b6511,
and `socketpair()` from `#include <sys/socket.h>` is used only in the
examples. In practice, you can use probably also use anything that
allows you to create fd pairs, such as named pipes or anonymous pipes.
We use the cross-platform GstPollFD API in the plugin.
Use consistent memory layout between dxva and other shader use case.
For example, use DXGI_FORMAT_NV12 texture format instead of
two textures with DXGI_FORMAT_R8_UNORM and DXGI_FORMAT_R8G8_UNORM.
This reverts commit ddd13fc7c0
Dynamic usage can reduce the number of copy per frame but make
things complicated and the benefit seems to not significant.
Also since we don't provide _map() method for the dynamic usage,
application cannot read buffers which make "last-sample" property
unusable in case of d3d11videosink.
Commit a1584b6 caused big performance drop if the downstream element
is not a msdk element because it is very slow to read data from video
memory directly.
This reverts commit a1584b6f99.
If 8 bit are required by the device/mode then it will be converted internally
by the SDK, but the SDK won't automatically convert from 8 to 10 bit. As
such, always use 10 bit VANC.
Some devices require configuring also a 10 bit video format when using
10 bit VANC is required but those would fail regardless and the
application would have to configure the correct video format.
With newer versions of the SDK this information can be retrieved via the
BMDDeckLinkVANCRequires10BitYUVVideoFrames flag but we don't use a new
enough SDK version yet to extract this information.