Add util functions for runtime CUDA kernel source compilation
using NVRTC library. Like other nvcodec dependent libraries,
NVRTC library will be loaded via g_module_open.
Note that the NVRTC library naming is not g_module_open friendly
on Windows.
(i.e., nvrtc64_{CUDA major version}{CUDA minor version}.dll).
So users can specify the dll name using GST_NVCODEC_NVRTC_LIBNAME
environment.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1633>
Introducing CUDA buffer pool with generic CUDA memory support.
Likewise GL memory, any elements which are able to access CUDA device
memory directly can map this CUDA memory without upload/download
overhead via the "GST_MAP_CUDA" map flag.
Also usual GstMemory map/unmap is also possible with internal staging memory.
For staging, CUDA Host allocated memory is used (see CuMemAllocHost API).
The memory is allowing system access but has lower overhead
during GPU upload/download than normal system memory.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1633>
All subclasses are retrieving list to get target output frame, which
can be done by baseclass. And pass the ownership of the GstH264Picture
to subclass so that subclass can clear implementation dependent resources
before finishing the frame.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1449>
Introduce GST_USE_NV_STATELESS_CODEC environment to allow user to select
primary nvidia decoder implementation. In case the environment
GST_USE_NV_STATELESS_CODEC=h264 was set, old nvidia h264 decoder element
will not be registered. Instead, both nvh264dec and nvh264sldec
factory name will create gstcodecs based new nvidia h264 decoder element.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1198>
Introduce GstH264Decoder based Nvidia H.264 decoder element.
Similar the element factory name of to v4l2 stateless codec,
this element can be configured with factory name "gstnvh264sldec".
Note that "sl" in the name stands for "stateless"
For now, existing nvh264dec covers more profile and formats
(e.g., interlaced stream) than this implementation.
However, this implementation allows us to control lower level
parameters such as decoded picture buffer management and therefore
we can get a chance to improve performance in terms of latency.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1198>
Too many decode surface would waste GPU memory. Also it seems to be
introducing additional latency depending on stream. Since nvcodec
sdk version 9.0, CUVID parser API has been providing the minimum
required number of surface. By using it, we can save GPU memory
and reduce possible latency.
The class data with the caps in it will be leaked if the element is
registered but never instantiated. There is no way around this. Mark
the caps as such so that the leaks tracer does not warn about it.
This is the same as pad template caps getting leaked, which are also
marked as may-be-leaked. These objects are initialized exactly once,
and are 'global' data.
We've been using NvEncodeAPICreateInstance method to find the supported API
version, but that seems to be insufficient since there is a case
where plugin failed in opening encoding session even if NvEncodeAPICreateInstance
succeeded. Asking driver about the version would be the most certain way.
Setting the CUVID_PKT_DISCONTINUITY implies clearing any past information
about the stream in the decoder. The GStreamer discont flag is used for
discontinuity caused by a seek, for first buffer and if a buffer was
dropped. In the first two cases, the parsers and demuxers should ensure we
start from a synchronization point, so it's unlikely that delta will be
matched against the wrong state.
For packet lost, the discontinuity flag will prevent the decoder from doing
any concealment, with a result that ca be much worst visually, or freeze the
playback until an IDR is met. It's better to let the decoder handle that for
us.
Removing this flag, also workaround a but in NVidia parser that makes it
ignore our ENDOFFRAME flag and increase the latency by one frame.
This sets the CUVID_PKT_ENDOFPICTURE flag in order to inform the decoder that
we have a complete picture. This should remove one frame latency otherwise
introduce by NVidia parser.
We weren't using the correct calling convention when calling CUDA and
CUVID APIs. `CUDAAPI` is `__stdcall` on Windows. This was working fine
on x64 because `__stdcall` is ignored and there's no special calling
convention. However, on x86, we need to use `__stdcall`.
Hard-coded 16x16 resolution is likely to differ from the device's support
in most cases. If we can use NV_ENC_CAPS_WIDTH_MIN and NV_ENC_CAPS_HEIGHT_MIN,
update pad template with returned value.