Add util functions for runtime CUDA kernel source compilation
using NVRTC library. Like other nvcodec dependent libraries,
NVRTC library will be loaded via g_module_open.
Note that the NVRTC library naming is not g_module_open friendly
on Windows.
(i.e., nvrtc64_{CUDA major version}{CUDA minor version}.dll).
So users can specify the dll name using GST_NVCODEC_NVRTC_LIBNAME
environment.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1633>
Introducing CUDA buffer pool with generic CUDA memory support.
Likewise GL memory, any elements which are able to access CUDA device
memory directly can map this CUDA memory without upload/download
overhead via the "GST_MAP_CUDA" map flag.
Also usual GstMemory map/unmap is also possible with internal staging memory.
For staging, CUDA Host allocated memory is used (see CuMemAllocHost API).
The memory is allowing system access but has lower overhead
during GPU upload/download than normal system memory.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1633>
Introduce GstH264Decoder based Nvidia H.264 decoder element.
Similar the element factory name of to v4l2 stateless codec,
this element can be configured with factory name "gstnvh264sldec".
Note that "sl" in the name stands for "stateless"
For now, existing nvh264dec covers more profile and formats
(e.g., interlaced stream) than this implementation.
However, this implementation allows us to control lower level
parameters such as decoded picture buffer management and therefore
we can get a chance to improve performance in terms of latency.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1198>
New object and helper functions can remove duplicated code
from nvenc/nvdec. Also this is prework for CUDA device context sharing
among nvdec(s)/nvenc(s).
By adding system memory support for nvdec, both en/decoder
in the nvcodec plugin are able to be usable regardless of
OpenGL dependency. Besides, the direct use of system memory
might have less overhead than OpenGL memory depending on use cases.
(e.g., transcoding using S/W encoder)
... and add our stub cuda header.
Newly introduced stub cuda.h file is defining minimal types in order to
build nvcodec plugin without system installed CUDA toolkit dependency.
This will make cross-compile possible.
... and put them into new nvcodec plugin.
* nvcodec plugin
Now each nvenc and nvdec element is moved to be a part of nvcodec plugin
for better interoperability.
Additionally, cuda runtime API header dependencies
(i.e., cuda_runtime_api.h and cuda_gl_interop.h) are removed.
Note that cuda runtime APIs have prefix "cuda". Since 1.16 release with
Windows support, only "cuda.h" and "cudaGL.h" dependent symbols have
been used except for some defined types. However, those types could be
replaced with other types which were defined by "cuda.h".
* dynamic library loading
CUDA library will be opened with g_module_open() instead of build-time linking.
On Windows, nvcuda.dll is installed to system path by CUDA Toolkit
installer, and on *nix, user should ensure that libcuda.so.1 can be
loadable (i.e., via LD_LIBRARY_PATH or default dlopen path)
Therefore, NVIDIA_VIDEO_CODEC_SDK_PATH env build time dependency for Windows
is removed.