Adding cudaipc{src,sink} element for CUDA IPC support.
Implementation note:
* For the communication between end points, Win32 named-pipe
and unix domain socket will be used on Windows and Linux respectively.
* cudaipcsink behaves as a server, and all GPU resources will be owned by
the server process and exported for other processes, then cudaipcsrc
(client) will import each exported handle.
* User can select IPC mode via "ipc-mode" property of cudaipcsink.
There are two IPC mode, one is "legacy" which uses legacy CUDA IPC
method and the other is "mmap" which uses CUDA virtual memory API
with OS's resource handle sharing method such as DuplicateHandle()
on Windows. The "mmap" mode might be better than "legacy" in terms
of stability since it relies on OS's resource management but
it would consume more GPU memory than "legacy" mode.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/4510>
Decoder bounded CUDA memory is allocated by driver and the pool size
is fixed. Since we don't know how many buffers would be held by
downstream non-CUDA element, we should download such CUDA memory
and release it back to decoder.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/4810>
Instead of creating new decoder instance per new sequence,
re-use configured decoder instance via cuvidReconfigureDecoder()
API. It will make output surface reusable without re-allocation.
Also, in order for application to be able to reserve higher resolution
output surface, "init-max-width" and "init-max-height" properties are
added to each decoder.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/3884>
Call input resource map functions (i.e., nvEncRegisterResource,
nvEncUnregisterResource, nvEncMapInputResource, and
nvEncUnmapInputResource) only once and reuse the mapped resources,
instead of per input frame map/unmap
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/3884>
Wrap mapped decoder output surface using GstCudaMemory and
output without any copy operation. Also, for application to be able to
control the number of zero-copyable output surfaces,
"num-output-surfaces" property is added.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/3884>
It is really difficult for people to figure out why nvcodec has
0 features. Even the debug log is cryptic. Also make sure the errors
go to the ERROR log level, which is more likely to be enabled by
default.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/3776>
NVDEC launches CUDA kernel function (ConvertNV12BLtoNV12 or so)
when CuvidMapVideoFrame() is called. Which seems to be
NVDEC's internal post-processing kernel function, maybe
to convert tiled YUV to linear YUV format or something similar.
A problem if we don't pass CUDA stream to the CuvidMapVideoFrame()
call is that the NVDEC's internel kernel function will use default CUDA stream.
Then lots of the other CUDA API calls will be blocked/serialized.
To avoid the unnecessary blocking, we should pass our own
CUDA stream object to the CuvidMapVideoFrame() call
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/3605>
Rewriting GstCudaConverter object, since the old implementation was not
well organized and it's hard to add new features.
Moreover, the conversion operations were not very optimized.
Major change of this implementation:
* Remove redundant intermediate conversion operations such as
any RGB -> ARGB(64) conversion or any YUV -> Y444 (or 16bits Y444).
That's not required most of cases. The only required case is
converting 24bits (such as RGB/BGR) packed format to 32bits format
because CUDA texture object does not support sampling 24bits format
* Use normalized sample fetching (i.e., [0, 1] range float value)
and also normalized coordinates system for CUDA texture.
It's consistent with the other graphics APIs such as Direct3D
and OpenGL, that makes sampling operations much easier.
* Support a kind of viewport and adopt math for colorspace conversion
from GstD3D11 implementation
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/3389>