Rewriting GstCudaConverter object, since the old implementation was not
well organized and it's hard to add new features.
Moreover, the conversion operations were not very optimized.
Major change of this implementation:
* Remove redundant intermediate conversion operations such as
any RGB -> ARGB(64) conversion or any YUV -> Y444 (or 16bits Y444).
That's not required most of cases. The only required case is
converting 24bits (such as RGB/BGR) packed format to 32bits format
because CUDA texture object does not support sampling 24bits format
* Use normalized sample fetching (i.e., [0, 1] range float value)
and also normalized coordinates system for CUDA texture.
It's consistent with the other graphics APIs such as Direct3D
and OpenGL, that makes sampling operations much easier.
* Support a kind of viewport and adopt math for colorspace conversion
from GstD3D11 implementation
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/3389>
GstCudaConverter object can do colorspace conversion and scale at once.
Adding new element "cudaconvertscale" to do that, this can
save unnecessary GPU operation if colorspace conversion and
rescale is required for given input stream format.
Most of codes are taken from d3d11convert element
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/3389>
Adding new encoder elements nvd3d11{h264,h265}enc for Direct3D11
input support and re-written nvcuda{h264,h265}enc elements.
Newly writeen elements have some differences compared with old
nv{h264,h265}enc including non-backward compatible changes.
* RGBA is not a supported input format any more:
New elements will support only YUV formats to avoid implicit conversion
done by hardware. Ideally it should be done by upstream element
in order to have more control on it. Moreover, RGBA support can cause
redundant RGBA -> YUV conversion if multiple encoders are
used for the same RGBA input
* Subsampled planar format support is dropped:
I420 and YV12 format are not supported formats for Direct3D11.
Although it's supported in CUDA mode, it's not a hardware friendly
memory layout and it will waste GPU memory since UV planes
will have large padding due to the memory layout requirement of NVENC.
* GL support is dropped: Similar to the RGBA case,
GL support in encoder would be suboptimal if GL input is
used by multiple encoders, because each encoder will copy GL memory
into CUDA memory.
Upstream cudaupload element can be used for GL <-> CUDA
interop instead.
* No more pre-allocation of encoder input surfaces. New implementation
will use input CUDA memory without copy (zero-copy) or
will copy into a NVENC's input buffer struct in case of
system memory input.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/1997>