Rewriting GstCudaConverter object, since the old implementation was not
well organized and it's hard to add new features.
Moreover, the conversion operations were not very optimized.
Major change of this implementation:
* Remove redundant intermediate conversion operations such as
any RGB -> ARGB(64) conversion or any YUV -> Y444 (or 16bits Y444).
That's not required most of cases. The only required case is
converting 24bits (such as RGB/BGR) packed format to 32bits format
because CUDA texture object does not support sampling 24bits format
* Use normalized sample fetching (i.e., [0, 1] range float value)
and also normalized coordinates system for CUDA texture.
It's consistent with the other graphics APIs such as Direct3D
and OpenGL, that makes sampling operations much easier.
* Support a kind of viewport and adopt math for colorspace conversion
from GstD3D11 implementation
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/3389>
GstCudaConverter object can do colorspace conversion and scale at once.
Adding new element "cudaconvertscale" to do that, this can
save unnecessary GPU operation if colorspace conversion and
rescale is required for given input stream format.
Most of codes are taken from d3d11convert element
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/3389>