Default CUDA memory allocation causes implicit global
synchronization. Stream-ordered allocation avoids this,
since memory allocation and free operations are asynchronous
and executed in the context of the associated CUDA stream.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/7427>
This will be used for CUDA stream sharing.
* Add GstCudaPoolAllocator object. The pool allocator
controls synchronization of the memory objects it allocates.
* Modify the gst_cuda_allocator_alloc() API so that the caller can specify
a GstCudaStream object for the newly allocated memory.
* Add GST_CUDA_MEMORY_TRANSFER_NEED_SYNC flag in addition to the
existing GST_CUDA_MEMORY_TRANSFER_NEED_{UPLOAD,DOWNLOAD} flags.
The flag indicates that GPU commands queued in the CUDA stream
may not have finished yet, and that the caller should take care of
synchronization.
The flag is managed by the GstCudaMemory object if the memory holds a
GstCudaStream. (Otherwise, GstCudaMemory synchronizes as it did
before this commit.) Specifically, the GstCudaMemory object sets
the new flag automatically when the memory is mapped with
(GST_MAP_CUDA | GST_MAP_WRITE) flags. The caller needs to unset
the flag via GST_MEMORY_FLAG_UNSET() if client code has already
performed the synchronization.
* Add gst_cuda_memory_sync() helper function to perform synchronization.
* Why not use a CUevent object to keep track of synchronization status?
CUDA already provides a fence-like interface via the CUevent object,
but the cuEventRecord/cuEventQuery APIs are not zero-cost operations.
Instead, in this version, the status is tracked using map flags and
object flags.
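
The flag handling described above can be sketched as a small, CUDA-free
state machine. This is only an illustration and not the GStreamer
implementation: SketchMemory, the SKETCH_* constants and the sketch_*()
functions are hypothetical stand-ins, the stream synchronization is replaced
by a counter, and the stream-less path is assumed to synchronize at unmap
(the text above only says it behaves as before this commit).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in GStreamer. */
enum {
  SKETCH_MAP_WRITE = 1 << 0,      /* plays the role of GST_MAP_WRITE */
  SKETCH_MAP_CUDA = 1 << 1        /* plays the role of GST_MAP_CUDA */
};

enum {
  /* plays the role of GST_CUDA_MEMORY_TRANSFER_NEED_SYNC */
  SKETCH_FLAG_NEED_SYNC = 1 << 0
};

typedef struct {
  bool has_stream;        /* true if the memory holds a stream object */
  unsigned flags;         /* object flags */
  unsigned sync_count;    /* number of simulated stream synchronizations */
} SketchMemory;

/* True if the mapping requests GPU access for writing. */
static bool
sketch_is_gpu_write (unsigned map_flags)
{
  const unsigned gpu_write = SKETCH_MAP_CUDA | SKETCH_MAP_WRITE;

  return (map_flags & gpu_write) == gpu_write;
}

/* Map: with a stream, a GPU write mapping only marks the memory. */
static void
sketch_memory_map (SketchMemory * mem, unsigned map_flags)
{
  assert (mem != NULL);
  if (mem->has_stream && sketch_is_gpu_write (map_flags))
    mem->flags |= SKETCH_FLAG_NEED_SYNC;
}

/* Unmap: without a stream, synchronize immediately.
 * Assumption: the exact point of the legacy synchronization is a guess. */
static void
sketch_memory_unmap (SketchMemory * mem, unsigned map_flags)
{
  assert (mem != NULL);
  if (!mem->has_stream && sketch_is_gpu_write (map_flags))
    mem->sync_count++;
}

/* Helper: synchronize only if work may be pending, then clear the flag. */
static void
sketch_memory_sync (SketchMemory * mem)
{
  assert (mem != NULL);
  if (mem->flags & SKETCH_FLAG_NEED_SYNC) {
    mem->sync_count++;    /* stands in for a stream synchronization */
    mem->flags &= ~(unsigned) SKETCH_FLAG_NEED_SYNC;
  }
}
```

In this sketch, mapping a stream-backed memory with
(SKETCH_MAP_CUDA | SKETCH_MAP_WRITE) costs nothing until
sketch_memory_sync() is called, and a second call is a no-op because the
flag is already cleared. A caller that has synchronized the stream itself
would clear the flag directly instead of calling the helper.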
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/3629>