Allows reducing the initial stack size of GPU threads. Cuda should
automatically increase this value if a kernel requires a larger stack.
Can save roughly 40MB of GPU memory for a single nvh264enc instance.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/8158>
1. Add some comments explaining what headers and libs are expected on
what systems
2. Only look in default incdirs if no incdir is specified
3. Require libnvbufsurface.so on Jetson when cuda-nvmm=enabled
4. Require libatomic on Jetson when cuda-nvmm=enabled
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/8021>
Adding prefer-stream-ordered-alloc property to GstCudaContext.
If stream ordered allocation buffer pool option is not configured
and this property is enabled, buffer pool will enable the stream
ordered allocation. Otherwise it will follow default behavior.
If GST_CUDA_ENABLE_STREAM_ORDERED_ALLOC env is set,
default behavior is enabling the stream ordered allocation.
Otherwise sync alloc/free method will be used.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/7427>
Default CUDA memory allocation will cause implicit global
synchronization. This stream ordered allocation can avoid it
since memory allocation and free operations are asynchronous
and executed in the associated cuda stream context
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/7427>
Do not chain up to parent's GstBufferPool::start() which will do
preallocation. We don't want it to be preallocated
since there are various cases where negotiated downstream buffer pool is
not used at all (e.g., zero-copy decoding, IPC elements).
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6326>
Adding GST_CUDA_CRITICAL_ERRORS env variable so that program can be
terminated on unrecoverable error.
Example)
GST_CUDA_CRITICAL_ERRORS=2,700 gst-launch-1.0 ...
In this example, CUDA_ERROR_OUT_OF_MEMORY(2) and
CUDA_ERROR_ILLEGAL_ADDRESS(700) are registered as critical error
and program will be aborted on those errors
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/4729>
Decoder bounded CUDA memory is allocated by driver and the pool size
is fixed. Since we don't know how many buffers would be held by
downstream non-CUDA element, we should download such CUDA memory
and release it back to decoder.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/4810>
gstcudaloader.cpp defines GST_DEBUG_CATEGORY (gst_cudaloader_debug);
but it wasn't initializing it anywhere.
This caused the following error to be logged by gst-plugin-scanner when
libcuda.so.1/nvcuda.dll couldn't be loaded, e.g. in systems without
CUDA:
(gst-plugin-scanner:39618): GStreamer-CRITICAL **: 14:40:22.346:
gst_debug_log_full_valist: assertion 'category != NULL' failed
This patch fixes the bug by initializing the category in
gst_cuda_load_library_once_func() before any logging occurs.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/4154>
This will be used for CUDA stream sharing.
* Adding GstCudaPoolAllocator object. The pool allocator will
control synchronization of allocated memory objects.
* Modify gst_cuda_allocator_alloc() API so that caller can specify/set
GstCudaStream object for the newly allocated memory.
* GST_CUDA_MEMORY_TRANSFER_NEED_SYNC flag is added in addition to
existing GST_CUDA_MEMORY_TRANSFER_NEED_{UPLOAD,DOWNLOAD}.
The flag indicates that any GPU command queued in the CUDA stream
may not be finished yet, and caller should take care of the
synchronization.
The flag is controlled by GstCudaMemory object if the memory holds
GstCudaStream. (Otherwise, GstCudaMemory will do synchronization
as before this commit). Specifically, GstCudaMemory object will set
the new flag automatically when memory is mapped with
(GST_MAP_CUDA | GST_MAP_WRITE) flags. Caller will need to unset
the flag via GST_MEMORY_FLAG_UNSET() if it's already synchronized
by client code.
* gst_cuda_memory_sync() helper function is added to perform synchronization
* Why not use CUevent object to keep track of synchronization status?
CUDA provides fence-like interface already via CUevent object,
but cuEventRecord/cuEventQuery APIs are not zero-cost operations.
Instead, in this version, the status is tracked by using map and
object flags.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/3629>