New rate-control modes are introduced (if device can support)
* cbr-ld-hr: CBR low-delay high quality
* cbr-hq: CBR high quality
* vbr-hq: VBR high quality
Also, various configurable rate-control related properties are added.
Introducing new dynamic class between GstNvBaseEncClass and
each subclass to be able to access device specific properties and
capabilities from each subclass implementation side.
Do not restrict allowed maximum resolution depending on the
initial resolution. If new resolution is larger than previous one,
just re-init encode session.
Due to uncleared last flow, decoding after seek was never possible
(last_ret == GST_FLOW_FLUSHING).
nvdec dose not need to keep track of the previous flow return,
and actually the interest is data/even flow of the current handle_frame().
Implementing ::negotiate() method to support runtime output format
change. If downstream was reconfigured, baseclass will invoke
::negotiate() method, and nvdec should update output memory
type depending on downstream caps.
Input stream might be silently changed without ::set_format() call.
Since nvdec has internal parser, nvdec element can figure out the format change
by itself.
Register openGL resource only once per memory. Also if upstream
provides the registered information, reuse the information
instead of doing it again. This can improve performance dramatically
depending on system since the resource registration might cause
high overhead.
Introduce GstCudaGraphicsResource structure to represent registered
CUDA graphics resources and to enable sharing the information among
nvdec and nvenc. This structure can reduce the number of resource
registration which cause high overhead.
For openGL interoperability, nvdec uses cuGraphicsGLRegisterImage API
which is to register openGL texture image.
Meanwhile nvenc uses cuGraphicsGLRegisterBuffer API to registure openGL buffer object.
That means two kinds of graphics resources are registered per memory
when nvdec/nvenc are configured at the same time.
The graphics resource registration brings possibly high overhead
so the registration should be performed only once per resource
from optimization point of view.
gst_query_get_n_allocation_pools > 0 does not guarantee that
the N th internal array has GstBufferPool object. So users should
check the returned GstBufferPool object from
gst_query_parse_nth_allocation_pool.
Async CUDA operation with default stream (NULL CUstream) is not much
beneficial than blocking operation since all CUDA operations which belong
to the CUDA context will be synchronized with the default stream's operation.
Note that CUDA stream will share all resources of the corresponding CUDA context
but which can help parallel operation similar to the relation between thread and process
The internal decoding state must be GST_NVDEC_STATE_PARSE before
calling CuvidParseVideoData(). Otherwise, nvdec will be confused
on decode callback as if the frame is decoding only frame and
the input timestamp of corresponding frame will be ignored.
Eventually one decoded frame will have non-increased PTS.
The destroy callback can be called just before the fìnalization of
GstMiniObject. So the nvdec object might be destroyed already.
Instead, store the GstCudaContext with increased ref to safely
unregister the CUDA resource.
YV12 format is supported by Nvidia NVENC without manual conversion.
So nvenc is exposing YV12 format at sinkpad template but there is some
missing point around uploading the memory to GPU.
The gst_cuda_result macro function is more helpful for debugging
than previous cuda_OK because gst_cuda_result prints the function
and line number. If the CUDA API return was not CUDA_SUCCESS,
gst_cuda_result will print WARNING level debug message with
error name, error text strings.
... and drop CUvideoctxlock usage. The CUvideoctxlock basically
has the identical role of cuda context push/pop but nvdec specific
way. Since we can share the CUDA context among encoders and decoders,
use CUDA context directly for accessing GPU API.
... and add support CUDA context sharing similar to glcontext sharing.
Multiple CUDA context per GPU is not the best practice. The context
sharing method is very similar to that of glcontext. The difference
is that there can be multiple context object on a pipeline since
the CUDA context is created per GPU id. For example, a pipeline
has nvh264dec (uses GPU #0) and nvh264device0dec (uses GPU #1),
then two CUDA context will propagated to all pipeline.
New object and helper functions can remove duplicated code
from nvenc/nvdec. Also this is prework for CUDA device context sharing
among nvdec(s)/nvenc(s).
During GstVideoInfo conversion from GstCaps, interlace-mode is
inferred to progressive so unspecified interlace-mode should not cause any
negotiation issue. Simly set GST_PAD_FLAG_ACCEPT_INTERSECT flag
on sinkpad to fix issue.
Encoded bitstream might not have valid framerate. If upstream
provided non-variable-framerate (i.e., fps_n > 0 and fps_d > 0)
use upstream framerate instead of parsed one.
Encoding thread is terminated without any notification so
upstream streaming thread is locked because there is nothing
to pop from GAsyncQueue. If downstream returns error,
we need put SHUTDOWN_COOKIE to GAsyncQueue for chain function
can wakeup.
By adding system memory support for nvdec, both en/decoder
in the nvcodec plugin are able to be usable regardless of
OpenGL dependency. Besides, the direct use of system memory
might have less overhead than OpenGL memory depending on use cases.
(e.g., transcoding using S/W encoder)
Any plugin which returned FALSE from plugin_init will be blacklisted
so the plugin will be unusable even if an user install required runtime
dependency next time. So that's the reason why nvcodec returns TRUE always.
This commit is to remove possible misreading code.