For some hevc 10bit 4K encoding cases, the encoding process may be
slow, and MediaSDK surface can't be released in time before one other
available surface is needed. So add an extra surface for hevc encoding
to avoid this issue.
If the last flow was not GST_FLOW_OK, the encoding thread is not running
and there is nothing to pop from GAsyncQueue (this causes deadlock).
To prevent deadlock, just return the handle_frame without further encoding
process if the last flow was not GST_FLOW_OK. Note that the last flow
will be cleared per FLUSH_STOP and STREAM_START event.
The hard-coded upper bound 32 (or 48 depending on resolution) might waste
GPU memory and high resolution encoding causes OUT-OF-MEMORY allocation error
quite easily. This commit calculates the number of required pre-allocated
device memory based on encoding options and it can reduce the amount of device memory
used by nvenc.
NVDEC driver always uses input timestamp without adjustment
even if bframe encoding was enabled.
So DTS can be larger than PTS when bframe was enabled.
To ensure PTS >= DTS, we should adjust the timestamp manually
based on the PTS difference between the first
encoded frame and the second one. That's also the maximum PTS/DTS
difference.
To support rc-lookahead and bframe encoding, nvenc needs one more
staging queue, because NvEncEncodePicture can return NV_ENC_ERR_NEED_MORE_INPUT
but which was not considered so far.
As documented by NVENC programming guide, pending buffers should wait
other inputs until NvEncEncodePicture returns success.
New encoding flow is
- Submit raw picture buffer to encoder with NvEncEncodePicture
- The submitted input/output buffer pair will be queued to pending_queue
- If NvEncEncodePicture returned success, then move all pair in pending_queue
to final stage
- Otherwise, wait more input raw pictures.
Another change is dropping NV_ENC_LOCK_INPUT_BUFFER usage.
So now nvenc always uses CUDA memory input buffer. As a result,
both opengl and system memory handling are unified.
* The number of iteration is always one so the iteration is useless
and that makes code complicated.
* Also defining named structure can code mroe readable.
* g_free is null safe
New rate-control modes are introduced (if device can support)
* cbr-ld-hr: CBR low-delay high quality
* cbr-hq: CBR high quality
* vbr-hq: VBR high quality
Also, various configurable rate-control related properties are added.
Introducing new dynamic class between GstNvBaseEncClass and
each subclass to be able to access device specific properties and
capabilities from each subclass implementation side.
Add new macro for sink/src pad template to ensure no DMABuf caps
features are exposed on Windows. Some DMABuf caps features
were not handled by the commit 9ec62418c3
gst_buffer_make_writable() requires exclusive reference to the
GstMemory so the _make_writable() for the msdk buffer will result
to fallback system memory copy, because the msdk memory were initialized
with GST_MEMORY_FLAG_NO_SHARE flag.
Note that, disable sharing GstMemory brings high overhead but actually
the msdk memory objects can be shared over multiple buffers.
If the memory is not shareable, newly added GstAllocator::mem_copy will
create copied msdk memory.
Sometimes a HEVC/H265 stream doesn't have a valid profile but MSDK can
handle this stream. Like vaapih265dec, msdkh265dec may advertise the sink
caps without profile
DecodedOrder was deprecated in msdk-2017 version, but some customers
still use this for low-latency streaming of non-b-frame encoded streams,
which needs to output the frame at once
Asking decklink to render audio data seems to be based entirely on
the sample counts which completely disregards the timestamps
we pass to decklink. As a result, we need to explicitly check
for late buffers and drop them ourselves.
Do not restrict allowed maximum resolution depending on the
initial resolution. If new resolution is larger than previous one,
just re-init encode session.
Due to uncleared last flow, decoding after seek was never possible
(last_ret == GST_FLOW_FLUSHING).
nvdec dose not need to keep track of the previous flow return,
and actually the interest is data/even flow of the current handle_frame().
Implementing ::negotiate() method to support runtime output format
change. If downstream was reconfigured, baseclass will invoke
::negotiate() method, and nvdec should update output memory
type depending on downstream caps.
Input stream might be silently changed without ::set_format() call.
Since nvdec has internal parser, nvdec element can figure out the format change
by itself.
Register openGL resource only once per memory. Also if upstream
provides the registered information, reuse the information
instead of doing it again. This can improve performance dramatically
depending on system since the resource registration might cause
high overhead.
Introduce GstCudaGraphicsResource structure to represent registered
CUDA graphics resources and to enable sharing the information among
nvdec and nvenc. This structure can reduce the number of resource
registration which cause high overhead.
For openGL interoperability, nvdec uses cuGraphicsGLRegisterImage API
which is to register openGL texture image.
Meanwhile nvenc uses cuGraphicsGLRegisterBuffer API to registure openGL buffer object.
That means two kinds of graphics resources are registered per memory
when nvdec/nvenc are configured at the same time.
The graphics resource registration brings possibly high overhead
so the registration should be performed only once per resource
from optimization point of view.
Both g_list_delete_link and g_list_remove remove an element and free it,
so l->next is invalid (catched by valgrind) after calling g_list_delete_link
or g_list_remove
Returning MFX_WRN_INCOMPATIBLE_VIDEO_PARAM means MSDK detects some
incompatible parameters but it is resolved, and we may not regard
MFX_WRN_INCOMPATIBLE_VIDEO_PARAM as a fatal error. In this fix,
GST_FLOW_OK is returned but with a warning message so that a pipeline
may run to the end.
Fixes werror build:
In file included from ../sys/androidmedia/gstahcsrc.c:70:
../gst-libs/gst/interfaces/photography.h:27:2: error: "The GstPhotography interface is unstable API and may change in future." [-Werror,-W#warnings]
#warning "The GstPhotography interface is unstable API and may change in future."
^
../gst-libs/gst/interfaces/photography.h:28:2: error: "You can define GST_USE_UNSTABLE_API to avoid this warning." [-Werror,-W#warnings]
#warning "You can define GST_USE_UNSTABLE_API to avoid this warning."
^
video-direction property is common property in gstreamer. In addition,
both mirroring & rotation properties are marked as deprecated,
video-direction will override mirroring & rotation properties when they
are set explicitly
Fix https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/1058
gst_msdkdec_finish_task() may release all frames in
GstVideoDecoder object. In this case, allocate_output_buffer()
cannot get the oldest frame to allocate buffer.
So gst_msdkdec_handle_frame() should return GST_FLOW_OK for
letting gst_video_decoder_decode_frame() to send a new frame
for decoding.
Fixes#664.
Fixes#665.
When vpp rotation is 90 or 270, the output frame
should be rotated, too.
Example:
gst-launch-1.0 -vf videotestsrc \
! video/x-raw,width=720,height=480 \
! msdkvpp rotation=90 ! vaapisink
There is no NdkMediaCodecList API yet, but it is still better to isolate
JNI code. This will facilitate porting to a native API if Google ever
release one.
gst_query_get_n_allocation_pools > 0 does not guarantee that
the N th internal array has GstBufferPool object. So users should
check the returned GstBufferPool object from
gst_query_parse_nth_allocation_pool.
Async CUDA operation with default stream (NULL CUstream) is not much
beneficial than blocking operation since all CUDA operations which belong
to the CUDA context will be synchronized with the default stream's operation.
Note that CUDA stream will share all resources of the corresponding CUDA context
but which can help parallel operation similar to the relation between thread and process
The internal decoding state must be GST_NVDEC_STATE_PARSE before
calling CuvidParseVideoData(). Otherwise, nvdec will be confused
on decode callback as if the frame is decoding only frame and
the input timestamp of corresponding frame will be ignored.
Eventually one decoded frame will have non-increased PTS.
The destroy callback can be called just before the fìnalization of
GstMiniObject. So the nvdec object might be destroyed already.
Instead, store the GstCudaContext with increased ref to safely
unregister the CUDA resource.
Fix unexpected cropping with non 1:1 pixel aspect-ratio.
The actual buffer width/height should be passed to gst_d3d11_window_render(),
instead of the calculated resolution. The width/height
values are parameters for copying d3d11 video memory.
Also, aspect-ratio should be considered on resize callback
to decide render rectangle size.
YV12 format is supported by Nvidia NVENC without manual conversion.
So nvenc is exposing YV12 format at sinkpad template but there is some
missing point around uploading the memory to GPU.
Currently h264parser produces a field or a frame for
alignment=au for interlaced streams, but the flag
MFX_BITSTREAM_COMPLETE_FRAME needs a complete frame
or complementary field pair of data, this results in
broken images being output.
Some patches have been sent out to fix h264parser,
but they are pending on some unfinished work. In
order to make gstreamer-msdk decoding work properly
for interlaced streams before h264parser is fixed,
this flag will be removed temporarily and will be
added back once h264parser if fixed.
Related to:
https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/merge_requests/399https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/merge_requests/228
Instead of using the information we stored ourselves for the video frame
itself. Which was also the wrong one: it was the mode from the property,
not the autodetected one.
This fixes vanc extraction with mode=auto
The gst_cuda_result macro function is more helpful for debugging
than previous cuda_OK because gst_cuda_result prints the function
and line number. If the CUDA API return was not CUDA_SUCCESS,
gst_cuda_result will print WARNING level debug message with
error name, error text strings.
... and drop CUvideoctxlock usage. The CUvideoctxlock basically
has the identical role of cuda context push/pop but nvdec specific
way. Since we can share the CUDA context among encoders and decoders,
use CUDA context directly for accessing GPU API.
... and add support CUDA context sharing similar to glcontext sharing.
Multiple CUDA context per GPU is not the best practice. The context
sharing method is very similar to that of glcontext. The difference
is that there can be multiple context object on a pipeline since
the CUDA context is created per GPU id. For example, a pipeline
has nvh264dec (uses GPU #0) and nvh264device0dec (uses GPU #1),
then two CUDA context will propagated to all pipeline.
New object and helper functions can remove duplicated code
from nvenc/nvdec. Also this is prework for CUDA device context sharing
among nvdec(s)/nvenc(s).
We don't support negotiation with downstream but simply set caps based
on the buffers we receive. This prevents renegotiation to other formats,
and negotiation to NTSC in mode=auto in the beginning until the first
buffer is received.
As side-effect of this, also remove various other caps handling code
that was working around the behaviour of the default
BaseSrc::negotiate().
During GstVideoInfo conversion from GstCaps, interlace-mode is
inferred to progressive so unspecified interlace-mode should not cause any
negotiation issue. Simly set GST_PAD_FLAG_ACCEPT_INTERSECT flag
on sinkpad to fix issue.
Encoded bitstream might not have valid framerate. If upstream
provided non-variable-framerate (i.e., fps_n > 0 and fps_d > 0)
use upstream framerate instead of parsed one.