For some hevc 10bit 4K encoding cases, the encoding process may be
slow, and MediaSDK surface can't be released in time before one other
available surface is needed. So add an extra surface for hevc encoding
to avoid this issue.
If the last flow was not GST_FLOW_OK, the encoding thread is not running
and there is nothing to pop from GAsyncQueue (this causes deadlock).
To prevent deadlock, just return the handle_frame without further encoding
process if the last flow was not GST_FLOW_OK. Note that the last flow
will be cleared per FLUSH_STOP and STREAM_START event.
The hard-coded upper bound 32 (or 48 depending on resolution) might waste
GPU memory and high resolution encoding causes OUT-OF-MEMORY allocation error
quite easily. This commit calculates the number of required pre-allocated
device memory based on encoding options and it can reduce the amount of device memory
used by nvenc.
NVDEC driver always uses input timestamp without adjustment
even if bframe encoding was enabled.
So DTS can be larger than PTS when bframe was enabled.
To ensure PTS >= DTS, we should adjust the timestamp manually
based on the PTS difference between the first
encoded frame and the second one. That's also the maximum PTS/DTS
difference.
To support rc-lookahead and bframe encoding, nvenc needs one more
staging queue, because NvEncEncodePicture can return NV_ENC_ERR_NEED_MORE_INPUT
but which was not considered so far.
As documented by NVENC programming guide, pending buffers should wait
other inputs until NvEncEncodePicture returns success.
New encoding flow is
- Submit raw picture buffer to encoder with NvEncEncodePicture
- The submitted input/output buffer pair will be queued to pending_queue
- If NvEncEncodePicture returned success, then move all pair in pending_queue
to final stage
- Otherwise, wait more input raw pictures.
Another change is dropping NV_ENC_LOCK_INPUT_BUFFER usage.
So now nvenc always uses CUDA memory input buffer. As a result,
both opengl and system memory handling are unified.
* The number of iteration is always one so the iteration is useless
and that makes code complicated.
* Also defining named structure can code mroe readable.
* g_free is null safe
New rate-control modes are introduced (if device can support)
* cbr-ld-hr: CBR low-delay high quality
* cbr-hq: CBR high quality
* vbr-hq: VBR high quality
Also, various configurable rate-control related properties are added.
Introducing new dynamic class between GstNvBaseEncClass and
each subclass to be able to access device specific properties and
capabilities from each subclass implementation side.
Add new macro for sink/src pad template to ensure no DMABuf caps
features are exposed on Windows. Some DMABuf caps features
were not handled by the commit 9ec62418c3
gst_buffer_make_writable() requires exclusive reference to the
GstMemory so the _make_writable() for the msdk buffer will result
to fallback system memory copy, because the msdk memory were initialized
with GST_MEMORY_FLAG_NO_SHARE flag.
Note that, disable sharing GstMemory brings high overhead but actually
the msdk memory objects can be shared over multiple buffers.
If the memory is not shareable, newly added GstAllocator::mem_copy will
create copied msdk memory.
Sometimes a HEVC/H265 stream doesn't have a valid profile but MSDK can
handle this stream. Like vaapih265dec, msdkh265dec may advertise the sink
caps without profile
DecodedOrder was deprecated in msdk-2017 version, but some customers
still use this for low-latency streaming of non-b-frame encoded streams,
which needs to output the frame at once
Asking decklink to render audio data seems to be based entirely on
the sample counts which completely disregards the timestamps
we pass to decklink. As a result, we need to explicitly check
for late buffers and drop them ourselves.
Do not restrict allowed maximum resolution depending on the
initial resolution. If new resolution is larger than previous one,
just re-init encode session.
Due to uncleared last flow, decoding after seek was never possible
(last_ret == GST_FLOW_FLUSHING).
nvdec dose not need to keep track of the previous flow return,
and actually the interest is data/even flow of the current handle_frame().
Implementing ::negotiate() method to support runtime output format
change. If downstream was reconfigured, baseclass will invoke
::negotiate() method, and nvdec should update output memory
type depending on downstream caps.
Input stream might be silently changed without ::set_format() call.
Since nvdec has internal parser, nvdec element can figure out the format change
by itself.
Register openGL resource only once per memory. Also if upstream
provides the registered information, reuse the information
instead of doing it again. This can improve performance dramatically
depending on system since the resource registration might cause
high overhead.
Introduce GstCudaGraphicsResource structure to represent registered
CUDA graphics resources and to enable sharing the information among
nvdec and nvenc. This structure can reduce the number of resource
registration which cause high overhead.
For openGL interoperability, nvdec uses cuGraphicsGLRegisterImage API
which is to register openGL texture image.
Meanwhile nvenc uses cuGraphicsGLRegisterBuffer API to registure openGL buffer object.
That means two kinds of graphics resources are registered per memory
when nvdec/nvenc are configured at the same time.
The graphics resource registration brings possibly high overhead
so the registration should be performed only once per resource
from optimization point of view.
Both g_list_delete_link and g_list_remove remove an element and free it,
so l->next is invalid (catched by valgrind) after calling g_list_delete_link
or g_list_remove
Returning MFX_WRN_INCOMPATIBLE_VIDEO_PARAM means MSDK detects some
incompatible parameters but it is resolved, and we may not regard
MFX_WRN_INCOMPATIBLE_VIDEO_PARAM as a fatal error. In this fix,
GST_FLOW_OK is returned but with a warning message so that a pipeline
may run to the end.
Fixes werror build:
In file included from ../sys/androidmedia/gstahcsrc.c:70:
../gst-libs/gst/interfaces/photography.h:27:2: error: "The GstPhotography interface is unstable API and may change in future." [-Werror,-W#warnings]
#warning "The GstPhotography interface is unstable API and may change in future."
^
../gst-libs/gst/interfaces/photography.h:28:2: error: "You can define GST_USE_UNSTABLE_API to avoid this warning." [-Werror,-W#warnings]
#warning "You can define GST_USE_UNSTABLE_API to avoid this warning."
^
video-direction property is common property in gstreamer. In addition,
both mirroring & rotation properties are marked as deprecated,
video-direction will override mirroring & rotation properties when they
are set explicitly
Fix https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/1058
gst_msdkdec_finish_task() may release all frames in
GstVideoDecoder object. In this case, allocate_output_buffer()
cannot get the oldest frame to allocate buffer.
So gst_msdkdec_handle_frame() should return GST_FLOW_OK for
letting gst_video_decoder_decode_frame() to send a new frame
for decoding.
Fixes#664.
Fixes#665.
When vpp rotation is 90 or 270, the output frame
should be rotated, too.
Example:
gst-launch-1.0 -vf videotestsrc \
! video/x-raw,width=720,height=480 \
! msdkvpp rotation=90 ! vaapisink
There is no NdkMediaCodecList API yet, but it is still better to isolate
JNI code. This will facilitate porting to a native API if Google ever
release one.
gst_query_get_n_allocation_pools > 0 does not guarantee that
the N th internal array has GstBufferPool object. So users should
check the returned GstBufferPool object from
gst_query_parse_nth_allocation_pool.
Async CUDA operation with default stream (NULL CUstream) is not much
beneficial than blocking operation since all CUDA operations which belong
to the CUDA context will be synchronized with the default stream's operation.
Note that CUDA stream will share all resources of the corresponding CUDA context
but which can help parallel operation similar to the relation between thread and process
The internal decoding state must be GST_NVDEC_STATE_PARSE before
calling CuvidParseVideoData(). Otherwise, nvdec will be confused
on decode callback as if the frame is decoding only frame and
the input timestamp of corresponding frame will be ignored.
Eventually one decoded frame will have non-increased PTS.
The destroy callback can be called just before the fìnalization of
GstMiniObject. So the nvdec object might be destroyed already.
Instead, store the GstCudaContext with increased ref to safely
unregister the CUDA resource.
Fix unexpected cropping with non 1:1 pixel aspect-ratio.
The actual buffer width/height should be passed to gst_d3d11_window_render(),
instead of the calculated resolution. The width/height
values are parameters for copying d3d11 video memory.
Also, aspect-ratio should be considered on resize callback
to decide render rectangle size.
YV12 format is supported by Nvidia NVENC without manual conversion.
So nvenc is exposing YV12 format at sinkpad template but there is some
missing point around uploading the memory to GPU.
Currently h264parser produces a field or a frame for
alignment=au for interlaced streams, but the flag
MFX_BITSTREAM_COMPLETE_FRAME needs a complete frame
or complementary field pair of data, this results in
broken images being output.
Some patches have been sent out to fix h264parser,
but they are pending on some unfinished work. In
order to make gstreamer-msdk decoding work properly
for interlaced streams before h264parser is fixed,
this flag will be removed temporarily and will be
added back once h264parser if fixed.
Related to:
https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/merge_requests/399https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/merge_requests/228
Instead of using the information we stored ourselves for the video frame
itself. Which was also the wrong one: it was the mode from the property,
not the autodetected one.
This fixes vanc extraction with mode=auto
The gst_cuda_result macro function is more helpful for debugging
than previous cuda_OK because gst_cuda_result prints the function
and line number. If the CUDA API return was not CUDA_SUCCESS,
gst_cuda_result will print WARNING level debug message with
error name, error text strings.
... and drop CUvideoctxlock usage. The CUvideoctxlock basically
has the identical role of cuda context push/pop but nvdec specific
way. Since we can share the CUDA context among encoders and decoders,
use CUDA context directly for accessing GPU API.
... and add support CUDA context sharing similar to glcontext sharing.
Multiple CUDA context per GPU is not the best practice. The context
sharing method is very similar to that of glcontext. The difference
is that there can be multiple context object on a pipeline since
the CUDA context is created per GPU id. For example, a pipeline
has nvh264dec (uses GPU #0) and nvh264device0dec (uses GPU #1),
then two CUDA context will propagated to all pipeline.
New object and helper functions can remove duplicated code
from nvenc/nvdec. Also this is prework for CUDA device context sharing
among nvdec(s)/nvenc(s).
We don't support negotiation with downstream but simply set caps based
on the buffers we receive. This prevents renegotiation to other formats,
and negotiation to NTSC in mode=auto in the beginning until the first
buffer is received.
As side-effect of this, also remove various other caps handling code
that was working around the behaviour of the default
BaseSrc::negotiate().
During GstVideoInfo conversion from GstCaps, interlace-mode is
inferred to progressive so unspecified interlace-mode should not cause any
negotiation issue. Simly set GST_PAD_FLAG_ACCEPT_INTERSECT flag
on sinkpad to fix issue.
Encoded bitstream might not have valid framerate. If upstream
provided non-variable-framerate (i.e., fps_n > 0 and fps_d > 0)
use upstream framerate instead of parsed one.
Encoding thread is terminated without any notification so
upstream streaming thread is locked because there is nothing
to pop from GAsyncQueue. If downstream returns error,
we need put SHUTDOWN_COOKIE to GAsyncQueue for chain function
can wakeup.
By adding system memory support for nvdec, both en/decoder
in the nvcodec plugin are able to be usable regardless of
OpenGL dependency. Besides, the direct use of system memory
might have less overhead than OpenGL memory depending on use cases.
(e.g., transcoding using S/W encoder)
False warning from MSVC, or it does not understand that
g_assert_not_reached() does not return.
...\gst-plugins-bad-1.0-1.17.0.1\sys\decklink\gstdecklink.cpp(1647) : warning C4715: 'gst_decklink_configure_duplex_mode': not all control paths return a value
Any plugin which returned FALSE from plugin_init will be blacklisted
so the plugin will be unusable even if an user install required runtime
dependency next time. So that's the reason why nvcodec returns TRUE always.
This commit is to remove possible misreading code.
Since we build nvcodec plugin without external CUDA dependency,
CUDA and en/decoder library loading failure can be natural behavior.
Emit error only when the module was opend but required symbols are missing.
This commit includes h265 main-10 profile support if the device can
decode it.
Note that since h264 10bits decoding is not supported by nvidia GPU for now,
the additional code path for h264 high-10 profile is a preparation for
the future Nvidia's enhancement.
GstVideoDecoder::drain/flush can be called at very initial state
with stream-start and flush-stop event, respectively.
Draning with NULL CUvideoparser seems to unsafe and that eventually
failed to handle it.
It is possible that the output region size (e.g. 192x144) is different
from the coded picture size (e.g. 192x256). We may adjust the alignment
parameters so that the padding is respected in GstVideoInfo and use
GstVideoInfo to calculate mfx frame width and height
This fixes the error below when decoding a stream which has different
output region size and coded picture size
0:00:00.057726900 28634 0x55df6c3220a0 ERROR msdkdec
gstmsdkdec.c:1065:gst_msdkdec_handle_frame:<msdkh265dec0>
DecodeFrameAsync failed (failed to allocate memory)
Sample pipeline:
gst-launch-1.0 filesrc location=output.h265 ! h265parse ! msdkh265dec !
glimagesink
... and add our stub cuda header.
Newly introduced stub cuda.h file is defining minimal types in order to
build nvcodec plugin without system installed CUDA toolkit dependency.
This will make cross-compile possible.
* By this commit, if there are more than one device,
nvenc element factory will be created per
device like nvh264device{device-id}enc and nvh265device{device-id}enc
in addition to nvh264enc and nvh265enc, so that the element factory
can expose the exact capability of the device for the codec.
* Each element factory will have fixed cuda-device-id
which is determined during plugin initialization
depending on the capability of corresponding device.
(e.g., when only the second device can encode h265 among two GPU,
then nvh265enc will choose "1" (zero-based numbering)
as it's target cuda-device-id. As we have element factory
per GPU device, "cuda-device-id" property is changed to read-only.
* nvh265enc gains ability to encoding
4:4:4 8bits, 4:2:0 10 bits formats and up to 8K resolution
depending on device capability.
Additionally, I420 GLMemory input is supported by nvenc.
Only the default device has been used by NVDEC so far.
This commit make it possible to use registered device id.
To simplify device id selection, GstNvDecCudaContext usage is removed.
By this commit, each codec has its own element factory so the
nvdec element factory is removed. Also, if there are more than one device,
additional nvdec element factory will be created per
device like nvh264device{device-id}dec, so that the element factory
can expose the exact capability of the device for the codec.
Callbacks of CUvideoparser is called on the streaming thread.
So the use of async queue has no benefit.
Make control flow straightforward instead of long while/switch loop.
Previously we would've reported that there is signal unless we know for
sure that we don't have signal. For example signal would've been
reported before the device is even opened.
Now keep track whether the signal state is unknown or not and report no
signal if we don't know yet. As before, only send an INFO message about
signal recovery if we actually had a signal loss before.
... and put them into new nvcodec plugin.
* nvcodec plugin
Now each nvenc and nvdec element is moved to be a part of nvcodec plugin
for better interoperability.
Additionally, cuda runtime API header dependencies
(i.e., cuda_runtime_api.h and cuda_gl_interop.h) are removed.
Note that cuda runtime APIs have prefix "cuda". Since 1.16 release with
Windows support, only "cuda.h" and "cudaGL.h" dependent symbols have
been used except for some defined types. However, those types could be
replaced with other types which were defined by "cuda.h".
* dynamic library loading
CUDA library will be opened with g_module_open() instead of build-time linking.
On Windows, nvcuda.dll is installed to system path by CUDA Toolkit
installer, and on *nix, user should ensure that libcuda.so.1 can be
loadable (i.e., via LD_LIBRARY_PATH or default dlopen path)
Therefore, NVIDIA_VIDEO_CODEC_SDK_PATH env build time dependency for Windows
is removed.
Direct3D11 was shipped as part of Windows7 and it's obviously
primary graphics API on Windows.
This plugin includes HDR10 rendering if following requirements are satisfied
* IDXGISwapChain4::SetHDRMetaData is available (decleared in dxgi1_5.h)
* Display can support DXGI_COLOR_SPACE_RGB_FULL_G2084_NONE_P2020 color space
* Upstream provides 10 bitdepth format with smpte-st 2084 static metadata
MFX_FOURCC_VP9_SEGMAP surface in MSDK is an internal surface however
MSDK still call the external allocator for this surface, so this plugin
has to return UNSUPPORTED and force MSDK allocates surface using the
internal allocator.
See https://github.com/Intel-Media-SDK/MediaSDK/issues/762 for details
The call of MFXVideoENCODE_EncodeFrameAsync may not generate output and
the function returns MFX_ERR_MORE_DATA with NULL sync point, the input
frame is cached in this case, so it is possible that all allocated
frames go into the surfaces_used list after calling
MFXVideoENCODE_EncodeFrameAsync a few times, then the encoder will fail
to get an available surface before releasing used frames
This patch adds a new field of num_extra_frames to GstMsdkEnc and allows
encode element requires extra frames, the default value is 0.
This patch is the preparation for msdkvp9enc element.
msdkenc supports CSC implicitly, so it is possible that two VPP
processes are required when a pipeline contains msdkvpp and msdkenc.
Before this fix, msdkvpp and msdkenc may share the same context, hence
the same mfx session, which results in MFX_ERR_UNDEFINED_BEHAVIOR
in MSDK because a mfx session has at most one VPP process only
This fixes the broken pipelines below:
gst-launch-1.0 videotestsrc ! video/x-raw,format=I420 ! msdkh264enc ! \
msdkh264dec ! msdkvpp ! video/x-raw,format=YUY2 ! fakesink
gst-launch-1.0 videotestsrc ! msdkvpp ! video/x-raw,format=YUY2 ! \
msdkh264enc ! fakesink
MSDK supports JPEG YUY2 (422 chroma) output color
format. The color format of input bitstream is
described by JPEGChromaFormat and JPEGColorFormat
fields in the mfxInfoMFX structure which is filled
in by the MFXVideoDECODE_DecodeHeader function.
To obtain lossless decoded output from 422 encoded
JPEGs, we must set the output color format in the
FourCC and ChromaFormat fields in the mfxFrameInfo
structure to the appropriate values at post_configure
so that they are propagated through to the srcpad
caps accordingly.
A post_configure virtual method is added to allow
codec subclasses to adjust the initialized parameters
after MFXVideoDECODE_DecodeHeader is called from the
gstmsdkdec::gst_msdkdec_handle_frame function.
This is useful if codecs want to adjust the output
parameters based on the codec-specific decoding
options that are present in the mfxInfoMFX structure
after MFXVideoDECODE_DecodeHeader initializes them.