Even if one of downstream d3d11 elements can support dynamic-usage memory,
another one might not support it. Also, to support dynamic-usage,
both upstream and downstream d3d11device must be the same object.
If d3d11colorconvert element is configured, do color space conversion
regardless of the device type whether it's S/W emulation or real H/W.
Since d3d11colorconvert is no more a child of d3d11videosinkbin,
we don't need this behavior. Note that previous code was added to
avoid color space conversion from d3d11videosink if no hardware
device is available (S/W emulation of d3d11 is too slow).
d3d11upload should be able to support upstream d3d11 memory, not only system memory.
Fix for following pipeline
d3d11upload ! "video/x-raw(memory:D3D11Memory)" ! d3d11videosink
borderless top-most style full screen mode support.
Basically fullscreen toggle mode is disabled by default. To enable it
use "fullscreen-toggle-mode" property to allow fullscreen mode change
by user input and/or property.
In some cases, rendering and dxgi (e.g., swapchain) APIs should be
called from window message pump thread, but current design (dedicated d3d11 thread)
make it impossible. To solve it, change concurrency model to locking based one
from single-thread model.
In earlier implementation of d3d11videosink where no shader was implemented,
the aspect ratio and render size were adjusted by manipulating the backbuffer size
with unintuitive formula. Since now we do color conversion and resize using
shader, we can remove the hack.
window event queue now does not lock on the class lock, so we can now shut
it down without releasing the class lock, thus avoiding a potential race when
stopping the sink.
... and use SetParent() WIN32 API when external window is used.
Depending on DXGI swap effect, the external window might not be
reusable by another backend. To preserve the external window's property
and setting, drawing to internal window seems to be safer way.
Otherwise GstVideoDecoder is not finalized and
resources are leaked.
Somehow GST_TRACERS="leaks" GST_DEBUG="GST_TRACER:7" did not catch it.
Valgrind output:
==31645== 22,480 (1,400 direct, 21,080 indirect) bytes in 5 blocks are definitely lost in loss record 5,042 of 5,049
==31645== at 0x4C2FB0F: malloc
==31645== by 0x51D9E88: g_malloc
==31645== by 0x51FA7B5: g_slice_alloc
==31645== by 0x51FAC68: g_slice_alloc0
==31645== by 0x58D9984: g_type_create_instance
==31645== by 0x58BA344: g_object_new_with_properties
==31645== by 0x58BADA0: g_object_new
==31645== by 0x8ECA966: gst_video_decoder_init
==31645== by 0x58D99E7: g_type_create_instance
==31645== by 0x58BA344: g_object_new_with_properties
An issue can be seen when using msdkh265enc with bitrate change in
playing state. The root cause is the corresponding plugin is loaded
again.
Returning MFX_ERR_UNDEFINED_BEHAVIOR from MSDK just means the plugin has
been loaded, so we may ignore this error when doing configuation again
in the sub class, otherwise the pipeline will be interrupted
If d3d11window does not convert format internally, shader resource view
is not required. Note that shader resource view is used for
color conversion using shader but when conversion is not required,
we just copy input input texture to backbuffer.
In theory it should not happen but it happened to me
in some cases where it failed to allocate some video
buffers so this was a consequence of a corner case.
Better to be safe than sorry.
Can happen if the oldest frame is the current frame
and if gst_video_decoder_finish_frame failed in which
case the current is unref and then drop instead of
just drop.
This patch also removes some assumptions, it was strange
to call unref and finish_frame in gst_msdkdec_finish_task.
In principle when owning a frame, the code should either
unref, or drop or finish.
D3D11 dynamic texture is a special memory type, which is mainly used for
frequent CPU write access to the texture. For now, this texture type
does not support gst_memory_{map,unmap}
* Create staging texture only when the CPU access is requested.
Note that we should avoid the CPU access to d3d11 memory as mush as possible.
Incoming d3d11upload and d3d11download will take this GPU memory upload/download.
* Upload/Download texture memory from/to staging only if it needed, similar to
GstGL PBO implementation.
* Define more dxgi formats for future usage (e.g., color conversion, dxva2 decoder).
Because I420_* formats are not supported formats by dxgi, each plane should
be handled likewise GstGL separately, but NV12/P10 formats might be supported ones.
So we decide the number of d3d11memory per GstBuffer for video memory depending on
OS version and dxgi format. For instance, if NV12 is supported by OS,
only one d3d11memory with DXGI_FORMAT_NV12 texture can be allocated by this commit.
One use case of such texture is DXVA. In case DXVA decoder, it might need to produce decoded data
to one DXGI_FORMAT_NV12 instead of seperate Y and UV planes.
Such behavior will be controlled via configuration of GstD3D11BufferPool and
default configuration is separate resources per plane.
Depending on selected feature level, d3d11 API usage can be different.
Instead of querying the selected feature level by user whenever required,
store it once by d3d11device.
Do not accept any GstD3D11Device context which has different adapter
index from the required one. For example, if a d3d11 element is expecting
d3d11 device with adapter 1 (i.e., the second GPU), any d3d11 device
context having different adapter could not be shared with
the d3d11 element.
Make them consistent with cuda context utils functions.
Put in-only parameter before all in-out parameters, and add _handle()
suffix to native handle getter functions.
In certain cases, the sink's buffer pool will not call the parent's
release_buffer method, so the pool does not clean up properly
after the buffer is released.
Since macOS Mojave (10.14), video permissions have to be explicitly
granted by a user in order to open a video device such as a camera.
This commit adds a check for the current permission status, and tries
to request for permission if applicable.
The whole `src_read()` function is a hot loop since the ringbuffer
thread is waiting on us, and printing to the console from inside it
can easily cause us to miss our deadline.
F.ex., if you had GST_DEBUG=3 and we accidentally missed a device
period, we'd trigger the "reported glitch" warning, which would cause
us to miss another device period, and so on. Let's reduce the log
level so that GST_DEBUG=3 is more usable, and only print buffer flag
info when it's actually relevant.
Some audio drivers return varying amounts of data per ::GetBuffer
call, instead of following the device period that they've told us
about in `src_prepare()`.
Previously, we would just drop those extra buffers hoping that the
extra buffers were temporary (f.ex., a startup 'burst' of audio data).
However, it seems that some audio drivers, particularly on older
Windows versions (such as Windows 10 1703 and older) consistently
return varying amounts of data.
Use GstAdapter to smooth that out, and hope that the audio driver is
locally varying but globally periodic.
Initially reported in https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/808
We were miscalculating the device period, i.e. the number of frames
we'll get from WASAPI in each IAudioClient::GetBuffer call, due to
a calculation mistake (truncate instead of round).
For example, on my machine when the aux input is set to 44.1KHz, the
reported device period is 101587, which comes out to 447.998 frames
per ::GetBuffer call. In reality we will, of course, get 448 frames
per call, but we were truncating, so we expected 447 and were
discarding one frame every time. This led to glitching, and skew over
time.
Interestingly, I can only see this with 44.1Khz. 48Khz/96Khz are fine,
because the device period is a more 'even' number.
Fixes https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/806
The for loop in gst_msdkdec_handle_frame is error prone
about how it manages surfaces. Because sometimes it sets
the surface variable to NULL and sometimes it needs to free
it right away. So better to print an error if surfaces are
leaked to help with any change around the loop.
msdk plugin is not used for sofware encode/decode as there are better
solutions available. Also, with MFX_IMPL_AUTO_ANY, if software decode
is not supported, the plugin will still load, but will then fail when trying to
run the (autoplugged) pipeline. With MFX_IMPL_HARDWARE_ANY,
the plugin fails and a better software decoder is auto-plugged.
GstNvBaseEnc::n_bufs was set from the previous encoding session
but it wasn't cleared after stop. That might result to invalid memory
access at the next start (no encoded data) and then stop sequence.
Instead of defining a variable for array length, use GArray::len
directly to avoid such confusion.
Can be reproduced with:
videotestsrc ! x264enc key-int-max=$N ! \
h264parse ! msdkh264dec ! fakesink sync=1
It happens with any gop size but the smaller is the distance N
between key frames, the quicker it is leaking.
Fixes#1023
In case of pkg-config we need to create the include directories object
from the path using include_directories(). For INTELMEDIASDKROOT or
MFX_HOME we need to add the alternate include path ./include/mfx as
Intel MediaSDK now puts the headers there.
This adds a check to avoid draining when the imported buffers are in
fact own by kmssink. This happens since we export our kms buffer as
DMABuf. They are not really imported back as we pre-fill the cache,
but uses the same format as if they were external. This fixes
performance issues seen with videocrop2-test (found in -good).
Draining systematically on caps changes was a hack. Instead, properly
save the render information used to render last_render, and use that
information to drain. This fixes performance issues met with video crop
meta and per frame caps changes.
By passing NULL to `g_signal_new` instead of a marshaller, GLib will
actually internally optimize the signal (if the marshaller is available
in GLib itself) by also setting the valist marshaller. This makes the
signal emission a bit more performant than the regular marshalling,
which still needs to box into `GValue` and call libffi in case of a
generic marshaller.
Note that for custom marshallers, one would use
`g_signal_set_va_marshaller()` with the valist marshaller instead.
In future, a sub class of GstMsdkEncClass may decide a native format by
using this method, e.g. JPEG encoder may accept YUY2 input, however the
current implemation needs a conversion from YUY2 to NV12 before encoding.
In addtion, a sub class may choose a format for encoding if the input
format is not supported by MSDK, e.g. the current implemation does
UYVY->NV12 if the input format is UYVY. We may do UYVY->YUY2 for JPEG
encoder in future
MFX_FOURCC_BGR4 is mapped to VA_FOURCC_ABGR and JPEG encoder needs a
MFX_FOURCC_BGR4 frame for internal usage when the input format is
MFX_FOURCC_RGB4
This is a preparation for supporting native formats of JPEG encoder
Instead of using a proxy of `is_packetized` flag this patch
replaces it with the accessor to that flag in decoder base class,
avoiding probable mismatches.
commit 55c0d720 added the capability to handle non-packetized bitstream,
and there is a loop to handle multiple frames in a non-packetized buffer
in gst_msdkdec_handle_frame. However it is possible that a
non-packetized buffer still contains valid data but there is no long any
pending unfinished frame. Currently gst_video_decoder_decode_frame is
invoked to send a new frame with new input data, the situaltion is
repeated till an EOS is received. An application has to exit when
receiving an EOS, however there is still valid data in a
non-packetezied input buffer, hence some frames are dropped.
This fix adds a parse callback for non-packeteized input, a new frame
will be sent to the subclass as soon as the input buffer has valid data
This fixes https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/665
Upon bitrate change, make sure to close the encoder otherwise
the encoder is not re-initialized and the target bitrate is
never reached, and the encoder was flushed at each frame
from this moment.
Regression introduced in f2b35abcab which replaced the call
that was closing the encoder by an early return to avoid
re-initialization.
gstwasapiutil.c(173) : warning C4715: 'gst_wasapi_device_role_to_erole': not all control paths return a value
gstwasapiutil.c(188) : warning C4715: 'gst_wasapi_erole_to_device_role': not all control paths return a value
The GstDeviceProvider isn't subclass of GstElement.
(gst-device-monitor-1.0:49356): GLib-GObject-WARNING **: 20:21:18.651:
invalid cast from 'GstWasapiDeviceProvider' to 'GstElement'
The first channel in memory for MFX_FOURCC_RGB4 (VA_FOURCC_ARGB or
GST_VIDEO_FORMAT_BGRA) is B, not A. In MSDK, channle B is used to access
data for RGB4 surface. In addition, the returned pointers for
MFX_FOURCC_AYUV and MFX_FOURCC_Y410 in gst_msdk_video_memory_map_full
were wrong too before this fix.
When the bitrate is changed in playing state the encoder issues a reconfig
that drains and recreates the underlaying hw encoder instance.
With this set of changes we ensure that all this work is only made when
the bitrate did actually change. It also tries to reuse the vpp buffer
pool and fixes the pool leak spotted when testing this feature.
When postpone_free_surface is TRUE, the output buffer is not writable,
however the base decoder needs a writable buffer as output buffer,
otherwise it will make a copy of the output buffer. As the underlying
memory is always lockable, so we may set the LOCKABLE flag for this buffer
to avoid buffer copy in the base class.
The refcount of the output buffer is 1 when postpone_free_surface is
FALSE, so needn't set the LOCKABLE flag for this case.
... instead of calculated display ratio from given PAR and DAR.
d3d11window calculates output display ratio
to decide padding area per window resize event. In the formula,
actual PAR is required to handle both 1:1 PAR and non-1:1 PAR.
Both MSDK and this plugin use mfxFrameAllocResponse for video and DMABuf
memory, it is possible that some GST buffers are still in use when calling
gst_msdk_frame_free, so add a reference count in the wrapper of
mfxFrameAllocResponse (GstMsdkAllocResponse) to make sure the underlying
mfx resources are still available if the corresponding buffer pool is in
use.
In addtion, currently all allocators for input or output share the same
mfxFrameAllocResponse pointer in an element, so it is possible that
the content of mfxFrameAllocResponse is updated for a new caps then all
GST buffers allocated from an old allocator will use this new content of
mfxFrameAllocResponse, which will result in unexpected behavior. In this
fix, we save the the content of mfxFrameAllocResponse in the corresponding
tructure to avoid such issue
Sample pipeline:
gst-launch-1.0 filesrc location=vp9_multi_resolutions.ivf ! ivfparse ! msdkvp9dec !
msdkvpp ! video/x-raw\(memory:DMABuf\),format=NV12 ! glimagesink
Otherwise it is possible that different wrappers share the same
mfxFrameAllocResponse pointer, so instead of caching the pointer, we may
cache the content of mfxFrameAllocResponse
For a skipped frame in VC1, MSDK returns the mfx surface of the reference
frame, so we have to make sure the corresponding surface for the
reference frame is not freed. In this fix, we postpone surface free because
we don't know whether a surface is referenced
Before this fix, the error is like as below:
New clock: GstSystemClock
0:00:00.181793130 23098 0x55f8a9d622d0 ERROR msdkdec
gstmsdkdec.c:622:gst_msdkdec_finish_task:<msdkvc1dec0> Couldn't find the
cached MSDK surface
Sample pipeline:
gst-launch-1.0 filesrc location=input_has_skipped_frame.wmv ! asfdemux !
vc1parse ! msdkvc1dec ! glimagesink
If the surface is not in use, we may release it even if GST_FLOW_OK is going
to be returned, which may avoid the issue of failing to get surface
available
This fixes the regression caused by commit c05acf4
GstAllocationParams::align is set to 31 in msdkdec/msdken/msdkvpp, hence
the stride align should be greater than or equal to 31, otherwise it
will result in issue
https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/861
(msdk: "GStreamer-CRITICAL: gst_buffer_resize_range failed" SPAM),
In addition, the stride should match the pitch alignment in the media driver,
otherwise it will result in some issues when a buffer is shared between
different elements, e.g. the NV12 issue mentioned in commit 3f2314a, which
can be reproduced by `gst-launch-1.0 vidoetestsrc ! msdkvpp !
video/x-raw\(memory:DMABuf\),format=NV12 ! glimagesink`
Fixed https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/861
For some hevc 10bit 4K encoding cases, the encoding process may be
slow, and MediaSDK surface can't be released in time before one other
available surface is needed. So add an extra surface for hevc encoding
to avoid this issue.
If the last flow was not GST_FLOW_OK, the encoding thread is not running
and there is nothing to pop from GAsyncQueue (this causes deadlock).
To prevent deadlock, just return the handle_frame without further encoding
process if the last flow was not GST_FLOW_OK. Note that the last flow
will be cleared per FLUSH_STOP and STREAM_START event.
The hard-coded upper bound 32 (or 48 depending on resolution) might waste
GPU memory and high resolution encoding causes OUT-OF-MEMORY allocation error
quite easily. This commit calculates the number of required pre-allocated
device memory based on encoding options and it can reduce the amount of device memory
used by nvenc.
NVDEC driver always uses input timestamp without adjustment
even if bframe encoding was enabled.
So DTS can be larger than PTS when bframe was enabled.
To ensure PTS >= DTS, we should adjust the timestamp manually
based on the PTS difference between the first
encoded frame and the second one. That's also the maximum PTS/DTS
difference.
To support rc-lookahead and bframe encoding, nvenc needs one more
staging queue, because NvEncEncodePicture can return NV_ENC_ERR_NEED_MORE_INPUT
but which was not considered so far.
As documented by NVENC programming guide, pending buffers should wait
other inputs until NvEncEncodePicture returns success.
New encoding flow is
- Submit raw picture buffer to encoder with NvEncEncodePicture
- The submitted input/output buffer pair will be queued to pending_queue
- If NvEncEncodePicture returned success, then move all pair in pending_queue
to final stage
- Otherwise, wait more input raw pictures.
Another change is dropping NV_ENC_LOCK_INPUT_BUFFER usage.
So now nvenc always uses CUDA memory input buffer. As a result,
both opengl and system memory handling are unified.
* The number of iteration is always one so the iteration is useless
and that makes code complicated.
* Also defining named structure can code mroe readable.
* g_free is null safe
New rate-control modes are introduced (if device can support)
* cbr-ld-hr: CBR low-delay high quality
* cbr-hq: CBR high quality
* vbr-hq: VBR high quality
Also, various configurable rate-control related properties are added.
Introducing new dynamic class between GstNvBaseEncClass and
each subclass to be able to access device specific properties and
capabilities from each subclass implementation side.
Add new macro for sink/src pad template to ensure no DMABuf caps
features are exposed on Windows. Some DMABuf caps features
were not handled by the commit 9ec62418c3
gst_buffer_make_writable() requires exclusive reference to the
GstMemory so the _make_writable() for the msdk buffer will result
to fallback system memory copy, because the msdk memory were initialized
with GST_MEMORY_FLAG_NO_SHARE flag.
Note that, disable sharing GstMemory brings high overhead but actually
the msdk memory objects can be shared over multiple buffers.
If the memory is not shareable, newly added GstAllocator::mem_copy will
create copied msdk memory.
Sometimes a HEVC/H265 stream doesn't have a valid profile but MSDK can
handle this stream. Like vaapih265dec, msdkh265dec may advertise the sink
caps without profile
DecodedOrder was deprecated in msdk-2017 version, but some customers
still use this for low-latency streaming of non-b-frame encoded streams,
which needs to output the frame at once
Asking decklink to render audio data seems to be based entirely on
the sample counts which completely disregards the timestamps
we pass to decklink. As a result, we need to explicitly check
for late buffers and drop them ourselves.
Do not restrict allowed maximum resolution depending on the
initial resolution. If new resolution is larger than previous one,
just re-init encode session.
Due to uncleared last flow, decoding after seek was never possible
(last_ret == GST_FLOW_FLUSHING).
nvdec dose not need to keep track of the previous flow return,
and actually the interest is data/even flow of the current handle_frame().
Implementing ::negotiate() method to support runtime output format
change. If downstream was reconfigured, baseclass will invoke
::negotiate() method, and nvdec should update output memory
type depending on downstream caps.
Input stream might be silently changed without ::set_format() call.
Since nvdec has internal parser, nvdec element can figure out the format change
by itself.
Register openGL resource only once per memory. Also if upstream
provides the registered information, reuse the information
instead of doing it again. This can improve performance dramatically
depending on system since the resource registration might cause
high overhead.
Introduce GstCudaGraphicsResource structure to represent registered
CUDA graphics resources and to enable sharing the information among
nvdec and nvenc. This structure can reduce the number of resource
registration which cause high overhead.
For openGL interoperability, nvdec uses cuGraphicsGLRegisterImage API
which is to register openGL texture image.
Meanwhile nvenc uses cuGraphicsGLRegisterBuffer API to registure openGL buffer object.
That means two kinds of graphics resources are registered per memory
when nvdec/nvenc are configured at the same time.
The graphics resource registration brings possibly high overhead
so the registration should be performed only once per resource
from optimization point of view.
Both g_list_delete_link and g_list_remove remove an element and free it,
so l->next is invalid (catched by valgrind) after calling g_list_delete_link
or g_list_remove
Returning MFX_WRN_INCOMPATIBLE_VIDEO_PARAM means MSDK detects some
incompatible parameters but it is resolved, and we may not regard
MFX_WRN_INCOMPATIBLE_VIDEO_PARAM as a fatal error. In this fix,
GST_FLOW_OK is returned but with a warning message so that a pipeline
may run to the end.
Fixes werror build:
In file included from ../sys/androidmedia/gstahcsrc.c:70:
../gst-libs/gst/interfaces/photography.h:27:2: error: "The GstPhotography interface is unstable API and may change in future." [-Werror,-W#warnings]
#warning "The GstPhotography interface is unstable API and may change in future."
^
../gst-libs/gst/interfaces/photography.h:28:2: error: "You can define GST_USE_UNSTABLE_API to avoid this warning." [-Werror,-W#warnings]
#warning "You can define GST_USE_UNSTABLE_API to avoid this warning."
^
video-direction property is common property in gstreamer. In addition,
both mirroring & rotation properties are marked as deprecated,
video-direction will override mirroring & rotation properties when they
are set explicitly
Fix https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/1058
gst_msdkdec_finish_task() may release all frames in
GstVideoDecoder object. In this case, allocate_output_buffer()
cannot get the oldest frame to allocate buffer.
So gst_msdkdec_handle_frame() should return GST_FLOW_OK for
letting gst_video_decoder_decode_frame() to send a new frame
for decoding.
Fixes#664.
Fixes#665.
When vpp rotation is 90 or 270, the output frame
should be rotated, too.
Example:
gst-launch-1.0 -vf videotestsrc \
! video/x-raw,width=720,height=480 \
! msdkvpp rotation=90 ! vaapisink
There is no NdkMediaCodecList API yet, but it is still better to isolate
JNI code. This will facilitate porting to a native API if Google ever
release one.
gst_query_get_n_allocation_pools > 0 does not guarantee that
the N th internal array has GstBufferPool object. So users should
check the returned GstBufferPool object from
gst_query_parse_nth_allocation_pool.
Async CUDA operation with default stream (NULL CUstream) is not much
beneficial than blocking operation since all CUDA operations which belong
to the CUDA context will be synchronized with the default stream's operation.
Note that CUDA stream will share all resources of the corresponding CUDA context
but which can help parallel operation similar to the relation between thread and process
The internal decoding state must be GST_NVDEC_STATE_PARSE before
calling CuvidParseVideoData(). Otherwise, nvdec will be confused
on decode callback as if the frame is decoding only frame and
the input timestamp of corresponding frame will be ignored.
Eventually one decoded frame will have non-increased PTS.
The destroy callback can be called just before the fìnalization of
GstMiniObject. So the nvdec object might be destroyed already.
Instead, store the GstCudaContext with increased ref to safely
unregister the CUDA resource.
Fix unexpected cropping with non 1:1 pixel aspect-ratio.
The actual buffer width/height should be passed to gst_d3d11_window_render(),
instead of the calculated resolution. The width/height
values are parameters for copying d3d11 video memory.
Also, aspect-ratio should be considered on resize callback
to decide render rectangle size.
YV12 format is supported by Nvidia NVENC without manual conversion.
So nvenc is exposing YV12 format at sinkpad template but there is some
missing point around uploading the memory to GPU.
Currently h264parser produces a field or a frame for
alignment=au for interlaced streams, but the flag
MFX_BITSTREAM_COMPLETE_FRAME needs a complete frame
or complementary field pair of data, this results in
broken images being output.
Some patches have been sent out to fix h264parser,
but they are pending on some unfinished work. In
order to make gstreamer-msdk decoding work properly
for interlaced streams before h264parser is fixed,
this flag will be removed temporarily and will be
added back once h264parser if fixed.
Related to:
https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/merge_requests/399https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/merge_requests/228
Instead of using the information we stored ourselves for the video frame
itself. Which was also the wrong one: it was the mode from the property,
not the autodetected one.
This fixes vanc extraction with mode=auto
The gst_cuda_result macro function is more helpful for debugging
than previous cuda_OK because gst_cuda_result prints the function
and line number. If the CUDA API return was not CUDA_SUCCESS,
gst_cuda_result will print WARNING level debug message with
error name, error text strings.
... and drop CUvideoctxlock usage. The CUvideoctxlock basically
has the identical role of cuda context push/pop but nvdec specific
way. Since we can share the CUDA context among encoders and decoders,
use CUDA context directly for accessing GPU API.
... and add support CUDA context sharing similar to glcontext sharing.
Multiple CUDA context per GPU is not the best practice. The context
sharing method is very similar to that of glcontext. The difference
is that there can be multiple context object on a pipeline since
the CUDA context is created per GPU id. For example, a pipeline
has nvh264dec (uses GPU #0) and nvh264device0dec (uses GPU #1),
then two CUDA context will propagated to all pipeline.
New object and helper functions can remove duplicated code
from nvenc/nvdec. Also this is prework for CUDA device context sharing
among nvdec(s)/nvenc(s).
We don't support negotiation with downstream but simply set caps based
on the buffers we receive. This prevents renegotiation to other formats,
and negotiation to NTSC in mode=auto in the beginning until the first
buffer is received.
As side-effect of this, also remove various other caps handling code
that was working around the behaviour of the default
BaseSrc::negotiate().
During GstVideoInfo conversion from GstCaps, interlace-mode is
inferred to progressive so unspecified interlace-mode should not cause any
negotiation issue. Simly set GST_PAD_FLAG_ACCEPT_INTERSECT flag
on sinkpad to fix issue.
Encoded bitstream might not have valid framerate. If upstream
provided non-variable-framerate (i.e., fps_n > 0 and fps_d > 0)
use upstream framerate instead of parsed one.
Encoding thread is terminated without any notification so
upstream streaming thread is locked because there is nothing
to pop from GAsyncQueue. If downstream returns error,
we need put SHUTDOWN_COOKIE to GAsyncQueue for chain function
can wakeup.
By adding system memory support for nvdec, both en/decoder
in the nvcodec plugin are able to be usable regardless of
OpenGL dependency. Besides, the direct use of system memory
might have less overhead than OpenGL memory depending on use cases.
(e.g., transcoding using S/W encoder)
False warning from MSVC, or it does not understand that
g_assert_not_reached() does not return.
...\gst-plugins-bad-1.0-1.17.0.1\sys\decklink\gstdecklink.cpp(1647) : warning C4715: 'gst_decklink_configure_duplex_mode': not all control paths return a value
Any plugin which returned FALSE from plugin_init will be blacklisted
so the plugin will be unusable even if an user install required runtime
dependency next time. So that's the reason why nvcodec returns TRUE always.
This commit is to remove possible misreading code.
Since we build nvcodec plugin without external CUDA dependency,
CUDA and en/decoder library loading failure can be natural behavior.
Emit error only when the module was opend but required symbols are missing.
This commit includes h265 main-10 profile support if the device can
decode it.
Note that since h264 10bits decoding is not supported by nvidia GPU for now,
the additional code path for h264 high-10 profile is a preparation for
the future Nvidia's enhancement.
GstVideoDecoder::drain/flush can be called at very initial state
with stream-start and flush-stop event, respectively.
Draning with NULL CUvideoparser seems to unsafe and that eventually
failed to handle it.
It is possible that the output region size (e.g. 192x144) is different
from the coded picture size (e.g. 192x256). We may adjust the alignment
parameters so that the padding is respected in GstVideoInfo and use
GstVideoInfo to calculate mfx frame width and height
This fixes the error below when decoding a stream which has different
output region size and coded picture size
0:00:00.057726900 28634 0x55df6c3220a0 ERROR msdkdec
gstmsdkdec.c:1065:gst_msdkdec_handle_frame:<msdkh265dec0>
DecodeFrameAsync failed (failed to allocate memory)
Sample pipeline:
gst-launch-1.0 filesrc location=output.h265 ! h265parse ! msdkh265dec !
glimagesink
... and add our stub cuda header.
Newly introduced stub cuda.h file is defining minimal types in order to
build nvcodec plugin without system installed CUDA toolkit dependency.
This will make cross-compile possible.
* By this commit, if there are more than one device,
nvenc element factory will be created per
device like nvh264device{device-id}enc and nvh265device{device-id}enc
in addition to nvh264enc and nvh265enc, so that the element factory
can expose the exact capability of the device for the codec.
* Each element factory will have fixed cuda-device-id
which is determined during plugin initialization
depending on the capability of corresponding device.
(e.g., when only the second device can encode h265 among two GPU,
then nvh265enc will choose "1" (zero-based numbering)
as it's target cuda-device-id. As we have element factory
per GPU device, "cuda-device-id" property is changed to read-only.
* nvh265enc gains ability to encoding
4:4:4 8bits, 4:2:0 10 bits formats and up to 8K resolution
depending on device capability.
Additionally, I420 GLMemory input is supported by nvenc.
Only the default device has been used by NVDEC so far.
This commit make it possible to use registered device id.
To simplify device id selection, GstNvDecCudaContext usage is removed.
By this commit, each codec has its own element factory so the
nvdec element factory is removed. Also, if there are more than one device,
additional nvdec element factory will be created per
device like nvh264device{device-id}dec, so that the element factory
can expose the exact capability of the device for the codec.
Callbacks of CUvideoparser is called on the streaming thread.
So the use of async queue has no benefit.
Make control flow straightforward instead of long while/switch loop.
Previously we would've reported that there is signal unless we know for
sure that we don't have signal. For example signal would've been
reported before the device is even opened.
Now keep track whether the signal state is unknown or not and report no
signal if we don't know yet. As before, only send an INFO message about
signal recovery if we actually had a signal loss before.
... and put them into new nvcodec plugin.
* nvcodec plugin
Now each nvenc and nvdec element is moved to be a part of nvcodec plugin
for better interoperability.
Additionally, cuda runtime API header dependencies
(i.e., cuda_runtime_api.h and cuda_gl_interop.h) are removed.
Note that cuda runtime APIs have prefix "cuda". Since 1.16 release with
Windows support, only "cuda.h" and "cudaGL.h" dependent symbols have
been used except for some defined types. However, those types could be
replaced with other types which were defined by "cuda.h".
* dynamic library loading
CUDA library will be opened with g_module_open() instead of build-time linking.
On Windows, nvcuda.dll is installed to system path by CUDA Toolkit
installer, and on *nix, user should ensure that libcuda.so.1 can be
loadable (i.e., via LD_LIBRARY_PATH or default dlopen path)
Therefore, NVIDIA_VIDEO_CODEC_SDK_PATH env build time dependency for Windows
is removed.
Direct3D11 was shipped as part of Windows7 and it's obviously
primary graphics API on Windows.
This plugin includes HDR10 rendering if following requirements are satisfied
* IDXGISwapChain4::SetHDRMetaData is available (decleared in dxgi1_5.h)
* Display can support DXGI_COLOR_SPACE_RGB_FULL_G2084_NONE_P2020 color space
* Upstream provides 10 bitdepth format with smpte-st 2084 static metadata
MFX_FOURCC_VP9_SEGMAP surface in MSDK is an internal surface however
MSDK still call the external allocator for this surface, so this plugin
has to return UNSUPPORTED and force MSDK allocates surface using the
internal allocator.
See https://github.com/Intel-Media-SDK/MediaSDK/issues/762 for details
The call of MFXVideoENCODE_EncodeFrameAsync may not generate output and
the function returns MFX_ERR_MORE_DATA with NULL sync point, the input
frame is cached in this case, so it is possible that all allocated
frames go into the surfaces_used list after calling
MFXVideoENCODE_EncodeFrameAsync a few times, then the encoder will fail
to get an available surface before releasing used frames
This patch adds a new field of num_extra_frames to GstMsdkEnc and allows
encode element requires extra frames, the default value is 0.
This patch is the preparation for msdkvp9enc element.
msdkenc supports CSC implicitly, so it is possible that two VPP
processes are required when a pipeline contains msdkvpp and msdkenc.
Before this fix, msdkvpp and msdkenc may share the same context, hence
the same mfx session, which results in MFX_ERR_UNDEFINED_BEHAVIOR
in MSDK because a mfx session has at most one VPP process only
This fixes the broken pipelines below:
gst-launch-1.0 videotestsrc ! video/x-raw,format=I420 ! msdkh264enc ! \
msdkh264dec ! msdkvpp ! video/x-raw,format=YUY2 ! fakesink
gst-launch-1.0 videotestsrc ! msdkvpp ! video/x-raw,format=YUY2 ! \
msdkh264enc ! fakesink
MSDK supports JPEG YUY2 (422 chroma) output color
format. The color format of input bitstream is
described by JPEGChromaFormat and JPEGColorFormat
fields in the mfxInfoMFX structure which is filled
in by the MFXVideoDECODE_DecodeHeader function.
To obtain lossless decoded output from 422 encoded
JPEGs, we must set the output color format in the
FourCC and ChromaFormat fields in the mfxFrameInfo
structure to the appropriate values at post_configure
so that they are propagated through to the srcpad
caps accordingly.
A post_configure virtual method is added to allow
codec subclasses to adjust the initialized parameters
after MFXVideoDECODE_DecodeHeader is called from the
gstmsdkdec::gst_msdkdec_handle_frame function.
This is useful if codecs want to adjust the output
parameters based on the codec-specific decoding
options that are present in the mfxInfoMFX structure
after MFXVideoDECODE_DecodeHeader initializes them.
We were not setting self->segment and we are using it
when notifying downstream that we handled a REQUEST_KEY_UNIT
event, leading to all sort of criticals.
It's been replaced by NVENC/NVDEC and even NVIDIA doesn't
support VDPAU any longer and hasn't for quite some time.
The plugin has been unmaintained and unsupported for a very
long time, and given the track record over the last 10 years
it seems highly unlikely anyone is going to make it work well,
not to mention adding plumbing for proper zero-copy or
gst-gl integration.
Closes https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/828
All DRM ioctl uses errno to report the error and simply returns -1
when some error occured. This patch fixes all usage of the return
value instead of errno to trace the error type and moves to g_strerror
instead of string.h strerror in order to be consistent with the rest
of GStreamer.
separateColourPlaneFlag is mapped to separate_colour_plane_flag which
means Y, U and V planes are separately processed as monochrome sampled pictures.
So encoder shouldn't set that flag for normal 4:4:4 encoding.
Also for 4:4:4 encoding, NV_ENC_H264_PROFILE_HIGH_444_GUID profile must be
explicitly set.
The workaround for https://github.com/Intel-Media-SDK/MediaSDK/issues/1139
is required for vp8 only, so move this workaround to the corresponding
postinit_decoder function
The pipeline below works with this change
gst-launch-1.0 filesrc location=SA10104.vc1 ! \
'video/x-wmv,profile=(string)advanced',width=720,height=480,framerate=14/1 ! \
msdkvc1dec ! fakesink
MFXVideoDECODE_DecodeHeader only parses the sequence layer for VC1, so
the structure is unknown for a stream with interlace flag set in the
sequence layer. If forcing the struct to progressive in this plugin,
MediaSDK will fail to decode such streams.
It is possible MFXVideoDECODE_DecodeFrameAsync returns MFX_ERR_INCOMPATIBLE_VIDEO_PARAM
and this error can't be recovered by retrying MFXVideoDECODE_DecodeFrameAsync
in some cases, so we need to limit the number of retries to avoid infinite loop.
This fixes https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/909
To drain all queued encoding items, encoder should gracefully
wait the encoding thread without stealing queued items.
Otherwise, some input frames can be dropped.
../subprojects/gst-plugins-bad/sys/msdk/msdk.c(61): error C2065: 'MFX_FOURCC_RGB565'
The minimum required version for the format seems to MFX_VERSION >= 1028
Returning MFX_ERR_INCOMPATIBLE_VIDEO_PARAM from
MFXVideoDECODE_DecodeFrameAsync means the allocated mfx surface is not
suitable for the current frame, we need a new mfx surface and try
MFXVideoDECODE_DecodeFrameAsync again.
When MFXVideoDECODE_DecodeFrameAsync () returns MFX_WRN_DEVICE_BUSY with
an output surface, a new input surface is required when retrying
MFXVideoDECODE_DecodeFrameAsync ().
This fixes the out-of-surface issue mentioned in
https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/890
This gets rid of annoying message in the log, e.g. run the pipeline
below:
gst-launch-1.0 videotestsrc num-buffers=100 ! \
video/x-raw,format=NV12,width=352,height=288 ! msdkh264enc ! filesink \
location=test.h264
[LIBVA]:CRITICAL - DdiMedia_DestroyImage:4357: Invalid image
This make the pipeline below works:
gst-launch-1.0 videotestsrc num-buffers=1 ! msdkvpp ! \
video/x-raw,format=UYVY ! filesink location=a.yuv
Once https://github.com/intel/media-driver/pull/526 in the media-driver
is merged, the pipeline below also works:
gst-launch-1.0 videotestsrc num-buffers=1 ! msdkvpp ! \
video/x-raw\(memory:DMABuf\),format=UYVY ! filesink location=a.yuv
The VCD source was ported in 2014 (commit 89eb1e9), but the necessary
"cdxaparse" plugin, which is used to "Parse a .dat file (VCD) into
raw mpeg1" was never ported.
This means that the probable main user for the feature, totem, hasn't
actually been able to play back VCDs, since 2012, when it switched to
using GStreamer 1.0.
Note that even if cdxaparse was finally ported, a lot of work would
still be necessary before it is considered usable. Notably, it is
missing disc image support [1] and some VCDs just cannot be opened for
reading [2].
[1]: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/898
[2]: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/899
gstladspa.c:360:5: error: zero-length ms_printf format string [-Werror=format-zero-length]
vad_private.c:108:3: error: this decimal constant is unsigned only in ISO C90 [-Werror]
gstdecklinkvideosink.cpp:478:32: error: comparison between 'BMDTimecodeFormat {aka enum _BMDTimecodeFormat}' and 'enum GstDecklinkTimecodeFormat' [-Werror=enum-compare]
win/DeckLinkAPI_i.c:72:8: error: extra tokens at end of #endif directive [-Werror]
win/DeckLinkAPIDispatch.cpp:35:10: error: unused variable 'res' [-Werror=unused-variable]
gstwasapiutil.c:733:3: error: format '%x' expects argument of type 'unsigned int', but argument 8 has type 'DWORD' [-Werror=format]
gstwasapiutil.c:733:3: error: format '%x' expects argument of type 'unsigned int', but argument 9 has type 'guint64' [-Werror=format]
kshelpers.c:446:3: error: missing braces around initializer [-Werror=missing-braces]
kshelpers.c:446:3: error: (near initialization for 'known_property_sets[0].guid.Data4') [-Werror=missing-braces]
An output surface is returned but without sync point when when
MFXVideoDECODE_DecodeFrameAsync () returns MFX_ERR_MORE_DATA, this
surface should be released too, otherwise the surface is occupied
and it is easy to exhaust all pre-allocated mfx surfaces.
Example pipeline (input_vp8.webm contains lots of frame with show_frame
set to 0):
gst-launch-1.0 filesrc location=input_vp8.webm ! matroskademux !
msdkvp8dec ! msdkvpp ! fakesink
0:00:05.995959693 19866 0x563f30f14590 ERROR default
gstmsdkvideomemory.c:77:gst_msdk_video_allocator_get_surface: failed to
get surface available
ERROR: from element
/GstPipeline:pipeline0/GstMatroskaDemux:matroskademux0: Internal data
stream error.
The input buffer is released in gst_msdkdec_finish_task () when decoding
some special clips however this buffer is still in use, so ref the input
buffer before gst_msdkdec_finish_task () and unref it at the end of
gst_msdkdec_handle_frame ().
Fixes https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/862
The picture structure in the output parameters from
MFXVideoDECODE_Query is set to MFX_PICSTRUCT_UNKNOWN for some codecs, so
the structure of the corresponding mfx surfaces created for decoding are
unknown. The pipeline will be broken when these surfaces are used as the
input for msdkvpp.
Example pipeline:
gst-launch-1.0 filesrc location=input_vp8.webm ! matroskademux !
msdkvp8dec ! msdkvpp ! fakesink
Error message:
0:00:00.031568911 14259 0x55b79dc684a0 ERROR msdkvpp
gstmsdkvpp.c:728:gst_msdkvpp_transform:<msdkvpp0> MSDK Failed to do VPP
ERROR: from element
/GstPipeline:pipeline0/GstMatroskaDemux:matroskademux0: Internal data
stream error.
This is a workaround for the above issue
The memory type was used as bitwise enum, but the enum was not
defined in that way.
Nonetheless, most of the usage of the memory type was as mutually
exclusive options, rather than option composition.
This patch refactor how the memory type is defined, so it is kept
the mutual exclusion among options.
'channel-mask' field should not be put in caps if channel mask is 0x0
Mapping WASAPI channel mask to GST equivalent was going only over
first nChannels elements of wasapi_to_gst_pos array, translating, for
example, WASAPI's 0x63f to GST's 0x3f instead of 0xc3f.
When 'channel-mask' is specified as NULL, it signifies that there's
need to do downmix or upmix and it makes caps negotiation with
audioconvert element impossible. Just omit it.
Signed-off-by: Nirbheek Chauhan <nirbheek@centricular.com>
When building the msdk plugin even if libmfx is found, unless the
plugin is explicitly enabled we should not error out if msdk
dependencies are not found.
Also give an error message when we don't build the plugin on Windows
because we're not building with MSVC.
When the audio device goes away during playback or capture, we were
going into an infinite loop of AUDCLNT_E_DEVICE_INVALIDATED. Return -1
and post an error message so the ringbuffer thread exits with an error.
BRCParamMultiplier in mfxInfoMFX is a parameter which specifies a
multiplier for bitrate control parameters [1], it impacts TargetKbps,
MaxKbps, BufferSizeInKB and InitialDelayInKB.
[1]: https://software.intel.com/en-us/node/628473
According to the MSDK documation[1], MFXCloneSession is a light-weight
equivalent of MFXJoinSession after MFXInit, so MFXJoinSession call isn't
needed in the msdk plugin, otherwise the cloned session is joined to the
parent session twice, and we will get a MFX error when closing the
parent session
example pipeline:
gst-launch-1.0 videotestsrc num-buffers=100 ! \
video/x-raw,format=NV12,width=352,height=288 ! msdkh264enc ! msdkh264dec ! \
msdkh264enc ! fakesink
Error message:
0:00:00.211948518 21733 0x5586ee741c60 ERROR msdk
msdk.c:148:msdk_close_session: Close failed (undefined behavior)
[1]: https://software.intel.com/en-us/node/628429#MFXCloneSession
In gst-msdk, a mfx session may be shared between different gst
elements, each element tries to set the frame allocator. However, per
the MSDK documation[1], the behavior is undefined if reset the frame
allocator while the previous allocator is in use. Fortunately all
elements use the same frame allocator, so we can avoid to call
MFXVideoCORE_SetFrameAllocator again.
[1]: https://software.intel.com/en-us/node/628430#MFXVideoCORE3
If so, BGRA is the preferred output format hence BGRA will be selected
as input format by default, e.g. in the pipleline below, BGRA instead of
NV12 is selected without renegotiation, so we can avoid the NV12 issue
(see commit 3f2314a) by default.
gst-launch-1.0 videotestsrc ! msdkvpp ! glimagesink
Otherwise MFXVideoVPP_Init will fail because it is called twice without
a close.
Example pipeline:
gst-launch-1.0 videotestsrc ! msdkvpp ! glimagesink
Sometimes glimagesink emits GST_EVENT_RECONFIGURE event which results
in that MFXVideoVPP_Init is called twice, then get the negotiation
failure below:
0:00:00.093715518 21218 0x558ef56231e0 ERROR msdkvpp
gstmsdkvpp.c:995:gst_msdkvpp_initialize:<msdkvpp0> Init failed
(undefined behavior)
WARNING: from element /GstPipeline:pipeline0/GstMsdkVPP:msdkvpp0: not
negotiated
After applying this commit, the pipeline above may run without
negotiation failure, however NV12 layout in dmabuf mode is selected in
renegotiation, the display image is corrupted due to the NV12 issue which
was mentioned in commit 3f2314a. Some other fixes are needed to avoid
renegotiation by default
In general, we should assume any unhandled error is
non-recoverable.
In the flush frames loop, some error states can cause us
to never increment the task and therefore we get stuck
in an infinite loop and generate GST_ELEMENT_ERROR
over and over again. This eventually consumes all
system memory and triggers OOM. Thus, assume the worst
and break out of the loop upon the first "unhandled" error.
https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/859
When either the source or sink goes from PLAYING -> NULL -> PLAYING,
we call _reset() which sets client_needs_restart, and then we call
prepare() which calls IAudioClient_Start(), so we don't need to call
it again in src_read() or sink_write(). Unlike when we're just going
PLAYING -> PAUSED -> PLAYING.
configure_mode_setting() keeps a ref on tmp_kmsmem which is released in
gst_kms_sink_show_frame().
But if for some reason configure_mode_setting() is re-called before
showing a frame or if none is showed this memory was leaked.
ACM is an ancient legacy API, and there's no point in
keeping it around for a licensed mp3 decoder now that
mp3 patents have expired and we have a decoder in -good.
We didn't ship this in cerbero anyway. If there's a good
case for the AAC encoder (which is LC only anyway) someone
should write a new plugin based on current APIs, that can
actually be built out of the box.
Fixes#850
As a side-effect we can now actually store the line offset in the
line21dec element, and have to perform fewer transformations in the
decklink elements (which were also buggy as they assumed a single byte
triplet per meta).
We will add more profiles in the sink caps of msdkh265enc, so let
msdkh265enc re-add the sink pad template. Note this change doesn't
impact any capability
Fixes the time calculations when dealing with a slaved clock (as
will occur with more than one decklink video sink), when performing
flushing seeks causing stalls in the output timeline, pausing.
Tighten up the calculations by relying solely on the internal time
(from the internal clock) for determining when to schedule display
frames instead attempting to track pause lengths from the external
clock and converting to internal time. This results in a much easier
offset calculation for choosing the output time and ensures that the
clock is always advancing when we need it to.
This is fixup to the 'monotonically increasing output timestamps' goal
in: bf849e9a69
CodecProfile will be set in MFXVideoDECODE_DecodeHeader() to match
the input stream. Setting the hard-coded profile here will mislead
user that msdkh265dec supports a special profile only.
Previously alloc_info is initialized when both thiz->initialized
and thiz->allocation_caps are true, but only thiz->initialized is
checked when alloc_info is used.
Instead of relying on buffers after a state change to PLAYING to always start
from 0, track the amount of time we have spent outside playing but not changed
state to PAUSED.