D3D11 dynamic texture is a special memory type, which is mainly used for
frequent CPU write access to the texture. For now, this texture type
does not support gst_memory_{map,unmap}
* Create staging texture only when the CPU access is requested.
Note that we should avoid the CPU access to d3d11 memory as mush as possible.
Incoming d3d11upload and d3d11download will take this GPU memory upload/download.
* Upload/Download texture memory from/to staging only if it needed, similar to
GstGL PBO implementation.
* Define more dxgi formats for future usage (e.g., color conversion, dxva2 decoder).
Because I420_* formats are not supported formats by dxgi, each plane should
be handled likewise GstGL separately, but NV12/P10 formats might be supported ones.
So we decide the number of d3d11memory per GstBuffer for video memory depending on
OS version and dxgi format. For instance, if NV12 is supported by OS,
only one d3d11memory with DXGI_FORMAT_NV12 texture can be allocated by this commit.
One use case of such texture is DXVA. In case DXVA decoder, it might need to produce decoded data
to one DXGI_FORMAT_NV12 instead of seperate Y and UV planes.
Such behavior will be controlled via configuration of GstD3D11BufferPool and
default configuration is separate resources per plane.
Depending on selected feature level, d3d11 API usage can be different.
Instead of querying the selected feature level by user whenever required,
store it once by d3d11device.
Do not accept any GstD3D11Device context which has different adapter
index from the required one. For example, if a d3d11 element is expecting
d3d11 device with adapter 1 (i.e., the second GPU), any d3d11 device
context having different adapter could not be shared with
the d3d11 element.
Make them consistent with cuda context utils functions.
Put in-only parameter before all in-out parameters, and add _handle()
suffix to native handle getter functions.
In certain cases, the sink's buffer pool will not call the parent's
release_buffer method, so the pool does not clean up properly
after the buffer is released.
Since macOS Mojave (10.14), video permissions have to be explicitly
granted by a user in order to open a video device such as a camera.
This commit adds a check for the current permission status, and tries
to request for permission if applicable.
The whole `src_read()` function is a hot loop since the ringbuffer
thread is waiting on us, and printing to the console from inside it
can easily cause us to miss our deadline.
F.ex., if you had GST_DEBUG=3 and we accidentally missed a device
period, we'd trigger the "reported glitch" warning, which would cause
us to miss another device period, and so on. Let's reduce the log
level so that GST_DEBUG=3 is more usable, and only print buffer flag
info when it's actually relevant.
Some audio drivers return varying amounts of data per ::GetBuffer
call, instead of following the device period that they've told us
about in `src_prepare()`.
Previously, we would just drop those extra buffers hoping that the
extra buffers were temporary (f.ex., a startup 'burst' of audio data).
However, it seems that some audio drivers, particularly on older
Windows versions (such as Windows 10 1703 and older) consistently
return varying amounts of data.
Use GstAdapter to smooth that out, and hope that the audio driver is
locally varying but globally periodic.
Initially reported in https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/808
We were miscalculating the device period, i.e. the number of frames
we'll get from WASAPI in each IAudioClient::GetBuffer call, due to
a calculation mistake (truncate instead of round).
For example, on my machine when the aux input is set to 44.1KHz, the
reported device period is 101587, which comes out to 447.998 frames
per ::GetBuffer call. In reality we will, of course, get 448 frames
per call, but we were truncating, so we expected 447 and were
discarding one frame every time. This led to glitching, and skew over
time.
Interestingly, I can only see this with 44.1Khz. 48Khz/96Khz are fine,
because the device period is a more 'even' number.
Fixes https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/806
The for loop in gst_msdkdec_handle_frame is error prone
about how it manages surfaces. Because sometimes it sets
the surface variable to NULL and sometimes it needs to free
it right away. So better to print an error if surfaces are
leaked to help with any change around the loop.
msdk plugin is not used for sofware encode/decode as there are better
solutions available. Also, with MFX_IMPL_AUTO_ANY, if software decode
is not supported, the plugin will still load, but will then fail when trying to
run the (autoplugged) pipeline. With MFX_IMPL_HARDWARE_ANY,
the plugin fails and a better software decoder is auto-plugged.
GstNvBaseEnc::n_bufs was set from the previous encoding session
but it wasn't cleared after stop. That might result to invalid memory
access at the next start (no encoded data) and then stop sequence.
Instead of defining a variable for array length, use GArray::len
directly to avoid such confusion.
Can be reproduced with:
videotestsrc ! x264enc key-int-max=$N ! \
h264parse ! msdkh264dec ! fakesink sync=1
It happens with any gop size but the smaller is the distance N
between key frames, the quicker it is leaking.
Fixes#1023
In case of pkg-config we need to create the include directories object
from the path using include_directories(). For INTELMEDIASDKROOT or
MFX_HOME we need to add the alternate include path ./include/mfx as
Intel MediaSDK now puts the headers there.
This adds a check to avoid draining when the imported buffers are in
fact own by kmssink. This happens since we export our kms buffer as
DMABuf. They are not really imported back as we pre-fill the cache,
but uses the same format as if they were external. This fixes
performance issues seen with videocrop2-test (found in -good).
Draining systematically on caps changes was a hack. Instead, properly
save the render information used to render last_render, and use that
information to drain. This fixes performance issues met with video crop
meta and per frame caps changes.
By passing NULL to `g_signal_new` instead of a marshaller, GLib will
actually internally optimize the signal (if the marshaller is available
in GLib itself) by also setting the valist marshaller. This makes the
signal emission a bit more performant than the regular marshalling,
which still needs to box into `GValue` and call libffi in case of a
generic marshaller.
Note that for custom marshallers, one would use
`g_signal_set_va_marshaller()` with the valist marshaller instead.
In future, a sub class of GstMsdkEncClass may decide a native format by
using this method, e.g. JPEG encoder may accept YUY2 input, however the
current implemation needs a conversion from YUY2 to NV12 before encoding.
In addtion, a sub class may choose a format for encoding if the input
format is not supported by MSDK, e.g. the current implemation does
UYVY->NV12 if the input format is UYVY. We may do UYVY->YUY2 for JPEG
encoder in future
MFX_FOURCC_BGR4 is mapped to VA_FOURCC_ABGR and JPEG encoder needs a
MFX_FOURCC_BGR4 frame for internal usage when the input format is
MFX_FOURCC_RGB4
This is a preparation for supporting native formats of JPEG encoder
Instead of using a proxy of `is_packetized` flag this patch
replaces it with the accessor to that flag in decoder base class,
avoiding probable mismatches.
commit 55c0d720 added the capability to handle non-packetized bitstream,
and there is a loop to handle multiple frames in a non-packetized buffer
in gst_msdkdec_handle_frame. However it is possible that a
non-packetized buffer still contains valid data but there is no long any
pending unfinished frame. Currently gst_video_decoder_decode_frame is
invoked to send a new frame with new input data, the situaltion is
repeated till an EOS is received. An application has to exit when
receiving an EOS, however there is still valid data in a
non-packetezied input buffer, hence some frames are dropped.
This fix adds a parse callback for non-packeteized input, a new frame
will be sent to the subclass as soon as the input buffer has valid data
This fixes https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/665
Upon bitrate change, make sure to close the encoder otherwise
the encoder is not re-initialized and the target bitrate is
never reached, and the encoder was flushed at each frame
from this moment.
Regression introduced in f2b35abcab which replaced the call
that was closing the encoder by an early return to avoid
re-initialization.
gstwasapiutil.c(173) : warning C4715: 'gst_wasapi_device_role_to_erole': not all control paths return a value
gstwasapiutil.c(188) : warning C4715: 'gst_wasapi_erole_to_device_role': not all control paths return a value
The GstDeviceProvider isn't subclass of GstElement.
(gst-device-monitor-1.0:49356): GLib-GObject-WARNING **: 20:21:18.651:
invalid cast from 'GstWasapiDeviceProvider' to 'GstElement'
The first channel in memory for MFX_FOURCC_RGB4 (VA_FOURCC_ARGB or
GST_VIDEO_FORMAT_BGRA) is B, not A. In MSDK, channle B is used to access
data for RGB4 surface. In addition, the returned pointers for
MFX_FOURCC_AYUV and MFX_FOURCC_Y410 in gst_msdk_video_memory_map_full
were wrong too before this fix.
When the bitrate is changed in playing state the encoder issues a reconfig
that drains and recreates the underlaying hw encoder instance.
With this set of changes we ensure that all this work is only made when
the bitrate did actually change. It also tries to reuse the vpp buffer
pool and fixes the pool leak spotted when testing this feature.
When postpone_free_surface is TRUE, the output buffer is not writable,
however the base decoder needs a writable buffer as output buffer,
otherwise it will make a copy of the output buffer. As the underlying
memory is always lockable, so we may set the LOCKABLE flag for this buffer
to avoid buffer copy in the base class.
The refcount of the output buffer is 1 when postpone_free_surface is
FALSE, so needn't set the LOCKABLE flag for this case.
... instead of calculated display ratio from given PAR and DAR.
d3d11window calculates output display ratio
to decide padding area per window resize event. In the formula,
actual PAR is required to handle both 1:1 PAR and non-1:1 PAR.
Both MSDK and this plugin use mfxFrameAllocResponse for video and DMABuf
memory, it is possible that some GST buffers are still in use when calling
gst_msdk_frame_free, so add a reference count in the wrapper of
mfxFrameAllocResponse (GstMsdkAllocResponse) to make sure the underlying
mfx resources are still available if the corresponding buffer pool is in
use.
In addtion, currently all allocators for input or output share the same
mfxFrameAllocResponse pointer in an element, so it is possible that
the content of mfxFrameAllocResponse is updated for a new caps then all
GST buffers allocated from an old allocator will use this new content of
mfxFrameAllocResponse, which will result in unexpected behavior. In this
fix, we save the the content of mfxFrameAllocResponse in the corresponding
tructure to avoid such issue
Sample pipeline:
gst-launch-1.0 filesrc location=vp9_multi_resolutions.ivf ! ivfparse ! msdkvp9dec !
msdkvpp ! video/x-raw\(memory:DMABuf\),format=NV12 ! glimagesink
Otherwise it is possible that different wrappers share the same
mfxFrameAllocResponse pointer, so instead of caching the pointer, we may
cache the content of mfxFrameAllocResponse
For a skipped frame in VC1, MSDK returns the mfx surface of the reference
frame, so we have to make sure the corresponding surface for the
reference frame is not freed. In this fix, we postpone surface free because
we don't know whether a surface is referenced
Before this fix, the error is like as below:
New clock: GstSystemClock
0:00:00.181793130 23098 0x55f8a9d622d0 ERROR msdkdec
gstmsdkdec.c:622:gst_msdkdec_finish_task:<msdkvc1dec0> Couldn't find the
cached MSDK surface
Sample pipeline:
gst-launch-1.0 filesrc location=input_has_skipped_frame.wmv ! asfdemux !
vc1parse ! msdkvc1dec ! glimagesink
If the surface is not in use, we may release it even if GST_FLOW_OK is going
to be returned, which may avoid the issue of failing to get surface
available
This fixes the regression caused by commit c05acf4
GstAllocationParams::align is set to 31 in msdkdec/msdken/msdkvpp, hence
the stride align should be greater than or equal to 31, otherwise it
will result in issue
https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/861
(msdk: "GStreamer-CRITICAL: gst_buffer_resize_range failed" SPAM),
In addition, the stride should match the pitch alignment in the media driver,
otherwise it will result in some issues when a buffer is shared between
different elements, e.g. the NV12 issue mentioned in commit 3f2314a, which
can be reproduced by `gst-launch-1.0 vidoetestsrc ! msdkvpp !
video/x-raw\(memory:DMABuf\),format=NV12 ! glimagesink`
Fixed https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/861
For some hevc 10bit 4K encoding cases, the encoding process may be
slow, and MediaSDK surface can't be released in time before one other
available surface is needed. So add an extra surface for hevc encoding
to avoid this issue.
If the last flow was not GST_FLOW_OK, the encoding thread is not running
and there is nothing to pop from GAsyncQueue (this causes deadlock).
To prevent deadlock, just return the handle_frame without further encoding
process if the last flow was not GST_FLOW_OK. Note that the last flow
will be cleared per FLUSH_STOP and STREAM_START event.
The hard-coded upper bound 32 (or 48 depending on resolution) might waste
GPU memory and high resolution encoding causes OUT-OF-MEMORY allocation error
quite easily. This commit calculates the number of required pre-allocated
device memory based on encoding options and it can reduce the amount of device memory
used by nvenc.
NVDEC driver always uses input timestamp without adjustment
even if bframe encoding was enabled.
So DTS can be larger than PTS when bframe was enabled.
To ensure PTS >= DTS, we should adjust the timestamp manually
based on the PTS difference between the first
encoded frame and the second one. That's also the maximum PTS/DTS
difference.
To support rc-lookahead and bframe encoding, nvenc needs one more
staging queue, because NvEncEncodePicture can return NV_ENC_ERR_NEED_MORE_INPUT
but which was not considered so far.
As documented by NVENC programming guide, pending buffers should wait
other inputs until NvEncEncodePicture returns success.
New encoding flow is
- Submit raw picture buffer to encoder with NvEncEncodePicture
- The submitted input/output buffer pair will be queued to pending_queue
- If NvEncEncodePicture returned success, then move all pair in pending_queue
to final stage
- Otherwise, wait more input raw pictures.
Another change is dropping NV_ENC_LOCK_INPUT_BUFFER usage.
So now nvenc always uses CUDA memory input buffer. As a result,
both opengl and system memory handling are unified.
* The number of iteration is always one so the iteration is useless
and that makes code complicated.
* Also defining named structure can code mroe readable.
* g_free is null safe
New rate-control modes are introduced (if device can support)
* cbr-ld-hr: CBR low-delay high quality
* cbr-hq: CBR high quality
* vbr-hq: VBR high quality
Also, various configurable rate-control related properties are added.
Introducing new dynamic class between GstNvBaseEncClass and
each subclass to be able to access device specific properties and
capabilities from each subclass implementation side.
Add new macro for sink/src pad template to ensure no DMABuf caps
features are exposed on Windows. Some DMABuf caps features
were not handled by the commit 9ec62418c3
gst_buffer_make_writable() requires exclusive reference to the
GstMemory so the _make_writable() for the msdk buffer will result
to fallback system memory copy, because the msdk memory were initialized
with GST_MEMORY_FLAG_NO_SHARE flag.
Note that, disable sharing GstMemory brings high overhead but actually
the msdk memory objects can be shared over multiple buffers.
If the memory is not shareable, newly added GstAllocator::mem_copy will
create copied msdk memory.
Sometimes a HEVC/H265 stream doesn't have a valid profile but MSDK can
handle this stream. Like vaapih265dec, msdkh265dec may advertise the sink
caps without profile
DecodedOrder was deprecated in msdk-2017 version, but some customers
still use this for low-latency streaming of non-b-frame encoded streams,
which needs to output the frame at once
Asking decklink to render audio data seems to be based entirely on
the sample counts which completely disregards the timestamps
we pass to decklink. As a result, we need to explicitly check
for late buffers and drop them ourselves.
Do not restrict allowed maximum resolution depending on the
initial resolution. If new resolution is larger than previous one,
just re-init encode session.
Due to uncleared last flow, decoding after seek was never possible
(last_ret == GST_FLOW_FLUSHING).
nvdec dose not need to keep track of the previous flow return,
and actually the interest is data/even flow of the current handle_frame().
Implementing ::negotiate() method to support runtime output format
change. If downstream was reconfigured, baseclass will invoke
::negotiate() method, and nvdec should update output memory
type depending on downstream caps.
Input stream might be silently changed without ::set_format() call.
Since nvdec has internal parser, nvdec element can figure out the format change
by itself.
Register openGL resource only once per memory. Also if upstream
provides the registered information, reuse the information
instead of doing it again. This can improve performance dramatically
depending on system since the resource registration might cause
high overhead.
Introduce GstCudaGraphicsResource structure to represent registered
CUDA graphics resources and to enable sharing the information among
nvdec and nvenc. This structure can reduce the number of resource
registration which cause high overhead.
For openGL interoperability, nvdec uses cuGraphicsGLRegisterImage API
which is to register openGL texture image.
Meanwhile nvenc uses cuGraphicsGLRegisterBuffer API to registure openGL buffer object.
That means two kinds of graphics resources are registered per memory
when nvdec/nvenc are configured at the same time.
The graphics resource registration brings possibly high overhead
so the registration should be performed only once per resource
from optimization point of view.