D3D11 dynamic texture is a special memory type, which is mainly used for
frequent CPU write access to the texture. For now, this texture type
does not support gst_memory_{map,unmap}
* Create staging texture only when the CPU access is requested.
Note that we should avoid the CPU access to d3d11 memory as mush as possible.
Incoming d3d11upload and d3d11download will take this GPU memory upload/download.
* Upload/Download texture memory from/to staging only if it needed, similar to
GstGL PBO implementation.
* Define more dxgi formats for future usage (e.g., color conversion, dxva2 decoder).
Because I420_* formats are not supported formats by dxgi, each plane should
be handled likewise GstGL separately, but NV12/P10 formats might be supported ones.
So we decide the number of d3d11memory per GstBuffer for video memory depending on
OS version and dxgi format. For instance, if NV12 is supported by OS,
only one d3d11memory with DXGI_FORMAT_NV12 texture can be allocated by this commit.
One use case of such texture is DXVA. In case DXVA decoder, it might need to produce decoded data
to one DXGI_FORMAT_NV12 instead of seperate Y and UV planes.
Such behavior will be controlled via configuration of GstD3D11BufferPool and
default configuration is separate resources per plane.
Depending on selected feature level, d3d11 API usage can be different.
Instead of querying the selected feature level by user whenever required,
store it once by d3d11device.
Do not accept any GstD3D11Device context which has different adapter
index from the required one. For example, if a d3d11 element is expecting
d3d11 device with adapter 1 (i.e., the second GPU), any d3d11 device
context having different adapter could not be shared with
the d3d11 element.
Make them consistent with cuda context utils functions.
Put in-only parameter before all in-out parameters, and add _handle()
suffix to native handle getter functions.
In certain cases, the sink's buffer pool will not call the parent's
release_buffer method, so the pool does not clean up properly
after the buffer is released.
Since macOS Mojave (10.14), video permissions have to be explicitly
granted by a user in order to open a video device such as a camera.
This commit adds a check for the current permission status, and tries
to request for permission if applicable.
The whole `src_read()` function is a hot loop since the ringbuffer
thread is waiting on us, and printing to the console from inside it
can easily cause us to miss our deadline.
F.ex., if you had GST_DEBUG=3 and we accidentally missed a device
period, we'd trigger the "reported glitch" warning, which would cause
us to miss another device period, and so on. Let's reduce the log
level so that GST_DEBUG=3 is more usable, and only print buffer flag
info when it's actually relevant.
Some audio drivers return varying amounts of data per ::GetBuffer
call, instead of following the device period that they've told us
about in `src_prepare()`.
Previously, we would just drop those extra buffers hoping that the
extra buffers were temporary (f.ex., a startup 'burst' of audio data).
However, it seems that some audio drivers, particularly on older
Windows versions (such as Windows 10 1703 and older) consistently
return varying amounts of data.
Use GstAdapter to smooth that out, and hope that the audio driver is
locally varying but globally periodic.
Initially reported in https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/808
We were miscalculating the device period, i.e. the number of frames
we'll get from WASAPI in each IAudioClient::GetBuffer call, due to
a calculation mistake (truncate instead of round).
For example, on my machine when the aux input is set to 44.1KHz, the
reported device period is 101587, which comes out to 447.998 frames
per ::GetBuffer call. In reality we will, of course, get 448 frames
per call, but we were truncating, so we expected 447 and were
discarding one frame every time. This led to glitching, and skew over
time.
Interestingly, I can only see this with 44.1Khz. 48Khz/96Khz are fine,
because the device period is a more 'even' number.
Fixes https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/806
The for loop in gst_msdkdec_handle_frame is error prone
about how it manages surfaces. Because sometimes it sets
the surface variable to NULL and sometimes it needs to free
it right away. So better to print an error if surfaces are
leaked to help with any change around the loop.
msdk plugin is not used for sofware encode/decode as there are better
solutions available. Also, with MFX_IMPL_AUTO_ANY, if software decode
is not supported, the plugin will still load, but will then fail when trying to
run the (autoplugged) pipeline. With MFX_IMPL_HARDWARE_ANY,
the plugin fails and a better software decoder is auto-plugged.
GstNvBaseEnc::n_bufs was set from the previous encoding session
but it wasn't cleared after stop. That might result to invalid memory
access at the next start (no encoded data) and then stop sequence.
Instead of defining a variable for array length, use GArray::len
directly to avoid such confusion.
Can be reproduced with:
videotestsrc ! x264enc key-int-max=$N ! \
h264parse ! msdkh264dec ! fakesink sync=1
It happens with any gop size but the smaller is the distance N
between key frames, the quicker it is leaking.
Fixes#1023
In case of pkg-config we need to create the include directories object
from the path using include_directories(). For INTELMEDIASDKROOT or
MFX_HOME we need to add the alternate include path ./include/mfx as
Intel MediaSDK now puts the headers there.
This adds a check to avoid draining when the imported buffers are in
fact own by kmssink. This happens since we export our kms buffer as
DMABuf. They are not really imported back as we pre-fill the cache,
but uses the same format as if they were external. This fixes
performance issues seen with videocrop2-test (found in -good).
Draining systematically on caps changes was a hack. Instead, properly
save the render information used to render last_render, and use that
information to drain. This fixes performance issues met with video crop
meta and per frame caps changes.
By passing NULL to `g_signal_new` instead of a marshaller, GLib will
actually internally optimize the signal (if the marshaller is available
in GLib itself) by also setting the valist marshaller. This makes the
signal emission a bit more performant than the regular marshalling,
which still needs to box into `GValue` and call libffi in case of a
generic marshaller.
Note that for custom marshallers, one would use
`g_signal_set_va_marshaller()` with the valist marshaller instead.
In future, a sub class of GstMsdkEncClass may decide a native format by
using this method, e.g. JPEG encoder may accept YUY2 input, however the
current implemation needs a conversion from YUY2 to NV12 before encoding.
In addtion, a sub class may choose a format for encoding if the input
format is not supported by MSDK, e.g. the current implemation does
UYVY->NV12 if the input format is UYVY. We may do UYVY->YUY2 for JPEG
encoder in future
MFX_FOURCC_BGR4 is mapped to VA_FOURCC_ABGR and JPEG encoder needs a
MFX_FOURCC_BGR4 frame for internal usage when the input format is
MFX_FOURCC_RGB4
This is a preparation for supporting native formats of JPEG encoder
Instead of using a proxy of `is_packetized` flag this patch
replaces it with the accessor to that flag in decoder base class,
avoiding probable mismatches.
commit 55c0d720 added the capability to handle non-packetized bitstream,
and there is a loop to handle multiple frames in a non-packetized buffer
in gst_msdkdec_handle_frame. However it is possible that a
non-packetized buffer still contains valid data but there is no long any
pending unfinished frame. Currently gst_video_decoder_decode_frame is
invoked to send a new frame with new input data, the situaltion is
repeated till an EOS is received. An application has to exit when
receiving an EOS, however there is still valid data in a
non-packetezied input buffer, hence some frames are dropped.
This fix adds a parse callback for non-packeteized input, a new frame
will be sent to the subclass as soon as the input buffer has valid data
This fixes https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/665
Upon bitrate change, make sure to close the encoder otherwise
the encoder is not re-initialized and the target bitrate is
never reached, and the encoder was flushed at each frame
from this moment.
Regression introduced in f2b35abcab which replaced the call
that was closing the encoder by an early return to avoid
re-initialization.
gstwasapiutil.c(173) : warning C4715: 'gst_wasapi_device_role_to_erole': not all control paths return a value
gstwasapiutil.c(188) : warning C4715: 'gst_wasapi_erole_to_device_role': not all control paths return a value
The GstDeviceProvider isn't subclass of GstElement.
(gst-device-monitor-1.0:49356): GLib-GObject-WARNING **: 20:21:18.651:
invalid cast from 'GstWasapiDeviceProvider' to 'GstElement'
The first channel in memory for MFX_FOURCC_RGB4 (VA_FOURCC_ARGB or
GST_VIDEO_FORMAT_BGRA) is B, not A. In MSDK, channle B is used to access
data for RGB4 surface. In addition, the returned pointers for
MFX_FOURCC_AYUV and MFX_FOURCC_Y410 in gst_msdk_video_memory_map_full
were wrong too before this fix.
When the bitrate is changed in playing state the encoder issues a reconfig
that drains and recreates the underlaying hw encoder instance.
With this set of changes we ensure that all this work is only made when
the bitrate did actually change. It also tries to reuse the vpp buffer
pool and fixes the pool leak spotted when testing this feature.
When postpone_free_surface is TRUE, the output buffer is not writable,
however the base decoder needs a writable buffer as output buffer,
otherwise it will make a copy of the output buffer. As the underlying
memory is always lockable, so we may set the LOCKABLE flag for this buffer
to avoid buffer copy in the base class.
The refcount of the output buffer is 1 when postpone_free_surface is
FALSE, so needn't set the LOCKABLE flag for this case.
... instead of calculated display ratio from given PAR and DAR.
d3d11window calculates output display ratio
to decide padding area per window resize event. In the formula,
actual PAR is required to handle both 1:1 PAR and non-1:1 PAR.
Both MSDK and this plugin use mfxFrameAllocResponse for video and DMABuf
memory, it is possible that some GST buffers are still in use when calling
gst_msdk_frame_free, so add a reference count in the wrapper of
mfxFrameAllocResponse (GstMsdkAllocResponse) to make sure the underlying
mfx resources are still available if the corresponding buffer pool is in
use.
In addtion, currently all allocators for input or output share the same
mfxFrameAllocResponse pointer in an element, so it is possible that
the content of mfxFrameAllocResponse is updated for a new caps then all
GST buffers allocated from an old allocator will use this new content of
mfxFrameAllocResponse, which will result in unexpected behavior. In this
fix, we save the the content of mfxFrameAllocResponse in the corresponding
tructure to avoid such issue
Sample pipeline:
gst-launch-1.0 filesrc location=vp9_multi_resolutions.ivf ! ivfparse ! msdkvp9dec !
msdkvpp ! video/x-raw\(memory:DMABuf\),format=NV12 ! glimagesink
Otherwise it is possible that different wrappers share the same
mfxFrameAllocResponse pointer, so instead of caching the pointer, we may
cache the content of mfxFrameAllocResponse
For a skipped frame in VC1, MSDK returns the mfx surface of the reference
frame, so we have to make sure the corresponding surface for the
reference frame is not freed. In this fix, we postpone surface free because
we don't know whether a surface is referenced
Before this fix, the error is like as below:
New clock: GstSystemClock
0:00:00.181793130 23098 0x55f8a9d622d0 ERROR msdkdec
gstmsdkdec.c:622:gst_msdkdec_finish_task:<msdkvc1dec0> Couldn't find the
cached MSDK surface
Sample pipeline:
gst-launch-1.0 filesrc location=input_has_skipped_frame.wmv ! asfdemux !
vc1parse ! msdkvc1dec ! glimagesink
If the surface is not in use, we may release it even if GST_FLOW_OK is going
to be returned, which may avoid the issue of failing to get surface
available
This fixes the regression caused by commit c05acf4
GstAllocationParams::align is set to 31 in msdkdec/msdken/msdkvpp, hence
the stride align should be greater than or equal to 31, otherwise it
will result in issue
https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/861
(msdk: "GStreamer-CRITICAL: gst_buffer_resize_range failed" SPAM),
In addition, the stride should match the pitch alignment in the media driver,
otherwise it will result in some issues when a buffer is shared between
different elements, e.g. the NV12 issue mentioned in commit 3f2314a, which
can be reproduced by `gst-launch-1.0 vidoetestsrc ! msdkvpp !
video/x-raw\(memory:DMABuf\),format=NV12 ! glimagesink`
Fixed https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/861
For some hevc 10bit 4K encoding cases, the encoding process may be
slow, and MediaSDK surface can't be released in time before one other
available surface is needed. So add an extra surface for hevc encoding
to avoid this issue.
If the last flow was not GST_FLOW_OK, the encoding thread is not running
and there is nothing to pop from GAsyncQueue (this causes deadlock).
To prevent deadlock, just return the handle_frame without further encoding
process if the last flow was not GST_FLOW_OK. Note that the last flow
will be cleared per FLUSH_STOP and STREAM_START event.
The hard-coded upper bound 32 (or 48 depending on resolution) might waste
GPU memory and high resolution encoding causes OUT-OF-MEMORY allocation error
quite easily. This commit calculates the number of required pre-allocated
device memory based on encoding options and it can reduce the amount of device memory
used by nvenc.
NVDEC driver always uses input timestamp without adjustment
even if bframe encoding was enabled.
So DTS can be larger than PTS when bframe was enabled.
To ensure PTS >= DTS, we should adjust the timestamp manually
based on the PTS difference between the first
encoded frame and the second one. That's also the maximum PTS/DTS
difference.
To support rc-lookahead and bframe encoding, nvenc needs one more
staging queue, because NvEncEncodePicture can return NV_ENC_ERR_NEED_MORE_INPUT
but which was not considered so far.
As documented by NVENC programming guide, pending buffers should wait
other inputs until NvEncEncodePicture returns success.
New encoding flow is
- Submit raw picture buffer to encoder with NvEncEncodePicture
- The submitted input/output buffer pair will be queued to pending_queue
- If NvEncEncodePicture returned success, then move all pair in pending_queue
to final stage
- Otherwise, wait more input raw pictures.
Another change is dropping NV_ENC_LOCK_INPUT_BUFFER usage.
So now nvenc always uses CUDA memory input buffer. As a result,
both opengl and system memory handling are unified.
* The number of iteration is always one so the iteration is useless
and that makes code complicated.
* Also defining named structure can code mroe readable.
* g_free is null safe
New rate-control modes are introduced (if device can support)
* cbr-ld-hr: CBR low-delay high quality
* cbr-hq: CBR high quality
* vbr-hq: VBR high quality
Also, various configurable rate-control related properties are added.
Introducing new dynamic class between GstNvBaseEncClass and
each subclass to be able to access device specific properties and
capabilities from each subclass implementation side.
Add new macro for sink/src pad template to ensure no DMABuf caps
features are exposed on Windows. Some DMABuf caps features
were not handled by the commit 9ec62418c3
gst_buffer_make_writable() requires exclusive reference to the
GstMemory so the _make_writable() for the msdk buffer will result
to fallback system memory copy, because the msdk memory were initialized
with GST_MEMORY_FLAG_NO_SHARE flag.
Note that, disable sharing GstMemory brings high overhead but actually
the msdk memory objects can be shared over multiple buffers.
If the memory is not shareable, newly added GstAllocator::mem_copy will
create copied msdk memory.
Sometimes a HEVC/H265 stream doesn't have a valid profile but MSDK can
handle this stream. Like vaapih265dec, msdkh265dec may advertise the sink
caps without profile
DecodedOrder was deprecated in msdk-2017 version, but some customers
still use this for low-latency streaming of non-b-frame encoded streams,
which needs to output the frame at once
Asking decklink to render audio data seems to be based entirely on
the sample counts which completely disregards the timestamps
we pass to decklink. As a result, we need to explicitly check
for late buffers and drop them ourselves.
Do not restrict allowed maximum resolution depending on the
initial resolution. If new resolution is larger than previous one,
just re-init encode session.
Due to uncleared last flow, decoding after seek was never possible
(last_ret == GST_FLOW_FLUSHING).
nvdec dose not need to keep track of the previous flow return,
and actually the interest is data/even flow of the current handle_frame().
Implementing ::negotiate() method to support runtime output format
change. If downstream was reconfigured, baseclass will invoke
::negotiate() method, and nvdec should update output memory
type depending on downstream caps.
Input stream might be silently changed without ::set_format() call.
Since nvdec has internal parser, nvdec element can figure out the format change
by itself.
Register openGL resource only once per memory. Also if upstream
provides the registered information, reuse the information
instead of doing it again. This can improve performance dramatically
depending on system since the resource registration might cause
high overhead.
Introduce GstCudaGraphicsResource structure to represent registered
CUDA graphics resources and to enable sharing the information among
nvdec and nvenc. This structure can reduce the number of resource
registration which cause high overhead.
For openGL interoperability, nvdec uses cuGraphicsGLRegisterImage API
which is to register openGL texture image.
Meanwhile nvenc uses cuGraphicsGLRegisterBuffer API to registure openGL buffer object.
That means two kinds of graphics resources are registered per memory
when nvdec/nvenc are configured at the same time.
The graphics resource registration brings possibly high overhead
so the registration should be performed only once per resource
from optimization point of view.
Both g_list_delete_link and g_list_remove remove an element and free it,
so l->next is invalid (catched by valgrind) after calling g_list_delete_link
or g_list_remove
Returning MFX_WRN_INCOMPATIBLE_VIDEO_PARAM means MSDK detects some
incompatible parameters but it is resolved, and we may not regard
MFX_WRN_INCOMPATIBLE_VIDEO_PARAM as a fatal error. In this fix,
GST_FLOW_OK is returned but with a warning message so that a pipeline
may run to the end.
Fixes werror build:
In file included from ../sys/androidmedia/gstahcsrc.c:70:
../gst-libs/gst/interfaces/photography.h:27:2: error: "The GstPhotography interface is unstable API and may change in future." [-Werror,-W#warnings]
#warning "The GstPhotography interface is unstable API and may change in future."
^
../gst-libs/gst/interfaces/photography.h:28:2: error: "You can define GST_USE_UNSTABLE_API to avoid this warning." [-Werror,-W#warnings]
#warning "You can define GST_USE_UNSTABLE_API to avoid this warning."
^
video-direction property is common property in gstreamer. In addition,
both mirroring & rotation properties are marked as deprecated,
video-direction will override mirroring & rotation properties when they
are set explicitly
Fix https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/1058
gst_msdkdec_finish_task() may release all frames in
GstVideoDecoder object. In this case, allocate_output_buffer()
cannot get the oldest frame to allocate buffer.
So gst_msdkdec_handle_frame() should return GST_FLOW_OK for
letting gst_video_decoder_decode_frame() to send a new frame
for decoding.
Fixes#664.
Fixes#665.
When vpp rotation is 90 or 270, the output frame
should be rotated, too.
Example:
gst-launch-1.0 -vf videotestsrc \
! video/x-raw,width=720,height=480 \
! msdkvpp rotation=90 ! vaapisink
There is no NdkMediaCodecList API yet, but it is still better to isolate
JNI code. This will facilitate porting to a native API if Google ever
release one.
gst_query_get_n_allocation_pools > 0 does not guarantee that
the N th internal array has GstBufferPool object. So users should
check the returned GstBufferPool object from
gst_query_parse_nth_allocation_pool.
Async CUDA operation with default stream (NULL CUstream) is not much
beneficial than blocking operation since all CUDA operations which belong
to the CUDA context will be synchronized with the default stream's operation.
Note that CUDA stream will share all resources of the corresponding CUDA context
but which can help parallel operation similar to the relation between thread and process
The internal decoding state must be GST_NVDEC_STATE_PARSE before
calling CuvidParseVideoData(). Otherwise, nvdec will be confused
on decode callback as if the frame is decoding only frame and
the input timestamp of corresponding frame will be ignored.
Eventually one decoded frame will have non-increased PTS.
The destroy callback can be called just before the fìnalization of
GstMiniObject. So the nvdec object might be destroyed already.
Instead, store the GstCudaContext with increased ref to safely
unregister the CUDA resource.
Fix unexpected cropping with non 1:1 pixel aspect-ratio.
The actual buffer width/height should be passed to gst_d3d11_window_render(),
instead of the calculated resolution. The width/height
values are parameters for copying d3d11 video memory.
Also, aspect-ratio should be considered on resize callback
to decide render rectangle size.
YV12 format is supported by Nvidia NVENC without manual conversion.
So nvenc is exposing YV12 format at sinkpad template but there is some
missing point around uploading the memory to GPU.
Currently h264parser produces a field or a frame for
alignment=au for interlaced streams, but the flag
MFX_BITSTREAM_COMPLETE_FRAME needs a complete frame
or complementary field pair of data, this results in
broken images being output.
Some patches have been sent out to fix h264parser,
but they are pending on some unfinished work. In
order to make gstreamer-msdk decoding work properly
for interlaced streams before h264parser is fixed,
this flag will be removed temporarily and will be
added back once h264parser if fixed.
Related to:
https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/merge_requests/399https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/merge_requests/228
Instead of using the information we stored ourselves for the video frame
itself. Which was also the wrong one: it was the mode from the property,
not the autodetected one.
This fixes vanc extraction with mode=auto
The gst_cuda_result macro function is more helpful for debugging
than previous cuda_OK because gst_cuda_result prints the function
and line number. If the CUDA API return was not CUDA_SUCCESS,
gst_cuda_result will print WARNING level debug message with
error name, error text strings.
... and drop CUvideoctxlock usage. The CUvideoctxlock basically
has the identical role of cuda context push/pop but nvdec specific
way. Since we can share the CUDA context among encoders and decoders,
use CUDA context directly for accessing GPU API.
... and add support CUDA context sharing similar to glcontext sharing.
Multiple CUDA context per GPU is not the best practice. The context
sharing method is very similar to that of glcontext. The difference
is that there can be multiple context object on a pipeline since
the CUDA context is created per GPU id. For example, a pipeline
has nvh264dec (uses GPU #0) and nvh264device0dec (uses GPU #1),
then two CUDA context will propagated to all pipeline.
New object and helper functions can remove duplicated code
from nvenc/nvdec. Also this is prework for CUDA device context sharing
among nvdec(s)/nvenc(s).
We don't support negotiation with downstream but simply set caps based
on the buffers we receive. This prevents renegotiation to other formats,
and negotiation to NTSC in mode=auto in the beginning until the first
buffer is received.
As side-effect of this, also remove various other caps handling code
that was working around the behaviour of the default
BaseSrc::negotiate().
During GstVideoInfo conversion from GstCaps, interlace-mode is
inferred to progressive so unspecified interlace-mode should not cause any
negotiation issue. Simly set GST_PAD_FLAG_ACCEPT_INTERSECT flag
on sinkpad to fix issue.
Encoded bitstream might not have valid framerate. If upstream
provided non-variable-framerate (i.e., fps_n > 0 and fps_d > 0)
use upstream framerate instead of parsed one.
Encoding thread is terminated without any notification so
upstream streaming thread is locked because there is nothing
to pop from GAsyncQueue. If downstream returns error,
we need put SHUTDOWN_COOKIE to GAsyncQueue for chain function
can wakeup.