all msdk output surfaces come from out_alloc_resp, so the buffer count is not resizable.
we need set min_buffers, max_buffers to same size.
steps to reproduce
1. ffmpeg -f lavfi -i testsrc=duration=10:size=320x240:rate=30:decimals=3 -pix_fmt yuv420p -c:v libx265 ~/bits/hevc/test.265
2. GST_GL_PLATFORM=egl gst-launch-1.0 -v filesrc location=~/bits/hevc/test.265 ! h265parse ! msdkh265dec ! msdkvpp ! queue ! glimagesink
you will see error like this:
ERROR default gstmsdkvideomemory.c:77:gst_msdk_video_allocator_get_surface: failed to get surface available
ERROR msdkbufferpool gstmsdkbufferpool.c:270:gst_msdk_buffer_pool_alloc_buffer:<msdkbufferpool2> failed to create new MSDK memory
ERROR msdkvpp gstmsdkvpp.c:297:create_output_buffer:<msdkvpp0> failed to create output video buffer
ERROR msdkdec gstmsdkdec.c:699:gst_msdkdec_finish_task:<msdkh265dec0> Failed to finish frame
ERROR msdkdec gstmsdkdec.c:1085:gst_msdkdec_handle_frame:<msdkh265dec0> Failed to finish a task
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1278>
Add MediaFoundation AAC encoder element.
Before Windows 10, mono and stereo channels were supported audio channels
configuration by AAC encoder MFT. However, on Windows 10,
5.1 channels support was introduced.
To expose correct range of support format by this element
whatever the OS version is, this element will enumerate
all the supported format by the AAC encoder MFT
and then will configure sink/src templates while plugin init.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1280>
If parent and child windows are running on different thread,
there is always a chance to cause deadlock as DefWindowProc() call
from child window thread might be blocked until the message
is handled by parent's window procedure.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1299>
GObject::dispose method can be called multiple times. As win32 d3d11window
has an internal thread and because GObject::dispose method could be called from the
thread, it might cause problems such as trying to join self-thread
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1299>
New video capture implementation using WinRT Media APIs for UWP app.
Due to the strict permission policy of UWP, device enumeration and
open should be done via new WinRT APIs and to get permission from users,
it will invoke permission dialog on UI.
Strictly saying, this implementation is not a part of MediaFoundation
but structurally it's very similar to MediaFoundation API.
So we can avoid some code duplication by adding this implementation
into MediaFoundation plugin.
This implementation requires UniversalApiContract version >= 6.0
which is part of Windows 10 version 1803 (Redstone 4)
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1241>
There was a missing break for the 4:4:4 case which would break the sizeimage
calculation. We don't currently have hardware that supports 4:4:4, so this
code wasn't tested. This was detected by Coverity.
CID 1463592 1463591
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1283>
The Cedrus driver would otherwise choose 1KB buffer, which is too small.
This follows what some drivers do, which is simply to use the size a
packed raw image would have. Specifications do not really guaranty any minimum
compression ratio.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1268>
This adds support for slice based decoder like the Allwinner/Cedrus driver. In
order to keep things efficient, we hold the sink buffer until we reach the end
of the picture. Note that as we don't know which one is last, we lazy queue the
slices. This effectively introduces one slice latency.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1268>
GstD3D11ColorConverter implementation is able to rescale video as well.
By doing colorspace conversion and rescale at once, we can save
one cycle of shader pipeline which will can save GPU resource.
Since this element can support color space conversion and rescale,
it's renamed as d3d11convert
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1275>
This commit does the following things to fix compilation on FreeBSD:
1. Add required typedefs to linux/types-compat.h.
2. Remove unnecessary include linux/ioctl.h and replace linux/types.h
with linux/types-compat.h. Both files do not exist on FreeBSD.
3. Check the header including makedev macro. FreeBSD does not have
sys/sysmacros.h, and including it unconditionally causes error.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1259>
else gst_video_meta_validate_alignment will report error like
"videometa gstvideometa.c:416:gst_video_meta_validate_alignment: Stride of plane 0 defined in meta (384) is different from the one computed from the alignment (320)"
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1224>
In section 9.3.4 a), segment_feature_mode have 0 for absolute and 1 for delta,
while in 19.2, it says the opposite. But the reference code, which usually
rules over the text state that 1 means absolute:
if (hdr->update_data)
{
hdr->abs = bool_get_bit(bool);
And uses it with that meaning to decide weither to override the existing value
or just add the detla. This fixes multiple decoding issues.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1245>
D3D11_CREATE_DEVICE_DEBUG flag will be used while creating d3d11 device
to activate debug layer. However, if system doesn't support the
debug layer for some reason, we should try to create d3d11 device
without the flag. Debug layer should be optional for device creation.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1217>
The Microsoft Media Foundation (MF) is the successor of DirectShow.
This commit includes two kinds of video capture implementation,
one uses IMFSourceReader interface which is available since Windows Vista
and the other is based on IMFCaptureEngine interface which is available
since Windows 8.
Note that this new video source element cannot be used in UWP app
for now, since device activation using those APIs are not allowed by MS.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/760>
Introduce GST_USE_NV_STATELESS_CODEC environment to allow user to select
primary nvidia decoder implementation. In case the environment
GST_USE_NV_STATELESS_CODEC=h264 was set, old nvidia h264 decoder element
will not be registered. Instead, both nvh264dec and nvh264sldec
factory name will create gstcodecs based new nvidia h264 decoder element.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1198>
Introduce GstH264Decoder based Nvidia H.264 decoder element.
Similar the element factory name of to v4l2 stateless codec,
this element can be configured with factory name "gstnvh264sldec".
Note that "sl" in the name stands for "stateless"
For now, existing nvh264dec covers more profile and formats
(e.g., interlaced stream) than this implementation.
However, this implementation allows us to control lower level
parameters such as decoded picture buffer management and therefore
we can get a chance to improve performance in terms of latency.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1198>
The render width/height and the vinfo was only saved upon renegotiation. This
fixes the problem by saving this metadata at the same time the buffer is
saved. The saved copy of this is needed for expose() and drain() virtual functions.
This fixes various assertion that happens on drain query.
in C, & is weaker than the ! operator and clang is giving the following
error about it.
```
../subprojects/gst-plugins-bad/sys/msdk/gstmsdkdec.c:731:7: error: logical not is only applied to the left hand side of this bitwise operator [-Werror,-Wlogical-not-parentheses]
if (!gst_msdk_context_get_job_type (thiz->context) & GST_MSDK_JOB_DECODER) {
^ ~
../subprojects/gst-plugins-bad/sys/msdk/gstmsdkdec.c:731:7: note: add parentheses after the '!' to evaluate the bitwise operator first
if (!gst_msdk_context_get_job_type (thiz->context) & GST_MSDK_JOB_DECODER) {
^
( )
../subprojects/gst-plugins-bad/sys/msdk/gstmsdkdec.c:731:7: note: add parentheses around left hand side expression to silence this warning
if (!gst_msdk_context_get_job_type (thiz->context) & GST_MSDK_JOB_DECODER) {
^
( )
1 error generated.
```
Too many decode surface would waste GPU memory. Also it seems to be
introducing additional latency depending on stream. Since nvcodec
sdk version 9.0, CUVID parser API has been providing the minimum
required number of surface. By using it, we can save GPU memory
and reduce possible latency.
We should directly check the values of the `debug` and `optimization`
options instead.
`get_option('buildtype')` will return `'custom'` for most combinations
of `-Doptimization` and `-Ddebug`, but those two will always be set
correctly if only `-Dbuildtype` is set. So we should look at those
options directly.
For the two-way mapping between `buildtype` and `optimization`
+ `debug`, see this table:
https://mesonbuild.com/Builtin-options.html#build-type-options
This code add required mechanism to try and allocate (not implemented yet)
otherwise wait for more buffers. This also comes with mechanism to terminate
the wait on flush or PAUSED_TO_READY transitions.
Use a goto to ensure that for all cases we cleanup the current picture state.
And move the src buffer allocation higher, so we don't queue a bitstream
buffer if we don't have a picture buffer to decode into.
This allow negotiating the output format through caps. Some drivers can
pipeline the decoder buffer through an image processor. This only support
colorspace conversion for now.
This implements driver stride support but only for single allocation buffers.
This code is imported from the original v4l2 plugin and adapted to the new
helper context.
In some case, when downstream does not support GstVideoMeta, we need to
normalize the stride and offset of the buffer so that downstream can render
properly with a GstVideoMeta. This code is not called when GstVideoMeta is
supported downstream.
In this patch we strictly set the GstVideoMeta width/height to the coded width
and height. Further patches will add stride support and frame copying when
downstream does not support GstVideoMeta.
Don't let downstream cause a renegotiation at random point in time. This would
lead to spurious renegotiation and the decoder state may not be recoverable.
We now pass the controls, associated to a request, queue the bitstream, qeueue
a picture buffer to decode into and finally queue the request. This now runs
until the buffer pool is exhausted. The next step will be to dequeue.
In this patch we fill the control structure with the bitstream paramter and
copy the bitstream data into V4L2 memory. Slice paramters are only the subset
of what Hantro needs, without any support for interlaced content.
This is a pooling allocator and the buffer pool does nothing other then
reusing the GstBuffer structure. Note that the pool is an internal pool, so
the start/stop/set_config virtual functions are not implemented.
This introduces the skeleton of the H264 decoder. The plugin will list the
devices and register a subclass of the GstV4L2CodecH264Dec base class. The
subclass will pick the required specific information from the GstV4L2Device
stored in the subclass structure.
This is a GstObject which will be used to hold on media and video device file
descriptor and provide abstracted ioctl calls with these descriptor. At the
moment this helper contains just enough to enumerate the supported format.
This part will be used by the plugin to register the CODEC specific elements..
Most of the features we need are very early or not expose yet in the uAPI.
Using an internal copy ensure that we everything we need is defined avoiding
to add load of checks and conditionnal code.
This introduces a GstV4L2CodecDevice structure and helper to retrieve a
list of CODEC device drivers. In order to find the device driver we
enumerate all media devices with UDEV. We then get the media controller
topology and locate a entity with function encoder or decoder and make
sure it is linked to two V4L2 IO entity pointing to the same device
node.
The media driver can support HEVC 8-bit 422 encoding for non-lowpower
mode since ICL[1], so VPP is not needed for this case.
Sample pipeline:
gst-launch-1.0 videotestsrc ! video/x-raw,format=YUY2 ! msdkh265enc ! \
filesink location=output.h265
[1] https://github.com/intel/media-driver#decodingencoding-features
DXVA supports two kinds of texture structure for DPB, one is
"1) texture array" and the other is "2) array of texture".
1) is a type of texture which is single ID3D11Texture2D object having
ArraySize greater than one. So the ID3D11Texture2D itself is a set of texture.
Each sub texture of this type mush have identical resolution, format and so on,
and the number of sub texture in a texture array is fixed.
2) is an array of usual ID3D11Texture2D object. That means each
ID3D11Texture2D is independent each other and might have different resolution as well.
Moreover, we can modify the number of frames of the array dynamically.
This type is more flexible than "1) texture array" in terms of dynamic
behavior and also this type of texture can be used for shader resource view
but "1) texture array" couldn't be.
If "2) array of texture" is supported by driver, DXVA spec is saying that
it's preferred format over "1) texture array" in terms of performance.
The set of supported color space by DXGI is not full combination of
our colorimetry. That means we should convert color space to one
of supported color space by DXGI. This commit modifies the color space
selection step so that d3d11window can find the best matching DXGI color space
first and then the selected input/output color space will be referenced
by shader and/or d3d11videoprocessor.
Adds properties to the devices listed in GstDeviceMonitor by the
applemedia plugin.
These properties are:
- device.api (always set to "avf")
- avf.unique_id
- avf.model_id
- avf.manufacturer (except on iOS)
- avf.has_flash
- avf.has_torch
Everything except device.api is taken directly from the AVCaptureDevice object
provided by AVFoundation.
VP9 codec allows resizing reference frame by spec. Handling this case
is a bit tricky especially when the resizing happens on non-keyframe,
because pre-allocated decoder textures (i.e., dpb) have negotiated
resolution and to change resolution meanwhile decoding on non-keyframe,
each texture might need to be re-created, copied to new dpb somehow,
and re-negotiated with downstream.
Due to the complicated requirement of negotiation driven
resizing handling, this commit adds shader into d3d11decoder object
to resize only corresponding frames. Note that if the resolution change
is detected on keyframe, decoder will re-negotiate with downstream.
Not only any textures for decoder output view, any destination texture
which would be copied from decoder output texture need to be aligned too.
Otherwise driver sometimes crashed/hung (not sure why).
Resolution of NV12, P010, and P016 formats must be multiple of two.
Otherwise texture cannot be created. Instead of doing this alignment
per API consumer side, do this in buffer pool for simplicity.
Now that the system_frame_number is saved on the pictures we can use
gst_video_decoder_get_frame() helper instead of getting the full list
and looping over it.
On new_segment, the decoder is expected to negotiate. The decoder may want to
pre-allocate the needed buffers. Pass the max_dpb_size as this is needed to
determin how many buffers should be allocated.
This introduce a library which contains a set of base classes which
handles the parsing and the state tracking for the purpose of decoding
different CODECs. Currently H264, H265 and VP9 are supported. These
bases classes are used to decode with low level decoding API like DXVA,
NVDEC, VDPAU, VAAPI and V4L2 State Less decoders. The new library is
named gstreamer-codecs-1.0 / libgstcodecs.
This commit moves parsing code for superframe and frame header into
handle_frame() method, and removes parse() implementation from vp9decoder
baseclass.
The combination of
- multiple frames are packed in a given input buffer (i.e., superframe)
- reverse playback
seems to be complicated and also it doesn't work as intended in some case
The most common audio sample rate in AV streams is 48kHz, and the most
common device output sample rate is 48kHz. This allows handing of 48kHz
input streams without resampling.
Remove comments about avoiding the use of 48kHz.
This change is needed to support 2K DCI video modes.
Version 10.8 of the Decklink SDK supported DCI video modes for output
only. This updated version drops that restriction.
The current latest version of the Decklink SDK is 11.5, however
the gstreamer decklink plugin is not compatible with API changes
introduced in version 11 of the SDK. Therefore I have opted to upgrade
to the latest 10.x version instead.
* Remove redundant variables for width/height and par from GstD3D11Window.
GstVideoInfo holds all the values.
* Don't need to pass par to gst_d3d11_window_prepare().
It will be parsed from caps again
* Remove duplicated math
Fixing regression of the commit 9dada90108
gst_d3d11_result() will print warning message when HRESULT != S_OK.
However, since the retry is trivial stuff, check hr == E_PENDING first
and do not warn it.
The DXGI_PRESENT_ALLOW_TEARING flag might cause unexpected tearing
side effect. Setting it in fullscreen mode only seems to be
the correct usage as in the Microsoft's direct3d examples.
DXVA spec is saying that the size of bitstream buffer provided by hardware decoder
should be 128 bytes aligned. And also the host software decoder should
align the size of written buffer to 128 bytes. That means if the slice
(or frame in case of VP9) size is not aligned with 128 bytes,
the rest of non 128 bytes aligned memory should be zero-padded.
In addition to aligning implementation, some variables are renamed
to be more intuitive by this commit.
This implementation is similar to what we've done for nvcodec plugin.
Since supported resolution, profiles, and formats are device dependent ones,
single template caps cannot represent them, so this modification
will help autoplugging and fallback.
Note that the legacy gpu list and list of resolution to query were
taken from chromium's code.
gst_video_frame_copy will copy input frame to stating texture
of fallback frame. Then, we need to map fallback texture with GST_MAP_D3D11
flag to upload the staging texture to render texture. Otherwise
the render texture wouldn't be updated.
Source texture (decoder view) might be larger than destination (staging) texture.
In that case, D3D11_BOX structure should be passed to CopySubresourceRegion method
in order to specify the exact target area.
DXGI_SWAP_EFFECT_DISCARD cannot be used with dirty rect drawing feature
of IDXGISwapChain1::Present().
Note that IDXGISwapChain1 interface is available on Platform Update for Windows 7
and DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL is also the case.
Use resolution specified in caps for input_rect instead of
passed width and height value. The width and height might be modified
ones by d3d11videosink, then frame resolution might be different.
* Move decoding process to handle_frame
* Remove GstVideoDecoder::parse implementation
* Clarify flush/drain/finish usage
In forward playback case, have_frame() call will be followed by
handle_frame() but reverse playback is not the case.
To ensure GstVideoCodecFrame, the decoding process should be placed inside
of handle_frame(), instead of parse().
Since we don't support alignment=nal, the parse() implementation is not worth.
In order to fix broken reverse playback, let's remove the parse()
implementation and revisit it when adding alignment=nal support.
... and remove unused start, stop method from subclass.
Current implementation does not require subclass specific behavior
for the handle_frame() method.
Actually our buffer pool size and the number of backbuffer are
independent. In case of reverse playback, upstream might request
a lot of buffers (up to GOP size).
The class data with the caps in it will be leaked if the element is
registered but never instantiated. There is no way around this. Mark
the caps as such so that the leaks tracer does not warn about it.
This is the same as pad template caps getting leaked, which are also
marked as may-be-leaked. These objects are initialized exactly once,
and are 'global' data.
video/x-vp9 is required in the src pad, however the output includes a
IVF header, which makes the pipeline below doesn't work
gst-launch-1.0 videotestsrc ! msdkvp9enc ! msdkvp9dec ! fakesink
Since mfx 1.26, the VP9 encoder supports bitstream without IVF header,
so in this patch, the mfx version is checked and msdkvp9enc is enabled
only if mfx 1.26+ is available
Android 25 added support for i-frame-interval to be a floating
point value. Store the property as a float and use the newer
version when it's available.
Android 19 added an API for dynamically changing the bitrate in a running
codec.
Also make it so that even when not update-able at runtime, parameters will at least
be stored so that they take effect the next the codec is restarted.
d3d11window holds one buffer to redraw client area per resize event.
When the input format is being changed, this buffer should be cleared
to avoid mismatch beween newly configured shader/videoprocessor and
the format of previously cached buffer.
Because the size of texture array cannot be updated dynamically,
allocator should block the allocation request. This cannot be
done at buffer pool side if this d3d11 memory is shared among
multiple buffer objects. Note that setting NO_SHARE flag to
d3d11 memory is very inefficient. It would cause most likey
copy of the d3d11 texture.
...for color space conversion if available
ID3D11VideoProcessor is equivalent to DXVA-HD video processor
which might use specialized blocks for video processing
instead of general GPU resource. In addition to that feature,
we need to use this API for color space conversion of DXVA2 decoder
output memory, because any d3d11 texture arrays that were
created with D3D11_BIND_DECODER cannot be used for shader resource.
This is prework for d3d11decoder zero-copy rendering and also
for conditional HDR tone-map support.
Note that some Intel platform is known to support tone-mapping
at the driver level using this API on Windows 10.
We've been using NvEncodeAPICreateInstance method to find the supported API
version, but that seems to be insufficient since there is a case
where plugin failed in opening encoding session even if NvEncodeAPICreateInstance
succeeded. Asking driver about the version would be the most certain way.
User is seeing corrupted display when running `videotestsrc !
video/x-raw,format=NV12,width=xxx,height=xxx ! msdkh265enc ! msdkh265dec
! glimagesink` with changed frame size, e.g. from 1920x1080 to 1920x240
The root cause is a same dmabuf fd is used for frames with
different size, which causes some unexpected result. This patch requires
cached response is used for frames with same size only for DMABuf, so a
dmabuf fd can't be used for frames with different size any more.
Don't specify the resolution of backbuffer. Then dxgi will let us know the
actual client area. When upstream resolution is chagned, updating the size
of backbuffer without the consideration for client size would cause mismatch
between them.
Setting the CUVID_PKT_DISCONTINUITY implies clearing any past information
about the stream in the decoder. The GStreamer discont flag is used for
discontinuity caused by a seek, for first buffer and if a buffer was
dropped. In the first two cases, the parsers and demuxers should ensure we
start from a synchronization point, so it's unlikely that delta will be
matched against the wrong state.
For packet lost, the discontinuity flag will prevent the decoder from doing
any concealment, with a result that ca be much worst visually, or freeze the
playback until an IDR is met. It's better to let the decoder handle that for
us.
Removing this flag, also workaround a but in NVidia parser that makes it
ignore our ENDOFFRAME flag and increase the latency by one frame.
This sets the CUVID_PKT_ENDOFPICTURE flag in order to inform the decoder that
we have a complete picture. This should remove one frame latency otherwise
introduce by NVidia parser.
This patch fixed compiler warning below:
[1/4] Compiling C object 'sys/msdk/dc44ea0@@gstmsdk@sha/gstmsdkvpp.c.o'.
../../gst-plugins-bad/sys/msdk/gstmsdkvpp.c: In function
‘gst_msdkvpp_context_prepare’:
../../gst-plugins-bad/sys/msdk/gstmsdkvpp.c:214:7: warning: suggest
parentheses around operand of ‘!’ or change ‘&’ to ‘&&’ or ‘!’ to ‘~’
[-Wparentheses]
Our context is non-persistent, and we propagate it throughout the
pipeline. This means that if we try to reuse any gstmsdk element by
removing it from the pipeline and then re-adding it, we'll clone the
mfxSession and create a new gstmsdk context as a child of the old one
inside `gst_msdk_context_new_with_parent()`.
Normally this only allocates a few KB inside the driver, but on
Windows it seems to allocate tens of MBs which leads to linearly
increasing memory usage for each PLAYING->NULL->PLAYING state cycle
for the process. The contexts will only be freed when the pipeline
itself goes to `NULL`, which would defeat the purpose of dynamic
pipelines.
Essentially, we need to optimize the case in which the element is
removed from the pipeline and re-added and the same context is re-set
on it. To detect that case, we set the context on `old_context`, and
compare it to the new one when preparing the context. If they're the
same, we don't need to do anything.
Fixes https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/946
Split it out into a separate function with early exits to make the
flow clearer, and document what the function is doing clearly.
No functional changes.
We weren't using the correct calling convention when calling CUDA and
CUVID APIs. `CUDAAPI` is `__stdcall` on Windows. This was working fine
on x64 because `__stdcall` is ignored and there's no special calling
convention. However, on x86, we need to use `__stdcall`.
Before this change decoder used the oldest frame in the list to pair it
with the decoded surface. This only works when there's a perfect stream
like HEADERS,SYNCPOINT,DELTA...
When playing RTSP streams we can get imperfect streams like HEADERS,
DELTA,SYNCPOINT,DELTA... In this case decoder drops the frames
between HEADERS and SYNCPOINT which leads into using wrong PTS on
the output frames.
With this change we inject the input PTS into the bitstream and use it
to align the internal frame list with the actually decoded position.
Fixes playback with:
```
gst-launch-1.0 rtspsrc location=... latency=0 drop-on-latency=1 ! ...
```
Hard-coded 16x16 resolution is likely to differ from the device's support
in most cases. If we can use NV_ENC_CAPS_WIDTH_MIN and NV_ENC_CAPS_HEIGHT_MIN,
update pad template with returned value.
need_reconfig is added to allow sub class requires a reconfig when
the input frame or the MetaData (e.g. GstVideoRegionOfInterestMeta)
attached to the input frame is changed.
`pipe()` isn't used since 15927b6511,
and `socketpair()` from `#include <sys/socket.h>` is used only in the
examples. In practice, you can use probably also use anything that
allows you to create fd pairs, such as named pipes or anonymous pipes.
We use the cross-platform GstPollFD API in the plugin.
Use consistent memory layout between dxva and other shader use case.
For example, use DXGI_FORMAT_NV12 texture format instead of
two textures with DXGI_FORMAT_R8_UNORM and DXGI_FORMAT_R8G8_UNORM.
This reverts commit ddd13fc7c0
Dynamic usage can reduce the number of copy per frame but make
things complicated and the benefit seems to not significant.
Also since we don't provide _map() method for the dynamic usage,
application cannot read buffers which make "last-sample" property
unusable in case of d3d11videosink.
Commit a1584b6 caused big performance drop if the downstream element
is not a msdk element because it is very slow to read data from video
memory directly.
This reverts commit a1584b6f99.
If 8 bit are required by the device/mode then it will be converted internally
by the SDK, but the SDK won't automatically convert from 8 to 10 bit. As
such, always use 10 bit VANC.
Some devices require configuring also a 10 bit video format when using
10 bit VANC is required but those would fail regardless and the
application would have to configure the correct video format.
With newer versions of the SDK this information can be retrieved via the
BMDDeckLinkVANCRequires10BitYUVVideoFrames flag but we don't use a new
enough SDK version yet to extract this information.
Although the target platform of D3D11 decoding API are both desktop and UWP app,
DXVA header is blocked by "WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP)"
which is meaning that that's only for desktop app.
To workaround this inconsistent annoyingness, we need to define WINAPI_PARTITION_DESKTOP
regardless of target WinAPI partition.
The codec profile should be consistent with the frame fourcc code, this
fixes pipeline below:
gst-launch-1.0 videotestsrc ! \
video/x-raw,width=320,height=240,format=P010_10LE ! msdkvp9enc ! \
fakesink
The frame width and height is rounded up to 128 and 32 since commit
8daac1c, so the width, height for initialization should be rounded up to
128 and 32 too because the MSDK VP9 encoder will do some check on width
and height.
Sample pipeline:
gst-launch-1.0 videotestsrc ! \
video/x-raw,width=320,height=240,format=NV12 ! msdkvp9enc ! fakesink
Renegotiation was implemented for bitrate change. We can re-use
the same sequence when video info changes except that this can be
executed right away when receiving the new input format. I.e. no
need to wait for the next call to handle_frame.
The block that sets use_video_memory flag is after the
the condition `if gst_msdk_context_prepare` but it
always returns false when there is no other msdk elements.
So the decoder ends up with use_video_memory as FALSE.
Note that msdkvpp always set use_video_memory as TRUE.
When use_video_memory is FALSE then the msdkdec allocates
the output frames with posix_memalign (see gstmsdksystemmemory.c).
The result is then copied back to the GstVideoPool's buffers
(or to the downstream pool's buffers if any).
When use_video_memory is TRUE then the msdkdec uses vaCreateSurfaces
to create vaapi surfaces for the hw decoder to decode into
(see gstmsdkvideomemory.c). The result is then copied to either
the internal GstVideoPool and to the downstream pool if any.
(vaDeriveImage/vaMapBuffer is used in order to read the surfaces)
Use boolean instead of GstFlowReturn as declared.
Note that since base class does not check return value of GstVideoDecoder::flush(),
this would not cause any change of behavior.
https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/merge_requests/924
is trying to use video memory for decoding on Linux, which reveals a
hidden bug in msdkdec.
For video memory, it is possible that a locked mfx surface is not used
indeed and it will be un-locked later in MSDK, so we have to check the
associated MSDK surface to find out and free un-used surfaces, otherwise
it is easy to exhaust all pre-allocated mfx surfaces and get errors below:
0:00:00.777324879 27290 0x564b65a510a0 ERROR default
gstmsdkvideomemory.c:77:gst_msdk_video_allocator_get_surface: failed to
get surface available
0:00:00.777429079 27290 0x564b65a510a0 ERROR msdkbufferpool
gstmsdkbufferpool.c:260:gst_msdk_buffer_pool_alloc_buffer:<msdkbufferpool0>
failed to create new MSDK memory
Note the sample code in MSDK does similar thing in
CBuffering::SyncFrameSurfaces()
A ID3D11Texture2D memory can consist of multiple planes with array.
For array typed memory, GstD3D11Allocator will allocate new GstD3D11Memory
with increased reference count to the ID3D11Texture2D but different array index.
Even if one of downstream d3d11 elements can support dynamic-usage memory,
another one might not support it. Also, to support dynamic-usage,
both upstream and downstream d3d11device must be the same object.
If d3d11colorconvert element is configured, do color space conversion
regardless of the device type whether it's S/W emulation or real H/W.
Since d3d11colorconvert is no more a child of d3d11videosinkbin,
we don't need this behavior. Note that previous code was added to
avoid color space conversion from d3d11videosink if no hardware
device is available (S/W emulation of d3d11 is too slow).
d3d11upload should be able to support upstream d3d11 memory, not only system memory.
Fix for following pipeline
d3d11upload ! "video/x-raw(memory:D3D11Memory)" ! d3d11videosink
borderless top-most style full screen mode support.
Basically fullscreen toggle mode is disabled by default. To enable it
use "fullscreen-toggle-mode" property to allow fullscreen mode change
by user input and/or property.
In some cases, rendering and dxgi (e.g., swapchain) APIs should be
called from window message pump thread, but current design (dedicated d3d11 thread)
make it impossible. To solve it, change concurrency model to locking based one
from single-thread model.
In earlier implementation of d3d11videosink where no shader was implemented,
the aspect ratio and render size were adjusted by manipulating the backbuffer size
with unintuitive formula. Since now we do color conversion and resize using
shader, we can remove the hack.
window event queue now does not lock on the class lock, so we can now shut
it down without releasing the class lock, thus avoiding a potential race when
stopping the sink.
... and use SetParent() WIN32 API when external window is used.
Depending on DXGI swap effect, the external window might not be
reusable by another backend. To preserve the external window's property
and setting, drawing to internal window seems to be safer way.
Otherwise GstVideoDecoder is not finalized and
resources are leaked.
Somehow GST_TRACERS="leaks" GST_DEBUG="GST_TRACER:7" did not catch it.
Valgrind output:
==31645== 22,480 (1,400 direct, 21,080 indirect) bytes in 5 blocks are definitely lost in loss record 5,042 of 5,049
==31645== at 0x4C2FB0F: malloc
==31645== by 0x51D9E88: g_malloc
==31645== by 0x51FA7B5: g_slice_alloc
==31645== by 0x51FAC68: g_slice_alloc0
==31645== by 0x58D9984: g_type_create_instance
==31645== by 0x58BA344: g_object_new_with_properties
==31645== by 0x58BADA0: g_object_new
==31645== by 0x8ECA966: gst_video_decoder_init
==31645== by 0x58D99E7: g_type_create_instance
==31645== by 0x58BA344: g_object_new_with_properties
An issue can be seen when using msdkh265enc with bitrate change in
playing state. The root cause is the corresponding plugin is loaded
again.
Returning MFX_ERR_UNDEFINED_BEHAVIOR from MSDK just means the plugin has
been loaded, so we may ignore this error when doing configuation again
in the sub class, otherwise the pipeline will be interrupted
If d3d11window does not convert format internally, shader resource view
is not required. Note that shader resource view is used for
color conversion using shader but when conversion is not required,
we just copy input input texture to backbuffer.
In theory it should not happen but it happened to me
in some cases where it failed to allocate some video
buffers so this was a consequence of a corner case.
Better to be safe than sorry.
Can happen if the oldest frame is the current frame
and if gst_video_decoder_finish_frame failed in which
case the current is unref and then drop instead of
just drop.
This patch also removes some assumptions, it was strange
to call unref and finish_frame in gst_msdkdec_finish_task.
In principle when owning a frame, the code should either
unref, or drop or finish.
D3D11 dynamic texture is a special memory type, which is mainly used for
frequent CPU write access to the texture. For now, this texture type
does not support gst_memory_{map,unmap}
* Create staging texture only when the CPU access is requested.
Note that we should avoid the CPU access to d3d11 memory as mush as possible.
Incoming d3d11upload and d3d11download will take this GPU memory upload/download.
* Upload/Download texture memory from/to staging only if it needed, similar to
GstGL PBO implementation.
* Define more dxgi formats for future usage (e.g., color conversion, dxva2 decoder).
Because I420_* formats are not supported formats by dxgi, each plane should
be handled likewise GstGL separately, but NV12/P10 formats might be supported ones.
So we decide the number of d3d11memory per GstBuffer for video memory depending on
OS version and dxgi format. For instance, if NV12 is supported by OS,
only one d3d11memory with DXGI_FORMAT_NV12 texture can be allocated by this commit.
One use case of such texture is DXVA. In case DXVA decoder, it might need to produce decoded data
to one DXGI_FORMAT_NV12 instead of seperate Y and UV planes.
Such behavior will be controlled via configuration of GstD3D11BufferPool and
default configuration is separate resources per plane.
Depending on selected feature level, d3d11 API usage can be different.
Instead of querying the selected feature level by user whenever required,
store it once by d3d11device.
Do not accept any GstD3D11Device context which has different adapter
index from the required one. For example, if a d3d11 element is expecting
d3d11 device with adapter 1 (i.e., the second GPU), any d3d11 device
context having different adapter could not be shared with
the d3d11 element.
Make them consistent with cuda context utils functions.
Put in-only parameter before all in-out parameters, and add _handle()
suffix to native handle getter functions.
In certain cases, the sink's buffer pool will not call the parent's
release_buffer method, so the pool does not clean up properly
after the buffer is released.
Since macOS Mojave (10.14), video permissions have to be explicitly
granted by a user in order to open a video device such as a camera.
This commit adds a check for the current permission status, and tries
to request for permission if applicable.
The whole `src_read()` function is a hot loop since the ringbuffer
thread is waiting on us, and printing to the console from inside it
can easily cause us to miss our deadline.
F.ex., if you had GST_DEBUG=3 and we accidentally missed a device
period, we'd trigger the "reported glitch" warning, which would cause
us to miss another device period, and so on. Let's reduce the log
level so that GST_DEBUG=3 is more usable, and only print buffer flag
info when it's actually relevant.
Some audio drivers return varying amounts of data per ::GetBuffer
call, instead of following the device period that they've told us
about in `src_prepare()`.
Previously, we would just drop those extra buffers hoping that the
extra buffers were temporary (f.ex., a startup 'burst' of audio data).
However, it seems that some audio drivers, particularly on older
Windows versions (such as Windows 10 1703 and older) consistently
return varying amounts of data.
Use GstAdapter to smooth that out, and hope that the audio driver is
locally varying but globally periodic.
Initially reported in https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/808
We were miscalculating the device period, i.e. the number of frames
we'll get from WASAPI in each IAudioClient::GetBuffer call, due to
a calculation mistake (truncate instead of round).
For example, on my machine when the aux input is set to 44.1KHz, the
reported device period is 101587, which comes out to 447.998 frames
per ::GetBuffer call. In reality we will, of course, get 448 frames
per call, but we were truncating, so we expected 447 and were
discarding one frame every time. This led to glitching, and skew over
time.
Interestingly, I can only see this with 44.1Khz. 48Khz/96Khz are fine,
because the device period is a more 'even' number.
Fixes https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/806
The for loop in gst_msdkdec_handle_frame is error prone
about how it manages surfaces. Because sometimes it sets
the surface variable to NULL and sometimes it needs to free
it right away. So better to print an error if surfaces are
leaked to help with any change around the loop.
msdk plugin is not used for sofware encode/decode as there are better
solutions available. Also, with MFX_IMPL_AUTO_ANY, if software decode
is not supported, the plugin will still load, but will then fail when trying to
run the (autoplugged) pipeline. With MFX_IMPL_HARDWARE_ANY,
the plugin fails and a better software decoder is auto-plugged.
GstNvBaseEnc::n_bufs was set from the previous encoding session
but it wasn't cleared after stop. That might result to invalid memory
access at the next start (no encoded data) and then stop sequence.
Instead of defining a variable for array length, use GArray::len
directly to avoid such confusion.
Can be reproduced with:
videotestsrc ! x264enc key-int-max=$N ! \
h264parse ! msdkh264dec ! fakesink sync=1
It happens with any gop size but the smaller is the distance N
between key frames, the quicker it is leaking.
Fixes#1023
In case of pkg-config we need to create the include directories object
from the path using include_directories(). For INTELMEDIASDKROOT or
MFX_HOME we need to add the alternate include path ./include/mfx as
Intel MediaSDK now puts the headers there.
This adds a check to avoid draining when the imported buffers are in
fact own by kmssink. This happens since we export our kms buffer as
DMABuf. They are not really imported back as we pre-fill the cache,
but uses the same format as if they were external. This fixes
performance issues seen with videocrop2-test (found in -good).
Draining systematically on caps changes was a hack. Instead, properly
save the render information used to render last_render, and use that
information to drain. This fixes performance issues met with video crop
meta and per frame caps changes.
By passing NULL to `g_signal_new` instead of a marshaller, GLib will
actually internally optimize the signal (if the marshaller is available
in GLib itself) by also setting the valist marshaller. This makes the
signal emission a bit more performant than the regular marshalling,
which still needs to box into `GValue` and call libffi in case of a
generic marshaller.
Note that for custom marshallers, one would use
`g_signal_set_va_marshaller()` with the valist marshaller instead.
In future, a sub class of GstMsdkEncClass may decide a native format by
using this method, e.g. JPEG encoder may accept YUY2 input, however the
current implemation needs a conversion from YUY2 to NV12 before encoding.
In addtion, a sub class may choose a format for encoding if the input
format is not supported by MSDK, e.g. the current implemation does
UYVY->NV12 if the input format is UYVY. We may do UYVY->YUY2 for JPEG
encoder in future
MFX_FOURCC_BGR4 is mapped to VA_FOURCC_ABGR and JPEG encoder needs a
MFX_FOURCC_BGR4 frame for internal usage when the input format is
MFX_FOURCC_RGB4
This is a preparation for supporting native formats of JPEG encoder
Instead of using a proxy of `is_packetized` flag this patch
replaces it with the accessor to that flag in decoder base class,
avoiding probable mismatches.
commit 55c0d720 added the capability to handle non-packetized bitstream,
and there is a loop to handle multiple frames in a non-packetized buffer
in gst_msdkdec_handle_frame. However it is possible that a
non-packetized buffer still contains valid data but there is no long any
pending unfinished frame. Currently gst_video_decoder_decode_frame is
invoked to send a new frame with new input data, the situaltion is
repeated till an EOS is received. An application has to exit when
receiving an EOS, however there is still valid data in a
non-packetezied input buffer, hence some frames are dropped.
This fix adds a parse callback for non-packeteized input, a new frame
will be sent to the subclass as soon as the input buffer has valid data
This fixes https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/665