Output texture of d3d11 decoder cannot have the bind flag
D3D11_BIND_SHADER_RESOURCE (meaning that it cannot be used for shader
input resource). So d3d11convert (and it's subclasses) was copying
texture into another internal texture to use d3d11 shader.
It's obviously overhead and we can avoid texture copy for
colorspace conversion or resizing via ID3D11VideoProcessor
as it supports decoder output texture.
This commit would be a visible optimization for d3d11 decoder with
d3d11compositor use case because we can avoid texture copy per frame.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1718>
GstMemory object could be disposed if GstBuffer is not allocated
by GstD3D11BufferPool such as via gst_buffer_copy() and/or
gst_buffer_make_writable(). So attaching qdata on GstMemory
object would cause unnecessary view alloc/free.
By using view pool which is implemented in GstD3D11Allocator,
we can avoid redundant view alloc/free.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1716>
In order to know the chroma format, besides profile, subsampling_x and
subsampling_y are needed (Spec 7.2.2 Color config semantics). These values are
in GstVp9Parser but not in GstVp9Framehdr.
Also, bit_depth is available in parser but not frame header. Evenmore, those
values are copied to picture structure later.
In case of VA-API, to configure the pipeline, it is require to know the chroma
format and depth.
It is possible to know chroma and depth through caps coming from vp9parser, but
it requires string parsing. It would be less error prone to get these values
through the parser structure at new_sequence() virtual method.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1700>
Add new video composition element which is equivalent to compositor
and glvideomixer elements. When d3d11 decoder elements are used,
d3d11compositor can do efficient graphics memory handling
(zero copying or at least copying memory on GPU memory space).
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1323>
New d3d11colorconvert and d3d11scale elements will perform only
colorspace conversion and rescale, respectively. Those new elements
would be useful when only colorspace conversion or rescale is required
and the other part should be done by another elements.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1323>
Staging texture is used for memory transfer between system and
gpu memory. Apart from d3d11{upload,download} elements, however,
it should happen very rarely.
Before this commit, d3d11bufferpool was allocating at least one
staging texture in order to calculate cpu accessible memory size,
and it wasn't freed for later use of the texture unconditionally.
But it will increase system memory usage. Although GstD3D11memory
object is implemented so that support CPU access, most memory
transfer will happen in d3d11{upload,download} elements.
By this commit, the initial staging texture will be freed immediately
once cpu accessible memory size is calculated.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1627>
Note that newly added formats (YUY2, UYVY, and VYUY) are not supported
render target view formats. So such formats can be only input of d3d11convert
or d3d11videosink. Another note is that YUY2 format is a very common
format for hardware en/decoders on Windows.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1581>
If the run_async() method is expected to be called from streaming
thread and not from application thread, use INFINITE as timeout value
so that d3d11window can wait UI dispatcher thread in any case.
There is no way to get a robust timeout value from library side.
So the fixed timeout value might not be optimal and therefore
we should avoid it as much as possible.
Rule whether a timeout value can be INFINITE or not is,
* If the waiting can be cancelled by GstBaseSink:unlock(), use INFINITE.
GstD3D11Window:on_resize() is one case for example.
* Otherwise, use timeout value
Some details are, GstBaseSink:start() and GstBaseSink:stop() will be called
when NULL to READY or READY to NULL state change, so there will be no
chance for GstBaseSink:unlock() and GstBaseSink:unlock_stop()
to be called around them. So there is no other way then timeout way.
GstD3D11Window:consturcted() and GstD3D11Window:unprepare() are the case.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1461>
All subclasses are retrieving list to get target output frame, which
can be done by baseclass. And pass the ownership of the GstH264Picture
to subclass so that subclass can clear implementation dependent resources
before finishing the frame.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1449>
We don't need to duplicate a method for HRESULT error code to string
conversion. This patch is intended to
* Remove duplicated code
* Ensure FormatMessageW (Unicode version) and avoid FormatMessageA
(ANSI version), as the ANSI format is not portable at all.
Note that if "UNICODE" is not defined, FormatMessageA will be aliased
as FormatMessage by default.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1442>
Current shader code is not compatible with HLSL profile "ps_4_0_level_9_3"
or lower. So d3dcompiler cannot compile our shader code in that case.
Note that VirtualBox is one known driver which doesn't support currently
implemented shader code.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1343>
d3d11videosink has an advantage over d3dvideosink, such as
* Zero-copy playback with d3d11 decoders
* HDR rendering with 10-bit format/swapchain support
* UWP support
* Any system memory alignment/padding can be supported
* User can select target GPU device
And old d3dvideosink's functionality (e.g., navigation event, overlaycomposition)
can be covered by d3d11videosink
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1311>
If parent and child windows are running on different thread,
there is always a chance to cause deadlock as DefWindowProc() call
from child window thread might be blocked until the message
is handled by parent's window procedure.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1299>
GObject::dispose method can be called multiple times. As win32 d3d11window
has an internal thread and because GObject::dispose method could be called from the
thread, it might cause problems such as trying to join self-thread
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1299>
GstD3D11ColorConverter implementation is able to rescale video as well.
By doing colorspace conversion and rescale at once, we can save
one cycle of shader pipeline which will can save GPU resource.
Since this element can support color space conversion and rescale,
it's renamed as d3d11convert
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1275>
D3D11_CREATE_DEVICE_DEBUG flag will be used while creating d3d11 device
to activate debug layer. However, if system doesn't support the
debug layer for some reason, we should try to create d3d11 device
without the flag. Debug layer should be optional for device creation.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1217>
We should directly check the values of the `debug` and `optimization`
options instead.
`get_option('buildtype')` will return `'custom'` for most combinations
of `-Doptimization` and `-Ddebug`, but those two will always be set
correctly if only `-Dbuildtype` is set. So we should look at those
options directly.
For the two-way mapping between `buildtype` and `optimization`
+ `debug`, see this table:
https://mesonbuild.com/Builtin-options.html#build-type-options
DXVA supports two kinds of texture structure for DPB, one is
"1) texture array" and the other is "2) array of texture".
1) is a type of texture which is single ID3D11Texture2D object having
ArraySize greater than one. So the ID3D11Texture2D itself is a set of texture.
Each sub texture of this type mush have identical resolution, format and so on,
and the number of sub texture in a texture array is fixed.
2) is an array of usual ID3D11Texture2D object. That means each
ID3D11Texture2D is independent each other and might have different resolution as well.
Moreover, we can modify the number of frames of the array dynamically.
This type is more flexible than "1) texture array" in terms of dynamic
behavior and also this type of texture can be used for shader resource view
but "1) texture array" couldn't be.
If "2) array of texture" is supported by driver, DXVA spec is saying that
it's preferred format over "1) texture array" in terms of performance.
The set of supported color space by DXGI is not full combination of
our colorimetry. That means we should convert color space to one
of supported color space by DXGI. This commit modifies the color space
selection step so that d3d11window can find the best matching DXGI color space
first and then the selected input/output color space will be referenced
by shader and/or d3d11videoprocessor.
VP9 codec allows resizing reference frame by spec. Handling this case
is a bit tricky especially when the resizing happens on non-keyframe,
because pre-allocated decoder textures (i.e., dpb) have negotiated
resolution and to change resolution meanwhile decoding on non-keyframe,
each texture might need to be re-created, copied to new dpb somehow,
and re-negotiated with downstream.
Due to the complicated requirement of negotiation driven
resizing handling, this commit adds shader into d3d11decoder object
to resize only corresponding frames. Note that if the resolution change
is detected on keyframe, decoder will re-negotiate with downstream.
Not only any textures for decoder output view, any destination texture
which would be copied from decoder output texture need to be aligned too.
Otherwise driver sometimes crashed/hung (not sure why).
Resolution of NV12, P010, and P016 formats must be multiple of two.
Otherwise texture cannot be created. Instead of doing this alignment
per API consumer side, do this in buffer pool for simplicity.
Now that the system_frame_number is saved on the pictures we can use
gst_video_decoder_get_frame() helper instead of getting the full list
and looping over it.
On new_segment, the decoder is expected to negotiate. The decoder may want to
pre-allocate the needed buffers. Pass the max_dpb_size as this is needed to
determin how many buffers should be allocated.
This introduce a library which contains a set of base classes which
handles the parsing and the state tracking for the purpose of decoding
different CODECs. Currently H264, H265 and VP9 are supported. These
bases classes are used to decode with low level decoding API like DXVA,
NVDEC, VDPAU, VAAPI and V4L2 State Less decoders. The new library is
named gstreamer-codecs-1.0 / libgstcodecs.
This commit moves parsing code for superframe and frame header into
handle_frame() method, and removes parse() implementation from vp9decoder
baseclass.
The combination of
- multiple frames are packed in a given input buffer (i.e., superframe)
- reverse playback
seems to be complicated and also it doesn't work as intended in some case
* Remove redundant variables for width/height and par from GstD3D11Window.
GstVideoInfo holds all the values.
* Don't need to pass par to gst_d3d11_window_prepare().
It will be parsed from caps again
* Remove duplicated math
Fixing regression of the commit 9dada90108
gst_d3d11_result() will print warning message when HRESULT != S_OK.
However, since the retry is trivial stuff, check hr == E_PENDING first
and do not warn it.
The DXGI_PRESENT_ALLOW_TEARING flag might cause unexpected tearing
side effect. Setting it in fullscreen mode only seems to be
the correct usage as in the Microsoft's direct3d examples.
DXVA spec is saying that the size of bitstream buffer provided by hardware decoder
should be 128 bytes aligned. And also the host software decoder should
align the size of written buffer to 128 bytes. That means if the slice
(or frame in case of VP9) size is not aligned with 128 bytes,
the rest of non 128 bytes aligned memory should be zero-padded.
In addition to aligning implementation, some variables are renamed
to be more intuitive by this commit.
This implementation is similar to what we've done for nvcodec plugin.
Since supported resolution, profiles, and formats are device dependent ones,
single template caps cannot represent them, so this modification
will help autoplugging and fallback.
Note that the legacy gpu list and list of resolution to query were
taken from chromium's code.
gst_video_frame_copy will copy input frame to stating texture
of fallback frame. Then, we need to map fallback texture with GST_MAP_D3D11
flag to upload the staging texture to render texture. Otherwise
the render texture wouldn't be updated.
Source texture (decoder view) might be larger than destination (staging) texture.
In that case, D3D11_BOX structure should be passed to CopySubresourceRegion method
in order to specify the exact target area.
DXGI_SWAP_EFFECT_DISCARD cannot be used with dirty rect drawing feature
of IDXGISwapChain1::Present().
Note that IDXGISwapChain1 interface is available on Platform Update for Windows 7
and DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL is also the case.
Use resolution specified in caps for input_rect instead of
passed width and height value. The width and height might be modified
ones by d3d11videosink, then frame resolution might be different.
* Move decoding process to handle_frame
* Remove GstVideoDecoder::parse implementation
* Clarify flush/drain/finish usage
In forward playback case, have_frame() call will be followed by
handle_frame() but reverse playback is not the case.
To ensure GstVideoCodecFrame, the decoding process should be placed inside
of handle_frame(), instead of parse().
Since we don't support alignment=nal, the parse() implementation is not worth.
In order to fix broken reverse playback, let's remove the parse()
implementation and revisit it when adding alignment=nal support.
... and remove unused start, stop method from subclass.
Current implementation does not require subclass specific behavior
for the handle_frame() method.
Actually our buffer pool size and the number of backbuffer are
independent. In case of reverse playback, upstream might request
a lot of buffers (up to GOP size).
d3d11window holds one buffer to redraw client area per resize event.
When the input format is being changed, this buffer should be cleared
to avoid mismatch beween newly configured shader/videoprocessor and
the format of previously cached buffer.
Because the size of texture array cannot be updated dynamically,
allocator should block the allocation request. This cannot be
done at buffer pool side if this d3d11 memory is shared among
multiple buffer objects. Note that setting NO_SHARE flag to
d3d11 memory is very inefficient. It would cause most likey
copy of the d3d11 texture.
...for color space conversion if available
ID3D11VideoProcessor is equivalent to DXVA-HD video processor
which might use specialized blocks for video processing
instead of general GPU resource. In addition to that feature,
we need to use this API for color space conversion of DXVA2 decoder
output memory, because any d3d11 texture arrays that were
created with D3D11_BIND_DECODER cannot be used for shader resource.
This is prework for d3d11decoder zero-copy rendering and also
for conditional HDR tone-map support.
Note that some Intel platform is known to support tone-mapping
at the driver level using this API on Windows 10.
Don't specify the resolution of backbuffer. Then dxgi will let us know the
actual client area. When upstream resolution is chagned, updating the size
of backbuffer without the consideration for client size would cause mismatch
between them.
Use consistent memory layout between dxva and other shader use case.
For example, use DXGI_FORMAT_NV12 texture format instead of
two textures with DXGI_FORMAT_R8_UNORM and DXGI_FORMAT_R8G8_UNORM.
This reverts commit ddd13fc7c0
Dynamic usage can reduce the number of copy per frame but make
things complicated and the benefit seems to not significant.
Also since we don't provide _map() method for the dynamic usage,
application cannot read buffers which make "last-sample" property
unusable in case of d3d11videosink.