Since our decoder DPB texture pool cannot be grown once it is
configured, we should pre-allocate a sufficient number of textures
for zero-copy playback (but not too many).
The "min buffers" allocation query parameter can serve as a hint for
the number of textures required in addition to the DPB size.
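
A minimal sketch of the sizing rule described above. The helper name
calculate_pool_size() and the max_pool upper bound are assumptions made
for illustration only; they are not taken from the d3d11 decoder code.

```c
#include <assert.h>

/* Hypothetical helper: choose the size of the fixed DPB texture pool.
 *   dpb_size:    textures the codec needs for reference pictures
 *   min_buffers: "min buffers" hint from the allocation query
 *   max_pool:    assumed upper bound, so we do not allocate too many */
static unsigned int
calculate_pool_size (unsigned int dpb_size, unsigned int min_buffers,
    unsigned int max_pool)
{
  assert (dpb_size > 0 && dpb_size <= max_pool);

  /* Clamp without overflowing the addition */
  if (min_buffers > max_pool - dpb_size)
    return max_pool;

  return dpb_size + min_buffers;
}
```

With a DPB size of 16 and a downstream hint of 2, the pool would hold
18 textures; an oversized hint is clamped to max_pool.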
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2017>
Unlike other stateless decoder implementations (e.g., VA),
our DPB pool cannot be grown since we are using a
texture array (a pre-allocated, fixed-size d3d11 texture pool).
So, if no texture is available,
the only option is to copy the decoded frame to downstream's
d3d11 buffer pool. Otherwise a deadlock will happen.
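
The decision above can be modelled roughly as follows. FixedPool,
POOL_SIZE and choose_output_mode() are hypothetical names, and the
policy is simplified to "copy when no texture is free"; the actual
decoder logic may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4             /* illustrative; fixed once configured */

/* Stand-in for the fixed-size texture array */
typedef struct
{
  bool in_use[POOL_SIZE];
} FixedPool;

typedef enum
{
  OUTPUT_ZERO_COPY,             /* hand the decoder texture downstream */
  OUTPUT_COPY                   /* copy into downstream's buffer pool */
} OutputMode;

static size_t
fixed_pool_free_count (const FixedPool * pool)
{
  size_t i, n = 0;

  assert (pool != NULL);

  for (i = 0; i < POOL_SIZE; i++) {
    if (!pool->in_use[i])
      n++;
  }

  return n;
}

/* When every texture is taken, waiting for downstream to release one
 * could block forever, so fall back to a copy instead. */
static OutputMode
choose_output_mode (const FixedPool * pool)
{
  return fixed_pool_free_count (pool) == 0 ? OUTPUT_COPY : OUTPUT_ZERO_COPY;
}
```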
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2003>
Some GPUs (especially NVIDIA) still report that the GPU is busy
even after 50 retries with a 1ms sleep per failure.
Because DXVA/D3D11 doesn't provide an API for a
"GPU-IS-READY-TO-DECODE"-like signal, there still seems to be no
better solution than sleeping.
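
A rough sketch of such a sleep-and-retry loop. TryResult stands in for
the HRESULT values (S_OK, E_PENDING, other failures), and the callback
types, retry_while_pending() and the demo helpers are assumptions for
illustration; the sleep is injected so the sketch stays portable.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for S_OK, E_PENDING and any other failure */
typedef enum
{
  RESULT_OK,
  RESULT_PENDING,
  RESULT_ERROR
} TryResult;

typedef TryResult (*TryFunc) (void *user_data);
typedef void (*SleepFunc) (unsigned int milliseconds);

/* Call func until it stops reporting "pending" or until max_attempts
 * calls were made, sleeping sleep_ms between attempts. */
static TryResult
retry_while_pending (TryFunc func, void *user_data,
    unsigned int max_attempts, unsigned int sleep_ms, SleepFunc do_sleep)
{
  TryResult res = RESULT_PENDING;
  unsigned int i;

  assert (func != NULL && do_sleep != NULL && max_attempts > 0);

  for (i = 0; i < max_attempts; i++) {
    res = func (user_data);
    if (res != RESULT_PENDING)
      break;

    /* No readiness signal is available, so just wait and try again */
    do_sleep (sleep_ms);
  }

  return res;
}

/* Demo operation: reports "pending" until the counter reaches zero */
static TryResult
busy_n_times (void *user_data)
{
  unsigned int *remaining = user_data;

  if (*remaining > 0) {
    (*remaining)--;
    return RESULT_PENDING;
  }

  return RESULT_OK;
}

/* Demo sleep that returns immediately */
static void
no_sleep (unsigned int milliseconds)
{
  (void) milliseconds;
}
```

If the operation is still pending after the last attempt, the caller
receives RESULT_PENDING and has to treat it as a failure.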
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1913>
Move the d3d11 device, memory, buffer pool and minimal methods
to gst-libs so that other plugins can access d3d11 resources.
Since Direct3D is the primary graphics API on Windows, we need
this infrastructure so that various plugins can share GPU resources
without downloading GPU memory.
Note that this implementation is public only within -bad scope
for now.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/464>
Hide most of the symbols of the GstD3D11Memory object.
GstD3D11Memory is one of the primary resources for the incoming d3d11 library
and it's expected to be an extensible feature.
Hiding implementation details will be helpful for later use cases.
Summary of this commit:
* Now all native Direct3D11 resources are private to GstD3D11Memory.
To access native resources, getter methods need to be used
or the generic map API (e.g., gst_memory_map) should be called,
apart from some exceptional cases such as d3d11decoder.
* Various helper methods are added for GstBuffer related operations
in order to remove duplicated code.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1892>
DXVA supports two kinds of texture structure for the DPB: one is
"1) texture array" and the other is "2) array of texture".
1) is a single ID3D11Texture2D object having
ArraySize greater than one. So the ID3D11Texture2D itself is a set of textures.
Each sub-texture of this type must have identical resolution, format and so on,
and the number of sub-textures in a texture array is fixed.
2) is an array of ordinary ID3D11Texture2D objects. That means each
ID3D11Texture2D is independent of the others and might have a different resolution as well.
Moreover, we can modify the number of frames in the array dynamically.
This type is more flexible than "1) texture array" in terms of dynamic
behavior, and this type of texture can also be used for a shader resource view
whereas "1) texture array" cannot.
If "2) array of texture" is supported by the driver, the DXVA spec says that
it is the preferred format over "1) texture array" in terms of performance.
The VP9 codec allows resizing reference frames by spec. Handling this case
is a bit tricky, especially when the resizing happens on a non-keyframe,
because pre-allocated decoder textures (i.e., the dpb) have the negotiated
resolution, and to change resolution while decoding on a non-keyframe,
each texture might need to be re-created, copied to a new dpb somehow,
and re-negotiated with downstream.
Due to the complicated requirements of negotiation-driven
resize handling, this commit adds a shader to the d3d11decoder object
to resize only the corresponding frames. Note that if the resolution change
is detected on a keyframe, the decoder will re-negotiate with downstream.
Not only the textures for the decoder output view but also any destination texture
that receives a copy of the decoder output texture needs to be aligned.
Otherwise the driver sometimes crashed or hung (not sure why).
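
For illustration, rounding a dimension up to an alignment can be
sketched as below. The helper name align_up() and the value 16 used in
the example are assumptions; the alignment actually required depends
on the codec and driver and is not stated here.

```c
#include <assert.h>

/* Round value up to the next multiple of alignment.
 * alignment must be a non-zero power of two. */
static unsigned int
align_up (unsigned int value, unsigned int alignment)
{
  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);

  return (value + alignment - 1) & ~(alignment - 1);
}
```

For example, align_up (1080, 16) gives 1088, so a 1920x1080 destination
texture would be allocated as 1920x1088 under that assumed alignment.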
gst_d3d11_result() will print a warning message when HRESULT != S_OK.
However, since a retry is trivial, check hr == E_PENDING first
and do not warn in that case.
This implementation is similar to what we've done for the nvcodec plugin.
Since supported resolutions, profiles, and formats are device dependent,
a single template caps cannot represent them, so this modification
will help autoplugging and fallback.
Note that the legacy gpu list and the list of resolutions to query were
taken from chromium's code.
The source texture (decoder view) might be larger than the destination (staging) texture.
In that case, a D3D11_BOX structure should be passed to the CopySubresourceRegion method
in order to specify the exact target area.
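
A sketch of how such a box could be computed. CopyBox is a stand-in
with the same fields as D3D11_BOX so that the example compiles without
the Windows SDK, and make_copy_box() is a hypothetical helper. In
D3D11_BOX the right, bottom and back members are exclusive bounds, and
a 2D texture uses front = 0 and back = 1.

```c
#include <assert.h>

/* Stand-in mirroring the fields of D3D11_BOX */
typedef struct
{
  unsigned int left;
  unsigned int top;
  unsigned int front;
  unsigned int right;
  unsigned int bottom;
  unsigned int back;
} CopyBox;

/* Limit the copied region to the area both textures can hold */
static CopyBox
make_copy_box (unsigned int src_width, unsigned int src_height,
    unsigned int dst_width, unsigned int dst_height)
{
  CopyBox box;

  assert (src_width > 0 && src_height > 0);
  assert (dst_width > 0 && dst_height > 0);

  box.left = 0;
  box.top = 0;
  box.front = 0;
  box.right = src_width < dst_width ? src_width : dst_width;
  box.bottom = src_height < dst_height ? src_height : dst_height;
  box.back = 1;

  return box;
}
```

For a 1920x1088 decoder texture and a 1920x1080 staging texture, the
box covers 1920x1080, so the padding rows are not copied.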
Use a consistent memory layout between dxva and other shader use cases.
For example, use the DXGI_FORMAT_NV12 texture format instead of
two textures with DXGI_FORMAT_R8_UNORM and DXGI_FORMAT_R8G8_UNORM.
This reverts commit ddd13fc7c0
Dynamic usage can reduce the number of copies per frame but makes
things complicated, and the benefit seems not to be significant.
Also, since we don't provide a _map() method for dynamic usage,
applications cannot read buffers, which makes the "last-sample" property
unusable in the case of d3d11videosink.