Commit graph

1302 commits

Author SHA1 Message Date
Mengkejiergeli Ba
5d488c444e msdkvpp: Remove passthrough condition in propose_allocation
According to basetransform func: gst_base_transform_do_bufferpool,
it is the upstream to decide if it needs a bufferpool from vpp, so
we don't need passthrough condition in subclass propose_allocation.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5879>
2024-01-04 04:35:40 +00:00
Seungha Yang
1b86c9d370 d3d12decoder: Try zero-copy in case of reverse playback too
In case of tier 1 decoder, always use reference-only picture to avoid
fixed-size pool limitation and output decoded picture without
copy even for negative rate. Also do not use copy queue for GPU to GPU
copy. Copy queue is specialized for upload/download and may occupy
PCIE bandwidth. Use direct queue as recommended by vendors.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5877>
2024-01-03 02:08:53 +09:00
Seungha Yang
1fb44b8160 d3d12decoder: Fix crash on flush
On flush event, baseclass will discard all pictures from DPB
but there can be still in-flight commands not finished yet.
Use our command queue, allocator and fence data helper objects
to keep resource available during command execution.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5877>
2024-01-03 01:36:42 +09:00
Seungha Yang
d3b3a1c00a d3d12: Add {set,get}_user_data() methods to command allocator
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5877>
2024-01-03 01:19:39 +09:00
Seungha Yang
9d5277e70e d3d12decoder: Fix barrier usage
Common state promotion and decay does not seem to be applied to
decoder commands. Use barriers explicitly

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5877>
2024-01-03 01:17:32 +09:00
Seungha Yang
087e39564d d3d12videosink: Fix buffer leak
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5875>
2024-01-02 13:02:47 +00:00
Seungha Yang
b0ef890726 d3d12device: Store adapter index
... and remove unused fence object

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5875>
2024-01-02 13:02:47 +00:00
Seungha Yang
0577d1c9de d3d12: Reduce shader visible descriptor size
Shader visible descriptors occupy GPU resource and there are hardware
limits. Thus, in order to minimize the amount of shader visible heaps,
only non shader visible descriptor heap (staging) will be held by d3d12memory.
Then converter will copy the staging descriptor to shader visible
descriptor heap per draw.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5875>
2024-01-02 13:02:47 +00:00
Seungha Yang
0ce6e752e4 d3d12: Use CREATE_NOT_ZEROED heap flag if possible
Zero initialization would have overhead and it's not required
most cases except for textures. Use CREATE_NOT_ZEROED flag
in case of buffer resource or if a texture will be rendered without any
prior read operation.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5875>
2024-01-02 13:02:47 +00:00
Seungha Yang
a1ba38cd64 d3d12: Add compositor element
Adding d3d12compositor element. d3d12compositor will build GPU commands
asynchronously and each command is serialized at final render stage.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5875>
2024-01-02 13:02:47 +00:00
Seungha Yang
996346f8d3 d3d12converter: Don't map output buffer with write flag
Conversion will happen when constructed command list is executed,
not by converter element. Thus this object should not map output buffer
with write flag which will result in error if multiple threads
are building commands for the same output target frame.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5875>
2024-01-02 13:02:47 +00:00
Seungha Yang
bffbf0d951 d3d12converter: Add support for texture upload
If buffer is not a d3d12 memory or allocated by other device,
upload to internal d3d12 memory

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5875>
2024-01-02 13:02:47 +00:00
Seungha Yang
f4fe17d8d2 d3d12converter: Add support for blending
Create new PSO if blend state update is required

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5875>
2024-01-02 13:02:47 +00:00
Seungha Yang
7c701058ed d3d12: Add testsrc element
Adding testsrc element with d2d interop support via d3d11on12

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5875>
2024-01-02 13:02:46 +00:00
Seungha Yang
abe1f5044d cuda: Prefer CUBIN over PTX
System installed NVRTC library might be newer version than
driver, then generate PTX can be incompatible with the driver.
Instead of the intermediate code PTX, use actual assembly code
directly.

Fixes: https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/3108
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5639>
2024-01-02 10:10:09 +00:00
Seungha Yang
2f091e7118 d3d12: Add video sink element
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5870>
2023-12-29 14:39:00 +00:00
Seungha Yang
660f2d7d27 d3d12: Add convert element
Implement converter object with convert element

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5870>
2023-12-29 14:39:00 +00:00
Seungha Yang
4198bd6932 d3d12: Add helper object for fence operation
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5870>
2023-12-29 14:39:00 +00:00
Seungha Yang
eeb57061d2 d3d12: Enable plugin only for Windows8 or newer
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5870>
2023-12-29 14:39:00 +00:00
Seungha Yang
3e49d6a75f d3d12: Define more formats
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5870>
2023-12-29 14:39:00 +00:00
Seungha Yang
48cfca413d d3d12: Add header containing core features
... and include the single header instead of listing many ones

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5870>
2023-12-29 14:39:00 +00:00
Seungha Yang
41d8a12649 d3d12: Remove unused methods
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5870>
2023-12-29 14:39:00 +00:00
Seungha Yang
5ce2a7f64f d3d12bufferpool: Don't pre-allocate memory for size calculation
Unlike d3d11, we can know CPU accessible memory layout without
allocation

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5870>
2023-12-29 14:39:00 +00:00
Seungha Yang
d316356bf2 d3d12memory: Add alloc_wrapped() method
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5870>
2023-12-29 14:39:00 +00:00
Seungha Yang
3308a976bd d3d12: Add upload element
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5870>
2023-12-29 14:39:00 +00:00
Seungha Yang
e4f794cbdd d3d12bufferpool: Wait fence before reusing buffer
Buffer can be released without waiting fence for previous commands

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5870>
2023-12-29 14:39:00 +00:00
Seungha Yang
1c5bba4b6b d3d12decoder: Remove ID3D12Device4 interface requirement
Old OS may not support the interface. And allow 11_0 feature level
hardware.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5870>
2023-12-29 14:38:59 +00:00
Seungha Yang
4e5d4a45a3 d3d12decoder: Reduce the number of resource barriers
Single barrier per texture is sufficient in case of array-of-textures.
Avoid unnecessary decay barriers

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5870>
2023-12-29 14:38:59 +00:00
Seungha Yang
e6bdb0458c d3d12decoder: Use flexible task queue
Instead of using fixed size command allocator array, make it
resizable.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5870>
2023-12-29 14:38:59 +00:00
Seungha Yang
efc023e76e d3d12: Rework command scheduling
* Use single fence object per queue and remove GstD3D12Fence
  implementation
* Add a helper method for texture copy
* Run background thread and release unused resource from the thread

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5870>
2023-12-29 14:38:59 +00:00
Seungha Yang
6d7d9291c3 d3d12: Add resource pool objects
Adding pool objects for command list, command allocator, and descriptor
heap

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5870>
2023-12-29 14:38:59 +00:00
Seungha Yang
93458c0155 d3d12memory: Add more SRV/RTV getter methods
Adding a method so that memory object can create SRV/RTV
on external descriptor heap. And remove unused methods

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5870>
2023-12-29 14:38:59 +00:00
Seungha Yang
2578f234dd d3d12: Remove d3d11 dependency
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5870>
2023-12-29 14:38:59 +00:00
Víctor Manuel Jáquez Leal
fdfd51397b vajpegdec: only support progressive mjpeg streams
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5838>
2023-12-22 04:38:06 +00:00
Marek Vasut
5f3d4215a0 v4l2codecs: Switch gst_codec_picture_ts_ns() to gst_util_uint64_scale_int()
Instead of plain multiplication, use gst_util_uint64_scale_int()
to achieve the same effect with additional checks.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5791>
2023-12-20 18:47:39 +00:00
Marek Vasut
3335af0efe v4l2codecs: Deduplicate picture frame number to timestamp in ns
Add macro which converts picture frame number to suitable timestamp in
nanoseconds for use in V4L2 VB2 buffer lookup. Since multiple codecs do
the same operation and almost all got it wrong, do it in one place so it
can be fixed in one place again, if needed.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5791>
2023-12-20 18:47:39 +00:00
Marek Vasut
552a171671 vp9decoder: Simplify gst_v4l2_codecs_vp9_dec_fill_refs()
In case reference_frames is NULL, return outright. Remove the
duplicate check from subsequent conditionals. No functional change.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5791>
2023-12-20 18:47:39 +00:00
Marek Vasut
4560cdff5c h265decoder: Align wraparound fix
Instead of casting GST_CODEC_PICTURE_FRAME_NUMBER (ref_pic) to u64,
use 1000ULL which is also u64 . This only aligns the behavior here
with '*decoder: Fix multiplication wraparound' commits.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5791>
2023-12-20 18:47:39 +00:00
Marek Vasut
7725aa7d77 h264decoder: Align wraparound fix
Instead of casting GST_CODEC_PICTURE_FRAME_NUMBER (ref_pic) to u64,
use 1000ULL which is also u64 . This only aligns the behavior here
with '*decoder: Fix multiplication wraparound' commits.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5791>
2023-12-20 18:47:39 +00:00
Marek Vasut
3cbf09d0c9 mpeg2decoder: Fix multiplication wraparound
The GstMpeg2Picture system_frame_number is guint32, constant 1000 is guint32,
GstV4l2CodecMpeg2Dec *_ref_ts multiplication result is u64 .

```
u64 result = (u32)((u32)system_frame_number * (u32)1000);
```
behaves the same as
```
u64 result = (u32)(((u32)system_frame_number * (u32)1000) & 0xffffffff);
```

so in case `system_frame_number > 4294967295 / 1000`, the `result` will
wrap around. Since the `result` is really used as a cookie used to look
up V4L2 buffers related to the currently decoded frame, this wraparound
leads to visible corruption during MPEG2 decoding. At 30 FPS this occurs
after cca. 40 hours of playback .

Fix this by changing the 1000 from u32 to u64, i.e.:
```
u64 result = (u64)((u32)system_frame_number * (u64)1000ULL);
```
this way, the wraparound is prevented and the correct cookie is used.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5791>
2023-12-20 18:47:39 +00:00
Marek Vasut
50fb6f8c02 av1decoder: Fix multiplication wraparound
The GstAV1Picture system_frame_number is guint32, constant 1000 is guint32,
GstV4l2CodecAV1Dec v4l2_av1_frame.*_frame_ts multiplication result is u64 .

```
u64 result = (u32)((u32)system_frame_number * (u32)1000);
```
behaves the same as
```
u64 result = (u32)(((u32)system_frame_number * (u32)1000) & 0xffffffff);
```

so in case `system_frame_number > 4294967295 / 1000`, the `result` will
wrap around. Since the `result` is really used as a cookie used to look
up V4L2 buffers related to the currently decoded frame, this wraparound
leads to visible corruption during AV1 decoding. At 30 FPS this occurs
after cca. 40 hours of playback .

Fix this by changing the 1000 from u32 to u64, i.e.:
```
u64 result = (u64)((u32)system_frame_number * (u64)1000ULL);
```
this way, the wraparound is prevented and the correct cookie is used.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5791>
2023-12-20 18:47:39 +00:00
Marek Vasut
6f74818a07 vp9decoder: Fix multiplication wraparound
The GstVp9Picture system_frame_number is guint32, constant 1000 is guint32,
GstV4l2CodecVp9Dec v4l2_vp9_frame.*_frame_ts multiplication result is u64 .

```
u64 result = (u32)((u32)system_frame_number * (u32)1000);
```
behaves the same as
```
u64 result = (u32)(((u32)system_frame_number * (u32)1000) & 0xffffffff);
```

so in case `system_frame_number > 4294967295 / 1000`, the `result` will
wrap around. Since the `result` is really used as a cookie used to look
up V4L2 buffers related to the currently decoded frame, this wraparound
leads to visible corruption during VP9 decoding. At 30 FPS this occurs
after cca. 40 hours of playback .

Fix this by changing the 1000 from u32 to u64, i.e.:
```
u64 result = (u64)((u32)system_frame_number * (u64)1000ULL);
```
this way, the wraparound is prevented and the correct cookie is used.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5791>
2023-12-20 18:47:39 +00:00
Marek Vasut
15c9a6caa9 vp8decoder: Fix multiplication wraparound
The GstVp8Picture system_frame_number is guint32, constant 1000 is guint32,
GstV4l2CodecVp8Dec v4l2_vp8_frame.*_frame_ts multiplication result is u64 .

```
u64 result = (u32)((u32)system_frame_number * (u32)1000);
```
behaves the same as
```
u64 result = (u32)(((u32)system_frame_number * (u32)1000) & 0xffffffff);
```

so in case `system_frame_number > 4294967295 / 1000`, the `result` will
wrap around. Since the `result` is really used as a cookie used to look
up V4L2 buffers related to the currently decoded frame, this wraparound
leads to visible corruption during VP8 decoding. At 30 FPS this occurs
after cca. 40 hours of playback .

Fix this by changing the 1000 from u32 to u64, i.e.:
```
u64 result = (u64)((u32)system_frame_number * (u64)1000ULL);
```
this way, the wraparound is prevented and the correct cookie is used.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5791>
2023-12-20 18:47:39 +00:00
He Junyan
35390de50d vacompositor: consider the DMA kind input for sink pad
Co-authored-by: Víctor Jáquez <vjaquez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5160>
2023-12-16 10:56:09 +00:00
He Junyan
0bbaa00c3f vacompositor: Override the update_caps() of video aggregator class
The input of the vacompositor may be DMA buffers. And in this case, the input
caps has the format=DMA_DRM, which can not be recognized by base video
aggregator class' find_best_format() function. So we need to override the
update_caps() virtual function.

Also we consider the DMA kind caps in negotiated_src_caps() for output.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5160>
2023-12-16 10:56:09 +00:00
He Junyan
e68724c8e0 vacompositor: add helper function to get formats from caps
Co-authored-by: Víctor Jáquez <vjaquez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5160>
2023-12-16 10:56:09 +00:00
He Junyan
c76999fe7b vacompositor: add helper function to choose format
The function is based on the most supported formats by Intel/Mesa VA drivers.

Co-authored-by: Víctor Jáquez <vjaquez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5160>
2023-12-16 10:56:09 +00:00
He Junyan
eb4d2dd204 vacompositor: record the input caps for each input pad
The caps of each input sink pad wil decide the final output format and
caps of the src pad.

Co-authored-by: Víctor Jáquez <vjaquez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5160>
2023-12-16 10:56:09 +00:00
Seungha Yang
800a83b435 d3d12decoder: Implement threaded decoding
To achieve maximum throughput, waiting on command commit thread
is not ideal. And render-delay will introduce unwanted latency.
Best is to split thread and wait finished decoding job in a dedicated
output thread

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5812>
2023-12-15 22:56:33 +09:00
Seungha Yang
7471db9a30 d3d12decoder: Disable d3d11 interop
It does not seem to work with some AMD iGPU

Part-of: <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5812>
2023-12-15 20:40:21 +09:00