Commit graph

37 commits

Author SHA1 Message Date
Seungha Yang
a10f26aa3a nvenc: Do not access to broken encode session
If an encode session failed in initializing, the encode
session would be broken and the next nvenc API will cause crash.

Fixes: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/1179
2020-01-21 16:34:41 +09:00
Seungha Yang
49bccf0433 nvcodec: Refactor plugin initialization
Create CUDA context per device, instead of per codec and encoder/decoder.
Allocating CUDA context is heavy operation so we should reuse it
as much as possible.

Fixes: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/1130
2019-12-24 08:10:14 +00:00
Seungha Yang
0cf67c3be7 nvenc: Fix crash when nvenc was reused then freed without encoding
GstNvBaseEnc::n_bufs was set from the previous encoding session
but it wasn't cleared after stop. That might result to invalid memory
access at the next start (no encoded data) and then stop sequence.
Instead of defining a variable for array length, use GArray::len
directly to avoid such confusion.
2019-11-22 03:02:57 +00:00
Seungha Yang
aef414375a nvenc: Remove unused code path
refilling queue would not happen
2019-11-22 03:02:57 +00:00
Aaron Boxer
6d3429af34 documentation: fixed a heap o' typos 2019-11-05 09:11:25 -05:00
Seungha Yang
52dfbbe5da nvenc: Early terminate handle_frame if the last flow was not GST_FLOW_OK
If the last flow was not GST_FLOW_OK, the encoding thread is not running
and there is nothing to pop from GAsyncQueue (this causes deadlock).

To prevent deadlock, just return the handle_frame without further encoding
process if the last flow was not GST_FLOW_OK. Note that the last flow
will be cleared per FLUSH_STOP and STREAM_START event.
2019-09-11 15:21:03 +00:00
Seungha Yang
68a51abdcd nvenc: Add support VUYA format
The addition is very simple. Map NV_ENC_BUFFER_FORMAT_AYUV format
to GST_VIDEO_FORMAT_VUYA and add a condition for the VUYA format.
2019-09-11 14:33:54 +00:00
Seungha Yang
af77988b9f nvenc: Reduce the number of pre-allocated device memory
The hard-coded upper bound 32 (or 48 depending on resolution) might waste
GPU memory and high resolution encoding causes OUT-OF-MEMORY allocation error
quite easily. This commit calculates the number of required pre-allocated
device memory based on encoding options and it can reduce the amount of device memory
used by nvenc.
2019-09-11 11:44:03 +00:00
Seungha Yang
1cbb23cf79 nvenc: Adjust DTS when bframe is enabled
NVDEC driver always uses input timestamp without adjustment
even if bframe encoding was enabled.
So DTS can be larger than PTS when bframe was enabled.
To ensure PTS >= DTS, we should adjust the timestamp manually
based on the PTS difference between the first
encoded frame and the second one. That's also the maximum PTS/DTS
difference.
2019-09-11 13:18:12 +09:00
Seungha Yang
83a1c7a9a6 nvenc: Add qp-{min,max,const}-{i,p,b} properties
This new properties allows more detailed target QP value setting
2019-09-11 13:18:12 +09:00
Seungha Yang
d3a909ccdd nvenc: Add properties to support bframe encoding if device supports it
Note that bframe encoding capability varies with GPU architecture
2019-09-11 13:18:12 +09:00
Seungha Yang
94f2843774 nvenc: Refactoring internal buffer pool structure
To support rc-lookahead and bframe encoding, nvenc needs one more
staging queue, because NvEncEncodePicture can return NV_ENC_ERR_NEED_MORE_INPUT
but which was not considered so far.
As documented by NVENC programming guide, pending buffers should wait
other inputs until NvEncEncodePicture returns success.

New encoding flow is
- Submit raw picture buffer to encoder with NvEncEncodePicture
- The submitted input/output buffer pair will be queued to pending_queue
  - If NvEncEncodePicture returned success, then move all pair in pending_queue
    to final stage
  - Otherwise, wait more input raw pictures.

Another change is dropping NV_ENC_LOCK_INPUT_BUFFER usage.
So now nvenc always uses CUDA memory input buffer. As a result,
both opengl and system memory handling are unified.
2019-09-11 13:18:12 +09:00
Seungha Yang
e73acbaa5c nvenc: Remove pointless iteration and cleanup some code
* The number of iteration is always one so the iteration is useless
and that makes code complicated.
* Also defining named structure can code mroe readable.
* g_free is null safe
2019-09-11 13:18:12 +09:00
Seungha Yang
81272eaa82 nvenc: Add more rate-control options
New rate-control modes are introduced (if device can support)
* cbr-ld-hr: CBR low-delay high quality
* cbr-hq: CBR high quality
* vbr-hq: VBR high quality

Also, various configurable rate-control related properties are added.
2019-09-11 13:18:12 +09:00
Seungha Yang
ea19a7c715 nvenc: Add support for weighted prediction option
Note that this property will be exposed only if the device
supports the weighted prediction.
2019-09-11 13:18:12 +09:00
Seungha Yang
d05cbdbd72 nvenc: Add property for AUD insertion
Make AUD insertion configurable option
2019-09-11 13:18:12 +09:00
Seungha Yang
b3b723462e nvenc: Refactor class hierarchy to handle device capability dependent options
Introducing new dynamic class between GstNvBaseEncClass and
each subclass to be able to access device specific properties and
capabilities from each subclass implementation side.
2019-09-11 13:18:09 +09:00
Seungha Yang
09fd34dbb0 nvenc: Add support runtime resolution change freely
Do not restrict allowed maximum resolution depending on the
initial resolution. If new resolution is larger than previous one,
just re-init encode session.
2019-09-02 10:59:03 +09:00
Seungha Yang
ff9838fd3d nvenc: Add support for old drivers which could not understand SDK version 9.0
Add helper functions to support old drivers
with our previous SDK version 8.1
2019-08-29 13:39:59 +00:00
Seungha Yang
afebb15d99 nvenc: Use consistent snake case convention 2019-08-29 13:39:59 +00:00
Seungha Yang
338a32b672 nvenc: Port to GstCudaGraphicsResource
Register openGL resource only once per memory. Also if upstream
provides the registered information, reuse the information
instead of doing it again. This can improve performance dramatically
depending on system since the resource registration might cause
high overhead.
2019-08-29 18:45:25 +09:00
Seungha Yang
eab564d857 nvcodec: Use default flag for CUDA stream creation
Since nvdec/nvenc engine is running on default stream,
non-default CUDA stream should be synchronized with default
stream eventually.
2019-08-19 07:13:26 +00:00
Seungha Yang
ca6657367c nvenc: Use non default CUDA stream and async operation
Use CUDA async operation if possible with non default CUDA stream
2019-08-19 01:18:52 +00:00
Seungha Yang
e6d21d048a nvenc: Add support YV12 format
YV12 format is supported by Nvidia NVENC without manual conversion.
So nvenc is exposing YV12 format at sinkpad template but there is some
missing point around uploading the memory to GPU.
2019-08-09 11:43:22 +09:00
Seungha Yang
f3e12a0b56 nvh265enc: Add support YUV 444 10bits encoding
Note that h264 encoder does not support the YUV 444 10bits format
2019-08-08 00:46:16 +09:00
Seungha Yang
cc4d0e91e3 nvenc: Fix broken RGB format support
Add missing format check introduced by the commit 7de4dbdeb2
2019-08-07 07:27:36 +00:00
Seungha Yang
9d0545d1a2 nvcodec: Wrap CUDA API return check with gst_cuda_result
The gst_cuda_result macro function is more helpful for debugging
than previous cuda_OK because gst_cuda_result prints the function
and line number. If the CUDA API return was not CUDA_SUCCESS,
gst_cuda_result will print WARNING level debug message with
error name, error text strings.
2019-08-07 00:59:36 +00:00
Seungha Yang
5cf0351418 nvenc: Port to GstCudaContext
... and add support CUDA context sharing similar to glcontext sharing.
Multiple CUDA context per GPU is not the best practice. The context
sharing method is very similar to that of glcontext. The difference
is that there can be multiple context object on a pipeline since
the CUDA context is created per GPU id. For example, a pipeline
has nvh264dec (uses GPU #0) and nvh264device0dec (uses GPU #1),
then two CUDA context will propagated to all pipeline.
2019-08-07 00:59:36 +00:00
Seungha Yang
7de4dbdeb2 nvenc: Return profile compatible input formats from GstVideoEncoder::getcaps
Do not accept any input formats which could not be supported
by downstream requested codec profiles.
2019-08-06 15:03:22 +00:00
Seungha Yang
9e81f8e700 nvenc: Fix caps negotiation failure on unspecified interlace-mode
During GstVideoInfo conversion from GstCaps, interlace-mode is
inferred to progressive so unspecified interlace-mode should not cause any
negotiation issue. Simly set GST_PAD_FLAG_ACCEPT_INTERSECT flag
on sinkpad to fix issue.
2019-08-06 15:03:22 +00:00
Seungha Yang
e68bfd7566 nvenc: Add support RGB 8/10bits formats
BGRA/RGBA/RGB10A2/BGR10A2 formats can be supported by nvenc.
Depending on device, supported format can be different.

Fixes: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/1038
2019-08-05 18:55:28 +00:00
Seungha Yang
158b4d8649 nvenc: Fix crash with unspecified framerate
Nvidia driver seems to calculating floating point framerate
without validation. This causes crash both on linux and Windows.

Fixes: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/1012
2019-08-05 15:32:43 +00:00
Seungha Yang
0445ed6ba5 nvenc: Fix deadlock when pad_push return was not GST_FLOW_OK
Encoding thread is terminated without any notification so
upstream streaming thread is locked because there is nothing
to pop from GAsyncQueue. If downstream returns error,
we need put SHUTDOWN_COOKIE to GAsyncQueue for chain function
can wakeup.
2019-07-30 17:49:25 +09:00
Seungha Yang
a2ada54265 nvcodec: Keep requested rank for default device
Fix for default encoder and decoder element factory to make them have
higher rank than the others.
2019-07-23 10:28:52 +09:00
Seungha Yang
92afa74939 nvenc: Register elements per GPU device with capability check
* By this commit, if there are more than one device,
nvenc element factory will be created per
device like nvh264device{device-id}enc and nvh265device{device-id}enc
in addition to nvh264enc and nvh265enc, so that the element factory
can expose the exact capability of the device for the codec.

* Each element factory will have fixed cuda-device-id
which is determined during plugin initialization
depending on the capability of corresponding device.
(e.g., when only the second device can encode h265 among two GPU,
then nvh265enc will choose "1" (zero-based numbering)
as it's target cuda-device-id. As we have element factory
per GPU device, "cuda-device-id" property is changed to read-only.

* nvh265enc gains ability to encoding
4:4:4 8bits, 4:2:0 10 bits formats and up to 8K resolution
depending on device capability.
Additionally, I420 GLMemory input is supported by nvenc.
2019-07-22 21:01:41 +00:00
Seungha Yang
afe3c7e3ef nvcodec: Drop cudaGL.h dependency
nvcodec does not use any type/define/enum in cudaGL.h.
2019-07-22 23:11:14 +09:00
Seungha Yang
c18fda03d9 nvdec,nvenc: Port to dynamic library loading
... and put them into new nvcodec plugin.

* nvcodec plugin
Now each nvenc and nvdec element is moved to be a part of nvcodec plugin
for better interoperability.
Additionally, cuda runtime API header dependencies
(i.e., cuda_runtime_api.h and cuda_gl_interop.h) are removed.
Note that cuda runtime APIs have prefix "cuda". Since 1.16 release with
Windows support, only "cuda.h" and "cudaGL.h" dependent symbols have
been used except for some defined types. However, those types could be
replaced with other types which were defined by "cuda.h".

* dynamic library loading
CUDA library will be opened with g_module_open() instead of build-time linking.
On Windows, nvcuda.dll is installed to system path by CUDA Toolkit
installer, and on *nix, user should ensure that libcuda.so.1 can be
loadable (i.e., via LD_LIBRARY_PATH or default dlopen path)
Therefore, NVIDIA_VIDEO_CODEC_SDK_PATH env build time dependency for Windows
is removed.
2019-07-08 10:37:46 +00:00
Renamed from sys/nvenc/gstnvbaseenc.c (Browse further)