Commit graph

82 commits

Author SHA1 Message Date
Seungha Yang
49bccf0433 nvcodec: Refactor plugin initialization
Create CUDA context per device, instead of per codec and encoder/decoder.
Allocating CUDA context is heavy operation so we should reuse it
as much as possible.

Fixes: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/1130
2019-12-24 08:10:14 +00:00
Seungha Yang
0cf67c3be7 nvenc: Fix crash when nvenc was reused then freed without encoding
GstNvBaseEnc::n_bufs was set from the previous encoding session
but it wasn't cleared after stop. That might result to invalid memory
access at the next start (no encoded data) and then stop sequence.
Instead of defining a variable for array length, use GArray::len
directly to avoid such confusion.
2019-11-22 03:02:57 +00:00
Seungha Yang
aef414375a nvenc: Remove unused code path
refilling queue would not happen
2019-11-22 03:02:57 +00:00
Aaron Boxer
6d3429af34 documentation: fixed a heap o' typos 2019-11-05 09:11:25 -05:00
Tim-Philipp Müller
f218ec2794 Remove autotools build system 2019-10-14 13:54:27 +01:00
Seungha Yang
52dfbbe5da nvenc: Early terminate handle_frame if the last flow was not GST_FLOW_OK
If the last flow was not GST_FLOW_OK, the encoding thread is not running
and there is nothing to pop from GAsyncQueue (this causes deadlock).

To prevent deadlock, just return the handle_frame without further encoding
process if the last flow was not GST_FLOW_OK. Note that the last flow
will be cleared per FLUSH_STOP and STREAM_START event.
2019-09-11 15:21:03 +00:00
Seungha Yang
68a51abdcd nvenc: Add support VUYA format
The addition is very simple. Map NV_ENC_BUFFER_FORMAT_AYUV format
to GST_VIDEO_FORMAT_VUYA and add a condition for the VUYA format.
2019-09-11 14:33:54 +00:00
Seungha Yang
14b9a1cffd nvdec: Add support for mpeg4 video decoding with codec_data
Decoder should handle codec_data of mpeg4 video which includes essential
config data.
2019-09-11 14:00:48 +00:00
Seungha Yang
af77988b9f nvenc: Reduce the number of pre-allocated device memory
The hard-coded upper bound 32 (or 48 depending on resolution) might waste
GPU memory and high resolution encoding causes OUT-OF-MEMORY allocation error
quite easily. This commit calculates the number of required pre-allocated
device memory based on encoding options and it can reduce the amount of device memory
used by nvenc.
2019-09-11 11:44:03 +00:00
Seungha Yang
e3508a4f26 nvdec: Update plugin description and fix typo
Use consistent description with nvenc, and fix typo s/devide/device/g
2019-09-11 15:16:45 +09:00
Seungha Yang
1cbb23cf79 nvenc: Adjust DTS when bframe is enabled
NVDEC driver always uses input timestamp without adjustment
even if bframe encoding was enabled.
So DTS can be larger than PTS when bframe was enabled.
To ensure PTS >= DTS, we should adjust the timestamp manually
based on the PTS difference between the first
encoded frame and the second one. That's also the maximum PTS/DTS
difference.
2019-09-11 13:18:12 +09:00
Seungha Yang
83a1c7a9a6 nvenc: Add qp-{min,max,const}-{i,p,b} properties
This new properties allows more detailed target QP value setting
2019-09-11 13:18:12 +09:00
Seungha Yang
d3a909ccdd nvenc: Add properties to support bframe encoding if device supports it
Note that bframe encoding capability varies with GPU architecture
2019-09-11 13:18:12 +09:00
Seungha Yang
94f2843774 nvenc: Refactoring internal buffer pool structure
To support rc-lookahead and bframe encoding, nvenc needs one more
staging queue, because NvEncEncodePicture can return NV_ENC_ERR_NEED_MORE_INPUT
but which was not considered so far.
As documented by NVENC programming guide, pending buffers should wait
other inputs until NvEncEncodePicture returns success.

New encoding flow is
- Submit raw picture buffer to encoder with NvEncEncodePicture
- The submitted input/output buffer pair will be queued to pending_queue
  - If NvEncEncodePicture returned success, then move all pair in pending_queue
    to final stage
  - Otherwise, wait more input raw pictures.

Another change is dropping NV_ENC_LOCK_INPUT_BUFFER usage.
So now nvenc always uses CUDA memory input buffer. As a result,
both opengl and system memory handling are unified.
2019-09-11 13:18:12 +09:00
Seungha Yang
e73acbaa5c nvenc: Remove pointless iteration and cleanup some code
* The number of iteration is always one so the iteration is useless
and that makes code complicated.
* Also defining named structure can code mroe readable.
* g_free is null safe
2019-09-11 13:18:12 +09:00
Seungha Yang
81272eaa82 nvenc: Add more rate-control options
New rate-control modes are introduced (if device can support)
* cbr-ld-hr: CBR low-delay high quality
* cbr-hq: CBR high quality
* vbr-hq: VBR high quality

Also, various configurable rate-control related properties are added.
2019-09-11 13:18:12 +09:00
Seungha Yang
ea19a7c715 nvenc: Add support for weighted prediction option
Note that this property will be exposed only if the device
supports the weighted prediction.
2019-09-11 13:18:12 +09:00
Seungha Yang
d05cbdbd72 nvenc: Add property for AUD insertion
Make AUD insertion configurable option
2019-09-11 13:18:12 +09:00
Seungha Yang
b3b723462e nvenc: Refactor class hierarchy to handle device capability dependent options
Introducing new dynamic class between GstNvBaseEncClass and
each subclass to be able to access device specific properties and
capabilities from each subclass implementation side.
2019-09-11 13:18:09 +09:00
Marc Leeman
3ef503346a nvcodec: minor spell corrects in log messages 2019-09-10 23:13:17 +00:00
Seungha Yang
09fd34dbb0 nvenc: Add support runtime resolution change freely
Do not restrict allowed maximum resolution depending on the
initial resolution. If new resolution is larger than previous one,
just re-init encode session.
2019-09-02 10:59:03 +09:00
Seungha Yang
fa83f086be nvdec: Check flow return of the only current handle_frame() to fix seeking issue
Due to uncleared last flow, decoding after seek was never possible
(last_ret == GST_FLOW_FLUSHING).
nvdec dose not need to keep track of the previous flow return,
and actually the interest is data/even flow of the current handle_frame().
2019-08-30 11:27:27 +09:00
Seungha Yang
be3a3da829 nvdec: Fallback to system memory if OpenGL context could not support PBO memory
If the environment could not support OpenGL PBO memory, nvdec will do negotiation
with system memory as fallback.
2019-08-30 01:36:46 +09:00
Seungha Yang
069fe93452 nvdec: Add support dynamic output format change
Implementing ::negotiate() method to support runtime output format
change. If downstream was reconfigured, baseclass will invoke
::negotiate() method, and nvdec should update output memory
type depending on downstream caps.
2019-08-30 01:36:46 +09:00
Seungha Yang
39f800c449 nvdec: Re-negotiate whenever output format is changed
Input stream might be silently changed without ::set_format() call.
Since nvdec has internal parser, nvdec element can figure out the format change
by itself.
2019-08-30 01:36:41 +09:00
Seungha Yang
f4f8941a91 nvdec: Add support 4:4:4 and 4:2:0 12bit decoding
Depending on GPU architecture, HEVC decoder can support
4:4:4 format up to 12 bitdepth. This commit covers VP9 4:2:0 12 bits
decoding also.
2019-08-29 13:39:59 +00:00
Seungha Yang
ff9838fd3d nvenc: Add support for old drivers which could not understand SDK version 9.0
Add helper functions to support old drivers
with our previous SDK version 8.1
2019-08-29 13:39:59 +00:00
Seungha Yang
afebb15d99 nvenc: Use consistent snake case convention 2019-08-29 13:39:59 +00:00
Seungha Yang
1010b9f567 nvcodec: Bump SDK header to version 9.0
The latest Turing architecture (e.g., RTX serise) can support
decoding HEVC 4:4:4 format up to 12bits.
2019-08-29 13:39:59 +00:00
Seungha Yang
338a32b672 nvenc: Port to GstCudaGraphicsResource
Register openGL resource only once per memory. Also if upstream
provides the registered information, reuse the information
instead of doing it again. This can improve performance dramatically
depending on system since the resource registration might cause
high overhead.
2019-08-29 18:45:25 +09:00
Seungha Yang
d0846f8eab nvdec: Port to GstCudaGraphicsResource
Make it possible to share registered graphics resource among nvidia encoders
and decoders.
2019-08-29 18:05:51 +09:00
Seungha Yang
da075b94a9 cudautils: Add GstCudaGraphicsResource structure for better openGL interoperability
Introduce GstCudaGraphicsResource structure to represent registered
CUDA graphics resources and to enable sharing the information among
nvdec and nvenc. This structure can reduce the number of resource
registration which cause high overhead.
2019-08-29 18:04:33 +09:00
Seungha Yang
8dc2b4a393 nvdec: Port to openGL PBO memory
For openGL interoperability, nvdec uses cuGraphicsGLRegisterImage API
which is to register openGL texture image.
Meanwhile nvenc uses cuGraphicsGLRegisterBuffer API to registure openGL buffer object.
That means two kinds of graphics resources are registered per memory
when nvdec/nvenc are configured at the same time.
The graphics resource registration brings possibly high overhead
so the registration should be performed only once per resource
from optimization point of view.
2019-08-29 18:04:33 +09:00
Seungha Yang
9bfd6d13e6 nvdec: Filter openGL API version to use
To ensure PBO buffer, openGL API >= 3 is required.
2019-08-29 18:04:29 +09:00
Seungha Yang
807e311ae8 nvdec: Always response QUERY_CONTEXT even if openGL is unavailable on the system
nvdec can response for the CUDA context type query regardless of openGL
availability.
2019-08-21 14:14:07 +09:00
Seungha Yang
4f60117db9 nvdec: Fix possible null object unref
gst_query_get_n_allocation_pools > 0 does not guarantee that
the N th internal array has GstBufferPool object. So users should
check the returned GstBufferPool object from
gst_query_parse_nth_allocation_pool.
2019-08-20 10:14:54 +09:00
Seungha Yang
eab564d857 nvcodec: Use default flag for CUDA stream creation
Since nvdec/nvenc engine is running on default stream,
non-default CUDA stream should be synchronized with default
stream eventually.
2019-08-19 07:13:26 +00:00
Seungha Yang
ca6657367c nvenc: Use non default CUDA stream and async operation
Use CUDA async operation if possible with non default CUDA stream
2019-08-19 01:18:52 +00:00
Seungha Yang
5615e9258f nvdec: Don't use default CUDA stream
Async CUDA operation with default stream (NULL CUstream) is not much
beneficial than blocking operation since all CUDA operations which belong
to the CUDA context will be synchronized with the default stream's operation.
Note that CUDA stream will share all resources of the corresponding CUDA context
but which can help parallel operation similar to the relation between thread and process
2019-08-19 01:18:52 +00:00
Seungha Yang
20d8f54e63 nvdec: Push/Pop CUDA context around library API call 2019-08-19 01:18:52 +00:00
Seungha Yang
f7b2b1b99d nvdec: Fix timestamp mismatch on draining frames
The internal decoding state must be GST_NVDEC_STATE_PARSE before
calling CuvidParseVideoData(). Otherwise, nvdec will be confused
on decode callback as if the frame is decoding only frame and
the input timestamp of corresponding frame will be ignored.
Eventually one decoded frame will have non-increased PTS.
2019-08-18 15:52:32 +09:00
Seungha Yang
b64733972e nvdec: Do not access nvdec object from destroy function of qdata
The destroy callback can be called just before the fìnalization of
GstMiniObject. So the nvdec object might be destroyed already.
Instead, store the GstCudaContext with increased ref to safely
unregister the CUDA resource.
2019-08-16 19:40:31 +09:00
Seungha Yang
e6d21d048a nvenc: Add support YV12 format
YV12 format is supported by Nvidia NVENC without manual conversion.
So nvenc is exposing YV12 format at sinkpad template but there is some
missing point around uploading the memory to GPU.
2019-08-09 11:43:22 +09:00
Seungha Yang
8dbaed0af7 nvh265enc: Enable HDR related SEI nal insertion
If upstream provides the HDR related information, create SEI message
nals and pass them to NVENC.
2019-08-08 23:18:14 +09:00
Seungha Yang
f3e12a0b56 nvh265enc: Add support YUV 444 10bits encoding
Note that h264 encoder does not support the YUV 444 10bits format
2019-08-08 00:46:16 +09:00
Seungha Yang
fa5e6f546b nvenc: Remove unnecessary constraint from YUV420 10bits capability decision
YUV444 capability shouldn't be applied to YUV420 10 bits format
2019-08-08 00:46:12 +09:00
Seungha Yang
cc4d0e91e3 nvenc: Fix broken RGB format support
Add missing format check introduced by the commit 7de4dbdeb2
2019-08-07 07:27:36 +00:00
Seungha Yang
9d0545d1a2 nvcodec: Wrap CUDA API return check with gst_cuda_result
The gst_cuda_result macro function is more helpful for debugging
than previous cuda_OK because gst_cuda_result prints the function
and line number. If the CUDA API return was not CUDA_SUCCESS,
gst_cuda_result will print WARNING level debug message with
error name, error text strings.
2019-08-07 00:59:36 +00:00
Seungha Yang
d69b590683 nvdec: Port to GstCUDAContext
... and drop CUvideoctxlock usage. The CUvideoctxlock basically
has the identical role of cuda context push/pop but nvdec specific
way. Since we can share the CUDA context among encoders and decoders,
use CUDA context directly for accessing GPU API.
2019-08-07 00:59:36 +00:00
Seungha Yang
5cf0351418 nvenc: Port to GstCudaContext
... and add support CUDA context sharing similar to glcontext sharing.
Multiple CUDA context per GPU is not the best practice. The context
sharing method is very similar to that of glcontext. The difference
is that there can be multiple context object on a pipeline since
the CUDA context is created per GPU id. For example, a pipeline
has nvh264dec (uses GPU #0) and nvh264device0dec (uses GPU #1),
then two CUDA context will propagated to all pipeline.
2019-08-07 00:59:36 +00:00