From 91b7d3679e306c9031760dd137f08c7f1fdad949 Mon Sep 17 00:00:00 2001
From: He Junyan
Date: Mon, 6 Dec 2021 16:30:46 +0800
Subject: [PATCH] Docs: Add the design document for DMA buffer sharing.

Co-authored-by: Nicolas Dufresne
Co-authored-by: Victor Jaquez
Part-of:
---
 .../markdown/additional/design/dmabuf.md | 344 ++++++++++++++++++
 1 file changed, 344 insertions(+)
 create mode 100644 subprojects/gst-docs/markdown/additional/design/dmabuf.md

diff --git a/subprojects/gst-docs/markdown/additional/design/dmabuf.md b/subprojects/gst-docs/markdown/additional/design/dmabuf.md
new file mode 100644
index 0000000000..d71287da32
--- /dev/null
+++ b/subprojects/gst-docs/markdown/additional/design/dmabuf.md
@@ -0,0 +1,344 @@
+# DMA buffers
+
+This document describes the GStreamer caps negotiation of DMA buffers on
+Linux-like platforms.
+
+DMA buffer sharing is an efficient way to share buffers/memory
+between different Linux kernel drivers, such as codecs/3D/display/cameras.
+For example, the decoder may want its output to be directly shared with the
+display server for rendering without a copy.
+
+Any device driver that takes part in DMA buffer sharing can do so as either
+the *exporter* or the *importer* of buffers.
+
+This kind of buffer/memory is usually stored in non-system memory (for example,
+in the device's local memory or somewhere else not directly accessible by the
+CPU), so mapping it for CPU access may impose a large overhead and
+low performance, or may even be impossible.
+
+DMA buffers are exposed to user-space as *file descriptors*, which allows
+passing them between processes.
+
+
+# DRM PRIME buffers
+
+PRIME is the cross-device buffer sharing framework in the DRM kernel
+subsystem. PRIME buffers are the ones normally used in GStreamer, and they
+might contain video frames.
+
+PRIME buffers require some metadata to describe how to interpret them,
+such as a set of file descriptors (for example, one per plane), a color
+definition in fourcc, and DRM-modifiers. If the frame is going to be mapped
+into system memory, padding, strides, offsets, etc. are also needed.
+
+
+## File descriptor
+
+Each file descriptor represents a chunk of a frame, usually a plane. For
+example, when a DMA buffer contains NV12 format data, it might be
+composed of 2 planes: one for its Y component and the other for both UV
+components. Then, the hardware may use two detached memory chunks, one per
+plane, exposed as two file descriptors. Otherwise, if the hardware uses only
+one contiguous memory chunk for all the planes, the DMA buffer should just
+have one file descriptor.
+
+
+## DRM fourcc
+
+Just like the common usage of fourcc, DRM-fourcc describes the underlying format
+of the video frame, such as `DRM_FORMAT_YVU420` or `DRM_FORMAT_NV12`. All
+of them have the prefix `DRM_FORMAT_`. Please refer to `drm_fourcc.h` in
+the kernel for a full list. This list of fourcc formats maps to GStreamer
+video formats.
+
+
+## DRM modifier
+
+DRM-modifier describes how pixel samples are translated to
+the actual memory storage of the buffer. The most
+straightforward modifier is LINEAR, where each pixel has contiguous storage
+and pixel location in memory can be easily calculated with the stride. This
+is considered the baseline interchange format, and most convenient for CPU
+access. Nonetheless, modern hardware employs more sophisticated memory
+access mechanisms, such as tiling and possibly compression. For example,
+the TILED modifier describes memory storage where pixels are stored in 4x4
+blocks arranged in row-major ordering. That is, the first tile in
+memory stores pixels (0,0) to (3,3) inclusive, and the second tile in
+memory stores pixels (4,0) to (7,3) inclusive, and so on.
+
+A DRM-modifier is a 64-bit value, written as sixteen hexadecimal digits, that
+represents these memory layouts. For example, `0x0000000000000000` means linear,
+`0x0100000000000001` means Intel's X tile mode, etc. Please refer to
+`drm_fourcc.h` in the kernel for a full list.
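The following is a minimal sketch, not part of the GStreamer API, of how such a 64-bit modifier value can be taken apart. It assumes the encoding used by the kernel's `fourcc_mod_code()` macro in `drm_fourcc.h`: an 8-bit vendor ID in the most significant bits and a 56-bit vendor-specific value below it. The helper names `drm_modifier_vendor()`, `drm_modifier_code()` and `drm_modifier_make()` are hypothetical.

```c
#include <stdint.h>

/* Mask covering the low 56 bits, which hold the vendor-specific layout code. */
#define DRM_MODIFIER_CODE_MASK 0x00ffffffffffffffULL

/* Hypothetical helper: vendor ID stored in the most significant 8 bits. */
static inline uint8_t drm_modifier_vendor(uint64_t modifier)
{
    return (uint8_t)(modifier >> 56);
}

/* Hypothetical helper: vendor-specific layout code in the low 56 bits. */
static inline uint64_t drm_modifier_code(uint64_t modifier)
{
    return modifier & DRM_MODIFIER_CODE_MASK;
}

/* Hypothetical helper: rebuild a modifier from its two parts, following the
 * same bit layout as the kernel's fourcc_mod_code() macro. */
static inline uint64_t drm_modifier_make(uint8_t vendor, uint64_t code)
{
    return ((uint64_t)vendor << 56) | (code & DRM_MODIFIER_CODE_MASK);
}
```

With these helpers, `0x0100000000000001` splits into vendor `0x01` (Intel) and code `1`, while the linear modifier has both parts equal to zero.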
+
+Except for the linear modifier, the most significant 8 bits represent the vendor
+ID and the other 56 bits describe the memory layout, which may be hardware
+dependent. Users should be careful when interpreting non-linear memory by
+themselves.
+
+Please bear in mind that, even for the linear modifier, access to the
+DMA memory's content goes through `map()` / `unmap()` functions, so its
+read/write performance may be poor because of its cache type
+and coherence assurance. So, most of the time, it's advised to avoid that
+code path for uploading or downloading frame data.
+
+
+## Meta Data
+
+The meta data contains information about how to interpret the memory
+holding the video frame, either when the frame is mapped and its DRM modifier
+is linear, or by another API that imports those DMA buffers.
+
+
+# DMABufs in GStreamer
+
+
+## Representation
+
+In GStreamer, a full DMA buffer-based video frame is mapped to a
+`GstBuffer`, and each file descriptor used to describe the whole frame is
+held by a `GstMemory` mini-object. A derived class of `GstDmaBufAllocator`
+would be implemented as the memory allocator for every wrapped API
+*exporting* DMA buffers to user-space.
+
+
+## DRM format caps field
+
+The *GstCapsFeatures* *memory:DMABuf* is usually used to negotiate DMA
+buffers. It is recommended to allow DMABuf to flow without the
+*GstCapsFeatures* *memory:DMABuf* if the DRM-modifier is linear.
+
+In addition, in order to negotiate *memory:DMABuf* thoroughly, it's required
+to match the DRM-modifiers between upstream and downstream. Otherwise, video
+sinks might end up rendering wrong frames by assuming linear access.
+
+Because DRM-fourcc and DRM-modifier are both necessary to render
+DMABuf-backed frames, we now consider both as a pair and combine them together to
+assure uniqueness. In caps, we use a *:* to link them together and write them in
+the form *FORMAT:MODIFIER*, which represents a totally new single video
+format. For example, `NV12:0x0100000000000002` is a new video format
+combining the video format NV12 and the modifier `0x0100000000000002`. It's
+not NV12 and it's not its subset either. If no modifier is present, we just
+consider it as linear, namely, `NV12:0x0000000000000000` is equivalent to
+NV12. Then, the intersection between the set of
+
+```
+{ NV12:0x0100000000000002, NV12:0x0000000000000000, ARGB:0x0100000000000001 }
+```
+
+and `{ NV12 }` should be
+
+```
+{ NV12:0x0000000000000000 }
+```
+
+While the intersection between the set of
+
+`{ ARGB }` and
+
+```
+{ NV12:0x0100000000000002, NV12:0x0000000000000000, ARGB:0x0100000000000001 }
+```
+
+should be empty.
+
+Please note that this form of video format only appears within the
+*memory:DMABuf* feature. It must not appear in any other video caps
+feature.
+
+Unlike other types of video buffers, DMABuf frames might not be mappable and
+their internal format is opaque to the user. Therefore, unless the modifier is
+linear (0x0000000000000000) or some other well known tiled format such as
+NV12_4L4, NV12_16L16, NV12_64Z32, NV12_16L32S, etc. (which are defined in
+video-format.h), we always use `GST_VIDEO_FORMAT_ENCODED` in the
+`GstVideoFormat` enum to represent its video format.
+
+In order not to confuse this new format with the common video format, **in**
+the *memory:DMABuf* feature, the *drm-format* field in caps replaces the
+traditional *format* field.
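As an illustration of the *FORMAT:MODIFIER* notation, here is a minimal sketch of a parser for one *drm-format* string. It is not the GStreamer implementation; the function name `parse_drm_format()` and its signature are hypothetical. It assumes the modifier is always written with a `0x` prefix and at most sixteen hexadecimal digits, and it treats a missing modifier as linear.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: parse "FORMAT" or "FORMAT:0xMODIFIER".
 * On success, copies the format name into `format` (NUL-terminated),
 * stores the modifier in `*modifier` and returns true. */
static bool parse_drm_format(const char *s, char *format, size_t format_size,
                             uint64_t *modifier)
{
    const char *colon = strchr(s, ':');
    size_t name_len = colon ? (size_t)(colon - s) : strlen(s);

    if (name_len == 0 || name_len >= format_size)
        return false;
    memcpy(format, s, name_len);
    format[name_len] = '\0';

    if (colon == NULL) {
        *modifier = 0; /* no modifier present: treat as linear */
        return true;
    }

    /* Require "0x" followed by at least one hexadecimal digit. */
    if (colon[1] != '0' || (colon[2] != 'x' && colon[2] != 'X') ||
        colon[3] == '\0')
        return false;

    uint64_t value = 0;
    size_t digits = 0;
    for (const char *p = colon + 3; *p != '\0'; p++, digits++) {
        int d;
        if (*p >= '0' && *p <= '9')
            d = *p - '0';
        else if (*p >= 'a' && *p <= 'f')
            d = *p - 'a' + 10;
        else if (*p >= 'A' && *p <= 'F')
            d = *p - 'A' + 10;
        else
            return false; /* not a hexadecimal digit */
        if (digits == 16)
            return false; /* would not fit in 64 bits */
        value = (value << 4) | (uint64_t)d;
    }

    *modifier = value;
    return true;
}
```

Parsing the digits by hand rather than with `strtoull()` is a deliberate choice in this sketch: `strtoull()` with base 16 accepts a sign, leading whitespace and a second `0x` prefix, so a malformed string with a doubled prefix would otherwise pass.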
+
+So DMABuf-backed video caps may look like:
+
+```
+    video/x-raw(memory:DMABuf), \
+    drm-format=(string)NV12:0x0100000000000001, \
+    width=(int)1920, \
+    height=(int)1080, \
+    interlace-mode=(string)progressive, \
+    multiview-mode=(string)mono, \
+    multiview-flags=(GstVideoMultiviewFlagsSet)0:ffffffff:/right-view-first/left-flipped/left-flopped/right-flipped/right-flopped/half-aspect/mixed-mono, \
+    pixel-aspect-ratio=(fraction)1/1, \
+    framerate=(fraction)24/1, \
+    colorimetry=(string)bt709
+```
+
+And when we call a video info API such as `gst_video_info_from_caps()` with
+these caps, it should return a video format of `GST_VIDEO_FORMAT_ENCODED`,
+leaving other fields unchanged, as with normal video caps.
+
+In addition, a new structure
+
+```
+struct GstDrmVideoInfo
+{
+  GstVideoInfo vinfo;
+  guint32 drm_fourcc;
+  guint64 drm_modifier;
+};
+```
+
+is introduced to carry the additional information of DMA video caps. Users
+should use DMABuf-related API such as `gst_drm_video_info_from_caps()` to
+recognize the video format and parse the DMA info from caps.
+
+
+## Meta data
+
+Besides the *file descriptors*, there may be a `GstVideoMeta` attached
+to each `GstBuffer` to describe more information such as the width, height,
+pitches, strides and plane offsets for that DMA buffer (please note that
+the mandatory width and height information appears both in "caps" and here,
+and they should always be equal). This kind of information can only be obtained
+through each module's API, such as the
+`VkImageDrmFormatModifierExplicitCreateInfoEXT` structure in Vulkan and the
+`vaExportSurfaceHandle()` function in VA-API. The information should be translated
+into `GstVideoMeta`'s fields when the DMA buffer is created and
+exported. This meta data is useful when another module wants to import the
+DMA buffers.
+
+For example, we may create a `GstBuffer` using the `vaExportSurfaceHandle()`
+VA-API function, and set each field of `GstVideoMeta` with information from
+`VADRMPRIMESurfaceDescriptor`. Later, a downstream Vulkan element imports
+these DMA buffers using `VkImageDrmFormatModifierExplicitCreateInfoEXT`,
+translating fields from the buffer's `GstVideoMeta` into its
+`VkSubresourceLayout` plane layouts.
+
+In short, the `GstVideoMeta` contains the common extra video information
+about the DMA buffer, which can be interpreted by each module.
+
+Information in `GstVideoMeta` depends on the hardware context and
+setting. Its values, such as stride and pitch, may differ from the standard
+video format because of the hardware's requirements. For example, if a DMA
+buffer represents a compressed video frame in memory, its pitch and stride may be
+smaller than the standard linear ones because of the compression. Please
+remember that users should not use this meta data to interpret and access the
+DMA buffer, **unless the modifier is linear**.
+
+
+# Negotiation of DMA buffer
+
+If two elements of different modules (for example, a VA-API decoder and a
+Wayland sink) want to transfer dmabufs, the negotiation should ensure a
+common *drm-format* (FORMAT:MODIFIER). Since we have already shown how to
+represent both of them in caps, the negotiation here in fact needs
+no special operation other than finding the intersection.
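The intersection rule can be sketched as follows. This is only an illustration under stated assumptions, not how GStreamer implements it: in a real pipeline the caps machinery (for example `gst_caps_intersect()`) does this work on caps structures. The `DrmFormat` type and `drm_format_intersect()` function are hypothetical, and entries are assumed to be already parsed, with a missing modifier normalized to `0` (linear).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parsed form of one drm-format entry. */
typedef struct {
    const char *format; /* format name, e.g. "NV12" */
    uint64_t modifier;  /* DRM modifier; 0 means linear */
} DrmFormat;

/* Copy into `out` (capacity `n_out`) every entry of `a` that also appears
 * in `b`, keeping the order of `a`. Two entries match only when both the
 * format name and the modifier are equal. Returns the number of entries
 * written. */
static size_t drm_format_intersect(const DrmFormat *a, size_t n_a,
                                   const DrmFormat *b, size_t n_b,
                                   DrmFormat *out, size_t n_out)
{
    size_t count = 0;

    for (size_t i = 0; i < n_a && count < n_out; i++) {
        for (size_t j = 0; j < n_b; j++) {
            if (a[i].modifier == b[j].modifier &&
                strcmp(a[i].format, b[j].format) == 0) {
                out[count++] = a[i];
                break;
            }
        }
    }
    return count;
}
```

Because the pair is compared as a whole, a linear `ARGB` entry does not match `ARGB:0x0100000000000001`, which is why the second intersection shown earlier in this document is empty.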
+
+
+## Static Template Caps
+
+If an element can list all the DRM fourcc/modifier combinations at registration
+time, the `gst-inspect` result should look like:
+
+```
+SRC template: 'src'
+  Availability: Always
+  Capabilities:
+    video/x-raw(memory:DMABuf)
+      width: [ 16, 16384 ]
+      height: [ 16, 16384 ]
+      drm-format: { (string)NV12:0x0100000000000001, (string)I420, (string)YV12, \
+                    (string)YUY2:0x0100000000000002, (string)P010_10LE:0x0100000000000002, \
+                    (string)BGRA:0x0100000000000002, (string)RGBA:0x0100000000000002, \
+                    (string)BGR10A2_LE:0x0100000000000002, (string)VUYA:0x0100000000000002 }
+```
+
+But because it is sometimes impossible to enumerate and list all
+drm_fourcc/modifier combinations in static templates (for example, we may
+need a runtime context which is not available at registration time to detect
+the real modifiers the hardware can support), we can leave the *drm-format* field
+absent to mean the superset of all formats.
+
+
+## Renegotiation
+
+Sometimes, a renegotiation may happen if the downstream element is not
+satisfied with the caps set by the upstream element. For example, some sink
+elements may not know the preferred DRM fourcc/modifier until the real
+render target window is realized. Then, it will send a "reconfigure" event
+to the upstream element to request a renegotiation. In this round of negotiation,
+the downstream element will provide a more precise *drm-format* list.
+
+
+## Example
+
+Consider the pipeline of:
+
+```
+vapostproc ! video/x-raw(memory:DMABuf) ! glupload
+```
+
+where both `vapostproc` and `glupload` work on the same GPU. (The DMABuf caps
+filter is just for illustration; it doesn't need to be specified, since DMA
+negotiation is well supported.)
+
+The VA-API based `vapostproc` element can detect the modifiers at
+element registration time, and the src template should be:
+
+```
+SRC template: 'src'
+  Availability: Always
+  Capabilities:
+    video/x-raw(memory:DMABuf)
+      width: [ 16, 16384 ]
+      height: [ 16, 16384 ]
+      drm-format: { (string)NV12:0x0100000000000001, (string)NV12, \
+                    (string)I420, (string)YV12, \
+                    (string)BGRA:0x0100000000000002 }
+```
+
+In contrast, `glupload` needs the runtime EGL context to check the DRM fourcc
+and modifiers, so it can just leave the *drm-format* field absent in its sink
+template:
+
+```
+SINK template: 'sink'
+  Availability: Always
+  Capabilities:
+    video/x-raw(memory:DMABuf)
+      width: [ 1, 2147483647 ]
+      height: [ 1, 2147483647 ]
+```
+
+At runtime, when `vapostproc` wants to decide its src caps, it first
+queries the downstream `glupload` element about all possible DMA caps. The
+`glupload` element should answer that query based on the GL/EGL query result,
+such as:
+
+```
+drm-format: { (string)NV12:0x0100000000000001, (string)BGRA }
+```
+
+So, the intersection with `vapostproc`'s src caps will be
+`NV12:0x0100000000000001`. It will then be sent downstream (to `glupload`)
+by a CAPS event. The `vapostproc` element may also query the allocation
+after that CAPS event, but downstream `glupload` will not provide a DMA
+buffer pool because the EGL API is mostly for DMABuf importing. Then
+`vapostproc` will create its own DMA pool, and the buffers created from that
+new pool should conform to the *drm-format*, described in this document,
+`NV12:0x0100000000000001`. Also, the downstream `glupload` should make sure
+that it can import other DMA buffers which are not created in a pool it
+provided, as long as they conform to *drm-format*
+`NV12:0x0100000000000001`.
+
+Then, when `vapostproc` handles each frame, it creates GPU surfaces with
+*drm-format* `NV12:0x0100000000000001`. Each surface is also exported as a
+set of file descriptors, each one wrapped in a `GstMemory` allocated by a
+subclass of `GstDmaBufAllocator`. All the `GstMemory` objects are appended to a
+`GstBuffer`. There may be some extra information about the pitch, stride
+and plane offset when we export the surface; we also need to translate it
+into a `GstVideoMeta` and attach it to the `GstBuffer`.
+
+Later, when `glupload` receives a `GstBuffer`, it can use those file
+descriptors with *drm-format* `NV12:0x0100000000000001` to import an
+EGLImage. If the `GstVideoMeta` exists, these extra parameters should also
+be provided to the importing API.