design: move over design docs from gst-plugins-base
Or most of them anyway (excl. draft-hw-acceleration and draft-va which didn't seem particularly pertinent).
This commit is contained in:
parent a3fe9f6a7d
commit aff7ad1080

13 changed files with 3475 additions and 0 deletions
markdown/design/audiosinks.md (new file, 129 lines)

@@ -0,0 +1,129 @@

## Audiosink design

### Requirements

- must operate chain based. Most simple playback pipelines will push
  audio from the decoders into the audio sink.

- must operate getrange based. Most professional audio applications
  will operate in a mode where the audio sink pulls samples from the
  pipeline. This is typically done in a callback from the audiosink
  requesting N samples. The callback is either scheduled from a thread
  or from an interrupt from the audio hardware device.

- Exact sample accurate clocks. The audiosink must be able to provide
  a clock that is sample accurate even if samples are dropped or when
  discontinuities are found in the stream.

- Exact timing of playback. The audiosink must be able to play samples
  at their exact times.

- use DMA access when possible. When the hardware can do DMA we should
  use it. This should also work over bufferpools to avoid data copying
  to/from kernel space.

### Design

The design is based on a set of base classes and the concept of a
ringbuffer of samples.

    +-----------+   - provide preroll, rendering, timing
    + basesink  +   - caps nego
    +-----+-----+
          |
    +-----V----------+  - manages ringbuffer
    + audiobasesink  +  - manages scheduling (push/pull)
    +-----+----------+  - manages clock/query/seek
          |             - manages scheduling of samples in the ringbuffer
          |             - manages caps parsing
          |
    +-----V------+  - default ringbuffer implementation with a GThread
    + audiosink  +  - subclasses provide open/read/close methods
    +------------+

The ringbuffer is a contiguous piece of memory divided into segtotal
segments. Each segment has segsize bytes.

          play position
            v
    +---+---+---+-------------------------------------+----------+
    + 0 | 1 | 2 | ....                                 | segtotal |
    +---+---+---+-------------------------------------+----------+
    <--->
      segsize bytes = N samples * bytes_per_sample.

The ringbuffer has a play position, which is expressed in segments. The
play position is where the device is currently reading samples from the
buffer.

The ringbuffer can be put into the PLAYING or STOPPED state.

In the STOPPED state no samples are played to the device and the play
pointer does not advance.

In the PLAYING state samples are written to the device and the
ringbuffer should call a configurable callback after each segment is
written to the device. In this state the play pointer is advanced after
each segment is written.

A write operation to the ringbuffer will put new samples in the
ringbuffer. If there is not enough space in the ringbuffer, the write
operation will block. The playback of the buffer never stops, even if
the buffer is empty. When the buffer is empty, silence is played by the
device.

The ringbuffer is implemented with lock-free atomic operations,
especially on the reading side, so that low-latency operation is
possible.

Whenever new samples are to be put into the ringbuffer, the position of
the read pointer is taken. The required write position is taken and the
difference is computed between the required and actual position. If the
difference is < 0, the sample is too late. If the difference is bigger than
segtotal, the writing part has to wait for the play pointer to advance.
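
As an illustration of that calculation, here is a minimal sketch. It is not
the actual ringbuffer code; the names `decide_write`, `segdone`,
`samples_per_seg` and `sample_offset` are assumptions made for the example.

```c
#include <glib.h>

/* Hypothetical sketch of the ringbuffer write decision described above.
 * 'segdone' is the number of segments the device has already played,
 * 'samples_per_seg' is segsize / bytes_per_sample. */
typedef enum { SAMPLE_TOO_LATE, SAMPLE_WRITABLE, WRITER_MUST_WAIT } WriteDecision;

static WriteDecision
decide_write (gint64 segdone, gint64 segtotal, gint64 samples_per_seg,
    gint64 sample_offset)
{
  /* segment the sample would land in, relative to the play position */
  gint64 required_seg = sample_offset / samples_per_seg;
  gint64 diff = required_seg - segdone;

  if (diff < 0)
    return SAMPLE_TOO_LATE;       /* play pointer already passed it */
  if (diff >= segtotal)
    return WRITER_MUST_WAIT;      /* wait for the play pointer to advance */
  return SAMPLE_WRITABLE;         /* fits in the ringbuffer right now */
}
```
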

### Scheduling

#### Chain based mode

In chain based mode, bytes are written into the ringbuffer. This
operation will eventually block when the ringbuffer is filled.

When no samples arrive in time, the ringbuffer will play silence. Each
buffer that arrives will be placed into the ringbuffer at the correct
time. This means that dropping samples or inserting silence is done
automatically, very accurately, and independently of the play pointer.

In this mode, the ringbuffer is usually kept as full as possible. When
using a small buffer (small segsize and segtotal), the latency from when
audio enters the sink to when it is played can be kept low, but at least
one context switch has to be made between read and write.

#### Getrange based mode

In getrange based mode, the audiobasesink will use the callback
function of the ringbuffer to get segsize samples from the peer
element. These samples will then be placed in the ringbuffer at the
next play position. It is assumed that the getrange function returns
fast enough to fill the ringbuffer before the play pointer reaches
the write pointer.

In this mode, the ringbuffer is usually kept as empty as possible.
There is no context switch needed between the elements that create
the samples and the actual writing of the samples to the device.

#### DMA mode

Elements that can do DMA based access to the audio device have to
subclass from the GstAudioBaseSink class and wrap the DMA ringbuffer
in a subclass of GstRingBuffer.

The ringbuffer subclass should trigger a callback after writing or
playing each sample to the device. This callback can be triggered
from a thread or from a signal from the audio device.

### Clocks

The GstAudioBaseSink class will use the ringbuffer to act as a clock
provider. It can do this by using the play pointer and the delay to
calculate the clock time.
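
As a rough sketch of that calculation (the function name and the
`segdone`/`delay_samples` parameters are assumptions for illustration; the
real base class code is more involved):

```c
#include <gst/gst.h>

/* Hypothetical illustration: derive a clock time from the ringbuffer
 * play pointer and the device delay, as described above. */
static GstClockTime
ring_buffer_clock_time (gint64 segdone, gint segsize, gint bytes_per_sample,
    gint rate, gint64 delay_samples)
{
  /* samples the device has fetched from the ringbuffer so far */
  gint64 samples_played = segdone * (segsize / bytes_per_sample);

  /* subtract the samples still queued in the device to get what was heard */
  gint64 samples_heard = samples_played - delay_samples;
  if (samples_heard < 0)
    samples_heard = 0;

  return gst_util_uint64_scale (samples_heard, GST_SECOND, rate);
}
```
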

markdown/design/decodebin.md (new file, 264 lines)

@@ -0,0 +1,264 @@

# Decodebin design

## GstDecodeBin

### Description

- Autoplug and decode to raw media

- Input: single pad with ANY caps

- Output: dynamic pads

### Contents

- a GstTypeFindElement connected to the single sink pad

- optionally a demuxer/parser

- optionally one or more DecodeGroup

### Autoplugging

The goal is to reach 'target' caps (by default raw media).

This is done by taking the GstCaps of a source pad and finding the
available demuxer/decoder GstElements that can be linked to that pad.

The process starts with the source pad of typefind and stops when no
more non-target caps are left. It is commonly done while pre-rolling,
but can also happen whenever a new pad appears on any element.

Once target caps have been found, that pad is ghosted and the
'pad-added' signal is emitted.

If no compatible elements can be found for a GstCaps, the pad is ghosted
and the 'unknown-type' signal is emitted.

### Assisted auto-plugging

When starting the auto-plugging process for a given GstCaps, the following
signals are emitted in order to allow the application/user
to assist or fine-tune the process.

- **'autoplug-continue'**:

        gboolean user_function (GstElement * decodebin, GstPad * pad, GstCaps * caps)

  This signal is fired at the very beginning with the source pad GstCaps. If
  the callback returns TRUE, the process continues normally. If the
  callback returns FALSE, then the GstCaps are considered as target caps
  and the autoplugging process stops.

- **'autoplug-factories'**:

        GValueArray user_function (GstElement * decodebin, GstPad * pad, GstCaps * caps);

  Get a list of element factories for @pad with @caps. This function is
  used to instruct decodebin2 which elements it should try to
  autoplug. The default behaviour when this function is not overridden
  is to get all elements that can handle @caps from the registry,
  sorted by rank.

- **'autoplug-select'**:

        gint user_function (GstElement * decodebin, GstPad * pad, GstCaps * caps, GValueArray * factories);

  This signal is fired once autoplugging has got a list of compatible
  GstElementFactory. The signal is emitted with the GstCaps of the
  source pad and a pointer to the GValueArray of compatible factories.

  The callback should return the index of the element factory in
  @factories that should be tried next.

  If the callback returns -1, the autoplugging process will stop as if
  no compatible factories were found.

  The default implementation of this function will try to autoplug the
  first factory of the list.
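
For example, an application that wants decodebin to stop unpacking at a
certain format (say, keep AC-3 compressed for pass-through) can hook into the
first signal. A minimal sketch; `on_autoplug_continue` and the AC-3 choice are
illustrative assumptions, not part of this design:

```c
#include <gst/gst.h>

/* Return FALSE to treat these caps as target caps and stop autoplugging,
 * TRUE to let decodebin keep plugging decoders. */
static gboolean
on_autoplug_continue (GstElement * decodebin, GstPad * pad, GstCaps * caps,
    gpointer user_data)
{
  GstCaps *ac3_caps = gst_caps_new_empty_simple ("audio/x-ac3");
  gboolean keep_compressed = gst_caps_can_intersect (caps, ac3_caps);

  gst_caps_unref (ac3_caps);
  return !keep_compressed;
}

/* after creating the decodebin element:
 *   g_signal_connect (decodebin, "autoplug-continue",
 *       G_CALLBACK (on_autoplug_continue), NULL);
 */
```
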

### Target Caps

The target caps are a read/write GObject property of decodebin.

By default the target caps are:

- Raw audio: audio/x-raw

- Raw video: video/x-raw

- Raw text: text/x-raw, format={utf8,pango-markup}

### Media chain/group handling

When autoplugging, all streams coming out of a demuxer will be grouped
in a DecodeGroup.

All new source pads created on that demuxer after it has emitted the
'no-more-pads' signal will be put in another DecodeGroup.

Only one decodegroup can be active at any given time. If a new
decodegroup is created while another one exists, that decodegroup will
be set as blocking until the existing one has drained.

## DecodeGroup

### Description

Streams belonging to the same group/chain of a media file.

### Contents

The DecodeGroup contains:

- a GstMultiQueue to which all streams of the media group are connected.

- the eventual decoders which are autoplugged in order to produce the
  requested target pads.

### Proper group draining

The DecodeGroup takes care that all the streams in the group are
completely drained (EOS has come through all source ghost pads).

### Pre-roll and block

The DecodeGroup has a global blocking feature. If enabled, all the
ghosted source pads for that group will be blocked.

A method is available to unblock all blocked pads for that group.

## GstMultiQueue

Multiple input-output data queue.

`multiqueue` achieves the same functionality as `queue`, with a
few differences:

- Multiple streams handling.

  The element handles queueing data on more than one stream at once.
  To achieve such a feature it has request sink pads (sink\_%u) and
  'sometimes' src pads (src\_%u).

  When requesting a given sinkpad, the associated srcpad for that
  stream will be created. Ex: requesting sink\_1 will generate src\_1.

- Non-starvation on multiple streams.

  If more than one stream is used with the element, the streams'
  queues will be dynamically grown (up to a limit), in order to ensure
  that no stream is risking data starvation. This guarantees that at
  any given time there are at least N bytes queued and available for
  each individual stream.

  If an EOS event comes through a srcpad, the associated queue should
  be considered as 'not-empty' in the queue-size-growing algorithm.

- Non-linked srcpads graceful handling.

  A GstTask is started for all srcpads when going to
  GST\_STATE\_PAUSED.

  The tasks block on a GCond which will be signalled in
  two different cases:

  - When the associated queue has received a buffer.

  - When the associated queue was previously declared as 'not-linked'
    and the first buffer of the queue is scheduled to be pushed
    synchronously in relation to the order in which it arrived globally
    in the element (see 'Synchronous data pushing' below).

  When woken up, the GstTask will try to push the
  next GstBuffer/GstEvent on the queue. If pushing the
  GstBuffer/GstEvent returns GST\_FLOW\_NOT\_LINKED, then the
  associated queue is marked as 'not-linked'. If pushing the
  GstBuffer/GstEvent succeeded, the queue will no longer be marked as
  'not-linked'.

  If pushing on all srcpads returns a GstFlowReturn different from
  GST\_FLOW\_OK, then all the srcpads' tasks are stopped and
  subsequent pushes on sinkpads will return GST\_FLOW\_NOT\_LINKED.

- Synchronous data pushing for non-linked pads.

  In order to better support dynamic switching between streams, the
  multiqueue (unlike the current GStreamer queue) continues to push
  buffers on non-linked pads rather than shutting down.

  In addition, to prevent a non-linked stream from very quickly
  consuming all available buffers and thus 'racing ahead' of the other
  streams, the element must ensure that buffers and inlined events for
  a non-linked stream are pushed in the same order as they were
  received, relative to the other streams controlled by the element.
  This means that a buffer cannot be pushed to a non-linked pad any
  sooner than buffers in any other stream which were received before
  it.

## Parsers, decoders and auto-plugging

This section has DRAFT status.

Some media formats come in different "flavours" or "stream formats".
These formats differ in the way the setup data and media data is
signalled and/or packaged. An example of this is H.264 video, where
there is a bytestream format (with codec setup data signalled inline and
units prefixed by a sync code and packet length information) and a "raw"
format where codec setup data is signalled out of band (via the caps)
and the chunking is implicit in the way the buffers were muxed into a
container, to mention just two of the possible variants.

Especially on embedded platforms it is common that decoders can only
handle one particular stream format, and not all of them.

Where there are multiple stream formats, parsers are usually expected to
be able to convert between the different formats. This will, if
implemented correctly, work as expected in a static pipeline such as

    ... ! parser ! decoder ! sink

where the parser can query the decoder's capabilities even before
processing the first piece of data, and configure itself to convert
accordingly, if conversion is needed at all.

In an auto-plugging context this is not so straightforward though,
because elements are plugged incrementally, and not before the previous
element has processed some data and decided what it will output exactly
(unless the template caps are completely fixed, in which case it can continue
right away; this is not always the case here though, see below). A
parser will thus have to decide on *some* output format so auto-plugging
can continue. It doesn't know anything about the available decoders and
their capabilities though, so it's possible that it will choose a format
that is not supported by any of the available decoders, or by the
preferred decoder.

If the parser had sufficiently concise but fixed source pad template
caps, decodebin could continue to plug a decoder right away, allowing
the parser to configure itself in the same way as it would with a static
pipeline. This is not an option, unfortunately, because often the parser
needs to process some data to determine e.g. the format's profile or
other stream properties (resolution, sample rate, channel configuration,
etc.), and there may be different decoders for different profiles (e.g.
a DSP codec for baseline profile and a software fallback for main/high
profile; or a DSP codec only supporting certain resolutions, with a
software fallback for unusual resolutions). So if decodebin just plugged
the highest-ranking decoder, that decoder might not be able to
handle the actual stream later on, which would yield an error (this is a
data flow error then, which would be hard to intercept and avoid in
decodebin). In other words, we can't solve this issue by plugging a
decoder right away with the parser.

So decodebin needs to communicate to the parser the set of available
decoder caps (which would contain the relevant capabilities/restrictions
such as supported profiles, resolutions, etc.), after the usual
"autoplug-\*" signal filtering/sorting of course.

This is done by plugging a capsfilter element right after the parser,
and constructing a set of filter caps from the list of available decoders
(one appends at the end just the name(s) of the caps structures from the
parser pad template caps to function as an 'ANY other' caps equivalent).
This lets the parser negotiate to a supported stream format in the same
way as with the static pipeline mentioned above, but of course incurs
some overhead through the additional capsfilter element.
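
As an illustration of that last step, here is a simplified sketch. The
function `build_parser_filter_caps` and the already-filtered
`decoder_factories` list are assumptions made for the example; the actual
decodebin code differs in the details:

```c
#include <gst/gst.h>

/* Hypothetical sketch: build capsfilter caps from the sink templates of the
 * candidate decoder factories, then append the structure names of the
 * parser's source pad template caps as an 'ANY other' fallback. */
static GstCaps *
build_parser_filter_caps (GList * decoder_factories, GstCaps * parser_tmpl_caps)
{
  GstCaps *filter = gst_caps_new_empty ();
  GList *l;
  guint i;

  for (l = decoder_factories; l != NULL; l = l->next) {
    GstElementFactory *factory = l->data;
    const GList *tmpls = gst_element_factory_get_static_pad_templates (factory);

    for (; tmpls != NULL; tmpls = tmpls->next) {
      GstStaticPadTemplate *tmpl = tmpls->data;

      if (tmpl->direction == GST_PAD_SINK)
        filter = gst_caps_merge (filter,
            gst_static_pad_template_get_caps (tmpl));
    }
  }

  /* append 'name-only' structures taken from the parser template caps */
  for (i = 0; i < gst_caps_get_size (parser_tmpl_caps); i++) {
    const gchar *name =
        gst_structure_get_name (gst_caps_get_structure (parser_tmpl_caps, i));
    filter = gst_caps_merge_structure (filter, gst_structure_new_empty (name));
  }

  return filter;
}
```
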

markdown/design/encoding.md (new file, 469 lines)

@@ -0,0 +1,469 @@

## Encoding and Muxing

## Problems this proposal attempts to solve

- Duplication of pipeline code for gstreamer-based applications
  wishing to encode and/or mux streams, leading to subtle differences
  and inconsistencies across those applications.

- No unified system for describing encoding targets for applications
  in a user-friendly way.

- No unified system for creating encoding targets for applications,
  resulting in duplication of code across all applications,
  differences and inconsistencies that come with that duplication, and
  applications hardcoding element names and settings, resulting in poor
  portability.

## Goals

1. Convenience encoding element

   Create a convenience GstBin for encoding and muxing several streams,
   hereafter called 'EncodeBin'.

   This element will only contain one single property, which is a profile.

2. Define an encoding profile system

3. Encoding profile helper library

   Create a helper library to:

   - create EncodeBin instances based on profiles, and

   - help applications to create/load/save/browse those profiles.

## EncodeBin

### Proposed API

EncodeBin is a GstBin subclass.

It implements the GstTagSetter interface, by which it will proxy the
calls to the muxer.

Only two introspectable properties (i.e. usable without extra API):

- A GstEncodingProfile
- The name of the profile to use

When a profile is selected, encodebin will:

- Add REQUEST sinkpads for all the GstStreamProfile
- Create the muxer and expose the source pad

Whenever a request pad is created, encodebin will:

- Create the chain of elements for that pad
- Ghost the sink pad
- Return that ghost pad

This allows reducing the code to the minimum for applications wishing to
encode a source for a given profile:

    encbin = gst_element_factory_make ("encodebin", NULL);
    g_object_set (encbin, "profile", "N900/H264 HQ", NULL);
    gst_element_link (encbin, filesink);

    vsrcpad = gst_element_get_static_pad (source, "src1");
    vsinkpad = gst_element_get_request_pad (encbin, "video_%u");
    gst_pad_link (vsrcpad, vsinkpad);

### Explanation of the various stages in EncodeBin

This describes the various stages which can happen in order to end up
with a multiplexed stream that can then be stored or streamed.

#### Incoming streams

The streams fed to EncodeBin can be of various types:

- Video
  - Uncompressed (but maybe subsampled)
  - Compressed
- Audio
  - Uncompressed (audio/x-raw)
  - Compressed
- Timed text
- Private streams

#### Steps involved for raw video encoding

0) Incoming Stream

1) Transform raw video feed (optional)

   Here we modify the various fundamental properties of a raw video stream
   to be compatible with the intersection of:

   - the encoder GstCaps, and
   - the specified "Stream Restriction" of the profile/target.

   The fundamental properties that can be modified are:

   - width/height: this is done with a video scaler. The DAR (Display
     Aspect Ratio) MUST be respected. If needed, black borders can be
     added to comply with the target DAR.
   - framerate
   - format/colorspace/depth: all of this is done with a colorspace
     converter.

2) Actual encoding (optional for raw streams)

   An encoder (with some optional settings) is used.

3) Muxing

   A muxer (with some optional settings) is used.

4) Outgoing encoded and muxed stream

#### Steps involved for raw audio encoding

This is roughly the same as for raw video, except for (1):

1) Transform raw audio feed (optional)

   We modify the various fundamental properties of a raw audio stream to be
   compatible with the intersection of:

   - the encoder GstCaps, and
   - the specified "Stream Restriction" of the profile/target.

   The fundamental properties that can be modified are:

   - number of channels
   - type of raw audio (integer or floating point)
   - depth (number of bits required to encode one sample)

#### Steps involved for encoded audio/video streams

Steps (1) and (2) are replaced by a parser if a parser is available for
the given format.

#### Steps involved for other streams

Other streams will just be forwarded as-is to the muxer, provided the
muxer accepts the stream type.

## Encoding Profile System

This work is based on:

- The existing [GstPreset API documentation][gst-preset] system for elements

- The gnome-media [GConf audio profile system][gconf-audio-profile]

- The investigation done into device profiles by Arista and
  Transmageddon: [Research on a Device Profile API][device-profile-api],
  and [Research on defining presets usage][preset-usage].

### Terminology

- **Encoding Target Category**: a classification of
  devices/systems/use-cases for encoding.

  Such a classification is required in order for:

  - applications with a very specific use-case to limit the number of
    profiles they can offer the user. A screencasting application has no
    use for the online services targets, for example.
  - offering the user some initial classification in the case of a more
    generic encoding application (like a video editor or a transcoder).

  Ex: Consumer devices, Online service, Intermediate Editing Format,
  Screencast, Capture, Computer.

- **Encoding Profile Target**: a Profile Target describes a specific entity
  for which we wish to encode. A Profile Target must belong to at
  least one Target Category. It will define at least one Encoding
  Profile.

  Examples (with category): Nokia N900 (Consumer device), Sony PlayStation 3
  (Consumer device), Youtube (Online service), DNxHD (Intermediate editing
  format), HuffYUV (Screencast), Theora (Computer).

- **Encoding Profile**: a specific combination of muxer, encoders, presets
  and limitations.

  Examples: Nokia N900/H264 HQ, Ipod/High Quality, DVD/Pal,
  Youtube/High Quality, HTML5/Low Bandwidth, DNxHD.

### Encoding Profile

An encoding profile requires the following information:

- **Name**: this string is not translatable and must be unique. A
  recommendation to guarantee uniqueness of the naming could be:
  `<target>/<name>`.
- **Description**: a translatable string describing the profile.
- **Muxing format**: a string containing the GStreamer media-type
  of the container format.
- **Muxing preset**: an optional string describing the preset(s) to
  use on the muxer.
- **Multipass setting**: a boolean describing whether the profile
  requires several passes.
- **List of Stream Profiles**

#### Stream Profiles

A Stream Profile consists of:

- **Type**: the type of stream profile (audio, video, text, private-data).
- **Encoding Format**: a string containing the GStreamer media-type
  of the encoding format to be used. If encoding is not to be applied,
  the raw audio media type will be used.
- **Encoding preset**: an optional string describing the preset(s)
  to use on the encoder.
- **Restriction**: an optional GstCaps containing the restriction
  of the stream that can be fed to the encoder. This will generally
  contain restrictions in video width/height/framerate or audio
  depth.
- **Presence**: an integer specifying how many streams can be used
  in the containing profile. 0 means that any number of streams can be
  used.
- **Pass**: an integer which is only meaningful if the multipass
  flag has been set in the profile. If it has been set, it indicates
  which pass this Stream Profile corresponds to.

### Example profile

The representation used here is XML only as an example. No decision is
made as to which formatting to use for storing targets and profiles.

    <gst-encoding-target>
      <name>Nokia N900</name>
      <category>Consumer Device</category>
      <profiles>
        <profile>Nokia N900/H264 HQ</profile>
        <profile>Nokia N900/MP3</profile>
        <profile>Nokia N900/AAC</profile>
      </profiles>
    </gst-encoding-target>

    <gst-encoding-profile>
      <name>Nokia N900/H264 HQ</name>
      <description>
        High Quality H264/AAC for the Nokia N900
      </description>
      <format>video/quicktime,variant=iso</format>
      <streams>
        <stream-profile>
          <type>audio</type>
          <format>audio/mpeg,mpegversion=4</format>
          <preset>Quality High/Main</preset>
          <restriction>audio/x-raw,channels=[1,2]</restriction>
          <presence>1</presence>
        </stream-profile>
        <stream-profile>
          <type>video</type>
          <format>video/x-h264</format>
          <preset>Profile Baseline/Quality High</preset>
          <restriction>
            video/x-raw,width=[16, 800],\
            height=[16, 480],framerate=[1/1, 30000/1001]
          </restriction>
          <presence>1</presence>
        </stream-profile>
      </streams>
    </gst-encoding-profile>
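
For reference, the GstEncodingProfile API that later landed in GstPbUtils
follows this design closely. A rough sketch of building a roughly equivalent
profile in C (the function name `make_n900_profile` is made up for the
example; caps strings shortened and error handling omitted):

```c
#include <gst/pbutils/pbutils.h>

/* Rough sketch: build a container profile similar to the XML example above
 * using the GstEncodingProfile API from GstPbUtils. */
static GstEncodingContainerProfile *
make_n900_profile (void)
{
  GstEncodingContainerProfile *container;
  GstCaps *caps, *restriction;

  caps = gst_caps_from_string ("video/quicktime,variant=iso");
  container = gst_encoding_container_profile_new ("Nokia N900/H264 HQ",
      "High Quality H264/AAC for the Nokia N900", caps, NULL);
  gst_caps_unref (caps);

  caps = gst_caps_from_string ("video/x-h264");
  restriction = gst_caps_from_string ("video/x-raw,width=[16,800],"
      "height=[16,480],framerate=[1/1,30000/1001]");
  gst_encoding_container_profile_add_profile (container,
      (GstEncodingProfile *)
      gst_encoding_video_profile_new (caps, NULL, restriction, 1));
  gst_caps_unref (caps);
  gst_caps_unref (restriction);

  caps = gst_caps_from_string ("audio/mpeg,mpegversion=4");
  restriction = gst_caps_from_string ("audio/x-raw,channels=[1,2]");
  gst_encoding_container_profile_add_profile (container,
      (GstEncodingProfile *)
      gst_encoding_audio_profile_new (caps, NULL, restriction, 1));
  gst_caps_unref (caps);
  gst_caps_unref (restriction);

  return container;
}
```
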

### API

A proposed C API is contained in the gstprofile.h file in this
directory.

### Modifications required in the existing GstPreset system

#### Temporary presets

Currently a preset needs to be saved on disk in order to be used.

This makes it impossible to have temporary presets (that exist only
during the lifetime of a process), which might be required in the new
proposed profile system.

#### Categorisation of presets

Currently presets are just aliases for a group of property/value pairs,
without any meaning or explanation as to how they exclude each other.

Take for example the H264 encoder. It can have presets for:

- passes (1, 2 or 3 passes)
- profiles (Baseline, Main, ...)
- quality (Low, Medium, High)

In order to programmatically know which presets exclude each other, we
here propose the categorisation of these presets.

This can be done in one of two ways:

1. in the name (by making the name be `[<category>:]<name>`). This would
   give for example: "Quality:High", "Profile:Baseline"
2. by adding a new `_meta` key. This would give for example:
   `_meta/category:quality`

#### Aggregation of presets

There can be more than one choice of presets to be made for an element
(quality, profile, pass).

This means that one can not currently describe the full configuration of
an element with a single string but with many.

The proposal here is to extend the GstPreset API to be able to set all
presets using one string and a well-known separator ('/').

This change only requires changes in the core preset handling code.

This would allow doing the following:

    gst_preset_load_preset (h264enc, "pass:1/profile:baseline/quality:high");

### Points to be determined

This document hasn't determined yet how to solve the following problems:

#### Storage of profiles

One proposal for storage would be to use a system-wide directory (like
$prefix/share/gstreamer-0.10/profiles) and store XML files for every
individual profile.

Users could then add their own profiles in ~/.gstreamer-0.10/profiles

This poses some limitations as to what to do if some applications want
to have some profiles limited to their own usage.

## Helper library for profiles

These helper methods could also be added to existing libraries (like
GstPreset, GstPbUtils, ..).

The various APIs proposed are in the accompanying gstprofile.h file.

### Getting user-readable names for formats

This is already provided by GstPbUtils.

### Hierarchy of profiles

The goal is for applications to be able to present to the user a list of
combo-boxes for choosing their output profile:

    [ Category ]         # optional, depends on the application
    [ Device/Site/.. ]   # optional, depends on the application
    [ Profile ]

Convenience methods are offered to easily get lists of categories,
devices, and profiles.
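
As later implemented in GstPbUtils, such lists can be obtained with the
encoding-target helpers. A small illustrative sketch (the function name
`list_profiles` is made up; cleanup of the returned lists is omitted):

```c
#include <gst/pbutils/pbutils.h>

/* Illustrative sketch: walk categories -> targets -> profiles, roughly the
 * hierarchy an application would expose as combo-boxes. */
static void
list_profiles (void)
{
  GList *categories = gst_encoding_list_available_categories ();
  GList *c, *t;
  const GList *p;

  for (c = categories; c != NULL; c = c->next) {
    const gchar *category = c->data;
    GList *targets = gst_encoding_list_all_targets (category);

    for (t = targets; t != NULL; t = t->next) {
      GstEncodingTarget *target = t->data;

      for (p = gst_encoding_target_get_profiles (target); p != NULL; p = p->next)
        g_print ("%s / %s / %s\n", category,
            gst_encoding_target_get_name (target),
            gst_encoding_profile_get_name (p->data));
    }
  }
}
```
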

### Creating Profiles

The goal is for applications to be able to easily create profiles.

The application needs a fast/efficient way to:

- select a container format and see all compatible streams that can be
  used with it.
- select a codec format and see which container formats it can be used
  with.

The remaining parts concern the restrictions on encoder input.

### Ensuring availability of plugins for Profiles

When an application wishes to use a Profile, it should be able to query
whether it has all the needed plugins to use it.

This part will use GstPbUtils to query, and if needed install, the
missing plugins through the installed distribution plugin installer.

## Use-cases researched

This is a list of various use-cases where encoding/muxing is being used.

### Transcoding

The goal is to convert, with as little loss of quality as possible, any input
file for a target use. A specific variant of this is transmuxing (see below).

Example applications: Arista, Transmageddon

### Rendering timelines

The incoming streams are a collection of various segments that need to
be rendered. Those segments can vary in nature (i.e. the video
width/height can change). This requires the use of identity with the
single-segment property activated to transform the incoming collection
of segments into a single continuous segment.

Example applications: PiTiVi, Jokosher

### Encoding of live sources

The major risk to take into account is the encoder not encoding the
incoming stream fast enough. This is outside of the scope of encodebin,
and should be solved by using queues between the sources and encodebin,
as well as implementing QoS in encoders and sources (the encoders
emitting QoS events, and the upstream elements adapting themselves
accordingly).

Example applications: camerabin, cheese

### Screencasting applications

This is similar to encoding of live sources. The difference is that,
due to the nature of the source (size and amount/frequency of updates),
one might want to do the encoding in two parts:

- The actual live capture is encoded with an 'almost-lossless' codec
  (such as huffyuv).
- Once the capture is done, the file created in the first step is then
  rendered to the desired target format.

Fixing sources to only emit region updates and having encoders capable
of encoding those streams would remove the need for the first step, but is
outside of the scope of encodebin.

Example applications: Istanbul, gnome-shell, recordmydesktop

### Live transcoding

This is the case of an incoming live stream which will be
broadcasted/transmitted live. One issue to take into account is to
reduce the encoding latency to a minimum. This should mostly be done by
picking low-latency encoders.

Example applications: Rygel, Coherence

### Transmuxing

Given a certain file, the aim is to remux the contents WITHOUT decoding
into either a different container format or the same container format.
Remuxing into the same container format is useful when the file was not
created properly (for example, the index is missing). Whenever
available, parsers should be applied on the encoded streams to validate
and/or fix the streams before muxing them.

Metadata from the original file must be kept in the newly created file.

Example applications: Arista, Transmageddon

### Loss-less cutting

Given a certain file, the aim is to extract a certain part of the file
without going through the process of decoding and re-encoding that file.
This is similar to the transmuxing use-case.

Example applications: PiTiVi, Transmageddon, Arista, ...

### Multi-pass encoding

Some encoders allow doing a multi-pass encoding. The initial pass(es)
are only used to collect encoding estimates and are not actually muxed
and outputted. The final pass uses the previously collected information, and
the output is then muxed and outputted.

### Archiving and intermediary format

The requirement is to have lossless

### CD ripping

Example applications: Sound-juicer

### DVD ripping

Example application: Thoggen

### Research links

Some of these are still active documents, some others not.

[gst-preset]: http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstPreset.html
[gconf-audio-profile]: http://www.gnome.org/~bmsmith/gconf-docs/C/gnome-media.html
[device-profile-api]: http://gstreamer.freedesktop.org/wiki/DeviceProfile (FIXME: wiki is gone)
[preset-usage]: http://gstreamer.freedesktop.org/wiki/PresetDesign (FIXME: wiki is gone)

markdown/design/interlaced-video.md (new file, 102 lines)

@@ -0,0 +1,102 @@

# Interlaced Video

Video buffers have a number of states identifiable through a combination
of caps and buffer flags.

Possible states:

- Progressive
- Interlaced
  - Plain
    - One field
    - Two fields
    - Three fields - this should be a progressive buffer with a repeated 'first'
      field that can be used for telecine pulldown
  - Telecine
    - One field
    - Two fields
      - Progressive
      - Interlaced (a.k.a. 'mixed'; the fields are from different frames)
    - Three fields - this should be a progressive buffer with a repeated 'first'
      field that can be used for telecine pulldown

Note: It can be seen that the difference between the plain interlaced
and telecine states is that in the telecine state, buffers containing
two fields may be progressive.

Tools for identification:

- GstVideoInfo
  - GstVideoInterlaceMode - enum `GST_VIDEO_INTERLACE_MODE_...`
    - PROGRESSIVE
    - INTERLEAVED
    - MIXED
- Buffer flags - `GST_VIDEO_BUFFER_FLAG_...`
  - TFF
  - RFF
  - ONEFIELD
  - INTERLACED

## Identification of Buffer States

Note that flags are not necessarily interpreted in the same way for all
different states, nor are they necessarily required, nor do they make sense
in all cases.

### Progressive

If the interlace mode in the video info corresponding to a buffer is
**"progressive"**, then the buffer is progressive.

### Plain Interlaced

If the video info interlace mode is **"interleaved"**, then the buffer is
plain interlaced.

`GST_VIDEO_BUFFER_FLAG_TFF` indicates whether the top or bottom field
is to be displayed first. The timestamp on the buffer corresponds to the
first field.

`GST_VIDEO_BUFFER_FLAG_RFF` indicates that the first field (indicated
by the TFF flag) should be repeated. This is generally only used for
telecine purposes but as the telecine state was added long after the
interlaced state was added and defined, this flag remains valid for
plain interlaced buffers.

`GST_VIDEO_BUFFER_FLAG_ONEFIELD` means that only the field indicated
through the TFF flag is to be used. The other field should be ignored.

### Telecine

If the video info interlace mode is **"mixed"**, then the buffers are in some
form of telecine state.

The `TFF` and `ONEFIELD` flags have the same semantics as for the plain
interlaced state.

`GST_VIDEO_BUFFER_FLAG_RFF` in the telecine state indicates that the
buffer contains only repeated fields that are present in other buffers
and are as such unneeded. For example, in a sequence of three telecined
frames, we might have:

    AtAb AtBb BtBb

In this situation, we only need the first and third buffers as the
second buffer contains fields present in the first and third.

Note that the following state can have its second buffer identified
using the `ONEFIELD` flag (and `TFF` not set):

    AtAb AtBb BtCb

The telecine state requires one additional flag to be able to identify
progressive buffers.

The presence of the `GST_VIDEO_BUFFER_FLAG_INTERLACED` flag means that the
buffer is an 'interlaced' or 'mixed' buffer that contains two fields
that, when combined with fields from adjacent buffers, allow
reconstruction of progressive frames. The absence of the flag implies
the buffer containing two fields is a progressive frame.

For example in the following sequence, the third buffer would be mixed
(yes, it is a strange pattern, but it can happen):

    AtAb AtBb BtCb CtDb DtDb
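
Putting the above rules together, a decision helper might look like this
minimal sketch (the function `describe_buffer` is made up for illustration;
the `GstVideoInfo` is assumed to have been filled from the negotiated caps
with `gst_video_info_from_caps()`):

```c
#include <gst/video/video.h>

/* Minimal sketch of the identification rules described above. */
static void
describe_buffer (const GstVideoInfo * info, GstBuffer * buf)
{
  gboolean interlaced_flag =
      GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_INTERLACED);

  switch (GST_VIDEO_INFO_INTERLACE_MODE (info)) {
    case GST_VIDEO_INTERLACE_MODE_PROGRESSIVE:
      g_print ("progressive frame\n");
      break;
    case GST_VIDEO_INTERLACE_MODE_INTERLEAVED:
      /* plain interlaced; TFF/RFF/ONEFIELD refine the field handling */
      g_print ("plain interlaced, tff=%d\n",
          GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_TFF));
      break;
    case GST_VIDEO_INTERLACE_MODE_MIXED:
      /* telecine: the INTERLACED flag distinguishes mixed from progressive */
      g_print ("telecine, %s buffer\n",
          interlaced_flag ? "mixed (two fields)" : "progressive");
      break;
    default:
      g_print ("other interlace mode\n");
      break;
  }
}
```
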

markdown/design/keyframe-force.md (new file, 97 lines)

@@ -0,0 +1,97 @@

# Forcing keyframes

Consider the following use case:

We have a pipeline that performs video and audio capture from a live
source, compresses and muxes the streams and writes the resulting data
into a file.

Inside the uncompressed video data we have a specific pattern inserted
at specific moments that should trigger a switch to a new file, meaning
we close the existing file we are writing to and start writing to a new
file.

We want the new file to start with a keyframe so that one can start
decoding the file immediately.

## Components

1) We need an element that is able to detect the pattern in the video
   stream.

2) We need to inform the video encoder that it should start encoding a
   keyframe starting from exactly the frame with the pattern.

3) We need to inform the muxer that it should flush out any pending
   data and start creating the start of a new file with the keyframe as
   a first video frame.

4) We need to inform the sink element that it should start writing to
   the next file. This requires application interaction to instruct the
   sink of the new filename. The application should also be free to
   ignore the boundary and continue to write to the existing file. The
   application will typically use an event pad probe to detect the
   custom event.

## Implementation

### Downstream

The implementation would consist of generating a `GST_EVENT_CUSTOM_DOWNSTREAM`
event that marks the keyframe boundary. This event is inserted into the
pipeline by the application upon a certain trigger. In the above use case
this trigger would be given by the element that detects the pattern, in the
form of an element message.

The custom event would travel further downstream to instruct encoder,
muxer and sink about the possible switch.

The information passed in the event consists of:

**GstForceKeyUnit**

- **"timestamp"** (`G_TYPE_UINT64`): the timestamp of the buffer that
  triggered the event.

- **"stream-time"** (`G_TYPE_UINT64`): the stream position that triggered the event.

- **"running-time"** (`G_TYPE_UINT64`): the running time of the stream when
  the event was triggered.

- **"all-headers"** (`G_TYPE_BOOLEAN`): send all headers, including
  those in the caps or those sent at the start of the stream.

- **...**: optional other data fields.
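
A sketch of how an application could construct and inject such an event; the
function name `send_force_key_unit`, the pad and the time values are
placeholders (current GStreamer also ships ready-made helpers such as
`gst_video_event_new_downstream_force_key_unit()` in the video library that
encapsulate the same structure):

```c
#include <gst/gst.h>

/* Sketch: build the custom downstream event described above and push it
 * from a source pad (e.g. of the pattern-detecting element). */
static gboolean
send_force_key_unit (GstPad * srcpad, GstClockTime timestamp,
    GstClockTime stream_time, GstClockTime running_time)
{
  GstStructure *s = gst_structure_new ("GstForceKeyUnit",
      "timestamp", G_TYPE_UINT64, timestamp,
      "stream-time", G_TYPE_UINT64, stream_time,
      "running-time", G_TYPE_UINT64, running_time,
      "all-headers", G_TYPE_BOOLEAN, TRUE, NULL);

  return gst_pad_push_event (srcpad,
      gst_event_new_custom (GST_EVENT_CUSTOM_DOWNSTREAM, s));
}
```
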

Note that this event is purely informational: no element is required to
perform an action, but it should forward the event downstream, just like
any other event it does not handle.

Elements understanding the event should behave as follows:

1) The video encoder receives the event before the next frame. Upon
   reception of the event it schedules to encode the next frame as a
   keyframe. Before pushing out the encoded keyframe it must push the
   GstForceKeyUnit event downstream.

2) The muxer receives the GstForceKeyUnit event and flushes out its
   current state, preparing to produce data that can be used as a
   key unit. Before pushing out the new data it pushes the
   GstForceKeyUnit event downstream.

3) The application receives the GstForceKeyUnit event on a pad probe of
   the sink and reconfigures the sink to make it perform new actions
   after receiving the next buffer.

### Upstream

When using RTP, packets can get lost or receivers can be added at any
time; they may request a new key frame.

A downstream element sends an upstream "GstForceKeyUnit" event up the
pipeline.

When an element produces some kind of key unit in output, but has no
such concept in its input (like an encoder that takes raw frames), it
consumes the event (doesn't pass it upstream), and instead sends a
downstream GstForceKeyUnit event and a new keyframe.

markdown/design/mediatype-audio-raw.md (new file, 68 lines)

@@ -0,0 +1,68 @@

# Raw Audio Media Types

**audio/x-raw**

- **format**, G\_TYPE\_STRING, mandatory. The format of the audio samples, see
  the Formats section for a list of valid sample formats.

- **rate**, G\_TYPE\_INT, mandatory. The samplerate of the audio.

- **channels**, G\_TYPE\_INT, mandatory. The number of channels.

- **channel-mask**, GST\_TYPE\_BITMASK, mandatory for more than 2 channels.
  Bitmask of the channel positions present. May be omitted for mono and
  stereo. May be set to 0 to denote that the channels are unpositioned.

- **layout**, G\_TYPE\_STRING, mandatory. The layout of channels within a
  buffer. Possible values are "interleaved" (for LRLRLRLR) and
  "non-interleaved" (LLLLRRRR).

Use `GstAudioInfo` and related helper API to create and parse raw audio caps.
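
For example, a minimal sketch of producing and parsing such caps with the
helper API (the function name, format, rate and channel count are arbitrary
choices for the example):

```c
#include <gst/audio/audio.h>

/* Build audio/x-raw caps from a GstAudioInfo and read them back. */
static void
audio_caps_roundtrip (void)
{
  GstAudioInfo info;
  GstCaps *caps;

  gst_audio_info_init (&info);
  gst_audio_info_set_format (&info, GST_AUDIO_FORMAT_S16LE, 48000, 2, NULL);
  caps = gst_audio_info_to_caps (&info);   /* audio/x-raw,format=S16LE,... */

  if (gst_audio_info_from_caps (&info, caps))
    g_print ("rate=%d channels=%d\n",
        GST_AUDIO_INFO_RATE (&info), GST_AUDIO_INFO_CHANNELS (&info));

  gst_caps_unref (caps);
}
```
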

## Metadata

- `GstAudioDownmixMeta`: A matrix for downmixing multichannel audio to a
  lower number of channels.

## Formats

The following values can be used for the format string property.

- "S8" 8-bit signed PCM audio
- "U8" 8-bit unsigned PCM audio

- "S16LE" 16-bit signed PCM audio
- "S16BE" 16-bit signed PCM audio
- "U16LE" 16-bit unsigned PCM audio
- "U16BE" 16-bit unsigned PCM audio

- "S24\_32LE" 24-bit signed PCM audio packed into 32-bit
- "S24\_32BE" 24-bit signed PCM audio packed into 32-bit
- "U24\_32LE" 24-bit unsigned PCM audio packed into 32-bit
- "U24\_32BE" 24-bit unsigned PCM audio packed into 32-bit

- "S32LE" 32-bit signed PCM audio
- "S32BE" 32-bit signed PCM audio
- "U32LE" 32-bit unsigned PCM audio
- "U32BE" 32-bit unsigned PCM audio

- "S24LE" 24-bit signed PCM audio
- "S24BE" 24-bit signed PCM audio
- "U24LE" 24-bit unsigned PCM audio
- "U24BE" 24-bit unsigned PCM audio

- "S20LE" 20-bit signed PCM audio
- "S20BE" 20-bit signed PCM audio
- "U20LE" 20-bit unsigned PCM audio
- "U20BE" 20-bit unsigned PCM audio

- "S18LE" 18-bit signed PCM audio
- "S18BE" 18-bit signed PCM audio
- "U18LE" 18-bit unsigned PCM audio
- "U18BE" 18-bit unsigned PCM audio

- "F32LE" 32-bit floating-point audio
- "F32BE" 32-bit floating-point audio
- "F64LE" 64-bit floating-point audio
- "F64BE" 64-bit floating-point audio

markdown/design/mediatype-text-raw.md (new file, 22 lines)

@@ -0,0 +1,22 @@

# Raw Text Media Types

**text/x-raw**

- **format**, G\_TYPE\_STRING, mandatory. The format of the text, see the
  Formats section for a list of valid format strings.

## Metadata

There are no common metas for this raw format yet.

## Formats

- "utf8": plain timed utf8 text (formerly text/plain).
  Parsed timed text in utf8 format.

- "pango-markup": plain timed utf8 text with pango markup
  (formerly text/x-pango-markup). Same as "utf8", but text embedded in an
  XML-style markup language for size, colour, emphasis, etc.
  See [Pango Markup Format][pango-markup].

[pango-markup]: http://developer.gnome.org/pango/stable/PangoMarkupFormat.html

markdown/design/mediatype-video-raw.md (new file, 1240 lines)

(diff not shown: file too large)

markdown/design/orc-integration.md (new file, 159 lines)

@@ -0,0 +1,159 @@

# Orc Integration

## About Orc

Orc code can be in one of two forms: in .orc files that are converted by
orcc to C code that calls liborc functions, or C code that calls liborc
to create complex operations at runtime. The former is mostly for
functions with predetermined functionality. The latter is for
functionality that is determined at runtime, where writing .orc
functions for all combinations would be prohibitive. Orc also has a fast
memcpy and memset which are useful independently.

## Fast memcpy()

**This part is not integrated yet.**

Orc has built-in functions `orc_memcpy()` and `orc_memset()` that work
like `memcpy()` and `memset()`. These are meant for large copies only. A
reasonable cutoff for using `orc_memcpy()` instead of `memcpy()` is if the
number of bytes is generally greater than 100. **DO NOT** use `orc_memcpy()`
if the typical size is less than 20 bytes, especially if the size is
known at compile time, as these cases are inlined by the compiler.

(Example: sys/ximage/ximagesink.c)

Add $(ORC\_CFLAGS) to libgstximagesink\_la\_CFLAGS and $(ORC\_LIBS) to
libgstximagesink\_la\_LIBADD. Then, in the source file, add:

    #ifdef HAVE_ORC
    #include <orc/orc.h>
    #else
    #define orc_memcpy(a,b,c) memcpy(a,b,c)
    #endif

Then switch relevant uses of memcpy() to orc\_memcpy().

The above example works whether or not Orc is enabled at compile time.

## Normal Usage

The following lines are added near the top of Makefile.am for plugins
that use Orc code in .orc files (this is for the volume plugin):

    ORC_BASE=volume
    include $(top_srcdir)/common/orc.mk

Also add the generated source file to the plugin build:

    nodist_libgstvolume_la_SOURCES = $(ORC_SOURCES)

And of course, add $(ORC\_CFLAGS) to libgstvolume\_la\_CFLAGS, and
$(ORC\_LIBS) to libgstvolume\_la\_LIBADD.

The value assigned to ORC\_BASE does not need to be related to the name
of the plugin.

## Advanced Usage

The Holy Grail of Orc usage is to programmatically generate Orc code at
runtime, have liborc compile it into binary code at runtime, and then
execute this code. Currently, the best example of this is in
Schroedinger. An example of how this would be used is audioconvert:
given an input format, channel position manipulation, dithering and
quantizing configuration, and output format, an Orc code generator would
create an OrcProgram, add the appropriate instructions to do each step
based on the configuration, and then compile the program. Successfully
compiling the program would return a function pointer that can be called
to perform the operation.
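
To make the idea concrete, here is a rough sketch of that runtime path using
a trivial saturated 16-bit add rather than a real audioconvert operation. The
function name `add_s16_arrays` is made up, and the specific liborc calls and
opcode shown are an assumption based on liborc's typical usage, not something
prescribed by this document:

```c
#include <stdint.h>
#include <orc/orc.h>

/* Rough sketch of the runtime path: build, compile and run a tiny Orc
 * program. orc_init() is assumed to have been called already, and the
 * return value of orc_program_compile() should be checked in real code. */
static void
add_s16_arrays (int16_t * dest, const int16_t * src, int n)
{
  static OrcProgram *p = NULL;
  OrcExecutor *ex;

  if (p == NULL) {
    p = orc_program_new ();
    orc_program_add_destination (p, 2, "d1");
    orc_program_add_source (p, 2, "s1");
    /* d1 = saturated_add (d1, s1), element-wise on 16-bit samples */
    orc_program_append_str (p, "addssw", "d1", "d1", "s1");
    orc_program_compile (p);
  }

  ex = orc_executor_new (p);
  orc_executor_set_n (ex, n);
  orc_executor_set_array_str (ex, "d1", dest);
  orc_executor_set_array_str (ex, "s1", (void *) src);
  orc_executor_run (ex);
  orc_executor_free (ex);
}
```
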
This sort of advanced usage requires structural changes to current
|
||||
plugins (e.g., audioconvert) and will probably be developed
|
||||
incrementally. Moreover, if such code is intended to be used without Orc
|
||||
as strict build/runtime requirement, two codepaths would need to be
|
||||
developed and tested. For this reason, until GStreamer requires Orc, I
|
||||
think it's a good idea to restrict such advanced usage to the cog plugin
|
||||
in -bad, which requires Orc.
|
||||
|
||||
## Build Process
|
||||
|
||||
The goal of the build process is to make Orc non-essential for most
|
||||
developers and users. This is not to say you shouldn't have Orc
|
||||
installed -- without it, you will get slow backup C code, just that
|
||||
people compiling GStreamer are not forced to switch from Liboil to Orc
|
||||
immediately.
|
||||
|
||||
With Orc installed, the build process will use the Orc Compiler (orcc)
|
||||
to convert each .orc file into a temporary C source (tmp-orc.c) and a
|
||||
temporary header file (${name}orc.h if constructed from ${base}.orc).
|
||||
The C source file is compiled and linked to the plugin, and the header
|
||||
file is included by other source files in the plugin.
|
||||
|
||||
If 'make orc-update' is run in the source directory, the files tmp-orc.c
|
||||
and ${base}orc.h are copied to ${base}orc-dist.c and ${base}orc-dist.h
|
||||
respectively. The -dist.\[ch\] files are automatically disted via
|
||||
orc.mk. The -dist.\[ch\] files should be checked in to git whenever the
|
||||
.orc source is changed and checked in. Example workflow:
|
||||
|
||||
edit .orc file ... make, test, etc. make orc-update git add volume.orc
|
||||
volumeorc-dist.c volumeorc-dist.h git commit
|
||||
|
||||
At 'make dist' time, all of the .orc files are compiled, and then copied
|
||||
to their -dist.\[ch\] counterparts, and then the -dist.\[ch\] files are
|
||||
added to the dist directory.
|
||||
|
||||
Without Orc installed (or --disable-orc given to configure), the
|
||||
-dist.\[ch\] files are copied to tmp-orc.c and ${name}orc.h. When
|
||||
compiled Orc disabled, DISABLE\_ORC is defined in config.h, and the C
|
||||
backup code is compiled. This backup code is pure C, and does not
|
||||
include orc headers or require linking against liborc.
|
||||
|
||||
The common/orc.mk build method is limited by the inflexibility of
|
||||
automake. The file tmp-orc.c must be a fixed filename, using ORC\_NAME
|
||||
to generate the filename does not work because it conflicts with
|
||||
automake's dependency generation. Building multiple .orc files is not
|
||||
possible due to this restriction.
|
||||
|
||||
## Testing
|
||||
|
||||
If you create another .orc file, please add it to tests/orc/Makefile.am.
|
||||
This causes automatic test code to be generated and run during 'make
|
||||
check'. Each function in the .orc file is tested by comparing the
|
||||
results of executing the run-time compiled code and the C backup
|
||||
function.
|
||||
|
||||
## Orc Limitations
|
||||
|
||||
### audioconvert
|
||||
|
||||
Orc doesn't have a mechanism for generating random numbers, which
|
||||
prevents its use as-is for dithering. One way around this is to generate
|
||||
suitable dithering values in one pass, then use those values in a second
|
||||
Orc-based pass.
|
||||
|
||||
Orc doesn't handle 64-bit float, for no good reason.
|
||||
|
||||
Irrespective of Orc handling 64-bit float, it would be useful to have a
|
||||
direct 32-bit float to 16-bit integer conversion.
|
||||
|
||||
audioconvert is a good candidate for programmatically generated Orc code.
|
||||
|
||||
audioconvert enumerates functions in terms of big-endian vs.
|
||||
little-endian. Orc's functions are "native" and "swapped".
|
||||
Programmatically generating code removes the need to worry about this.
|
||||
|
||||
Orc doesn't handle 24-bit samples. Fixing this is not a priority (for ds).
|
||||
|
||||
### videoscale
|
||||
|
||||
Orc doesn't handle horizontal resampling yet. The plan is to add special
|
||||
sampling opcodes, for nearest, bilinear, and cubic interpolation.
|
||||
|
||||
### videotestsrc
|
||||
|
||||
Lots of code in videotestsrc needs to be rewritten to be SIMD (and Orc)
|
||||
friendly, e.g., stuff that uses `oil_splat_u8()`.
|
||||
|
||||
A fast low-quality random number generator in Orc would be useful here.
|
||||
|
||||
### volume
|
||||
|
||||
Many of the comments on audioconvert apply here as well.
|
||||
|
||||
There are a bunch of FIXMEs in here that are due to misapplied patches.

66
markdown/design/playbin.md
Normal file
@@ -0,0 +1,66 @@

# playbin

The purpose of this element is to decode and render the media contained
in a given generic uri. The element extends GstPipeline and is typically
used in playback situations.

Required features:

- accept and play any valid uri. This includes
  - rendering video/audio
  - overlaying subtitles on the video
  - optionally read external subtitle files
  - allow for hardware (non raw) sinks
- selection of audio/video/subtitle streams based on language.
- perform network buffering/incremental download
- gapless playback
- support for visualisations with configurable sizes
- ability to reject files that are too big, or of a format that would
  require too much CPU/memory usage.
- be very efficient with adding elements such as converters to reduce
  the amount of negotiation that has to happen.
- handle chained oggs. This includes having support for dynamic pad
  add and remove from a demuxer.

## Components

### decodebin

- performs the autoplugging of demuxers/decoders
- emits signals for steering the autoplugging
  - to decide if a non-raw media format is acceptable as output
  - to sort the possible decoders for a non-raw format
- see also decodebin2 design doc

### uridecodebin

- combination of a source to handle the given uri, an optional
  queueing element and one or more decodebin2 elements to decode the
  non-raw streams.

### playsink

- handles display of audio/video/text.
- has request audio/video/text input pads. There is only one sinkpad
  per type. The requested pads define the configuration of the
  internal pipeline.
- allows for setting audio/video sinks or does automatic
  sink selection.
- allows for configuration of the visualisation element.
- allows for enable/disable of visualisation, audio and video.

### playbin

- combination of one or more uridecodebin elements to read the uri and
  subtitle uri.
- support for queuing new media to support gapless playback.
- handles stream selection.
- uses playsink to display.
- selection of sinks and configuration of uridecodebin with raw
  output formats.

## Gapless playback feature

playbin has an "about-to-finish" signal. The application should
configure a new uri (and optional suburi) in the callback. When the
current media finishes, this new media will be played next.
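
A minimal sketch of how an application could use this signal for gapless
playback (the playlist handling and `get_next_uri()` helper are hypothetical
application code; error/bus handling is omitted):

    #include <gst/gst.h>

    /* Hypothetical helper: the real application would consult its playlist. */
    static const gchar *
    get_next_uri (void)
    {
      return "file:///path/to/next-song.ogg";
    }

    static void
    on_about_to_finish (GstElement * playbin, gpointer user_data)
    {
      const gchar *next = get_next_uri ();

      /* Setting a new uri here makes playbin continue with it, gaplessly,
       * once the current media finishes. */
      if (next != NULL)
        g_object_set (playbin, "uri", next, NULL);
    }

    int
    main (int argc, char **argv)
    {
      GstElement *playbin;

      gst_init (&argc, &argv);

      playbin = gst_element_factory_make ("playbin", NULL);
      g_object_set (playbin, "uri", "file:///path/to/first-song.ogg", NULL);
      g_signal_connect (playbin, "about-to-finish",
          G_CALLBACK (on_about_to_finish), NULL);

      gst_element_set_state (playbin, GST_STATE_PLAYING);
      /* ... run a GMainLoop and watch the bus for EOS/errors ... */
      return 0;
    }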

320
markdown/design/stereo-multiview-video.md
Normal file
@@ -0,0 +1,320 @@

# Stereoscopic & Multiview Video Handling

There are two cases to handle:

- Encoded video output from a demuxer to parser / decoder, or from encoders
  into a muxer.

- Raw video buffers

The design below is somewhat based on the proposals from
[bug 611157](https://bugzilla.gnome.org/show_bug.cgi?id=611157).

Multiview is used as a generic term to refer to handling both stereo
content (left and right eye only) and extensions for videos containing
multiple independent viewpoints.

## Encoded Signalling

This is regarding the signalling in caps and buffers from demuxers to
parsers (sometimes) or out from encoders.

For backward compatibility with existing codecs, many transports of
stereoscopic 3D content use normal 2D video with 2 views packed spatially
in some way, and put extra new descriptions in the container/mux.

Info in the demuxer seems to apply to stereo encodings only. For all
MVC methods I know, the multiview encoding is in the video bitstream itself
and therefore already available to decoders. Only stereo systems have been
retro-fitted into the demuxer.

Also, sometimes extension descriptions are in the codec (e.g. H.264 SEI FPA
packets) and it would be useful to be able to put the info onto caps and
buffers from the parser without decoding.

To handle both cases, we need to be able to output the required details on
encoded video for decoders to apply onto the raw video buffers they decode.

*If there ever is a need to transport multiview info for encoded data, the
same system below for raw video, or some variation of it, should work.*

### Encoded Video: Properties that need to be encoded into caps

1. multiview-mode (called "Channel Layout" in bug 611157)
   * Whether a stream is mono, for a single eye, stereo, mixed-mono-stereo
     (switches between mono and stereo - mp4 can do this)
   * Uses a buffer flag to mark individual buffers as mono or "not mono"
     (single|stereo|multiview) for mixed scenarios. The alternative (not
     proposed) is for the demuxer to switch caps for each mono to not-mono
     change, and not use a 'mixed' caps variant at all.
   * _single_ refers to a stream of buffers that only contain 1 view.
     It is different from mono in that the stream is a marked left or right
     eye stream for later combining in a mixer or when displaying.
   * _multiple_ marks a stream with multiple independent views encoded.
     It is included in this list for completeness. As noted above, there's
     currently no scenario that requires marking encoded buffers as MVC.

2. Frame-packing arrangements / view sequence orderings
   * Possible frame packings: side-by-side, side-by-side-quincunx,
     column-interleaved, row-interleaved, top-bottom, checker-board
     * bug 611157 - sreerenj added side-by-side-full and top-bottom-full but
       I think that's covered by suitably adjusting pixel-aspect-ratio. If
       not, they can be added later.
     * _top-bottom_, _side-by-side_, _column-interleaved_, _row-interleaved_
       are as the names suggest.
     * _checker-board_, samples are left/right pixels in a chess grid
       +-+-+-/-+-+-+
     * _side-by-side-quincunx_. Side By Side packing, but quincunx sampling -
       the 1 pixel offset of each eye needs to be accounted for when
       upscaling or displaying
     * there may be other packings (future expansion)
   * Possible view sequence orderings: frame-by-frame,
     frame-primary-secondary-tracks, sequential-row-interleaved
     * _frame-by-frame_, each buffer is left, then right view etc
     * _frame-primary-secondary-tracks_ - the file has 2 video tracks
       (primary and secondary), one is left eye, one is right.
       Demuxer info indicates which one is which.
       Handling this means marking each stream as all-left and all-right
       views, decoding separately, and combining automatically (inserting a
       mixer/combiner in playbin)
       -> *Leave this for future expansion*
     * _sequential-row-interleaved_ Mentioned by sreerenj in bug patches, I
       can't find a mention of such a thing. Maybe it's in MPEG-2
       -> *Leave this for future expansion / deletion*

3. view encoding order
   * Describes how to decide which piece of each frame corresponds to the
     left or right eye
   * Possible orderings: left, right, left-then-right, right-then-left
   - Need to figure out how we find the correct frame in the demuxer to
     start decoding when seeking in frame-sequential streams
   - Need a buffer flag for marking the first buffer of a group.

4. "Frame layout flags"
   * flags for view specific interpretation
   * horizontal-flip-left, horizontal-flip-right, vertical-flip-left,
     vertical-flip-right
     Indicates that one or more views have been encoded in a flipped
     orientation, usually due to a camera with a mirror or displays with
     mirrors.
   * This should be an actual flags field. Registered GLib flags types
     aren't generally well supported in our caps - the type might not be
     loaded/registered yet when parsing a caps string, so they can't be used
     in caps templates in the registry.
   * It might be better just to use a hex value / integer

## Buffer representation for raw video

- Transported as normal video buffers with extra metadata
- The caps define the overall buffer width/height, with helper functions to
  extract the individual views for packed formats
- pixel-aspect-ratio adjusted if needed to double the overall width/height
- video sinks that don't know about multiview extensions yet will show the
  packed view as-is. For frame-sequence outputs, things might look weird, but
  just adding multiview-mode to the sink caps can disallow those transports.
- _row-interleaved_ packing is actually just side-by-side memory layout with
  half frame width, twice the height, so can be handled by adjusting the
  overall caps and strides
- Other exotic layouts need new pixel formats defined (checker-board,
  column-interleaved, side-by-side-quincunx)
- _Frame-by-frame_ - one view per buffer, but with alternating metas marking
  which buffer is which left/right/other view and using a new buffer flag as
  described above to mark the start of a group of corresponding frames.
- New video caps addition as for encoded buffers

### Proposed Caps fields

Combining the requirements above and collapsing the combinations into
mnemonics (an example caps sketch follows after this list):

* multiview-mode =
    mono | left | right | sbs | sbs-quin | col | row | topbot | checkers |
    frame-by-frame | mixed-sbs | mixed-sbs-quin | mixed-col | mixed-row |
    mixed-topbot | mixed-checkers | mixed-frame-by-frame | multiview-frames |
    mixed-multiview-frames

* multiview-flags =
    + 0x0000 none
    + 0x0001 right-view-first
    + 0x0002 left-h-flipped
    + 0x0004 left-v-flipped
    + 0x0008 right-h-flipped
    + 0x0010 right-v-flipped
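
As a minimal sketch of what caps using these proposed fields could look like
when built from code (the field names and the "sbs" mnemonic are the proposal
above, not necessarily what was finally implemented; multiview-flags is
written as a plain integer per the "hex value" option):

    #include <gst/gst.h>

    static GstCaps *
    make_side_by_side_caps (void)
    {
      /* Full side-by-side packed frame: 2 x 960 = 1920 pixels wide,
       * right view first (proposed flag 0x0001). */
      return gst_caps_new_simple ("video/x-raw",
          "format", G_TYPE_STRING, "I420",
          "width", G_TYPE_INT, 1920,
          "height", G_TYPE_INT, 1080,
          "multiview-mode", G_TYPE_STRING, "sbs",
          "multiview-flags", G_TYPE_INT, 0x0001,
          NULL);
    }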

### Proposed new buffer flags

Add two new `GST_VIDEO_BUFFER_*` flags in video-frame.h and make it clear that
those flags can apply to encoded video buffers too. wtay says that's currently
the case anyway, but the documentation should say it. A short usage sketch
follows the list below.

- **`GST_VIDEO_BUFFER_FLAG_MULTIPLE_VIEW`** - Marks a buffer as representing
  non-mono content, although it may be a single (left or right) eye view.

- **`GST_VIDEO_BUFFER_FLAG_FIRST_IN_BUNDLE`** - for frame-sequential methods of
  transport, mark the "first" of a left/right/other group of frames
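
A sketch of how an element might consume these flags once they exist
(hypothetical element code; the flag names are the ones proposed above):

    #include <gst/gst.h>
    #include <gst/video/video.h>

    /* Hypothetical per-buffer handling in a frame-sequential consumer. */
    static void
    inspect_view_flags (GstBuffer * buf)
    {
      if (GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_MULTIPLE_VIEW)) {
        /* Not plain mono: this buffer is one view of stereo/multiview
         * content. */
        if (GST_BUFFER_FLAG_IS_SET (buf,
                GST_VIDEO_BUFFER_FLAG_FIRST_IN_BUNDLE)) {
          /* Start of a new left/right/other bundle of corresponding frames. */
        }
      }
    }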

### A new GstMultiviewMeta

This provides a place to describe all provided views in a buffer / stream,
and through Meta negotiation to inform decoders about which views to decode
if not all are wanted.

* Logical labels/names and mapping to GstVideoMeta numbers
* Standard view labels LEFT/RIGHT, and non-standard ones (strings)

      GST_VIDEO_MULTIVIEW_VIEW_LEFT = 1
      GST_VIDEO_MULTIVIEW_VIEW_RIGHT = 2

      struct GstVideoMultiviewViewInfo {
          guint view_label;
          guint meta_id;     // id of the GstVideoMeta for this view

          padding;
      }

      struct GstVideoMultiviewMeta {
          guint n_views;
          GstVideoMultiviewViewInfo *view_info;
      }

The meta is optional, and probably only useful later for MVC.

## Outputting stereo content

The initial implementation for output will be stereo content in glimagesink.

### Output Considerations with OpenGL

- If we have support for stereo GL buffer formats, we can output separate
  left/right eye images and let the hardware take care of display.

- Otherwise, glimagesink needs to render one window with left/right in a
  suitable frame packing, and that will only show correctly in fullscreen on
  a device set for the right 3D packing -> requires app intervention to set
  the video mode.

- That could be done manually on the TV, or with HDMI 1.4 by setting the
  right video mode for the screen to inform the TV. As a third option, we
  could support rendering to two separate overlay areas on the screen - one
  for the left eye, one for the right - which can be supported using the
  'splitter' element and two output sinks, or, better, by adding a 2nd
  window overlay for split stereo output.

- Intel hardware doesn't do stereo GL buffers - only nvidia and AMD do - so
  the initial implementation won't include that.

## Other elements for handling multiview content

- videooverlay interface extensions
  - __Q__: Should this be a new interface?
  - Element message to communicate the presence of stereoscopic information
    to the app
  - App needs to be able to override the input interpretation - ie, set
    multiview-mode and multiview-flags
    - Most videos I've seen are side-by-side or top-bottom with no
      frame-packing metadata
  - New API for the app to set rendering options for stereo/multiview
    content
    - This might be best implemented as a **multiview GstContext**, so that
      the pipeline can share app preferences for content interpretation and
      downmixing to mono for output, or in the sink, and have those applied
      as far upstream/downstream as possible.

- Converter element
  - convert different view layouts
  - Render to anaglyphs of different types (magenta/green, red/blue, etc)
    and output as mono

- Mixer element
  - take 2 video streams and output as stereo
    - later take n video streams
  - share code with the converter, it just takes input from n pads instead
    of one.

- Splitter element
  - Output one pad per view

### Implementing MVC handling in decoders / parsers (and encoders)

Things to do to implement MVC handling:

1. Parsing SEI in h264parse and setting caps (patches available in
   bugzilla for parsing, see below)
2. Integrate gstreamer-vaapi MVC support with this proposal
3. Help with [libav MVC implementation](https://wiki.libav.org/Blueprint/MVC)
4. generating SEI in H.264 encoder
5. Support for MPEG2 MVC extensions

## Relevant bugs

- [bug 685215](https://bugzilla.gnome.org/show_bug.cgi?id=685215) - codecparser h264: Add initial MVC parser
- [bug 696135](https://bugzilla.gnome.org/show_bug.cgi?id=696135) - h264parse: Add mvc stream parsing support
- [bug 732267](https://bugzilla.gnome.org/show_bug.cgi?id=732267) - h264parse: extract base stream from MVC or SVC encoded streams

## Other Information

[Matroska 3D support notes](http://www.matroska.org/technical/specs/notes.html#3D)

## Open Questions

### Background

### Representation for GstGL

When uploading raw video frames to GL textures, the goal is to implement:

Split packed frames into separate GL textures when uploading, and
attach multiple GstGLMemory's to the GstBuffer. The multiview-mode and
multiview-flags fields in the caps should change to reflect the conversion
from one incoming GstMemory to multiple GstGLMemory, and change the
width/height in the output info as needed.

This is (currently) targeted as 2 render passes - upload as normal
to a single stereo-packed RGBA texture, and then unpack into 2
smaller textures, output with GST_VIDEO_MULTIVIEW_MODE_SEPARATED, as
2 GstGLMemory attached to one buffer. We can optimise the upload later
to go directly to 2 textures for common input formats.

Separate output textures have a few advantages:

- Filter elements can more easily apply filters in several passes to each
  texture without fundamental changes to our filters to avoid mixing pixels
  from separate views.

- Centralises the sampling of input video frame packings in the upload code,
  which makes adding new packings in the future easier.

- Sampling multiple textures to generate various output frame-packings
  for display is conceptually simpler than converting from any input packing
  to any output packing.

- In implementations that support quad buffers, having separate textures
  makes it trivial to do GL_LEFT/GL_RIGHT output.

For either option, we'll need new glsink output API to pass more
information to applications about multiple views for the draw signal/callback.

I don't know if it's desirable to support *both* methods of representing
views. If so, that should be signalled in the caps too. That could be a
new multiview-mode for passing views in separate GstMemory objects
attached to a GstBuffer, which would not be GL specific.

### Overriding frame packing interpretation

Most sample videos available are frame packed, with no metadata
to say so. How should we override that interpretation?

- Simple answer: Use capssetter + new properties on playbin to
  override the multiview fields. *Basically implemented in playbin, using
  a pad probe. Needs more work for completeness.* A rough sketch of the
  capssetter approach is shown below.
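
The following is only an illustrative sketch of that capssetter route
(assuming a side-by-side sample file and the caps fields proposed above;
the pipeline string is not a tested recipe):

    #include <gst/gst.h>

    int
    main (int argc, char **argv)
    {
      GstElement *pipeline;
      GError *err = NULL;

      gst_init (&argc, &argv);

      /* capssetter merges the given fields into the caps it sees, so
       * downstream elements treat the video as side-by-side packed. */
      pipeline = gst_parse_launch (
          "filesrc location=sample-sbs.mp4 ! decodebin ! "
          "capssetter caps=\"video/x-raw,multiview-mode=sbs\" ! "
          "videoconvert ! autovideosink", &err);

      if (pipeline == NULL) {
        g_printerr ("Failed to build pipeline: %s\n", err->message);
        return 1;
      }

      gst_element_set_state (pipeline, GST_STATE_PLAYING);
      /* ... run a main loop / wait for EOS ... */
      return 0;
    }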

### Adding extra GstVideoMeta to buffers

There should be one GstVideoMeta for the entire video frame in packed
layouts, and one GstVideoMeta per GstGLMemory when views are attached
to a GstBuffer separately. This should be done by the buffer pool,
which knows from the caps.

### videooverlay interface extensions

GstVideoOverlay needs:

- A way to announce the presence of multiview content when it is
  detected/signalled in a stream.
- A way to tell applications which output methods are supported/available
- A way to tell the sink which output method it should use
- Possibly a way to tell the sink to override the input frame
  interpretation / caps - depends on the answer to the question
  above about how to model overriding input interpretation.

### What's implemented

- Caps handling
- gst-plugins-base libgstvideo pieces
- playbin caps overriding
- conversion elements - glstereomix, gl3dconvert (needs a rename),
  glstereosplit.

### Possible future enhancements

- Make GLupload split to separate textures at upload time?
  - Needs new API to extract multiple textures from the upload. Currently
    only outputs 1 result RGBA texture.
- Make GLdownload able to take 2 input textures, pack them and colorconvert /
  download as needed.
  - currently done by packing then downloading, which isn't OK overhead for
    RGBA download
- Think about how we integrate GLstereo - do we need to do anything special,
  or can the app just render to stereo/quad buffers if they're available?

527
markdown/design/subtitle-overlays.md
Normal file
@@ -0,0 +1,527 @@

# Subtitle overlays, hardware-accelerated decoding and playbin

This document describes some of the considerations and requirements that
led to the current `GstVideoOverlayCompositionMeta` API which allows
attaching of subtitle bitmaps or logos to video buffers.

## Background

Subtitles can be muxed in containers or come from an external source.

Subtitles come in many shapes and colours. Usually they are either
text-based (incl. 'pango markup'), or bitmap-based (e.g. DVD subtitles
and the most common form of DVB subs). Bitmap based subtitles are
usually compressed in some way, like some form of run-length encoding.

Subtitles are currently decoded and rendered in subtitle-format-specific
overlay elements. These elements have two sink pads (one for raw video
and one for the subtitle format in question) and one raw video source
pad.

They will take care of synchronising the two input streams, and of
decoding and rendering the subtitles on top of the raw video stream.

Digression: one could theoretically have dedicated decoder/render
elements that output an AYUV or ARGB image, and then let a videomixer
element do the actual overlaying, but this is not very efficient,
because it requires us to allocate and blend whole pictures (1920x1080
AYUV = 8MB, 1280x720 AYUV = 3.6MB, 720x576 AYUV = 1.6MB) even if the
overlay region is only a small rectangle at the bottom. This wastes
memory and CPU. We could do something better by introducing a new format
that only encodes the region(s) of interest, but we don't have such a
format yet, and are not necessarily keen to rewrite this part of the
logic in playbin at this point - and we can't change existing elements'
behaviour, so would need to introduce new elements for this.

Playbin supports outputting compressed formats, i.e. it does not force
decoding to a raw format, but is happy to output to a non-raw format as
long as the sink supports that as well.

In case of certain hardware-accelerated decoding APIs, we will make use
of that functionality. However, the decoder will not output a raw video
format then, but some kind of hardware/API-specific format (in the caps)
and the buffers will reference hardware/API-specific objects that the
hardware/API-specific sink will know how to handle.

## The Problem

In the case of such hardware-accelerated decoding, the decoder will not
output raw pixels that can easily be manipulated. Instead, it will
output hardware/API-specific objects that can later be used to render a
frame using the same API.

Even if we could transform such a buffer into raw pixels, we most likely
would want to avoid that, in order to avoid the need to map the data
back into system memory (and then later back to the GPU). It's much
better to upload the much smaller encoded data to the GPU/DSP and then
leave it there until rendered.

Before `GstVideoOverlayComposition` playbin only supported subtitles on
top of raw decoded video. It would try to find a suitable overlay element
from the plugin registry based on the input subtitle caps and the rank.
(It is assumed that we will be able to convert any raw video format into
any format required by the overlay using a converter such as videoconvert.)

It would not render subtitles if the video sent to the sink is not raw
YUV or RGB or if conversions had been disabled by setting the
native-video flag on playbin.

Subtitle rendering is considered an important feature. Enabling
hardware-accelerated decoding by default should not lead to a major
feature regression in this area.

This means that we need to support subtitle rendering on top of non-raw
video.

## Possible Solutions

The goal is to keep knowledge of the subtitle format within the
format-specific GStreamer plugins, and knowledge of any specific video
acceleration API to the GStreamer plugins implementing that API. We do
not want to make the pango/dvbsuboverlay/dvdspu/kate plugins link to
libva/libvdpau/etc. and we do not want to make the vaapi/vdpau plugins
link to all of libpango/libkate/libass etc.

Multiple possible solutions come to mind:

1) backend-specific overlay elements

   e.g. vaapitextoverlay, vdpautextoverlay, vaapidvdspu, vdpaudvdspu,
   vaapidvbsuboverlay, vdpaudvbsuboverlay, etc.

   This assumes the overlay can be done directly on the
   backend-specific object passed around.

   The main drawback with this solution is that it leads to a lot of
   code duplication and may also lead to uncertainty about distributing
   certain duplicated pieces of code. The code duplication is pretty
   much unavoidable, since making textoverlay, dvbsuboverlay, dvdspu,
   kate, assrender, etc. available in form of base classes to derive
   from is not really an option. Similarly, one would not really want
   the vaapi/vdpau plugin to depend on a bunch of other libraries such
   as libpango, libkate, libtiger, libass, etc.

   One could add some new kind of overlay plugin feature though in
   combination with a generic base class of some sort, but in order to
   accommodate all the different cases and formats one would end up
   with quite convoluted/tricky API.

   (Of course there could also be a GstFancyVideoBuffer that provides
   an abstraction for such video accelerated objects and that could
   provide an API to add overlays to it in a generic way, but in the
   end this is just a less generic variant of solution 3), and it is
   not clear that there are real benefits to a specialised solution
   vs. a more generic one).

2) convert backend-specific object to raw pixels and then overlay

   Even where possible technically, this is most likely very
   inefficient.

3) attach the overlay data to the backend-specific video frame buffers
   in a generic way and do the actual overlaying/blitting later in
   backend-specific code such as the video sink (or an accelerated
   encoder/transcoder)

   In this case, the actual overlay rendering (i.e. the actual text
   rendering or decoding DVD/DVB data into pixels) is done in the
   subtitle-format-specific GStreamer plugin. All knowledge about the
   subtitle format is contained in the overlay plugin then, and all
   knowledge about the video backend in the video backend specific
   plugin.

   The main question then is how to get the overlay pixels (and we will
   only deal with pixels here) from the overlay element to the video
   sink.

   This could be done in multiple ways: One could send custom events
   downstream with the overlay data, or one could attach the overlay
   data directly to the video buffers in some way.

   Sending inline events has the advantage that it is fairly
   transparent to any elements between the overlay element and the
   video sink: if an effects plugin creates a new video buffer for the
   output, nothing special needs to be done to maintain the subtitle
   overlay information, since the overlay data is not attached to the
   buffer. However, it slightly complicates things at the sink, since
   it would also need to look for the new event in question instead of
   just processing everything in its buffer render function.

   If one attaches the overlay data to the buffer directly, any element
   between overlay and video sink that creates a new video buffer would
   need to be aware of the overlay data attached to it and copy it over
   to the newly-created buffer.

   One would have to implement a special kind of new query (e.g.
   FEATURE query) that is not passed on automatically by
   gst\_pad\_query\_default() in order to make sure that all elements
   downstream will handle the attached overlay data. (This is only a
   problem if we want to also attach overlay data to raw video pixel
   buffers; for new non-raw types we can just make it mandatory and
   assume support and be done with it; for existing non-raw types
   nothing changes anyway if subtitles don't work) (we need to maintain
   backwards compatibility for existing raw video pipelines like e.g.:
   ..decoder \! suboverlay \! encoder..)

   Even though slightly more work, attaching the overlay information to
   buffers seems more intuitive than sending it interleaved as events.
   And buffers stored or passed around (e.g. via the "last-buffer"
   property in the sink when doing screenshots via playbin) always
   contain all the information needed.

4) create a video/x-raw-\*-delta format and use a backend-specific
   videomixer

   This possibility was hinted at already in the digression in the
   Background section. It would satisfy the goal of keeping subtitle
   format knowledge in the subtitle plugins and video backend knowledge
   in the video backend plugin. It would also add a concept that might
   be generally useful (think ximagesrc capture with xdamage). However,
   it would require adding foorender variants of all the existing
   overlay elements, and changing playbin to that new design, which is
   somewhat intrusive. And given the general nature of such a new
   format/API, we would need to take a lot of care to be able to
   accommodate all possible use cases when designing the API, which
   makes it considerably more ambitious. Lastly, we would need to write
   videomixer variants for the various accelerated video backends as
   well.

Overall, option 3) appears to be the most promising solution. It is the
least intrusive and should be fairly straight-forward to implement with
reasonable effort, requiring only small changes to existing elements and
requiring no new elements.

Doing the final overlaying in the sink as opposed to a videomixer or
overlay in the middle of the pipeline has other advantages:

- if video frames need to be dropped, e.g. for QoS reasons, we could
  also skip the actual subtitle overlaying and possibly the
  decoding/rendering as well, if the implementation and API allows for
  that to be delayed.

- the sink often knows the actual size of the window/surface/screen
  the output video is rendered to. This *may* make it possible to
  render the overlay image in a higher resolution than the input
  video, solving a long standing issue with pixelated subtitles on top
  of low-resolution videos that are then scaled up in the sink. This
  would of course require the rendering to be delayed instead of
  just attaching an AYUV/ARGB/RGBA blob of pixels to the video buffer
  in the overlay, but that could all be supported.

- if the video backend / sink has support for high-quality text
  rendering (clutter?) we could just pass the text or pango markup to
  the sink and let it do the rest (this is unlikely to be supported in
  the general case - text and glyph rendering is hard; also, we don't
  really want to make up our own text markup system, and pango markup
  is probably too limited for complex karaoke stuff).

## API needed

1) Representation of subtitle overlays to be rendered

   We need to pass the overlay pixels from the overlay element to the
   sink somehow. Whatever the exact mechanism, let's assume we pass a
   refcounted GstVideoOverlayComposition struct or object.

   A composition is made up of one or more overlays/rectangles.

   In the simplest case an overlay rectangle is just a blob of
   RGBA/ABGR \[FIXME?\] or AYUV pixels with positioning info and other
   metadata, and there is only one rectangle to render.

   We're keeping the naming generic ("OverlayFoo" rather than
   "SubtitleFoo") here, since this might also be handy for other use
   cases such as e.g. logo overlays or so. It is not designed for
   full-fledged video stream mixing though.

       // Note: don't mind the exact implementation details, they'll be hidden

       // FIXME: might be confusing in 0.11 though since GstXOverlay was
       //        renamed to GstVideoOverlay in 0.11, but not much we can do,
       //        maybe we can rename GstVideoOverlay to something better

       struct GstVideoOverlayComposition
       {
         guint num_rectangles;
         GstVideoOverlayRectangle ** rectangles;

         /* lowest rectangle sequence number still used by the upstream
          * overlay element. This way a renderer maintaining some kind of
          * rectangles <-> surface cache can know when to free cached
          * surfaces/rectangles. */
         guint min_seq_num_used;

         /* sequence number for the composition (same series as rectangles) */
         guint seq_num;
       }

       struct GstVideoOverlayRectangle
       {
         /* Position on video frame and dimension of output rectangle in
          * output frame terms (already adjusted for the PAR of the output
          * frame). x/y can be negative (overlay will be clipped then) */
         gint x, y;
         guint render_width, render_height;

         /* Dimensions of overlay pixels */
         guint width, height, stride;

         /* This is the PAR of the overlay pixels */
         guint par_n, par_d;

         /* Format of pixels, GST_VIDEO_FORMAT_ARGB on big-endian systems,
          * and BGRA on little-endian systems (i.e. pixels are treated as
          * 32-bit values and alpha is always in the most-significant byte,
          * and blue is in the least-significant byte).
          *
          * FIXME: does anyone actually use AYUV in practice? (we do
          * in our utility function to blend on top of raw video)
          * What about AYUV and endianness? Do we always have [A][Y][U][V]
          * in memory? */
         /* FIXME: maybe use our own enum? */
         GstVideoFormat format;

         /* Refcounted blob of memory, no caps or timestamps */
         GstBuffer *pixels;

         // FIXME: how to express source like text or pango markup?
         //        (just add source type enum + source buffer with data)
         //
         // FOR 0.10: always send pixel blobs, but attach source data in
         //   addition (reason: if downstream changes, we can't renegotiate
         //   that properly, if we just do a query of supported formats from
         //   the start). Sink will just ignore pixels and use pango markup
         //   from source data if it supports that.
         //
         // FOR 0.11: overlay should query formats (pango markup, pixels)
         //   supported by downstream and then only send that. We can
         //   renegotiate via the reconfigure event.
         //

         /* sequence number: useful for backends/renderers/sinks that want
          * to maintain a cache of rectangles <-> surfaces. The value of
          * the min_seq_num_used in the composition tells the renderer which
          * rectangles have expired. */
         guint seq_num;

         /* FIXME: we also need a (private) way to cache converted/scaled
          * pixel blobs */
       }

   (a1) Overlay consumer API:

   How would this work in a video sink that supports scaling of textures:

       gst_foo_sink_render () {
         /* assume only one for now */
         if video_buffer has composition:
           composition = video_buffer.get_composition()

           for each rectangle in composition:
             if rectangle.source_data_type == PANGO_MARKUP
               actor = text_from_pango_markup (rectangle.get_source_data())
             else
               pixels = rectangle.get_pixels_unscaled (FORMAT_RGBA, ...)
               actor = texture_from_rgba (pixels, ...)

             .. position + scale on top of video surface ...
       }

   (a2) Overlay producer API:

   e.g. logo or subpicture overlay: got pixels, stuff into rectangle:

       if (logoverlay->cached_composition == NULL) {
         comp = composition_new ();

         rect = rectangle_new (format, pixels_buf,
                   width, height, stride, par_n, par_d,
                   x, y, render_width, render_height);

         /* composition adds its own ref for the rectangle */
         composition_add_rectangle (comp, rect);
         rectangle_unref (rect);

         /* buffer adds its own ref for the composition */
         video_buffer_attach_composition (comp);

         /* we take ownership of the composition and save it for later */
         logoverlay->cached_composition = comp;
       } else {
         video_buffer_attach_composition (logoverlay->cached_composition);
       }

   FIXME: also add some API to modify render position/dimensions of a
   rectangle (probably requires creation of new rectangle, unless we
   handle writability like with other mini objects).

2) Fallback overlay rendering/blitting on top of raw video

   Eventually we want to use this overlay mechanism not only for
   hardware-accelerated video, but also for plain old raw video, either
   at the sink or in the overlay element directly.

   Apart from the advantages listed earlier in the Possible Solutions
   section, this allows us to consolidate in one location a lot of
   overlaying/blitting code that is currently repeated in every single
   overlay element. This makes it considerably easier to support a whole
   range of raw video formats out of the box, add SIMD-optimised
   rendering using ORC, or handle corner cases correctly.

   (Note: a side-effect of overlaying raw video at the video sink is that
   if e.g. a screenshotter gets the last buffer via the last-buffer
   property of basesink, it would get an image without the subtitles on
   top. This could probably be fixed by re-implementing the property in
   GstVideoSink though. Playbin2 could handle this internally as well).

       void
       gst_video_overlay_composition_blend (GstVideoOverlayComposition * comp,
                                            GstBuffer * video_buf)
       {
         guint n;

         g_return_if_fail (gst_buffer_is_writable (video_buf));
         g_return_if_fail (GST_BUFFER_CAPS (video_buf) != NULL);

         ... parse video_buffer caps into BlendVideoFormatInfo ...

         for each rectangle in the composition: {

           if (gst_video_format_is_yuv (video_buf_format)) {
             overlay_format = FORMAT_AYUV;
           } else if (gst_video_format_is_rgb (video_buf_format)) {
             overlay_format = FORMAT_ARGB;
           } else {
             /* FIXME: grayscale? */
             return;
           }

           /* this will scale and convert AYUV<->ARGB if needed */
           pixels = rectangle_get_pixels_scaled (rectangle, overlay_format);

           ... clip output rectangle ...

           __do_blend (video_buf_format, video_buf->data,
                       overlay_format, pixels->data,
                       x, y, width, height, stride);

           gst_buffer_unref (pixels);
         }
       }

3) Flatten all rectangles in a composition

   We cannot assume that the video backend API can handle any number of
   rectangle overlays; it's possible that it only supports one single
   overlay, in which case we need to squash all rectangles into one.

   However, we'll just declare this a corner case for now, and
   implement it only if someone actually needs it. It's easy to add
   later API-wise. Might be a bit tricky if we have rectangles with
   different PARs/formats (e.g. subs and a logo), though we could
   probably always just use the code from 2) with a fully transparent
   video buffer to create a flattened overlay buffer.

4) query support for the new video composition mechanism

   This is handled via GstMeta and an ALLOCATION query - we can simply
   query whether downstream supports the GstVideoOverlayComposition meta.

   There appears to be no issue with downstream possibly not being
   linked yet at the time when an overlay would want to do such a
   query, but we would just have to default to something and update
   ourselves later on a reconfigure event then.
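
A minimal sketch of what such a check could look like in an overlay element
(assuming 1.x-style API; error handling and caching of the result are
omitted):

    #include <gst/gst.h>
    #include <gst/video/video.h>
    #include <gst/video/video-overlay-composition.h>

    /* Ask downstream (via an ALLOCATION query on our source pad) whether it
     * supports the GstVideoOverlayComposition meta. */
    static gboolean
    downstream_supports_overlay_meta (GstPad * srcpad, GstCaps * caps)
    {
      GstQuery *query;
      gboolean supported = FALSE;

      query = gst_query_new_allocation (caps, FALSE);

      if (gst_pad_peer_query (srcpad, query)) {
        supported = gst_query_find_allocation_meta (query,
            GST_VIDEO_OVERLAY_COMPOSITION_META_API_TYPE, NULL);
      }

      gst_query_unref (query);
      return supported;
    }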

Other considerations:

- renderers (overlays or sinks) may be able to handle only ARGB or
  only AYUV (for most graphics/hw-API it's likely ARGB of some sort,
  while our blending utility functions will likely want the same
  colour space as the underlying raw video format, which is usually
  YUV of some sort). We need to convert where required, and should
  cache the conversion.

- renderers may or may not be able to scale the overlay. We need to do
  the scaling internally if not (simple case: just horizontal scaling
  to adjust for PAR differences; complex case: both horizontal and
  vertical scaling, e.g. if subs come from a different source than the
  video or the video has been rescaled or cropped between overlay
  element and sink).

- renderers may be able to generate (possibly scaled) pixels on demand
  from the original data (e.g. a string or RLE-encoded data). We will
  ignore this for now, since this functionality can still be added
  later via API additions. The most interesting case would be to pass
  a pango markup string, since e.g. clutter can handle that natively.

- renderers may be able to write data directly on top of the video
  pixels (instead of creating an intermediary buffer with the overlay
  which is then blended on top of the actual video frame), e.g.
  dvdspu, dvbsuboverlay

  However, in the interest of simplicity, we should probably ignore the
  fact that some elements can blend their overlays directly on top of the
  video (decoding/uncompressing them on the fly), even more so as it's not
  obvious that it's actually faster to decode the same overlay 70-90 times
  (say) (ie. ca. 3 seconds of video frames) and then blend it 70-90 times
  instead of decoding it once into a temporary buffer and then blending it
  directly from there, possibly SIMD-accelerated. Also, this is only
  relevant if the video is raw video and not some hardware-acceleration
  backend object.

  And ultimately it is the overlay element that decides whether to do the
  overlay right there and then or have the sink do it (if supported). It
  could decide to keep doing the overlay itself for raw video and only use
  our new API for non-raw video.

- renderers may want to make sure they only upload the overlay pixels
  once per rectangle if that rectangle recurs in subsequent frames (as
  part of the same composition or a different composition), as is
  likely. This caching of e.g. surfaces needs to be done renderer-side
  and can be accomplished based on the sequence numbers. The
  composition contains the lowest sequence number still in use
  upstream (an overlay element may want to cache created
  compositions+rectangles as well after all to re-use them for
  multiple frames); based on that, the renderer can expire cached
  objects. The caching needs to be done renderer-side because
  attaching renderer-specific objects to the rectangles won't work
  well given the refcounted nature of rectangles and compositions,
  making it unpredictable when a rectangle or composition will be
  freed or from which thread context it will be freed. The
  renderer-specific objects are likely bound to other types of
  renderer-specific contexts, and need to be managed in connection
  with those.

- composition/rectangles should internally provide a certain degree of
  thread-safety. Multiple elements (sinks, overlay element) might
  access or use the same objects from multiple threads at the same
  time, and it is expected that elements will keep a ref to
  compositions and rectangles they push downstream for a while, e.g.
  until the current subtitle composition expires.

## Future considerations

- alternatives: there may be multiple versions/variants of the same
  subtitle stream. On DVDs, there may be a 4:3 version and a 16:9
  version of the same subtitles. We could attach both variants and let
  the renderer pick the best one for the situation (currently we just
  use the 16:9 version). With totem, it's ultimately totem that adds
  the 'black bars' at the top/bottom, so totem also knows if it's got
  a 4:3 display and can/wants to fit 4:3 subs (which may render on top
  of the bars) or not, for example.

## Misc. FIXMEs

TEST: should these look (roughly) alike (note text distortion) - needs
fixing in textoverlay

    gst-launch-1.0 \
      videotestsrc ! video/x-raw,width=640,height=480,pixel-aspect-ratio=1/1 \
        ! textoverlay text=Hello font-desc=72 ! xvimagesink \
      videotestsrc ! video/x-raw,width=320,height=480,pixel-aspect-ratio=2/1 \
        ! textoverlay text=Hello font-desc=72 ! xvimagesink \
      videotestsrc ! video/x-raw,width=640,height=240,pixel-aspect-ratio=1/2 \
        ! textoverlay text=Hello font-desc=72 ! xvimagesink

12
sitemap.txt
@@ -141,6 +141,7 @@ index.md

design/MT-refcounting.md
design/TODO.md
design/activation.md
design/audiosinks.md
design/buffer.md
design/buffering.md
design/bufferpool.md

@@ -149,10 +150,12 @@ index.md
design/context.md
design/controller.md
design/conventions.md
design/decodebin.md
design/dynamic.md
design/element-sink.md
design/element-source.md
design/element-transform.md
design/encoding.md
design/events.md
design/framestep.md
design/gstbin.md

@@ -162,8 +165,13 @@ index.md
design/gstobject.md
design/gstpipeline.md
design/draft-klass.md
design/interlaced-video.md
design/keyframe-force.md
design/latency.md
design/live-source.md
design/mediatype-audio-raw.md
design/mediatype-text-raw.md
design/mediatype-video-raw.md
design/memory.md
design/messages.md
design/meta.md

@@ -171,7 +179,9 @@ index.md
design/miniobject.md
design/missing-plugins.md
design/negotiation.md
design/orc-integration.md
design/overview.md
design/playbin.md
design/preroll.md
design/probes.md
design/progress.md

@@ -186,9 +196,11 @@ index.md
design/sparsestreams.md
design/standards.md
design/states.md
design/stereo-multiview-video.md
design/stream-selection.md
design/stream-status.md
design/streams.md
design/subtitle-overlays.md
design/synchronisation.md
design/draft-tagreading.md
design/toc.md