design: move over design docs from gst-plugins-base

Or most of them anyway (excl. draft-hw-acceleration
and draft-va which didn't seem particularly pertinent).
This commit is contained in:
Tim-Philipp Müller 2016-12-08 22:58:08 +00:00
parent a3fe9f6a7d
commit aff7ad1080
13 changed files with 3475 additions and 0 deletions

## Audiosink design
### Requirements
- must operate chain based. Most simple playback pipelines will push
audio from the decoders into the audio sink.
- must operate getrange based. Most professional audio applications
will operate in a mode where the audio sink pulls samples from the
pipeline. This is typically done in a callback from the audiosink
requesting N samples. The callback is either scheduled from a thread
or from an interrupt from the audio hardware device.
- Exact sample accurate clocks. The audiosink must be able to provide
a clock that is sample accurate even if samples are dropped or when
discontinuities are found in the stream.
- Exact timing of playback. The audiosink must be able to play samples
at their exact times.
- use DMA access when possible. When the hardware can do DMA we should
use it. This should also work over bufferpools to avoid data copying
to/from kernel space.
### Design
The design is based on a set of base classes and the concept of a
ringbuffer of samples.
    +-----------+       - provide preroll, rendering, timing
    + basesink  +       - caps nego
    +-----+-----+
          |
    +-----V----------+  - manages ringbuffer
    + audiobasesink  +  - manages scheduling (push/pull)
    +-----+----------+  - manages clock/query/seek
          |               - manages scheduling of samples in the ringbuffer
          |               - manages caps parsing
          |
    +-----V------+       - default ringbuffer implementation with a GThread
    + audiosink  +       - subclasses provide open/read/close methods
    +------------+
The ringbuffer is a contiguous piece of memory divided into segtotal
segments, each of segsize bytes.
          play position
             v
    +---+---+---+-------------------------------------+----------+
    + 0 | 1 | 2 | ....                                 | segtotal |
    +---+---+---+-------------------------------------+----------+
    <--->
      segsize bytes = N samples * bytes_per_sample
The ringbuffer has a play position, which is expressed in segments. The
play position is where the device is currently reading samples from the
buffer.
The ringbuffer can be put to the PLAYING or STOPPED state.
In the STOPPED state no samples are played to the device and the play
pointer does not advance.
In the PLAYING state samples are written to the device and the
ringbuffer should call a configurable callback after each segment is
written to the device. In this state the play pointer is advanced after
each segment is written.
A write operation to the ringbuffer will put new samples in the
ringbuffer. If there is not enough space in the ringbuffer, the write
operation will block. The playback of the buffer never stops, even if
the buffer is empty. When the buffer is empty, silence is played by the
device.
The ringbuffer is implemented with lockfree atomic operations,
especially on the reading side so that low-latency operations are
possible.
Whenever new samples are to be put into the ringbuffer, the position of
the play (read) pointer is taken, the required write position is
computed, and the difference between the two is calculated. If the
difference is < 0, the sample is too late. If the difference is bigger
than segtotal, the writing part has to wait for the play pointer to
advance.
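
A minimal sketch of that write-position check (names and helpers here are
hypothetical, not the actual ringbuffer implementation):

    #include <glib.h>

    /* Decide where a sample with the given absolute offset should go,
     * relative to the segment the device is currently playing. */
    static gboolean
    check_write_position (guint64 sample_offset, guint64 play_sample,
        gint samples_per_segment, gint segtotal, gint *out_segment)
    {
      gint64 play_seg = play_sample / samples_per_segment;
      gint64 write_seg = sample_offset / samples_per_segment;
      gint64 diff = write_seg - play_seg;

      if (diff < 0)
        return FALSE;           /* sample is too late: drop (or clip) it */

      if (diff >= segtotal)
        return FALSE;           /* too far ahead: wait for the play pointer
                                 * to advance and retry */

      *out_segment = write_seg % segtotal;
      return TRUE;
    }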
### Scheduling
#### chain based mode
In chain based mode, bytes are written into the ringbuffer. This
operation will eventually block when the ringbuffer is filled.
When no samples arrive in time, the ringbuffer will play silence. Each
buffer that arrives will be placed into the ringbuffer at the correct
time. This means that dropping samples or inserting silence is done
automatically, very accurately, and independently of the play pointer.
In this mode, the ringbuffer is usually kept as full as possible. When
using a small buffer (small segsize and segtotal), the latency from audio
entering the sink to it being played can be kept low, but at least
one context switch has to be made between read and write.
#### getrange based mode
In getrange based mode, the audiobasesink will use the callback
function of the ringbuffer to pull segsize bytes of samples from the
peer element. These samples will then be placed in the ringbuffer at the
next play position. It is assumed that the getrange function returns
fast enough to fill the ringbuffer before the play pointer reaches
the write pointer.
In this mode, the ringbuffer is usually kept as empty as possible.
There is no context switch needed between the elements that create
the samples and the actual writing of the samples to the device.
#### DMA mode
Elements that can do DMA based access to the audio device have to
subclass from the GstAudioBaseSink class and wrap the DMA ringbuffer
in a subclass of GstRingBuffer.
The ringbuffer subclass should trigger a callback after writing or
playing each sample to the device. This callback can be triggered
from a thread or from a signal from the audio device.
### Clocks
The GstAudioBaseSink class will use the ringbuffer to act as a clock
provider. It can do this by using the play pointer and the delay to
calculate the clock time.
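
As a rough illustration (a sketch only, not the actual GstAudioBaseSink
code), the clock time can be derived from the number of samples the device
has consumed so far, minus the samples still queued in the device:

    #include <gst/gst.h>

    /* Sketch of the clock calculation described above; the parameter names
     * are hypothetical, not the real GstAudioBaseSink fields. */
    static GstClockTime
    ring_buffer_clock_time (guint64 segments_played, gint samples_per_segment,
        guint64 device_delay_samples, gint rate)
    {
      guint64 samples = segments_played * samples_per_segment;

      /* Subtract the samples still buffered in the device (the delay). */
      if (samples > device_delay_samples)
        samples -= device_delay_samples;
      else
        samples = 0;

      return gst_util_uint64_scale_int (samples, GST_SECOND, rate);
    }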

# Decodebin design
## GstDecodeBin
### Description
- Autoplug and decode to raw media
- Input: single pad with ANY caps
- Output: Dynamic pads
### Contents
- a GstTypeFindElement connected to the single sink pad
- optionally a demuxer/parser
- optionally one or more DecodeGroup
### Autoplugging
The goal is to reach 'target' caps (by default raw media).
This is done by using the GstCaps of a source pad and finding the
available demuxers/decoders GstElement that can be linked to that pad.
The process starts with the source pad of typefind and stops when no
more non-target caps are left. It is commonly done while pre-rolling,
but can also happen whenever a new pad appears on any element.
Once a target caps has been found, that pad is ghosted and the
'pad-added' signal is emitted.
If no compatible elements can be found for a GstCaps, the pad is ghosted
and the 'unknown-type' signal is emitted.
### Assisted auto-plugging
When starting the auto-plugging process for a given GstCaps, the
following signals are emitted in order to allow the application/user to
assist or fine-tune the process (a usage sketch is shown after the list).
- **'autoplug-continue'**:

      gboolean user_function (GstElement * decodebin, GstPad * pad, GstCaps * caps);

  This signal is fired at the very beginning with the source pad GstCaps. If
  the callback returns TRUE, the process continues normally. If the
  callback returns FALSE, then the GstCaps are considered as target caps
  and the autoplugging process stops.

- **'autoplug-factories'**:

      GValueArray * user_function (GstElement * decodebin, GstPad * pad, GstCaps * caps);

  Get a list of elementfactories for @pad with @caps. This function is
  used to instruct decodebin2 of the elements it should try to
  autoplug. The default behaviour when this function is not overridden
  is to get all elements that can handle @caps from the registry
  sorted by rank.

- **'autoplug-select'**:

      gint user_function (GstElement * decodebin, GstPad * pad, GstCaps * caps, GValueArray * factories);

  This signal is fired once autoplugging has got a list of compatible
  GstElementFactory. The signal is emitted with the GstCaps of the
  source pad and a pointer to the GValueArray of compatible factories.
  The callback should return the index of the elementfactory in
  @factories that should be tried next.

  If the callback returns -1, the autoplugging process will stop as if
  no compatible factories were found.

  The default implementation of this function will try to autoplug the
  first factory of the list.
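
A minimal sketch of hooking into these signals from an application,
following the signatures proposed above (hypothetical callbacks, not
decodebin code):

    #include <gst/gst.h>

    static gboolean
    on_autoplug_continue (GstElement *decodebin, GstPad *pad, GstCaps *caps,
        gpointer user_data)
    {
      /* Returning FALSE here would make these caps the target caps. */
      return TRUE;
    }

    static gint
    on_autoplug_select (GstElement *decodebin, GstPad *pad, GstCaps *caps,
        GValueArray *factories, gpointer user_data)
    {
      /* Try the first (highest-ranked) factory; return -1 to reject all. */
      return 0;
    }

    static void
    setup_autoplug_signals (GstElement *decodebin)
    {
      g_signal_connect (decodebin, "autoplug-continue",
          G_CALLBACK (on_autoplug_continue), NULL);
      g_signal_connect (decodebin, "autoplug-select",
          G_CALLBACK (on_autoplug_select), NULL);
    }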
### Target Caps
The target caps are a read/write GObject property of decodebin.
By default the target caps are:
- Raw audio: audio/x-raw
- Raw video: video/x-raw
- Raw text: text/x-raw, format={utf8,pango-markup}
### Media chain/group handling
When autoplugging, all streams coming out of a demuxer will be grouped
in a DecodeGroup.
All new source pads created on that demuxer after it has emitted the
'no-more-pads' signal will be put in another DecodeGroup.
Only one decodegroup can be active at any given time. If a new
decodegroup is created while another one exists, that decodegroup will
be set as blocking until the existing one has drained.
## DecodeGroup
### Description
Streams belonging to the same group/chain of a media file.
### Contents
The DecodeGroup contains:
- a GstMultiQueue to which all streams of the media group are connected.
- the eventual decoders which are autoplugged in order to produce the
requested target pads.
### Proper group draining
The DecodeGroup takes care that all the streams in the group are
completely drained (EOS has come through all source ghost pads).
### Pre-roll and block
The DecodeGroup has a global blocking feature. If enabled, all the
ghosted source pads for that group will be blocked.
A method is available to unblock all blocked pads for that group.
## GstMultiQueue
Multiple input-output data queue.
`multiqueue` achieves the same functionality as `queue`, with a
few differences:
- Multiple streams handling.
The element handles queueing data on more than one stream at once.
To achieve such a feature it has request sink pads (sink\_%u) and
'sometimes' src pads (src\_%u).
When requesting a given sinkpad, the associated srcpad for that
stream will be created. Ex: requesting sink\_1 will generate src\_1.
- Non-starvation on multiple streams.
If more than one stream is used with the element, the streams'
queues will be dynamically grown (up to a limit), in order to ensure
that no stream is risking data starvation. This guarantees that at
any given time there are at least N bytes queued and available for
each individual stream.
If an EOS event comes through a srcpad, the associated queue should
be considered as 'not-empty' in the queue-size-growing algorithm.
- Non-linked srcpads graceful handling.
A GstTask is started for all srcpads when going to
GST\_STATE\_PAUSED.
The tasks are blocked on a GCond which will be signalled in
two different cases:
- When the associated queue has received a buffer.
- When the associated queue was previously declared as 'not-linked'
and the first buffer of the queue is scheduled to be pushed
synchronously in relation to the order in which it arrived globally
in the element (see 'Synchronous data pushing' below).
When woken up by the GCondition, the GstTask will try to push the
next GstBuffer/GstEvent on the queue. If pushing the
GstBuffer/GstEvent returns GST\_FLOW\_NOT\_LINKED, then the
associated queue is marked as 'not-linked'. If pushing the
GstBuffer/GstEvent succeeded the queue will no longer be marked as
'not-linked'.
If pushing on all srcpads returns GstFlowReturn different from
GST\_FLOW\_OK, then all the srcpads' tasks are stopped and
subsequent pushes on sinkpads will return GST\_FLOW\_NOT\_LINKED.
- Synchronous data pushing for non-linked pads.
In order to better support dynamic switching between streams, the
multiqueue (unlike the current GStreamer queue) continues to push
buffers on non-linked pads rather than shutting down.
In addition, to prevent a non-linked stream from very quickly
consuming all available buffers and thus 'racing ahead' of the other
streams, the element must ensure that buffers and inlined events for
a non-linked stream are pushed in the same order as they were
received, relative to the other streams controlled by the element.
This means that a buffer cannot be pushed to a non-linked pad any
sooner than buffers in any other stream which were received before
it.
## Parsers, decoders and auto-plugging
This section has DRAFT status.
Some media formats come in different "flavours" or "stream formats".
These formats differ in the way the setup data and media data is
signalled and/or packaged. An example for this is H.264 video, where
there is a bytestream format (with codec setup data signalled inline and
units prefixed by a sync code and packet length information) and a "raw"
format where codec setup data is signalled out of band (via the caps)
and the chunking is implicit in the way the buffers were muxed into a
container, to mention just two of the possible variants.
Especially on embedded platforms it is common that decoders can only
handle one particular stream format, and not all of them.
Where there are multiple stream formats, parsers are usually expected to
be able to convert between the different formats. This will, if
implemented correctly, work as expected in a static pipeline such as
... ! parser ! decoder ! sink
where the parser can query the decoder's capabilities even before
processing the first piece of data, and configure itself to convert
accordingly, if conversion is needed at all.
In an auto-plugging context this is not so straight-forward though,
because elements are plugged incrementally and not before the previous
element has processed some data and decided what it will output exactly
(unless the template caps are completely fixed, then it can continue
right away, this is not always the case here though, see below). A
parser will thus have to decide on *some* output format so auto-plugging
can continue. It doesn't know anything about the available decoders and
their capabilities though, so it's possible that it will choose a format
that is not supported by any of the available decoders, or by the
preferred decoder.
If the parser had sufficiently concise but fixed source pad template
caps, decodebin could continue to plug a decoder right away, allowing
the parser to configure itself in the same way as it would with a static
pipeline. This is not an option, unfortunately, because often the parser
needs to process some data to determine e.g. the format's profile or
other stream properties (resolution, sample rate, channel configuration,
etc.), and there may be different decoders for different profiles (e.g.
DSP codec for baseline profile, and software fallback for main/high
profile; or a DSP codec only supporting certain resolutions, with a
software fallback for unusual resolutions). So if decodebin just plugged
the highest-ranking decoder, that decoder might not be able to
handle the actual stream later on, which would yield an error (this is a
data flow error then which would be hard to intercept and avoid in
decodebin). In other words, we can't solve this issue by plugging a
decoder right away with the parser.
So decodebin needs to communicate to the parser the set of available
decoder caps (which would contain the relevant capabilities/restrictions
such as supported profiles, resolutions, etc.), after the usual
"autoplug-\*" signal filtering/sorting of course.
This is done by plugging a capsfilter element right after the parser,
and constructing a set of filter caps from the list of available decoders
(one appends at the end just the name(s) of the caps structures from the
parser pad template caps to function as an 'ANY other' caps equivalent).
This lets the parser negotiate to a supported stream format in the same
way as with the static pipeline mentioned above, but of course incurs
some overhead through the additional capsfilter element.
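
As a rough illustration of building such filter caps from the registry (a
sketch only; the helper name is hypothetical and the appending of the
parser's template structure names is omitted):

    #include <gst/gst.h>

    /* Merge the sink template caps of all available decoder factories into
     * one filter caps set (sketch, not the actual decodebin code). */
    static GstCaps *
    build_decoder_filter_caps (void)
    {
      GstCaps *filter = gst_caps_new_empty ();
      GList *decoders, *l;

      decoders = gst_element_factory_list_get_elements (
          GST_ELEMENT_FACTORY_TYPE_DECODER, GST_RANK_MARGINAL);

      for (l = decoders; l != NULL; l = l->next) {
        GstElementFactory *factory = l->data;
        const GList *t;

        for (t = gst_element_factory_get_static_pad_templates (factory);
            t != NULL; t = t->next) {
          GstStaticPadTemplate *tmpl = t->data;

          if (tmpl->direction == GST_PAD_SINK)
            filter = gst_caps_merge (filter,
                gst_static_pad_template_get_caps (tmpl));
        }
      }
      gst_plugin_feature_list_free (decoders);

      /* The caps structure names from the parser's source pad template
       * would then be appended to act as an 'ANY other' equivalent. */
      return filter;
    }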

# Encoding and Muxing
## Problems this proposal attempts to solve
- Duplication of pipeline code for gstreamer-based applications
wishing to encode and or mux streams, leading to subtle differences
and inconsistencies across those applications.
- No unified system for describing encoding targets for applications
in a user-friendly way.
- No unified system for creating encoding targets for applications,
resulting in duplication of code across all applications,
differences and inconsistencies that come with that duplication, and
applications hardcoding element names and settings resulting in poor
portability.
## Goals
1. Convenience encoding element
Create a convenience GstBin for encoding and muxing several streams,
hereafter called 'EncodeBin'.
This element will only contain one single property, which is a profile.
2. Define an encoding profile system
3. Encoding profile helper library
Create a helper library to:
- create EncodeBin instances based on profiles, and
- help applications to create/load/save/browse those profiles.
## EncodeBin
### Proposed API
EncodeBin is a GstBin subclass.
It implements the GstTagSetter interface, by which it will proxy the
calls to the muxer.
Only two introspectable properties (i.e. usable without extra API):
- A GstEncodingProfile
- The name of the profile to use
When a profile is selected, encodebin will:
- Add REQUEST sinkpads for all the GstStreamProfile
- Create the muxer and expose the source pad
Whenever a request pad is created, encodebin will:
- Create the chain of elements for that pad
- Ghost the sink pad
- Return that ghost pad
This allows reducing the code to the minimum for applications wishing to
encode a source for a given profile:

    encbin = gst_element_factory_make ("encodebin", NULL);
    g_object_set (encbin, "profile", "N900/H264 HQ", NULL);
    gst_element_link (encbin, filesink);

    vsrcpad = gst_element_get_static_pad (source, "src1");
    vsinkpad = gst_element_get_request_pad (encbin, "video_%u");
    gst_pad_link (vsrcpad, vsinkpad);
### Explanation of the Various stages in EncodeBin
This describes the various stages which can happen in order to end up
with a multiplexed stream that can then be stored or streamed.
#### Incoming streams
The streams fed to EncodeBin can be of various types:
- Video
- Uncompressed (but maybe subsampled)
- Compressed
- Audio
- Uncompressed (audio/x-raw)
- Compressed
- Timed text
- Private streams
#### Steps involved for raw video encoding
0) Incoming Stream
1) Transform raw video feed (optional)

Here we modify the various fundamental properties of a raw video stream
to be compatible with the intersection of:

- The encoder GstCaps and
- The specified "Stream Restriction" of the profile/target

The fundamental properties that can be modified are:

- width/height: this is done with a video scaler. The DAR (Display
  Aspect Ratio) MUST be respected. If needed, black borders can be
  added to comply with the target DAR.
- framerate
- format/colorspace/depth: all of this is done with a colorspace
  converter.
2) Actual encoding (optional for raw streams)
An encoder (with some optional settings) is used.
3) Muxing
A muxer (with some optional settings) is used.
4) Outgoing encoded and muxed stream
#### Steps involved for raw audio encoding
This is roughly the same as for raw video, except for (1).

1) Transform raw audio feed (optional)

We modify the various fundamental properties of a raw audio stream to be
compatible with the intersection of:

- The encoder GstCaps and
- The specified "Stream Restriction" of the profile/target

The fundamental properties that can be modified are:

- Number of channels
- Type of raw audio (integer or floating point)
- Depth (number of bits required to encode one sample)
#### Steps involved for encoded audio/video streams
Steps (1) and (2) are replaced by a parser if a parser is available for
the given format.
#### Steps involved for other streams
Other streams will just be forwarded as-is to the muxer, provided the
muxer accepts the stream type.
## Encoding Profile System
This work is based on:
- The existing [GstPreset API documentation][gst-preset] system for elements
- The gnome-media [GConf audio profile system][gconf-audio-profile]
- The investigation done into device profiles by Arista and
Transmageddon: [Research on a Device Profile API][device-profile-api],
and [Research on defining presets usage][preset-usage].
### Terminology
- Encoding Target Category: A Target Category is a classification of
  devices/systems/use-cases for encoding.

  Such a classification is required in order for:

  - Applications with a very specific use-case to limit the number of
    profiles they can offer the user. A screencasting application has no
    use for the online services targets, for example.
  - Offering the user some initial classification in the case of a more
    generic encoding application (like a video editor or a transcoder).

  Examples: Consumer devices, Online service, Intermediate Editing
  Format, Screencast, Capture, Computer

- Encoding Profile Target: A Profile Target describes a specific entity
  for which we wish to encode. A Profile Target must belong to at least
  one Target Category. It will define at least one Encoding Profile.

  Examples (with category): Nokia N900 (Consumer device), Sony
  PlayStation 3 (Consumer device), Youtube (Online service), DNxHD
  (Intermediate editing format), HuffYUV (Screencast), Theora (Computer)

- Encoding Profile: A specific combination of muxer, encoders, presets
  and limitations.

  Examples: Nokia N900/H264 HQ, Ipod/High Quality, DVD/Pal,
  Youtube/High Quality, HTML5/Low Bandwidth, DNxHD
### Encoding Profile
An encoding profile requires the following information:
- Name: This string is not translatable and must be unique. A
  recommendation to guarantee uniqueness of the naming could be:
  <target>/<name>
- Description: This is a translatable string describing the profile.
- Muxing format: This is a string containing the GStreamer media-type
  of the container format.
- Muxing preset: This is an optional string describing the preset(s) to
  use on the muxer.
- Multipass setting: This is a boolean describing whether the profile
  requires several passes.
- List of Stream Profiles

#### Stream Profiles
A Stream Profile consists of:

- Type: The type of stream profile (audio, video, text, private-data)
- Encoding Format: This is a string containing the GStreamer media-type
  of the encoding format to be used. If encoding is not to be applied,
  the raw audio media type will be used.
- Encoding preset: This is an optional string describing the preset(s)
  to use on the encoder.
- Restriction: This is an optional GstCaps containing the restriction
  of the stream that can be fed to the encoder. This will generally
  contain restrictions in video width/height/framerate or audio depth.
- Presence: This is an integer specifying how many streams can be used
  in the containing profile. 0 means that any number of streams can be
  used.
- Pass: This is an integer which is only meaningful if the multipass
  flag has been set in the profile. If it has been set it indicates
  which pass this Stream Profile corresponds to.
### Example profile
The representation used here is XML only as an example. No decision is
made as to which formatting to use for storing targets and profiles.

    <gst-encoding-target>
      <name>Nokia N900</name>
      <category>Consumer Device</category>
      <profiles>
        <profile>Nokia N900/H264 HQ</profile>
        <profile>Nokia N900/MP3</profile>
        <profile>Nokia N900/AAC</profile>
      </profiles>
    </gst-encoding-target>

    <gst-encoding-profile>
      <name>Nokia N900/H264 HQ</name>
      <description>
        High Quality H264/AAC for the Nokia N900
      </description>
      <format>video/quicktime,variant=iso</format>
      <streams>
        <stream-profile>
          <type>audio</type>
          <format>audio/mpeg,mpegversion=4</format>
          <preset>Quality High/Main</preset>
          <restriction>audio/x-raw,channels=[1,2]</restriction>
          <presence>1</presence>
        </stream-profile>
        <stream-profile>
          <type>video</type>
          <format>video/x-h264</format>
          <preset>Profile Baseline/Quality High</preset>
          <restriction>
            video/x-raw,width=[16, 800],height=[16, 480],framerate=[1/1, 30000/1001]
          </restriction>
          <presence>1</presence>
        </stream-profile>
      </streams>
    </gst-encoding-profile>
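
For comparison, a minimal sketch of building a roughly similar profile
programmatically with the GstPbUtils encoding-profile API (caps values and
preset handling are simplified illustrative assumptions; error and
reference handling elided):

    #include <gst/pbutils/encoding-profile.h>

    static GstEncodingProfile *
    make_n900_profile (void)
    {
      GstEncodingContainerProfile *container;
      GstCaps *caps;

      caps = gst_caps_from_string ("video/quicktime,variant=iso");
      container = gst_encoding_container_profile_new ("Nokia N900/H264 HQ",
          "High Quality H264/AAC for the Nokia N900", caps, NULL);
      gst_caps_unref (caps);

      caps = gst_caps_from_string ("video/x-h264");
      gst_encoding_container_profile_add_profile (container,
          (GstEncodingProfile *) gst_encoding_video_profile_new (caps, NULL,
              gst_caps_from_string ("video/x-raw,width=[16,800],height=[16,480]"), 1));
      gst_caps_unref (caps);

      caps = gst_caps_from_string ("audio/mpeg,mpegversion=4");
      gst_encoding_container_profile_add_profile (container,
          (GstEncodingProfile *) gst_encoding_audio_profile_new (caps, NULL,
              gst_caps_from_string ("audio/x-raw,channels=[1,2]"), 1));
      gst_caps_unref (caps);

      return GST_ENCODING_PROFILE (container);
    }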
### API
A proposed C API is contained in the gstprofile.h file in this
directory.
### Modifications required in the existing GstPreset system
#### Temporary preset.
Currently a preset needs to be saved on disk in order to be used.
This makes it impossible to have temporary presets (that exist only
during the lifetime of a process), which might be required in the new
proposed profile system.
#### Categorisation of presets.
Currently presets are just aliases of a group of property/value without
any meanings or explanation as to how they exclude each other.
Take for example the H264 encoder. It can have presets for:

- passes (1, 2 or 3 passes)
- profiles (Baseline, Main, ...)
- quality (Low, Medium, High)

In order to programmatically know which presets exclude each other, we
here propose the categorisation of these presets.

This can be done in one of two ways:

1. In the name (by making the name be \[<category>:\]<name>).
   This would give for example: "Quality:High", "Profile:Baseline"
2. By adding a new \_meta key.
   This would give for example: \_meta/category:quality
#### Aggregation of presets.
There can be more than one choice of presets to be done for an element
(quality, profile, pass).
This means that one can not currently describe the full configuration of
an element with a single string but with many.
The proposal here is to extend the GstPreset API to be able to set all
presets using one string and a well-known separator ('/').
This change only requires changes in the core preset handling code.
This would allow doing the following:

    gst_preset_load_preset (h264enc, "pass:1/profile:baseline/quality:high");
### Points to be determined
This document hasn't determined yet how to solve the following problems:
#### Storage of profiles
One proposal for storage would be to use a system wide directory (like
$prefix/share/gstreamer-0.10/profiles) and store XML files for every
individual profile.
Users could then add their own profiles in ~/.gstreamer-0.10/profiles
This poses some limitations as to what to do if some applications want
to have some profiles limited to their own usage.
## Helper library for profiles
These helper methods could also be added to existing libraries (like
GstPreset, GstPbUtils, ..).
The various API proposed are in the accompanying gstprofile.h file.
### Getting user-readable names for formats
This is already provided by GstPbUtils.
### Hierarchy of profiles
The goal is for applications to be able to present to the user a list of
combo-boxes for choosing their output profile:

    [ Category ]        # optional, depends on the application
    [ Device/Site/.. ]  # optional, depends on the application
    [ Profile ]
Convenience methods are offered to easily get lists of categories,
devices, and profiles.
### Creating Profiles
The goal is for applications to be able to easily create profiles.
The applications need to have a fast/efficient way to:

- select a container format and see all compatible streams they can use
  with it.
- select a codec format and see which container formats they can use
  with it.
The remaining parts concern the restrictions to encoder input.
### Ensuring availability of plugins for Profiles
When an application wishes to use a Profile, it should be able to query
whether it has all the needed plugins to use it.
This part will use GstPbUtils to query, and if needed install the
missing plugins through the installed distribution plugin installer.
## Use-cases researched
This is a list of various use-cases where encoding/muxing is being used.
### Transcoding
The goal is to convert with as minimal loss of quality any input file
for a target use. A specific variant of this is transmuxing (see below).
Example applications: Arista, Transmageddon
### Rendering timelines
The incoming streams are a collection of various segments that need to
be rendered. Those segments can vary in nature (i.e. the video
width/height can change). This requires the use of identity with the
single-segment property activated to transform the incoming collection
of segments to a single continuous segment.
Example applications: PiTiVi, Jokosher
### Encoding of live sources
The major risk to take into account is the encoder not encoding the
incoming stream fast enough. This is outside of the scope of encodebin,
and should be solved by using queues between the sources and encodebin,
as well as implementing QoS in encoders and sources (the encoders
emitting QoS events, and the upstream elements adapting themselves
accordingly).
Example applications: camerabin, cheese
### Screencasting applications
This is similar to encoding of live sources. The difference is that,
due to the nature of the source (size and amount/frequency of updates),
one might want to do the encoding in two parts:

- The actual live capture is encoded with an 'almost-lossless' codec
  (such as huffyuv)
- Once the capture is done, the file created in the first step is then
  rendered to the desired target format.
Fixing sources to only emit region-updates and having encoders capable
of encoding those streams would fix the need for the first step but is
outside of the scope of encodebin.
Example applications: Istanbul, gnome-shell, recordmydesktop
### Live transcoding
This is the case of an incoming live stream which will be
broadcast/transmitted live. One issue to take into account is how to
reduce the encoding latency to a minimum. This should mostly be done by
picking low-latency encoders.
Example applications: Rygel, Coherence
### Transmuxing
Given a certain file, the aim is to remux the contents WITHOUT decoding
into either a different container format or the same container format.
Remuxing into the same container format is useful when the file was not
created properly (for example, the index is missing). Whenever
available, parsers should be applied on the encoded streams to validate
and/or fix the streams before muxing them.
Metadata from the original file must be kept in the newly created file.
Example applications: Arista, Transmageddon
### Loss-less cutting
Given a certain file, the aim is to extract a certain part of the file
without going through the process of decoding and re-encoding that file.
This is similar to the transmuxing use-case.
Example applications: PiTiVi, Transmageddon, Arista, ...
### Multi-pass encoding
Some encoders allow doing a multi-pass encoding. The initial pass(es)
are only used to collect encoding estimates and are not actually muxed
and outputted. The final pass uses previously collected information, and
the output is then muxed and outputted.
### Archiving and intermediary format
The requirement is to have lossless
### CD ripping
Example applications: Sound-juicer
### DVD ripping
Example application: Thoggen
### Research links
Some of these are still active documents, others are not:
[gst-preset]: http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstPreset.html
[gconf-audio-profile]: http://www.gnome.org/~bmsmith/gconf-docs/C/gnome-media.html
[device-profile-api]: http://gstreamer.freedesktop.org/wiki/DeviceProfile (FIXME: wiki is gone)
[preset-usage]: http://gstreamer.freedesktop.org/wiki/PresetDesign (FIXME: wiki is gone)

# Interlaced Video
Video buffers have a number of states identifiable through a combination
of caps and buffer flags.
Possible states:
- Progressive
- Interlaced
- Plain
- One field
- Two fields
- Three fields - this should be a progressive buffer with a repeated 'first'
field that can be used for telecine pulldown
- Telecine
- One field
- Two fields
- Progressive
- Interlaced (a.k.a. 'mixed'; the fields are from different frames)
- Three fields - this should be a progressive buffer with a repeated 'first'
field that can be used for telecine pulldown
Note: It can be seen that the difference between the plain interlaced
and telecine states is that in the telecine state, buffers containing
two fields may be progressive.
Tools for identification:
- GstVideoInfo
- GstVideoInterlaceMode - enum `GST_VIDEO_INTERLACE_MODE_...`
- PROGRESSIVE
- INTERLEAVED
- MIXED
- Buffers flags - `GST_VIDEO_BUFFER_FLAG_...`
- TFF
- RFF
- ONEFIELD
- INTERLACED
## Identification of Buffer States
Note that the flags are not necessarily interpreted in the same way in
all states, nor are they all required or meaningful in every case.
### Progressive
If the interlace mode in the video info corresponding to a buffer is
**"progressive"**, then the buffer is progressive.
### Plain Interlaced
If the video info interlace mode is **"interleaved"**, then the buffer is
plain interlaced.
`GST_VIDEO_BUFFER_FLAG_TFF` indicates whether the top or bottom field
is to be displayed first. The timestamp on the buffer corresponds to the
first field.
`GST_VIDEO_BUFFER_FLAG_RFF` indicates that the first field (indicated
by the TFF flag) should be repeated. This is generally only used for
telecine purposes but as the telecine state was added long after the
interlaced state was added and defined, this flag remains valid for
plain interlaced buffers.
`GST_VIDEO_BUFFER_FLAG_ONEFIELD` means that only the field indicated
through the TFF flag is to be used. The other field should be ignored.
### Telecine
If video info interlace mode is **"mixed"** then the buffers are in some
form of telecine state.
The `TFF` and `ONEFIELD` flags have the same semantics as for the plain
interlaced state.
`GST_VIDEO_BUFFER_FLAG_RFF` in the telecine state indicates that the
buffer contains only repeated fields that are present in other buffers
and are as such unneeded. For example, in a sequence of three telecined
frames, we might have:
AtAb AtBb BtBb
In this situation, we only need the first and third buffers as the
second buffer contains fields present in the first and third.
Note that the following state can have its second buffer identified
using the `ONEFIELD` flag (and `TFF` not set):
AtAb AtBb BtCb
The telecine state requires one additional flag to be able to identify
progressive buffers.
The presence of the `GST_VIDEO_BUFFER_FLAG_INTERLACED` means that the
buffer is an 'interlaced' or 'mixed' buffer that contains two fields
that, when combined with fields from adjacent buffers, allow
reconstruction of progressive frames. The absence of the flag implies
the buffer containing two fields is a progressive frame.
For example in the following sequence, the third buffer would be mixed
(yes, it is a strange pattern, but it can happen):
AtAb AtBb BtCb CtDb DtDb
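
A short sketch of how an element might interpret these flags for buffers in
the mixed (telecine) mode (illustrative only, not code from any particular
element):

    #include <gst/video/video.h>

    static void
    classify_telecine_buffer (const GstVideoInfo *info, GstBuffer *buf)
    {
      if (GST_VIDEO_INFO_INTERLACE_MODE (info) != GST_VIDEO_INTERLACE_MODE_MIXED)
        return;

      if (GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_RFF)) {
        /* Only repeated fields, present in other buffers: may be skipped. */
      } else if (GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_ONEFIELD)) {
        /* Use only the field selected by the TFF flag. */
      } else if (GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_INTERLACED)) {
        /* Two fields from different frames: weave with adjacent buffers. */
      } else {
        /* Two fields forming a complete progressive frame. */
      }
    }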

# Forcing keyframes
Consider the following use case:
We have a pipeline that performs video and audio capture from a live
source, compresses and muxes the streams and writes the resulting data
into a file.
Inside the uncompressed video data we have a specific pattern inserted
at specific moments that should trigger a switch to a new file, meaning,
we close the existing file we are writing to and start writing to a new
file.
We want the new file to start with a keyframe so that one can start
decoding the file immediately.
## Components
1) We need an element that is able to detect the pattern in the video
stream.
2) We need to inform the video encoder that it should start encoding a
keyframe starting from exactly the frame with the pattern.
3) We need to inform the muxer that it should flush out any pending
data and start a new file with the keyframe as the first video
frame.
4) We need to inform the sink element that it should start writing to
the next file. This requires application interaction to instruct the
sink of the new filename. The application should also be free to
ignore the boundary and continue to write to the existing file. The
application will typically use an event pad probe to detect the
custom event.
## Implementation
### Downstream
The implementation would consist of generating a `GST_EVENT_CUSTOM_DOWNSTREAM`
event that marks the keyframe boundary. This event is inserted into the
pipeline by the application upon a certain trigger. In the above use case
this trigger would be given by the element that detects the pattern, in the
form of an element message.
The custom event would travel further downstream to instruct encoder,
muxer and sink about the possible switch.
The information passed in the event consists of:
**GstForceKeyUnit**
- **"timestamp"** (`G_TYPE_UINT64`): the timestamp of the buffer that
triggered the event.
- **"stream-time"** (`G_TYPE_UINT64`): the stream position that triggered the event.
- **"running-time"** (`G_TYPE_UINT64`): the running time of the stream when
the event was triggered.
- **"all-headers"** (`G_TYPE_BOOLEAN`): Send all headers, including
those in the caps or those sent at the start of the stream.
- **...**: optional other data fields.
Note that this event is purely informational, no element is required to
perform an action but it should forward the event downstream, just like
any other event it does not handle.
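
A minimal sketch of constructing such an event with the fields listed
above (an illustration, not mandated API):

    #include <gst/gst.h>

    static GstEvent *
    make_force_key_unit_event (GstClockTime timestamp, GstClockTime stream_time,
        GstClockTime running_time, gboolean all_headers)
    {
      /* Build the informational structure described above. */
      GstStructure *s = gst_structure_new ("GstForceKeyUnit",
          "timestamp", G_TYPE_UINT64, timestamp,
          "stream-time", G_TYPE_UINT64, stream_time,
          "running-time", G_TYPE_UINT64, running_time,
          "all-headers", G_TYPE_BOOLEAN, all_headers, NULL);

      return gst_event_new_custom (GST_EVENT_CUSTOM_DOWNSTREAM, s);
    }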
Elements understanding the event should behave as follows:
1) The video encoder receives the event before the next frame. Upon
reception of the event it schedules to encode the next frame as a
keyframe. Before pushing out the encoded keyframe it must push the
GstForceKeyUnit event downstream.
2) The muxer receives the GstForceKeyUnit event and flushes out its
current state, preparing to produce data that can be used as a
keyunit. Before pushing out the new data it pushes the
GstForceKeyUnit event downstream.
3) The application receives the GstForceKeyUnit event on a pad probe
installed on the sink's sink pad and reconfigures the sink to make it
perform new actions after receiving the next buffer (see the sketch
below).
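
A sketch of such a probe using the 1.0 pad probe API (the reconfiguration
itself is application specific and omitted):

    #include <gst/gst.h>

    static GstPadProbeReturn
    sink_event_probe (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
    {
      GstEvent *event = GST_PAD_PROBE_INFO_EVENT (info);

      if (GST_EVENT_TYPE (event) == GST_EVENT_CUSTOM_DOWNSTREAM &&
          gst_event_has_name (event, "GstForceKeyUnit")) {
        /* e.g. set the next file name on the sink before the next buffer */
      }

      return GST_PAD_PROBE_OK;
    }

    /* Installed with:
     * gst_pad_add_probe (sinkpad, GST_PAD_PROBE_TYPE_EVENT_DOWNSTREAM,
     *     sink_event_probe, NULL, NULL);
     */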
### Upstream
When using RTP, packets can get lost and receivers can be added at any
time; in either case a receiver may request a new key frame.
A downstream element sends an upstream "GstForceKeyUnit" event up the
pipeline.
When an element produces some kind of key unit in output, but has no
such concept in its input (like an encoder that takes raw frames), it
consumes the event (doesn't pass it upstream), and instead sends a
downstream GstForceKeyUnit event and a new keyframe.

# Raw Audio Media Types
**audio/x-raw**
- **format**, G\_TYPE\_STRING, mandatory. The format of the audio samples, see
  the Formats section for a list of valid sample formats.
- **rate**, G\_TYPE\_INT, mandatory. The samplerate of the audio.
- **channels**, G\_TYPE\_INT, mandatory. The number of channels.
- **channel-mask**, GST\_TYPE\_BITMASK, mandatory for more than 2 channels.
  Bitmask of channel positions present. May be omitted for mono and
  stereo. May be set to 0 to denote that the channels are unpositioned.
- **layout**, G\_TYPE\_STRING, mandatory. The layout of channels within a
  buffer. Possible values are "interleaved" (for LRLRLRLR) and
  "non-interleaved" (LLLLRRRR).
Use `GstAudioInfo` and related helper API to create and parse raw audio caps.
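
For example, a small sketch of building such caps with the helper API
(format, rate and channel count chosen arbitrarily for illustration):

    #include <gst/audio/audio.h>

    static GstCaps *
    make_stereo_s16_caps (void)
    {
      GstAudioInfo info;

      gst_audio_info_init (&info);
      gst_audio_info_set_format (&info, GST_AUDIO_FORMAT_S16, 44100, 2, NULL);

      /* Produces audio/x-raw caps with format=S16LE or S16BE (depending on
       * host endianness), rate=44100, channels=2, layout=interleaved and a
       * default channel-mask. */
      return gst_audio_info_to_caps (&info);
    }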
## Metadata
- `GstAudioDownmixMeta`: A matrix for downmixing multichannel audio to a
lower number of channels.
## Formats
The following values can be used for the format string property.
- "S8" 8-bit signed PCM audio
- "U8" 8-bit unsigned PCM audio
- "S16LE" 16-bit signed PCM audio
- "S16BE" 16-bit signed PCM audio
- "U16LE" 16-bit unsigned PCM audio
- "U16BE" 16-bit unsigned PCM audio
- "S24\_32LE" 24-bit signed PCM audio packed into 32-bit
- "S24\_32BE" 24-bit signed PCM audio packed into 32-bit
- "U24\_32LE" 24-bit unsigned PCM audio packed into 32-bit
- "U24\_32BE" 24-bit unsigned PCM audio packed into 32-bit
- "S32LE" 32-bit signed PCM audio
- "S32BE" 32-bit signed PCM audio
- "U32LE" 32-bit unsigned PCM audio
- "U32BE" 32-bit unsigned PCM audio
- "S24LE" 24-bit signed PCM audio
- "S24BE" 24-bit signed PCM audio
- "U24LE" 24-bit unsigned PCM audio
- "U24BE" 24-bit unsigned PCM audio
- "S20LE" 20-bit signed PCM audio
- "S20BE" 20-bit signed PCM audio
- "U20LE" 20-bit unsigned PCM audio
- "U20BE" 20-bit unsigned PCM audio
- "S18LE" 18-bit signed PCM audio
- "S18BE" 18-bit signed PCM audio
- "U18LE" 18-bit unsigned PCM audio
- "U18BE" 18-bit unsigned PCM audio
- "F32LE" 32-bit floating-point audio
- "F32BE" 32-bit floating-point audio
- "F64LE" 64-bit floating-point audio
- "F64BE" 64-bit floating-point audio

# Raw Text Media Types
**text/x-raw**
- **format**, G\_TYPE\_STRING, mandatory The format of the text, see the
Formats section for a list of valid format strings.
## Metadata
There are no common metas for this raw format yet.
## Formats
- "utf8": plain timed utf8 text (formerly text/plain)
Parsed timed text in utf8 format.
- "pango-markup": plain timed utf8 text with pango markup
(formerly text/x-pango-markup). Same as "utf8", but text embedded in an
XML-style markup language for size, colour, emphasis, etc.
See [Pango Markup Format][pango-markup]
[pango-markup]: http://developer.gnome.org/pango/stable/PangoMarkupFormat.html

# Orc Integration
## About Orc
Orc code can be in one of two forms: .orc files that are converted by
orcc to C code that calls liborc functions, or C code that calls liborc
to create complex operations at runtime. The former is mostly for
functions with predetermined functionality. The latter is for
functionality that is determined at runtime, where writing .orc
functions for all combinations would be prohibitive. Orc also has a fast
memcpy and memset which are useful independently.
## Fast memcpy()
**This part is not integrated yet.**
Orc has built-in functions `orc_memcpy()` and `orc_memset()` that work
like `memcpy()` and `memset()`. These are meant for large copies only. A
reasonable cutoff for using `orc_memcpy()` instead of `memcpy()` is if the
number of bytes is generally greater than 100. **DO NOT** use `orc_memcpy()`
if the typical size is less than 20 bytes, especially if the size is
known at compile time, as these cases are inlined by the compiler.
(Example: sys/ximage/ximagesink.c)
Add $(ORC\_CFLAGS) to libgstximagesink\_la\_CFLAGS and $(ORC\_LIBS) to
libgstximagesink\_la\_LIBADD. Then, in the source file, add:

    #ifdef HAVE_ORC
    #include <orc/orc.h>
    #else
    #define orc_memcpy(a,b,c) memcpy(a,b,c)
    #endif

Then switch relevant uses of memcpy() to orc\_memcpy().
The above example works whether or not Orc is enabled at compile time.
## Normal Usage
The following lines are added near the top of Makefile.am for plugins
that use Orc code in .orc files (this is for the volume plugin):

    ORC_BASE=volume
    include $(top_srcdir)/common/orc.mk

Also add the generated source file to the plugin build:

    nodist_libgstvolume_la_SOURCES = $(ORC_SOURCES)

And of course, add $(ORC\_CFLAGS) to libgstvolume\_la\_CFLAGS, and
$(ORC\_LIBS) to libgstvolume\_la\_LIBADD.
The value assigned to ORC\_BASE does not need to be related to the name
of the plugin.
## Advanced Usage
The Holy Grail of Orc usage is to programmatically generate Orc code at
runtime, have liborc compile it into binary code at runtime, and then
execute this code. Currently, the best example of this is in
Schroedinger. An example of how this would be used is audioconvert:
given an input format, channel position manipulation, dithering and
quantizing configuration, and output format, an Orc code generator would
create an OrcProgram, add the appropriate instructions to do each step
based on the configuration, and then compile the program. Successfully
compiling the program would return a function pointer that can be called
to perform the operation.
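
As a rough illustration of the runtime approach (a sketch only; the exact
liborc calls and the compile-result check may differ between Orc versions,
and a real code generator would derive its instructions from the element's
configuration instead of hard-coding them):

    #include <orc/orc.h>

    /* Build a kernel at runtime that adds two arrays of 16-bit samples
     * with saturation. */
    static OrcProgram *
    build_add_s16 (void)
    {
      OrcProgram *p;

      orc_init ();

      p = orc_program_new ();
      orc_program_add_destination (p, 2, "d1");
      orc_program_add_source (p, 2, "s1");
      orc_program_add_source (p, 2, "s2");
      orc_program_append_str (p, "addssw", "d1", "s1", "s2");

      if (orc_program_compile (p) != ORC_COMPILE_RESULT_OK) {
        /* Compilation failed: the caller should fall back to a C path. */
        orc_program_free (p);
        return NULL;
      }

      return p;
    }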
This sort of advanced usage requires structural changes to current
plugins (e.g., audioconvert) and will probably be developed
incrementally. Moreover, if such code is intended to be used without Orc
as strict build/runtime requirement, two codepaths would need to be
developed and tested. For this reason, until GStreamer requires Orc, I
think it's a good idea to restrict such advanced usage to the cog plugin
in -bad, which requires Orc.
## Build Process
The goal of the build process is to make Orc non-essential for most
developers and users. This is not to say you shouldn't have Orc
installed -- without it, you will get slow backup C code, just that
people compiling GStreamer are not forced to switch from Liboil to Orc
immediately.
With Orc installed, the build process will use the Orc Compiler (orcc)
to convert each .orc file into a temporary C source (tmp-orc.c) and a
temporary header file (${base}orc.h if constructed from ${base}.orc).
The C source file is compiled and linked to the plugin, and the header
file is included by other source files in the plugin.
If 'make orc-update' is run in the source directory, the files tmp-orc.c
and ${base}orc.h are copied to ${base}orc-dist.c and ${base}orc-dist.h
respectively. The -dist.\[ch\] files are automatically disted via
orc.mk. The -dist.\[ch\] files should be checked in to git whenever the
.orc source is changed and checked in. Example workflow:

    edit .orc file
    ... make, test, etc.
    make orc-update
    git add volume.orc volumeorc-dist.c volumeorc-dist.h
    git commit
At 'make dist' time, all of the .orc files are compiled, and then copied
to their -dist.\[ch\] counterparts, and then the -dist.\[ch\] files are
added to the dist directory.
Without Orc installed (or --disable-orc given to configure), the
-dist.\[ch\] files are copied to tmp-orc.c and ${base}orc.h. When
compiling with Orc disabled, DISABLE\_ORC is defined in config.h, and the C
backup code is compiled. This backup code is pure C, and does not
include orc headers or require linking against liborc.
The common/orc.mk build method is limited by the inflexibility of
automake. The file tmp-orc.c must be a fixed filename, using ORC\_NAME
to generate the filename does not work because it conflicts with
automake's dependency generation. Building multiple .orc files is not
possible due to this restriction.
## Testing
If you create another .orc file, please add it to tests/orc/Makefile.am.
This causes automatic test code to be generated and run during 'make
check'. Each function in the .orc file is tested by comparing the
results of executing the run-time compiled code and the C backup
function.
## Orc Limitations
### audioconvert
Orc doesn't have a mechanism for generating random numbers, which
prevents its use as-is for dithering. One way around this is to generate
suitable dithering values in one pass, then use those values in a second
Orc-based pass.
Orc doesn't handle 64-bit float, for no good reason.
Irrespective of Orc handling 64-bit float, it would be useful to have a
direct 32-bit float to 16-bit integer conversion.
audioconvert is a good candidate for programmatically generated Orc code.
audioconvert enumerates functions in terms of big-endian vs.
little-endian. Orc's functions are "native" and "swapped".
Programmatically generating code removes the need to worry about this.
Orc doesn't handle 24-bit samples. Fixing this is not a priority (for ds).
### videoscale
Orc doesn't handle horizontal resampling yet. The plan is to add special
sampling opcodes, for nearest, bilinear, and cubic interpolation.
### videotestsrc
Lots of code in videotestsrc needs to be rewritten to be SIMD (and Orc)
friendly, e.g., stuff that uses `oil_splat_u8()`.
A fast low-quality random number generator in Orc would be useful here.
### volume
Many of the comments on audioconvert apply here as well.
There are a bunch of FIXMEs in here that are due to misapplied patches.

# playbin
The purpose of this element is to decode and render the media contained
in a given generic uri. The element extends GstPipeline and is typically
used in playback situations.
Required features:
- accept and play any valid uri. This includes
- rendering video/audio
- overlaying subtitles on the video
- optionally read external subtitle files
- allow for hardware (non raw) sinks
- selection of audio/video/subtitle streams based on language.
- perform network buffering/incremental download
- gapless playback
- support for visualisations with configurable sizes
- ability to reject files that are too big, or of a format that would
require too much CPU/memory usage.
- be very efficient with adding elements such as converters to reduce
the amount of negotiation that has to happen.
- handle chained oggs. This includes having support for dynamic pad
add and remove from a demuxer.
## Components
### decodebin
- performs the autoplugging of demuxers/decoders
- emits signals for steering the autoplugging:
- to decide if a non-raw media format is acceptable as output
- to sort the possible decoders for a non-raw format
- see also decodebin2 design doc
### uridecodebin
- combination of a source to handle the given uri, an optional
queueing element and one or more decodebin2 elements to decode the
non-raw streams.
### playsink
- handles display of audio/video/text.
- has request audio/video/text input pad. There is only one sinkpad
per type. The requested pads define the configuration of the
internal pipeline.
- allows for setting audio/video sinks or does automatic
sink selection.
- allows for configuration of visualisation element.
- allows for enable/disable of visualisation, audio and video.
### playbin
- combination of one or more uridecodebin elements to read the uri and
subtitle uri.
- support for queuing new media to support gapless playback.
- handles stream selection.
- uses playsink to display.
- selection of sinks and configuration of uridecodebin with raw
output formats.
## Gapless playback feature
playbin has an "about-to-finish" signal. The application should
configure a new uri (and optional suburi) in the callback. When the
current media finishes, this new media will be played next.
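
A minimal sketch of using this from an application (here user_data is
assumed to be the uri of the next playlist item, an application-side
assumption):

    #include <gst/gst.h>

    static void
    on_about_to_finish (GstElement *playbin, gpointer user_data)
    {
      const gchar *next_uri = user_data;

      /* Setting the uri (and optionally suburi) here queues the new media
       * so it is played without a gap when the current media finishes. */
      g_object_set (playbin, "uri", next_uri, NULL);
    }

    static void
    setup_gapless (GstElement *playbin, const gchar *next_uri)
    {
      g_signal_connect (playbin, "about-to-finish",
          G_CALLBACK (on_about_to_finish), (gpointer) next_uri);
    }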

# Stereoscopic & Multiview Video Handling
There are two cases to handle:
- Encoded video output from a demuxer to parser / decoder or from encoders
into a muxer.
- Raw video buffers
The design below is somewhat based on the proposals from
[bug 611157](https://bugzilla.gnome.org/show_bug.cgi?id=611157)
Multiview is used as a generic term to refer to handling both
stereo content (left and right eye only) as well as extensions for videos
containing multiple independent viewpoints.
## Encoded Signalling
This is regarding the signalling in caps and buffers from demuxers to
parsers (sometimes) or out from encoders.
For backward compatibility with existing codecs many transports of
stereoscopic 3D content use normal 2D video with 2 views packed spatially
in some way, and put extra new descriptions in the container/mux.
Info in the demuxer seems to apply to stereo encodings only. For all
MVC methods I know, the multiview encoding is in the video bitstream itself
and therefore already available to decoders. Only stereo systems have been retro-fitted
into the demuxer.
Also, sometimes extension descriptions are in the codec (e.g. H.264 SEI FPA packets)
and it would be useful to be able to put the info onto caps and buffers from the
parser without decoding.
To handle both cases, we need to be able to output the required details on
encoded video for decoders to apply onto the raw video buffers they decode.
*If there ever is a need to transport multiview info for encoded data the
same system below for raw video or some variation should work*
### Encoded Video: Properties that need to be encoded into caps
1. multiview-mode (called "Channel Layout" in bug 611157)
* Whether a stream is mono, for a single eye, stereo, mixed-mono-stereo
(switches between mono and stereo - mp4 can do this)
* Uses a buffer flag to mark individual buffers as mono or "not mono"
(single|stereo|multiview) for mixed scenarios. The alternative (not
proposed) is for the demuxer to switch caps for each mono to not-mono
change, and not use a 'mixed' caps variant at all.
* _single_ refers to a stream of buffers that only contain 1 view.
It is different from mono in that the stream is a marked left or right
eye stream for later combining in a mixer or when displaying.
* _multiple_ marks a stream with multiple independent views encoded.
It is included in this list for completeness. As noted above, there's
currently no scenario that requires marking encoded buffers as MVC.
2. Frame-packing arrangements / view sequence orderings
* Possible frame packings: side-by-side, side-by-side-quincunx,
column-interleaved, row-interleaved, top-bottom, checker-board
* bug 611157 - sreerenj added side-by-side-full and top-bottom-full but
I think that's covered by suitably adjusting pixel-aspect-ratio. If
not, they can be added later.
* _top-bottom_, _side-by-side_, _column-interleaved_, _row-interleaved_ are as the names suggest.
* _checker-board_, samples are left/right pixels in a chess grid +-+-+-/-+-+-+
* _side-by-side-quincunx_. Side By Side packing, but quincunx sampling -
1 pixel offset of each eye needs to be accounted when upscaling or displaying
* there may be other packings (future expansion)
* Possible view sequence orderings: frame-by-frame, frame-primary-secondary-tracks, sequential-row-interleaved
* _frame-by-frame_, each buffer is left, then right view etc
* _frame-primary-secondary-tracks_ - the file has 2 video tracks (primary and secondary), one is left eye, one is right.
Demuxer info indicates which one is which.
Handling this means marking each stream as all-left and all-right views, decoding separately, and combining automatically (inserting a mixer/combiner in playbin)
-> *Leave this for future expansion*
* _sequential-row-interleaved_ Mentioned by sreerenj in bug patches, I can't find a mention of such a thing. Maybe it's in MPEG-2
-> *Leave this for future expansion / deletion*
3. view encoding order
* Describes how to decide which piece of each frame corresponds to left or right eye
* Possible orderings left, right, left-then-right, right-then-left
- Need to figure out how we find the correct frame in the demuxer to start decoding when seeking in frame-sequential streams
- Need a buffer flag for marking the first buffer of a group.
4. "Frame layout flags"
* flags for view specific interpretation
* horizontal-flip-left, horizontal-flip-right, vertical-flip-left, vertical-flip-right
Indicates that one or more views has been encoded in a flipped orientation, usually due to camera with mirror or displays with mirrors.
* This should be an actual flags field. Registered GLib flags types aren't generally well supported in our caps - the type might not be loaded/registered yet when parsing a caps string, so they can't be used in caps templates in the registry.
* It might be better just to use a hex value / integer
## Buffer representation for raw video
- Transported as normal video buffers with extra metadata
- The caps define the overall buffer width/height, with helper functions to
extract the individual views for packed formats
- pixel-aspect-ratio adjusted if needed to double the overall width/height
- video sinks that don't know about multiview extensions yet will show the
packed view as-is. For frame-sequence outputs, things might look weird, but
just adding multiview-mode to the sink caps can disallow those transports.
- _row-interleaved_ packing is actually just side-by-side memory layout with
half frame width, twice the height, so can be handled by adjusting the
overall caps and strides
- Other exotic layouts need new pixel formats defined (checker-board,
column-interleaved, side-by-side-quincunx)
- _Frame-by-frame_ - one view per buffer, but with alternating metas marking
which buffer is which left/right/other view and using a new buffer flag as
described above to mark the start of a group of corresponding frames.
- New video caps addition as for encoded buffers
### Proposed Caps fields
Combining the requirements above and collapsing the combinations into
mnemonics (an illustrative caps example follows these lists):
* multiview-mode =
    mono | left | right | sbs | sbs-quin | col | row | topbot | checkers |
    frame-by-frame | mixed-sbs | mixed-sbs-quin | mixed-col | mixed-row |
    mixed-topbot | mixed-checkers | mixed-frame-by-frame |
    multiview-frames | mixed-multiview-frames
* multiview-flags =
+ 0x0000 none
+ 0x0001 right-view-first
+ 0x0002 left-h-flipped
+ 0x0004 left-v-flipped
+ 0x0008 right-h-flipped
+ 0x0010 right-v-flipped
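
As an illustration only (this follows the proposal above, not any released
API), caps for a side-by-side stereo stream with the right view first might
look like this:

    #include <gst/gst.h>

    static GstCaps *
    example_multiview_caps (void)
    {
      /* multiview-flags=(int)1 corresponds to 0x0001 right-view-first in
       * the flags list above; other values are arbitrary examples. */
      return gst_caps_from_string (
          "video/x-raw, format=NV12, width=1920, height=1080, "
          "framerate=30/1, multiview-mode=sbs, multiview-flags=(int)1");
    }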
### Proposed new buffer flags
Add two new `GST_VIDEO_BUFFER_*` flags in video-frame.h and make it clear that
those flags can apply to encoded video buffers too. wtay says that's currently
the case anyway, but the documentation should say it.
- **`GST_VIDEO_BUFFER_FLAG_MULTIPLE_VIEW`** - Marks a buffer as representing
non-mono content, although it may be a single (left or right) eye view.
- **`GST_VIDEO_BUFFER_FLAG_FIRST_IN_BUNDLE`** - for frame-sequential methods of
transport, mark the "first" of a left/right/other group of frames
### A new GstMultiviewMeta
This provides a place to describe all provided views in a buffer / stream,
and through Meta negotiation to inform decoders about which views to decode if
not all are wanted.
* Logical labels/names and mapping to GstVideoMeta numbers
* Standard view labels LEFT/RIGHT, and non-standard ones (strings)

      GST_VIDEO_MULTIVIEW_VIEW_LEFT = 1
      GST_VIDEO_MULTIVIEW_VIEW_RIGHT = 2

      struct GstVideoMultiviewViewInfo {
          guint view_label;
          guint meta_id;      /* id of the GstVideoMeta for this view */
          padding;
      };

      struct GstVideoMultiviewMeta {
          guint n_views;
          GstVideoMultiviewViewInfo *view_info;
      };
The meta is optional, and probably only useful later for MVC
## Outputting stereo content
The initial implementation for output will be stereo content in glimagesink
### Output Considerations with OpenGL
- If we have support for stereo GL buffer formats, we can output separate
left/right eye images and let the hardware take care of display.
- Otherwise, glimagesink needs to render one window with left/right in a
suitable frame packing and that will only show correctly in fullscreen on a
device set for the right 3D packing -> requires app intervention to set the
video mode.
- This could be done manually on the TV, or with HDMI 1.4 by setting the
right video mode for the screen to inform the TV. A third option is to support
rendering to two separate overlay areas on the screen - one for the left eye,
one for the right - which could be done using the 'splitter' element and two
output sinks or, better, by adding a 2nd window overlay for split stereo output
- Intel hardware doesn't do stereo GL buffers - only nvidia and AMD, so
initial implementation won't include that
## Other elements for handling multiview content
- videooverlay interface extensions
- __Q__: Should this be a new interface?
- Element message to communicate the presence of stereoscopic information to the app
- App needs to be able to override the input interpretation - ie, set multiview-mode and multiview-flags
- Most videos I've seen are side-by-side or top-bottom with no frame-packing metadata
- New API for the app to set rendering options for stereo/multiview content
- This might be best implemented as a **multiview GstContext**, so that
the pipeline can share app preferences for content interpretation and for
downmixing to mono output, or set on the sink and passed as far
upstream/downstream as possible (a rough context sketch follows this list).
- Converter element
- convert different view layouts
- Render to anaglyphs of different types (magenta/green, red/blue, etc) and output as mono
- Mixer element
- take 2 video streams and output as stereo
- later take n video streams
- share code with the converter, it just takes input from n pads instead of one.
- Splitter element
- Output one pad per view
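A very rough sketch of the multiview GstContext idea mentioned above; the
context type name and structure fields are invented for illustration and are
not existing API, only the GstContext calls themselves are real:

    #include <gst/gst.h>

    static void
    share_multiview_prefs (GstElement * pipeline)
    {
      GstContext *ctx;
      GstStructure *s;

      /* hypothetical context type carrying the app's multiview preferences */
      ctx = gst_context_new ("gst.video.multiview.preferences", TRUE);
      s = gst_context_writable_structure (ctx);
      gst_structure_set (s,
          "override-mode", G_TYPE_STRING, "sbs",
          "downmix-mode", G_TYPE_STRING, "green-magenta-anaglyph", NULL);
      gst_element_set_context (GST_ELEMENT (pipeline), ctx);
      gst_context_unref (ctx);
    }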
### Implementing MVC handling in decoders / parsers (and encoders)
Things to do to implement MVC handling
1. Parsing SEI in h264parse and setting caps (patches available in
bugzilla for parsing, see below)
2. Integrate gstreamer-vaapi MVC support with this proposal
3. Help with [libav MVC implementation](https://wiki.libav.org/Blueprint/MVC)
4. generating SEI in H.264 encoder
5. Support for MPEG2 MVC extensions
## Relevant bugs
- [bug 685215](https://bugzilla.gnome.org/show_bug.cgi?id=685215) - codecparser h264: Add initial MVC parser
- [bug 696135](https://bugzilla.gnome.org/show_bug.cgi?id=696135) - h264parse: Add mvc stream parsing support
- [bug 732267](https://bugzilla.gnome.org/show_bug.cgi?id=732267) - h264parse: extract base stream from MVC or SVC encoded streams
## Other Information
[Matroska 3D support notes](http://www.matroska.org/technical/specs/notes.html#3D)
## Open Questions
### Background
### Representation for GstGL
When uploading raw video frames to GL textures, the goal is to split packed
frames into separate GL textures when uploading, and attach multiple
GstGLMemory objects to the GstBuffer. The multiview-mode and
multiview-flags fields in the caps should change to reflect the conversion
from one incoming GstMemory to multiple GstGLMemory objects, and the
width/height in the output info should change as needed.
This is (currently) targeted as 2 render passes - upload as normal
to a single stereo-packed RGBA texture, and then unpack into 2
smaller textures, output with GST_VIDEO_MULTIVIEW_MODE_SEPARATED, as
2 GstGLMemory objects attached to one buffer. We can optimise the upload
later to go directly to 2 textures for common input formats.
Separate output textures have a few advantages:
- Filter elements can more easily apply filters in several passes to each
texture without fundamental changes to our filters to avoid mixing pixels
from separate views.
- Centralises the sampling of input video frame packings in the upload code,
which makes adding new packings in the future easier.
- Sampling multiple textures to generate various output frame-packings
for display is conceptually simpler than converting from any input packing
to any output packing.
- In implementations that support quad buffers, having separate textures
makes it trivial to do GL_LEFT/GL_RIGHT output
For either option, we'll need new glsink output API to pass more
information to applications about multiple views for the draw signal/callback.
I don't know if it's desirable to support *both* methods of representing
views. If so, that should be signalled in the caps too. That could be a
new multiview-mode for passing views in separate GstMemory objects
attached to a GstBuffer, which would not be GL specific.
### Overriding frame packing interpretation
Most sample videos available are frame packed, with no metadata
to say so. How should we override that interpretation?
- Simple answer: Use capssetter + new properties on playbin to
override the multiview fields. *Basically implemented in playbin, using
a pad probe. Needs more work for completeness.* A gst-launch sketch follows
below.
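As a sketch of that capssetter approach (the file name is a placeholder and
the multiview-mode value uses the mnemonics proposed earlier, which may not
match what finally ships):

    gst-launch-1.0 filesrc location=sample-3d.mp4 ! decodebin ! \
        capssetter caps="video/x-raw,multiview-mode=(string)sbs" ! \
        videoconvert ! autovideosink

capssetter's default join behaviour merges the given fields into the incoming
caps, so only the multiview interpretation is overridden.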
### Adding extra GstVideoMeta to buffers
There should be one GstVideoMeta for the entire video frame in packed
layouts, and one GstVideoMeta per GstGLMemory when views are attached
to a GstBuffer separately. This should be done by the buffer pool,
which knows the layout from the caps.
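A rough sketch of such a pool tagging one GstVideoMeta per view; offsets and
strides are glossed over here (a real pool would use
gst_buffer_add_video_meta_full() with the proper per-view layout), and
`view_info` is assumed to describe a single view as derived from the caps:

    #include <gst/video/video.h>

    static void
    tag_per_view_metas (GstBuffer * buf, const GstVideoInfo * view_info,
        guint n_views)
    {
      guint i;

      for (i = 0; i < n_views; i++) {
        GstVideoMeta *vmeta;

        vmeta = gst_buffer_add_video_meta (buf, GST_VIDEO_FRAME_FLAG_NONE,
            GST_VIDEO_INFO_FORMAT (view_info),
            GST_VIDEO_INFO_WIDTH (view_info),
            GST_VIDEO_INFO_HEIGHT (view_info));
        /* the id field distinguishes multiple GstVideoMetas on one buffer;
         * the proposed GstVideoMultiviewMeta maps view labels to these ids */
        vmeta->id = i;
      }
    }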
### videooverlay interface extensions
GstVideoOverlay needs:
- A way to announce the presence of multiview content when it is
detected/signalled in a stream.
- A way to tell applications which output methods are supported/available
- A way to tell the sink which output method it should use
- Possibly a way to tell the sink to override the input frame
interpretation / caps - depends on the answer to the question
above about how to model overriding input interpretation.
### What's implemented
- Caps handling
- gst-plugins-base libgstvideo pieces
- playbin caps overriding
- conversion elements - glstereomix, gl3dconvert (needs a rename),
glstereosplit.
### Possible future enhancements
- Make GLupload split to separate textures at upload time?
- Needs new API to extract multiple textures from the upload. Currently only outputs 1 result RGBA texture.
- Make GLdownload able to take 2 input textures, pack them and colorconvert / download as needed.
- currently done by packing then downloading, which isn't acceptable overhead for RGBA downloads
- Think about how we integrate GLstereo - do we need to do anything special,
or can the app just render to stereo/quad buffers if they're available?
View file
@ -0,0 +1,527 @@
# Subtitle overlays, hardware-accelerated decoding and playbin
This document describes some of the considerations and requirements that
led to the current `GstVideoOverlayCompositionMeta` API which allows
attaching of subtitle bitmaps or logos to video buffers.
## Background
Subtitles can be muxed in containers or come from an external source.
Subtitles come in many shapes and colours. Usually they are either
text-based (incl. 'pango markup'), or bitmap-based (e.g. DVD subtitles
and the most common form of DVB subs). Bitmap based subtitles are
usually compressed in some way, like some form of run-length encoding.
Subtitles are currently decoded and rendered in subtitle-format-specific
overlay elements. These elements have two sink pads (one for raw video
and one for the subtitle format in question) and one raw video source
pad.
They will take care of synchronising the two input streams, and of
decoding and rendering the subtitles on top of the raw video stream.
Digression: one could theoretically have dedicated decoder/render
elements that output an AYUV or ARGB image, and then let a videomixer
element do the actual overlaying, but this is not very efficient,
because it requires us to allocate and blend whole pictures (1920x1080
AYUV = 8MB, 1280x720 AYUV = 3.6MB, 720x576 AYUV = 1.6MB) even if the
overlay region is only a small rectangle at the bottom. This wastes
memory and CPU. We could do something better by introducing a new format
that only encodes the region(s) of interest, but we don't have such a
format yet, and are not necessarily keen to rewrite this part of the
logic in playbin at this point - and we can't change existing elements'
behaviour, so would need to introduce new elements for this.
Playbin supports outputting compressed formats, i.e. it does not force
decoding to a raw format, but is happy to output to a non-raw format as
long as the sink supports that as well.
In case of certain hardware-accelerated decoding APIs, we will make use
of that functionality. However, the decoder will not output a raw video
format then, but some kind of hardware/API-specific format (in the caps)
and the buffers will reference hardware/API-specific objects that the
hardware/API-specific sink will know how to handle.
## The Problem
In the case of such hardware-accelerated decoding, the decoder will not
output raw pixels that can easily be manipulated. Instead, it will
output hardware/API-specific objects that can later be used to render a
frame using the same API.
Even if we could transform such a buffer into raw pixels, we most likely
would want to avoid that, in order to avoid the need to map the data
back into system memory (and then later back to the GPU). It's much
better to upload the much smaller encoded data to the GPU/DSP and then
leave it there until rendered.
Before `GstVideoOverlayComposition` playbin only supported subtitles on
top of raw decoded video. It would try to find a suitable overlay element
from the plugin registry based on the input subtitle caps and the rank.
(It is assumed that we will be able to convert any raw video format into
any format required by the overlay using a converter such as videoconvert.)
It would not render subtitles if the video sent to the sink is not raw
YUV or RGB or if conversions had been disabled by setting the
native-video flag on playbin.
Subtitle rendering is considered an important feature. Enabling
hardware-accelerated decoding by default should not lead to a major
feature regression in this area.
This means that we need to support subtitle rendering on top of non-raw
video.
## Possible Solutions
The goal is to keep knowledge of the subtitle format within the
format-specific GStreamer plugins, and knowledge of any specific video
acceleration API to the GStreamer plugins implementing that API. We do
not want to make the pango/dvbsuboverlay/dvdspu/kate plugins link to
libva/libvdpau/etc. and we do not want to make the vaapi/vdpau plugins
link to all of libpango/libkate/libass etc.
Multiple possible solutions come to mind:
1) backend-specific overlay elements
e.g. vaapitextoverlay, vdpautextoverlay, vaapidvdspu, vdpaudvdspu,
vaapidvbsuboverlay, vdpaudvbsuboverlay, etc.
This assumes the overlay can be done directly on the
backend-specific object passed around.
The main drawback with this solution is that it leads to a lot of
code duplication and may also lead to uncertainty about distributing
certain duplicated pieces of code. The code duplication is pretty
much unavoidable, since making textoverlay, dvbsuboverlay, dvdspu,
kate, assrender, etc. available in form of base classes to derive
from is not really an option. Similarly, one would not really want
the vaapi/vdpau plugin to depend on a bunch of other libraries such
as libpango, libkate, libtiger, libass, etc.
One could add some new kind of overlay plugin feature though in
combination with a generic base class of some sort, but in order to
accommodate all the different cases and formats one would end up
with quite convoluted/tricky API.
(Of course there could also be a GstFancyVideoBuffer that provides
an abstraction for such video accelerated objects and that could
provide an API to add overlays to it in a generic way, but in the
end this is just a less generic variant of option 3), and it is not clear
that there are real benefits to a specialised solution vs. a more
generic one).
2) convert backend-specific object to raw pixels and then overlay
Even where possible technically, this is most likely very
inefficient.
3) attach the overlay data to the backend-specific video frame buffers
in a generic way and do the actual overlaying/blitting later in
backend-specific code such as the video sink (or an accelerated
encoder/transcoder)
In this case, the actual overlay rendering (i.e. the actual text
rendering or decoding DVD/DVB data into pixels) is done in the
subtitle-format-specific GStreamer plugin. All knowledge about the
subtitle format is contained in the overlay plugin then, and all
knowledge about the video backend in the video backend specific
plugin.
The main question then is how to get the overlay pixels (and we will
only deal with pixels here) from the overlay element to the video
sink.
This could be done in multiple ways: One could send custom events
downstream with the overlay data, or one could attach the overlay
data directly to the video buffers in some way.
Sending inline events has the advantage that it is fairly
transparent to any elements between the overlay element and the
video sink: if an effects plugin creates a new video buffer for the
output, nothing special needs to be done to maintain the subtitle
overlay information, since the overlay data is not attached to the
buffer. However, it slightly complicates things at the sink, since
it would also need to look for the new event in question instead of
just processing everything in its buffer render function.
If one attaches the overlay data to the buffer directly, any element
between overlay and video sink that creates a new video buffer would
need to be aware of the overlay data attached to it and copy it over
to the newly-created buffer.
One would have to implement a special kind of new query (e.g.
FEATURE query) that is not passed on automatically by
gst\_pad\_query\_default() in order to make sure that all elements
downstream will handle the attached overlay data. (This is only a
problem if we want to also attach overlay data to raw video pixel
buffers; for new non-raw types we can just make it mandatory and
assume support and be done with it; for existing non-raw types
nothing changes anyway if subtitles don't work) (we need to maintain
backwards compatibility for existing raw video pipelines like e.g.:
..decoder \! suboverlay \! encoder..)
Even though slightly more work, attaching the overlay information to
buffers seems more intuitive than sending it interleaved as events.
And buffers stored or passed around (e.g. via the "last-buffer"
property in the sink when doing screenshots via playbin) always
contain all the information needed.
4) create a video/x-raw-\*-delta format and use a backend-specific
videomixer
This possibility was hinted at already in the digression in section
1. It would satisfy the goal of keeping subtitle format knowledge in
the subtitle plugins and video backend knowledge in the video
backend plugin. It would also add a concept that might be generally
useful (think ximagesrc capture with xdamage). However, it would
require adding foorender variants of all the existing overlay
elements, and changing playbin to that new design, which is somewhat
intrusive. And given the general nature of such a new format/API, we
would need to take a lot of care to be able to accommodate all
possible use cases when designing the API, which makes it
considerably more ambitious. Lastly, we would need to write
videomixer variants for the various accelerated video backends as
well.
Overall, option 3) appears to be the most promising solution. It is the least
intrusive and should be fairly straightforward to implement with
reasonable effort, requiring only small changes to existing elements and
requiring no new elements.
Doing the final overlaying in the sink as opposed to a videomixer or
overlay in the middle of the pipeline has other advantages:
- if video frames need to be dropped, e.g. for QoS reasons, we could
also skip the actual subtitle overlaying and possibly the
decoding/rendering as well, if the implementation and API allows for
that to be delayed.
- the sink often knows the actual size of the window/surface/screen
the output video is rendered to. This *may* make it possible to
render the overlay image in a higher resolution than the input
video, solving a long standing issue with pixelated subtitles on top
of low-resolution videos that are then scaled up in the sink. This
would require for the rendering to be delayed of course instead of
just attaching an AYUV/ARGB/RGBA blob of pixels to the video buffer
in the overlay, but that could all be supported.
- if the video backend / sink has support for high-quality text
rendering (clutter?) we could just pass the text or pango markup to
the sink and let it do the rest (this is unlikely to be supported in
the general case - text and glyph rendering is hard; also, we don't
really want to make up our own text markup system, and pango markup
is probably too limited for complex karaoke stuff).
## API needed
1) Representation of subtitle overlays to be rendered
We need to pass the overlay pixels from the overlay element to the
sink somehow. Whatever the exact mechanism, let's assume we pass a
refcounted GstVideoOverlayComposition struct or object.
A composition is made up of one or more overlays/rectangles.
In the simplest case an overlay rectangle is just a blob of
RGBA/ABGR \[FIXME?\] or AYUV pixels with positioning info and other
metadata, and there is only one rectangle to render.
We're keeping the naming generic ("OverlayFoo" rather than
"SubtitleFoo") here, since this might also be handy for other use
cases such as e.g. logo overlays or so. It is not designed for
full-fledged video stream mixing, though.
// Note: don't mind the exact implementation details, they'll be hidden
// FIXME: might be confusing in 0.11 though since GstXOverlay was
// renamed to GstVideoOverlay in 0.11, but not much we can do,
// maybe we can rename GstVideoOverlay to something better
struct GstVideoOverlayComposition
{
guint num_rectangles;
GstVideoOverlayRectangle ** rectangles;
/* lowest rectangle sequence number still used by the upstream
* overlay element. This way a renderer maintaining some kind of
* rectangles <-> surface cache can know when to free cached
* surfaces/rectangles. */
guint min_seq_num_used;
/* sequence number for the composition (same series as rectangles) */
guint seq_num;
}
struct GstVideoOverlayRectangle
{
/* Position on video frame and dimension of output rectangle in
* output frame terms (already adjusted for the PAR of the output
* frame). x/y can be negative (overlay will be clipped then) */
gint x, y;
guint render_width, render_height;
/* Dimensions of overlay pixels */
guint width, height, stride;
/* This is the PAR of the overlay pixels */
guint par_n, par_d;
/* Format of pixels, GST_VIDEO_FORMAT_ARGB on big-endian systems,
* and BGRA on little-endian systems (i.e. pixels are treated as
* 32-bit values and alpha is always in the most-significant byte,
* and blue is in the least-significant byte).
*
* FIXME: does anyone actually use AYUV in practice? (we do
* in our utility function to blend on top of raw video)
* What about AYUV and endianness? Do we always have [A][Y][U][V]
* in memory? */
/* FIXME: maybe use our own enum? */
GstVideoFormat format;
/* Refcounted blob of memory, no caps or timestamps */
GstBuffer *pixels;
// FIXME: how to express source like text or pango markup?
// (just add source type enum + source buffer with data)
//
// FOR 0.10: always send pixel blobs, but attach source data in
// addition (reason: if downstream changes, we can't renegotiate
// that properly, if we just do a query of supported formats from
// the start). Sink will just ignore pixels and use pango markup
// from source data if it supports that.
//
// FOR 0.11: overlay should query formats (pango markup, pixels)
// supported by downstream and then only send that. We can
// renegotiate via the reconfigure event.
//
/* sequence number: useful for backends/renderers/sinks that want
* to maintain a cache of rectangles <-> surfaces. The value of
* the min_seq_num_used in the composition tells the renderer which
* rectangles have expired. */
guint seq_num;
/* FIXME: we also need a (private) way to cache converted/scaled
* pixel blobs */
}
(a1) Overlay consumer API:
How would this work in a video sink that supports scaling of textures:
gst_foo_sink_render () {
/* assume only one for now */
if video_buffer has composition:
composition = video_buffer.get_composition()
for each rectangle in composition:
if rectangle.source_data_type == PANGO_MARKUP
actor = text_from_pango_markup (rectangle.get_source_data())
else
pixels = rectangle.get_pixels_unscaled (FORMAT_RGBA, ...)
actor = texture_from_rgba (pixels, ...)
.. position + scale on top of video surface ...
}
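For reference, a minimal consumer along these lines against the
GstVideoOverlayComposition API that eventually landed might look as follows
(sketch only; texture_from_argb() and place_actor() are hypothetical sink
helpers, not real API):

    #include <gst/video/video-overlay-composition.h>

    /* hypothetical sink helpers, assumed to exist elsewhere */
    extern gpointer texture_from_argb (GstBuffer * pixels);
    extern void place_actor (gpointer tex, gint x, gint y, guint w, guint h);

    static void
    render_overlays (GstBuffer * video_buffer)
    {
      GstVideoOverlayCompositionMeta *cmeta;
      guint i, n;

      cmeta = gst_buffer_get_video_overlay_composition_meta (video_buffer);
      if (cmeta == NULL)
        return;

      n = gst_video_overlay_composition_n_rectangles (cmeta->overlay);
      for (i = 0; i < n; i++) {
        GstVideoOverlayRectangle *rect;
        GstBuffer *pixels;
        gint x, y;
        guint w, h;

        rect = gst_video_overlay_composition_get_rectangle (cmeta->overlay, i);
        gst_video_overlay_rectangle_get_render_rectangle (rect, &x, &y, &w, &h);
        /* unscaled ARGB pixels, owned by the rectangle (no unref needed) */
        pixels = gst_video_overlay_rectangle_get_pixels_unscaled_argb (rect,
            GST_VIDEO_OVERLAY_FORMAT_FLAG_NONE);
        place_actor (texture_from_argb (pixels), x, y, w, h);
      }
    }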
(a2) Overlay producer API:
e.g. logo or subpicture overlay: got pixels, stuff into rectangle:
if (logoverlay->cached_composition == NULL) {
comp = composition_new ();
rect = rectangle_new (format, pixels_buf,
width, height, stride, par_n, par_d,
x, y, render_width, render_height);
/* composition adds its own ref for the rectangle */
composition_add_rectangle (comp, rect);
rectangle_unref (rect);
/* buffer adds its own ref for the composition */
video_buffer_attach_composition (comp);
/* we take ownership of the composition and save it for later */
logoverlay->cached_composition = comp;
} else {
video_buffer_attach_composition (logoverlay->cached_composition);
}
FIXME: also add some API to modify render position/dimensions of a
rectangle (probably requires creation of new rectangle, unless we
handle writability like with other mini objects).
2) Fallback overlay rendering/blitting on top of raw video
Eventually we want to use this overlay mechanism not only for
hardware-accelerated video, but also for plain old raw video, either
at the sink or in the overlay element directly.
Apart from the advantages listed earlier for solution 3), this allows
us to consolidate in one location a lot of overlaying/blitting code that
is currently repeated in every single overlay element.
This makes it considerably easier to support a whole range of raw
video formats out of the box, add SIMD-optimised rendering using
ORC, or handle corner cases correctly.
(Note: side-effect of overlaying raw video at the video sink is that
if e.g. a screenshotter gets the last buffer via the last-buffer
property of basesink, it would get an image without the subtitles on
top. This could probably be fixed by re-implementing the property in
GstVideoSink though. Playbin2 could handle this internally as well).
void
gst_video_overlay_composition_blend (GstVideoOverlayComposition * comp,
GstBuffer * video_buf)
{
guint n;
g_return_if_fail (gst_buffer_is_writable (video_buf));
g_return_if_fail (GST_BUFFER_CAPS (video_buf) != NULL);
... parse video_buffer caps into BlendVideoFormatInfo ...
for each rectangle in the composition: {
if (gst_video_format_is_yuv (video_buf_format)) {
overlay_format = FORMAT_AYUV;
} else if (gst_video_format_is_rgb (video_buf_format)) {
overlay_format = FORMAT_ARGB;
} else {
/* FIXME: grayscale? */
return;
}
/* this will scale and convert AYUV<->ARGB if needed */
pixels = rectangle_get_pixels_scaled (rectangle, overlay_format);
... clip output rectangle ...
__do_blend (video_buf_format, video_buf->data,
overlay_format, pixels->data,
x, y, width, height, stride);
gst_buffer_unref (pixels);
}
}
3) Flatten all rectangles in a composition
We cannot assume that the video backend API can handle any number of
rectangle overlays, it's possible that it only supports one single
overlay, in which case we need to squash all rectangles into one.
However, we'll just declare this a corner case for now, and
implement it only if someone actually needs it. It's easy to add
later API-wise. Might be a bit tricky if we have rectangles with
different PARs/formats (e.g. subs and a logo), though we could
probably always just use the code from 2) above with a fully transparent
video buffer to create a flattened overlay buffer.
4) Query support for the new video composition mechanism
This is handled via GstMeta and an ALLOCATION query - we can simply
query whether downstream supports the GstVideoOverlayComposition meta
(see the query sketch after this list).
There appears to be no issue with downstream possibly not being
linked yet at the time when an overlay would want to do such a
query, but we would just have to default to something and update
ourselves later on a reconfigure event then.
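A minimal sketch of that ALLOCATION query, using existing API (`overlay_pad`
and `caps` are assumptions: the overlay element's source pad and the
negotiated video caps):

    #include <gst/video/video-overlay-composition.h>

    static gboolean
    downstream_supports_composition_meta (GstPad * overlay_pad, GstCaps * caps)
    {
      GstQuery *query;
      gboolean supported;

      query = gst_query_new_allocation (caps, FALSE);
      if (!gst_pad_peer_query (overlay_pad, query)) {
        /* no peer, or query not handled: assume no support for now and
         * update again on a reconfigure event, as described above */
        gst_query_unref (query);
        return FALSE;
      }
      supported = gst_query_find_allocation_meta (query,
          GST_VIDEO_OVERLAY_COMPOSITION_META_API_TYPE, NULL);
      gst_query_unref (query);
      return supported;
    }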
Other considerations:
- renderers (overlays or sinks) may be able to handle only ARGB or
only AYUV (for most graphics/hw-API it's likely ARGB of some sort,
while our blending utility functions will likely want the same
colour space as the underlying raw video format, which is usually
YUV of some sort). We need to convert where required, and should
cache the conversion.
- renderers may or may not be able to scale the overlay. We need to do
the scaling internally if not (simple case: just horizontal scaling
to adjust for PAR differences; complex case: both horizontal and
vertical scaling, e.g. if subs come from a different source than the
video or the video has been rescaled or cropped between overlay
element and sink).
- renderers may be able to generate (possibly scaled) pixels on demand
from the original data (e.g. a string or RLE-encoded data). We will
ignore this for now, since this functionality can still be added
later via API additions. The most interesting case would be to pass
a pango markup string, since e.g. clutter can handle that natively.
- renderers may be able to write data directly on top of the video
pixels (instead of creating an intermediary buffer with the overlay
which is then blended on top of the actual video frame), e.g.
dvdspu, dvbsuboverlay
However, in the interest of simplicity, we should probably ignore the
fact that some elements can blend their overlays directly on top of the
video (decoding/uncompressing them on the fly), even more so as it's not
obvious that it's actually faster to decode the same overlay 70-90 times
(say) (ie. ca. 3 seconds of video frames) and then blend it 70-90 times
instead of decoding it once into a temporary buffer and then blending it
directly from there, possibly SIMD-accelerated. Also, this is only
relevant if the video is raw video and not some hardware-acceleration
backend object.
And ultimately it is the overlay element that decides whether to do the
overlay right there and then or have the sink do it (if supported). It
could decide to keep doing the overlay itself for raw video and only use
our new API for non-raw video.
- renderers may want to make sure they only upload the overlay pixels
once per rectangle if that rectangle recurs in subsequent frames (as
part of the same composition or a different composition), as is
likely. This caching of e.g. surfaces needs to be done renderer-side
and can be accomplished based on the sequence numbers. The
composition contains the lowest sequence number still in use
upstream (an overlay element may want to cache created
compositions+rectangles as well after all to re-use them for
multiple frames), based on that the renderer can expire cached
objects. The caching needs to be done renderer-side because
attaching renderer-specific objects to the rectangles won't work
well given the refcounted nature of rectangles and compositions,
making it unpredictable when a rectangle or composition will be
freed or from which thread context it will be freed. The
renderer-specific objects are likely bound to other types of
renderer-specific contexts, and need to be managed in connection
with those.
- composition/rectangles should internally provide a certain degree of
thread-safety. Multiple elements (sinks, overlay element) might
access or use the same objects from multiple threads at the same
time, and it is expected that elements will keep a ref to
compositions and rectangles they push downstream for a while, e.g.
until the current subtitle composition expires.
## Future considerations
- alternatives: there may be multiple versions/variants of the same
subtitle stream. On DVDs, there may be a 4:3 version and a 16:9
version of the same subtitles. We could attach both variants and let
the renderer pick the best one for the situation (currently we just
use the 16:9 version). With totem, it's ultimately totem that adds
the 'black bars' at the top/bottom, so totem also knows if it's got
a 4:3 display and can/wants to fit 4:3 subs (which may render on top
of the bars) or not, for example.
## Misc. FIXMEs
TEST: should these look (roughly) alike (note text distortion) - needs
fixing in textoverlay
gst-launch-1.0 \
videotestsrc ! video/x-raw,width=640,height=480,pixel-aspect-ratio=1/1 \
! textoverlay text=Hello font-desc=72 ! xvimagesink \
videotestsrc ! video/x-raw,width=320,height=480,pixel-aspect-ratio=2/1 \
! textoverlay text=Hello font-desc=72 ! xvimagesink \
videotestsrc ! video/x-raw,width=640,height=240,pixel-aspect-ratio=1/2 \
! textoverlay text=Hello font-desc=72 ! xvimagesink
View file
@ -141,6 +141,7 @@ index.md
design/MT-refcounting.md
design/TODO.md
design/activation.md
design/audiosinks.md
design/buffer.md
design/buffering.md
design/bufferpool.md
@ -149,10 +150,12 @@ index.md
design/context.md
design/controller.md
design/conventions.md
design/decodebin.md
design/dynamic.md
design/element-sink.md
design/element-source.md
design/element-transform.md
design/encoding.md
design/events.md
design/framestep.md
design/gstbin.md
@ -162,8 +165,13 @@ index.md
design/gstobject.md
design/gstpipeline.md
design/draft-klass.md
design/interlaced-video.md
design/keyframe-force.md
design/latency.md
design/live-source.md
design/mediatype-audio-raw.md
design/mediatype-text-raw.md
design/mediatype-video-raw.md
design/memory.md
design/messages.md
design/meta.md
@ -171,7 +179,9 @@ index.md
design/miniobject.md
design/missing-plugins.md
design/negotiation.md
design/orc-integration.md
design/overview.md
design/playbin.md
design/preroll.md
design/probes.md
design/progress.md
@ -186,9 +196,11 @@ index.md
design/sparsestreams.md
design/standards.md
design/states.md
design/stereo-multiview-video.md
design/stream-selection.md
design/stream-status.md
design/streams.md
design/subtitle-overlays.md
design/synchronisation.md
design/draft-tagreading.md
design/toc.md