docs: design: move most design docs to gst-docs module

This commit is contained in:
Tim-Philipp Müller 2016-12-08 22:59:58 +00:00
parent 49653b058a
commit 46138b1b1d
13 changed files with 1 additions and 3652 deletions

View file

@ -2,16 +2,5 @@ SUBDIRS =
EXTRA_DIST = \
design-audiosinks.txt \
design-decodebin.txt \
design-encoding.txt \
design-orc-integration.txt \
draft-hw-acceleration.txt \
draft-keyframe-force.txt \
draft-subtitle-overlays.txt\
draft-va.txt \
part-interlaced-video.txt \
part-mediatype-audio-raw.txt\
part-mediatype-text-raw.txt\
part-mediatype-video-raw.txt\
part-playbin.txt
draft-va.txt

View file

@ -1,138 +0,0 @@
Audiosink design
----------------
Requirements:
- must operate chain based.
Most simple playback pipelines will push audio from the decoders
into the audio sink.
- must operate getrange based
Most professional audio applications will operate in a mode where
the audio sink pulls samples from the pipeline. This is typically
done in a callback from the audiosink requesting N samples. The
callback is either scheduled from a thread or from an interrupt
from the audio hardware device.
- Exact sample accurate clocks.
the audiosink must be able to provide a clock that is sample
accurate even if samples are dropped or when discontinuities are
found in the stream.
- Exact timing of playback.
The audiosink must be able to play samples at their exact times.
- use DMA access when possible.
When the hardware can do DMA we should use it. This should also
work over bufferpools to avoid data copying to/from kernel space.
Design:
The design is based on a set of base classes and the concept of a
ringbuffer of samples.
+-----------+       - provide preroll, rendering, timing
+ basesink  +       - caps nego
+-----+-----+
      |
+-----V----------+  - manages ringbuffer
+ audiobasesink  +  - manages scheduling (push/pull)
+-----+----------+  - manages clock/query/seek
      |               - manages scheduling of samples in the ringbuffer
      |               - manages caps parsing
      |
+-----V------+      - default ringbuffer implementation with a GThread
+ audiosink  +      - subclasses provide open/read/close methods
+------------+
The ringbuffer is a contiguous piece of memory divided into segtotal
pieces of segments. Each segment has segsize bytes.
          play position
          v
+---+---+---+-------------------------------------+----------+
+ 0 | 1 | 2 | ....                                 | segtotal |
+---+---+---+-------------------------------------+----------+
<--->
  segsize bytes = N samples * bytes_per_sample.
The ringbuffer has a play position, which is expressed in
segments. The play position is where the device is currently reading
samples from the buffer.
The ringbuffer can be put to the PLAYING or STOPPED state.
In the STOPPED state no samples are played to the device and the play
pointer does not advance.
In the PLAYING state samples are written to the device and the ringbuffer
should call a configurable callback after each segment is written to the
device. In this state the play pointer is advanced after each segment is
written.
A write operation to the ringbuffer will put new samples in the ringbuffer.
If there is not enough space in the ringbuffer, the write operation will
block. The playback of the buffer never stops, even if the buffer is
empty. When the buffer is empty, silence is played by the device.
The ringbuffer is implemented with lockfree atomic operations, especially
on the reading side so that low-latency operations are possible.
Whenever new samples are to be put into the ringbuffer, the position of the
read pointer is taken. The required write position is taken and the diff
is made between the required and actual position. If the difference is <0,
the sample is too late. If the difference is bigger than segtotal, the
writing part has to wait for the play pointer to advance.
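For illustration, a minimal sketch of that write-side decision (drop, copy or wait). The struct and helper names are made up for this document and are not the actual GstAudioRingBuffer code:

  #include <glib.h>
  #include <string.h>

  /* Hypothetical, simplified ringbuffer; the real GstAudioRingBuffer differs. */
  typedef struct {
    guint8 *data;        /* segtotal * segsize bytes */
    gint    segsize;     /* bytes per segment */
    gint    segtotal;    /* number of segments */
    gint64  play_seg;    /* free-running segment counter advanced by the device */
  } RingBuffer;

  /* assumed to block until the device has played one more segment,
   * e.g. woken up from the per-segment callback */
  extern void wait_for_segment (RingBuffer * buf);

  static void
  write_segment (RingBuffer * buf, gint64 wanted_seg, gconstpointer samples)
  {
    while (TRUE) {
      gint64 play_seg = buf->play_seg;   /* read atomically in real code */
      gint64 diff = wanted_seg - play_seg;

      if (diff < 0)
        return;                          /* too late: drop this segment */

      if (diff < buf->segtotal) {        /* inside the ringbuffer window */
        memcpy (buf->data + (wanted_seg % buf->segtotal) * buf->segsize,
            samples, buf->segsize);
        return;
      }
      wait_for_segment (buf);            /* full: wait for play pointer */
    }
  }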
Scheduling:
- chain based mode:
In chain based mode, bytes are written into the ringbuffer. This operation
will eventually block when the ringbuffer is filled.
When no samples arrive in time, the ringbuffer will play silence. Each
buffer that arrives will be placed into the ringbuffer at the correct
times. This means that dropping samples or inserting silence is done
automatically, very accurately, and independently of the play pointer.
In this mode, the ringbuffer is usually kept as full as possible. When
using a small buffer (small segsize and segtotal), the latency for audio
to start from the sink to when it is played can be kept low but at least
one context switch has to be made between read and write.
- getrange based mode
In getrange based mode, the audiobasesink will use the callback function
of the ringbuffer to get segsize samples from the peer element. These
samples will then be placed in the ringbuffer at the next play position.
It is assumed that the getrange function returns fast enough to fill the
ringbuffer before the play pointer reaches the write pointer.
In this mode, the ringbuffer is usually kept as empty as possible. There
is no context switch needed between the elements that create the samples
and the actual writing of the samples to the device.
DMA mode:
- Elements that can do DMA based access to the audio device have to subclass
from the GstAudioBaseSink class and wrap the DMA ringbuffer in a subclass
of GstRingBuffer.
The ringbuffer subclass should trigger a callback after writing or playing
each sample to the device. This callback can be triggered from a thread or
from a signal from the audio device.
Clocks:
The GstAudioBaseSink class will use the ringbuffer to act as a clock provider.
It can do this by using the play pointer and the delay to calculate the
clock time.
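A hedged sketch of such a clock calculation, reusing the hypothetical RingBuffer from the sketch above (the real GstAudioBaseSink code additionally handles clock slaving, rate changes and discontinuities):

  #include <gst/gst.h>

  static GstClockTime
  ring_buffer_clock_time (RingBuffer * buf, gint samples_per_seg, gint rate,
      guint64 delay_samples)
  {
    /* samples handed to the device so far, minus what is still queued
     * in the device (the delay), gives the samples actually played */
    guint64 samples = (guint64) buf->play_seg * samples_per_seg;

    if (samples > delay_samples)
      samples -= delay_samples;
    else
      samples = 0;

    /* scale the sample count to nanoseconds without overflow */
    return gst_util_uint64_scale_int (samples, GST_SECOND, rate);
  }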

View file

@ -1,274 +0,0 @@
Decodebin design
GstDecodeBin
------------
Description:
Autoplug and decode to raw media
Input : single pad with ANY caps
Output : Dynamic pads
* Contents
_ a GstTypeFindElement connected to the single sink pad
_ optionally a demuxer/parser
_ optionally one or more DecodeGroup
* Autoplugging
The goal is to reach 'target' caps (by default raw media).
This is done by using the GstCaps of a source pad and finding the available
demuxers/decoders GstElement that can be linked to that pad.
The process starts with the source pad of typefind and stops when no more
non-target caps are left. It is commonly done while pre-rolling, but can also
happen whenever a new pad appears on any element.
Once a target caps has been found, that pad is ghosted and the
'pad-added' signal is emitted.
If no compatible elements can be found for a GstCaps, the pad is ghosted and
the 'unknown-type' signal is emitted.
* Assisted auto-plugging
When starting the auto-plugging process for a given GstCaps, several signals are
emitted in the following way in order to allow the application/user to assist or
fine-tune the process.
_ 'autoplug-continue' :
gboolean user_function (GstElement * decodebin, GstPad *pad, GstCaps * caps)
This signal is fired at the very beginning with the source pad GstCaps. If
the callback returns TRUE, the process continues normally. If the callback
returns FALSE, then the GstCaps are considered as a target caps and the
autoplugging process stops.
- 'autoplug-factories' :
GValueArray* user_function (GstElement* decodebin, GstPad* pad,
GstCaps* caps);
Get a list of element factories for @pad with @caps. This function is used to
instruct decodebin2 about the elements it should try to autoplug. The default
behaviour when this function is not overridden is to get all elements that
can handle @caps from the registry, sorted by rank.
- 'autoplug-select' :
gint user_function (GstElement* decodebin, GstPad* pad, GstCaps* caps,
GValueArray* factories);
This signal is fired once autoplugging has obtained a list of compatible
GstElementFactory. The signal is emitted with the GstCaps of the source pad
and a pointer to the GValueArray of compatible factories.
The callback should return the index of the elementfactory in @factories
that should be tried next.
If the callback returns -1, the autoplugging process will stop as if no
compatible factories were found.
The default implementation of this function will try to autoplug the first
factory of the list.
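As an illustration, a sketch of an application callback following the signature above, preferring a (made-up) hardware decoder factory and otherwise keeping the default behaviour:

  static gint
  on_autoplug_select (GstElement * decodebin, GstPad * pad, GstCaps * caps,
      GValueArray * factories, gpointer user_data)
  {
    guint i;

    for (i = 0; i < factories->n_values; i++) {
      GstElementFactory *f =
          g_value_get_object (g_value_array_get_nth (factories, i));

      /* "myhwdec" is a hypothetical factory name used for illustration */
      if (g_str_has_prefix (GST_OBJECT_NAME (f), "myhwdec"))
        return i;               /* try this factory next */
    }
    return 0;                   /* default: first (highest-ranked) factory */
  }

  ...
  g_signal_connect (decodebin, "autoplug-select",
      G_CALLBACK (on_autoplug_select), NULL);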
* Target Caps
The target caps are a read/write GObject property of decodebin.
By default the target caps are:
_ Raw audio : audio/x-raw
_ and raw video : video/x-raw
_ and Text : text/plain, text/x-pango-markup
* media chain/group handling
When autoplugging, all streams coming out of a demuxer will be grouped in a
DecodeGroup.
All new source pads created on that demuxer after it has emitted the
'no-more-pads' signal will be put in another DecodeGroup.
Only one decodegroup can be active at any given time. If a new decodegroup is
created while another one exists, that decodegroup will be set as blocking until
the existing one has drained.
DecodeGroup
-----------
Description:
Streams belonging to the same group/chain of a media file.
* Contents
The DecodeGroup contains:
_ a GstMultiQueue to which all streams of the media group are connected.
_ the eventual decoders which are autoplugged in order to produce the
requested target pads.
* Proper group draining
The DecodeGroup takes care that all the streams in the group are completely
drained (EOS has come through all source ghost pads).
* Pre-roll and block
The DecodeGroup has a global blocking feature. If enabled, all the ghosted
source pads for that group will be blocked.
A method is available to unblock all blocked pads for that group.
GstMultiQueue
-------------
Description:
Multiple input-output data queue
The GstMultiQueue achieves the same functionality as GstQueue, with a few
differences:
* Multiple streams handling.
The element handles queueing data on more than one stream at once. To
achieve such a feature it has request sink pads (sink_%u) and 'sometimes' src
pads (src_%u).
When requesting a given sinkpad, the associated srcpad for that stream will
be created. Ex: requesting sink_1 will generate src_1.
* Non-starvation on multiple streams.
If more than one stream is used with the element, the streams' queues will
be dynamically grown (up to a limit), in order to ensure that no stream is
risking data starvation. This guarantees that at any given time there are at
least N bytes queued and available for each individual stream.
If an EOS event comes through a srcpad, the associated queue should be
considered as 'not-empty' in the queue-size-growing algorithm.
* Non-linked srcpads graceful handling.
A GstTask is started for all srcpads when going to GST_STATE_PAUSED.
The tasks block on a GCond which will be signalled in two
different cases:
_ When the associated queue has received a buffer.
_ When the associated queue was previously declared as 'not-linked' and the
first buffer of the queue is scheduled to be pushed synchronously in
relation to the order in which it arrived globally in the element (see
'Synchronous data pushing' below).
When woken up by the GCondition, the GstTask will try to push the next
GstBuffer/GstEvent on the queue. If pushing the GstBuffer/GstEvent returns
GST_FLOW_NOT_LINKED, then the associated queue is marked as 'not-linked'. If
pushing the GstBuffer/GstEvent succeeded the queue will no longer be marked as
'not-linked'.
If pushing on all srcpads returns GstFlowReturn different from GST_FLOW_OK,
then all the srcpads' tasks are stopped and subsequent pushes on sinkpads will
return GST_FLOW_NOT_LINKED.
* Synchronous data pushing for non-linked pads.
In order to better support dynamic switching between streams, the multiqueue
(unlike the current GStreamer queue) continues to push buffers on non-linked
pads rather than shutting down.
In addition, to prevent a non-linked stream from very quickly consuming all
available buffers and thus 'racing ahead' of the other streams, the element
must ensure that buffers and inlined events for a non-linked stream are pushed
in the same order as they were received, relative to the other streams
controlled by the element. This means that a buffer cannot be pushed to a
non-linked pad any sooner than buffers in any other stream which were received
before it.
=====================================
Parsers, decoders and auto-plugging
=====================================
This section has DRAFT status.
Some media formats come in different "flavours" or "stream formats". These
formats differ in the way the setup data and media data is signalled and/or
packaged. An example for this is H.264 video, where there is a bytestream
format (with codec setup data signalled inline and units prefixed by a sync
code and packet length information) and a "raw" format where codec setup
data is signalled out of band (via the caps) and the chunking is implicit
in the way the buffers were muxed into a container, to mention just two of
the possible variants.
Especially on embedded platforms it is common that decoders can only
handle one particular stream format, and not all of them.
Where there are multiple stream formats, parsers are usually expected
to be able to convert between the different formats. This will, if
implemented correctly, work as expected in a static pipeline such as
... ! parser ! decoder ! sink
where the parser can query the decoder's capabilities even before
processing the first piece of data, and configure itself to convert
accordingly, if conversion is needed at all.
In an auto-plugging context this is not so straight-forward though,
because elements are plugged incrementally and not before the previous
element has processed some data and decided what it will output exactly
(unless the template caps are completely fixed, in which case it can continue
right away; this is not always the case here though, see below). A
parser will thus have to decide on *some* output format so auto-plugging
can continue. It doesn't know anything about the available decoders and
their capabilities though, so it's possible that it will choose a format
that is not supported by any of the available decoders, or by the preferred
decoder.
If the parser had sufficiently concise but fixed source pad template caps,
decodebin could continue to plug a decoder right away, allowing the
parser to configure itself in the same way as it would with a static
pipeline. This is not an option, unfortunately, because often the
parser needs to process some data to determine e.g. the format's profile or
other stream properties (resolution, sample rate, channel configuration, etc.),
and there may be different decoders for different profiles (e.g. DSP codec
for baseline profile, and software fallback for main/high profile; or a DSP
codec only supporting certain resolutions, with a software fallback for
unusual resolutions). So if decodebin just plugged the highest-ranking
decoder, that decoder might not be able to handle the actual stream later
on, which would yield an error (this is a data flow error then which would
be hard to intercept and avoid in decodebin). In other words, we can't solve
this issue by plugging a decoder right away with the parser.
So decodebin needs to communicate to the parser the set of available decoder
caps (which would contain the relevant capabilities/restrictions such as
supported profiles, resolutions, etc.), after the usual "autoplug-*" signal
filtering/sorting of course.
This is done by plugging a capsfilter element right after the parser, and
constructing a set of filter caps from the list of available decoders (one
appends at the end just the name(s) of the caps structures from the parser
pad template caps to function as an 'ANY other' caps equivalent). This lets
the parser negotiate to a supported stream format in the same way as with
the static pipeline mentioned above, but of course incurs some overhead
through the additional capsfilter element.
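A hedged sketch of how such filter caps could be assembled (helper name and structure are illustrative, not decodebin's actual code): the union of all decoder sink template caps, followed by the bare structure names from the parser's source pad template caps as the 'ANY other' fallback.

  static GstCaps *
  build_parser_filter_caps (GList * decoder_factories, GstCaps * parser_tmpl)
  {
    GstCaps *filter = gst_caps_new_empty ();
    GList *l;
    guint i;

    for (l = decoder_factories; l != NULL; l = l->next) {
      GstElementFactory *factory = l->data;
      const GList *t;

      for (t = gst_element_factory_get_static_pad_templates (factory);
          t != NULL; t = t->next) {
        GstStaticPadTemplate *templ = t->data;

        if (templ->direction == GST_PAD_SINK)
          filter = gst_caps_merge (filter,
              gst_static_pad_template_get_caps (templ));
      }
    }

    /* append the structure names only (no fields) from the parser template */
    for (i = 0; i < gst_caps_get_size (parser_tmpl); i++) {
      GstStructure *s = gst_caps_get_structure (parser_tmpl, i);

      filter = gst_caps_merge_structure (filter,
          gst_structure_new_empty (gst_structure_get_name (s)));
    }
    return filter;
  }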

View file

@ -1,571 +0,0 @@
Encoding and Muxing
-------------------
Summary
-------
A. Problems
B. Goals
1. EncodeBin
2. Encoding Profile System
3. Helper Library for Profiles
I. Use-cases researched
A. Problems this proposal attempts to solve
-------------------------------------------
* Duplication of pipeline code for gstreamer-based applications
wishing to encode and or mux streams, leading to subtle differences
and inconsistencies across those applications.
* No unified system for describing encoding targets for applications
in a user-friendly way.
* No unified system for creating encoding targets for applications,
resulting in duplication of code across all applications,
differences and inconsistencies that come with that duplication,
and applications hardcoding element names and settings resulting in
poor portability.
B. Goals
--------
1. Convenience encoding element
Create a convenience GstBin for encoding and muxing several streams,
hereafter called 'EncodeBin'.
This element will only contain one single property, which is a
profile.
2. Define an encoding profile system
3. Encoding profile helper library
Create a helper library to:
* create EncodeBin instances based on profiles, and
* help applications to create/load/save/browse those profiles.
1. EncodeBin
------------
1.1 Proposed API
----------------
EncodeBin is a GstBin subclass.
It implements the GstTagSetter interface, by which it will proxy the
calls to the muxer.
Only two introspectable properties (i.e. usable without extra API):
* A GstEncodingProfile*
* The name of the profile to use
When a profile is selected, encodebin will:
* Add REQUEST sinkpads for all the GstStreamProfile
* Create the muxer and expose the source pad
Whenever a request pad is created, encodebin will:
* Create the chain of elements for that pad
* Ghost the sink pad
* Return that ghost pad
This allows reducing the code to the minimum for applications
wishing to encode a source for a given profile:
...
encbin = gst_element_factory_make ("encodebin", NULL);
g_object_set (encbin, "profile", "N900/H264 HQ", NULL);
gst_element_link (encbin, filesink);
...
vsrcpad = gst_element_get_static_pad (source, "src1");
vsinkpad = gst_element_get_request_pad (encbin, "video_%u");
gst_pad_link(vsrcpad, vsinkpad);
...
1.2 Explanation of the Various stages in EncodeBin
--------------------------------------------------
This describes the various stages which can happen in order to end
up with a multiplexed stream that can then be stored or streamed.
1.2.1 Incoming streams
The streams fed to EncodeBin can be of various types:
* Video
* Uncompressed (but maybe subsampled)
* Compressed
* Audio
* Uncompressed (audio/x-raw)
* Compressed
* Timed text
* Private streams
1.2.2 Steps involved for raw video encoding
(0) Incoming Stream
(1) Transform raw video feed (optional)
Here we modify the various fundamental properties of a raw video
stream to be compatible with the intersection of:
* The encoder GstCaps and
* The specified "Stream Restriction" of the profile/target
The fundamental properties that can be modified are:
* width/height
This is done with a video scaler.
The DAR (Display Aspect Ratio) MUST be respected.
If needed, black borders can be added to comply with the target DAR.
* framerate
* format/colorspace/depth
All of this is done with a colorspace converter
(2) Actual encoding (optional for raw streams)
An encoder (with some optional settings) is used.
(3) Muxing
A muxer (with some optional settings) is used.
(4) Outgoing encoded and muxed stream
1.2.3 Steps involved for raw audio encoding
This is roughly the same as for raw video, except for (1)
(1) Transform raw audio feed (optional)
We modify the various fundamental properties of a raw audio stream to
be compatible with the intersection of:
* The encoder GstCaps and
* The specified "Stream Restriction" of the profile/target
The fundamental properties that can be modified are:
* Number of channels
* Type of raw audio (integer or floating point)
* Depth (number of bits required to encode one sample)
1.2.4 Steps involved for encoded audio/video streams
Steps (1) and (2) are replaced by a parser if a parser is available
for the given format.
1.2.5 Steps involved for other streams
Other streams will just be forwarded as-is to the muxer, provided the
muxer accepts the stream type.
2. Encoding Profile System
--------------------------
This work is based on:
* The existing GstPreset system for elements [0]
* The gnome-media GConf audio profile system [1]
* The investigation done into device profiles by Arista and
Transmageddon [2 and 3]
2.2 Terminology
---------------
* Encoding Target Category
A Target Category is a classification of devices/systems/use-cases
for encoding.
Such a classification is required in order for:
* Applications with a very-specific use-case to limit the number of
profiles they can offer the user. A screencasting application has
no use for the online services targets, for example.
* Offering the user some initial classification in the case of a
more generic encoding application (like a video editor or a
transcoder).
Ex:
Consumer devices
Online service
Intermediate Editing Format
Screencast
Capture
Computer
* Encoding Profile Target
A Profile Target describes a specific entity for which we wish to
encode.
A Profile Target must belong to at least one Target Category.
It will define at least one Encoding Profile.
Ex (with category):
Nokia N900 (Consumer device)
Sony PlayStation 3 (Consumer device)
Youtube (Online service)
DNxHD (Intermediate editing format)
HuffYUV (Screencast)
Theora (Computer)
* Encoding Profile
A specific combination of muxer, encoders, presets and limitations.
Ex:
Nokia N900/H264 HQ
Ipod/High Quality
DVD/Pal
Youtube/High Quality
HTML5/Low Bandwidth
DNxHD
2.3 Encoding Profile
--------------------
An encoding profile requires the following information:
* Name
This string is not translatable and must be unique.
A recommendation to guarantee uniqueness of the naming could be:
<target>/<name>
* Description
This is a translatable string describing the profile
* Muxing format
This is a string containing the GStreamer media-type of the
container format.
* Muxing preset
This is an optional string describing the preset(s) to use on the
muxer.
* Multipass setting
This is a boolean describing whether the profile requires several
passes.
* List of Stream Profile
2.3.1 Stream Profiles
A Stream Profile consists of:
* Type
The type of stream profile (audio, video, text, private-data)
* Encoding Format
This is a string containing the GStreamer media-type of the encoding
format to be used. If encoding is not to be applied, the raw audio
media type will be used.
* Encoding preset
This is an optional string describing the preset(s) to use on the
encoder.
* Restriction
This is an optional GstCaps containing the restriction of the
stream that can be fed to the encoder.
This will generally contain restrictions on video
width/height/framerate or audio depth.
* presence
This is an integer specifying how many streams can be used in the
containing profile. 0 means that any number of streams can be
used.
* pass
This is an integer which is only meaningful if the multipass flag
has been set in the profile. If it has been set it indicates which
pass this Stream Profile corresponds to.
2.4 Example profile
-------------------
The representation used here is XML only as an example. No decision is
made as to which formatting to use for storing targets and profiles.
<gst-encoding-target>
<name>Nokia N900</name>
<category>Consumer Device</category>
<profiles>
<profile>Nokia N900/H264 HQ</profile>
<profile>Nokia N900/MP3</profile>
<profile>Nokia N900/AAC</profile>
</profiles>
</gst-encoding-target>
<gst-encoding-profile>
<name>Nokia N900/H264 HQ</name>
<description>
High Quality H264/AAC for the Nokia N900
</description>
<format>video/quicktime,variant=iso</format>
<streams>
<stream-profile>
<type>audio</type>
<format>audio/mpeg,mpegversion=4</format>
<preset>Quality High/Main</preset>
<restriction>audio/x-raw,channels=[1,2]</restriction>
<presence>1</presence>
</stream-profile>
<stream-profile>
<type>video</type>
<format>video/x-h264</format>
<preset>Profile Baseline/Quality High</preset>
<restriction>
video/x-raw,width=[16, 800],\
height=[16, 480],framerate=[1/1, 30000/1001]
</restriction>
<presence>1</presence>
</stream-profile>
</streams>
</gst-encoding-profile>
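For comparison, a hedged sketch of building roughly this profile in code with the GstEncodingProfile API that eventually shipped in GstPbUtils (which may differ in detail from the gstprofile.h proposal referenced below):

  #include <gst/pbutils/encoding-profile.h>

  static GstEncodingProfile *
  make_n900_h264_hq_profile (void)
  {
    GstEncodingContainerProfile *container;
    GstCaps *mux_caps, *ac, *ar, *vc, *vr;

    mux_caps = gst_caps_from_string ("video/quicktime,variant=iso");
    container = gst_encoding_container_profile_new ("Nokia N900/H264 HQ",
        "High Quality H264/AAC for the Nokia N900", mux_caps, NULL);
    gst_caps_unref (mux_caps);

    ac = gst_caps_from_string ("audio/mpeg,mpegversion=4");
    ar = gst_caps_from_string ("audio/x-raw,channels=[1,2]");
    gst_encoding_container_profile_add_profile (container,
        (GstEncodingProfile *) gst_encoding_audio_profile_new (ac, NULL, ar, 1));
    gst_caps_unref (ac);
    gst_caps_unref (ar);

    vc = gst_caps_from_string ("video/x-h264");
    vr = gst_caps_from_string ("video/x-raw,width=[16,800],height=[16,480],"
        "framerate=[1/1,30000/1001]");
    gst_encoding_container_profile_add_profile (container,
        (GstEncodingProfile *) gst_encoding_video_profile_new (vc, NULL, vr, 1));
    gst_caps_unref (vc);
    gst_caps_unref (vr);

    return (GstEncodingProfile *) container;
  }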
2.5 API
-------
A proposed C API is contained in the gstprofile.h file in this directory.
2.6 Modifications required in the existing GstPreset system
-----------------------------------------------------------
2.6.1. Temporary preset.
Currently a preset needs to be saved on disk in order to be
used.
This makes it impossible to have temporary presets (that exist only
during the lifetime of a process), which might be required in the
new proposed profile system
2.6.2 Categorisation of presets.
Currently presets are just aliases for a group of property/value
pairs, without any meaning or explanation as to how they exclude each
other.
Take for example the H264 encoder. It can have presets for:
* passes (1,2 or 3 passes)
* profiles (Baseline, Main, ...)
* quality (Low, medium, High)
In order to programmatically know which presets exclude each other,
we here propose the categorisation of these presets.
This can be done in one of two ways
1. in the name (by making the name be [<category>:]<name>)
This would give for example: "Quality:High", "Profile:Baseline"
2. by adding a new _meta key
This would give for example: _meta/category:quality
2.6.3 Aggregation of presets.
There can be more than one choice of presets to be done for an
element (quality, profile, pass).
This means that one cannot currently describe the full
configuration of an element with a single string; several are needed.
The proposal here is to extend the GstPreset API to be able to set
all presets using one string and a well-known separator ('/').
This change only requires changes in the core preset handling code.
This would allow doing the following:
gst_preset_load_preset (h264enc,
"pass:1/profile:baseline/quality:high");
2.7 Points to be determined
---------------------------
This document hasn't determined yet how to solve the following
problems:
2.7.1 Storage of profiles
One proposal for storage would be to use a system wide directory
(like $prefix/share/gstreamer-0.10/profiles) and store XML files for
every individual profile.
Users could then add their own profiles in ~/.gstreamer-0.10/profiles
This poses some limitations as to what to do if some applications
want to have some profiles limited to their own usage.
3. Helper library for profiles
------------------------------
These helper methods could also be added to existing libraries (like
GstPreset, GstPbUtils, ..).
The various API proposed are in the accompanying gstprofile.h file.
3.1 Getting user-readable names for formats
This is already provided by GstPbUtils.
3.2 Hierarchy of profiles
The goal is for applications to be able to present to the user a list
of combo-boxes for choosing their output profile:
[ Category ] # optional, depends on the application
[ Device/Site/.. ] # optional, depends on the application
[ Profile ]
Convenience methods are offered to easily get lists of categories,
devices, and profiles.
3.3 Creating Profiles
The goal is for applications to be able to easily create profiles.
The application needs a fast/efficient way to:
* select a container format and see all compatible streams that can be
used with it.
* select a codec format and see which container formats can be used
with it.
The remaining parts concern the restrictions to encoder
input.
3.4 Ensuring availability of plugins for Profiles
When an application wishes to use a Profile, it should be able to
query whether it has all the needed plugins to use it.
This part will use GstPbUtils to query, and if needed install the
missing plugins through the installed distribution plugin installer.
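A hedged sketch of one way an application could perform such a check with the public element-factory listing API (this is not necessarily the mechanism GstPbUtils uses internally for missing-plugin handling):

  static gboolean
  have_elements_for_caps (GstElementFactoryListType type, GstCaps * caps)
  {
    GList *all, *compatible;
    gboolean ok;

    all = gst_element_factory_list_get_elements (type, GST_RANK_MARGINAL);
    /* muxers and encoders produce the format in question on their src pad */
    compatible = gst_element_factory_list_filter (all, caps, GST_PAD_SRC, FALSE);
    ok = (compatible != NULL);
    gst_plugin_feature_list_free (compatible);
    gst_plugin_feature_list_free (all);
    return ok;
  }

  /* e.g.:
   *   have_elements_for_caps (GST_ELEMENT_FACTORY_TYPE_MUXER, mux_caps);
   *   have_elements_for_caps (GST_ELEMENT_FACTORY_TYPE_VIDEO_ENCODER, vcaps);
   */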
I. Use-cases researched
-----------------------
This is a list of various use-cases where encoding/muxing is being
used.
* Transcoding
The goal is to convert any input file for a target use with as
little loss of quality as possible.
A specific variant of this is transmuxing (see below).
Example applications: Arista, Transmageddon
* Rendering timelines
The incoming streams are a collection of various segments that need
to be rendered.
Those segments can vary in nature (i.e. the video width/height can
change).
This requires the use of identity with the single-segment property
activated to transform the incoming collection of segments to a
single continuous segment.
Example applications: PiTiVi, Jokosher
* Encoding of live sources
The major risk to take into account is the encoder not encoding the
incoming stream fast enough. This is outside of the scope of
encodebin, and should be solved by using queues between the sources
and encodebin, as well as implementing QoS in encoders and sources
(the encoders emitting QoS events, and the upstream elements
adapting themselves accordingly).
Example applications: camerabin, cheese
* Screencasting applications
This is similar to encoding of live sources.
The difference being that due to the nature of the source (size and
amount/frequency of updates) one might want to do the encoding in
two parts:
* The actual live capture is encoded with an 'almost-lossless' codec
(such as huffyuv)
* Once the capture is done, the file created in the first step is
then rendered to the desired target format.
Fixing sources to only emit region-updates and having encoders
capable of encoding those streams would fix the need for the first
step but is outside of the scope of encodebin.
Example applications: Istanbul, gnome-shell, recordmydesktop
* Live transcoding
This is the case of an incoming live stream which will be
broadcasted/transmitted live.
One issue to take into account is to reduce the encoding latency to
a minimum. This should mostly be done by picking low-latency
encoders.
Example applications: Rygel, Coherence
* Transmuxing
Given a certain file, the aim is to remux the contents WITHOUT
decoding into either a different container format or the same
container format.
Remuxing into the same container format is useful when the file was
not created properly (for example, the index is missing).
Whenever available, parsers should be applied on the encoded streams
to validate and/or fix the streams before muxing them.
Metadata from the original file must be kept in the newly created
file.
Example applications: Arista, Transmageddon
* Loss-less cutting
Given a certain file, the aim is to extract a certain part of the
file without going through the process of decoding and re-encoding
that file.
This is similar to the transmuxing use-case.
Example applications: PiTiVi, Transmageddon, Arista, ...
* Multi-pass encoding
Some encoders allow doing a multi-pass encoding.
The initial pass(es) are only used to collect encoding estimates and
are not actually muxed and outputted.
The final pass uses previously collected information, and the output
is then muxed and outputted.
* Archiving and intermediary format
The requirement is to have lossless (or near-lossless) encoding
suitable for archiving or later re-editing.
* CD ripping
Example applications: Sound-juicer
* DVD ripping
Example application: Thoggen
* Research links
Some of these are still active documents, some others are not
[0] GstPreset API documentation
http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstPreset.html
[1] gnome-media GConf profiles
http://www.gnome.org/~bmsmith/gconf-docs/C/gnome-media.html
[2] Research on a Device Profile API
http://gstreamer.freedesktop.org/wiki/DeviceProfile
[3] Research on defining presets usage
http://gstreamer.freedesktop.org/wiki/PresetDesign

View file

@ -1,204 +0,0 @@
Orc Integration
===============
Sections
--------
- About Orc
- Fast memcpy()
- Normal Usage
- Build Process
- Testing
- Orc Limitations
About Orc
---------
Orc code can be in one of two forms: .orc files that are converted
by orcc to C code that calls liborc functions, or C code that calls
liborc to create complex operations at runtime. The former is mostly
for functions with predetermined functionality. The latter is for
functionality that is determined at runtime, where writing .orc
functions for all combinations would be prohibitive. Orc also has
a fast memcpy and memset which are useful independently.
Fast memcpy()
-------------
*** This part is not integrated yet. ***
Orc has built-in functions orc_memcpy() and orc_memset() that work
like memcpy() and memset(). These are meant for large copies only.
A reasonable cutoff for using orc_memcpy() instead of memcpy() is
if the number of bytes is generally greater than 100. DO NOT use
orc_memcpy() if the typical size is less than 20 bytes, especially
if the size is known at compile time, as these cases are inlined by
the compiler.
(Example: sys/ximage/ximagesink.c)
Add $(ORC_CFLAGS) to libgstximagesink_la_CFLAGS and $(ORC_LIBS) to
libgstximagesink_la_LIBADD. Then, in the source file, add:
#ifdef HAVE_ORC
#include <orc/orc.h>
#else
#define orc_memcpy(a,b,c) memcpy(a,b,c)
#endif
Then switch relevant uses of memcpy() to orc_memcpy().
The above example works whether or not Orc is enabled at compile
time.
Normal Usage
------------
The following lines are added near the top of Makefile.am for plugins
that use Orc code in .orc files (this is for the volume plugin):
ORC_BASE=volume
include $(top_srcdir)/common/orc.mk
Also add the generated source file to the plugin build:
nodist_libgstvolume_la_SOURCES = $(ORC_SOURCES)
And of course, add $(ORC_CFLAGS) to libgstvolume_la_CFLAGS, and
$(ORC_LIBS) to libgstvolume_la_LIBADD.
The value assigned to ORC_BASE does not need to be related to
the name of the plugin.
Advanced Usage
--------------
The Holy Grail of Orc usage is to programmatically generate Orc code
at runtime, have liborc compile it into binary code at runtime, and
then execute this code. Currently, the best example of this is in
Schroedinger. An example of how this would be used is audioconvert:
given an input format, channel position manipulation, dithering and
quantizing configuration, and output format, an Orc code generator
would create an OrcProgram, add the appropriate instructions to do
each step based on the configuration, and then compile the program.
Successfully compiling the program would return a function pointer
that can be called to perform the operation.
This sort of advanced usage requires structural changes to current
plugins (e.g., audioconvert) and will probably be developed
incrementally. Moreover, if such code is intended to be used without
Orc as strict build/runtime requirement, two codepaths would need to
be developed and tested. For this reason, until GStreamer requires
Orc, I think it's a good idea to restrict such advanced usage to the
cog plugin in -bad, which requires Orc.
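For reference, a hedged sketch of such runtime code generation with the liborc API: build a tiny program (a saturated 16-bit add, roughly what a volume or mixing path might need), compile it, and run it. Error handling is minimal and the opcode choice is only an example; in real code the compiled program would be cached and reused.

  #include <stdint.h>
  #include <orc/orc.h>

  static void
  run_generated_addssw (int16_t * dest, const int16_t * src, int n)
  {
    OrcProgram *p;
    OrcExecutor *ex;

    orc_init ();

    p = orc_program_new ();
    orc_program_add_destination (p, 2, "d1");
    orc_program_add_source (p, 2, "s1");
    /* d1 = saturated_add (d1, s1), element-wise over n 16-bit samples */
    orc_program_append_str (p, "addssw", "d1", "d1", "s1");

    if (!ORC_COMPILE_RESULT_IS_SUCCESSFUL (orc_program_compile (p)))
      return;                           /* fall back to a C code path */

    ex = orc_executor_new (p);
    orc_executor_set_n (ex, n);
    orc_executor_set_array_str (ex, "d1", dest);
    orc_executor_set_array_str (ex, "s1", (void *) src);
    orc_executor_run (ex);
    orc_executor_free (ex);
  }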
Build Process
-------------
The goal of the build process is to make Orc non-essential for most
developers and users. This is not to say you shouldn't have Orc
installed -- without it, you will get slow backup C code, just that
people compiling GStreamer are not forced to switch from Liboil to
Orc immediately.
With Orc installed, the build process will use the Orc Compiler (orcc)
to convert each .orc file into a temporary C source (tmp-orc.c) and a
temporary header file (${base}orc.h if constructed from ${base}.orc).
The C source file is compiled and linked to the plugin, and the header
file is included by other source files in the plugin.
If 'make orc-update' is run in the source directory, the files
tmp-orc.c and ${base}orc.h are copied to ${base}orc-dist.c and
${base}orc-dist.h respectively. The -dist.[ch] files are automatically
disted via orc.mk. The -dist.[ch] files should be checked in to
git whenever the .orc source is changed and checked in. Example
workflow:
edit .orc file
... make, test, etc.
make orc-update
git add volume.orc volumeorc-dist.c volumeorc-dist.h
git commit
At 'make dist' time, all of the .orc files are compiled, and then
copied to their -dist.[ch] counterparts, and then the -dist.[ch]
files are added to the dist directory.
Without Orc installed (or --disable-orc given to configure), the
-dist.[ch] files are copied to tmp-orc.c and ${base}orc.h. When
compiled with Orc disabled, DISABLE_ORC is defined in config.h, and
the C backup code is compiled. This backup code is pure C, and
does not include orc headers or require linking against liborc.
The common/orc.mk build method is limited by the inflexibility of
automake. The file tmp-orc.c must have a fixed filename; using ORC_NAME
to generate the filename does not work because it conflicts with
automake's dependency generation. Building multiple .orc files
is not possible due to this restriction.
Testing
-------
If you create another .orc file, please add it to
tests/orc/Makefile.am. This causes automatic test code to be
generated and run during 'make check'. Each function in the .orc
file is tested by comparing the results of executing the run-time
compiled code and the C backup function.
Orc Limitations
---------------
audioconvert
Orc doesn't have a mechanism for generating random numbers, which
prevents its use as-is for dithering. One way around this is to
generate suitable dithering values in one pass, then use those
values in a second Orc-based pass.
Orc doesn't handle 64-bit float, for no good reason.
Irrespective of Orc handling 64-bit float, it would be useful to
have a direct 32-bit float to 16-bit integer conversion.
audioconvert is a good candidate for programmatically generated
Orc code.
audioconvert enumerates functions in terms of big-endian vs.
little-endian. Orc's functions are "native" and "swapped".
Programmatically generating code removes the need to worry about
this.
Orc doesn't handle 24-bit samples. Fixing this is not a priority
(for ds).
videoscale
Orc doesn't handle horizontal resampling yet. The plan is to add
special sampling opcodes, for nearest, bilinear, and cubic
interpolation.
videotestsrc
Lots of code in videotestsrc needs to be rewritten to be SIMD
(and Orc) friendly, e.g., stuff that uses oil_splat_u8().
A fast low-quality random number generator in Orc would be useful
here.
volume
Many of the comments on audioconvert apply here as well.
There are a bunch of FIXMEs in here that are due to misapplied
patches.

View file

@ -1,91 +0,0 @@
Forcing keyframes
-----------------
Consider the following use case:
We have a pipeline that performs video and audio capture from a live source,
compresses and muxes the streams and writes the resulting data into a file.
Inside the uncompressed video data we have a specific pattern inserted at
specific moments that should trigger a switch to a new file, meaning, we close
the existing file we are writing to and start writing to a new file.
We want the new file to start with a keyframe so that one can start decoding
the file immediately.
Components:
1) We need an element that is able to detect the pattern in the video stream.
2) We need to inform the video encoder that it should start encoding a keyframe
starting from exactly the frame with the pattern.
3) We need to inform the muxer that it should flush out any pending data and
start creating the start of a new file with the keyframe as a first video
frame.
4) We need to inform the sink element that it should start writing to the next
file. This requires application interaction to instruct the sink of the new
filename. The application should also be free to ignore the boundary and
continue to write to the existing file. The application will typically use
an event pad probe to detect the custom event.
Implementation:
The implementation would consist of generating a GST_EVENT_CUSTOM_DOWNSTREAM
event that marks the keyframe boundary. This event is inserted into the
pipeline by the application upon a certain trigger. In the above use case this
trigger would be given by the element that detects the pattern, in the form of
an element message.
The custom event would travel further downstream to instruct encoder, muxer and
sink about the possible switch.
The information passed in the event consists of:
name: GstForceKeyUnit
(G_TYPE_UINT64)"timestamp" : the timestamp of the buffer that
triggered the event.
(G_TYPE_UINT64)"stream-time" : the stream position that triggered the
event.
(G_TYPE_UINT64)"running-time" : the running time of the stream when the
event was triggered.
(G_TYPE_BOOLEAN)"all-headers" : Send all headers, including those in
the caps or those sent at the start of
the stream.
.... : optional other data fields.
Note that this event is purely informational, no element is required to
perform an action but it should forward the event downstream, just like any
other event it does not handle.
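A sketch of how an application (or the pattern-detecting element) could construct and inject such an event; gst_structure_new(), gst_event_new_custom() and gst_pad_push_event() are real API, the timestamps come from the caller's own context:

  #include <gst/gst.h>

  static void
  send_force_key_unit (GstPad * srcpad, GstClockTime timestamp,
      GstClockTime stream_time, GstClockTime running_time)
  {
    GstStructure *s;
    GstEvent *event;

    s = gst_structure_new ("GstForceKeyUnit",
        "timestamp", G_TYPE_UINT64, timestamp,
        "stream-time", G_TYPE_UINT64, stream_time,
        "running-time", G_TYPE_UINT64, running_time,
        "all-headers", G_TYPE_BOOLEAN, TRUE, NULL);
    event = gst_event_new_custom (GST_EVENT_CUSTOM_DOWNSTREAM, s);
    gst_pad_push_event (srcpad, event);
  }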
Elements understanding the event should behave as follows:
1) The video encoder receives the event before the next frame. Upon reception
of the event it schedules to encode the next frame as a keyframe.
Before pushing out the encoded keyframe it must push the GstForceKeyUnit
event downstream.
2) The muxer receives the GstForceKeyUnit event and flushes out its current state,
preparing to produce data that can be used as a keyunit. Before pushing out
the new data it pushes the GstForceKeyUnit event downstream.
3) The application receives the GstForceKeyUnit event in a pad probe on the sink
and reconfigures the sink to make it perform new actions after receiving
the next buffer.
Upstream
--------
When using RTP, packets can get lost and receivers can be added at any time;
in either case a receiver may request a new key frame.
A downstream element sends an upstream "GstForceKeyUnit" event up the
pipeline.
When an element produces some kind of key unit in output, but has
no such concept in its input (like an encoder that takes raw frames),
it consumes the event (doesn't pass it upstream), and instead sends
a downstream GstForceKeyUnit event and a new keyframe.
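Using the same API as the downstream sketch above, a downstream element that received e.g. an RTCP PLI/FIR could request a new key unit from upstream roughly like this (illustrative only):

  static void
  request_key_unit (GstPad * sinkpad)
  {
    GstEvent *event = gst_event_new_custom (GST_EVENT_CUSTOM_UPSTREAM,
        gst_structure_new ("GstForceKeyUnit",
            "all-headers", G_TYPE_BOOLEAN, TRUE, NULL));

    /* pushing an event on a sink pad sends it to the upstream peer */
    gst_pad_push_event (sinkpad, event);
  }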

View file

@ -1,546 +0,0 @@
===============================================================
Subtitle overlays, hardware-accelerated decoding and playbin
===============================================================
Status: EARLY DRAFT / BRAINSTORMING
=== 1. Background ===
Subtitles can be muxed in containers or come from an external source.
Subtitles come in many shapes and colours. Usually they are either
text-based (incl. 'pango markup'), or bitmap-based (e.g. DVD subtitles
and the most common form of DVB subs). Bitmap based subtitles are
usually compressed in some way, like some form of run-length encoding.
Subtitles are currently decoded and rendered in subtitle-format-specific
overlay elements. These elements have two sink pads (one for raw video
and one for the subtitle format in question) and one raw video source pad.
They will take care of synchronising the two input streams, and of
decoding and rendering the subtitles on top of the raw video stream.
Digression: one could theoretically have dedicated decoder/render elements
that output an AYUV or ARGB image, and then let a videomixer element do
the actual overlaying, but this is not very efficient, because it requires
us to allocate and blend whole pictures (1920x1080 AYUV = 8MB,
1280x720 AYUV = 3.6MB, 720x576 AYUV = 1.6MB) even if the overlay region
is only a small rectangle at the bottom. This wastes memory and CPU.
We could do something better by introducing a new format that only
encodes the region(s) of interest, but we don't have such a format yet, and
are not necessarily keen to rewrite this part of the logic in playbin
at this point - and we can't change existing elements' behaviour, so would
need to introduce new elements for this.
Playbin2 supports outputting compressed formats, i.e. it does not
force decoding to a raw format, but is happy to output to a non-raw
format as long as the sink supports that as well.
In case of certain hardware-accelerated decoding APIs, we will make use
of that functionality. However, the decoder will not output a raw video
format then, but some kind of hardware/API-specific format (in the caps)
and the buffers will reference hardware/API-specific objects that
the hardware/API-specific sink will know how to handle.
=== 2. The Problem ===
In the case of such hardware-accelerated decoding, the decoder will not
output raw pixels that can easily be manipulated. Instead, it will
output hardware/API-specific objects that can later be used to render
a frame using the same API.
Even if we could transform such a buffer into raw pixels, we most
likely would want to avoid that, in order to avoid the need to
map the data back into system memory (and then later back to the GPU).
It's much better to upload the much smaller encoded data to the GPU/DSP
and then leave it there until rendered.
Currently playbin only supports subtitles on top of raw decoded video.
It will try to find a suitable overlay element from the plugin registry
based on the input subtitle caps and the rank. (It is assumed that we
will be able to convert any raw video format into any format required
by the overlay using a converter such as videoconvert.)
It will not render subtitles if the video sent to the sink is not
raw YUV or RGB or if conversions have been disabled by setting the
native-video flag on playbin.
Subtitle rendering is considered an important feature. Enabling
hardware-accelerated decoding by default should not lead to a major
feature regression in this area.
This means that we need to support subtitle rendering on top of
non-raw video.
=== 3. Possible Solutions ===
The goal is to keep knowledge of the subtitle format within the
format-specific GStreamer plugins, and knowledge of any specific
video acceleration API to the GStreamer plugins implementing
that API. We do not want to make the pango/dvbsuboverlay/dvdspu/kate
plugins link to libva/libvdpau/etc. and we do not want to make
the vaapi/vdpau plugins link to all of libpango/libkate/libass etc.
Multiple possible solutions come to mind:
(a) backend-specific overlay elements
e.g. vaapitextoverlay, vdpautextoverlay, vaapidvdspu, vdpaudvdspu,
vaapidvbsuboverlay, vdpaudvbsuboverlay, etc.
This assumes the overlay can be done directly on the backend-specific
object passed around.
The main drawback with this solution is that it leads to a lot of
code duplication and may also lead to uncertainty about distributing
certain duplicated pieces of code. The code duplication is pretty
much unavoidable, since making textoverlay, dvbsuboverlay, dvdspu,
kate, assrender, etc. available in form of base classes to derive
from is not really an option. Similarly, one would not really want
the vaapi/vdpau plugin to depend on a bunch of other libraries
such as libpango, libkate, libtiger, libass, etc.
One could add some new kind of overlay plugin feature though in
combination with a generic base class of some sort, but in order
to accommodate all the different cases and formats one would end
up with quite convoluted/tricky API.
(Of course there could also be a GstFancyVideoBuffer that provides
an abstraction for such video accelerated objects and that could
provide an API to add overlays to it in a generic way, but in the
end this is just a less generic variant of (c), and it is not clear
that there are real benefits to a specialised solution vs. a more
generic one).
(b) convert backend-specific object to raw pixels and then overlay
Even where possible technically, this is most likely very
inefficient.
(c) attach the overlay data to the backend-specific video frame buffers
in a generic way and do the actual overlaying/blitting later in
backend-specific code such as the video sink (or an accelerated
encoder/transcoder)
In this case, the actual overlay rendering (i.e. the actual text
rendering or decoding DVD/DVB data into pixels) is done in the
subtitle-format-specific GStreamer plugin. All knowledge about
the subtitle format is contained in the overlay plugin then,
and all knowledge about the video backend in the video backend
specific plugin.
The main question then is how to get the overlay pixels (and
we will only deal with pixels here) from the overlay element
to the video sink.
This could be done in multiple ways: One could send custom
events downstream with the overlay data, or one could attach
the overlay data directly to the video buffers in some way.
Sending inline events has the advantage that it is fairly
transparent to any elements between the overlay element and
the video sink: if an effects plugin creates a new video
buffer for the output, nothing special needs to be done to
maintain the subtitle overlay information, since the overlay
data is not attached to the buffer. However, it slightly
complicates things at the sink, since it would also need to
look for the new event in question instead of just processing
everything in its buffer render function.
If one attaches the overlay data to the buffer directly, any
element between overlay and video sink that creates a new
video buffer would need to be aware of the overlay data
attached to it and copy it over to the newly-created buffer.
One would have to implement a special kind of new query
(e.g. FEATURE query) that is not passed on automatically by
gst_pad_query_default() in order to make sure that all elements
downstream will handle the attached overlay data. (This is only
a problem if we want to also attach overlay data to raw video
pixel buffers; for new non-raw types we can just make it
mandatory and assume support and be done with it; for existing
non-raw types nothing changes anyway if subtitles don't work)
(we need to maintain backwards compatibility for existing raw
video pipelines like e.g.: ..decoder ! suboverlay ! encoder..)
Even though slightly more work, attaching the overlay information
to buffers seems more intuitive than sending it interleaved as
events. And buffers stored or passed around (e.g. via the
"last-buffer" property in the sink when doing screenshots via
playbin) always contain all the information needed.
(d) create a video/x-raw-*-delta format and use a backend-specific videomixer
This possibility was hinted at already in the digression in
section 1. It would satisfy the goal of keeping subtitle format
knowledge in the subtitle plugins and video backend knowledge
in the video backend plugin. It would also add a concept that
might be generally useful (think ximagesrc capture with xdamage).
However, it would require adding foorender variants of all the
existing overlay elements, and changing playbin to that new
design, which is somewhat intrusive. And given the general
nature of such a new format/API, we would need to take a lot
of care to be able to accommodate all possible use cases when
designing the API, which makes it considerably more ambitious.
Lastly, we would need to write videomixer variants for the
various accelerated video backends as well.
Overall (c) appears to be the most promising solution. It is the least
intrusive and should be fairly straight-forward to implement with
reasonable effort, requiring only small changes to existing elements
and requiring no new elements.
Doing the final overlaying in the sink as opposed to a videomixer
or overlay in the middle of the pipeline has other advantages:
- if video frames need to be dropped, e.g. for QoS reasons,
we could also skip the actual subtitle overlaying and
possibly the decoding/rendering as well, if the
implementation and API allows for that to be delayed.
- the sink often knows the actual size of the window/surface/screen
the output video is rendered to. This *may* make it possible to
render the overlay image in a higher resolution than the input
video, solving a long standing issue with pixelated subtitles on
top of low-resolution videos that are then scaled up in the sink.
This would of course require the rendering to be delayed instead
of just attaching an AYUV/ARGB/RGBA blob of pixels to the video buffer
in the overlay, but that could all be supported.
- if the video backend / sink has support for high-quality text
rendering (clutter?) we could just pass the text or pango markup
to the sink and let it do the rest (this is unlikely to be
supported in the general case - text and glyph rendering is
hard; also, we don't really want to make up our own text markup
system, and pango markup is probably too limited for complex
karaoke stuff).
=== 4. API needed ===
(a) Representation of subtitle overlays to be rendered
We need to pass the overlay pixels from the overlay element to the
sink somehow. Whatever the exact mechanism, let's assume we pass
a refcounted GstVideoOverlayComposition struct or object.
A composition is made up of one or more overlays/rectangles.
In the simplest case an overlay rectangle is just a blob of
RGBA/ABGR [FIXME?] or AYUV pixels with positioning info and other
metadata, and there is only one rectangle to render.
We're keeping the naming generic ("OverlayFoo" rather than
"SubtitleFoo") here, since this might also be handy for
other use cases such as e.g. logo overlays or so. It is not
designed for full-fledged video stream mixing though.
// Note: don't mind the exact implementation details, they'll be hidden
// FIXME: might be confusing in 0.11 though since GstXOverlay was
// renamed to GstVideoOverlay in 0.11, but not much we can do,
// maybe we can rename GstVideoOverlay to something better
struct GstVideoOverlayComposition
{
guint num_rectangles;
GstVideoOverlayRectangle ** rectangles;
/* lowest rectangle sequence number still used by the upstream
* overlay element. This way a renderer maintaining some kind of
* rectangles <-> surface cache can know when to free cached
* surfaces/rectangles. */
guint min_seq_num_used;
/* sequence number for the composition (same series as rectangles) */
guint seq_num;
}
struct GstVideoOverlayRectangle
{
/* Position on video frame and dimension of output rectangle in
* output frame terms (already adjusted for the PAR of the output
* frame). x/y can be negative (overlay will be clipped then) */
gint x, y;
guint render_width, render_height;
/* Dimensions of overlay pixels */
guint width, height, stride;
/* This is the PAR of the overlay pixels */
guint par_n, par_d;
/* Format of pixels, GST_VIDEO_FORMAT_ARGB on big-endian systems,
* and BGRA on little-endian systems (i.e. pixels are treated as
* 32-bit values and alpha is always in the most-significant byte,
* and blue is in the least-significant byte).
*
* FIXME: does anyone actually use AYUV in practice? (we do
* in our utility function to blend on top of raw video)
* What about AYUV and endianness? Do we always have [A][Y][U][V]
* in memory? */
/* FIXME: maybe use our own enum? */
GstVideoFormat format;
/* Refcounted blob of memory, no caps or timestamps */
GstBuffer *pixels;
// FIXME: how to express source like text or pango markup?
// (just add source type enum + source buffer with data)
//
// FOR 0.10: always send pixel blobs, but attach source data in
// addition (reason: if downstream changes, we can't renegotiate
// that properly, if we just do a query of supported formats from
// the start). Sink will just ignore pixels and use pango markup
// from source data if it supports that.
//
// FOR 0.11: overlay should query formats (pango markup, pixels)
// supported by downstream and then only send that. We can
// renegotiate via the reconfigure event.
//
/* sequence number: useful for backends/renderers/sinks that want
* to maintain a cache of rectangles <-> surfaces. The value of
* the min_seq_num_used in the composition tells the renderer which
* rectangles have expired. */
guint seq_num;
/* FIXME: we also need a (private) way to cache converted/scaled
* pixel blobs */
}
(a1) Overlay consumer API:
How would this work in a video sink that supports scaling of textures:
gst_foo_sink_render () {
/* assume only one for now */
if video_buffer has composition:
composition = video_buffer.get_composition()
for each rectangle in composition:
if rectangle.source_data_type == PANGO_MARKUP
actor = text_from_pango_markup (rectangle.get_source_data())
else
pixels = rectangle.get_pixels_unscaled (FORMAT_RGBA, ...)
actor = texture_from_rgba (pixels, ...)
.. position + scale on top of video surface ...
}
(a2) Overlay producer API:
e.g. logo or subpicture overlay: got pixels, stuff into rectangle:
if (logoverlay->cached_composition == NULL) {
comp = composition_new ();
rect = rectangle_new (format, pixels_buf,
width, height, stride, par_n, par_d,
x, y, render_width, render_height);
/* composition adds its own ref for the rectangle */
composition_add_rectangle (comp, rect);
rectangle_unref (rect);
/* buffer adds its own ref for the composition */
video_buffer_attach_composition (comp);
/* we take ownership of the composition and save it for later */
logoverlay->cached_composition = comp;
} else {
video_buffer_attach_composition (logoverlay->cached_composition);
}
FIXME: also add some API to modify render position/dimensions of
a rectangle (probably requires creation of new rectangle, unless
we handle writability like with other mini objects).
(b) Fallback overlay rendering/blitting on top of raw video
Eventually we want to use this overlay mechanism not only for
hardware-accelerated video, but also for plain old raw video,
either at the sink or in the overlay element directly.
Apart from the advantages listed earlier in section 3, this
allows us to consolidate a lot of overlaying/blitting code that
is currently repeated in every single overlay element in one
location. This makes it considerably easier to support a whole
range of raw video formats out of the box, add SIMD-optimised
rendering using ORC, or handle corner cases correctly.
(Note: side-effect of overlaying raw video at the video sink is
that if e.g. a screenshotter gets the last buffer via the last-buffer
property of basesink, it would get an image without the subtitles
on top. This could probably be fixed by re-implementing the
property in GstVideoSink though. Playbin2 could handle this
internally as well).
  void
  gst_video_overlay_composition_blend (GstVideoOverlayComposition * comp,
                                       GstBuffer * video_buf)
  {
    guint n;

    g_return_if_fail (gst_buffer_is_writable (video_buf));
    g_return_if_fail (GST_BUFFER_CAPS (video_buf) != NULL);

    ... parse video_buffer caps into BlendVideoFormatInfo ...

    for each rectangle in the composition: {

      if (gst_video_format_is_yuv (video_buf_format)) {
        overlay_format = FORMAT_AYUV;
      } else if (gst_video_format_is_rgb (video_buf_format)) {
        overlay_format = FORMAT_ARGB;
      } else {
        /* FIXME: grayscale? */
        return;
      }

      /* this will scale and convert AYUV<->ARGB if needed */
      pixels = rectangle_get_pixels_scaled (rectangle, overlay_format);

      ... clip output rectangle ...

      __do_blend (video_buf_format, video_buf->data,
                  overlay_format, pixels->data,
                  x, y, width, height, stride);

      gst_buffer_unref (pixels);
    }
  }
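
For illustration, a minimal sketch of how a raw-video overlay element might
call the helper above from its chain function (this uses the draft API from
this document; the element and pad names are hypothetical):

  /* Sketch: blend a cached composition onto raw video in a chain function.
   * gst_video_overlay_composition_blend() is the draft helper above;
   * GstLogoOverlay and its srcpad are hypothetical. */
  static GstFlowReturn
  gst_logo_overlay_chain (GstPad * pad, GstBuffer * buf)
  {
    GstLogoOverlay *overlay = GST_LOGO_OVERLAY (GST_PAD_PARENT (pad));

    if (overlay->cached_composition != NULL) {
      /* the helper requires a writable video buffer */
      buf = gst_buffer_make_writable (buf);
      gst_video_overlay_composition_blend (overlay->cached_composition, buf);
    }

    return gst_pad_push (overlay->srcpad, buf);
  }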
(c) Flatten all rectangles in a composition
We cannot assume that the video backend API can handle any
number of rectangle overlays, it's possible that it only
supports one single overlay, in which case we need to squash
all rectangles into one.
However, we'll just declare this a corner case for now, and
implement it only if someone actually needs it. It's easy
to add later API-wise. Might be a bit tricky if we have
rectangles with different PARs/formats (e.g. subs and a logo),
though we could probably always just use the code from (b)
with a fully transparent video buffer to create a flattened
overlay buffer.
(d) core API: new FEATURE query
For 0.10 we need to add a FEATURE query, so the overlay element
can query whether the sink downstream and all elements between
the overlay element and the sink support the new overlay API.
Elements in between need to support it because the render
positions and dimensions need to be updated if the video is
cropped or rescaled, for example.
In order to ensure that all elements support the new API,
we need to drop the query in the pad default query handler
(so it only succeeds if all elements handle it explicitly).
Might want two variants of the feature query - one where
all elements in the chain need to support it explicitly
and one where it's enough if some element downstream
supports it.
In 0.11 this could probably be handled via GstMeta and
ALLOCATION queries (and/or we could simply require
elements to be aware of this API from the start).
There appears to be no issue with downstream possibly
not being linked yet at the time when an overlay would
want to do such a query.
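
As a rough sketch (not finalised API), such a check from the overlay element
might look like the following, modelled here on the 0.10-era custom/application
query with a hypothetical structure name standing in for the proposed FEATURE
query:

  /* Sketch only: the query structure name is a placeholder for the
   * proposed FEATURE query, not existing core API. */
  static gboolean
  gst_text_overlay_downstream_supports_compositions (GstPad * srcpad)
  {
    GstQuery *query;
    gboolean supported;

    query = gst_query_new_application (GST_QUERY_CUSTOM,
        gst_structure_new ("feature/overlay-composition", NULL));

    /* only succeeds if every element downstream handles the query
     * explicitly, since the default handler is expected to drop it */
    supported = gst_pad_peer_query (srcpad, query);

    gst_query_unref (query);

    return supported;
  }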
Other considerations:
- renderers (overlays or sinks) may be able to handle only ARGB or only AYUV
(for most graphics/hw-API it's likely ARGB of some sort, while our
blending utility functions will likely want the same colour space as
the underlying raw video format, which is usually YUV of some sort).
We need to convert where required, and should cache the conversion.
- renderers may or may not be able to scale the overlay. We need to
do the scaling internally if not (simple case: just horizontal scaling
to adjust for PAR differences; complex case: both horizontal and vertical
scaling, e.g. if subs come from a different source than the video or the
video has been rescaled or cropped between overlay element and sink).
- renderers may be able to generate (possibly scaled) pixels on demand
from the original data (e.g. a string or RLE-encoded data). We will
ignore this for now, since this functionality can still be added later
via API additions. The most interesting case would be to pass a pango
markup string, since e.g. clutter can handle that natively.
- renderers may be able to write data directly on top of the video pixels
(instead of creating an intermediary buffer with the overlay which is
then blended on top of the actual video frame), e.g. dvdspu, dvbsuboverlay
However, in the interest of simplicity, we should probably ignore the
fact that some elements can blend their overlays directly on top of the
video (decoding/uncompressing them on the fly), even more so as it's
not obvious that it's actually faster to decode the same overlay
70-90 times, say (i.e. ca. 3 seconds of video frames), and then blend
it 70-90 times instead of decoding it once into a temporary buffer
and then blending it directly from there, possibly SIMD-accelerated.
Also, this is only relevant if the video is raw video and not some
hardware-acceleration backend object.
And ultimately it is the overlay element that decides whether to do
the overlay right there and then or have the sink do it (if supported).
It could decide to keep doing the overlay itself for raw video and
only use our new API for non-raw video.
- renderers may want to make sure they only upload the overlay pixels once
per rectangle if that rectangle recurs in subsequent frames (as part of
the same composition or a different composition), as is likely. This caching
of e.g. surfaces needs to be done renderer-side and can be accomplished
based on the sequence numbers. The composition contains the lowest
sequence number still in use upstream (an overlay element may want to
cache created compositions+rectangles as well after all to re-use them
for multiple frames), based on that the renderer can expire cached
objects. The caching needs to be done renderer-side because attaching
renderer-specific objects to the rectangles won't work well given the
refcounted nature of rectangles and compositions, making it unpredictable
when a rectangle or composition will be freed or from which thread
context it will be freed. The renderer-specific objects are likely bound
to other types of renderer-specific contexts, and need to be managed
in connection with those (see the cache-expiry sketch after this list).
- composition/rectangles should internally provide a certain degree of
thread-safety. Multiple elements (sinks, overlay element) might access
or use the same objects from multiple threads at the same time, and it
is expected that elements will keep a ref to compositions and rectangles
they push downstream for a while, e.g. until the current subtitle
composition expires.
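
A minimal sketch of such renderer-side cache expiry, keyed on rectangle
sequence numbers (the composition accessor name below is a hypothetical
placeholder for however the minimum sequence number still in use ends up
being exposed):

  /* Sketch: expire cached surfaces for rectangles no longer used upstream.
   * surface_cache maps seq_num -> renderer-specific surface;
   * the _get_min_seq_num() accessor is hypothetical. */
  static gboolean
  cached_surface_is_expired (gpointer key, gpointer value, gpointer user_data)
  {
    guint seq_num = GPOINTER_TO_UINT (key);
    guint min_seq_num_used = GPOINTER_TO_UINT (user_data);

    return seq_num < min_seq_num_used;
  }

  static void
  renderer_expire_cached_surfaces (GHashTable * surface_cache,
      GstVideoOverlayComposition * comp)
  {
    guint min_seq_num_used =
        gst_video_overlay_composition_get_min_seq_num (comp);

    g_hash_table_foreach_remove (surface_cache, cached_surface_is_expired,
        GUINT_TO_POINTER (min_seq_num_used));
  }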
=== 5. Future considerations ===
- alternatives: there may be multiple versions/variants of the same subtitle
stream. On DVDs, there may be a 4:3 version and a 16:9 version of the same
subtitles. We could attach both variants and let the renderer pick the best
one for the situation (currently we just use the 16:9 version). With totem,
it's ultimately totem that adds the 'black bars' at the top/bottom, so totem
also knows if it's got a 4:3 display and can/wants to fit 4:3 subs (which
may render on top of the bars) or not, for example.
=== 6. Misc. FIXMEs ===
TEST: should these look (roughly) alike (note text distortion) - needs fixing in textoverlay
gst-launch-0.10 \
videotestsrc ! video/x-raw,width=640,height=480,pixel-aspect-ratio=1/1 ! textoverlay text=Hello font-desc=72 ! xvimagesink \
videotestsrc ! video/x-raw,width=320,height=480,pixel-aspect-ratio=2/1 ! textoverlay text=Hello font-desc=72 ! xvimagesink \
videotestsrc ! video/x-raw,width=640,height=240,pixel-aspect-ratio=1/2 ! textoverlay text=Hello font-desc=72 ! xvimagesink
~~~ THE END ~~~
@@ -1,107 +0,0 @@
Interlaced Video
================
Video buffers have a number of states identifiable through a combination of caps
and buffer flags.
Possible states:

- Progressive
- Interlaced
  - Plain
    - One field
    - Two fields
    - Three fields - this should be a progressive buffer with a repeated 'first'
      field that can be used for telecine pulldown
  - Telecine
    - One field
    - Two fields
      - Progressive
      - Interlaced (a.k.a. 'mixed'; the fields are from different frames)
    - Three fields - this should be a progressive buffer with a repeated 'first'
      field that can be used for telecine pulldown
Note: It can be seen that the difference between the plain interlaced and
telecine states is that in the telecine state, buffers containing two fields may
be progressive.
Tools for identification:

- GstVideoInfo
  - GstVideoInterlaceMode - enum - GST_VIDEO_INTERLACE_MODE_...
    - PROGRESSIVE
    - INTERLEAVED
    - MIXED
- Buffer flags - GST_VIDEO_BUFFER_FLAG_...
  - TFF
  - RFF
  - ONEFIELD
  - INTERLACED
Identification of Buffer States
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Note that flags are not necessarily interpreted in the same way for all
different states nor are they necessarily required nor make sense in all cases.
Progressive
...........
If the interlace mode in the video info corresponding to a buffer is
"progressive", then the buffer is progressive.
Plain Interlaced
................
If the video info interlace mode is "interleaved", then the buffer is plain
interlaced.
GST_VIDEO_BUFFER_FLAG_TFF indicates whether the top or bottom field is to be
displayed first. The timestamp on the buffer corresponds to the first field.
GST_VIDEO_BUFFER_FLAG_RFF indicates that the first field (indicated by the TFF flag)
should be repeated. This is generally only used for telecine purposes but as the
telecine state was added long after the interlaced state was added and defined,
this flag remains valid for plain interlaced buffers.
GST_VIDEO_BUFFER_FLAG_ONEFIELD means that only the field indicated through the TFF
flag is to be used. The other field should be ignored.
Telecine
........
If video info interlace mode is "mixed" then the buffers are in some form of
telecine state.
The TFF and ONEFIELD flags have the same semantics as for the plain interlaced
state.
GST_VIDEO_BUFFER_FLAG_RFF in the telecine state indicates that the buffer contains
only repeated fields that are present in other buffers and are as such
unneeded. For example, in a sequence of three telecined frames, we might have:
AtAb AtBb BtBb
In this situation, we only need the first and third buffers as the second
buffer contains fields present in the first and third.
Note that the following state can have its second buffer identified using the
ONEFIELD flag (and TFF not set):
AtAb AtBb BtCb
The telecine state requires one additional flag to be able to identify
progressive buffers.
The presence of the GST_VIDEO_BUFFER_FLAG_INTERLACED means that the buffer is an
'interlaced' or 'mixed' buffer that contains two fields that, when combined
with fields from adjacent buffers, allow reconstruction of progressive frames.
The absence of the flag implies the buffer containing two fields is a
progressive frame.
For example in the following sequence, the third buffer would be mixed (yes, it
is a strange pattern, but it can happen):
AtAb AtBb BtCb CtDb DtDb
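
A minimal sketch of how an element might apply the rules above to an incoming
buffer (assuming info is the GstVideoInfo parsed from the negotiated caps):

  /* Sketch: classify a buffer according to the rules above */
  static void
  classify_buffer (const GstVideoInfo * info, GstBuffer * buf)
  {
    switch (GST_VIDEO_INFO_INTERLACE_MODE (info)) {
      case GST_VIDEO_INTERLACE_MODE_PROGRESSIVE:
        /* always a progressive frame, buffer flags are not needed */
        break;
      case GST_VIDEO_INTERLACE_MODE_INTERLEAVED:
        /* plain interlaced: TFF/RFF/ONEFIELD as described above */
        if (GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_TFF)) {
          /* top field is to be displayed first */
        }
        break;
      case GST_VIDEO_INTERLACE_MODE_MIXED:
        /* telecine: no INTERLACED flag means the two fields belong to
         * one progressive frame */
        if (!GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_INTERLACED)) {
          /* progressive telecine frame */
        }
        break;
      default:
        break;
    }
  }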
@@ -1,76 +0,0 @@
Media Types
-----------
audio/x-raw
format, G_TYPE_STRING, mandatory
The format of the audio samples, see the Formats section for a list
of valid sample formats.
rate, G_TYPE_INT, mandatory
The samplerate of the audio
channels, G_TYPE_INT, mandatory
The number of channels
channel-mask, GST_TYPE_BITMASK, mandatory for more than 2 channels
Bitmask of channel positions present. May be omitted for mono and
stereo. May be set to 0 to denote that the channels are unpositioned.
layout, G_TYPE_STRING, mandatory
The layout of channels within a buffer. Possible values are
"interleaved" (for LRLRLRLR) and "non-interleaved" (LLLLRRRR)
Use GstAudioInfo and related helper API to create and parse raw audio caps.
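
For example, a minimal sketch of building caps for interleaved stereo S16LE at
44100 Hz with GstAudioInfo (passing NULL positions picks the default channel
positions, and hence the default channel-mask, for 2 channels):

  /* Sketch: build audio/x-raw caps for interleaved stereo S16LE */
  GstAudioInfo info;
  GstCaps *caps;

  gst_audio_info_init (&info);
  gst_audio_info_set_format (&info, GST_AUDIO_FORMAT_S16LE, 44100, 2, NULL);
  caps = gst_audio_info_to_caps (&info);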
Metadata
--------
"GstAudioDownmixMeta"
A matrix for downmixing multichannel audio to a lower number of channels.
Formats
-------
The following values can be used for the format string property.
"S8" 8-bit signed PCM audio
"U8" 8-bit unsigned PCM audio
"S16LE" 16-bit signed PCM audio
"S16BE" 16-bit signed PCM audio
"U16LE" 16-bit unsigned PCM audio
"U16BE" 16-bit unsigned PCM audio
"S24_32LE" 24-bit signed PCM audio packed into 32-bit
"S24_32BE" 24-bit signed PCM audio packed into 32-bit
"U24_32LE" 24-bit unsigned PCM audio packed into 32-bit
"U24_32BE" 24-bit unsigned PCM audio packed into 32-bit
"S32LE" 32-bit signed PCM audio
"S32BE" 32-bit signed PCM audio
"U32LE" 32-bit unsigned PCM audio
"U32BE" 32-bit unsigned PCM audio
"S24LE" 24-bit signed PCM audio
"S24BE" 24-bit signed PCM audio
"U24LE" 24-bit unsigned PCM audio
"U24BE" 24-bit unsigned PCM audio
"S20LE" 20-bit signed PCM audio
"S20BE" 20-bit signed PCM audio
"U20LE" 20-bit unsigned PCM audio
"U20BE" 20-bit unsigned PCM audio
"S18LE" 18-bit signed PCM audio
"S18BE" 18-bit signed PCM audio
"U18LE" 18-bit unsigned PCM audio
"U18BE" 18-bit unsigned PCM audio
"F32LE" 32-bit floating-point audio
"F32BE" 32-bit floating-point audio
"F64LE" 64-bit floating-point audio
"F64BE" 64-bit floating-point audio
@@ -1,28 +0,0 @@
Media Types
-----------
text/x-raw
format, G_TYPE_STRING, mandatory
The format of the text, see the Formats section for a list of valid format
strings.
Metadata
--------
There are no common metas for this raw format yet.
Formats
-------
"utf8" plain timed utf8 text (formerly text/plain)
Parsed timed text in utf8 format.
"pango-markup" plain timed utf8 text with pango markup (formerly text/x-pango-markup)
Same as "utf8", but text embedded in an XML-style markup language for
size, colour, emphasis, etc.
See http://developer.gnome.org/pango/stable/PangoMarkupFormat.html
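
For example, a minimal sketch of the corresponding caps for pango-markup text:

  /* Sketch: caps describing parsed timed text with pango markup */
  GstCaps *caps = gst_caps_new_simple ("text/x-raw",
      "format", G_TYPE_STRING, "pango-markup", NULL);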
File diff suppressed because it is too large.
@@ -1,69 +0,0 @@
playbin
--------
The purpose of this element is to decode and render the media contained in a
given generic uri. The element extends GstPipeline and is typically used in
playback situations.
Required features:
- accept and play any valid uri. This includes
- rendering video/audio
- overlaying subtitles on the video
- optionally read external subtitle files
- allow for hardware (non raw) sinks
- selection of audio/video/subtitle streams based on language.
- perform network buffering/incremental download
- gapless playback
- support for visualisations with configurable sizes
- ability to reject files that are too big, or of a format that would require
too much CPU/memory usage.
- be very efficient with adding elements such as converters to reduce the
amount of negotiation that has to happen.
- handle chained oggs. This includes having support for dynamic pad add and
remove from a demuxer.
Components
----------
* decodebin2
  - performs the autoplugging of demuxers/decoders
  - emits signals for steering the autoplugging
    - to decide if a non-raw media format is acceptable as output
    - to sort the possible decoders for a non-raw format
  - see also decodebin2 design doc

* uridecodebin
  - combination of a source to handle the given uri, an optional queueing element
    and one or more decodebin2 elements to decode the non-raw streams.

* playsink
  - handles display of audio/video/text.
  - has request audio/video/text input pads. There is only one sinkpad per type.
    The requested pads define the configuration of the internal pipeline.
  - allows for setting audio/video sinks or does automatic sink selection.
  - allows for configuration of visualisation element.
  - allows for enable/disable of visualisation, audio and video.

* playbin
  - combination of one or more uridecodebin elements to read the uri and subtitle
    uri.
  - support for queuing new media to support gapless playback.
  - handles stream selection.
  - uses playsink to display.
  - selection of sinks and configuration of uridecodebin with raw output formats.
Gapless playback
----------------
playbin has an "about-to-finish" signal. The application should configure a new
uri (and optional suburi) in the callback. When the current media finishes, this
new media will be played next.
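
A minimal sketch of how an application might hook this up (next_uri stands for
whatever the application wants to play next):

  /* Sketch: queue the next uri from the "about-to-finish" callback so
   * playback continues without a gap */
  static void
  on_about_to_finish (GstElement * playbin, gpointer user_data)
  {
    const gchar *next_uri = user_data;   /* application-provided */

    g_object_set (playbin, "uri", next_uri, NULL);
  }

  static void
  setup_gapless (GstElement * playbin, const gchar * next_uri)
  {
    g_signal_connect (playbin, "about-to-finish",
        G_CALLBACK (on_about_to_finish), (gpointer) next_uri);
  }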
@@ -1,278 +0,0 @@
Design for Stereoscopic & Multiview Video Handling
==================================================
There are two cases to handle:
* Encoded video output from a demuxer to parser / decoder or from encoders into a muxer.
* Raw video buffers
The design below is somewhat based on the proposals from
[bug 611157](https://bugzilla.gnome.org/show_bug.cgi?id=611157)
Multiview is used as a generic term to refer to handling both
stereo content (left and right eye only) as well as extensions for videos
containing multiple independent viewpoints.
Encoded Signalling
------------------
This is regarding the signalling in caps and buffers from demuxers to
parsers (sometimes) or out from encoders.
For backward compatibility with existing codecs many transports of
stereoscopic 3D content use normal 2D video with 2 views packed spatially
in some way, and put extra new descriptions in the container/mux.
Info in the demuxer seems to apply to stereo encodings only. For all
MVC methods I know, the multiview encoding is in the video bitstream itself
and therefore already available to decoders. Only stereo systems have been retro-fitted
into the demuxer.
Also, sometimes extension descriptions are in the codec (e.g. H.264 SEI FPA packets)
and it would be useful to be able to put the info onto caps and buffers from the
parser without decoding.
To handle both cases, we need to be able to output the required details on
encoded video for decoders to apply onto the raw video buffers they decode.
*If there ever is a need to transport multiview info for encoded data the
same system below for raw video or some variation should work*
### Encoded Video: Properties that need to be encoded into caps
1. multiview-mode (called "Channel Layout" in bug 611157)
* Whether a stream is mono, for a single eye, stereo, mixed-mono-stereo
(switches between mono and stereo - mp4 can do this)
* Uses a buffer flag to mark individual buffers as mono or "not mono"
(single|stereo|multiview) for mixed scenarios. The alternative (not
proposed) is for the demuxer to switch caps for each mono to not-mono
change, and not use a 'mixed' caps variant at all.
* _single_ refers to a stream of buffers that only contain 1 view.
It is different from mono in that the stream is a marked left or right
eye stream for later combining in a mixer or when displaying.
* _multiple_ marks a stream with multiple independent views encoded.
It is included in this list for completeness. As noted above, there's
currently no scenario that requires marking encoded buffers as MVC.
2. Frame-packing arrangements / view sequence orderings
* Possible frame packings: side-by-side, side-by-side-quincunx,
column-interleaved, row-interleaved, top-bottom, checker-board
* bug 611157 - sreerenj added side-by-side-full and top-bottom-full but
I think that's covered by suitably adjusting pixel-aspect-ratio. If
not, they can be added later.
* _top-bottom_, _side-by-side_, _column-interleaved_, _row-interleaved_ are as the names suggest.
* _checker-board_, samples are left/right pixels in a chess grid +-+-+-/-+-+-+
* _side-by-side-quincunx_. Side By Side packing, but quincunx sampling -
1 pixel offset of each eye needs to be accounted when upscaling or displaying
* there may be other packings (future expansion)
* Possible view sequence orderings: frame-by-frame, frame-primary-secondary-tracks, sequential-row-interleaved
* _frame-by-frame_, each buffer is left, then right view etc
* _frame-primary-secondary-tracks_ - the file has 2 video tracks (primary and secondary), one is left eye, one is right.
Demuxer info indicates which one is which.
Handling this means marking each stream as all-left and all-right views, decoding separately, and combining automatically (inserting a mixer/combiner in playbin)
-> *Leave this for future expansion*
* _sequential-row-interleaved_ Mentioned by sreerenj in bug patches, I can't find a mention of such a thing. Maybe it's in MPEG-2
-> *Leave this for future expansion / deletion*
3. view encoding order
* Describes how to decide which piece of each frame corresponds to left or right eye
* Possible orderings left, right, left-then-right, right-then-left
- Need to figure out how we find the correct frame in the demuxer to start decoding when seeking in frame-sequential streams
- Need a buffer flag for marking the first buffer of a group.
4. "Frame layout flags"
* flags for view specific interpretation
* horizontal-flip-left, horizontal-flip-right, vertical-flip-left, vertical-flip-right
Indicates that one or more views has been encoded in a flipped orientation, usually due to camera with mirror or displays with mirrors.
* This should be an actual flags field. Registered GLib flags types aren't generally well supported in our caps - the type might not be loaded/registered yet when parsing a caps string, so they can't be used in caps templates in the registry.
* It might be better just to use a hex value / integer
Buffer representation for raw video
-----------------------------------
* Transported as normal video buffers with extra metadata
* The caps define the overall buffer width/height, with helper functions to
extract the individual views for packed formats
* pixel-aspect-ratio adjusted if needed to double the overall width/height
* video sinks that don't know about multiview extensions yet will show the packed view as-is
For frame-sequence outputs, things might look weird, but just adding multiview-mode to the sink caps
can disallow those transports.
* _row-interleaved_ packing is actually just side-by-side memory layout with half frame width, twice
the height, so can be handled by adjusting the overall caps and strides
* Other exotic layouts need new pixel formats defined (checker-board, column-interleaved, side-by-side-quincunx)
* _Frame-by-frame_ - one view per buffer, but with alternating metas marking which buffer is which left/right/other view and using a new buffer flag as described above
to mark the start of a group of corresponding frames.
* New video caps addition as for encoded buffers
### Proposed Caps fields
Combining the requirements above and collapsing the combinations into mnemonics:
* multiview-mode =
mono | left | right | sbs | sbs-quin | col | row | topbot | checkers |
frame-by-frame | mixed-sbs | mixed-sbs-quin | mixed-col | mixed-row |
mixed-topbot | mixed-checkers | mixed-frame-by-frame | multiview-frames | mixed-multiview-frames
* multiview-flags =
+ 0x0000 none
+ 0x0001 right-view-first
+ 0x0002 left-h-flipped
+ 0x0004 left-v-flipped
+ 0x0008 right-h-flipped
+ 0x0010 right-v-flipped
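
For illustration, a possible caps string under this proposal for a side-by-side
stereo stream with the right view coded first (field names and mnemonics as
proposed above; since the flags representation is still open, a plain integer
value 0x0001 is assumed here):

    video/x-raw, format=(string)I420, width=1920, height=1080,
        framerate=(fraction)30/1, multiview-mode=(string)sbs,
        multiview-flags=(int)1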
### Proposed new buffer flags
Add two new GST_VIDEO_BUFFER flags in video-frame.h and make it clear that those
flags can apply to encoded video buffers too. wtay says that's currently the
case anyway, but the documentation should say it.
**GST_VIDEO_BUFFER_FLAG_MULTIPLE_VIEW** - Marks a buffer as representing non-mono content, although it may be a single (left or right) eye view.
**GST_VIDEO_BUFFER_FLAG_FIRST_IN_BUNDLE** - for frame-sequential methods of transport, mark the "first" of a left/right/other group of frames
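
For example, an element pushing frame-sequential stereo might mark its output
buffers roughly like this (a sketch using the proposed flag names;
buffer_starts_new_group stands for however the element detects the start of a
left/right group):

    /* Sketch: mark buffers of a frame-sequential stereo stream */
    GST_BUFFER_FLAG_SET (buf, GST_VIDEO_BUFFER_FLAG_MULTIPLE_VIEW);
    if (buffer_starts_new_group)
      GST_BUFFER_FLAG_SET (buf, GST_VIDEO_BUFFER_FLAG_FIRST_IN_BUNDLE);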
### A new GstMultiviewMeta
This provides a place to describe all provided views in a buffer / stream,
and through Meta negotiation to inform decoders about which views to decode if
not all are wanted.
* Logical labels/names and mapping to GstVideoMeta numbers
* Standard view labels LEFT/RIGHT, and non-standard ones (strings)
    GST_VIDEO_MULTIVIEW_VIEW_LEFT = 1
    GST_VIDEO_MULTIVIEW_VIEW_RIGHT = 2

    struct GstVideoMultiviewViewInfo {
        guint view_label;
        guint meta_id;    // id of the GstVideoMeta for this view

        padding;
    }

    struct GstVideoMultiviewMeta {
        guint n_views;
        GstVideoMultiviewViewInfo *view_info;
    }
The meta is optional, and probably only useful later for MVC
Outputting stereo content
-------------------------
The initial implementation for output will be stereo content in glimagesink
### Output Considerations with OpenGL
* If we have support for stereo GL buffer formats, we can output separate left/right eye images and let the hardware take care of display.
* Otherwise, glimagesink needs to render one window with left/right in a suitable frame packing
and that will only show correctly in fullscreen on a device set for the right 3D packing -> requires app intervention to set the video mode.
* Which could be done manually on the TV, or with HDMI 1.4 by setting the right video mode
  for the screen to inform the TV. A third option is to support rendering to two separate
  overlay areas on the screen - one for the left eye, one for the right - which could be
  done using the 'splitter' element and 2 output sinks or, better, by adding a 2nd window
  overlay for split stereo output
* Intel hardware doesn't do stereo GL buffers - only nvidia and AMD, so initial implementation won't include that
## Other elements for handling multiview content
* videooverlay interface extensions
* __Q__: Should this be a new interface?
* Element message to communicate the presence of stereoscopic information to the app
* App needs to be able to override the input interpretation - ie, set multiview-mode and multiview-flags
* Most videos I've seen are side-by-side or top-bottom with no frame-packing metadata
* New API for the app to set rendering options for stereo/multiview content
* This might be best implemented as a **multiview GstContext**, so that
  the pipeline can share app preferences for content interpretation and downmixing
  to mono for output, or in the sink, and have those preferences passed as far
  upstream/downstream as possible.
* Converter element
* convert different view layouts
* Render to anaglyphs of different types (magenta/green, red/blue, etc) and output as mono
* Mixer element
* take 2 video streams and output as stereo
* later take n video streams
* share code with the converter, it just takes input from n pads instead of one.
* Splitter element
* Output one pad per view
### Implementing MVC handling in decoders / parsers (and encoders)
Things to do to implement MVC handling
1. Parsing SEI in h264parse and setting caps (patches available in
bugzilla for parsing, see below)
2. Integrate gstreamer-vaapi MVC support with this proposal
3. Help with [libav MVC implementation](https://wiki.libav.org/Blueprint/MVC)
4. generating SEI in H.264 encoder
5. Support for MPEG2 MVC extensions
## Relevant bugs
[bug 685215](https://bugzilla.gnome.org/show_bug.cgi?id=685215) - codecparser h264: Add initial MVC parser
[bug 696135](https://bugzilla.gnome.org/show_bug.cgi?id=696135) - h264parse: Add mvc stream parsing support
[bug 732267](https://bugzilla.gnome.org/show_bug.cgi?id=732267) - h264parse: extract base stream from MVC or SVC encoded streams
## Other Information
[Matroska 3D support notes](http://www.matroska.org/technical/specs/notes.html#3D)
## Open Questions
### Background
### Representation for GstGL
When uploading raw video frames to GL textures, the goal is to implement:
2. Split packed frames into separate GL textures when uploading, and
attach multiple GstGLMemory's to the GstBuffer. The multiview-mode and
multiview-flags fields in the caps should change to reflect the conversion
from one incoming GstMemory to multiple GstGLMemory, and change the
width/height in the output info as needed.
This is (currently) targeted as 2 render passes - upload as normal
to a single stereo-packed RGBA texture, and then unpack into 2
smaller textures, output with GST_VIDEO_MULTIVIEW_MODE_SEPARATED, as
2 GstGLMemory attached to one buffer. We can optimise the upload later
to go directly to 2 textures for common input formats.
Separate output textures have a few advantages:
* Filter elements can more easily apply filters in several passes to each
texture without fundamental changes to our filters to avoid mixing pixels
from separate views.
* Centralises the sampling of input video frame packings in the upload code,
which makes adding new packings in the future easier.
* Sampling multiple textures to generate various output frame-packings
for display is conceptually simpler than converting from any input packing
to any output packing.
* In implementations that support quad buffers, having separate textures
makes it trivial to do GL_LEFT/GL_RIGHT output
For either option, we'll need new glsink output API to pass more
information to applications about multiple views for the draw signal/callback.
I don't know if it's desirable to support *both* methods of representing
views. If so, that should be signalled in the caps too. That could be a
new multiview-mode for passing views in separate GstMemory objects
attached to a GstBuffer, which would not be GL specific.
### Overriding frame packing interpretation
Most sample videos available are frame packed, with no metadata
to say so. How should we override that interpretation?
* Simple answer: Use capssetter + new properties on playbin to
override the multiview fields
*Basically implemented in playbin, using a pad probe. Needs more work for completeness*
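
For example, a rough sketch using capssetter to force a side-by-side
interpretation on a decoded stream (the field value follows the mnemonics
proposed above and would need to match the final caps design):

    gst-launch-1.0 filesrc location=movie-sbs.mkv ! decodebin ! \
        capssetter caps="video/x-raw,multiview-mode=sbs" ! \
        videoconvert ! autovideosink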
### Adding extra GstVideoMeta to buffers
There should be one GstVideoMeta for the entire video frame in packed
layouts, and one GstVideoMeta per GstGLMemory when views are attached
to a GstBuffer separately. This should be done by the buffer pool,
which knows from the caps.
### videooverlay interface extensions
GstVideoOverlay needs:
* A way to announce the presence of multiview content when it is
detected/signalled in a stream.
* A way to tell applications which output methods are supported/available
* A way to tell the sink which output method it should use
* Possibly a way to tell the sink to override the input frame
interpretation / caps - depends on the answer to the question
above about how to model overriding input interpretation.
### What's implemented
* Caps handling
* gst-plugins-base libsgstvideo pieces
* playbin caps overriding
* conversion elements - glstereomix, gl3dconvert (needs a rename),
glstereosplit.
### Possible future enhancements
* Make GLupload split to separate textures at upload time?
* Needs new API to extract multiple textures from the upload. Currently only outputs 1 result RGBA texture.
* Make GLdownload able to take 2 input textures, pack them and colorconvert / download as needed.
- currently done by packing then downloading, which adds unacceptable overhead for RGBA download
* Think about how we integrate GLstereo - do we need to do anything special,
or can the app just render to stereo/quad buffers if they're available?