design: move over design docs from gst-plugins-base
Or most of them anyway (excl. draft-hw-acceleration and draft-va which didn't seem particularly pertinent).
This commit is contained in:
parent a3fe9f6a7d
commit aff7ad1080

13 changed files with 3475 additions and 0 deletions
markdown/design/audiosinks.md (new file, 129 lines)

@@ -0,0 +1,129 @@

## Audiosink design

### Requirements

- must operate chain based. Most simple playback pipelines will push
  audio from the decoders into the audio sink.

- must operate getrange based. Most professional audio applications
  will operate in a mode where the audio sink pulls samples from the
  pipeline. This is typically done in a callback from the audiosink
  requesting N samples. The callback is either scheduled from a thread
  or from an interrupt from the audio hardware device.

- Exact sample accurate clocks. The audiosink must be able to provide
  a clock that is sample accurate even if samples are dropped or when
  discontinuities are found in the stream.

- Exact timing of playback. The audiosink must be able to play samples
  at their exact times.

- use DMA access when possible. When the hardware can do DMA we should
  use it. This should also work over bufferpools to avoid data copying
  to/from kernel space.

### Design

The design is based on a set of base classes and the concept of a
ringbuffer of samples.

    +-----------+   - provide preroll, rendering, timing
    + basesink  +   - caps nego
    +-----+-----+
          |
    +-----V----------+  - manages ringbuffer
    + audiobasesink  +  - manages scheduling (push/pull)
    +-----+----------+  - manages clock/query/seek
          |             - manages scheduling of samples in the ringbuffer
          |             - manages caps parsing
          |
    +-----V------+  - default ringbuffer implementation with a GThread
    + audiosink  +  - subclasses provide open/read/close methods
    +------------+

The ringbuffer is a contiguous piece of memory divided into segtotal
segments. Each segment has segsize bytes.

          play position
            v
    +---+---+---+-------------------------------------+----------+
    + 0 | 1 | 2 | ....                                 | segtotal |
    +---+---+---+-------------------------------------+----------+
    <--->
      segsize bytes = N samples * bytes_per_sample.

The ringbuffer has a play position, which is expressed in segments. The
play position is where the device is currently reading samples from the
buffer.

The ringbuffer can be put into the PLAYING or STOPPED state.

In the STOPPED state no samples are played to the device and the play
pointer does not advance.

In the PLAYING state samples are written to the device and the
ringbuffer should call a configurable callback after each segment is
written to the device. In this state the play pointer is advanced after
each segment is written.

A write operation to the ringbuffer will put new samples in the
ringbuffer. If there is not enough space in the ringbuffer, the write
operation will block. The playback of the buffer never stops, even if
the buffer is empty. When the buffer is empty, silence is played by the
device.

The ringbuffer is implemented with lock-free atomic operations,
especially on the reading side, so that low-latency operation is
possible.

Whenever new samples are to be put into the ringbuffer, the position of
the read pointer is taken. The required write position is taken and the
difference is computed between the required and actual position. If the
difference is < 0, the sample is too late. If the difference is bigger than
segtotal, the writing part has to wait for the play pointer to advance.
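
As an illustration of that calculation, here is a minimal sketch. It is not
the actual ringbuffer code; the names `decide_write`, `segdone`,
`samples_per_seg` and `sample_offset` are assumptions made for the example.

```c
#include <glib.h>

/* Hypothetical sketch of the ringbuffer write decision described above.
 * 'segdone' is the number of segments the device has already played,
 * 'samples_per_seg' is segsize / bytes_per_sample. */
typedef enum { SAMPLE_TOO_LATE, SAMPLE_WRITABLE, WRITER_MUST_WAIT } WriteDecision;

static WriteDecision
decide_write (gint64 segdone, gint64 segtotal, gint64 samples_per_seg,
    gint64 sample_offset)
{
  /* segment the sample would land in, relative to the play position */
  gint64 required_seg = sample_offset / samples_per_seg;
  gint64 diff = required_seg - segdone;

  if (diff < 0)
    return SAMPLE_TOO_LATE;       /* play pointer already passed it */
  if (diff >= segtotal)
    return WRITER_MUST_WAIT;      /* wait for the play pointer to advance */
  return SAMPLE_WRITABLE;         /* fits in the ringbuffer right now */
}
```
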

### Scheduling

#### Chain based mode

In chain based mode, bytes are written into the ringbuffer. This
operation will eventually block when the ringbuffer is filled.

When no samples arrive in time, the ringbuffer will play silence. Each
buffer that arrives will be placed into the ringbuffer at the correct
time. This means that dropping samples or inserting silence is done
automatically, very accurately, and independently of the play pointer.

In this mode, the ringbuffer is usually kept as full as possible. When
using a small buffer (small segsize and segtotal), the latency from when
audio enters the sink to when it is played can be kept low, but at least
one context switch has to be made between read and write.

#### Getrange based mode

In getrange based mode, the audiobasesink will use the callback
function of the ringbuffer to get segsize samples from the peer
element. These samples will then be placed in the ringbuffer at the
next play position. It is assumed that the getrange function returns
fast enough to fill the ringbuffer before the play pointer reaches
the write pointer.

In this mode, the ringbuffer is usually kept as empty as possible.
There is no context switch needed between the elements that create
the samples and the actual writing of the samples to the device.

#### DMA mode

Elements that can do DMA based access to the audio device have to
subclass from the GstAudioBaseSink class and wrap the DMA ringbuffer
in a subclass of GstRingBuffer.

The ringbuffer subclass should trigger a callback after writing or
playing each sample to the device. This callback can be triggered
from a thread or from a signal from the audio device.

### Clocks

The GstAudioBaseSink class will use the ringbuffer to act as a clock
provider. It can do this by using the play pointer and the delay to
calculate the clock time.
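
As a rough sketch of that calculation (the function name and the
`segdone`/`delay_samples` parameters are assumptions for illustration; the
real base class code is more involved):

```c
#include <gst/gst.h>

/* Hypothetical illustration: derive a clock time from the ringbuffer
 * play pointer and the device delay, as described above. */
static GstClockTime
ring_buffer_clock_time (gint64 segdone, gint segsize, gint bytes_per_sample,
    gint rate, gint64 delay_samples)
{
  /* samples the device has fetched from the ringbuffer so far */
  gint64 samples_played = segdone * (segsize / bytes_per_sample);

  /* subtract the samples still queued in the device to get what was heard */
  gint64 samples_heard = samples_played - delay_samples;
  if (samples_heard < 0)
    samples_heard = 0;

  return gst_util_uint64_scale (samples_heard, GST_SECOND, rate);
}
```
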

markdown/design/decodebin.md (new file, 264 lines)

@@ -0,0 +1,264 @@

# Decodebin design

## GstDecodeBin

### Description

- Autoplug and decode to raw media

- Input: single pad with ANY caps

- Output: dynamic pads

### Contents

- a GstTypeFindElement connected to the single sink pad

- optionally a demuxer/parser

- optionally one or more DecodeGroup

### Autoplugging

The goal is to reach 'target' caps (by default raw media).

This is done by taking the GstCaps of a source pad and finding the
available demuxer/decoder GstElements that can be linked to that pad.

The process starts with the source pad of typefind and stops when no
more non-target caps are left. It is commonly done while pre-rolling,
but can also happen whenever a new pad appears on any element.

Once target caps have been found, that pad is ghosted and the
'pad-added' signal is emitted.

If no compatible elements can be found for a GstCaps, the pad is ghosted
and the 'unknown-type' signal is emitted.

### Assisted auto-plugging

When starting the auto-plugging process for a given GstCaps, the following
signals are emitted in order to allow the application/user
to assist or fine-tune the process.

- **'autoplug-continue'**:

        gboolean user_function (GstElement * decodebin, GstPad * pad, GstCaps * caps)

  This signal is fired at the very beginning with the source pad GstCaps. If
  the callback returns TRUE, the process continues normally. If the
  callback returns FALSE, then the GstCaps are considered as target caps
  and the autoplugging process stops.

- **'autoplug-factories'**:

        GValueArray user_function (GstElement * decodebin, GstPad * pad, GstCaps * caps);

  Get a list of element factories for @pad with @caps. This function is
  used to instruct decodebin2 which elements it should try to
  autoplug. The default behaviour when this function is not overridden
  is to get all elements that can handle @caps from the registry,
  sorted by rank.

- **'autoplug-select'**:

        gint user_function (GstElement * decodebin, GstPad * pad, GstCaps * caps, GValueArray * factories);

  This signal is fired once autoplugging has got a list of compatible
  GstElementFactory. The signal is emitted with the GstCaps of the
  source pad and a pointer to the GValueArray of compatible factories.

  The callback should return the index of the element factory in
  @factories that should be tried next.

  If the callback returns -1, the autoplugging process will stop as if
  no compatible factories were found.

  The default implementation of this function will try to autoplug the
  first factory of the list.
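
For example, an application that wants decodebin to stop unpacking at a
certain format (say, keep AC-3 compressed for pass-through) can hook into the
first signal. A minimal sketch; `on_autoplug_continue` and the AC-3 choice are
illustrative assumptions, not part of this design:

```c
#include <gst/gst.h>

/* Return FALSE to treat these caps as target caps and stop autoplugging,
 * TRUE to let decodebin keep plugging decoders. */
static gboolean
on_autoplug_continue (GstElement * decodebin, GstPad * pad, GstCaps * caps,
    gpointer user_data)
{
  GstCaps *ac3_caps = gst_caps_new_empty_simple ("audio/x-ac3");
  gboolean keep_compressed = gst_caps_can_intersect (caps, ac3_caps);

  gst_caps_unref (ac3_caps);
  return !keep_compressed;
}

/* after creating the decodebin element:
 *   g_signal_connect (decodebin, "autoplug-continue",
 *       G_CALLBACK (on_autoplug_continue), NULL);
 */
```
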

### Target Caps

The target caps are a read/write GObject property of decodebin.

By default the target caps are:

- Raw audio: audio/x-raw

- Raw video: video/x-raw

- Raw text: text/x-raw, format={utf8,pango-markup}

### Media chain/group handling

When autoplugging, all streams coming out of a demuxer will be grouped
in a DecodeGroup.

All new source pads created on that demuxer after it has emitted the
'no-more-pads' signal will be put in another DecodeGroup.

Only one decodegroup can be active at any given time. If a new
decodegroup is created while another one exists, that decodegroup will
be set as blocking until the existing one has drained.

## DecodeGroup

### Description

Streams belonging to the same group/chain of a media file.

### Contents

The DecodeGroup contains:

- a GstMultiQueue to which all streams of the media group are connected.

- the eventual decoders which are autoplugged in order to produce the
  requested target pads.

### Proper group draining

The DecodeGroup takes care that all the streams in the group are
completely drained (EOS has come through all source ghost pads).

### Pre-roll and block

The DecodeGroup has a global blocking feature. If enabled, all the
ghosted source pads for that group will be blocked.

A method is available to unblock all blocked pads for that group.

## GstMultiQueue

Multiple input-output data queue.

`multiqueue` achieves the same functionality as `queue`, with a
few differences:

- Multiple streams handling.

  The element handles queueing data on more than one stream at once.
  To achieve such a feature it has request sink pads (sink\_%u) and
  'sometimes' src pads (src\_%u).

  When requesting a given sinkpad, the associated srcpad for that
  stream will be created. Ex: requesting sink\_1 will generate src\_1.

- Non-starvation on multiple streams.

  If more than one stream is used with the element, the streams'
  queues will be dynamically grown (up to a limit), in order to ensure
  that no stream is risking data starvation. This guarantees that at
  any given time there are at least N bytes queued and available for
  each individual stream.

  If an EOS event comes through a srcpad, the associated queue should
  be considered as 'not-empty' in the queue-size-growing algorithm.

- Non-linked srcpads graceful handling.

  A GstTask is started for all srcpads when going to
  GST\_STATE\_PAUSED.

  The tasks block on a GCond which will be signalled in
  two different cases:

  - When the associated queue has received a buffer.

  - When the associated queue was previously declared as 'not-linked'
    and the first buffer of the queue is scheduled to be pushed
    synchronously in relation to the order in which it arrived globally
    in the element (see 'Synchronous data pushing' below).

  When woken up, the GstTask will try to push the
  next GstBuffer/GstEvent on the queue. If pushing the
  GstBuffer/GstEvent returns GST\_FLOW\_NOT\_LINKED, then the
  associated queue is marked as 'not-linked'. If pushing the
  GstBuffer/GstEvent succeeded, the queue will no longer be marked as
  'not-linked'.

  If pushing on all srcpads returns a GstFlowReturn different from
  GST\_FLOW\_OK, then all the srcpads' tasks are stopped and
  subsequent pushes on sinkpads will return GST\_FLOW\_NOT\_LINKED.

- Synchronous data pushing for non-linked pads.

  In order to better support dynamic switching between streams, the
  multiqueue (unlike the current GStreamer queue) continues to push
  buffers on non-linked pads rather than shutting down.

  In addition, to prevent a non-linked stream from very quickly
  consuming all available buffers and thus 'racing ahead' of the other
  streams, the element must ensure that buffers and inlined events for
  a non-linked stream are pushed in the same order as they were
  received, relative to the other streams controlled by the element.
  This means that a buffer cannot be pushed to a non-linked pad any
  sooner than buffers in any other stream which were received before
  it.

## Parsers, decoders and auto-plugging

This section has DRAFT status.

Some media formats come in different "flavours" or "stream formats".
These formats differ in the way the setup data and media data is
signalled and/or packaged. An example of this is H.264 video, where
there is a bytestream format (with codec setup data signalled inline and
units prefixed by a sync code and packet length information) and a "raw"
format where codec setup data is signalled out of band (via the caps)
and the chunking is implicit in the way the buffers were muxed into a
container, to mention just two of the possible variants.

Especially on embedded platforms it is common that decoders can only
handle one particular stream format, and not all of them.

Where there are multiple stream formats, parsers are usually expected to
be able to convert between the different formats. This will, if
implemented correctly, work as expected in a static pipeline such as

    ... ! parser ! decoder ! sink

where the parser can query the decoder's capabilities even before
processing the first piece of data, and configure itself to convert
accordingly, if conversion is needed at all.

In an auto-plugging context this is not so straightforward though,
because elements are plugged incrementally, and not before the previous
element has processed some data and decided what it will output exactly
(unless the template caps are completely fixed, in which case it can continue
right away; this is not always the case here though, see below). A
parser will thus have to decide on *some* output format so auto-plugging
can continue. It doesn't know anything about the available decoders and
their capabilities though, so it's possible that it will choose a format
that is not supported by any of the available decoders, or by the
preferred decoder.

If the parser had sufficiently concise but fixed source pad template
caps, decodebin could continue to plug a decoder right away, allowing
the parser to configure itself in the same way as it would with a static
pipeline. This is not an option, unfortunately, because often the parser
needs to process some data to determine e.g. the format's profile or
other stream properties (resolution, sample rate, channel configuration,
etc.), and there may be different decoders for different profiles (e.g.
a DSP codec for baseline profile and a software fallback for main/high
profile; or a DSP codec only supporting certain resolutions, with a
software fallback for unusual resolutions). So if decodebin just plugged
the highest-ranking decoder, that decoder might not be able to
handle the actual stream later on, which would yield an error (this is a
data flow error then, which would be hard to intercept and avoid in
decodebin). In other words, we can't solve this issue by plugging a
decoder right away with the parser.

So decodebin needs to communicate to the parser the set of available
decoder caps (which would contain the relevant capabilities/restrictions
such as supported profiles, resolutions, etc.), after the usual
"autoplug-\*" signal filtering/sorting of course.

This is done by plugging a capsfilter element right after the parser,
and constructing a set of filter caps from the list of available decoders
(one appends at the end just the name(s) of the caps structures from the
parser pad template caps to function as an 'ANY other' caps equivalent).
This lets the parser negotiate to a supported stream format in the same
way as with the static pipeline mentioned above, but of course incurs
some overhead through the additional capsfilter element.
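
As an illustration of that last step, here is a simplified sketch. The
function `build_parser_filter_caps` and the already-filtered
`decoder_factories` list are assumptions made for the example; the actual
decodebin code differs in the details:

```c
#include <gst/gst.h>

/* Hypothetical sketch: build capsfilter caps from the sink templates of the
 * candidate decoder factories, then append the structure names of the
 * parser's source pad template caps as an 'ANY other' fallback. */
static GstCaps *
build_parser_filter_caps (GList * decoder_factories, GstCaps * parser_tmpl_caps)
{
  GstCaps *filter = gst_caps_new_empty ();
  GList *l;
  guint i;

  for (l = decoder_factories; l != NULL; l = l->next) {
    GstElementFactory *factory = l->data;
    const GList *tmpls = gst_element_factory_get_static_pad_templates (factory);

    for (; tmpls != NULL; tmpls = tmpls->next) {
      GstStaticPadTemplate *tmpl = tmpls->data;

      if (tmpl->direction == GST_PAD_SINK)
        filter = gst_caps_merge (filter,
            gst_static_pad_template_get_caps (tmpl));
    }
  }

  /* append 'name-only' structures taken from the parser template caps */
  for (i = 0; i < gst_caps_get_size (parser_tmpl_caps); i++) {
    const gchar *name =
        gst_structure_get_name (gst_caps_get_structure (parser_tmpl_caps, i));
    filter = gst_caps_merge_structure (filter, gst_structure_new_empty (name));
  }

  return filter;
}
```
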

markdown/design/encoding.md (new file, 469 lines)

@@ -0,0 +1,469 @@

## Encoding and Muxing

## Problems this proposal attempts to solve

- Duplication of pipeline code for gstreamer-based applications
  wishing to encode and/or mux streams, leading to subtle differences
  and inconsistencies across those applications.

- No unified system for describing encoding targets for applications
  in a user-friendly way.

- No unified system for creating encoding targets for applications,
  resulting in duplication of code across all applications,
  differences and inconsistencies that come with that duplication, and
  applications hardcoding element names and settings, resulting in poor
  portability.

## Goals

1. Convenience encoding element

   Create a convenience GstBin for encoding and muxing several streams,
   hereafter called 'EncodeBin'.

   This element will only contain one single property, which is a profile.

2. Define an encoding profile system

3. Encoding profile helper library

   Create a helper library to:

   - create EncodeBin instances based on profiles, and

   - help applications to create/load/save/browse those profiles.

## EncodeBin

### Proposed API

EncodeBin is a GstBin subclass.

It implements the GstTagSetter interface, by which it will proxy the
calls to the muxer.

Only two introspectable properties (i.e. usable without extra API):

- A GstEncodingProfile
- The name of the profile to use

When a profile is selected, encodebin will:

- Add REQUEST sinkpads for all the GstStreamProfile
- Create the muxer and expose the source pad

Whenever a request pad is created, encodebin will:

- Create the chain of elements for that pad
- Ghost the sink pad
- Return that ghost pad

This allows reducing the code to the minimum for applications wishing to
encode a source for a given profile:

    encbin = gst_element_factory_make ("encodebin", NULL);
    g_object_set (encbin, "profile", "N900/H264 HQ", NULL);
    gst_element_link (encbin, filesink);

    vsrcpad = gst_element_get_static_pad (source, "src1");
    vsinkpad = gst_element_get_request_pad (encbin, "video_%u");
    gst_pad_link (vsrcpad, vsinkpad);

### Explanation of the various stages in EncodeBin

This describes the various stages which can happen in order to end up
with a multiplexed stream that can then be stored or streamed.

#### Incoming streams

The streams fed to EncodeBin can be of various types:

- Video
  - Uncompressed (but maybe subsampled)
  - Compressed
- Audio
  - Uncompressed (audio/x-raw)
  - Compressed
- Timed text
- Private streams

#### Steps involved for raw video encoding

0) Incoming Stream

1) Transform raw video feed (optional)

   Here we modify the various fundamental properties of a raw video stream
   to be compatible with the intersection of:

   - the encoder GstCaps, and
   - the specified "Stream Restriction" of the profile/target.

   The fundamental properties that can be modified are:

   - width/height: this is done with a video scaler. The DAR (Display
     Aspect Ratio) MUST be respected. If needed, black borders can be
     added to comply with the target DAR.
   - framerate
   - format/colorspace/depth: all of this is done with a colorspace
     converter.

2) Actual encoding (optional for raw streams)

   An encoder (with some optional settings) is used.

3) Muxing

   A muxer (with some optional settings) is used.

4) Outgoing encoded and muxed stream

#### Steps involved for raw audio encoding

This is roughly the same as for raw video, except for (1):

1) Transform raw audio feed (optional)

   We modify the various fundamental properties of a raw audio stream to be
   compatible with the intersection of:

   - the encoder GstCaps, and
   - the specified "Stream Restriction" of the profile/target.

   The fundamental properties that can be modified are:

   - number of channels
   - type of raw audio (integer or floating point)
   - depth (number of bits required to encode one sample)

#### Steps involved for encoded audio/video streams

Steps (1) and (2) are replaced by a parser if a parser is available for
the given format.

#### Steps involved for other streams

Other streams will just be forwarded as-is to the muxer, provided the
muxer accepts the stream type.

## Encoding Profile System

This work is based on:

- The existing [GstPreset API documentation][gst-preset] system for elements

- The gnome-media [GConf audio profile system][gconf-audio-profile]

- The investigation done into device profiles by Arista and
  Transmageddon: [Research on a Device Profile API][device-profile-api],
  and [Research on defining presets usage][preset-usage].

### Terminology

- **Encoding Target Category**: a classification of
  devices/systems/use-cases for encoding.

  Such a classification is required in order for:

  - applications with a very specific use-case to limit the number of
    profiles they can offer the user. A screencasting application has no
    use for the online services targets, for example.
  - offering the user some initial classification in the case of a more
    generic encoding application (like a video editor or a transcoder).

  Ex: Consumer devices, Online service, Intermediate Editing Format,
  Screencast, Capture, Computer.

- **Encoding Profile Target**: a Profile Target describes a specific entity
  for which we wish to encode. A Profile Target must belong to at
  least one Target Category. It will define at least one Encoding
  Profile.

  Examples (with category): Nokia N900 (Consumer device), Sony PlayStation 3
  (Consumer device), Youtube (Online service), DNxHD (Intermediate editing
  format), HuffYUV (Screencast), Theora (Computer).

- **Encoding Profile**: a specific combination of muxer, encoders, presets
  and limitations.

  Examples: Nokia N900/H264 HQ, Ipod/High Quality, DVD/Pal,
  Youtube/High Quality, HTML5/Low Bandwidth, DNxHD.

### Encoding Profile

An encoding profile requires the following information:

- **Name**: this string is not translatable and must be unique. A
  recommendation to guarantee uniqueness of the naming could be:
  `<target>/<name>`.
- **Description**: a translatable string describing the profile.
- **Muxing format**: a string containing the GStreamer media-type
  of the container format.
- **Muxing preset**: an optional string describing the preset(s) to
  use on the muxer.
- **Multipass setting**: a boolean describing whether the profile
  requires several passes.
- **List of Stream Profiles**

#### Stream Profiles

A Stream Profile consists of:

- **Type**: the type of stream profile (audio, video, text, private-data).
- **Encoding Format**: a string containing the GStreamer media-type
  of the encoding format to be used. If encoding is not to be applied,
  the raw audio media type will be used.
- **Encoding preset**: an optional string describing the preset(s)
  to use on the encoder.
- **Restriction**: an optional GstCaps containing the restriction
  of the stream that can be fed to the encoder. This will generally
  contain restrictions in video width/height/framerate or audio
  depth.
- **Presence**: an integer specifying how many streams can be used
  in the containing profile. 0 means that any number of streams can be
  used.
- **Pass**: an integer which is only meaningful if the multipass
  flag has been set in the profile. If it has been set, it indicates
  which pass this Stream Profile corresponds to.

### Example profile

The representation used here is XML only as an example. No decision is
made as to which formatting to use for storing targets and profiles.

    <gst-encoding-target>
      <name>Nokia N900</name>
      <category>Consumer Device</category>
      <profiles>
        <profile>Nokia N900/H264 HQ</profile>
        <profile>Nokia N900/MP3</profile>
        <profile>Nokia N900/AAC</profile>
      </profiles>
    </gst-encoding-target>

    <gst-encoding-profile>
      <name>Nokia N900/H264 HQ</name>
      <description>
        High Quality H264/AAC for the Nokia N900
      </description>
      <format>video/quicktime,variant=iso</format>
      <streams>
        <stream-profile>
          <type>audio</type>
          <format>audio/mpeg,mpegversion=4</format>
          <preset>Quality High/Main</preset>
          <restriction>audio/x-raw,channels=[1,2]</restriction>
          <presence>1</presence>
        </stream-profile>
        <stream-profile>
          <type>video</type>
          <format>video/x-h264</format>
          <preset>Profile Baseline/Quality High</preset>
          <restriction>
            video/x-raw,width=[16, 800],\
            height=[16, 480],framerate=[1/1, 30000/1001]
          </restriction>
          <presence>1</presence>
        </stream-profile>
      </streams>
    </gst-encoding-profile>
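
For reference, the GstEncodingProfile API that later landed in GstPbUtils
follows this design closely. A rough sketch of building a roughly equivalent
profile in C (the function name `make_n900_profile` is made up for the
example; caps strings shortened and error handling omitted):

```c
#include <gst/pbutils/pbutils.h>

/* Rough sketch: build a container profile similar to the XML example above
 * using the GstEncodingProfile API from GstPbUtils. */
static GstEncodingContainerProfile *
make_n900_profile (void)
{
  GstEncodingContainerProfile *container;
  GstCaps *caps, *restriction;

  caps = gst_caps_from_string ("video/quicktime,variant=iso");
  container = gst_encoding_container_profile_new ("Nokia N900/H264 HQ",
      "High Quality H264/AAC for the Nokia N900", caps, NULL);
  gst_caps_unref (caps);

  caps = gst_caps_from_string ("video/x-h264");
  restriction = gst_caps_from_string ("video/x-raw,width=[16,800],"
      "height=[16,480],framerate=[1/1,30000/1001]");
  gst_encoding_container_profile_add_profile (container,
      (GstEncodingProfile *)
      gst_encoding_video_profile_new (caps, NULL, restriction, 1));
  gst_caps_unref (caps);
  gst_caps_unref (restriction);

  caps = gst_caps_from_string ("audio/mpeg,mpegversion=4");
  restriction = gst_caps_from_string ("audio/x-raw,channels=[1,2]");
  gst_encoding_container_profile_add_profile (container,
      (GstEncodingProfile *)
      gst_encoding_audio_profile_new (caps, NULL, restriction, 1));
  gst_caps_unref (caps);
  gst_caps_unref (restriction);

  return container;
}
```
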

### API

A proposed C API is contained in the gstprofile.h file in this
directory.

### Modifications required in the existing GstPreset system

#### Temporary presets

Currently a preset needs to be saved on disk in order to be used.

This makes it impossible to have temporary presets (that exist only
during the lifetime of a process), which might be required in the new
proposed profile system.

#### Categorisation of presets

Currently presets are just aliases for a group of property/value pairs,
without any meaning or explanation as to how they exclude each other.

Take for example the H264 encoder. It can have presets for:

- passes (1, 2 or 3 passes)
- profiles (Baseline, Main, ...)
- quality (Low, Medium, High)

In order to programmatically know which presets exclude each other, we
here propose the categorisation of these presets.

This can be done in one of two ways:

1. in the name (by making the name be `[<category>:]<name>`). This would
   give for example: "Quality:High", "Profile:Baseline"
2. by adding a new `_meta` key. This would give for example:
   `_meta/category:quality`

#### Aggregation of presets

There can be more than one choice of presets to be made for an element
(quality, profile, pass).

This means that one can not currently describe the full configuration of
an element with a single string but with many.

The proposal here is to extend the GstPreset API to be able to set all
presets using one string and a well-known separator ('/').

This change only requires changes in the core preset handling code.

This would allow doing the following:

    gst_preset_load_preset (h264enc, "pass:1/profile:baseline/quality:high");

### Points to be determined

This document hasn't determined yet how to solve the following problems:

#### Storage of profiles

One proposal for storage would be to use a system-wide directory (like
$prefix/share/gstreamer-0.10/profiles) and store XML files for every
individual profile.

Users could then add their own profiles in ~/.gstreamer-0.10/profiles

This poses some limitations as to what to do if some applications want
to have some profiles limited to their own usage.

## Helper library for profiles

These helper methods could also be added to existing libraries (like
GstPreset, GstPbUtils, ..).

The various APIs proposed are in the accompanying gstprofile.h file.

### Getting user-readable names for formats

This is already provided by GstPbUtils.

### Hierarchy of profiles

The goal is for applications to be able to present to the user a list of
combo-boxes for choosing their output profile:

    [ Category ]         # optional, depends on the application
    [ Device/Site/.. ]   # optional, depends on the application
    [ Profile ]

Convenience methods are offered to easily get lists of categories,
devices, and profiles.
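
As later implemented in GstPbUtils, such lists can be obtained with the
encoding-target helpers. A small illustrative sketch (the function name
`list_profiles` is made up; cleanup of the returned lists is omitted):

```c
#include <gst/pbutils/pbutils.h>

/* Illustrative sketch: walk categories -> targets -> profiles, roughly the
 * hierarchy an application would expose as combo-boxes. */
static void
list_profiles (void)
{
  GList *categories = gst_encoding_list_available_categories ();
  GList *c, *t;
  const GList *p;

  for (c = categories; c != NULL; c = c->next) {
    const gchar *category = c->data;
    GList *targets = gst_encoding_list_all_targets (category);

    for (t = targets; t != NULL; t = t->next) {
      GstEncodingTarget *target = t->data;

      for (p = gst_encoding_target_get_profiles (target); p != NULL; p = p->next)
        g_print ("%s / %s / %s\n", category,
            gst_encoding_target_get_name (target),
            gst_encoding_profile_get_name (p->data));
    }
  }
}
```
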

### Creating Profiles

The goal is for applications to be able to easily create profiles.

The application needs a fast/efficient way to:

- select a container format and see all compatible streams that can be
  used with it.
- select a codec format and see which container formats it can be used
  with.

The remaining parts concern the restrictions on encoder input.

### Ensuring availability of plugins for Profiles

When an application wishes to use a Profile, it should be able to query
whether it has all the needed plugins to use it.

This part will use GstPbUtils to query, and if needed install, the
missing plugins through the installed distribution plugin installer.

## Use-cases researched

This is a list of various use-cases where encoding/muxing is being used.

### Transcoding

The goal is to convert, with as little loss of quality as possible, any input
file for a target use. A specific variant of this is transmuxing (see below).

Example applications: Arista, Transmageddon

### Rendering timelines

The incoming streams are a collection of various segments that need to
be rendered. Those segments can vary in nature (i.e. the video
width/height can change). This requires the use of identity with the
single-segment property activated to transform the incoming collection
of segments into a single continuous segment.

Example applications: PiTiVi, Jokosher

### Encoding of live sources

The major risk to take into account is the encoder not encoding the
incoming stream fast enough. This is outside of the scope of encodebin,
and should be solved by using queues between the sources and encodebin,
as well as implementing QoS in encoders and sources (the encoders
emitting QoS events, and the upstream elements adapting themselves
accordingly).

Example applications: camerabin, cheese

### Screencasting applications

This is similar to encoding of live sources. The difference is that,
due to the nature of the source (size and amount/frequency of updates),
one might want to do the encoding in two parts:

- The actual live capture is encoded with an 'almost-lossless' codec
  (such as huffyuv).
- Once the capture is done, the file created in the first step is then
  rendered to the desired target format.

Fixing sources to only emit region updates and having encoders capable
of encoding those streams would remove the need for the first step, but is
outside of the scope of encodebin.

Example applications: Istanbul, gnome-shell, recordmydesktop

### Live transcoding

This is the case of an incoming live stream which will be
broadcasted/transmitted live. One issue to take into account is to
reduce the encoding latency to a minimum. This should mostly be done by
picking low-latency encoders.

Example applications: Rygel, Coherence

### Transmuxing

Given a certain file, the aim is to remux the contents WITHOUT decoding
into either a different container format or the same container format.
Remuxing into the same container format is useful when the file was not
created properly (for example, the index is missing). Whenever
available, parsers should be applied on the encoded streams to validate
and/or fix the streams before muxing them.

Metadata from the original file must be kept in the newly created file.

Example applications: Arista, Transmageddon

### Loss-less cutting

Given a certain file, the aim is to extract a certain part of the file
without going through the process of decoding and re-encoding that file.
This is similar to the transmuxing use-case.

Example applications: PiTiVi, Transmageddon, Arista, ...

### Multi-pass encoding

Some encoders allow doing a multi-pass encoding. The initial pass(es)
are only used to collect encoding estimates and are not actually muxed
and outputted. The final pass uses the previously collected information, and
the output is then muxed and outputted.

### Archiving and intermediary format

The requirement is to have lossless

### CD ripping

Example applications: Sound-juicer

### DVD ripping

Example application: Thoggen

### Research links

Some of these are still active documents, some others not.

[gst-preset]: http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstPreset.html
[gconf-audio-profile]: http://www.gnome.org/~bmsmith/gconf-docs/C/gnome-media.html
[device-profile-api]: http://gstreamer.freedesktop.org/wiki/DeviceProfile (FIXME: wiki is gone)
[preset-usage]: http://gstreamer.freedesktop.org/wiki/PresetDesign (FIXME: wiki is gone)

markdown/design/interlaced-video.md (new file, 102 lines)

@@ -0,0 +1,102 @@

# Interlaced Video

Video buffers have a number of states identifiable through a combination
of caps and buffer flags.

Possible states:

- Progressive
- Interlaced
  - Plain
    - One field
    - Two fields
    - Three fields - this should be a progressive buffer with a repeated 'first'
      field that can be used for telecine pulldown
  - Telecine
    - One field
    - Two fields
      - Progressive
      - Interlaced (a.k.a. 'mixed'; the fields are from different frames)
    - Three fields - this should be a progressive buffer with a repeated 'first'
      field that can be used for telecine pulldown

Note: It can be seen that the difference between the plain interlaced
and telecine states is that in the telecine state, buffers containing
two fields may be progressive.

Tools for identification:

- GstVideoInfo
  - GstVideoInterlaceMode - enum `GST_VIDEO_INTERLACE_MODE_...`
    - PROGRESSIVE
    - INTERLEAVED
    - MIXED
- Buffer flags - `GST_VIDEO_BUFFER_FLAG_...`
  - TFF
  - RFF
  - ONEFIELD
  - INTERLACED

## Identification of Buffer States

Note that flags are not necessarily interpreted in the same way for all
different states, nor are they necessarily required, nor do they make sense
in all cases.

### Progressive

If the interlace mode in the video info corresponding to a buffer is
**"progressive"**, then the buffer is progressive.

### Plain Interlaced

If the video info interlace mode is **"interleaved"**, then the buffer is
plain interlaced.

`GST_VIDEO_BUFFER_FLAG_TFF` indicates whether the top or bottom field
is to be displayed first. The timestamp on the buffer corresponds to the
first field.

`GST_VIDEO_BUFFER_FLAG_RFF` indicates that the first field (indicated
by the TFF flag) should be repeated. This is generally only used for
telecine purposes but as the telecine state was added long after the
interlaced state was added and defined, this flag remains valid for
plain interlaced buffers.

`GST_VIDEO_BUFFER_FLAG_ONEFIELD` means that only the field indicated
through the TFF flag is to be used. The other field should be ignored.

### Telecine

If the video info interlace mode is **"mixed"**, then the buffers are in some
form of telecine state.

The `TFF` and `ONEFIELD` flags have the same semantics as for the plain
interlaced state.

`GST_VIDEO_BUFFER_FLAG_RFF` in the telecine state indicates that the
buffer contains only repeated fields that are present in other buffers
and are as such unneeded. For example, in a sequence of three telecined
frames, we might have:

    AtAb AtBb BtBb

In this situation, we only need the first and third buffers as the
second buffer contains fields present in the first and third.

Note that the following state can have its second buffer identified
using the `ONEFIELD` flag (and `TFF` not set):

    AtAb AtBb BtCb

The telecine state requires one additional flag to be able to identify
progressive buffers.

The presence of the `GST_VIDEO_BUFFER_FLAG_INTERLACED` flag means that the
buffer is an 'interlaced' or 'mixed' buffer that contains two fields
that, when combined with fields from adjacent buffers, allow
reconstruction of progressive frames. The absence of the flag implies
the buffer containing two fields is a progressive frame.

For example in the following sequence, the third buffer would be mixed
(yes, it is a strange pattern, but it can happen):

    AtAb AtBb BtCb CtDb DtDb
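
Putting the above rules together, a decision helper might look like this
minimal sketch (the function `describe_buffer` is made up for illustration;
the `GstVideoInfo` is assumed to have been filled from the negotiated caps
with `gst_video_info_from_caps()`):

```c
#include <gst/video/video.h>

/* Minimal sketch of the identification rules described above. */
static void
describe_buffer (const GstVideoInfo * info, GstBuffer * buf)
{
  gboolean interlaced_flag =
      GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_INTERLACED);

  switch (GST_VIDEO_INFO_INTERLACE_MODE (info)) {
    case GST_VIDEO_INTERLACE_MODE_PROGRESSIVE:
      g_print ("progressive frame\n");
      break;
    case GST_VIDEO_INTERLACE_MODE_INTERLEAVED:
      /* plain interlaced; TFF/RFF/ONEFIELD refine the field handling */
      g_print ("plain interlaced, tff=%d\n",
          GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_TFF));
      break;
    case GST_VIDEO_INTERLACE_MODE_MIXED:
      /* telecine: the INTERLACED flag distinguishes mixed from progressive */
      g_print ("telecine, %s buffer\n",
          interlaced_flag ? "mixed (two fields)" : "progressive");
      break;
    default:
      g_print ("other interlace mode\n");
      break;
  }
}
```
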

markdown/design/keyframe-force.md (new file, 97 lines)

@@ -0,0 +1,97 @@

# Forcing keyframes

Consider the following use case:

We have a pipeline that performs video and audio capture from a live
source, compresses and muxes the streams and writes the resulting data
into a file.

Inside the uncompressed video data we have a specific pattern inserted
at specific moments that should trigger a switch to a new file, meaning
we close the existing file we are writing to and start writing to a new
file.

We want the new file to start with a keyframe so that one can start
decoding the file immediately.

## Components

1) We need an element that is able to detect the pattern in the video
   stream.

2) We need to inform the video encoder that it should start encoding a
   keyframe starting from exactly the frame with the pattern.

3) We need to inform the muxer that it should flush out any pending
   data and start creating the start of a new file with the keyframe as
   a first video frame.

4) We need to inform the sink element that it should start writing to
   the next file. This requires application interaction to instruct the
   sink of the new filename. The application should also be free to
   ignore the boundary and continue to write to the existing file. The
   application will typically use an event pad probe to detect the
   custom event.

## Implementation

### Downstream

The implementation would consist of generating a `GST_EVENT_CUSTOM_DOWNSTREAM`
event that marks the keyframe boundary. This event is inserted into the
pipeline by the application upon a certain trigger. In the above use case
this trigger would be given by the element that detects the pattern, in the
form of an element message.

The custom event would travel further downstream to instruct encoder,
muxer and sink about the possible switch.

The information passed in the event consists of:

**GstForceKeyUnit**

- **"timestamp"** (`G_TYPE_UINT64`): the timestamp of the buffer that
  triggered the event.

- **"stream-time"** (`G_TYPE_UINT64`): the stream position that triggered the event.

- **"running-time"** (`G_TYPE_UINT64`): the running time of the stream when
  the event was triggered.

- **"all-headers"** (`G_TYPE_BOOLEAN`): send all headers, including
  those in the caps or those sent at the start of the stream.

- **...**: optional other data fields.
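
A sketch of how an application could construct and inject such an event; the
function name `send_force_key_unit`, the pad and the time values are
placeholders (current GStreamer also ships ready-made helpers such as
`gst_video_event_new_downstream_force_key_unit()` in the video library that
encapsulate the same structure):

```c
#include <gst/gst.h>

/* Sketch: build the custom downstream event described above and push it
 * from a source pad (e.g. of the pattern-detecting element). */
static gboolean
send_force_key_unit (GstPad * srcpad, GstClockTime timestamp,
    GstClockTime stream_time, GstClockTime running_time)
{
  GstStructure *s = gst_structure_new ("GstForceKeyUnit",
      "timestamp", G_TYPE_UINT64, timestamp,
      "stream-time", G_TYPE_UINT64, stream_time,
      "running-time", G_TYPE_UINT64, running_time,
      "all-headers", G_TYPE_BOOLEAN, TRUE, NULL);

  return gst_pad_push_event (srcpad,
      gst_event_new_custom (GST_EVENT_CUSTOM_DOWNSTREAM, s));
}
```
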

Note that this event is purely informational: no element is required to
perform an action, but it should forward the event downstream, just like
any other event it does not handle.

Elements understanding the event should behave as follows:

1) The video encoder receives the event before the next frame. Upon
   reception of the event it schedules to encode the next frame as a
   keyframe. Before pushing out the encoded keyframe it must push the
   GstForceKeyUnit event downstream.

2) The muxer receives the GstForceKeyUnit event and flushes out its
   current state, preparing to produce data that can be used as a
   key unit. Before pushing out the new data it pushes the
   GstForceKeyUnit event downstream.

3) The application receives the GstForceKeyUnit event on a pad probe of
   the sink and reconfigures the sink to make it perform new actions
   after receiving the next buffer.

### Upstream

When using RTP, packets can get lost or receivers can be added at any
time; they may request a new key frame.

A downstream element sends an upstream "GstForceKeyUnit" event up the
pipeline.

When an element produces some kind of key unit in output, but has no
such concept in its input (like an encoder that takes raw frames), it
consumes the event (doesn't pass it upstream), and instead sends a
downstream GstForceKeyUnit event and a new keyframe.

markdown/design/mediatype-audio-raw.md (new file, 68 lines)

@@ -0,0 +1,68 @@

# Raw Audio Media Types

**audio/x-raw**

- **format**, G\_TYPE\_STRING, mandatory. The format of the audio samples, see
  the Formats section for a list of valid sample formats.

- **rate**, G\_TYPE\_INT, mandatory. The samplerate of the audio.

- **channels**, G\_TYPE\_INT, mandatory. The number of channels.

- **channel-mask**, GST\_TYPE\_BITMASK, mandatory for more than 2 channels.
  Bitmask of the channel positions present. May be omitted for mono and
  stereo. May be set to 0 to denote that the channels are unpositioned.

- **layout**, G\_TYPE\_STRING, mandatory. The layout of channels within a
  buffer. Possible values are "interleaved" (for LRLRLRLR) and
  "non-interleaved" (LLLLRRRR).

Use `GstAudioInfo` and related helper API to create and parse raw audio caps.
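
For example, a minimal sketch of producing and parsing such caps with the
helper API (the function name, format, rate and channel count are arbitrary
choices for the example):

```c
#include <gst/audio/audio.h>

/* Build audio/x-raw caps from a GstAudioInfo and read them back. */
static void
audio_caps_roundtrip (void)
{
  GstAudioInfo info;
  GstCaps *caps;

  gst_audio_info_init (&info);
  gst_audio_info_set_format (&info, GST_AUDIO_FORMAT_S16LE, 48000, 2, NULL);
  caps = gst_audio_info_to_caps (&info);   /* audio/x-raw,format=S16LE,... */

  if (gst_audio_info_from_caps (&info, caps))
    g_print ("rate=%d channels=%d\n",
        GST_AUDIO_INFO_RATE (&info), GST_AUDIO_INFO_CHANNELS (&info));

  gst_caps_unref (caps);
}
```
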

## Metadata

- `GstAudioDownmixMeta`: A matrix for downmixing multichannel audio to a
  lower number of channels.

## Formats

The following values can be used for the format string property.

- "S8" 8-bit signed PCM audio
- "U8" 8-bit unsigned PCM audio

- "S16LE" 16-bit signed PCM audio
- "S16BE" 16-bit signed PCM audio
- "U16LE" 16-bit unsigned PCM audio
- "U16BE" 16-bit unsigned PCM audio

- "S24\_32LE" 24-bit signed PCM audio packed into 32-bit
- "S24\_32BE" 24-bit signed PCM audio packed into 32-bit
- "U24\_32LE" 24-bit unsigned PCM audio packed into 32-bit
- "U24\_32BE" 24-bit unsigned PCM audio packed into 32-bit

- "S32LE" 32-bit signed PCM audio
- "S32BE" 32-bit signed PCM audio
- "U32LE" 32-bit unsigned PCM audio
- "U32BE" 32-bit unsigned PCM audio

- "S24LE" 24-bit signed PCM audio
- "S24BE" 24-bit signed PCM audio
- "U24LE" 24-bit unsigned PCM audio
- "U24BE" 24-bit unsigned PCM audio

- "S20LE" 20-bit signed PCM audio
- "S20BE" 20-bit signed PCM audio
- "U20LE" 20-bit unsigned PCM audio
- "U20BE" 20-bit unsigned PCM audio

- "S18LE" 18-bit signed PCM audio
- "S18BE" 18-bit signed PCM audio
- "U18LE" 18-bit unsigned PCM audio
- "U18BE" 18-bit unsigned PCM audio

- "F32LE" 32-bit floating-point audio
- "F32BE" 32-bit floating-point audio
- "F64LE" 64-bit floating-point audio
- "F64BE" 64-bit floating-point audio

markdown/design/mediatype-text-raw.md (new file, 22 lines)

@@ -0,0 +1,22 @@

# Raw Text Media Types

**text/x-raw**

- **format**, G\_TYPE\_STRING, mandatory. The format of the text, see the
  Formats section for a list of valid format strings.

## Metadata

There are no common metas for this raw format yet.

## Formats

- "utf8": plain timed utf8 text (formerly text/plain).
  Parsed timed text in utf8 format.

- "pango-markup": plain timed utf8 text with pango markup
  (formerly text/x-pango-markup). Same as "utf8", but text embedded in an
  XML-style markup language for size, colour, emphasis, etc.
  See [Pango Markup Format][pango-markup].

[pango-markup]: http://developer.gnome.org/pango/stable/PangoMarkupFormat.html

markdown/design/mediatype-video-raw.md (new file, 1240 lines)

(diff not shown: file too large)

markdown/design/orc-integration.md (new file, 159 lines)

@@ -0,0 +1,159 @@

# Orc Integration

## About Orc

Orc code can be in one of two forms: in .orc files that are converted by
orcc to C code that calls liborc functions, or C code that calls liborc
to create complex operations at runtime. The former is mostly for
functions with predetermined functionality. The latter is for
functionality that is determined at runtime, where writing .orc
functions for all combinations would be prohibitive. Orc also has a fast
memcpy and memset which are useful independently.

## Fast memcpy()

**This part is not integrated yet.**

Orc has built-in functions `orc_memcpy()` and `orc_memset()` that work
like `memcpy()` and `memset()`. These are meant for large copies only. A
reasonable cutoff for using `orc_memcpy()` instead of `memcpy()` is if the
number of bytes is generally greater than 100. **DO NOT** use `orc_memcpy()`
if the typical size is less than 20 bytes, especially if the size is
known at compile time, as these cases are inlined by the compiler.

(Example: sys/ximage/ximagesink.c)

Add $(ORC\_CFLAGS) to libgstximagesink\_la\_CFLAGS and $(ORC\_LIBS) to
libgstximagesink\_la\_LIBADD. Then, in the source file, add:

    #ifdef HAVE_ORC
    #include <orc/orc.h>
    #else
    #define orc_memcpy(a,b,c) memcpy(a,b,c)
    #endif

Then switch relevant uses of memcpy() to orc\_memcpy().

The above example works whether or not Orc is enabled at compile time.

## Normal Usage

The following lines are added near the top of Makefile.am for plugins
that use Orc code in .orc files (this is for the volume plugin):

    ORC_BASE=volume
    include $(top_srcdir)/common/orc.mk

Also add the generated source file to the plugin build:

    nodist_libgstvolume_la_SOURCES = $(ORC_SOURCES)

And of course, add $(ORC\_CFLAGS) to libgstvolume\_la\_CFLAGS, and
$(ORC\_LIBS) to libgstvolume\_la\_LIBADD.

The value assigned to ORC\_BASE does not need to be related to the name
of the plugin.

## Advanced Usage

The Holy Grail of Orc usage is to programmatically generate Orc code at
runtime, have liborc compile it into binary code at runtime, and then
execute this code. Currently, the best example of this is in
Schroedinger. An example of how this would be used is audioconvert:
given an input format, channel position manipulation, dithering and
quantizing configuration, and output format, an Orc code generator would
create an OrcProgram, add the appropriate instructions to do each step
based on the configuration, and then compile the program. Successfully
compiling the program would return a function pointer that can be called
to perform the operation.
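
To make the idea concrete, here is a rough sketch of that runtime path using
a trivial saturated 16-bit add rather than a real audioconvert operation. The
function name `add_s16_arrays` is made up, and the specific liborc calls and
opcode shown are an assumption based on liborc's typical usage, not something
prescribed by this document:

```c
#include <stdint.h>
#include <orc/orc.h>

/* Rough sketch of the runtime path: build, compile and run a tiny Orc
 * program. orc_init() is assumed to have been called already, and the
 * return value of orc_program_compile() should be checked in real code. */
static void
add_s16_arrays (int16_t * dest, const int16_t * src, int n)
{
  static OrcProgram *p = NULL;
  OrcExecutor *ex;

  if (p == NULL) {
    p = orc_program_new ();
    orc_program_add_destination (p, 2, "d1");
    orc_program_add_source (p, 2, "s1");
    /* d1 = saturated_add (d1, s1), element-wise on 16-bit samples */
    orc_program_append_str (p, "addssw", "d1", "d1", "s1");
    orc_program_compile (p);
  }

  ex = orc_executor_new (p);
  orc_executor_set_n (ex, n);
  orc_executor_set_array_str (ex, "d1", dest);
  orc_executor_set_array_str (ex, "s1", (void *) src);
  orc_executor_run (ex);
  orc_executor_free (ex);
}
```
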
This sort of advanced usage requires structural changes to current
|
||||
plugins (e.g., audioconvert) and will probably be developed
|
||||
incrementally. Moreover, if such code is intended to be used without Orc
|
||||
as strict build/runtime requirement, two codepaths would need to be
|
||||
developed and tested. For this reason, until GStreamer requires Orc, I
|
||||
think it's a good idea to restrict such advanced usage to the cog plugin
|
||||
in -bad, which requires Orc.
|
||||
|
||||
## Build Process
|
||||
|
||||
The goal of the build process is to make Orc non-essential for most
|
||||
developers and users. This is not to say you shouldn't have Orc
|
||||
installed -- without it, you will get slow backup C code, just that
|
||||
people compiling GStreamer are not forced to switch from Liboil to Orc
|
||||
immediately.
|
||||
|
||||
With Orc installed, the build process will use the Orc Compiler (orcc)
|
||||
to convert each .orc file into a temporary C source (tmp-orc.c) and a
|
||||
temporary header file (${name}orc.h if constructed from ${base}.orc).
|
||||
The C source file is compiled and linked to the plugin, and the header
|
||||
file is included by other source files in the plugin.
|
||||
|
||||
If 'make orc-update' is run in the source directory, the files tmp-orc.c
|
||||
and ${base}orc.h are copied to ${base}orc-dist.c and ${base}orc-dist.h
|
||||
respectively. The -dist.\[ch\] files are automatically disted via
|
||||
orc.mk. The -dist.\[ch\] files should be checked in to git whenever the
|
||||
.orc source is changed and checked in. Example workflow:
|
||||
|
||||
edit .orc file ... make, test, etc. make orc-update git add volume.orc
|
||||
volumeorc-dist.c volumeorc-dist.h git commit
|
||||
|
||||
At 'make dist' time, all of the .orc files are compiled, and then copied
|
||||
to their -dist.\[ch\] counterparts, and then the -dist.\[ch\] files are
|
||||
added to the dist directory.
|
||||
|
||||
Without Orc installed (or --disable-orc given to configure), the
|
||||
-dist.\[ch\] files are copied to tmp-orc.c and ${name}orc.h. When
|
||||
compiled Orc disabled, DISABLE\_ORC is defined in config.h, and the C
|
||||
backup code is compiled. This backup code is pure C, and does not
|
||||
include orc headers or require linking against liborc.
|
||||
|
||||
The common/orc.mk build method is limited by the inflexibility of
|
||||
automake. The file tmp-orc.c must be a fixed filename, using ORC\_NAME
|
||||
to generate the filename does not work because it conflicts with
|
||||
automake's dependency generation. Building multiple .orc files is not
|
||||
possible due to this restriction.
|
||||
|
||||
## Testing
|
||||
|
||||
If you create another .orc file, please add it to tests/orc/Makefile.am.
|
||||
This causes automatic test code to be generated and run during 'make
|
||||
check'. Each function in the .orc file is tested by comparing the
|
||||
results of executing the run-time compiled code and the C backup
|
||||
function.
|
||||
|
||||
## Orc Limitations
|
||||
|
||||
### audioconvert
|
||||
|
||||
Orc doesn't have a mechanism for generating random numbers, which
|
||||
prevents its use as-is for dithering. One way around this is to generate
|
||||
suitable dithering values in one pass, then use those values in a second
|
||||
Orc-based pass.
|
||||
|
||||
Orc doesn't handle 64-bit float, for no good reason.
|
||||
|
||||
Irrespective of Orc handling 64-bit float, it would be useful to have a
|
||||
direct 32-bit float to 16-bit integer conversion.
|
||||
|
||||
audioconvert is a good candidate for programmatically generated Orc code.
|
||||
|
||||
audioconvert enumerates functions in terms of big-endian vs.
|
||||
little-endian. Orc's functions are "native" and "swapped".
|
||||
Programmatically generating code removes the need to worry about this.
|
||||
|
||||
Orc doesn't handle 24-bit samples. Fixing this is not a priority (for ds).
|
||||
|
||||
### videoscale
|
||||
|
||||
Orc doesn't handle horizontal resampling yet. The plan is to add special
|
||||
sampling opcodes, for nearest, bilinear, and cubic interpolation.
|
||||
|
||||
### videotestsrc
|
||||
|
||||
Lots of code in videotestsrc needs to be rewritten to be SIMD (and Orc)
|
||||
friendly, e.g., stuff that uses `oil_splat_u8()`.
|
||||
|
||||
A fast low-quality random number generator in Orc would be useful here.
|
||||
|
||||
### volume
|
||||
|
||||
Many of the comments on audioconvert apply here as well.
|
||||
|
||||
There are a bunch of FIXMEs in here that are due to misapplied patches.

66
markdown/design/playbin.md
Normal file
@@ -0,0 +1,66 @@

# playbin

The purpose of this element is to decode and render the media contained
in a given generic uri. The element extends GstPipeline and is typically
used in playback situations.

Required features:

- accept and play any valid uri. This includes
  - rendering video/audio
  - overlaying subtitles on the video
  - optionally read external subtitle files
  - allow for hardware (non raw) sinks
- selection of audio/video/subtitle streams based on language.
- perform network buffering/incremental download
- gapless playback
- support for visualisations with configurable sizes
- ability to reject files that are too big, or of a format that would
  require too much CPU/memory usage.
- be very efficient with adding elements such as converters to reduce
  the amount of negotiation that has to happen.
- handle chained oggs. This includes having support for dynamic pad
  add and remove from a demuxer.

## Components

### decodebin

- performs the autoplugging of demuxers/decoders
- emits signals for steering the autoplugging
  - to decide if a non-raw media format is acceptable as output
  - to sort the possible decoders for a non-raw format
- see also decodebin2 design doc

### uridecodebin

- combination of a source to handle the given uri, an optional
  queueing element and one or more decodebin2 elements to decode the
  non-raw streams.

### playsink

- handles display of audio/video/text.
- has request audio/video/text input pads. There is only one sinkpad
  per type. The requested pads define the configuration of the
  internal pipeline.
- allows for setting audio/video sinks or does automatic
  sink selection.
- allows for configuration of the visualisation element.
- allows for enable/disable of visualisation, audio and video.

### playbin

- combination of one or more uridecodebin elements to read the uri and
  subtitle uri.
- support for queuing new media to support gapless playback.
- handles stream selection.
- uses playsink to display.
- selection of sinks and configuration of uridecodebin with raw
  output formats.

## Gapless playback feature

playbin has an "about-to-finish" signal. The application should
configure a new uri (and optional suburi) in the callback. When the
current media finishes, this new media will be played next.
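
A minimal sketch of how an application could use this signal for gapless
playback (the playlist handling and `get_next_uri()` helper are hypothetical
application code; error/bus handling is omitted):

    #include <gst/gst.h>

    /* Hypothetical helper: the real application would consult its playlist. */
    static const gchar *
    get_next_uri (void)
    {
      return "file:///path/to/next-song.ogg";
    }

    static void
    on_about_to_finish (GstElement * playbin, gpointer user_data)
    {
      const gchar *next = get_next_uri ();

      /* Setting a new uri here makes playbin continue with it, gaplessly,
       * once the current media finishes. */
      if (next != NULL)
        g_object_set (playbin, "uri", next, NULL);
    }

    int
    main (int argc, char **argv)
    {
      GstElement *playbin;

      gst_init (&argc, &argv);

      playbin = gst_element_factory_make ("playbin", NULL);
      g_object_set (playbin, "uri", "file:///path/to/first-song.ogg", NULL);
      g_signal_connect (playbin, "about-to-finish",
          G_CALLBACK (on_about_to_finish), NULL);

      gst_element_set_state (playbin, GST_STATE_PLAYING);
      /* ... run a GMainLoop and watch the bus for EOS/errors ... */
      return 0;
    }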

320
markdown/design/stereo-multiview-video.md
Normal file
@@ -0,0 +1,320 @@

# Stereoscopic & Multiview Video Handling

There are two cases to handle:

- Encoded video output from a demuxer to parser / decoder, or from encoders
  into a muxer.

- Raw video buffers

The design below is somewhat based on the proposals from
[bug 611157](https://bugzilla.gnome.org/show_bug.cgi?id=611157).

Multiview is used as a generic term to refer to handling both stereo
content (left and right eye only) and extensions for videos containing
multiple independent viewpoints.

## Encoded Signalling

This is regarding the signalling in caps and buffers from demuxers to
parsers (sometimes) or out from encoders.

For backward compatibility with existing codecs, many transports of
stereoscopic 3D content use normal 2D video with 2 views packed spatially
in some way, and put extra new descriptions in the container/mux.

Info in the demuxer seems to apply to stereo encodings only. For all
MVC methods I know, the multiview encoding is in the video bitstream itself
and therefore already available to decoders. Only stereo systems have been
retro-fitted into the demuxer.

Also, sometimes extension descriptions are in the codec (e.g. H.264 SEI FPA
packets) and it would be useful to be able to put the info onto caps and
buffers from the parser without decoding.

To handle both cases, we need to be able to output the required details on
encoded video for decoders to apply onto the raw video buffers they decode.

*If there ever is a need to transport multiview info for encoded data, the
same system below for raw video, or some variation of it, should work.*

### Encoded Video: Properties that need to be encoded into caps

1. multiview-mode (called "Channel Layout" in bug 611157)
   * Whether a stream is mono, for a single eye, stereo, mixed-mono-stereo
     (switches between mono and stereo - mp4 can do this)
   * Uses a buffer flag to mark individual buffers as mono or "not mono"
     (single|stereo|multiview) for mixed scenarios. The alternative (not
     proposed) is for the demuxer to switch caps for each mono to not-mono
     change, and not use a 'mixed' caps variant at all.
   * _single_ refers to a stream of buffers that only contain 1 view.
     It is different from mono in that the stream is a marked left or right
     eye stream for later combining in a mixer or when displaying.
   * _multiple_ marks a stream with multiple independent views encoded.
     It is included in this list for completeness. As noted above, there's
     currently no scenario that requires marking encoded buffers as MVC.

2. Frame-packing arrangements / view sequence orderings
   * Possible frame packings: side-by-side, side-by-side-quincunx,
     column-interleaved, row-interleaved, top-bottom, checker-board
     * bug 611157 - sreerenj added side-by-side-full and top-bottom-full but
       I think that's covered by suitably adjusting pixel-aspect-ratio. If
       not, they can be added later.
     * _top-bottom_, _side-by-side_, _column-interleaved_, _row-interleaved_
       are as the names suggest.
     * _checker-board_, samples are left/right pixels in a chess grid
       +-+-+-/-+-+-+
     * _side-by-side-quincunx_. Side By Side packing, but quincunx sampling -
       the 1 pixel offset of each eye needs to be accounted for when
       upscaling or displaying
     * there may be other packings (future expansion)
   * Possible view sequence orderings: frame-by-frame,
     frame-primary-secondary-tracks, sequential-row-interleaved
     * _frame-by-frame_, each buffer is left, then right view etc
     * _frame-primary-secondary-tracks_ - the file has 2 video tracks
       (primary and secondary), one is left eye, one is right.
       Demuxer info indicates which one is which.
       Handling this means marking each stream as all-left and all-right
       views, decoding separately, and combining automatically (inserting a
       mixer/combiner in playbin)
       -> *Leave this for future expansion*
     * _sequential-row-interleaved_ Mentioned by sreerenj in bug patches, I
       can't find a mention of such a thing. Maybe it's in MPEG-2
       -> *Leave this for future expansion / deletion*

3. view encoding order
   * Describes how to decide which piece of each frame corresponds to the
     left or right eye
   * Possible orderings: left, right, left-then-right, right-then-left
   - Need to figure out how we find the correct frame in the demuxer to
     start decoding when seeking in frame-sequential streams
   - Need a buffer flag for marking the first buffer of a group.

4. "Frame layout flags"
   * flags for view specific interpretation
   * horizontal-flip-left, horizontal-flip-right, vertical-flip-left,
     vertical-flip-right
     Indicates that one or more views have been encoded in a flipped
     orientation, usually due to a camera with a mirror or displays with
     mirrors.
   * This should be an actual flags field. Registered GLib flags types
     aren't generally well supported in our caps - the type might not be
     loaded/registered yet when parsing a caps string, so they can't be used
     in caps templates in the registry.
   * It might be better just to use a hex value / integer

## Buffer representation for raw video

- Transported as normal video buffers with extra metadata
- The caps define the overall buffer width/height, with helper functions to
  extract the individual views for packed formats
- pixel-aspect-ratio adjusted if needed to double the overall width/height
- video sinks that don't know about multiview extensions yet will show the
  packed view as-is. For frame-sequence outputs, things might look weird, but
  just adding multiview-mode to the sink caps can disallow those transports.
- _row-interleaved_ packing is actually just side-by-side memory layout with
  half frame width, twice the height, so can be handled by adjusting the
  overall caps and strides
- Other exotic layouts need new pixel formats defined (checker-board,
  column-interleaved, side-by-side-quincunx)
- _Frame-by-frame_ - one view per buffer, but with alternating metas marking
  which buffer is which left/right/other view and using a new buffer flag as
  described above to mark the start of a group of corresponding frames.
- New video caps addition as for encoded buffers

### Proposed Caps fields

Combining the requirements above and collapsing the combinations into
mnemonics (an example caps sketch follows after this list):

* multiview-mode =
    mono | left | right | sbs | sbs-quin | col | row | topbot | checkers |
    frame-by-frame | mixed-sbs | mixed-sbs-quin | mixed-col | mixed-row |
    mixed-topbot | mixed-checkers | mixed-frame-by-frame | multiview-frames |
    mixed-multiview-frames

* multiview-flags =
    + 0x0000 none
    + 0x0001 right-view-first
    + 0x0002 left-h-flipped
    + 0x0004 left-v-flipped
    + 0x0008 right-h-flipped
    + 0x0010 right-v-flipped
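
As a minimal sketch of what caps using these proposed fields could look like
when built from code (the field names and the "sbs" mnemonic are the proposal
above, not necessarily what was finally implemented; multiview-flags is
written as a plain integer per the "hex value" option):

    #include <gst/gst.h>

    static GstCaps *
    make_side_by_side_caps (void)
    {
      /* Full side-by-side packed frame: 2 x 960 = 1920 pixels wide,
       * right view first (proposed flag 0x0001). */
      return gst_caps_new_simple ("video/x-raw",
          "format", G_TYPE_STRING, "I420",
          "width", G_TYPE_INT, 1920,
          "height", G_TYPE_INT, 1080,
          "multiview-mode", G_TYPE_STRING, "sbs",
          "multiview-flags", G_TYPE_INT, 0x0001,
          NULL);
    }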

### Proposed new buffer flags

Add two new `GST_VIDEO_BUFFER_*` flags in video-frame.h and make it clear that
those flags can apply to encoded video buffers too. wtay says that's currently
the case anyway, but the documentation should say it. A short usage sketch
follows the list below.

- **`GST_VIDEO_BUFFER_FLAG_MULTIPLE_VIEW`** - Marks a buffer as representing
  non-mono content, although it may be a single (left or right) eye view.

- **`GST_VIDEO_BUFFER_FLAG_FIRST_IN_BUNDLE`** - for frame-sequential methods of
  transport, mark the "first" of a left/right/other group of frames
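
A sketch of how an element might consume these flags once they exist
(hypothetical element code; the flag names are the ones proposed above):

    #include <gst/gst.h>
    #include <gst/video/video.h>

    /* Hypothetical per-buffer handling in a frame-sequential consumer. */
    static void
    inspect_view_flags (GstBuffer * buf)
    {
      if (GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_MULTIPLE_VIEW)) {
        /* Not plain mono: this buffer is one view of stereo/multiview
         * content. */
        if (GST_BUFFER_FLAG_IS_SET (buf,
                GST_VIDEO_BUFFER_FLAG_FIRST_IN_BUNDLE)) {
          /* Start of a new left/right/other bundle of corresponding frames. */
        }
      }
    }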

### A new GstMultiviewMeta

This provides a place to describe all provided views in a buffer / stream,
and through Meta negotiation to inform decoders about which views to decode
if not all are wanted.

* Logical labels/names and mapping to GstVideoMeta numbers
* Standard view labels LEFT/RIGHT, and non-standard ones (strings)

      GST_VIDEO_MULTIVIEW_VIEW_LEFT = 1
      GST_VIDEO_MULTIVIEW_VIEW_RIGHT = 2

      struct GstVideoMultiviewViewInfo {
          guint view_label;
          guint meta_id;     // id of the GstVideoMeta for this view

          padding;
      }

      struct GstVideoMultiviewMeta {
          guint n_views;
          GstVideoMultiviewViewInfo *view_info;
      }

The meta is optional, and probably only useful later for MVC.

## Outputting stereo content

The initial implementation for output will be stereo content in glimagesink.

### Output Considerations with OpenGL

- If we have support for stereo GL buffer formats, we can output separate
  left/right eye images and let the hardware take care of display.

- Otherwise, glimagesink needs to render one window with left/right in a
  suitable frame packing, and that will only show correctly in fullscreen on
  a device set for the right 3D packing -> requires app intervention to set
  the video mode.

- That could be done manually on the TV, or with HDMI 1.4 by setting the
  right video mode for the screen to inform the TV. As a third option, we
  could support rendering to two separate overlay areas on the screen - one
  for the left eye, one for the right - which can be supported using the
  'splitter' element and two output sinks, or, better, by adding a 2nd
  window overlay for split stereo output.

- Intel hardware doesn't do stereo GL buffers - only nvidia and AMD do - so
  the initial implementation won't include that.

## Other elements for handling multiview content

- videooverlay interface extensions
  - __Q__: Should this be a new interface?
  - Element message to communicate the presence of stereoscopic information
    to the app
  - App needs to be able to override the input interpretation - ie, set
    multiview-mode and multiview-flags
    - Most videos I've seen are side-by-side or top-bottom with no
      frame-packing metadata
  - New API for the app to set rendering options for stereo/multiview
    content
    - This might be best implemented as a **multiview GstContext**, so that
      the pipeline can share app preferences for content interpretation and
      downmixing to mono for output, or in the sink, and have those applied
      as far upstream/downstream as possible.

- Converter element
  - convert different view layouts
  - Render to anaglyphs of different types (magenta/green, red/blue, etc)
    and output as mono

- Mixer element
  - take 2 video streams and output as stereo
    - later take n video streams
  - share code with the converter, it just takes input from n pads instead
    of one.

- Splitter element
  - Output one pad per view

### Implementing MVC handling in decoders / parsers (and encoders)

Things to do to implement MVC handling:

1. Parsing SEI in h264parse and setting caps (patches available in
   bugzilla for parsing, see below)
2. Integrate gstreamer-vaapi MVC support with this proposal
3. Help with [libav MVC implementation](https://wiki.libav.org/Blueprint/MVC)
4. generating SEI in H.264 encoder
5. Support for MPEG2 MVC extensions

## Relevant bugs

- [bug 685215](https://bugzilla.gnome.org/show_bug.cgi?id=685215) - codecparser h264: Add initial MVC parser
- [bug 696135](https://bugzilla.gnome.org/show_bug.cgi?id=696135) - h264parse: Add mvc stream parsing support
- [bug 732267](https://bugzilla.gnome.org/show_bug.cgi?id=732267) - h264parse: extract base stream from MVC or SVC encoded streams

## Other Information

[Matroska 3D support notes](http://www.matroska.org/technical/specs/notes.html#3D)

## Open Questions

### Background

### Representation for GstGL

When uploading raw video frames to GL textures, the goal is to implement:

Split packed frames into separate GL textures when uploading, and
attach multiple GstGLMemory's to the GstBuffer. The multiview-mode and
multiview-flags fields in the caps should change to reflect the conversion
from one incoming GstMemory to multiple GstGLMemory, and change the
width/height in the output info as needed.

This is (currently) targeted as 2 render passes - upload as normal
to a single stereo-packed RGBA texture, and then unpack into 2
smaller textures, output with GST_VIDEO_MULTIVIEW_MODE_SEPARATED, as
2 GstGLMemory attached to one buffer. We can optimise the upload later
to go directly to 2 textures for common input formats.

Separate output textures have a few advantages:

- Filter elements can more easily apply filters in several passes to each
  texture without fundamental changes to our filters to avoid mixing pixels
  from separate views.

- Centralises the sampling of input video frame packings in the upload code,
  which makes adding new packings in the future easier.

- Sampling multiple textures to generate various output frame-packings
  for display is conceptually simpler than converting from any input packing
  to any output packing.

- In implementations that support quad buffers, having separate textures
  makes it trivial to do GL_LEFT/GL_RIGHT output.

For either option, we'll need new glsink output API to pass more
information to applications about multiple views for the draw signal/callback.

I don't know if it's desirable to support *both* methods of representing
views. If so, that should be signalled in the caps too. That could be a
new multiview-mode for passing views in separate GstMemory objects
attached to a GstBuffer, which would not be GL specific.

### Overriding frame packing interpretation

Most sample videos available are frame packed, with no metadata
to say so. How should we override that interpretation?

- Simple answer: Use capssetter + new properties on playbin to
  override the multiview fields. *Basically implemented in playbin, using
  a pad probe. Needs more work for completeness.* A rough sketch of the
  capssetter approach is shown below.
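
The following is only an illustrative sketch of that capssetter route
(assuming a side-by-side sample file and the caps fields proposed above;
the pipeline string is not a tested recipe):

    #include <gst/gst.h>

    int
    main (int argc, char **argv)
    {
      GstElement *pipeline;
      GError *err = NULL;

      gst_init (&argc, &argv);

      /* capssetter merges the given fields into the caps it sees, so
       * downstream elements treat the video as side-by-side packed. */
      pipeline = gst_parse_launch (
          "filesrc location=sample-sbs.mp4 ! decodebin ! "
          "capssetter caps=\"video/x-raw,multiview-mode=sbs\" ! "
          "videoconvert ! autovideosink", &err);

      if (pipeline == NULL) {
        g_printerr ("Failed to build pipeline: %s\n", err->message);
        return 1;
      }

      gst_element_set_state (pipeline, GST_STATE_PLAYING);
      /* ... run a main loop / wait for EOS ... */
      return 0;
    }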

### Adding extra GstVideoMeta to buffers

There should be one GstVideoMeta for the entire video frame in packed
layouts, and one GstVideoMeta per GstGLMemory when views are attached
to a GstBuffer separately. This should be done by the buffer pool,
which knows from the caps.

### videooverlay interface extensions

GstVideoOverlay needs:

- A way to announce the presence of multiview content when it is
  detected/signalled in a stream.
- A way to tell applications which output methods are supported/available
- A way to tell the sink which output method it should use
- Possibly a way to tell the sink to override the input frame
  interpretation / caps - depends on the answer to the question
  above about how to model overriding input interpretation.

### What's implemented

- Caps handling
- gst-plugins-base libgstvideo pieces
- playbin caps overriding
- conversion elements - glstereomix, gl3dconvert (needs a rename),
  glstereosplit.

### Possible future enhancements

- Make GLupload split to separate textures at upload time?
  - Needs new API to extract multiple textures from the upload. Currently
    only outputs 1 result RGBA texture.
- Make GLdownload able to take 2 input textures, pack them and colorconvert /
  download as needed.
  - currently done by packing then downloading, which isn't OK overhead for
    RGBA download
- Think about how we integrate GLstereo - do we need to do anything special,
  or can the app just render to stereo/quad buffers if they're available?

527
markdown/design/subtitle-overlays.md
Normal file
@@ -0,0 +1,527 @@

# Subtitle overlays, hardware-accelerated decoding and playbin

This document describes some of the considerations and requirements that
led to the current `GstVideoOverlayCompositionMeta` API which allows
attaching of subtitle bitmaps or logos to video buffers.

## Background

Subtitles can be muxed in containers or come from an external source.

Subtitles come in many shapes and colours. Usually they are either
text-based (incl. 'pango markup'), or bitmap-based (e.g. DVD subtitles
and the most common form of DVB subs). Bitmap based subtitles are
usually compressed in some way, like some form of run-length encoding.

Subtitles are currently decoded and rendered in subtitle-format-specific
overlay elements. These elements have two sink pads (one for raw video
and one for the subtitle format in question) and one raw video source
pad.

They will take care of synchronising the two input streams, and of
decoding and rendering the subtitles on top of the raw video stream.

Digression: one could theoretically have dedicated decoder/render
elements that output an AYUV or ARGB image, and then let a videomixer
element do the actual overlaying, but this is not very efficient,
because it requires us to allocate and blend whole pictures (1920x1080
AYUV = 8MB, 1280x720 AYUV = 3.6MB, 720x576 AYUV = 1.6MB) even if the
overlay region is only a small rectangle at the bottom. This wastes
memory and CPU. We could do something better by introducing a new format
that only encodes the region(s) of interest, but we don't have such a
format yet, and are not necessarily keen to rewrite this part of the
logic in playbin at this point - and we can't change existing elements'
behaviour, so would need to introduce new elements for this.

Playbin supports outputting compressed formats, i.e. it does not force
decoding to a raw format, but is happy to output to a non-raw format as
long as the sink supports that as well.

In case of certain hardware-accelerated decoding APIs, we will make use
of that functionality. However, the decoder will not output a raw video
format then, but some kind of hardware/API-specific format (in the caps)
and the buffers will reference hardware/API-specific objects that the
hardware/API-specific sink will know how to handle.

## The Problem

In the case of such hardware-accelerated decoding, the decoder will not
output raw pixels that can easily be manipulated. Instead, it will
output hardware/API-specific objects that can later be used to render a
frame using the same API.

Even if we could transform such a buffer into raw pixels, we most likely
would want to avoid that, in order to avoid the need to map the data
back into system memory (and then later back to the GPU). It's much
better to upload the much smaller encoded data to the GPU/DSP and then
leave it there until rendered.

Before `GstVideoOverlayComposition` playbin only supported subtitles on
top of raw decoded video. It would try to find a suitable overlay element
from the plugin registry based on the input subtitle caps and the rank.
(It is assumed that we will be able to convert any raw video format into
any format required by the overlay using a converter such as videoconvert.)

It would not render subtitles if the video sent to the sink is not raw
YUV or RGB or if conversions had been disabled by setting the
native-video flag on playbin.

Subtitle rendering is considered an important feature. Enabling
hardware-accelerated decoding by default should not lead to a major
feature regression in this area.

This means that we need to support subtitle rendering on top of non-raw
video.

## Possible Solutions

The goal is to keep knowledge of the subtitle format within the
format-specific GStreamer plugins, and knowledge of any specific video
acceleration API to the GStreamer plugins implementing that API. We do
not want to make the pango/dvbsuboverlay/dvdspu/kate plugins link to
libva/libvdpau/etc. and we do not want to make the vaapi/vdpau plugins
link to all of libpango/libkate/libass etc.

Multiple possible solutions come to mind:

1) backend-specific overlay elements

   e.g. vaapitextoverlay, vdpautextoverlay, vaapidvdspu, vdpaudvdspu,
   vaapidvbsuboverlay, vdpaudvbsuboverlay, etc.

   This assumes the overlay can be done directly on the
   backend-specific object passed around.

   The main drawback with this solution is that it leads to a lot of
   code duplication and may also lead to uncertainty about distributing
   certain duplicated pieces of code. The code duplication is pretty
   much unavoidable, since making textoverlay, dvbsuboverlay, dvdspu,
   kate, assrender, etc. available in form of base classes to derive
   from is not really an option. Similarly, one would not really want
   the vaapi/vdpau plugin to depend on a bunch of other libraries such
   as libpango, libkate, libtiger, libass, etc.

   One could add some new kind of overlay plugin feature though in
   combination with a generic base class of some sort, but in order to
   accommodate all the different cases and formats one would end up
   with quite convoluted/tricky API.

   (Of course there could also be a GstFancyVideoBuffer that provides
   an abstraction for such video accelerated objects and that could
   provide an API to add overlays to it in a generic way, but in the
   end this is just a less generic variant of solution 3), and it is
   not clear that there are real benefits to a specialised solution
   vs. a more generic one).

2) convert backend-specific object to raw pixels and then overlay

   Even where possible technically, this is most likely very
   inefficient.

3) attach the overlay data to the backend-specific video frame buffers
   in a generic way and do the actual overlaying/blitting later in
   backend-specific code such as the video sink (or an accelerated
   encoder/transcoder)

   In this case, the actual overlay rendering (i.e. the actual text
   rendering or decoding DVD/DVB data into pixels) is done in the
   subtitle-format-specific GStreamer plugin. All knowledge about the
   subtitle format is contained in the overlay plugin then, and all
   knowledge about the video backend in the video backend specific
   plugin.

   The main question then is how to get the overlay pixels (and we will
   only deal with pixels here) from the overlay element to the video
   sink.

   This could be done in multiple ways: One could send custom events
   downstream with the overlay data, or one could attach the overlay
   data directly to the video buffers in some way.

   Sending inline events has the advantage that it is fairly
   transparent to any elements between the overlay element and the
   video sink: if an effects plugin creates a new video buffer for the
   output, nothing special needs to be done to maintain the subtitle
   overlay information, since the overlay data is not attached to the
   buffer. However, it slightly complicates things at the sink, since
   it would also need to look for the new event in question instead of
   just processing everything in its buffer render function.

   If one attaches the overlay data to the buffer directly, any element
   between overlay and video sink that creates a new video buffer would
   need to be aware of the overlay data attached to it and copy it over
   to the newly-created buffer.

   One would have to implement a special kind of new query (e.g.
   FEATURE query) that is not passed on automatically by
   gst\_pad\_query\_default() in order to make sure that all elements
   downstream will handle the attached overlay data. (This is only a
   problem if we want to also attach overlay data to raw video pixel
   buffers; for new non-raw types we can just make it mandatory and
   assume support and be done with it; for existing non-raw types
   nothing changes anyway if subtitles don't work) (we need to maintain
   backwards compatibility for existing raw video pipelines like e.g.:
   ..decoder \! suboverlay \! encoder..)

   Even though slightly more work, attaching the overlay information to
   buffers seems more intuitive than sending it interleaved as events.
   And buffers stored or passed around (e.g. via the "last-buffer"
   property in the sink when doing screenshots via playbin) always
   contain all the information needed.

4) create a video/x-raw-\*-delta format and use a backend-specific
   videomixer

   This possibility was hinted at already in the digression in the
   Background section. It would satisfy the goal of keeping subtitle
   format knowledge in the subtitle plugins and video backend knowledge
   in the video backend plugin. It would also add a concept that might
   be generally useful (think ximagesrc capture with xdamage). However,
   it would require adding foorender variants of all the existing
   overlay elements, and changing playbin to that new design, which is
   somewhat intrusive. And given the general nature of such a new
   format/API, we would need to take a lot of care to be able to
   accommodate all possible use cases when designing the API, which
   makes it considerably more ambitious. Lastly, we would need to write
   videomixer variants for the various accelerated video backends as
   well.

Overall, option 3) appears to be the most promising solution. It is the
least intrusive and should be fairly straight-forward to implement with
reasonable effort, requiring only small changes to existing elements and
requiring no new elements.

Doing the final overlaying in the sink as opposed to a videomixer or
overlay in the middle of the pipeline has other advantages:

- if video frames need to be dropped, e.g. for QoS reasons, we could
  also skip the actual subtitle overlaying and possibly the
  decoding/rendering as well, if the implementation and API allows for
  that to be delayed.

- the sink often knows the actual size of the window/surface/screen
  the output video is rendered to. This *may* make it possible to
  render the overlay image in a higher resolution than the input
  video, solving a long standing issue with pixelated subtitles on top
  of low-resolution videos that are then scaled up in the sink. This
  would of course require the rendering to be delayed instead of
  just attaching an AYUV/ARGB/RGBA blob of pixels to the video buffer
  in the overlay, but that could all be supported.

- if the video backend / sink has support for high-quality text
  rendering (clutter?) we could just pass the text or pango markup to
  the sink and let it do the rest (this is unlikely to be supported in
  the general case - text and glyph rendering is hard; also, we don't
  really want to make up our own text markup system, and pango markup
  is probably too limited for complex karaoke stuff).

## API needed

1) Representation of subtitle overlays to be rendered

   We need to pass the overlay pixels from the overlay element to the
   sink somehow. Whatever the exact mechanism, let's assume we pass a
   refcounted GstVideoOverlayComposition struct or object.

   A composition is made up of one or more overlays/rectangles.

   In the simplest case an overlay rectangle is just a blob of
   RGBA/ABGR \[FIXME?\] or AYUV pixels with positioning info and other
   metadata, and there is only one rectangle to render.

   We're keeping the naming generic ("OverlayFoo" rather than
   "SubtitleFoo") here, since this might also be handy for other use
   cases such as e.g. logo overlays or so. It is not designed for
   full-fledged video stream mixing though.

       // Note: don't mind the exact implementation details, they'll be hidden

       // FIXME: might be confusing in 0.11 though since GstXOverlay was
       //        renamed to GstVideoOverlay in 0.11, but not much we can do,
       //        maybe we can rename GstVideoOverlay to something better

       struct GstVideoOverlayComposition
       {
         guint num_rectangles;
         GstVideoOverlayRectangle ** rectangles;

         /* lowest rectangle sequence number still used by the upstream
          * overlay element. This way a renderer maintaining some kind of
          * rectangles <-> surface cache can know when to free cached
          * surfaces/rectangles. */
         guint min_seq_num_used;

         /* sequence number for the composition (same series as rectangles) */
         guint seq_num;
       }

       struct GstVideoOverlayRectangle
       {
         /* Position on video frame and dimension of output rectangle in
          * output frame terms (already adjusted for the PAR of the output
          * frame). x/y can be negative (overlay will be clipped then) */
         gint x, y;
         guint render_width, render_height;

         /* Dimensions of overlay pixels */
         guint width, height, stride;

         /* This is the PAR of the overlay pixels */
         guint par_n, par_d;

         /* Format of pixels, GST_VIDEO_FORMAT_ARGB on big-endian systems,
          * and BGRA on little-endian systems (i.e. pixels are treated as
          * 32-bit values and alpha is always in the most-significant byte,
          * and blue is in the least-significant byte).
          *
          * FIXME: does anyone actually use AYUV in practice? (we do
          * in our utility function to blend on top of raw video)
          * What about AYUV and endianness? Do we always have [A][Y][U][V]
          * in memory? */
         /* FIXME: maybe use our own enum? */
         GstVideoFormat format;

         /* Refcounted blob of memory, no caps or timestamps */
         GstBuffer *pixels;

         // FIXME: how to express source like text or pango markup?
         //        (just add source type enum + source buffer with data)
         //
         // FOR 0.10: always send pixel blobs, but attach source data in
         //   addition (reason: if downstream changes, we can't renegotiate
         //   that properly, if we just do a query of supported formats from
         //   the start). Sink will just ignore pixels and use pango markup
         //   from source data if it supports that.
         //
         // FOR 0.11: overlay should query formats (pango markup, pixels)
         //   supported by downstream and then only send that. We can
         //   renegotiate via the reconfigure event.
         //

         /* sequence number: useful for backends/renderers/sinks that want
          * to maintain a cache of rectangles <-> surfaces. The value of
          * the min_seq_num_used in the composition tells the renderer which
          * rectangles have expired. */
         guint seq_num;

         /* FIXME: we also need a (private) way to cache converted/scaled
          * pixel blobs */
       }

   (a1) Overlay consumer API:

   How would this work in a video sink that supports scaling of textures:

       gst_foo_sink_render () {
         /* assume only one for now */
         if video_buffer has composition:
           composition = video_buffer.get_composition()

           for each rectangle in composition:
             if rectangle.source_data_type == PANGO_MARKUP
               actor = text_from_pango_markup (rectangle.get_source_data())
             else
               pixels = rectangle.get_pixels_unscaled (FORMAT_RGBA, ...)
               actor = texture_from_rgba (pixels, ...)

             .. position + scale on top of video surface ...
       }

   (a2) Overlay producer API:

   e.g. logo or subpicture overlay: got pixels, stuff into rectangle:

       if (logoverlay->cached_composition == NULL) {
         comp = composition_new ();

         rect = rectangle_new (format, pixels_buf,
                   width, height, stride, par_n, par_d,
                   x, y, render_width, render_height);

         /* composition adds its own ref for the rectangle */
         composition_add_rectangle (comp, rect);
         rectangle_unref (rect);

         /* buffer adds its own ref for the composition */
         video_buffer_attach_composition (comp);

         /* we take ownership of the composition and save it for later */
         logoverlay->cached_composition = comp;
       } else {
         video_buffer_attach_composition (logoverlay->cached_composition);
       }

   FIXME: also add some API to modify render position/dimensions of a
   rectangle (probably requires creation of new rectangle, unless we
   handle writability like with other mini objects).

2) Fallback overlay rendering/blitting on top of raw video

   Eventually we want to use this overlay mechanism not only for
   hardware-accelerated video, but also for plain old raw video, either
   at the sink or in the overlay element directly.

   Apart from the advantages listed earlier in the Possible Solutions
   section, this allows us to consolidate in one location a lot of
   overlaying/blitting code that is currently repeated in every single
   overlay element. This makes it considerably easier to support a whole
   range of raw video formats out of the box, add SIMD-optimised
   rendering using ORC, or handle corner cases correctly.

   (Note: a side-effect of overlaying raw video at the video sink is that
   if e.g. a screenshotter gets the last buffer via the last-buffer
   property of basesink, it would get an image without the subtitles on
   top. This could probably be fixed by re-implementing the property in
   GstVideoSink though. Playbin2 could handle this internally as well).

       void
       gst_video_overlay_composition_blend (GstVideoOverlayComposition * comp,
                                            GstBuffer * video_buf)
       {
         guint n;

         g_return_if_fail (gst_buffer_is_writable (video_buf));
         g_return_if_fail (GST_BUFFER_CAPS (video_buf) != NULL);

         ... parse video_buffer caps into BlendVideoFormatInfo ...

         for each rectangle in the composition: {

           if (gst_video_format_is_yuv (video_buf_format)) {
             overlay_format = FORMAT_AYUV;
           } else if (gst_video_format_is_rgb (video_buf_format)) {
             overlay_format = FORMAT_ARGB;
           } else {
             /* FIXME: grayscale? */
             return;
           }

           /* this will scale and convert AYUV<->ARGB if needed */
           pixels = rectangle_get_pixels_scaled (rectangle, overlay_format);

           ... clip output rectangle ...

           __do_blend (video_buf_format, video_buf->data,
                       overlay_format, pixels->data,
                       x, y, width, height, stride);

           gst_buffer_unref (pixels);
         }
       }

3) Flatten all rectangles in a composition

   We cannot assume that the video backend API can handle any number of
   rectangle overlays; it's possible that it only supports one single
   overlay, in which case we need to squash all rectangles into one.

   However, we'll just declare this a corner case for now, and
   implement it only if someone actually needs it. It's easy to add
   later API-wise. Might be a bit tricky if we have rectangles with
   different PARs/formats (e.g. subs and a logo), though we could
   probably always just use the code from 2) with a fully transparent
   video buffer to create a flattened overlay buffer.

4) query support for the new video composition mechanism

   This is handled via GstMeta and an ALLOCATION query - we can simply
   query whether downstream supports the GstVideoOverlayComposition meta.

   There appears to be no issue with downstream possibly not being
   linked yet at the time when an overlay would want to do such a
   query, but we would just have to default to something and update
   ourselves later on a reconfigure event then.
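
A minimal sketch of what such a check could look like in an overlay element
(assuming 1.x-style API; error handling and caching of the result are
omitted):

    #include <gst/gst.h>
    #include <gst/video/video.h>
    #include <gst/video/video-overlay-composition.h>

    /* Ask downstream (via an ALLOCATION query on our source pad) whether it
     * supports the GstVideoOverlayComposition meta. */
    static gboolean
    downstream_supports_overlay_meta (GstPad * srcpad, GstCaps * caps)
    {
      GstQuery *query;
      gboolean supported = FALSE;

      query = gst_query_new_allocation (caps, FALSE);

      if (gst_pad_peer_query (srcpad, query)) {
        supported = gst_query_find_allocation_meta (query,
            GST_VIDEO_OVERLAY_COMPOSITION_META_API_TYPE, NULL);
      }

      gst_query_unref (query);
      return supported;
    }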

Other considerations:

- renderers (overlays or sinks) may be able to handle only ARGB or
  only AYUV (for most graphics/hw-API it's likely ARGB of some sort,
  while our blending utility functions will likely want the same
  colour space as the underlying raw video format, which is usually
  YUV of some sort). We need to convert where required, and should
  cache the conversion.

- renderers may or may not be able to scale the overlay. We need to do
  the scaling internally if not (simple case: just horizontal scaling
  to adjust for PAR differences; complex case: both horizontal and
  vertical scaling, e.g. if subs come from a different source than the
  video or the video has been rescaled or cropped between overlay
  element and sink).

- renderers may be able to generate (possibly scaled) pixels on demand
  from the original data (e.g. a string or RLE-encoded data). We will
  ignore this for now, since this functionality can still be added
  later via API additions. The most interesting case would be to pass
  a pango markup string, since e.g. clutter can handle that natively.

- renderers may be able to write data directly on top of the video
  pixels (instead of creating an intermediary buffer with the overlay
  which is then blended on top of the actual video frame), e.g.
  dvdspu, dvbsuboverlay

  However, in the interest of simplicity, we should probably ignore the
  fact that some elements can blend their overlays directly on top of the
  video (decoding/uncompressing them on the fly), even more so as it's not
  obvious that it's actually faster to decode the same overlay 70-90 times
  (say) (ie. ca. 3 seconds of video frames) and then blend it 70-90 times
  instead of decoding it once into a temporary buffer and then blending it
  directly from there, possibly SIMD-accelerated. Also, this is only
  relevant if the video is raw video and not some hardware-acceleration
  backend object.

  And ultimately it is the overlay element that decides whether to do the
  overlay right there and then or have the sink do it (if supported). It
  could decide to keep doing the overlay itself for raw video and only use
  our new API for non-raw video.

- renderers may want to make sure they only upload the overlay pixels
  once per rectangle if that rectangle recurs in subsequent frames (as
  part of the same composition or a different composition), as is
  likely. This caching of e.g. surfaces needs to be done renderer-side
  and can be accomplished based on the sequence numbers. The
  composition contains the lowest sequence number still in use
  upstream (an overlay element may want to cache created
  compositions+rectangles as well after all to re-use them for
  multiple frames); based on that, the renderer can expire cached
  objects. The caching needs to be done renderer-side because
  attaching renderer-specific objects to the rectangles won't work
  well given the refcounted nature of rectangles and compositions,
  making it unpredictable when a rectangle or composition will be
  freed or from which thread context it will be freed. The
  renderer-specific objects are likely bound to other types of
  renderer-specific contexts, and need to be managed in connection
  with those.

- composition/rectangles should internally provide a certain degree of
  thread-safety. Multiple elements (sinks, overlay element) might
  access or use the same objects from multiple threads at the same
  time, and it is expected that elements will keep a ref to
  compositions and rectangles they push downstream for a while, e.g.
  until the current subtitle composition expires.

## Future considerations

- alternatives: there may be multiple versions/variants of the same
  subtitle stream. On DVDs, there may be a 4:3 version and a 16:9
  version of the same subtitles. We could attach both variants and let
  the renderer pick the best one for the situation (currently we just
  use the 16:9 version). With totem, it's ultimately totem that adds
  the 'black bars' at the top/bottom, so totem also knows if it's got
  a 4:3 display and can/wants to fit 4:3 subs (which may render on top
  of the bars) or not, for example.

## Misc. FIXMEs

TEST: should these look (roughly) alike (note text distortion) - needs
fixing in textoverlay

    gst-launch-1.0 \
      videotestsrc ! video/x-raw,width=640,height=480,pixel-aspect-ratio=1/1 \
        ! textoverlay text=Hello font-desc=72 ! xvimagesink \
      videotestsrc ! video/x-raw,width=320,height=480,pixel-aspect-ratio=2/1 \
        ! textoverlay text=Hello font-desc=72 ! xvimagesink \
      videotestsrc ! video/x-raw,width=640,height=240,pixel-aspect-ratio=1/2 \
        ! textoverlay text=Hello font-desc=72 ! xvimagesink

12
sitemap.txt
@@ -141,6 +141,7 @@ index.md

design/MT-refcounting.md
design/TODO.md
design/activation.md
design/audiosinks.md
design/buffer.md
design/buffering.md
design/bufferpool.md

@@ -149,10 +150,12 @@ index.md
design/context.md
design/controller.md
design/conventions.md
design/decodebin.md
design/dynamic.md
design/element-sink.md
design/element-source.md
design/element-transform.md
design/encoding.md
design/events.md
design/framestep.md
design/gstbin.md

@@ -162,8 +165,13 @@ index.md
design/gstobject.md
design/gstpipeline.md
design/draft-klass.md
design/interlaced-video.md
design/keyframe-force.md
design/latency.md
design/live-source.md
design/mediatype-audio-raw.md
design/mediatype-text-raw.md
design/mediatype-video-raw.md
design/memory.md
design/messages.md
design/meta.md

@@ -171,7 +179,9 @@ index.md
design/miniobject.md
design/missing-plugins.md
design/negotiation.md
design/orc-integration.md
design/overview.md
design/playbin.md
design/preroll.md
design/probes.md
design/progress.md

@@ -186,9 +196,11 @@ index.md
design/sparsestreams.md
design/standards.md
design/states.md
design/stereo-multiview-video.md
design/stream-selection.md
design/stream-status.md
design/streams.md
design/subtitle-overlays.md
design/synchronisation.md
design/draft-tagreading.md
design/toc.md