docs: design: move most design docs to gst-docs module

This commit is contained in:
    parent 49653b058a
    commit 46138b1b1d

13 changed files with 1 addition and 3652 deletions

docs/design/Makefile.am

@@ -2,16 +2,5 @@ SUBDIRS =
 EXTRA_DIST = \
-	design-audiosinks.txt \
-	design-decodebin.txt \
-	design-encoding.txt \
-	design-orc-integration.txt \
 	draft-hw-acceleration.txt \
-	draft-keyframe-force.txt \
-	draft-subtitle-overlays.txt \
-	draft-va.txt \
-	part-interlaced-video.txt \
-	part-mediatype-audio-raw.txt \
-	part-mediatype-text-raw.txt \
-	part-mediatype-video-raw.txt \
-	part-playbin.txt
+	draft-va.txt

docs/design/design-audiosinks.txt (deleted)
@@ -1,138 +0,0 @@

Audiosink design
----------------

Requirements:

  - must operate chain based.
    Most simple playback pipelines will push audio from the decoders
    into the audio sink.

  - must operate getrange based.
    Most professional audio applications will operate in a mode where
    the audio sink pulls samples from the pipeline. This is typically
    done in a callback from the audiosink requesting N samples. The
    callback is either scheduled from a thread or from an interrupt
    from the audio hardware device.

  - Exact sample accurate clocks.
    The audiosink must be able to provide a clock that is sample
    accurate even if samples are dropped or when discontinuities are
    found in the stream.

  - Exact timing of playback.
    The audiosink must be able to play samples at their exact times.

  - use DMA access when possible.
    When the hardware can do DMA we should use it. This should also
    work over bufferpools to avoid data copying to/from kernel space.


Design:

The design is based on a set of base classes and the concept of a
ringbuffer of samples.

    +-----------+        - provide preroll, rendering, timing
    + basesink  +        - caps nego
    +-----+-----+
          |
    +-----V----------+   - manages ringbuffer
    + audiobasesink  +   - manages scheduling (push/pull)
    +-----+----------+   - manages clock/query/seek
          |              - manages scheduling of samples in the ringbuffer
          |              - manages caps parsing
          |
    +-----V------+       - default ringbuffer implementation with a GThread
    + audiosink  +       - subclasses provide open/read/close methods
    +------------+

The ringbuffer is a contiguous piece of memory divided into segtotal
segments. Each segment has segsize bytes.

      play position
        v
    +---+---+---+-------------------------------------+----------+
    + 0 | 1 | 2 | ....                                 | segtotal |
    +---+---+---+-------------------------------------+----------+
    <--->
      segsize bytes = N samples * bytes_per_sample.

The ringbuffer has a play position, which is expressed in
segments. The play position is where the device is currently reading
samples from the buffer.

The ringbuffer can be put to the PLAYING or STOPPED state.

In the STOPPED state no samples are played to the device and the play
pointer does not advance.

In the PLAYING state samples are written to the device and the ringbuffer
should call a configurable callback after each segment is written to the
device. In this state the play pointer is advanced after each segment is
written.

A write operation to the ringbuffer will put new samples in the ringbuffer.
If there is not enough space in the ringbuffer, the write operation will
block. The playback of the buffer never stops, even if the buffer is
empty. When the buffer is empty, silence is played by the device.

The ringbuffer is implemented with lockfree atomic operations, especially
on the reading side, so that low-latency operations are possible.

Whenever new samples are to be put into the ringbuffer, the position of the
read pointer is taken. The required write position is taken and the diff
is made between the required and actual position. If the difference is < 0,
the sample is too late. If the difference is bigger than segtotal, the
writing part has to wait for the play pointer to advance.
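
As a rough illustration of the placement logic described above, the following
sketch computes where an incoming block of samples should go relative to the
current read segment. The names (RingInfo, decide_write, etc.) are hypothetical
helpers for this document, not part of the actual ringbuffer implementation.

  #include <glib.h>

  /* Hypothetical sketch of the write-position check described above.
   * 'readseg' is the segment the device is currently playing,
   * 'sample_offset' is where the new samples should start. */
  typedef struct {
    int segtotal;            /* number of segments in the ringbuffer */
    int samples_per_seg;     /* segsize / bytes_per_sample */
  } RingInfo;

  typedef enum {
    SAMPLE_TOO_LATE,         /* diff < 0: drop or clip the samples */
    SAMPLE_WRITABLE,         /* 0 <= diff <= segtotal: write now */
    SAMPLE_MUST_WAIT         /* diff > segtotal: wait for play pointer */
  } WriteDecision;

  static WriteDecision
  decide_write (const RingInfo * info, gint64 readseg, gint64 sample_offset)
  {
    gint64 writeseg = sample_offset / info->samples_per_seg;
    gint64 diff = writeseg - readseg;

    if (diff < 0)
      return SAMPLE_TOO_LATE;
    if (diff > info->segtotal)
      return SAMPLE_MUST_WAIT;
    return SAMPLE_WRITABLE;
  }
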
Scheduling:

- chain based mode:

  In chain based mode, bytes are written into the ringbuffer. This operation
  will eventually block when the ringbuffer is filled.

  When no samples arrive in time, the ringbuffer will play silence. Each
  buffer that arrives will be placed into the ringbuffer at the correct
  times. This means that dropping samples or inserting silence is done
  automatically and very accurately, independently of the play pointer.

  In this mode, the ringbuffer is usually kept as full as possible. When
  using a small buffer (small segsize and segtotal), the latency for audio
  to start from the sink to when it is played can be kept low, but at least
  one context switch has to be made between read and write.

- getrange based mode:

  In getrange based mode, the audiobasesink will use the callback function
  of the ringbuffer to get segsize samples from the peer element. These
  samples will then be placed in the ringbuffer at the next play position.
  It is assumed that the getrange function returns fast enough to fill the
  ringbuffer before the play pointer reaches the write pointer.

  In this mode, the ringbuffer is usually kept as empty as possible. There
  is no context switch needed between the elements that create the samples
  and the actual writing of the samples to the device.


DMA mode:

- Elements that can do DMA based access to the audio device have to subclass
  from the GstAudioBaseSink class and wrap the DMA ringbuffer in a subclass
  of GstRingBuffer.

  The ringbuffer subclass should trigger a callback after writing or playing
  each segment to the device. This callback can be triggered from a thread or
  from a signal from the audio device.


Clocks:

The GstAudioBaseSink class will use the ringbuffer to act as a clock provider.
It can do this by using the play pointer and the delay to calculate the
clock time.
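
A minimal sketch of that calculation, assuming hypothetical inputs for the
number of segments already played and the device delay in samples (these are
stand-ins for illustration, not the actual GstAudioBaseSink internals):

  #include <gst/gst.h>

  static GstClockTime
  ring_buffer_clock_time (guint64 segments_played, guint samples_per_seg,
      guint64 device_delay_samples, guint rate)
  {
    guint64 samples = segments_played * samples_per_seg;

    /* The device has buffered 'device_delay_samples' that are not yet
     * audible, so subtract them before converting to time. */
    if (samples < device_delay_samples)
      return 0;
    samples -= device_delay_samples;

    return gst_util_uint64_scale_int (samples, GST_SECOND, rate);
  }
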
docs/design/design-decodebin.txt (deleted)
@@ -1,274 +0,0 @@

Decodebin design

GstDecodeBin
------------

Description:

  Autoplug and decode to raw media

  Input : single pad with ANY caps     Output : Dynamic pads

* Contents

  _ a GstTypeFindElement connected to the single sink pad

  _ optionally a demuxer/parser

  _ optionally one or more DecodeGroup

* Autoplugging

  The goal is to reach 'target' caps (by default raw media).

  This is done by using the GstCaps of a source pad and finding the available
  demuxer/decoder GstElements that can be linked to that pad.

  The process starts with the source pad of typefind and stops when no more
  non-target caps are left. It is commonly done while pre-rolling, but can also
  happen whenever a new pad appears on any element.

  Once target caps have been found, that pad is ghosted and the
  'pad-added' signal is emitted.

  If no compatible elements can be found for a GstCaps, the pad is ghosted and
  the 'unknown-type' signal is emitted.


* Assisted auto-plugging

  When starting the auto-plugging process for a given GstCaps, the following
  signals are emitted in order to allow the application/user to assist or
  fine-tune the process.

  - 'autoplug-continue':

      gboolean user_function (GstElement * decodebin, GstPad * pad,
                              GstCaps * caps);

    This signal is fired at the very beginning with the source pad GstCaps. If
    the callback returns TRUE, the process continues normally. If the callback
    returns FALSE, then the GstCaps are considered as target caps and the
    autoplugging process stops.

  - 'autoplug-factories':

      GValueArray * user_function (GstElement * decodebin, GstPad * pad,
                                   GstCaps * caps);

    Get a list of element factories for @pad with @caps. This function is used
    to instruct decodebin2 which elements it should try to autoplug. The
    default behaviour when this function is not overridden is to get all
    elements that can handle @caps from the registry, sorted by rank.

  - 'autoplug-select':

      gint user_function (GstElement * decodebin, GstPad * pad, GstCaps * caps,
                          GValueArray * factories);

    This signal is fired once autoplugging has obtained a list of compatible
    GstElementFactory. The signal is emitted with the GstCaps of the source pad
    and a pointer to the GValueArray of compatible factories.

    The callback should return the index of the element factory in @factories
    that should be tried next.

    If the callback returns -1, the autoplugging process will stop as if no
    compatible factories were found.

    The default implementation of this function will try to autoplug the first
    factory of the list.
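
As an illustration of the signals described above, an application could bias
the process like this. This is a sketch following the signatures given in this
document (the concrete callbacks also receive a user-data argument, and error
handling is omitted):

  #include <gst/gst.h>

  /* Sketch: stop autoplugging at compressed audio so the application can
   * handle decoding itself, per the 'autoplug-continue' semantics above. */
  static gboolean
  on_autoplug_continue (GstElement * decodebin, GstPad * pad, GstCaps * caps,
      gpointer user_data)
  {
    GstStructure *s = gst_caps_get_structure (caps, 0);

    /* Returning FALSE makes these caps the target caps. */
    if (gst_structure_has_name (s, "audio/mpeg"))
      return FALSE;

    return TRUE;              /* keep autoplugging */
  }

  /* Sketch: always pick the first proposed factory, mirroring the default
   * 'autoplug-select' behaviour described above. */
  static gint
  on_autoplug_select (GstElement * decodebin, GstPad * pad, GstCaps * caps,
      GValueArray * factories, gpointer user_data)
  {
    if (factories->n_values == 0)
      return -1;              /* behave as if nothing was found */
    return 0;                 /* index of the factory to try next */
  }

  ...
  g_signal_connect (decodebin, "autoplug-continue",
      G_CALLBACK (on_autoplug_continue), NULL);
  g_signal_connect (decodebin, "autoplug-select",
      G_CALLBACK (on_autoplug_select), NULL);
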
* Target Caps

  The target caps are a read/write GObject property of decodebin.

  By default the target caps are:

  _ Raw audio: audio/x-raw

  _ and raw video: video/x-raw

  _ and Text: text/plain, text/x-pango-markup


* Media chain/group handling

  When autoplugging, all streams coming out of a demuxer will be grouped in a
  DecodeGroup.

  All new source pads created on that demuxer after it has emitted the
  'no-more-pads' signal will be put in another DecodeGroup.

  Only one decodegroup can be active at any given time. If a new decodegroup is
  created while another one exists, that decodegroup will be set as blocking
  until the existing one has drained.


DecodeGroup
-----------

Description:

  Streams belonging to the same group/chain of a media file.

* Contents

  The DecodeGroup contains:

  _ a GstMultiQueue to which all streams of the media group are connected.

  _ the eventual decoders which are autoplugged in order to produce the
    requested target pads.

* Proper group draining

  The DecodeGroup takes care that all the streams in the group are completely
  drained (EOS has come through all source ghost pads).

* Pre-roll and block

  The DecodeGroup has a global blocking feature. If enabled, all the ghosted
  source pads for that group will be blocked.

  A method is available to unblock all blocked pads for that group.


GstMultiQueue
-------------

Description:

  Multiple input-output data queue.

  The GstMultiQueue achieves the same functionality as GstQueue, with a few
  differences:

* Multiple streams handling.

  The element handles queueing data on more than one stream at once. To
  achieve such a feature it has request sink pads (sink_%u) and 'sometimes'
  src pads (src_%u).

  When requesting a given sinkpad, the associated srcpad for that stream will
  be created. Ex: requesting sink_1 will generate src_1.
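
For illustration, requesting such a pad pair could look like this (a sketch;
the src pad name is simply derived from the sink pad name, as described above):

  #include <string.h>
  #include <gst/gst.h>

  GstElement *mq = gst_element_factory_make ("multiqueue", NULL);
  GstPad *sinkpad, *srcpad;
  gchar *srcname;

  /* Requesting sink_%u creates both the sink pad and its matching
   * 'sometimes' src pad, e.g. sink_0 <-> src_0. */
  sinkpad = gst_element_get_request_pad (mq, "sink_%u");

  /* Derive the matching src pad name from the sink pad name. */
  srcname = g_strdup_printf ("src_%s",
      GST_PAD_NAME (sinkpad) + strlen ("sink_"));
  srcpad = gst_element_get_static_pad (mq, srcname);
  g_free (srcname);

  /* ... link sinkpad/srcpad into the surrounding pipeline ... */
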
* Non-starvation on multiple streams.

  If more than one stream is used with the element, the streams' queues will
  be dynamically grown (up to a limit), in order to ensure that no stream is
  risking data starvation. This guarantees that at any given time there are at
  least N bytes queued and available for each individual stream.

  If an EOS event comes through a srcpad, the associated queue should be
  considered as 'not-empty' in the queue-size-growing algorithm.

* Non-linked srcpads graceful handling.

  A GstTask is started for all srcpads when going to GST_STATE_PAUSED.

  The tasks block on a GCond which will be signalled in two
  different cases:

  _ When the associated queue has received a buffer.

  _ When the associated queue was previously declared as 'not-linked' and the
    first buffer of the queue is scheduled to be pushed synchronously in
    relation to the order in which it arrived globally in the element (see
    'Synchronous data pushing' below).

  When woken up by the GCond, the GstTask will try to push the next
  GstBuffer/GstEvent on the queue. If pushing the GstBuffer/GstEvent returns
  GST_FLOW_NOT_LINKED, then the associated queue is marked as 'not-linked'. If
  pushing the GstBuffer/GstEvent succeeded, the queue will no longer be marked
  as 'not-linked'.

  If pushing on all srcpads returns a GstFlowReturn different from GST_FLOW_OK,
  then all the srcpads' tasks are stopped and subsequent pushes on sinkpads
  will return GST_FLOW_NOT_LINKED.

* Synchronous data pushing for non-linked pads.

  In order to better support dynamic switching between streams, the multiqueue
  (unlike the current GStreamer queue) continues to push buffers on non-linked
  pads rather than shutting down.

  In addition, to prevent a non-linked stream from very quickly consuming all
  available buffers and thus 'racing ahead' of the other streams, the element
  must ensure that buffers and inlined events for a non-linked stream are
  pushed in the same order as they were received, relative to the other
  streams controlled by the element. This means that a buffer cannot be pushed
  to a non-linked pad any sooner than buffers in any other stream which were
  received before it.


=====================================
 Parsers, decoders and auto-plugging
=====================================

This section has DRAFT status.

Some media formats come in different "flavours" or "stream formats". These
formats differ in the way the setup data and media data is signalled and/or
packaged. An example for this is H.264 video, where there is a bytestream
format (with codec setup data signalled inline and units prefixed by a sync
code and packet length information) and a "raw" format where codec setup
data is signalled out of band (via the caps) and the chunking is implicit
in the way the buffers were muxed into a container, to mention just two of
the possible variants.

Especially on embedded platforms it is common that decoders can only
handle one particular stream format, and not all of them.

Where there are multiple stream formats, parsers are usually expected
to be able to convert between the different formats. This will, if
implemented correctly, work as expected in a static pipeline such as

    ... ! parser ! decoder ! sink

where the parser can query the decoder's capabilities even before
processing the first piece of data, and configure itself to convert
accordingly, if conversion is needed at all.

In an auto-plugging context this is not so straightforward though,
because elements are plugged incrementally and not before the previous
element has processed some data and decided what it will output exactly
(unless the template caps are completely fixed, in which case it can continue
right away; this is not always the case here though, see below). A
parser will thus have to decide on *some* output format so auto-plugging
can continue. It doesn't know anything about the available decoders and
their capabilities though, so it's possible that it will choose a format
that is not supported by any of the available decoders, or by the preferred
decoder.

If the parser had sufficiently concise but fixed source pad template caps,
decodebin could continue to plug a decoder right away, allowing the
parser to configure itself in the same way as it would with a static
pipeline. This is not an option, unfortunately, because often the
parser needs to process some data to determine e.g. the format's profile or
other stream properties (resolution, sample rate, channel configuration, etc.),
and there may be different decoders for different profiles (e.g. a DSP codec
for baseline profile and a software fallback for main/high profile; or a DSP
codec only supporting certain resolutions, with a software fallback for
unusual resolutions). So if decodebin just plugged the highest-ranking
decoder, that decoder might not be able to handle the actual stream later
on, which would yield an error (this is a data flow error then, which would
be hard to intercept and avoid in decodebin). In other words, we can't solve
this issue by plugging a decoder right away with the parser.

So decodebin needs to communicate to the parser the set of available decoder
caps (which would contain the relevant capabilities/restrictions such as
supported profiles, resolutions, etc.), after the usual "autoplug-*" signal
filtering/sorting of course.

This is done by plugging a capsfilter element right after the parser, and
constructing a set of filter caps from the list of available decoders (one
appends at the end just the name(s) of the caps structures from the parser
pad template caps to function as an 'ANY other' caps equivalent). This lets
the parser negotiate to a supported stream format in the same way as with
the static pipeline mentioned above, but of course incurs some overhead
through the additional capsfilter element.
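
A rough sketch of how such filter caps could be assembled, assuming a list of
candidate decoder factories has already been collected and sorted (the helper
name and variable layout are illustrative, not decodebin's actual code):

  /* Build filter caps: all sink caps of the candidate decoders, plus the
   * bare structure names from the parser's src template as an
   * 'ANY other' fallback, as described above. */
  static GstCaps *
  build_parser_filter_caps (GList * decoder_factories, GstPad * parser_srcpad)
  {
    GstCaps *filter = gst_caps_new_empty ();
    GstCaps *tmpl;
    GList *l;
    guint i;

    for (l = decoder_factories; l != NULL; l = l->next) {
      GstElementFactory *factory = l->data;
      const GList *t;

      for (t = gst_element_factory_get_static_pad_templates (factory);
          t != NULL; t = t->next) {
        GstStaticPadTemplate *st = t->data;

        if (st->direction != GST_PAD_SINK)
          continue;
        filter = gst_caps_merge (filter,
            gst_static_pad_template_get_caps (st));
      }
    }

    /* Append the structure names of the parser's template caps without any
     * fields, so "anything else the parser can output" is still allowed. */
    tmpl = gst_pad_get_pad_template_caps (parser_srcpad);
    for (i = 0; i < gst_caps_get_size (tmpl); i++) {
      const gchar *name =
          gst_structure_get_name (gst_caps_get_structure (tmpl, i));
      gst_caps_append_structure (filter, gst_structure_new_empty (name));
    }
    gst_caps_unref (tmpl);

    return filter;
  }
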
docs/design/design-encoding.txt (deleted)
@@ -1,571 +0,0 @@

Encoding and Muxing
-------------------

Summary
-------

  A. Problems
  B. Goals
  1. EncodeBin
  2. Encoding Profile System
  3. Helper Library for Profiles
  I. Use-cases researched


A. Problems this proposal attempts to solve
-------------------------------------------

* Duplication of pipeline code for gstreamer-based applications
  wishing to encode and/or mux streams, leading to subtle differences
  and inconsistencies across those applications.

* No unified system for describing encoding targets for applications
  in a user-friendly way.

* No unified system for creating encoding targets for applications,
  resulting in duplication of code across all applications,
  differences and inconsistencies that come with that duplication,
  and applications hardcoding element names and settings, resulting in
  poor portability.


B. Goals
--------

1. Convenience encoding element

   Create a convenience GstBin for encoding and muxing several streams,
   hereafter called 'EncodeBin'.

   This element will only contain one single property, which is a
   profile.

2. Define an encoding profile system

3. Encoding profile helper library

   Create a helper library to:
   * create EncodeBin instances based on profiles, and
   * help applications to create/load/save/browse those profiles.


1. EncodeBin
------------

1.1 Proposed API
----------------

  EncodeBin is a GstBin subclass.

  It implements the GstTagSetter interface, by which it will proxy the
  calls to the muxer.

  Only two introspectable properties (i.e. usable without extra API):
  * A GstEncodingProfile*
  * The name of the profile to use

  When a profile is selected, encodebin will:
  * Add REQUEST sinkpads for all the GstStreamProfile
  * Create the muxer and expose the source pad

  Whenever a request pad is created, encodebin will:
  * Create the chain of elements for that pad
  * Ghost the sink pad
  * Return that ghost pad

  This allows reducing the code to the minimum for applications
  wishing to encode a source for a given profile:

  ...

  encbin = gst_element_factory_make ("encodebin", NULL);
  g_object_set (encbin, "profile", "N900/H264 HQ", NULL);
  gst_element_link (encbin, filesink);

  ...

  vsrcpad = gst_element_get_static_pad (source, "src1");
  vsinkpad = gst_element_get_request_pad (encbin, "video_%u");
  gst_pad_link (vsrcpad, vsinkpad);

  ...


1.2 Explanation of the various stages in EncodeBin
--------------------------------------------------

  This describes the various stages which can happen in order to end
  up with a multiplexed stream that can then be stored or streamed.

1.2.1 Incoming streams

  The streams fed to EncodeBin can be of various types:

  * Video
    * Uncompressed (but maybe subsampled)
    * Compressed
  * Audio
    * Uncompressed (audio/x-raw)
    * Compressed
  * Timed text
  * Private streams

1.2.2 Steps involved for raw video encoding

  (0) Incoming Stream

  (1) Transform raw video feed (optional)

      Here we modify the various fundamental properties of a raw video
      stream to be compatible with the intersection of:
      * The encoder GstCaps and
      * The specified "Stream Restriction" of the profile/target

      The fundamental properties that can be modified are:
      * width/height
        This is done with a video scaler.
        The DAR (Display Aspect Ratio) MUST be respected.
        If needed, black borders can be added to comply with the target DAR.
      * framerate
      * format/colorspace/depth
        All of this is done with a colorspace converter

      (A sketch of such a transformation chain is given after this list.)

  (2) Actual encoding (optional for raw streams)

      An encoder (with some optional settings) is used.

  (3) Muxing

      A muxer (with some optional settings) is used.

  (4) Outgoing encoded and muxed stream
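
As a rough sketch in the pipeline notation used elsewhere in these documents,
the transform stage (1) for raw video could be realised with standard converter
elements followed by a capsfilter carrying the intersected caps. The element
choice here is illustrative, not mandated by this design:

  ... ! videoscale ! videorate ! videoconvert
      ! capsfilter caps="<intersection of encoder caps and stream restriction>"
      ! encoder ! muxer ! ...
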
1.2.3 Steps involved for raw audio encoding

  This is roughly the same as for raw video, except for (1):

  (1) Transform raw audio feed (optional)

      We modify the various fundamental properties of a raw audio stream to
      be compatible with the intersection of:
      * The encoder GstCaps and
      * The specified "Stream Restriction" of the profile/target

      The fundamental properties that can be modified are:
      * Number of channels
      * Type of raw audio (integer or floating point)
      * Depth (number of bits required to encode one sample)

1.2.4 Steps involved for encoded audio/video streams

  Steps (1) and (2) are replaced by a parser if a parser is available
  for the given format.

1.2.5 Steps involved for other streams

  Other streams will just be forwarded as-is to the muxer, provided the
  muxer accepts the stream type.


2. Encoding Profile System
--------------------------

  This work is based on:
  * The existing GstPreset system for elements [0]
  * The gnome-media GConf audio profile system [1]
  * The investigation done into device profiles by Arista and
    Transmageddon [2 and 3]

2.2 Terminology
---------------

* Encoding Target Category
  A Target Category is a classification of devices/systems/use-cases
  for encoding.

  Such a classification is required in order for:
  * Applications with a very specific use-case to limit the number of
    profiles they can offer the user. A screencasting application has
    no use for the online-services targets, for example.
  * Offering the user some initial classification in the case of a
    more generic encoding application (like a video editor or a
    transcoder).

  Ex:
    Consumer devices
    Online service
    Intermediate Editing Format
    Screencast
    Capture
    Computer

* Encoding Profile Target
  A Profile Target describes a specific entity for which we wish to
  encode.
  A Profile Target must belong to at least one Target Category.
  It will define at least one Encoding Profile.

  Ex (with category):
    Nokia N900 (Consumer device)
    Sony PlayStation 3 (Consumer device)
    Youtube (Online service)
    DNxHD (Intermediate editing format)
    HuffYUV (Screencast)
    Theora (Computer)

* Encoding Profile
  A specific combination of muxer, encoders, presets and limitations.

  Ex:
    Nokia N900/H264 HQ
    Ipod/High Quality
    DVD/Pal
    Youtube/High Quality
    HTML5/Low Bandwidth
    DNxHD

2.3 Encoding Profile
--------------------

  An encoding profile requires the following information:

  * Name
    This string is not translatable and must be unique.
    A recommendation to guarantee uniqueness of the naming could be:
      <target>/<name>
  * Description
    This is a translatable string describing the profile.
  * Muxing format
    This is a string containing the GStreamer media-type of the
    container format.
  * Muxing preset
    This is an optional string describing the preset(s) to use on the
    muxer.
  * Multipass setting
    This is a boolean describing whether the profile requires several
    passes.
  * List of Stream Profiles

2.3.1 Stream Profiles

  A Stream Profile consists of:

  * Type
    The type of stream profile (audio, video, text, private-data)
  * Encoding Format
    This is a string containing the GStreamer media-type of the encoding
    format to be used. If encoding is not to be applied, the raw audio
    media type will be used.
  * Encoding preset
    This is an optional string describing the preset(s) to use on the
    encoder.
  * Restriction
    This is an optional GstCaps containing the restriction of the
    stream that can be fed to the encoder.
    This will generally contain restrictions on video
    width/height/framerate or audio depth.
  * presence
    This is an integer specifying how many streams can be used in the
    containing profile. 0 means that any number of streams can be
    used.
  * pass
    This is an integer which is only meaningful if the multipass flag
    has been set in the profile. If it has been set, it indicates which
    pass this Stream Profile corresponds to.

2.4 Example profile
-------------------

  The representation used here is XML only as an example. No decision is
  made as to which formatting to use for storing targets and profiles.

  <gst-encoding-target>
    <name>Nokia N900</name>
    <category>Consumer Device</category>
    <profiles>
      <profile>Nokia N900/H264 HQ</profile>
      <profile>Nokia N900/MP3</profile>
      <profile>Nokia N900/AAC</profile>
    </profiles>
  </gst-encoding-target>

  <gst-encoding-profile>
    <name>Nokia N900/H264 HQ</name>
    <description>
      High Quality H264/AAC for the Nokia N900
    </description>
    <format>video/quicktime,variant=iso</format>
    <streams>
      <stream-profile>
        <type>audio</type>
        <format>audio/mpeg,mpegversion=4</format>
        <preset>Quality High/Main</preset>
        <restriction>audio/x-raw,channels=[1,2]</restriction>
        <presence>1</presence>
      </stream-profile>
      <stream-profile>
        <type>video</type>
        <format>video/x-h264</format>
        <preset>Profile Baseline/Quality High</preset>
        <restriction>
          video/x-raw,width=[16, 800],\
          height=[16, 480],framerate=[1/1, 30000/1001]
        </restriction>
        <presence>1</presence>
      </stream-profile>
    </streams>
  </gst-encoding-profile>
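
For illustration, the video restriction from the example above could be
expressed and applied in C roughly like this. This is a sketch only; the
encoder element name is just an example:

  GstCaps *restriction, *encoder_caps, *allowed;
  GstElement *encoder = gst_element_factory_make ("x264enc", NULL);
  GstPad *sinkpad = gst_element_get_static_pad (encoder, "sink");

  restriction = gst_caps_from_string (
      "video/x-raw,width=[16,800],height=[16,480],"
      "framerate=[1/1,30000/1001]");

  /* Intersect the profile restriction with what the encoder accepts;
   * this is the target for the transform stage described in 1.2.2. */
  encoder_caps = gst_pad_query_caps (sinkpad, NULL);
  allowed = gst_caps_intersect (restriction, encoder_caps);

  gst_caps_unref (encoder_caps);
  gst_caps_unref (restriction);
  gst_object_unref (sinkpad);
  /* 'allowed' would then be set on a capsfilter in front of the encoder. */
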
2.5 API
-------

  A proposed C API is contained in the gstprofile.h file in this directory.

2.6 Modifications required in the existing GstPreset system
------------------------------------------------------------

2.6.1 Temporary presets

  Currently a preset needs to be saved on disk in order to be
  used.

  This makes it impossible to have temporary presets (that exist only
  during the lifetime of a process), which might be required in the
  new proposed profile system.

2.6.2 Categorisation of presets

  Currently presets are just aliases of a group of property/value pairs,
  without any meaning or explanation as to how they exclude each
  other.

  Take for example the H264 encoder. It can have presets for:
  * passes (1, 2 or 3 passes)
  * profiles (Baseline, Main, ...)
  * quality (Low, Medium, High)

  In order to programmatically know which presets exclude each other,
  we here propose the categorisation of these presets.

  This can be done in one of two ways:
  1. in the name (by making the name be [<category>:]<name>)
     This would give for example: "Quality:High", "Profile:Baseline"
  2. by adding a new _meta key
     This would give for example: _meta/category:quality

2.6.3 Aggregation of presets

  There can be more than one choice of presets to be made for an
  element (quality, profile, pass).

  This means that one can not currently describe the full
  configuration of an element with a single string but with many.

  The proposal here is to extend the GstPreset API to be able to set
  all presets using one string and a well-known separator ('/').

  This change only requires changes in the core preset handling code.

  This would allow doing the following:
    gst_preset_load_preset (h264enc,
        "pass:1/profile:baseline/quality:high");

2.7 Points to be determined
---------------------------

  This document hasn't determined yet how to solve the following
  problems:

2.7.1 Storage of profiles

  One proposal for storage would be to use a system-wide directory
  (like $prefix/share/gstreamer-0.10/profiles) and store XML files for
  every individual profile.

  Users could then add their own profiles in ~/.gstreamer-0.10/profiles

  This poses some limitations as to what to do if some applications
  want to have some profiles limited to their own usage.


3. Helper library for profiles
------------------------------

  These helper methods could also be added to existing libraries (like
  GstPreset, GstPbUtils, ..).

  The various APIs proposed are in the accompanying gstprofile.h file.

3.1 Getting user-readable names for formats

  This is already provided by GstPbUtils.

3.2 Hierarchy of profiles

  The goal is for applications to be able to present to the user a list
  of combo-boxes for choosing their output profile:

    [ Category ]        # optional, depends on the application
    [ Device/Site/.. ]  # optional, depends on the application
    [ Profile ]

  Convenience methods are offered to easily get lists of categories,
  devices, and profiles.

3.3 Creating Profiles

  The goal is for applications to be able to easily create profiles.

  The application needs to have a fast/efficient way to:
  * select a container format and see all compatible streams that can be used
    with it.
  * select a codec format and see which container formats can be used
    with it.

  The remaining parts concern the restrictions on encoder
  input.

3.4 Ensuring availability of plugins for Profiles

  When an application wishes to use a Profile, it should be able to
  query whether it has all the needed plugins to use it.

  This part will use GstPbUtils to query, and if needed install, the
  missing plugins through the installed distribution plugin installer.


I. Use-cases researched
-----------------------

  This is a list of various use-cases where encoding/muxing is being
  used.

* Transcoding

  The goal is to convert, with as minimal a loss of quality as possible,
  any input file for a target use.
  A specific variant of this is transmuxing (see below).

  Example applications: Arista, Transmageddon

* Rendering timelines

  The incoming streams are a collection of various segments that need
  to be rendered.
  Those segments can vary in nature (i.e. the video width/height can
  change).
  This requires the use of identity with the single-segment property
  activated to transform the incoming collection of segments into a
  single continuous segment.

  Example applications: PiTiVi, Jokosher

* Encoding of live sources

  The major risk to take into account is the encoder not encoding the
  incoming stream fast enough. This is outside of the scope of
  encodebin, and should be solved by using queues between the sources
  and encodebin, as well as by implementing QoS in encoders and sources
  (the encoders emitting QoS events, and the upstream elements
  adapting themselves accordingly).

  Example applications: camerabin, cheese

* Screencasting applications

  This is similar to encoding of live sources.
  The difference is that, due to the nature of the source (size and
  amount/frequency of updates), one might want to do the encoding in
  two parts:
  * The actual live capture is encoded with an 'almost-lossless' codec
    (such as huffyuv)
  * Once the capture is done, the file created in the first step is
    then rendered to the desired target format.

  Fixing sources to only emit region updates and having encoders
  capable of encoding those streams would remove the need for the first
  step, but is outside of the scope of encodebin.

  Example applications: Istanbul, gnome-shell, recordmydesktop

* Live transcoding

  This is the case of an incoming live stream which will be
  broadcasted/transmitted live.
  One issue to take into account is to reduce the encoding latency to
  a minimum. This should mostly be done by picking low-latency
  encoders.

  Example applications: Rygel, Coherence

* Transmuxing

  Given a certain file, the aim is to remux the contents WITHOUT
  decoding into either a different container format or the same
  container format.
  Remuxing into the same container format is useful when the file was
  not created properly (for example, the index is missing).
  Whenever available, parsers should be applied on the encoded streams
  to validate and/or fix the streams before muxing them.

  Metadata from the original file must be kept in the newly created
  file.

  Example applications: Arista, Transmageddon

* Loss-less cutting

  Given a certain file, the aim is to extract a certain part of the
  file without going through the process of decoding and re-encoding
  that file.
  This is similar to the transmuxing use-case.

  Example applications: PiTiVi, Transmageddon, Arista, ...

* Multi-pass encoding

  Some encoders allow doing a multi-pass encoding.
  The initial pass(es) are only used to collect encoding estimates and
  are not actually muxed and outputted.
  The final pass uses previously collected information, and the output
  is then muxed and outputted.

* Archiving and intermediary format

  The requirement is to have lossless encoding.

* CD ripping

  Example applications: Sound-juicer

* DVD ripping

  Example application: Thoggen


* Research links

  Some of these are still active documents, some others are not.

  [0] GstPreset API documentation
      http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstPreset.html

  [1] gnome-media GConf profiles
      http://www.gnome.org/~bmsmith/gconf-docs/C/gnome-media.html

  [2] Research on a Device Profile API
      http://gstreamer.freedesktop.org/wiki/DeviceProfile

  [3] Research on defining presets usage
      http://gstreamer.freedesktop.org/wiki/PresetDesign
docs/design/design-orc-integration.txt (deleted)
@@ -1,204 +0,0 @@

Orc Integration
===============

Sections
--------

 - About Orc
 - Fast memcpy()
 - Normal Usage
 - Build Process
 - Testing
 - Orc Limitations


About Orc
---------

Orc code can be in one of two forms: in .orc files that are converted
by orcc to C code that calls liborc functions, or C code that calls
liborc to create complex operations at runtime. The former is mostly
for functions with predetermined functionality. The latter is for
functionality that is determined at runtime, where writing .orc
functions for all combinations would be prohibitive. Orc also has
a fast memcpy and memset which are useful independently.


Fast memcpy()
-------------

*** This part is not integrated yet. ***

Orc has built-in functions orc_memcpy() and orc_memset() that work
like memcpy() and memset(). These are meant for large copies only.
A reasonable cutoff for using orc_memcpy() instead of memcpy() is
if the number of bytes is generally greater than 100. DO NOT use
orc_memcpy() if the typical size is less than 20 bytes, especially
if the size is known at compile time, as these cases are inlined by
the compiler.

(Example: sys/ximage/ximagesink.c)

Add $(ORC_CFLAGS) to libgstximagesink_la_CFLAGS and $(ORC_LIBS) to
libgstximagesink_la_LIBADD. Then, in the source file, add:

  #ifdef HAVE_ORC
  #include <orc/orc.h>
  #else
  #define orc_memcpy(a,b,c) memcpy(a,b,c)
  #endif

Then switch relevant uses of memcpy() to orc_memcpy().

The above example works whether or not Orc is enabled at compile
time.


Normal Usage
------------

The following lines are added near the top of Makefile.am for plugins
that use Orc code in .orc files (this is for the volume plugin):

  ORC_BASE=volume
  include $(top_srcdir)/common/orc.mk

Also add the generated source file to the plugin build:

  nodist_libgstvolume_la_SOURCES = $(ORC_SOURCES)

And of course, add $(ORC_CFLAGS) to libgstvolume_la_CFLAGS, and
$(ORC_LIBS) to libgstvolume_la_LIBADD.

The value assigned to ORC_BASE does not need to be related to
the name of the plugin.


Advanced Usage
--------------

The Holy Grail of Orc usage is to programmatically generate Orc code
at runtime, have liborc compile it into binary code at runtime, and
then execute this code. Currently, the best example of this is in
Schroedinger. An example of how this would be used is audioconvert:
given an input format, channel position manipulation, dithering and
quantizing configuration, and output format, an Orc code generator
would create an OrcProgram, add the appropriate instructions to do
each step based on the configuration, and then compile the program.
Successfully compiling the program would return a function pointer
that can be called to perform the operation.

This sort of advanced usage requires structural changes to current
plugins (e.g., audioconvert) and will probably be developed
incrementally. Moreover, if such code is intended to be used without
Orc as a strict build/runtime requirement, two codepaths would need to
be developed and tested. For this reason, until GStreamer requires
Orc, I think it's a good idea to restrict such advanced usage to the
cog plugin in -bad, which requires Orc.
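
A minimal sketch of that flow, building and running a trivial element-wise
16-bit add at runtime. This follows the generic liborc example pattern and is
illustrative only, not audioconvert code; the exact entry points should be
checked against the liborc version in use:

  #include <stdint.h>
  #include <orc/orc.h>

  static void
  add_s16 (int16_t * d, const int16_t * s1, const int16_t * s2, int n)
  {
    OrcProgram *p;
    OrcExecutor *ex;
    OrcCompileResult result;

    orc_init ();

    /* Describe the program: one 2-byte destination, two 2-byte sources,
     * and a single 'addw' (add 16-bit words) instruction. */
    p = orc_program_new ();
    orc_program_add_destination (p, 2, "d1");
    orc_program_add_source (p, 2, "s1");
    orc_program_add_source (p, 2, "s2");
    orc_program_append_str (p, "addw", "d1", "s1", "s2");

    result = orc_program_compile (p);
    /* A real user would check 'result' and fall back to C code here. */
    (void) result;

    /* Bind the arrays and run the compiled code over n elements. */
    ex = orc_executor_new (p);
    orc_executor_set_n (ex, n);
    orc_executor_set_array_str (ex, "s1", (void *) s1);
    orc_executor_set_array_str (ex, "s2", (void *) s2);
    orc_executor_set_array_str (ex, "d1", d);
    orc_executor_run (ex);

    orc_executor_free (ex);
    orc_program_free (p);
  }
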
Build Process
-------------

The goal of the build process is to make Orc non-essential for most
developers and users. This is not to say you shouldn't have Orc
installed -- without it, you will get slow backup C code -- just that
people compiling GStreamer are not forced to switch from Liboil to
Orc immediately.

With Orc installed, the build process will use the Orc Compiler (orcc)
to convert each .orc file into a temporary C source (tmp-orc.c) and a
temporary header file (${base}orc.h if constructed from ${base}.orc).
The C source file is compiled and linked to the plugin, and the header
file is included by other source files in the plugin.

If 'make orc-update' is run in the source directory, the files
tmp-orc.c and ${base}orc.h are copied to ${base}orc-dist.c and
${base}orc-dist.h respectively. The -dist.[ch] files are automatically
disted via orc.mk. The -dist.[ch] files should be checked in to
git whenever the .orc source is changed and checked in. Example
workflow:

  edit .orc file
  ... make, test, etc.
  make orc-update
  git add volume.orc volumeorc-dist.c volumeorc-dist.h
  git commit

At 'make dist' time, all of the .orc files are compiled, and then
copied to their -dist.[ch] counterparts, and then the -dist.[ch]
files are added to the dist directory.

Without Orc installed (or --disable-orc given to configure), the
-dist.[ch] files are copied to tmp-orc.c and ${base}orc.h. When
compiled with Orc disabled, DISABLE_ORC is defined in config.h, and
the C backup code is compiled. This backup code is pure C, and
does not include orc headers or require linking against liborc.

The common/orc.mk build method is limited by the inflexibility of
automake. The file tmp-orc.c must be a fixed filename; using ORC_NAME
to generate the filename does not work because it conflicts with
automake's dependency generation. Building multiple .orc files
is not possible due to this restriction.


Testing
-------

If you create another .orc file, please add it to
tests/orc/Makefile.am. This causes automatic test code to be
generated and run during 'make check'. Each function in the .orc
file is tested by comparing the results of executing the run-time
compiled code and the C backup function.


Orc Limitations
---------------

audioconvert

  Orc doesn't have a mechanism for generating random numbers, which
  prevents its use as-is for dithering. One way around this is to
  generate suitable dithering values in one pass, then use those
  values in a second Orc-based pass.

  Orc doesn't handle 64-bit float, for no good reason.

  Irrespective of Orc handling 64-bit float, it would be useful to
  have a direct 32-bit float to 16-bit integer conversion.

  audioconvert is a good candidate for programmatically generated
  Orc code.

  audioconvert enumerates functions in terms of big-endian vs.
  little-endian. Orc's functions are "native" and "swapped".
  Programmatically generating code removes the need to worry about
  this.

  Orc doesn't handle 24-bit samples. Fixing this is not a priority
  (for ds).

videoscale

  Orc doesn't handle horizontal resampling yet. The plan is to add
  special sampling opcodes, for nearest, bilinear, and cubic
  interpolation.

videotestsrc

  Lots of code in videotestsrc needs to be rewritten to be SIMD
  (and Orc) friendly, e.g., stuff that uses oil_splat_u8().

  A fast low-quality random number generator in Orc would be useful
  here.

volume

  Many of the comments on audioconvert apply here as well.

  There are a bunch of FIXMEs in here that are due to misapplied
  patches.
docs/design/draft-keyframe-force.txt (deleted)
@@ -1,91 +0,0 @@

Forcing keyframes
-----------------

Consider the following use case:

  We have a pipeline that performs video and audio capture from a live source,
  compresses and muxes the streams and writes the resulting data into a file.

  Inside the uncompressed video data we have a specific pattern inserted at
  specific moments that should trigger a switch to a new file, meaning we
  close the existing file we are writing to and start writing to a new file.

  We want the new file to start with a keyframe so that one can start decoding
  the file immediately.

Components:

 1) We need an element that is able to detect the pattern in the video stream.

 2) We need to inform the video encoder that it should start encoding a
    keyframe starting from exactly the frame with the pattern.

 3) We need to inform the muxer that it should flush out any pending data and
    start the beginning of a new file with the keyframe as the first video
    frame.

 4) We need to inform the sink element that it should start writing to the
    next file. This requires application interaction to instruct the sink of
    the new filename. The application should also be free to ignore the
    boundary and continue to write to the existing file. The application will
    typically use an event pad probe to detect the custom event.

Implementation:

The implementation would consist of generating a GST_EVENT_CUSTOM_DOWNSTREAM
event that marks the keyframe boundary. This event is inserted into the
pipeline by the application upon a certain trigger. In the above use case this
trigger would be given by the element that detects the pattern, in the form of
an element message.

The custom event would travel further downstream to instruct encoder, muxer
and sink about the possible switch.

The information passed in the event consists of:

  name: GstForceKeyUnit

  (G_TYPE_UINT64)"timestamp"    : the timestamp of the buffer that
                                  triggered the event.
  (G_TYPE_UINT64)"stream-time"  : the stream position that triggered the
                                  event.
  (G_TYPE_UINT64)"running-time" : the running time of the stream when the
                                  event was triggered.
  (G_TYPE_BOOLEAN)"all-headers" : send all headers, including those in
                                  the caps or those sent at the start of
                                  the stream.

  ....                          : optional other data fields.

Note that this event is purely informational: no element is required to
perform an action, but it should forward the event downstream, just like any
other event it does not handle.
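
For illustration, an application could construct and inject such an event like
this. This is a sketch using core GStreamer API; the timestamp/stream-time/
running-time values and the encoder_sinkpad variable are assumed to come from
the triggering element message and the application's own bookkeeping:

  GstStructure *s;
  GstEvent *event;

  s = gst_structure_new ("GstForceKeyUnit",
      "timestamp", G_TYPE_UINT64, timestamp,
      "stream-time", G_TYPE_UINT64, stream_time,
      "running-time", G_TYPE_UINT64, running_time,
      "all-headers", G_TYPE_BOOLEAN, TRUE,
      NULL);

  event = gst_event_new_custom (GST_EVENT_CUSTOM_DOWNSTREAM, s);

  /* Send it into the pipeline just upstream of the encoder, so that the
   * encoder sees it before the frame that should become the keyframe. */
  gst_pad_send_event (encoder_sinkpad, event);
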
Elements understanding the event should behave as follows:

 1) The video encoder receives the event before the next frame. Upon reception
    of the event it schedules to encode the next frame as a keyframe.
    Before pushing out the encoded keyframe it must push the GstForceKeyUnit
    event downstream.

 2) The muxer receives the GstForceKeyUnit event and flushes out its current
    state, preparing to produce data that can be used as a key unit. Before
    pushing out the new data it pushes the GstForceKeyUnit event downstream.

 3) The application receives the GstForceKeyUnit event in a pad probe on the
    sink and reconfigures the sink to make it perform new actions after
    receiving the next buffer.


Upstream
--------

When using RTP, packets can get lost or receivers can be added at any time,
and they may request a new key frame.

A downstream element sends an upstream "GstForceKeyUnit" event up the
pipeline.

When an element produces some kind of key unit in output, but has
no such concept in its input (like an encoder that takes raw frames),
it consumes the event (doesn't pass it upstream), and instead sends
a downstream GstForceKeyUnit event and a new keyframe.
@ -1,546 +0,0 @@
===============================================================
 Subtitle overlays, hardware-accelerated decoding and playbin
===============================================================

Status: EARLY DRAFT / BRAINSTORMING

=== 1. Background ===

Subtitles can be muxed in containers or come from an external source.

Subtitles come in many shapes and colours. Usually they are either
text-based (incl. 'pango markup'), or bitmap-based (e.g. DVD subtitles
and the most common form of DVB subs). Bitmap based subtitles are
usually compressed in some way, like some form of run-length encoding.

Subtitles are currently decoded and rendered in subtitle-format-specific
overlay elements. These elements have two sink pads (one for raw video
and one for the subtitle format in question) and one raw video source pad.

They will take care of synchronising the two input streams, and of
decoding and rendering the subtitles on top of the raw video stream.

Digression: one could theoretically have dedicated decoder/render elements
that output an AYUV or ARGB image, and then let a videomixer element do
the actual overlaying, but this is not very efficient, because it requires
us to allocate and blend whole pictures (1920x1080 AYUV = 8MB,
1280x720 AYUV = 3.6MB, 720x576 AYUV = 1.6MB) even if the overlay region
is only a small rectangle at the bottom. This wastes memory and CPU.
We could do something better by introducing a new format that only
encodes the region(s) of interest, but we don't have such a format yet, and
are not necessarily keen to rewrite this part of the logic in playbin
at this point - and we can't change existing elements' behaviour, so would
need to introduce new elements for this.

Playbin2 supports outputting compressed formats, i.e. it does not
force decoding to a raw format, but is happy to output to a non-raw
format as long as the sink supports that as well.

In case of certain hardware-accelerated decoding APIs, we will make use
of that functionality. However, the decoder will not output a raw video
format then, but some kind of hardware/API-specific format (in the caps)
and the buffers will reference hardware/API-specific objects that
the hardware/API-specific sink will know how to handle.
=== 2. The Problem ===

In the case of such hardware-accelerated decoding, the decoder will not
output raw pixels that can easily be manipulated. Instead, it will
output hardware/API-specific objects that can later be used to render
a frame using the same API.

Even if we could transform such a buffer into raw pixels, we most
likely would want to avoid that, in order to avoid the need to
map the data back into system memory (and then later back to the GPU).
It's much better to upload the much smaller encoded data to the GPU/DSP
and then leave it there until rendered.

Currently playbin only supports subtitles on top of raw decoded video.
It will try to find a suitable overlay element from the plugin registry
based on the input subtitle caps and the rank. (It is assumed that we
will be able to convert any raw video format into any format required
by the overlay using a converter such as videoconvert.)

It will not render subtitles if the video sent to the sink is not
raw YUV or RGB or if conversions have been disabled by setting the
native-video flag on playbin.

Subtitle rendering is considered an important feature. Enabling
hardware-accelerated decoding by default should not lead to a major
feature regression in this area.

This means that we need to support subtitle rendering on top of
non-raw video.
=== 3. Possible Solutions ===

The goal is to keep knowledge of the subtitle format within the
format-specific GStreamer plugins, and knowledge of any specific
video acceleration API to the GStreamer plugins implementing
that API. We do not want to make the pango/dvbsuboverlay/dvdspu/kate
plugins link to libva/libvdpau/etc. and we do not want to make
the vaapi/vdpau plugins link to all of libpango/libkate/libass etc.

Multiple possible solutions come to mind:

 (a) backend-specific overlay elements

     e.g. vaapitextoverlay, vdpautextoverlay, vaapidvdspu, vdpaudvdspu,
     vaapidvbsuboverlay, vdpaudvbsuboverlay, etc.

     This assumes the overlay can be done directly on the backend-specific
     object passed around.

     The main drawback with this solution is that it leads to a lot of
     code duplication and may also lead to uncertainty about distributing
     certain duplicated pieces of code. The code duplication is pretty
     much unavoidable, since making textoverlay, dvbsuboverlay, dvdspu,
     kate, assrender, etc. available in form of base classes to derive
     from is not really an option. Similarly, one would not really want
     the vaapi/vdpau plugin to depend on a bunch of other libraries
     such as libpango, libkate, libtiger, libass, etc.

     One could add some new kind of overlay plugin feature though in
     combination with a generic base class of some sort, but in order
     to accommodate all the different cases and formats one would end
     up with quite convoluted/tricky API.

     (Of course there could also be a GstFancyVideoBuffer that provides
     an abstraction for such video accelerated objects and that could
     provide an API to add overlays to it in a generic way, but in the
     end this is just a less generic variant of (c), and it is not clear
     that there are real benefits to a specialised solution vs. a more
     generic one).
 (b) convert backend-specific object to raw pixels and then overlay

     Even where possible technically, this is most likely very
     inefficient.
 (c) attach the overlay data to the backend-specific video frame buffers
     in a generic way and do the actual overlaying/blitting later in
     backend-specific code such as the video sink (or an accelerated
     encoder/transcoder)

     In this case, the actual overlay rendering (i.e. the actual text
     rendering or decoding DVD/DVB data into pixels) is done in the
     subtitle-format-specific GStreamer plugin. All knowledge about
     the subtitle format is contained in the overlay plugin then,
     and all knowledge about the video backend in the video backend
     specific plugin.

     The main question then is how to get the overlay pixels (and
     we will only deal with pixels here) from the overlay element
     to the video sink.

     This could be done in multiple ways: one could send custom
     events downstream with the overlay data, or one could attach
     the overlay data directly to the video buffers in some way.

     Sending inline events has the advantage that it is fairly
     transparent to any elements between the overlay element and
     the video sink: if an effects plugin creates a new video
     buffer for the output, nothing special needs to be done to
     maintain the subtitle overlay information, since the overlay
     data is not attached to the buffer. However, it slightly
     complicates things at the sink, since it would also need to
     look for the new event in question instead of just processing
     everything in its buffer render function.

     If one attaches the overlay data to the buffer directly, any
     element between overlay and video sink that creates a new
     video buffer would need to be aware of the overlay data
     attached to it and copy it over to the newly-created buffer.

     One would have to implement a special kind of new query
     (e.g. a FEATURE query) that is not passed on automatically by
     gst_pad_query_default() in order to make sure that all elements
     downstream will handle the attached overlay data. (This is only
     a problem if we want to also attach overlay data to raw video
     pixel buffers; for new non-raw types we can just make it
     mandatory and assume support and be done with it; for existing
     non-raw types nothing changes anyway if subtitles don't work.)
     (We need to maintain backwards compatibility for existing raw
     video pipelines like e.g.: ..decoder ! suboverlay ! encoder..)

     Even though slightly more work, attaching the overlay information
     to buffers seems more intuitive than sending it interleaved as
     events. And buffers stored or passed around (e.g. via the
     "last-buffer" property in the sink when doing screenshots via
     playbin) always contain all the information needed.
 (d) create a video/x-raw-*-delta format and use a backend-specific videomixer

     This possibility was hinted at already in the digression in
     section 1. It would satisfy the goal of keeping subtitle format
     knowledge in the subtitle plugins and video backend knowledge
     in the video backend plugin. It would also add a concept that
     might be generally useful (think ximagesrc capture with xdamage).
     However, it would require adding foorender variants of all the
     existing overlay elements, and changing playbin to that new
     design, which is somewhat intrusive. And given the general
     nature of such a new format/API, we would need to take a lot
     of care to be able to accommodate all possible use cases when
     designing the API, which makes it considerably more ambitious.
     Lastly, we would need to write videomixer variants for the
     various accelerated video backends as well.
Overall (c) appears to be the most promising solution. It is the least
intrusive and should be fairly straight-forward to implement with
reasonable effort, requiring only small changes to existing elements
and requiring no new elements.

Doing the final overlaying in the sink as opposed to a videomixer
or overlay in the middle of the pipeline has other advantages:

 - if video frames need to be dropped, e.g. for QoS reasons,
   we could also skip the actual subtitle overlaying and
   possibly the decoding/rendering as well, if the
   implementation and API allows for that to be delayed.

 - the sink often knows the actual size of the window/surface/screen
   the output video is rendered to. This *may* make it possible to
   render the overlay image in a higher resolution than the input
   video, solving a long-standing issue with pixelated subtitles on
   top of low-resolution videos that are then scaled up in the sink.
   This would of course require the rendering to be delayed instead
   of just attaching an AYUV/ARGB/RGBA blob of pixels to the video
   buffer in the overlay, but that could all be supported.

 - if the video backend / sink has support for high-quality text
   rendering (clutter?) we could just pass the text or pango markup
   to the sink and let it do the rest (this is unlikely to be
   supported in the general case - text and glyph rendering is
   hard; also, we don't really want to make up our own text markup
   system, and pango markup is probably too limited for complex
   karaoke stuff).
=== 4. API needed ===

(a) Representation of subtitle overlays to be rendered

We need to pass the overlay pixels from the overlay element to the
sink somehow. Whatever the exact mechanism, let's assume we pass
a refcounted GstVideoOverlayComposition struct or object.

A composition is made up of one or more overlays/rectangles.

In the simplest case an overlay rectangle is just a blob of
RGBA/ABGR [FIXME?] or AYUV pixels with positioning info and other
metadata, and there is only one rectangle to render.

We're keeping the naming generic ("OverlayFoo" rather than
"SubtitleFoo") here, since this might also be handy for
other use cases such as e.g. logo overlays or so. It is not
designed for full-fledged video stream mixing though.
  // Note: don't mind the exact implementation details, they'll be hidden

  // FIXME: might be confusing in 0.11 though since GstXOverlay was
  //        renamed to GstVideoOverlay in 0.11, but not much we can do,
  //        maybe we can rename GstVideoOverlay to something better

  struct GstVideoOverlayComposition
  {
    guint                       num_rectangles;
    GstVideoOverlayRectangle ** rectangles;

    /* lowest rectangle sequence number still used by the upstream
     * overlay element. This way a renderer maintaining some kind of
     * rectangles <-> surface cache can know when to free cached
     * surfaces/rectangles. */
    guint                       min_seq_num_used;

    /* sequence number for the composition (same series as rectangles) */
    guint                       seq_num;
  }

  struct GstVideoOverlayRectangle
  {
    /* Position on video frame and dimension of output rectangle in
     * output frame terms (already adjusted for the PAR of the output
     * frame). x/y can be negative (overlay will be clipped then) */
    gint  x, y;
    guint render_width, render_height;

    /* Dimensions of overlay pixels */
    guint width, height, stride;

    /* This is the PAR of the overlay pixels */
    guint par_n, par_d;

    /* Format of pixels, GST_VIDEO_FORMAT_ARGB on big-endian systems,
     * and BGRA on little-endian systems (i.e. pixels are treated as
     * 32-bit values and alpha is always in the most-significant byte,
     * and blue is in the least-significant byte).
     *
     * FIXME: does anyone actually use AYUV in practice? (we do
     *        in our utility function to blend on top of raw video)
     *        What about AYUV and endianness? Do we always have [A][Y][U][V]
     *        in memory? */
    /* FIXME: maybe use our own enum? */
    GstVideoFormat format;

    /* Refcounted blob of memory, no caps or timestamps */
    GstBuffer *pixels;

    // FIXME: how to express source like text or pango markup?
    //        (just add source type enum + source buffer with data)
    //
    // FOR 0.10: always send pixel blobs, but attach source data in
    //   addition (reason: if downstream changes, we can't renegotiate
    //   that properly, if we just do a query of supported formats from
    //   the start). Sink will just ignore pixels and use pango markup
    //   from source data if it supports that.
    //
    // FOR 0.11: overlay should query formats (pango markup, pixels)
    //   supported by downstream and then only send that. We can
    //   renegotiate via the reconfigure event.
    //

    /* sequence number: useful for backends/renderers/sinks that want
     * to maintain a cache of rectangles <-> surfaces. The value of
     * the min_seq_num_used in the composition tells the renderer which
     * rectangles have expired. */
    guint seq_num;

    /* FIXME: we also need a (private) way to cache converted/scaled
     * pixel blobs */
  }
(a1) Overlay consumer API:

How would this work in a video sink that supports scaling of textures:

  gst_foo_sink_render () {
    /* assume only one for now */
    if video_buffer has composition:
      composition = video_buffer.get_composition()

      for each rectangle in composition:
        if rectangle.source_data_type == PANGO_MARKUP
          actor = text_from_pango_markup (rectangle.get_source_data())
        else
          pixels = rectangle.get_pixels_unscaled (FORMAT_RGBA, ...)
          actor = texture_from_rgba (pixels, ...)

        .. position + scale on top of video surface ...
  }
(a2) Overlay producer API:

e.g. logo or subpicture overlay: got pixels, stuff into rectangle:

  if (logoverlay->cached_composition == NULL) {
    comp = composition_new ();

    rect = rectangle_new (format, pixels_buf,
                          width, height, stride, par_n, par_d,
                          x, y, render_width, render_height);

    /* composition adds its own ref for the rectangle */
    composition_add_rectangle (comp, rect);
    rectangle_unref (rect);

    /* buffer adds its own ref for the composition */
    video_buffer_attach_composition (comp);

    /* we take ownership of the composition and save it for later */
    logoverlay->cached_composition = comp;
  } else {
    video_buffer_attach_composition (logoverlay->cached_composition);
  }

FIXME: also add some API to modify render position/dimensions of
a rectangle (probably requires creation of new rectangle, unless
we handle writability like with other mini objects).
(b) Fallback overlay rendering/blitting on top of raw video

Eventually we want to use this overlay mechanism not only for
hardware-accelerated video, but also for plain old raw video,
either at the sink or in the overlay element directly.

Apart from the advantages listed earlier in section 3, this
allows us to consolidate in one location a lot of overlaying/blitting
code that is currently repeated in every single overlay element.
This makes it considerably easier to support a whole range of raw
video formats out of the box, add SIMD-optimised rendering using ORC,
or handle corner cases correctly.

(Note: a side-effect of overlaying raw video at the video sink is
that if e.g. a screenshotter gets the last buffer via the last-buffer
property of basesink, it would get an image without the subtitles
on top. This could probably be fixed by re-implementing the
property in GstVideoSink though. Playbin2 could handle this
internally as well).

  void
  gst_video_overlay_composition_blend (GstVideoOverlayComposition * comp,
                                       GstBuffer * video_buf)
  {
    guint n;

    g_return_if_fail (gst_buffer_is_writable (video_buf));
    g_return_if_fail (GST_BUFFER_CAPS (video_buf) != NULL);

    ... parse video_buffer caps into BlendVideoFormatInfo ...

    for each rectangle in the composition: {

      if (gst_video_format_is_yuv (video_buf_format)) {
        overlay_format = FORMAT_AYUV;
      } else if (gst_video_format_is_rgb (video_buf_format)) {
        overlay_format = FORMAT_ARGB;
      } else {
        /* FIXME: grayscale? */
        return;
      }

      /* this will scale and convert AYUV<->ARGB if needed */
      pixels = rectangle_get_pixels_scaled (rectangle, overlay_format);

      ... clip output rectangle ...

      __do_blend (video_buf_format, video_buf->data,
                  overlay_format, pixels->data,
                  x, y, width, height, stride);

      gst_buffer_unref (pixels);
    }
  }
(c) Flatten all rectangles in a composition

We cannot assume that the video backend API can handle any
number of rectangle overlays, it's possible that it only
supports one single overlay, in which case we need to squash
all rectangles into one.

However, we'll just declare this a corner case for now, and
implement it only if someone actually needs it. It's easy
to add later API-wise. Might be a bit tricky if we have
rectangles with different PARs/formats (e.g. subs and a logo),
though we could probably always just use the code from (b)
with a fully transparent video buffer to create a flattened
overlay buffer.
(d) core API: new FEATURE query

For 0.10 we need to add a FEATURE query, so the overlay element
can query whether the sink downstream and all elements between
the overlay element and the sink support the new overlay API.
Elements in between need to support it because the render
positions and dimensions need to be updated if the video is
cropped or rescaled, for example.

In order to ensure that all elements support the new API,
we need to drop the query in the pad default query handler
(so it only succeeds if all elements handle it explicitly).

Might want two variants of the feature query - one where
all elements in the chain need to support it explicitly
and one where it's enough if some element downstream
supports it.

In 0.11 this could probably be handled via GstMeta and
ALLOCATION queries (and/or we could simply require
elements to be aware of this API from the start).

There appears to be no issue with downstream possibly
not being linked yet at the time when an overlay would
want to do such a query.
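For illustration, the proposed FEATURE query could be approximated with a
custom query; the sketch below uses GStreamer 1.x API and an invented
structure name, so it is not the mechanism this draft standardises (the
solution eventually adopted upstream is based on GstVideoOverlayCompositionMeta
and the ALLOCATION query):

    /* Sketch only: ask downstream whether attached overlay compositions
     * will be handled. The structure name is made up for this example. */
    static gboolean
    downstream_supports_overlay_composition (GstPad * srcpad)
    {
      GstQuery *query;
      gboolean res;

      query = gst_query_new_custom (GST_QUERY_CUSTOM,
          gst_structure_new_empty ("overlay-composition-feature"));

      /* fails unless elements explicitly handle/forward it, which is the
       * behaviour the draft wants from the default query handler */
      res = gst_pad_peer_query (srcpad, query);
      gst_query_unref (query);

      return res;
    }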
Other considerations:

 - renderers (overlays or sinks) may be able to handle only ARGB or only AYUV
   (for most graphics/hw-API it's likely ARGB of some sort, while our
   blending utility functions will likely want the same colour space as
   the underlying raw video format, which is usually YUV of some sort).
   We need to convert where required, and should cache the conversion.

 - renderers may or may not be able to scale the overlay. We need to
   do the scaling internally if not (simple case: just horizontal scaling
   to adjust for PAR differences; complex case: both horizontal and vertical
   scaling, e.g. if subs come from a different source than the video or the
   video has been rescaled or cropped between overlay element and sink).

 - renderers may be able to generate (possibly scaled) pixels on demand
   from the original data (e.g. a string or RLE-encoded data). We will
   ignore this for now, since this functionality can still be added later
   via API additions. The most interesting case would be to pass a pango
   markup string, since e.g. clutter can handle that natively.

 - renderers may be able to write data directly on top of the video pixels
   (instead of creating an intermediary buffer with the overlay which is
   then blended on top of the actual video frame), e.g. dvdspu, dvbsuboverlay.

   However, in the interest of simplicity, we should probably ignore the
   fact that some elements can blend their overlays directly on top of the
   video (decoding/uncompressing them on the fly), even more so as it's
   not obvious that it's actually faster to decode the same overlay
   70-90 times (say) (i.e. ca. 3 seconds of video frames) and then blend
   it 70-90 times instead of decoding it once into a temporary buffer
   and then blending it directly from there, possibly SIMD-accelerated.
   Also, this is only relevant if the video is raw video and not some
   hardware-acceleration backend object.

   And ultimately it is the overlay element that decides whether to do
   the overlay right there and then or have the sink do it (if supported).
   It could decide to keep doing the overlay itself for raw video and
   only use our new API for non-raw video.

 - renderers may want to make sure they only upload the overlay pixels once
   per rectangle if that rectangle recurs in subsequent frames (as part of
   the same composition or a different composition), as is likely. This
   caching of e.g. surfaces needs to be done renderer-side and can be
   accomplished based on the sequence numbers. The composition contains the
   lowest sequence number still in use upstream (an overlay element may want
   to cache created compositions+rectangles as well, after all, to re-use
   them for multiple frames); based on that, the renderer can expire cached
   objects. The caching needs to be done renderer-side because attaching
   renderer-specific objects to the rectangles won't work well given the
   refcounted nature of rectangles and compositions, making it unpredictable
   when a rectangle or composition will be freed or from which thread
   context it will be freed. The renderer-specific objects are likely bound
   to other types of renderer-specific contexts, and need to be managed
   in connection with those.

 - compositions/rectangles should internally provide a certain degree of
   thread-safety. Multiple elements (sinks, overlay element) might access
   or use the same objects from multiple threads at the same time, and it
   is expected that elements will keep a ref to compositions and rectangles
   they push downstream for a while, e.g. until the current subtitle
   composition expires.
=== 5. Future considerations ===

 - alternatives: there may be multiple versions/variants of the same subtitle
   stream. On DVDs, there may be a 4:3 version and a 16:9 version of the same
   subtitles. We could attach both variants and let the renderer pick the best
   one for the situation (currently we just use the 16:9 version). With totem,
   it's ultimately totem that adds the 'black bars' at the top/bottom, so totem
   also knows if it's got a 4:3 display and can/wants to fit 4:3 subs (which
   may render on top of the bars) or not, for example.

=== 6. Misc. FIXMEs ===

TEST: should these look (roughly) alike (note text distortion) - needs fixing in textoverlay

  gst-launch-0.10 \
   videotestsrc ! video/x-raw,width=640,height=480,pixel-aspect-ratio=1/1 ! textoverlay text=Hello font-desc=72 ! xvimagesink \
   videotestsrc ! video/x-raw,width=320,height=480,pixel-aspect-ratio=2/1 ! textoverlay text=Hello font-desc=72 ! xvimagesink \
   videotestsrc ! video/x-raw,width=640,height=240,pixel-aspect-ratio=1/2 ! textoverlay text=Hello font-desc=72 ! xvimagesink

~~~ THE END ~~~
@ -1,107 +0,0 @@
Interlaced Video
================

Video buffers have a number of states identifiable through a combination of caps
and buffer flags.

Possible states:
- Progressive
- Interlaced
  - Plain
    - One field
    - Two fields
    - Three fields - this should be a progressive buffer with a repeated 'first'
      field that can be used for telecine pulldown
  - Telecine
    - One field
    - Two fields
      - Progressive
      - Interlaced (a.k.a. 'mixed'; the fields are from different frames)
    - Three fields - this should be a progressive buffer with a repeated 'first'
      field that can be used for telecine pulldown

Note: It can be seen that the difference between the plain interlaced and
telecine states is that in the telecine state, buffers containing two fields may
be progressive.

Tools for identification:
- GstVideoInfo
  - GstVideoInterlaceMode - enum - GST_VIDEO_INTERLACE_MODE_...
    - PROGRESSIVE
    - INTERLEAVED
    - MIXED
- Buffer flags - GST_VIDEO_BUFFER_FLAG_...
  - TFF
  - RFF
  - ONEFIELD
  - INTERLACED
Identification of Buffer States
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Note that flags are not necessarily interpreted in the same way for all
different states, nor are they necessarily required, nor do they make sense
in all cases.

Progressive
...........

If the interlace mode in the video info corresponding to a buffer is
"progressive", then the buffer is progressive.

Plain Interlaced
................

If the video info interlace mode is "interleaved", then the buffer is plain
interlaced.

GST_VIDEO_BUFFER_FLAG_TFF indicates whether the top or bottom field is to be
displayed first. The timestamp on the buffer corresponds to the first field.

GST_VIDEO_BUFFER_FLAG_RFF indicates that the first field (indicated by the TFF flag)
should be repeated. This is generally only used for telecine purposes but as the
telecine state was added long after the interlaced state was added and defined,
this flag remains valid for plain interlaced buffers.

GST_VIDEO_BUFFER_FLAG_ONEFIELD means that only the field indicated through the TFF
flag is to be used. The other field should be ignored.
Telecine
........

If the video info interlace mode is "mixed" then the buffers are in some form of
telecine state.

The TFF and ONEFIELD flags have the same semantics as for the plain interlaced
state.

GST_VIDEO_BUFFER_FLAG_RFF in the telecine state indicates that the buffer contains
only repeated fields that are present in other buffers and are as such
unneeded. For example, in a sequence of three telecined frames, we might have:

  AtAb AtBb BtBb

In this situation, we only need the first and third buffers as the second
buffer contains fields present in the first and third.

Note that the following state can have its second buffer identified using the
ONEFIELD flag (and TFF not set):

  AtAb AtBb BtCb

The telecine state requires one additional flag to be able to identify
progressive buffers.

The presence of the GST_VIDEO_BUFFER_FLAG_INTERLACED means that the buffer is an
'interlaced' or 'mixed' buffer that contains two fields that, when combined
with fields from adjacent buffers, allow reconstruction of progressive frames.
The absence of the flag implies the buffer containing two fields is a
progressive frame.

For example in the following sequence, the third buffer would be mixed (yes, it
is a strange pattern, but it can happen):

  AtAb AtBb BtCb CtDb DtDb
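As a purely illustrative sketch, an element could classify buffers according to
the states above roughly like this (GStreamer 1.x caps/flag API; the printing
is just a stand-in for whatever the element actually does with the result):

  /* Sketch: identify the interlacing state of a buffer from the caps-derived
   * GstVideoInfo plus the buffer flags described above. */
  static void
  print_buffer_state (const GstVideoInfo * info, GstBuffer * buf)
  {
    switch (GST_VIDEO_INFO_INTERLACE_MODE (info)) {
      case GST_VIDEO_INTERLACE_MODE_PROGRESSIVE:
        g_print ("progressive frame\n");
        break;
      case GST_VIDEO_INTERLACE_MODE_INTERLEAVED:
        g_print ("plain interlaced, %s field first%s%s\n",
            GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_TFF) ?
                "top" : "bottom",
            GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_RFF) ?
                ", first field repeated" : "",
            GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_ONEFIELD) ?
                ", only one field valid" : "");
        break;
      case GST_VIDEO_INTERLACE_MODE_MIXED:
        /* telecine: INTERLACED distinguishes mixed buffers from progressive */
        if (GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_INTERLACED))
          g_print ("telecine: interlaced ('mixed') buffer\n");
        else
          g_print ("telecine: progressive buffer\n");
        break;
      default:
        break;
    }
  }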
@ -1,76 +0,0 @@
Media Types
-----------

  audio/x-raw

   format, G_TYPE_STRING, mandatory
     The format of the audio samples, see the Formats section for a list
     of valid sample formats.

   rate, G_TYPE_INT, mandatory
     The samplerate of the audio

   channels, G_TYPE_INT, mandatory
     The number of channels

   channel-mask, GST_TYPE_BITMASK, mandatory for more than 2 channels
     Bitmask of channel positions present. May be omitted for mono and
     stereo. May be set to 0 to denote that the channels are unpositioned.

   layout, G_TYPE_STRING, mandatory
     The layout of channels within a buffer. Possible values are
     "interleaved" (for LRLRLRLR) and "non-interleaved" (LLLLRRRR)

Use GstAudioInfo and related helper API to create and parse raw audio caps.
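For illustration, a minimal sketch of building such caps with GstAudioInfo
(format, rate and channel count chosen arbitrarily for the example):

  #include <gst/audio/audio.h>

  /* Sketch: create caps for interleaved stereo S16LE audio at 48 kHz.
   * Passing NULL for the channel positions selects the default layout. */
  static GstCaps *
  make_stereo_s16_caps (void)
  {
    GstAudioInfo info;

    gst_audio_info_init (&info);
    gst_audio_info_set_format (&info, GST_AUDIO_FORMAT_S16LE, 48000, 2, NULL);

    return gst_audio_info_to_caps (&info);
  }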
Metadata
--------

  "GstAudioDownmixMeta"
     A matrix for downmixing multichannel audio to a lower number of channels.

Formats
-------

The following values can be used for the format string property.

 "S8"         8-bit signed PCM audio
 "U8"         8-bit unsigned PCM audio

 "S16LE"      16-bit signed PCM audio
 "S16BE"      16-bit signed PCM audio
 "U16LE"      16-bit unsigned PCM audio
 "U16BE"      16-bit unsigned PCM audio

 "S24_32LE"   24-bit signed PCM audio packed into 32-bit
 "S24_32BE"   24-bit signed PCM audio packed into 32-bit
 "U24_32LE"   24-bit unsigned PCM audio packed into 32-bit
 "U24_32BE"   24-bit unsigned PCM audio packed into 32-bit

 "S32LE"      32-bit signed PCM audio
 "S32BE"      32-bit signed PCM audio
 "U32LE"      32-bit unsigned PCM audio
 "U32BE"      32-bit unsigned PCM audio

 "S24LE"      24-bit signed PCM audio
 "S24BE"      24-bit signed PCM audio
 "U24LE"      24-bit unsigned PCM audio
 "U24BE"      24-bit unsigned PCM audio

 "S20LE"      20-bit signed PCM audio
 "S20BE"      20-bit signed PCM audio
 "U20LE"      20-bit unsigned PCM audio
 "U20BE"      20-bit unsigned PCM audio

 "S18LE"      18-bit signed PCM audio
 "S18BE"      18-bit signed PCM audio
 "U18LE"      18-bit unsigned PCM audio
 "U18BE"      18-bit unsigned PCM audio

 "F32LE"      32-bit floating-point audio
 "F32BE"      32-bit floating-point audio
 "F64LE"      64-bit floating-point audio
 "F64BE"      64-bit floating-point audio
@ -1,28 +0,0 @@
Media Types
-----------

  text/x-raw

   format, G_TYPE_STRING, mandatory
     The format of the text, see the Formats section for a list of valid format
     strings.

Metadata
--------

There are no common metas for this raw format yet.

Formats
-------

 "utf8"          plain timed utf8 text (formerly text/plain)

   Parsed timed text in utf8 format.

 "pango-markup"  plain timed utf8 text with pango markup (formerly text/x-pango-markup)

   Same as "utf8", but text embedded in an XML-style markup language for
   size, colour, emphasis, etc.

   See http://developer.gnome.org/pango/stable/PangoMarkupFormat.html

File diff suppressed because it is too large
@ -1,69 +0,0 @@
playbin
--------

The purpose of this element is to decode and render the media contained in a
given generic uri. The element extends GstPipeline and is typically used in
playback situations.

Required features:

 - accept and play any valid uri. This includes
   - rendering video/audio
   - overlaying subtitles on the video
   - optionally reading external subtitle files
 - allow for hardware (non raw) sinks
 - selection of audio/video/subtitle streams based on language.
 - perform network buffering/incremental download
 - gapless playback
 - support for visualisations with configurable sizes
 - ability to reject files that are too big, or of a format that would require
   too much CPU/memory usage.
 - be very efficient with adding elements such as converters to reduce the
   amount of negotiation that has to happen.
 - handle chained oggs. This includes having support for dynamic pad add and
   remove from a demuxer.

Components
----------

* decodebin2

  - performs the autoplugging of demuxers/decoders
  - emits signals for steering the autoplugging
    - to decide if a non-raw media format is acceptable as output
    - to sort the possible decoders for a non-raw format
  - see also the decodebin2 design doc

* uridecodebin

  - combination of a source to handle the given uri, an optional queueing element
    and one or more decodebin2 elements to decode the non-raw streams.

* playsink

  - handles display of audio/video/text.
  - has request audio/video/text input pads. There is only one sinkpad per type.
    The requested pads define the configuration of the internal pipeline.
  - allows for setting audio/video sinks or does automatic sink selection.
  - allows for configuration of the visualisation element.
  - allows for enable/disable of visualisation, audio and video.

* playbin

  - combination of one or more uridecodebin elements to read the uri and subtitle
    uri.
  - support for queuing new media to support gapless playback.
  - handles stream selection.
  - uses playsink to display.
  - selection of sinks and configuration of uridecodebin with raw output formats.
Gapless playback
----------------

playbin has an "about-to-finish" signal. The application should configure a new
uri (and optional suburi) in the callback. When the current media finishes, this
new media will be played next.
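For illustration, a minimal application-side sketch ("uri" and "about-to-finish"
are the real playbin property/signal names; the playlist helper is hypothetical):

  /* Sketch: queue the next track from the about-to-finish callback so
   * playback continues without a gap. next_uri_for() is a made-up
   * application helper that returns the next playlist entry or NULL. */
  static void
  on_about_to_finish (GstElement * playbin, gpointer user_data)
  {
    const gchar *next = next_uri_for (user_data);

    if (next != NULL)
      g_object_set (playbin, "uri", next, NULL);
  }

  /* somewhere during setup: */
  g_signal_connect (playbin, "about-to-finish",
      G_CALLBACK (on_about_to_finish), app);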
@ -1,278 +0,0 @@
Design for Stereoscopic & Multiview Video Handling
==================================================

There are two cases to handle:

* Encoded video output from a demuxer to parser / decoder or from encoders into a muxer.
* Raw video buffers

The design below is somewhat based on the proposals from
[bug 611157](https://bugzilla.gnome.org/show_bug.cgi?id=611157)

Multiview is used as a generic term to refer to handling both
stereo content (left and right eye only) and extensions for videos
containing multiple independent viewpoints.

Encoded Signalling
------------------
This is regarding the signalling in caps and buffers from demuxers to
parsers (sometimes) or out from encoders.

For backward compatibility with existing codecs many transports of
stereoscopic 3D content use normal 2D video with 2 views packed spatially
in some way, and put extra new descriptions in the container/mux.

Info in the demuxer seems to apply to stereo encodings only. For all
MVC methods I know, the multiview encoding is in the video bitstream itself
and therefore already available to decoders. Only stereo systems have been retro-fitted
into the demuxer.

Also, sometimes extension descriptions are in the codec (e.g. H.264 SEI FPA packets)
and it would be useful to be able to put the info onto caps and buffers from the
parser without decoding.

To handle both cases, we need to be able to output the required details on
encoded video for decoders to apply onto the raw video buffers they decode.

*If there ever is a need to transport multiview info for encoded data the
same system below for raw video or some variation should work*
### Encoded Video: Properties that need to be encoded into caps
1. multiview-mode (called "Channel Layout" in bug 611157)
   * Whether a stream is mono, for a single eye, stereo, mixed-mono-stereo
     (switches between mono and stereo - mp4 can do this)
   * Uses a buffer flag to mark individual buffers as mono or "not mono"
     (single|stereo|multiview) for mixed scenarios. The alternative (not
     proposed) is for the demuxer to switch caps for each mono to not-mono
     change, and not use a 'mixed' caps variant at all.
   * _single_ refers to a stream of buffers that only contain 1 view.
     It is different from mono in that the stream is a marked left or right
     eye stream for later combining in a mixer or when displaying.
   * _multiple_ marks a stream with multiple independent views encoded.
     It is included in this list for completeness. As noted above, there's
     currently no scenario that requires marking encoded buffers as MVC.
2. Frame-packing arrangements / view sequence orderings
   * Possible frame packings: side-by-side, side-by-side-quincunx,
     column-interleaved, row-interleaved, top-bottom, checker-board
     * bug 611157 - sreerenj added side-by-side-full and top-bottom-full but
       I think that's covered by suitably adjusting pixel-aspect-ratio. If
       not, they can be added later.
     * _top-bottom_, _side-by-side_, _column-interleaved_, _row-interleaved_ are as the names suggest.
     * _checker-board_, samples are left/right pixels in a chess grid +-+-+-/-+-+-+
     * _side-by-side-quincunx_. Side By Side packing, but quincunx sampling -
       the 1 pixel offset of each eye needs to be accounted for when upscaling or displaying
     * there may be other packings (future expansion)
   * Possible view sequence orderings: frame-by-frame, frame-primary-secondary-tracks, sequential-row-interleaved
     * _frame-by-frame_, each buffer is left, then right view etc
     * _frame-primary-secondary-tracks_ - the file has 2 video tracks (primary and secondary), one is left eye, one is right.
       Demuxer info indicates which one is which.
       Handling this means marking each stream as all-left and all-right views, decoding separately, and combining automatically (inserting a mixer/combiner in playbin)
       -> *Leave this for future expansion*
     * _sequential-row-interleaved_ Mentioned by sreerenj in bug patches, I can't find a mention of such a thing. Maybe it's in MPEG-2
       -> *Leave this for future expansion / deletion*
3. view encoding order
   * Describes how to decide which piece of each frame corresponds to left or right eye
   * Possible orderings left, right, left-then-right, right-then-left
     - Need to figure out how we find the correct frame in the demuxer to start decoding when seeking in frame-sequential streams
     - Need a buffer flag for marking the first buffer of a group.
4. "Frame layout flags"
   * flags for view specific interpretation
   * horizontal-flip-left, horizontal-flip-right, vertical-flip-left, vertical-flip-right
     Indicates that one or more views has been encoded in a flipped orientation, usually due to cameras with mirrors or displays with mirrors.
   * This should be an actual flags field. Registered GLib flags types aren't generally well supported in our caps - the type might not be loaded/registered yet when parsing a caps string, so they can't be used in caps templates in the registry.
   * It might be better just to use a hex value / integer
Buffer representation for raw video
-----------------------------------
* Transported as normal video buffers with extra metadata
* The caps define the overall buffer width/height, with helper functions to
  extract the individual views for packed formats
* pixel-aspect-ratio adjusted if needed to double the overall width/height
* video sinks that don't know about multiview extensions yet will show the packed view as-is.
  For frame-sequence outputs, things might look weird, but just adding multiview-mode to the sink caps
  can disallow those transports.
* _row-interleaved_ packing is actually just side-by-side memory layout with half frame width, twice
  the height, so can be handled by adjusting the overall caps and strides
* Other exotic layouts need new pixel formats defined (checker-board, column-interleaved, side-by-side-quincunx)
* _Frame-by-frame_ - one view per buffer, but with alternating metas marking which buffer is which left/right/other view and using a new buffer flag as described above
  to mark the start of a group of corresponding frames.
* New video caps addition as for encoded buffers
### Proposed Caps fields
Combining the requirements above and collapsing the combinations into mnemonics:

* multiview-mode =
    mono | left | right | sbs | sbs-quin | col | row | topbot | checkers |
    frame-by-frame | mixed-sbs | mixed-sbs-quin | mixed-col | mixed-row |
    mixed-topbot | mixed-checkers | mixed-frame-by-frame | multiview-frames | mixed-multiview-frames
* multiview-flags =
    + 0x0000 none
    + 0x0001 right-view-first
    + 0x0002 left-h-flipped
    + 0x0004 left-v-flipped
    + 0x0008 right-h-flipped
    + 0x0010 right-v-flipped
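As a concrete illustration of the fields above (values invented for the
example, with the flags written as a plain integer per the "hex value"
suggestion; the final implementation's field names/values may differ from
these proposal mnemonics), a demuxer or capssetter might produce caps such as:

    video/x-raw, format=(string)NV12, width=(int)1920, height=(int)1080,
        framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)2/1,
        multiview-mode=(string)sbs, multiview-flags=(int)0x0001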
### Proposed new buffer flags
Add two new GST_VIDEO_BUFFER flags in video-frame.h and make it clear that those
flags can apply to encoded video buffers too. wtay says that's currently the
case anyway, but the documentation should say it.

**GST_VIDEO_BUFFER_FLAG_MULTIPLE_VIEW** - Marks a buffer as representing non-mono content, although it may be a single (left or right) eye view.

**GST_VIDEO_BUFFER_FLAG_FIRST_IN_BUNDLE** - for frame-sequential methods of transport, mark the "first" of a left/right/other group of frames

### A new GstMultiviewMeta
This provides a place to describe all provided views in a buffer / stream,
and through Meta negotiation to inform decoders about which views to decode if
not all are wanted.

* Logical labels/names and mapping to GstVideoMeta numbers
* Standard view labels LEFT/RIGHT, and non-standard ones (strings)

    GST_VIDEO_MULTIVIEW_VIEW_LEFT = 1
    GST_VIDEO_MULTIVIEW_VIEW_RIGHT = 2

    struct GstVideoMultiviewViewInfo {
        guint view_label;
        guint meta_id;    // id of the GstVideoMeta for this view

        padding;
    }

    struct GstVideoMultiviewMeta {
        guint n_views;
        GstVideoMultiviewViewInfo *view_info;
    }

The meta is optional, and probably only useful later for MVC
Outputting stereo content
-------------------------
The initial implementation for output will be stereo content in glimagesink

### Output Considerations with OpenGL
* If we have support for stereo GL buffer formats, we can output separate left/right eye images and let the hardware take care of display.
* Otherwise, glimagesink needs to render one window with left/right in a suitable frame packing,
  and that will only show correctly in fullscreen on a device set for the right 3D packing -> requires app intervention to set the video mode.
* That could be done manually on the TV, or with HDMI 1.4 by setting the right video mode for the screen to inform the TV; or, as a third option, we
  support rendering to two separate overlay areas on the screen - one for left eye, one for right - which can be supported using the 'splitter' element and 2 output sinks or, better, by adding a 2nd window overlay for split stereo output
* Intel hardware doesn't do stereo GL buffers - only nvidia and AMD, so the initial implementation won't include that

## Other elements for handling multiview content
* videooverlay interface extensions
  * __Q__: Should this be a new interface?
  * Element message to communicate the presence of stereoscopic information to the app
  * App needs to be able to override the input interpretation - ie, set multiview-mode and multiview-flags
    * Most videos I've seen are side-by-side or top-bottom with no frame-packing metadata
  * New API for the app to set rendering options for stereo/multiview content
    * This might be best implemented as a **multiview GstContext**, so that
      the pipeline can share app preferences for content interpretation and downmixing
      to mono for output, or in the sink and have those down as far upstream/downstream as possible.
* Converter element
  * convert different view layouts
  * Render to anaglyphs of different types (magenta/green, red/blue, etc) and output as mono
* Mixer element
  * take 2 video streams and output as stereo
  * later take n video streams
  * share code with the converter, it just takes input from n pads instead of one.
* Splitter element
  * Output one pad per view
### Implementing MVC handling in decoders / parsers (and encoders)
Things to do to implement MVC handling

1. Parsing SEI in h264parse and setting caps (patches available in
   bugzilla for parsing, see below)
2. Integrate gstreamer-vaapi MVC support with this proposal
3. Help with [libav MVC implementation](https://wiki.libav.org/Blueprint/MVC)
4. generating SEI in H.264 encoder
5. Support for MPEG2 MVC extensions

## Relevant bugs
[bug 685215](https://bugzilla.gnome.org/show_bug.cgi?id=685215) - codecparser h264: Add initial MVC parser
[bug 696135](https://bugzilla.gnome.org/show_bug.cgi?id=696135) - h264parse: Add mvc stream parsing support
[bug 732267](https://bugzilla.gnome.org/show_bug.cgi?id=732267) - h264parse: extract base stream from MVC or SVC encoded streams

## Other Information
[Matroska 3D support notes](http://www.matroska.org/technical/specs/notes.html#3D)

## Open Questions

### Background
### Representation for GstGL
When uploading raw video frames to GL textures, the goal is to implement:

2. Split packed frames into separate GL textures when uploading, and
   attach multiple GstGLMemory's to the GstBuffer. The multiview-mode and
   multiview-flags fields in the caps should change to reflect the conversion
   from one incoming GstMemory to multiple GstGLMemory, and change the
   width/height in the output info as needed.

This is (currently) targeted as 2 render passes - upload as normal
to a single stereo-packed RGBA texture, and then unpack into 2
smaller textures, output with GST_VIDEO_MULTIVIEW_MODE_SEPARATED, as
2 GstGLMemory attached to one buffer. We can optimise the upload later
to go directly to 2 textures for common input formats.

Separate output textures have a few advantages:

* Filter elements can more easily apply filters in several passes to each
  texture without fundamental changes to our filters to avoid mixing pixels
  from separate views.
* Centralises the sampling of input video frame packings in the upload code,
  which makes adding new packings in the future easier.
* Sampling multiple textures to generate various output frame-packings
  for display is conceptually simpler than converting from any input packing
  to any output packing.
* In implementations that support quad buffers, having separate textures
  makes it trivial to do GL_LEFT/GL_RIGHT output

For either option, we'll need new glsink output API to pass more
information to applications about multiple views for the draw signal/callback.

I don't know if it's desirable to support *both* methods of representing
views. If so, that should be signalled in the caps too. That could be a
new multiview-mode for passing views in separate GstMemory objects
attached to a GstBuffer, which would not be GL specific.

### Overriding frame packing interpretation
Most sample videos available are frame packed, with no metadata
to say so. How should we override that interpretation?

* Simple answer: Use capssetter + new properties on playbin to
  override the multiview fields (see the sketch after this section)

*Basically implemented in playbin, using a pad probe. Needs more work for completeness*
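A hedged sketch of the capssetter approach for a file known to be side-by-side
(the capssetter element and its caps/join properties are real; the multiview
field names/values follow this proposal's mnemonics rather than any final
implementation, and the file path is made up):

    gst-launch-1.0 uridecodebin uri=file:///path/to/sample-sbs.mkv ! \
        capssetter join=true \
            caps="video/x-raw, multiview-mode=sbs, multiview-flags=(int)0x0001" ! \
        glimagesink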
### Adding extra GstVideoMeta to buffers
There should be one GstVideoMeta for the entire video frame in packed
layouts, and one GstVideoMeta per GstGLMemory when views are attached
to a GstBuffer separately. This should be done by the buffer pool,
which knows from the caps.

### videooverlay interface extensions
GstVideoOverlay needs:

* A way to announce the presence of multiview content when it is
  detected/signalled in a stream.
* A way to tell applications which output methods are supported/available
* A way to tell the sink which output method it should use
* Possibly a way to tell the sink to override the input frame
  interpretation / caps - depends on the answer to the question
  above about how to model overriding input interpretation.

### What's implemented
* Caps handling
* gst-plugins-base libgstvideo pieces
* playbin caps overriding
* conversion elements - glstereomix, gl3dconvert (needs a rename),
  glstereosplit.

### Possible future enhancements
* Make GLupload split to separate textures at upload time?
  * Needs new API to extract multiple textures from the upload. Currently only outputs 1 result RGBA texture.
* Make GLdownload able to take 2 input textures, pack them and colorconvert / download as needed.
  - currently done by packing and then downloading, which adds unwanted overhead for RGBA downloads
* Think about how we integrate GLstereo - do we need to do anything special,
  or can the app just render to stereo/quad buffers if they're available?