docs: design: move most design docs to gst-docs module
parent 49653b058a
commit 46138b1b1d
13 changed files with 1 addition and 3652 deletions
@@ -2,16 +2,5 @@ SUBDIRS =

EXTRA_DIST = \
	design-audiosinks.txt \
	design-decodebin.txt \
	design-encoding.txt \
	design-orc-integration.txt \
	draft-hw-acceleration.txt \
	draft-keyframe-force.txt \
	draft-subtitle-overlays.txt \
	draft-va.txt \
	part-interlaced-video.txt \
	part-mediatype-audio-raw.txt \
	part-mediatype-text-raw.txt \
	part-mediatype-video-raw.txt \
	part-playbin.txt
	draft-va.txt
@@ -1,138 +0,0 @@

Audiosink design
----------------

Requirements:

 - must operate chain-based.
   Most simple playback pipelines will push audio from the decoders
   into the audio sink.

 - must operate getrange-based.
   Most professional audio applications will operate in a mode where
   the audio sink pulls samples from the pipeline. This is typically
   done in a callback from the audiosink requesting N samples. The
   callback is either scheduled from a thread or from an interrupt
   from the audio hardware device.

 - Exact sample-accurate clocks.
   The audiosink must be able to provide a clock that is sample
   accurate even if samples are dropped or when discontinuities are
   found in the stream.

 - Exact timing of playback.
   The audiosink must be able to play samples at their exact times.

 - Use DMA access when possible.
   When the hardware can do DMA we should use it. This should also
   work over bufferpools to avoid copying data to/from kernel space.


Design:

The design is based on a set of base classes and the concept of a
ringbuffer of samples.

  +-----------+       - provides preroll, rendering, timing
  + basesink  +       - caps negotiation
  +-----+-----+
        |
  +-----V----------+  - manages ringbuffer
  + audiobasesink  +  - manages scheduling (push/pull)
  +-----+----------+  - manages clock/query/seek
        |             - manages scheduling of samples in the ringbuffer
        |             - manages caps parsing
        |
  +-----V------+      - default ringbuffer implementation with a GThread
  + audiosink  +      - subclasses provide open/read/close methods
  +------------+

The ringbuffer is a contiguous piece of memory divided into segtotal
segments. Each segment has segsize bytes.

   play position
     v
  +---+---+---+-------------------------------------+----------+
  + 0 | 1 | 2 | ....                                 | segtotal |
  +---+---+---+-------------------------------------+----------+
  <--->
    segsize bytes = N samples * bytes_per_sample

The ringbuffer has a play position, which is expressed in
segments. The play position is where the device is currently reading
samples from the buffer.

The ringbuffer can be put into the PLAYING or STOPPED state.

In the STOPPED state no samples are played to the device and the play
pointer does not advance.

In the PLAYING state samples are written to the device and the ringbuffer
should call a configurable callback after each segment is written to the
device. In this state the play pointer is advanced after each segment is
written.

A write operation to the ringbuffer puts new samples in the ringbuffer.
If there is not enough space in the ringbuffer, the write operation
blocks. The playback of the buffer never stops, even if the buffer is
empty. When the buffer is empty, silence is played by the device.

The ringbuffer is implemented with lockfree atomic operations, especially
on the reading side, so that low-latency operation is possible.

Whenever new samples are to be put into the ringbuffer, the position of the
read pointer is taken. The required write position is taken and the
difference between the required and actual position is computed. If the
difference is < 0, the sample is too late. If the difference is bigger than
segtotal, the writing part has to wait for the play pointer to advance.
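
As a rough illustration of that write-position bookkeeping (a sketch only;
the type and field names below are made up for this example and do not match
the real GstRingBuffer implementation):

  #include <glib.h>

  /* Hypothetical ringbuffer bookkeeping for this sketch only. */
  typedef struct {
    volatile gint playseg;    /* segment currently being played */
    gint segtotal;            /* number of segments in the ringbuffer */
    gint samples_per_seg;     /* segsize / bytes_per_sample */
  } SketchRingBuffer;

  #define WRITE_TOO_LATE  -1  /* sample lies before the play pointer */
  #define WRITE_MUST_WAIT -2  /* too far ahead: wait for the play pointer */

  /* Decide where (if anywhere) a sample with the given offset can go. */
  static gint
  sketch_prepare_write (SketchRingBuffer * rb, guint64 sample_offset)
  {
    gint playseg = g_atomic_int_get (&rb->playseg);
    gint writeseg = (gint) (sample_offset / rb->samples_per_seg);
    gint diff = writeseg - playseg;

    if (diff < 0)
      return WRITE_TOO_LATE;
    if (diff >= rb->segtotal)
      return WRITE_MUST_WAIT;

    /* segment (modulo segtotal) that this sample belongs in */
    return writeseg % rb->segtotal;
  }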

Scheduling:

 - chain-based mode:

   In chain-based mode, bytes are written into the ringbuffer. This operation
   eventually blocks when the ringbuffer is filled.

   When no samples arrive in time, the ringbuffer plays silence. Each
   buffer that arrives is placed into the ringbuffer at the correct
   time. This means that dropping samples or inserting silence is done
   automatically, very accurately and independently of the play pointer.

   In this mode, the ringbuffer is usually kept as full as possible. When
   using a small buffer (small segsize and segtotal), the latency from audio
   arriving at the sink to it being played can be kept low, but at least
   one context switch has to be made between read and write.

 - getrange-based mode:

   In getrange-based mode, the audiobasesink uses the callback function
   of the ringbuffer to get segsize samples from the peer element. These
   samples are then placed in the ringbuffer at the next play position.
   It is assumed that the getrange function returns fast enough to fill the
   ringbuffer before the play pointer reaches the write pointer.

   In this mode, the ringbuffer is usually kept as empty as possible. There
   is no context switch needed between the elements that create the samples
   and the actual writing of the samples to the device.


DMA mode:

 - Elements that can do DMA-based access to the audio device have to subclass
   the GstAudioBaseSink class and wrap the DMA ringbuffer in a subclass
   of GstRingBuffer.

   The ringbuffer subclass should trigger a callback after writing or playing
   each segment to the device. This callback can be triggered from a thread or
   from a signal from the audio device.


Clocks:

The GstAudioBaseSink class uses the ringbuffer to act as a clock provider.
It can do this by using the play pointer and the delay to calculate the
clock time.
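
A minimal sketch of that calculation (assuming a hypothetical helper that is
handed the number of samples the device has actually played; the real base
class also accounts for the device delay and for discontinuities):

  #include <gst/gst.h>

  /* Sketch: derive a clock time from the number of samples played. */
  static GstClockTime
  sketch_ringbuffer_clock_time (guint64 samples_played, gint rate)
  {
    if (rate <= 0)
      return GST_CLOCK_TIME_NONE;

    /* time = samples / rate, scaled to nanoseconds without overflow */
    return gst_util_uint64_scale_int (samples_played, GST_SECOND, rate);
  }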

@@ -1,274 +0,0 @@

Decodebin design


GstDecodeBin
------------

Description:

 Autoplug and decode to raw media.

 Input  : single pad with ANY caps
 Output : dynamic pads

* Contents

 _ a GstTypeFindElement connected to the single sink pad

 _ optionally a demuxer/parser

 _ optionally one or more DecodeGroup

* Autoplugging

 The goal is to reach 'target' caps (by default raw media).

 This is done by using the GstCaps of a source pad and finding the available
 demuxer/decoder GstElements that can be linked to that pad.

 The process starts with the source pad of typefind and stops when no more
 non-target caps are left. It is commonly done while pre-rolling, but can also
 happen whenever a new pad appears on any element.

 Once target caps have been found, that pad is ghosted and the
 'pad-added' signal is emitted.

 If no compatible elements can be found for a GstCaps, the pad is ghosted and
 the 'unknown-type' signal is emitted.


* Assisted auto-plugging

 When starting the auto-plugging process for a given GstCaps, the following
 signals are emitted in order to allow the application/user to assist or
 fine-tune the process (a usage sketch follows the signal descriptions below).

 - 'autoplug-continue' :

   gboolean user_function (GstElement * decodebin, GstPad * pad, GstCaps * caps)

   This signal is fired at the very beginning with the source pad GstCaps. If
   the callback returns TRUE, the process continues normally. If the callback
   returns FALSE, then the GstCaps are considered as target caps and the
   autoplugging process stops.

 - 'autoplug-factories' :

   GValueArray * user_function (GstElement * decodebin, GstPad * pad,
       GstCaps * caps);

   Get a list of element factories for @pad with @caps. This function is used
   to instruct decodebin which elements it should try to autoplug. The default
   behaviour when this function is not overridden is to get all elements that
   can handle @caps from the registry, sorted by rank.

 - 'autoplug-select' :

   gint user_function (GstElement * decodebin, GstPad * pad, GstCaps * caps,
       GValueArray * factories);

   This signal is fired once autoplugging has got a list of compatible
   GstElementFactory objects. The signal is emitted with the GstCaps of the
   source pad and a pointer to the GValueArray of compatible factories.

   The callback should return the index of the element factory in @factories
   that should be tried next.

   If the callback returns -1, the autoplugging process will stop as if no
   compatible factories were found.

   The default implementation of this function will try to autoplug the first
   factory of the list.
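
For illustration, an application could bias the factory selection like this
(a sketch that follows the callback signature given above; the element name
"examplehwdec" is made up, and the signal signature that decodebin actually
ships with should be checked against its current documentation):

  #include <gst/gst.h>

  /* Prefer a hypothetical hardware decoder whenever it is offered. */
  static gint
  on_autoplug_select (GstElement * decodebin, GstPad * pad, GstCaps * caps,
      GValueArray * factories, gpointer user_data)
  {
    guint i;

    for (i = 0; i < factories->n_values; i++) {
      GstElementFactory *f =
          g_value_get_object (g_value_array_get_nth (factories, i));

      if (g_str_has_prefix (GST_OBJECT_NAME (f), "examplehwdec"))
        return i;               /* try this factory next */
    }
    return 0;                   /* default behaviour: first factory */
  }

  /* g_signal_connect (decodebin, "autoplug-select",
   *     G_CALLBACK (on_autoplug_select), NULL); */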

* Target Caps

 The target caps are a read/write GObject property of decodebin.

 By default the target caps are:

 _ Raw audio : audio/x-raw

 _ and raw video : video/x-raw

 _ and text : text/plain, text/x-pango-markup


* Media chain/group handling

 When autoplugging, all streams coming out of a demuxer will be grouped in a
 DecodeGroup.

 All new source pads created on that demuxer after it has emitted the
 'no-more-pads' signal will be put in another DecodeGroup.

 Only one DecodeGroup can be active at any given time. If a new DecodeGroup is
 created while another one exists, the new DecodeGroup will be set as blocking
 until the existing one has drained.



DecodeGroup
-----------

Description:

 Streams belonging to the same group/chain of a media file.

* Contents

 The DecodeGroup contains:

 _ a GstMultiQueue to which all streams of the media group are connected.

 _ any decoders which are autoplugged in order to produce the
   requested target pads.

* Proper group draining

 The DecodeGroup ensures that all the streams in the group are completely
 drained (EOS has come through all source ghost pads).

* Pre-roll and block

 The DecodeGroup has a global blocking feature. If enabled, all the ghosted
 source pads for that group will be blocked.

 A method is available to unblock all blocked pads for that group.



GstMultiQueue
-------------

Description:

 Multiple input-output data queue.

 GstMultiQueue achieves the same functionality as GstQueue, with a few
 differences:

 * Multiple streams handling.

   The element handles queueing data on more than one stream at once. To
   achieve such a feature it has request sink pads (sink_%u) and 'sometimes'
   src pads (src_%u).

   When requesting a given sink pad, the associated src pad for that stream
   will be created. E.g. requesting sink_1 will generate src_1.

 * Non-starvation on multiple streams.

   If more than one stream is used with the element, the streams' queues will
   be dynamically grown (up to a limit), in order to ensure that no stream is
   risking data starvation. This guarantees that at any given time there are
   at least N bytes queued and available for each individual stream.

   If an EOS event comes through a src pad, the associated queue should be
   considered as 'not-empty' in the queue-size-growing algorithm.

 * Non-linked srcpads graceful handling.

   A GstTask is started for all srcpads when going to GST_STATE_PAUSED.

   The tasks block on a GCond which is signalled in two different cases:

   _ When the associated queue has received a buffer.

   _ When the associated queue was previously declared as 'not-linked' and the
     first buffer of the queue is scheduled to be pushed synchronously in
     relation to the order in which it arrived globally in the element (see
     'Synchronous data pushing' below).

   When woken up by the GCond, the GstTask will try to push the next
   GstBuffer/GstEvent on the queue. If pushing the GstBuffer/GstEvent returns
   GST_FLOW_NOT_LINKED, then the associated queue is marked as 'not-linked'.
   If pushing the GstBuffer/GstEvent succeeds, the queue will no longer be
   marked as 'not-linked'.

   If pushing on all srcpads returns a GstFlowReturn different from
   GST_FLOW_OK, then all the srcpads' tasks are stopped and subsequent pushes
   on sinkpads will return GST_FLOW_NOT_LINKED.

 * Synchronous data pushing for non-linked pads.

   In order to better support dynamic switching between streams, the
   multiqueue (unlike the current GStreamer queue) continues to push buffers
   on non-linked pads rather than shutting down.

   In addition, to prevent a non-linked stream from very quickly consuming all
   available buffers and thus 'racing ahead' of the other streams, the element
   must ensure that buffers and inlined events for a non-linked stream are
   pushed in the same order as they were received, relative to the other
   streams controlled by the element. This means that a buffer cannot be
   pushed to a non-linked pad any sooner than buffers in any other stream
   which were received before it (a small sketch of this ordering rule
   follows below).
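
A small sketch of that ordering rule (not the actual multiqueue code; the
idea is simply that every item gets a global serial number on entry, and a
not-linked stream may only push its head item once no other non-empty stream
still holds an older item):

  #include <glib.h>

  typedef struct {
    guint32 head_serial;  /* serial of the oldest queued item in this stream */
    gboolean empty;       /* TRUE if this stream's queue is currently empty */
  } SketchStream;

  /* May the not-linked stream at 'idx' push its head item now? */
  static gboolean
  may_push_not_linked (const SketchStream * streams, guint n_streams, guint idx)
  {
    guint i;

    for (i = 0; i < n_streams; i++) {
      if (i == idx || streams[i].empty)
        continue;
      /* an older item is still pending on another stream: wait */
      if (streams[i].head_serial < streams[idx].head_serial)
        return FALSE;
    }
    return TRUE;
  }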

=====================================
 Parsers, decoders and auto-plugging
=====================================

This section has DRAFT status.

Some media formats come in different "flavours" or "stream formats". These
formats differ in the way the setup data and media data is signalled and/or
packaged. An example for this is H.264 video, where there is a bytestream
format (with codec setup data signalled inline and units prefixed by a sync
code and packet length information) and a "raw" format where codec setup
data is signalled out of band (via the caps) and the chunking is implicit
in the way the buffers were muxed into a container, to mention just two of
the possible variants.

Especially on embedded platforms it is common that decoders can only
handle one particular stream format, and not all of them.

Where there are multiple stream formats, parsers are usually expected
to be able to convert between the different formats. This will, if
implemented correctly, work as expected in a static pipeline such as

  ... ! parser ! decoder ! sink

where the parser can query the decoder's capabilities even before
processing the first piece of data, and configure itself to convert
accordingly, if conversion is needed at all.

In an auto-plugging context this is not so straightforward though,
because elements are plugged incrementally, and not before the previous
element has processed some data and decided what it will output exactly
(unless the template caps are completely fixed, in which case it can continue
right away; this is not always the case here though, see below). A
parser will thus have to decide on *some* output format so auto-plugging
can continue. It doesn't know anything about the available decoders and
their capabilities though, so it's possible that it will choose a format
that is not supported by any of the available decoders, or by the preferred
decoder.

If the parser had sufficiently concise but fixed source pad template caps,
decodebin could continue to plug a decoder right away, allowing the
parser to configure itself in the same way as it would with a static
pipeline. This is not an option, unfortunately, because often the
parser needs to process some data to determine e.g. the format's profile or
other stream properties (resolution, sample rate, channel configuration, etc.),
and there may be different decoders for different profiles (e.g. a DSP codec
for baseline profile and a software fallback for main/high profile; or a DSP
codec only supporting certain resolutions, with a software fallback for
unusual resolutions). So if decodebin just plugged the highest-ranking
decoder, that decoder might not be able to handle the actual stream later
on, which would yield an error (this is a data flow error then, which would
be hard to intercept and avoid in decodebin). In other words, we can't solve
this issue by plugging a decoder right away with the parser.

So decodebin needs to communicate to the parser the set of available decoder
caps (which would contain the relevant capabilities/restrictions such as
supported profiles, resolutions, etc.), after the usual "autoplug-*" signal
filtering/sorting of course.

This is done by plugging a capsfilter element right after the parser, and
constructing a set of filter caps from the list of available decoders (one
appends at the end just the name(s) of the caps structures from the parser
pad template caps to function as an 'ANY other' caps equivalent). This lets
the parser negotiate to a supported stream format in the same way as with
the static pipeline mentioned above, but of course incurs some overhead
through the additional capsfilter element.

@@ -1,571 +0,0 @@

Encoding and Muxing
-------------------

Summary
-------

A. Problems
B. Goals
1. EncodeBin
2. Encoding Profile System
3. Helper Library for Profiles
I. Use-cases researched


A. Problems this proposal attempts to solve
-------------------------------------------

* Duplication of pipeline code for GStreamer-based applications
  wishing to encode and/or mux streams, leading to subtle differences
  and inconsistencies across those applications.

* No unified system for describing encoding targets for applications
  in a user-friendly way.

* No unified system for creating encoding targets for applications,
  resulting in duplication of code across all applications,
  differences and inconsistencies that come with that duplication,
  and applications hardcoding element names and settings, resulting in
  poor portability.



B. Goals
--------

1. Convenience encoding element

   Create a convenience GstBin for encoding and muxing several streams,
   hereafter called 'EncodeBin'.

   This element will contain a single property, which is a profile.

2. Define an encoding profile system

3. Encoding profile helper library

   Create a helper library to:
   * create EncodeBin instances based on profiles, and
   * help applications to create/load/save/browse those profiles.




1. EncodeBin
------------

1.1 Proposed API
----------------

EncodeBin is a GstBin subclass.

It implements the GstTagSetter interface, by which it will proxy the
calls to the muxer.

It has only two introspectable properties (i.e. usable without extra API):
 * A GstEncodingProfile *
 * The name of the profile to use

When a profile is selected, encodebin will:
 * Add REQUEST sink pads for all the GstStreamProfiles
 * Create the muxer and expose the source pad

Whenever a request pad is created, encodebin will:
 * Create the chain of elements for that pad
 * Ghost the sink pad
 * Return that ghost pad

This allows reducing the code to the minimum for applications
wishing to encode a source for a given profile:

  ...

  encbin = gst_element_factory_make ("encodebin", NULL);
  g_object_set (encbin, "profile", "N900/H264 HQ", NULL);
  gst_element_link (encbin, filesink);

  ...

  vsrcpad = gst_element_get_static_pad (source, "src1");
  vsinkpad = gst_element_get_request_pad (encbin, "video_%u");
  gst_pad_link (vsrcpad, vsinkpad);

  ...


1.2 Explanation of the various stages in EncodeBin
--------------------------------------------------

This describes the various stages which can happen in order to end
up with a multiplexed stream that can then be stored or streamed.

1.2.1 Incoming streams

The streams fed to EncodeBin can be of various types:

 * Video
   * Uncompressed (but maybe subsampled)
   * Compressed
 * Audio
   * Uncompressed (audio/x-raw)
   * Compressed
 * Timed text
 * Private streams


1.2.2 Steps involved for raw video encoding

(0) Incoming stream

(1) Transform raw video feed (optional)

    Here we modify the various fundamental properties of a raw video
    stream to be compatible with the intersection of:
    * The encoder GstCaps and
    * The specified "Stream Restriction" of the profile/target

    The fundamental properties that can be modified are:
    * width/height
      This is done with a video scaler.
      The DAR (Display Aspect Ratio) MUST be respected.
      If needed, black borders can be added to comply with the target DAR.
    * framerate
    * format/colorspace/depth
      All of this is done with a colorspace converter.

(2) Actual encoding (optional for raw streams)

    An encoder (with some optional settings) is used.

(3) Muxing

    A muxer (with some optional settings) is used.

(4) Outgoing encoded and muxed stream


1.2.3 Steps involved for raw audio encoding

This is roughly the same as for raw video, except for (1).

(1) Transform raw audio feed (optional)

    We modify the various fundamental properties of a raw audio stream to
    be compatible with the intersection of:
    * The encoder GstCaps and
    * The specified "Stream Restriction" of the profile/target

    The fundamental properties that can be modified are:
    * Number of channels
    * Type of raw audio (integer or floating point)
    * Depth (number of bits required to encode one sample)


1.2.4 Steps involved for encoded audio/video streams

Steps (1) and (2) are replaced by a parser if a parser is available
for the given format.


1.2.5 Steps involved for other streams

Other streams will just be forwarded as-is to the muxer, provided the
muxer accepts the stream type.


2. Encoding Profile System
--------------------------

This work is based on:
 * The existing GstPreset system for elements [0]
 * The gnome-media GConf audio profile system [1]
 * The investigation done into device profiles by Arista and
   Transmageddon [2 and 3]

2.2 Terminology
---------------

* Encoding Target Category
  A Target Category is a classification of devices/systems/use-cases
  for encoding.

  Such a classification is required in order to:
  * Allow applications with a very specific use-case to limit the number of
    profiles they can offer the user. A screencasting application has
    no use for the online-services targets, for example.
  * Offer the user some initial classification in the case of a
    more generic encoding application (like a video editor or a
    transcoder).

  Ex:
    Consumer devices
    Online service
    Intermediate Editing Format
    Screencast
    Capture
    Computer

* Encoding Profile Target
  A Profile Target describes a specific entity for which we wish to
  encode.
  A Profile Target must belong to at least one Target Category.
  It will define at least one Encoding Profile.

  Ex (with category):
    Nokia N900 (Consumer device)
    Sony PlayStation 3 (Consumer device)
    Youtube (Online service)
    DNxHD (Intermediate editing format)
    HuffYUV (Screencast)
    Theora (Computer)

* Encoding Profile
  A specific combination of muxer, encoders, presets and limitations.

  Ex:
    Nokia N900/H264 HQ
    Ipod/High Quality
    DVD/Pal
    Youtube/High Quality
    HTML5/Low Bandwidth
    DNxHD

2.3 Encoding Profile
--------------------

An encoding profile requires the following information:

 * Name
   This string is not translatable and must be unique.
   A recommendation to guarantee uniqueness of the naming could be:
   <target>/<name>
 * Description
   This is a translatable string describing the profile.
 * Muxing format
   This is a string containing the GStreamer media-type of the
   container format.
 * Muxing preset
   This is an optional string describing the preset(s) to use on the
   muxer.
 * Multipass setting
   This is a boolean describing whether the profile requires several
   passes.
 * List of Stream Profiles

2.3.1 Stream Profiles

A Stream Profile consists of:

 * Type
   The type of stream profile (audio, video, text, private-data)
 * Encoding Format
   This is a string containing the GStreamer media-type of the encoding
   format to be used. If encoding is not to be applied, the raw media
   type will be used.
 * Encoding preset
   This is an optional string describing the preset(s) to use on the
   encoder.
 * Restriction
   This is an optional GstCaps containing the restriction of the
   stream that can be fed to the encoder.
   This will generally contain restrictions on video
   width/height/framerate or audio depth.
 * presence
   This is an integer specifying how many streams can be used in the
   containing profile. 0 means that any number of streams can be
   used.
 * pass
   This is an integer which is only meaningful if the multipass flag
   has been set in the profile. If it has been set, it indicates which
   pass this Stream Profile corresponds to.

2.4 Example profile
-------------------

The representation used here is XML only as an example. No decision is
made as to which format to use for storing targets and profiles.

  <gst-encoding-target>
    <name>Nokia N900</name>
    <category>Consumer Device</category>
    <profiles>
      <profile>Nokia N900/H264 HQ</profile>
      <profile>Nokia N900/MP3</profile>
      <profile>Nokia N900/AAC</profile>
    </profiles>
  </gst-encoding-target>

  <gst-encoding-profile>
    <name>Nokia N900/H264 HQ</name>
    <description>
      High Quality H264/AAC for the Nokia N900
    </description>
    <format>video/quicktime,variant=iso</format>
    <streams>
      <stream-profile>
        <type>audio</type>
        <format>audio/mpeg,mpegversion=4</format>
        <preset>Quality High/Main</preset>
        <restriction>audio/x-raw,channels=[1,2]</restriction>
        <presence>1</presence>
      </stream-profile>
      <stream-profile>
        <type>video</type>
        <format>video/x-h264</format>
        <preset>Profile Baseline/Quality High</preset>
        <restriction>
          video/x-raw,width=[16,800],height=[16,480],framerate=[1/1,30000/1001]
        </restriction>
        <presence>1</presence>
      </stream-profile>
    </streams>
  </gst-encoding-profile>
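
For illustration only, the <restriction> strings above are ordinary GStreamer
caps descriptions; in code they could be constructed like this (not part of
the proposed API):

  #include <gst/gst.h>

  /* Build the example restrictions from section 2.4 as GstCaps. */
  static void
  sketch_make_restrictions (GstCaps ** audio_restriction,
      GstCaps ** video_restriction)
  {
    *audio_restriction = gst_caps_from_string ("audio/x-raw,channels=[1,2]");
    *video_restriction = gst_caps_from_string ("video/x-raw,width=[16,800],"
        "height=[16,480],framerate=[1/1,30000/1001]");
  }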

2.5 API
-------
A proposed C API is contained in the gstprofile.h file in this directory.


2.6 Modifications required in the existing GstPreset system
-----------------------------------------------------------

2.6.1 Temporary presets

Currently a preset needs to be saved on disk in order to be used.

This makes it impossible to have temporary presets (that exist only
during the lifetime of a process), which might be required in the
newly proposed profile system.

2.6.2 Categorisation of presets

Currently presets are just aliases for a group of property/value pairs,
without any meaning or explanation as to how they exclude each other.

Take for example the H264 encoder. It can have presets for:
 * passes (1, 2 or 3 passes)
 * profiles (Baseline, Main, ...)
 * quality (Low, Medium, High)

In order to programmatically know which presets exclude each other,
we here propose the categorisation of these presets.

This can be done in one of two ways:
 1. in the name (by making the name be [<category>:]<name>)
    This would give for example: "Quality:High", "Profile:Baseline"
 2. by adding a new _meta key
    This would give for example: _meta/category:quality

2.6.3 Aggregation of presets

There can be more than one preset choice to be made for an
element (quality, profile, pass).

This means that one cannot currently describe the full configuration
of an element with a single string; several are needed.

The proposal here is to extend the GstPreset API to be able to set
all presets using one string and a well-known separator ('/').

This change only requires changes in the core preset handling code.

This would allow doing the following:

  gst_preset_load_preset (h264enc,
      "pass:1/profile:baseline/quality:high");

2.7 Points to be determined
---------------------------

This document hasn't yet determined how to solve the following
problems:

2.7.1 Storage of profiles

One proposal for storage would be to use a system-wide directory
(like $prefix/share/gstreamer-0.10/profiles) and store XML files for
every individual profile.

Users could then add their own profiles in ~/.gstreamer-0.10/profiles.

This poses some limitations as to what to do if some applications
want to have some profiles limited to their own usage.


3. Helper library for profiles
------------------------------

These helper methods could also be added to existing libraries (like
GstPreset, GstPbUtils, ...).

The various APIs proposed are in the accompanying gstprofile.h file.

3.1 Getting user-readable names for formats

This is already provided by GstPbUtils.

3.2 Hierarchy of profiles

The goal is for applications to be able to present to the user a list
of combo-boxes for choosing their output profile:

  [ Category ]        # optional, depends on the application
  [ Device/Site/... ] # optional, depends on the application
  [ Profile ]

Convenience methods are offered to easily get lists of categories,
devices, and profiles.

3.3 Creating Profiles

The goal is for applications to be able to easily create profiles.

Applications need a fast/efficient way to:
 * select a container format and see all compatible streams they can use
   with it.
 * select a codec format and see which container formats they can use
   with it.

The remaining parts concern the restrictions on encoder input.

3.4 Ensuring availability of plugins for Profiles

When an application wishes to use a Profile, it should be able to
query whether it has all the needed plugins to use it.

This part will use GstPbUtils to query, and if needed install, the
missing plugins through the installed distribution plugin installer.


I. Use-cases researched
-----------------------

This is a list of various use-cases where encoding/muxing is being
used.

* Transcoding

  The goal is to convert any input file for a target use with as little
  loss of quality as possible.
  A specific variant of this is transmuxing (see below).

  Example applications: Arista, Transmageddon

* Rendering timelines

  The incoming streams are a collection of various segments that need
  to be rendered.
  Those segments can vary in nature (i.e. the video width/height can
  change).
  This requires the use of identity with the single-segment property
  activated to transform the incoming collection of segments into a
  single continuous segment.

  Example applications: PiTiVi, Jokosher

* Encoding of live sources

  The major risk to take into account is the encoder not encoding the
  incoming stream fast enough. This is outside of the scope of
  encodebin, and should be solved by using queues between the sources
  and encodebin, as well as by implementing QoS in encoders and sources
  (the encoders emitting QoS events, and the upstream elements
  adapting themselves accordingly).

  Example applications: camerabin, cheese

* Screencasting applications

  This is similar to encoding of live sources.
  The difference is that, due to the nature of the source (size and
  amount/frequency of updates), one might want to do the encoding in
  two parts:
  * The actual live capture is encoded with an 'almost-lossless' codec
    (such as huffyuv)
  * Once the capture is done, the file created in the first step is
    then rendered to the desired target format.

  Fixing sources to only emit region updates and having encoders
  capable of encoding those streams would remove the need for the first
  step, but is outside of the scope of encodebin.

  Example applications: Istanbul, gnome-shell, recordmydesktop

* Live transcoding

  This is the case of an incoming live stream which will be
  broadcast/transmitted live.
  One issue to take into account is reducing the encoding latency to
  a minimum. This should mostly be done by picking low-latency
  encoders.

  Example applications: Rygel, Coherence

* Transmuxing

  Given a certain file, the aim is to remux the contents WITHOUT
  decoding into either a different container format or the same
  container format.
  Remuxing into the same container format is useful when the file was
  not created properly (for example, the index is missing).
  Whenever available, parsers should be applied on the encoded streams
  to validate and/or fix the streams before muxing them.

  Metadata from the original file must be kept in the newly created
  file.

  Example applications: Arista, Transmageddon

* Loss-less cutting

  Given a certain file, the aim is to extract a certain part of the
  file without going through the process of decoding and re-encoding
  that file.
  This is similar to the transmuxing use-case.

  Example applications: PiTiVi, Transmageddon, Arista, ...

* Multi-pass encoding

  Some encoders allow doing multi-pass encoding.
  The initial pass(es) are only used to collect encoding estimates and
  are not actually muxed and output.
  The final pass uses the previously collected information, and the
  output is then muxed and output.

* Archiving and intermediary format

  The requirement is to have lossless

* CD ripping

  Example applications: Sound-juicer

* DVD ripping

  Example application: Thoggen



* Research links

Some of these are still active documents, some are not.

[0] GstPreset API documentation
    http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstPreset.html

[1] gnome-media GConf profiles
    http://www.gnome.org/~bmsmith/gconf-docs/C/gnome-media.html

[2] Research on a Device Profile API
    http://gstreamer.freedesktop.org/wiki/DeviceProfile

[3] Research on defining presets usage
    http://gstreamer.freedesktop.org/wiki/PresetDesign

@@ -1,204 +0,0 @@

Orc Integration
===============

Sections
--------

- About Orc
- Fast memcpy()
- Normal Usage
- Build Process
- Testing
- Orc Limitations


About Orc
---------

Orc code can be in one of two forms: .orc files that are converted
by orcc to C code that calls liborc functions, or C code that calls
liborc to create complex operations at runtime. The former is mostly
for functions with predetermined functionality. The latter is for
functionality that is determined at runtime, where writing .orc
functions for all combinations would be prohibitive. Orc also has
a fast memcpy and memset which are useful independently.


Fast memcpy()
-------------

*** This part is not integrated yet. ***

Orc has built-in functions orc_memcpy() and orc_memset() that work
like memcpy() and memset(). These are meant for large copies only.
A reasonable cutoff for using orc_memcpy() instead of memcpy() is
if the number of bytes is generally greater than 100. DO NOT use
orc_memcpy() if the typical size is less than 20 bytes, especially
if the size is known at compile time, as these cases are inlined by
the compiler.

(Example: sys/ximage/ximagesink.c)

Add $(ORC_CFLAGS) to libgstximagesink_la_CFLAGS and $(ORC_LIBS) to
libgstximagesink_la_LIBADD. Then, in the source file, add:

  #ifdef HAVE_ORC
  #include <orc/orc.h>
  #else
  #define orc_memcpy(a,b,c) memcpy(a,b,c)
  #endif

Then switch relevant uses of memcpy() to orc_memcpy().

The above example works whether or not Orc is enabled at compile
time.


Normal Usage
------------

The following lines are added near the top of Makefile.am for plugins
that use Orc code in .orc files (this is for the volume plugin):

  ORC_BASE=volume
  include $(top_srcdir)/common/orc.mk

Also add the generated source file to the plugin build:

  nodist_libgstvolume_la_SOURCES = $(ORC_SOURCES)

And of course, add $(ORC_CFLAGS) to libgstvolume_la_CFLAGS, and
$(ORC_LIBS) to libgstvolume_la_LIBADD.

The value assigned to ORC_BASE does not need to be related to
the name of the plugin.


Advanced Usage
--------------

The Holy Grail of Orc usage is to programmatically generate Orc code
at runtime, have liborc compile it into binary code at runtime, and
then execute this code. Currently, the best example of this is in
Schroedinger. An example of how this would be used is audioconvert:
given an input format, channel position manipulation, dithering and
quantizing configuration, and output format, an Orc code generator
would create an OrcProgram, add the appropriate instructions to do
each step based on the configuration, and then compile the program.
Successfully compiling the program would return a function pointer
that can be called to perform the operation.

This sort of advanced usage requires structural changes to current
plugins (e.g., audioconvert) and will probably be developed
incrementally. Moreover, if such code is intended to be used without
Orc as a strict build/runtime requirement, two codepaths would need to
be developed and tested. For this reason, until GStreamer requires
Orc, I think it's a good idea to restrict such advanced usage to the
cog plugin in -bad, which requires Orc.
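
As a rough sketch of what such runtime construction looks like with liborc
(treat the exact calls and the "addssw" opcode as assumptions to be verified
against the Orc documentation; error handling is elided):

  #include <orc/orc.h>

  /* Build a tiny program that saturate-adds two arrays of 16-bit samples. */
  static OrcProgram *
  sketch_build_s16_add (void)
  {
    OrcProgram *p;
    OrcCompileResult result;

    orc_init ();

    p = orc_program_new ();
    orc_program_add_destination (p, 2, "d1");   /* 2-byte (s16) destination */
    orc_program_add_source (p, 2, "s1");        /* 2-byte (s16) source */
    orc_program_append_str (p, "addssw", "d1", "d1", "s1");

    result = orc_program_compile (p);
    /* check 'result' for success before using the program; an OrcExecutor
     * is then set up with the arrays and run over n samples */

    return p;
  }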

Build Process
-------------

The goal of the build process is to make Orc non-essential for most
developers and users. This is not to say you shouldn't have Orc
installed -- without it, you will get slow backup C code -- just that
people compiling GStreamer are not forced to switch from Liboil to
Orc immediately.

With Orc installed, the build process will use the Orc Compiler (orcc)
to convert each .orc file into a temporary C source (tmp-orc.c) and a
temporary header file (${base}orc.h if constructed from ${base}.orc).
The C source file is compiled and linked to the plugin, and the header
file is included by other source files in the plugin.

If 'make orc-update' is run in the source directory, the files
tmp-orc.c and ${base}orc.h are copied to ${base}orc-dist.c and
${base}orc-dist.h respectively. The -dist.[ch] files are automatically
disted via orc.mk. The -dist.[ch] files should be checked in to
git whenever the .orc source is changed and checked in. Example
workflow:

  edit .orc file
  ... make, test, etc.
  make orc-update
  git add volume.orc volumeorc-dist.c volumeorc-dist.h
  git commit

At 'make dist' time, all of the .orc files are compiled, then
copied to their -dist.[ch] counterparts, and then the -dist.[ch]
files are added to the dist directory.

Without Orc installed (or with --disable-orc given to configure), the
-dist.[ch] files are copied to tmp-orc.c and ${base}orc.h. When
compiled with Orc disabled, DISABLE_ORC is defined in config.h, and
the C backup code is compiled. This backup code is pure C, and
does not include orc headers or require linking against liborc.

The common/orc.mk build method is limited by the inflexibility of
automake. The file tmp-orc.c must be a fixed filename; using ORC_NAME
to generate the filename does not work because it conflicts with
automake's dependency generation. Building multiple .orc files
is not possible due to this restriction.


Testing
-------

If you create another .orc file, please add it to
tests/orc/Makefile.am. This causes automatic test code to be
generated and run during 'make check'. Each function in the .orc
file is tested by comparing the results of executing the run-time
compiled code and the C backup function.


Orc Limitations
---------------

audioconvert

  Orc doesn't have a mechanism for generating random numbers, which
  prevents its use as-is for dithering. One way around this is to
  generate suitable dithering values in one pass, then use those
  values in a second Orc-based pass.

  Orc doesn't handle 64-bit float, for no good reason.

  Irrespective of Orc handling 64-bit float, it would be useful to
  have a direct 32-bit float to 16-bit integer conversion.

  audioconvert is a good candidate for programmatically generated
  Orc code.

  audioconvert enumerates functions in terms of big-endian vs.
  little-endian. Orc's functions are "native" and "swapped".
  Programmatically generating code removes the need to worry about
  this.

  Orc doesn't handle 24-bit samples. Fixing this is not a priority
  (for ds).

videoscale

  Orc doesn't handle horizontal resampling yet. The plan is to add
  special sampling opcodes, for nearest, bilinear, and cubic
  interpolation.

videotestsrc

  Lots of code in videotestsrc needs to be rewritten to be SIMD
  (and Orc) friendly, e.g., stuff that uses oil_splat_u8().

  A fast low-quality random number generator in Orc would be useful
  here.

volume

  Many of the comments on audioconvert apply here as well.

  There are a bunch of FIXMEs in here that are due to misapplied
  patches.

@@ -1,91 +0,0 @@

Forcing keyframes
-----------------

Consider the following use case:

 We have a pipeline that performs video and audio capture from a live source,
 compresses and muxes the streams and writes the resulting data into a file.

 Inside the uncompressed video data we have a specific pattern inserted at
 specific moments that should trigger a switch to a new file, meaning we close
 the existing file we are writing to and start writing to a new file.

 We want the new file to start with a keyframe so that one can start decoding
 the file immediately.

Components:

1) We need an element that is able to detect the pattern in the video stream.

2) We need to inform the video encoder that it should start encoding a keyframe
   starting from exactly the frame with the pattern.

3) We need to inform the muxer that it should flush out any pending data and
   start creating the start of a new file with the keyframe as the first video
   frame.

4) We need to inform the sink element that it should start writing to the next
   file. This requires application interaction to instruct the sink of the new
   filename. The application should also be free to ignore the boundary and
   continue to write to the existing file. The application will typically use
   an event pad probe to detect the custom event.

Implementation:

The implementation would consist of generating a GST_EVENT_CUSTOM_DOWNSTREAM
event that marks the keyframe boundary. This event is inserted into the
pipeline by the application upon a certain trigger. In the above use case this
trigger would be given by the element that detects the pattern, in the form of
an element message.

The custom event would travel further downstream to instruct encoder, muxer and
sink about the possible switch.

The information passed in the event consists of (a construction sketch follows
the list below):

   name: GstForceKeyUnit

   (G_TYPE_UINT64)"timestamp"    : the timestamp of the buffer that
                                   triggered the event.
   (G_TYPE_UINT64)"stream-time"  : the stream position that triggered the
                                   event.
   (G_TYPE_UINT64)"running-time" : the running time of the stream when the
                                   event was triggered.
   (G_TYPE_BOOLEAN)"all-headers" : send all headers, including those in
                                   the caps or those sent at the start of
                                   the stream.

   ....                          : optional other data fields.
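
A sketch of how such an event could be constructed and injected (field names
follow this document; GStreamer later gained dedicated force-key-unit helper
API, so treat this as an illustration rather than the canonical way):

  #include <gst/gst.h>

  static GstEvent *
  sketch_force_key_unit_event (GstClockTime timestamp,
      GstClockTime stream_time, GstClockTime running_time,
      gboolean all_headers)
  {
    GstStructure *s;

    s = gst_structure_new ("GstForceKeyUnit",
        "timestamp", G_TYPE_UINT64, timestamp,
        "stream-time", G_TYPE_UINT64, stream_time,
        "running-time", G_TYPE_UINT64, running_time,
        "all-headers", G_TYPE_BOOLEAN, all_headers, NULL);

    return gst_event_new_custom (GST_EVENT_CUSTOM_DOWNSTREAM, s);
  }

  /* The application (or the pattern-detecting element) would then inject it,
   * e.g.: gst_pad_send_event (encoder_sinkpad,
   *           sketch_force_key_unit_event (ts, st, rt, TRUE)); */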

Note that this event is purely informational: no element is required to
perform an action, but it should forward the event downstream, just like any
other event it does not handle.

Elements understanding the event should behave as follows:

1) The video encoder receives the event before the next frame. Upon reception
   of the event it schedules the next frame to be encoded as a keyframe.
   Before pushing out the encoded keyframe it must push the GstForceKeyUnit
   event downstream.

2) The muxer receives the GstForceKeyUnit event and flushes out its current
   state, preparing to produce data that can be used as a keyunit. Before
   pushing out the new data it pushes the GstForceKeyUnit event downstream.

3) The application receives the GstForceKeyUnit event on a pad probe on the
   sink and reconfigures the sink to make it perform new actions after
   receiving the next buffer (see the probe sketch below).
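
A sketch of the application-side probe from step 3, using the GStreamer 1.0
probe API (the original draft predates that API, so the exact mechanism may
differ in older code):

  #include <gst/gst.h>

  static GstPadProbeReturn
  sink_event_probe (GstPad * pad, GstPadProbeInfo * info, gpointer user_data)
  {
    GstEvent *event = GST_PAD_PROBE_INFO_EVENT (info);

    if (GST_EVENT_TYPE (event) == GST_EVENT_CUSTOM_DOWNSTREAM &&
        gst_event_has_name (event, "GstForceKeyUnit")) {
      /* reconfigure the sink here, e.g. schedule a new file name to be
       * used for the data that follows */
    }
    return GST_PAD_PROBE_OK;
  }

  /* gst_pad_add_probe (sink_sinkpad, GST_PAD_PROBE_TYPE_EVENT_DOWNSTREAM,
   *     sink_event_probe, NULL, NULL); */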

Upstream
--------

When using RTP, packets can get lost or receivers can be added at any time;
they may request a new key frame.

A downstream element sends an upstream "GstForceKeyUnit" event up the
pipeline.

When an element produces some kind of key unit in output, but has
no such concept in its input (like an encoder that takes raw frames),
it consumes the event (doesn't pass it upstream), and instead sends
a downstream GstForceKeyUnit event and a new keyframe.
@ -1,546 +0,0 @@
|
|||
===============================================================
|
||||
Subtitle overlays, hardware-accelerated decoding and playbin
|
||||
===============================================================
|
||||
|
||||
Status: EARLY DRAFT / BRAINSTORMING
|
||||
|
||||
=== 1. Background ===
|
||||
|
||||
Subtitles can be muxed in containers or come from an external source.
|
||||
|
||||
Subtitles come in many shapes and colours. Usually they are either
|
||||
text-based (incl. 'pango markup'), or bitmap-based (e.g. DVD subtitles
|
||||
and the most common form of DVB subs). Bitmap based subtitles are
|
||||
usually compressed in some way, like some form of run-length encoding.
|
||||
|
||||
Subtitles are currently decoded and rendered in subtitle-format-specific
|
||||
overlay elements. These elements have two sink pads (one for raw video
|
||||
and one for the subtitle format in question) and one raw video source pad.
|
||||
|
||||
They will take care of synchronising the two input streams, and of
|
||||
decoding and rendering the subtitles on top of the raw video stream.
|
||||
|
||||
Digression: one could theoretically have dedicated decoder/render elements
|
||||
that output an AYUV or ARGB image, and then let a videomixer element do
|
||||
the actual overlaying, but this is not very efficient, because it requires
|
||||
us to allocate and blend whole pictures (1920x1080 AYUV = 8MB,
|
||||
1280x720 AYUV = 3.6MB, 720x576 AYUV = 1.6MB) even if the overlay region
|
||||
is only a small rectangle at the bottom. This wastes memory and CPU.
|
||||
We could do something better by introducing a new format that only
|
||||
encodes the region(s) of interest, but we don't have such a format yet, and
|
||||
are not necessarily keen to rewrite this part of the logic in playbin
|
||||
at this point - and we can't change existing elements' behaviour, so would
|
||||
need to introduce new elements for this.
|
||||
|
||||
Playbin2 supports outputting compressed formats, i.e. it does not
|
||||
force decoding to a raw format, but is happy to output to a non-raw
|
||||
format as long as the sink supports that as well.
|
||||
|
||||
In case of certain hardware-accelerated decoding APIs, we will make use
|
||||
of that functionality. However, the decoder will not output a raw video
|
||||
format then, but some kind of hardware/API-specific format (in the caps)
|
||||
and the buffers will reference hardware/API-specific objects that
|
||||
the hardware/API-specific sink will know how to handle.
|
||||
|
||||
|
||||
=== 2. The Problem ===
|
||||
|
||||
In the case of such hardware-accelerated decoding, the decoder will not
|
||||
output raw pixels that can easily be manipulated. Instead, it will
|
||||
output hardware/API-specific objects that can later be used to render
|
||||
a frame using the same API.
|
||||
|
||||
Even if we could transform such a buffer into raw pixels, we most
|
||||
likely would want to avoid that, in order to avoid the need to
|
||||
map the data back into system memory (and then later back to the GPU).
|
||||
It's much better to upload the much smaller encoded data to the GPU/DSP
|
||||
and then leave it there until rendered.
|
||||
|
||||
Currently playbin only supports subtitles on top of raw decoded video.
|
||||
It will try to find a suitable overlay element from the plugin registry
|
||||
based on the input subtitle caps and the rank. (It is assumed that we
|
||||
will be able to convert any raw video format into any format required
|
||||
by the overlay using a converter such as videoconvert.)
|
||||
|
||||
It will not render subtitles if the video sent to the sink is not
|
||||
raw YUV or RGB or if conversions have been disabled by setting the
|
||||
native-video flag on playbin.
|
||||
|
||||
Subtitle rendering is considered an important feature. Enabling
|
||||
hardware-accelerated decoding by default should not lead to a major
|
||||
feature regression in this area.
|
||||
|
||||
This means that we need to support subtitle rendering on top of
|
||||
non-raw video.
|
||||
|
||||
|
||||
=== 3. Possible Solutions ===
|
||||
|
||||
The goal is to keep knowledge of the subtitle format within the
|
||||
format-specific GStreamer plugins, and knowledge of any specific
|
||||
video acceleration API to the GStreamer plugins implementing
|
||||
that API. We do not want to make the pango/dvbsuboverlay/dvdspu/kate
|
||||
plugins link to libva/libvdpau/etc. and we do not want to make
|
||||
the vaapi/vdpau plugins link to all of libpango/libkate/libass etc.
|
||||
|
||||
|
||||
Multiple possible solutions come to mind:
|
||||
|
||||
(a) backend-specific overlay elements
|
||||
|
||||
e.g. vaapitextoverlay, vdpautextoverlay, vaapidvdspu, vdpaudvdspu,
|
||||
vaapidvbsuboverlay, vdpaudvbsuboverlay, etc.
|
||||
|
||||
This assumes the overlay can be done directly on the backend-specific
|
||||
object passed around.
|
||||
|
||||
The main drawback with this solution is that it leads to a lot of
|
||||
code duplication and may also lead to uncertainty about distributing
|
||||
certain duplicated pieces of code. The code duplication is pretty
|
||||
much unavoidable, since making textoverlay, dvbsuboverlay, dvdspu,
|
||||
kate, assrender, etc. available in form of base classes to derive
|
||||
from is not really an option. Similarly, one would not really want
|
||||
the vaapi/vdpau plugin to depend on a bunch of other libraries
|
||||
such as libpango, libkate, libtiger, libass, etc.
|
||||
|
||||
One could add some new kind of overlay plugin feature though in
|
||||
combination with a generic base class of some sort, but in order
|
||||
to accommodate all the different cases and formats one would end
|
||||
up with quite convoluted/tricky API.
|
||||
|
||||
(Of course there could also be a GstFancyVideoBuffer that provides
|
||||
an abstraction for such video accelerated objects and that could
|
||||
provide an API to add overlays to it in a generic way, but in the
|
||||
end this is just a less generic variant of (c), and it is not clear
|
||||
that there are real benefits to a specialised solution vs. a more
|
||||
generic one).
|
||||
|
||||
|
||||
(b) convert backend-specific object to raw pixels and then overlay
|
||||
|
||||
Even where possible technically, this is most likely very
|
||||
inefficient.
|
||||
|
||||
|
||||
(c) attach the overlay data to the backend-specific video frame buffers
|
||||
in a generic way and do the actual overlaying/blitting later in
|
||||
backend-specific code such as the video sink (or an accelerated
|
||||
encoder/transcoder)
|
||||
|
||||
In this case, the actual overlay rendering (i.e. the actual text
|
||||
rendering or decoding DVD/DVB data into pixels) is done in the
|
||||
subtitle-format-specific GStreamer plugin. All knowledge about
|
||||
the subtitle format is contained in the overlay plugin then,
|
||||
and all knowledge about the video backend in the video backend
|
||||
specific plugin.
|
||||
|
||||
The main question then is how to get the overlay pixels (and
|
||||
we will only deal with pixels here) from the overlay element
|
||||
to the video sink.
|
||||
|
||||
This could be done in multiple ways: One could send custom
|
||||
events downstream with the overlay data, or one could attach
|
||||
the overlay data directly to the video buffers in some way.
|
||||
|
||||
Sending inline events has the advantage that it is fairly
|
||||
transparent to any elements between the overlay element and
|
||||
the video sink: if an effects plugin creates a new video
|
||||
buffer for the output, nothing special needs to be done to
|
||||
maintain the subtitle overlay information, since the overlay
|
||||
data is not attached to the buffer. However, it slightly
|
||||
complicates things at the sink, since it would also need to
|
||||
look for the new event in question instead of just processing
|
||||
everything in its buffer render function.
|
||||
|
||||
If one attaches the overlay data to the buffer directly, any
|
||||
element between overlay and video sink that creates a new
|
||||
video buffer would need to be aware of the overlay data
|
||||
attached to it and copy it over to the newly-created buffer.
|
||||
|
||||
One would have to implement a special kind of new query
|
||||
(e.g. FEATURE query) that is not passed on automatically by
|
||||
gst_pad_query_default() in order to make sure that all elements
|
||||
downstream will handle the attached overlay data. (This is only
|
||||
a problem if we want to also attach overlay data to raw video
|
||||
pixel buffers; for new non-raw types we can just make it
|
||||
mandatory and assume support and be done with it; for existing
|
||||
non-raw types nothing changes anyway if subtitles don't work)
|
||||
(we need to maintain backwards compatibility for existing raw
|
||||
video pipelines like e.g.: ..decoder ! suboverlay ! encoder..)
|
||||
|
||||
Even though slightly more work, attaching the overlay information
|
||||
to buffers seems more intuitive than sending it interleaved as
|
||||
events. And buffers stored or passed around (e.g. via the
|
||||
"last-buffer" property in the sink when doing screenshots via
|
||||
playbin) always contain all the information needed.
|
||||
|
||||
|
||||
(d) create a video/x-raw-*-delta format and use a backend-specific videomixer
|
||||
|
||||
This possibility was hinted at already in the digression in
|
||||
section 1. It would satisfy the goal of keeping subtitle format
|
||||
knowledge in the subtitle plugins and video backend knowledge
|
||||
in the video backend plugin. It would also add a concept that
|
||||
might be generally useful (think ximagesrc capture with xdamage).
|
||||
However, it would require adding foorender variants of all the
|
||||
existing overlay elements, and changing playbin to that new
|
||||
design, which is somewhat intrusive. And given the general
|
||||
nature of such a new format/API, we would need to take a lot
|
||||
of care to be able to accommodate all possible use cases when
|
||||
designing the API, which makes it considerably more ambitious.
|
||||
Lastly, we would need to write videomixer variants for the
|
||||
various accelerated video backends as well.
|
||||
|
||||
|
||||
Overall (c) appears to be the most promising solution. It is the least
|
||||
intrusive and should be fairly straight-forward to implement with
|
||||
reasonable effort, requiring only small changes to existing elements
|
||||
and requiring no new elements.
|
||||
|
||||
Doing the final overlaying in the sink as opposed to a videomixer
|
||||
or overlay in the middle of the pipeline has other advantages:
|
||||
|
||||
- if video frames need to be dropped, e.g. for QoS reasons,
|
||||
we could also skip the actual subtitle overlaying and
|
||||
possibly the decoding/rendering as well, if the
|
||||
implementation and API allows for that to be delayed.
|
||||
|
||||
- the sink often knows the actual size of the window/surface/screen
|
||||
the output video is rendered to. This *may* make it possible to
|
||||
render the overlay image in a higher resolution than the input
|
||||
video, solving a long standing issue with pixelated subtitles on
|
||||
top of low-resolution videos that are then scaled up in the sink.
|
||||
This would of course require the rendering to be delayed instead
|
||||
of just attaching an AYUV/ARGB/RGBA blob of pixels to the video buffer
|
||||
in the overlay, but that could all be supported.
|
||||
|
||||
- if the video backend / sink has support for high-quality text
|
||||
rendering (clutter?) we could just pass the text or pango markup
|
||||
to the sink and let it do the rest (this is unlikely to be
|
||||
supported in the general case - text and glyph rendering is
|
||||
hard; also, we don't really want to make up our own text markup
|
||||
system, and pango markup is probably too limited for complex
|
||||
karaoke stuff).
|
||||
|
||||
|
||||
=== 4. API needed ===
|
||||
|
||||
(a) Representation of subtitle overlays to be rendered
|
||||
|
||||
We need to pass the overlay pixels from the overlay element to the
|
||||
sink somehow. Whatever the exact mechanism, let's assume we pass
|
||||
a refcounted GstVideoOverlayComposition struct or object.
|
||||
|
||||
A composition is made up of one or more overlays/rectangles.
|
||||
|
||||
In the simplest case an overlay rectangle is just a blob of
|
||||
RGBA/ABGR [FIXME?] or AYUV pixels with positioning info and other
|
||||
metadata, and there is only one rectangle to render.
|
||||
|
||||
We're keeping the naming generic ("OverlayFoo" rather than
|
||||
"SubtitleFoo") here, since this might also be handy for
|
||||
other use cases such as e.g. logo overlays or so. It is not
|
||||
designed for full-fledged video stream mixing though.
|
||||
|
||||
// Note: don't mind the exact implementation details, they'll be hidden
|
||||
|
||||
// FIXME: might be confusing in 0.11 though since GstXOverlay was
|
||||
// renamed to GstVideoOverlay in 0.11, but not much we can do,
|
||||
// maybe we can rename GstVideoOverlay to something better
|
||||
|
||||
struct GstVideoOverlayComposition
|
||||
{
|
||||
guint num_rectangles;
|
||||
GstVideoOverlayRectangle ** rectangles;
|
||||
|
||||
/* lowest rectangle sequence number still used by the upstream
|
||||
* overlay element. This way a renderer maintaining some kind of
|
||||
* rectangles <-> surface cache can know when to free cached
|
||||
* surfaces/rectangles. */
|
||||
guint min_seq_num_used;
|
||||
|
||||
/* sequence number for the composition (same series as rectangles) */
|
||||
guint seq_num;
|
||||
}
|
||||
|
||||
struct GstVideoOverlayRectangle
|
||||
{
|
||||
/* Position on video frame and dimension of output rectangle in
|
||||
* output frame terms (already adjusted for the PAR of the output
|
||||
* frame). x/y can be negative (overlay will be clipped then) */
|
||||
gint x, y;
|
||||
guint render_width, render_height;
|
||||
|
||||
/* Dimensions of overlay pixels */
|
||||
guint width, height, stride;
|
||||
|
||||
/* This is the PAR of the overlay pixels */
|
||||
guint par_n, par_d;
|
||||
|
||||
/* Format of pixels, GST_VIDEO_FORMAT_ARGB on big-endian systems,
|
||||
* and BGRA on little-endian systems (i.e. pixels are treated as
|
||||
* 32-bit values and alpha is always in the most-significant byte,
|
||||
* and blue is in the least-significant byte).
|
||||
*
|
||||
* FIXME: does anyone actually use AYUV in practice? (we do
|
||||
* in our utility function to blend on top of raw video)
|
||||
* What about AYUV and endianness? Do we always have [A][Y][U][V]
|
||||
* in memory? */
|
||||
/* FIXME: maybe use our own enum? */
|
||||
GstVideoFormat format;
|
||||
|
||||
/* Refcounted blob of memory, no caps or timestamps */
|
||||
GstBuffer *pixels;
|
||||
|
||||
// FIXME: how to express source like text or pango markup?
|
||||
// (just add source type enum + source buffer with data)
|
||||
//
|
||||
// FOR 0.10: always send pixel blobs, but attach source data in
|
||||
// addition (reason: if downstream changes, we can't renegotiate
|
||||
// that properly, if we just do a query of supported formats from
|
||||
// the start). Sink will just ignore pixels and use pango markup
|
||||
// from source data if it supports that.
|
||||
//
|
||||
// FOR 0.11: overlay should query formats (pango markup, pixels)
|
||||
// supported by downstream and then only send that. We can
|
||||
// renegotiate via the reconfigure event.
|
||||
//
|
||||
|
||||
/* sequence number: useful for backends/renderers/sinks that want
|
||||
* to maintain a cache of rectangles <-> surfaces. The value of
|
||||
* the min_seq_num_used in the composition tells the renderer which
|
||||
* rectangles have expired. */
|
||||
guint seq_num;
|
||||
|
||||
/* FIXME: we also need a (private) way to cache converted/scaled
|
||||
* pixel blobs */
|
||||
}
|
||||
|
||||
(a1) Overlay consumer API:
|
||||
|
||||
How would this work in a video sink that supports scaling of textures:
|
||||
|
||||
gst_foo_sink_render () {
|
||||
/* assume only one for now */
|
||||
if video_buffer has composition:
|
||||
composition = video_buffer.get_composition()
|
||||
|
||||
for each rectangle in composition:
|
||||
if rectangle.source_data_type == PANGO_MARKUP
|
||||
actor = text_from_pango_markup (rectangle.get_source_data())
|
||||
else
|
||||
pixels = rectangle.get_pixels_unscaled (FORMAT_RGBA, ...)
|
||||
actor = texture_from_rgba (pixels, ...)
|
||||
|
||||
.. position + scale on top of video surface ...
|
||||
}
|
||||
|
||||
(a2) Overlay producer API:
|
||||
|
||||
e.g. logo or subpicture overlay: got pixels, stuff into rectangle:
|
||||
|
||||
if (logoverlay->cached_composition == NULL) {
|
||||
comp = composition_new ();
|
||||
|
||||
rect = rectangle_new (format, pixels_buf,
|
||||
width, height, stride, par_n, par_d,
|
||||
x, y, render_width, render_height);
|
||||
|
||||
/* composition adds its own ref for the rectangle */
|
||||
composition_add_rectangle (comp, rect);
|
||||
rectangle_unref (rect);
|
||||
|
||||
/* buffer adds its own ref for the composition */
|
||||
video_buffer_attach_composition (comp);
|
||||
|
||||
/* we take ownership of the composition and save it for later */
|
||||
logoverlay->cached_composition = comp;
|
||||
} else {
|
||||
video_buffer_attach_composition (logoverlay->cached_composition);
|
||||
}
|
||||
|
||||
FIXME: also add some API to modify render position/dimensions of
|
||||
a rectangle (probably requires creation of new rectangle, unless
|
||||
we handle writability like with other mini objects).
|
||||
|
||||
(b) Fallback overlay rendering/blitting on top of raw video
|
||||
|
||||
Eventually we want to use this overlay mechanism not only for
|
||||
hardware-accelerated video, but also for plain old raw video,
|
||||
either at the sink or in the overlay element directly.
|
||||
|
||||
Apart from the advantages listed earlier in section 3, this
|
||||
allows us to consolidate a lot of overlaying/blitting code that
|
||||
is currently repeated in every single overlay element in one
|
||||
location. This makes it considerably easier to support a whole
|
||||
range of raw video formats out of the box, add SIMD-optimised
|
||||
rendering using ORC, or handle corner cases correctly.
|
||||
|
||||
(Note: side-effect of overlaying raw video at the video sink is
|
||||
that if e.g. a screenshotter gets the last buffer via the last-buffer
|
||||
property of basesink, it would get an image without the subtitles
|
||||
on top. This could probably be fixed by re-implementing the
|
||||
property in GstVideoSink though. Playbin2 could handle this
|
||||
internally as well).
|
||||
|
||||
void
|
||||
gst_video_overlay_composition_blend (GstVideoOverlayComposition * comp,
|
||||
GstBuffer * video_buf)
|
||||
{
|
||||
guint n;
|
||||
|
||||
g_return_if_fail (gst_buffer_is_writable (video_buf));
|
||||
g_return_if_fail (GST_BUFFER_CAPS (video_buf) != NULL);
|
||||
|
||||
... parse video_buffer caps into BlendVideoFormatInfo ...
|
||||
|
||||
for each rectangle in the composition: {
|
||||
|
||||
if (gst_video_format_is_yuv (video_buf_format)) {
|
||||
overlay_format = FORMAT_AYUV;
|
||||
} else if (gst_video_format_is_rgb (video_buf_format)) {
|
||||
overlay_format = FORMAT_ARGB;
|
||||
} else {
|
||||
/* FIXME: grayscale? */
|
||||
return;
|
||||
}
|
||||
|
||||
/* this will scale and convert AYUV<->ARGB if needed */
|
||||
pixels = rectangle_get_pixels_scaled (rectangle, overlay_format);
|
||||
|
||||
... clip output rectangle ...
|
||||
|
||||
__do_blend (video_buf_format, video_buf->data,
|
||||
overlay_format, pixels->data,
|
||||
x, y, width, height, stride);
|
||||
|
||||
gst_buffer_unref (pixels);
|
||||
}
|
||||
}
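
For illustration, an overlay element that decides to do the blending itself
on raw video could use this utility from its chain function roughly as
follows. This is only a sketch against the proposed (not final) API above;
the element, pad and field names are made up:

    static GstFlowReturn
    gst_foo_overlay_chain (GstPad * pad, GstBuffer * video_buf)
    {
      GstFooOverlay *overlay = GST_FOO_OVERLAY (GST_OBJECT_PARENT (pad));

      /* we are going to modify the video pixels, so make sure the
       * buffer is writable */
      video_buf = gst_buffer_make_writable (video_buf);

      /* blend the currently active composition (if any) on top of the
       * raw video; for non-raw video we would instead just attach the
       * composition to the buffer and let the sink render it */
      if (overlay->composition != NULL)
        gst_video_overlay_composition_blend (overlay->composition, video_buf);

      return gst_pad_push (overlay->srcpad, video_buf);
    }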
|
||||
|
||||
|
||||
(c) Flatten all rectangles in a composition
|
||||
|
||||
We cannot assume that the video backend API can handle any
|
||||
number of rectangle overlays, it's possible that it only
|
||||
supports one single overlay, in which case we need to squash
|
||||
all rectangles into one.
|
||||
|
||||
However, we'll just declare this a corner case for now, and
|
||||
implement it only if someone actually needs it. It's easy
|
||||
to add later API-wise. Might be a bit tricky if we have
|
||||
rectangles with different PARs/formats (e.g. subs and a logo),
|
||||
though we could probably always just use the code from (b)
|
||||
with a fully transparent video buffer to create a flattened
|
||||
overlay buffer.
|
||||
|
||||
(d) core API: new FEATURE query
|
||||
|
||||
For 0.10 we need to add a FEATURE query, so the overlay element
|
||||
can query whether the sink downstream and all elements between
|
||||
the overlay element and the sink support the new overlay API.
|
||||
Elements in between need to support it because the render
|
||||
positions and dimensions need to be updated if the video is
|
||||
cropped or rescaled, for example.
|
||||
|
||||
In order to ensure that all elements support the new API,
|
||||
we need to drop the query in the pad default query handler
|
||||
(so it only succeeds if all elements handle it explicitly).
|
||||
|
||||
Might want two variants of the feature query - one where
|
||||
all elements in the chain need to support it explicitly
|
||||
and one where it's enough if some element downstream
|
||||
supports it.
|
||||
|
||||
In 0.11 this could probably be handled via GstMeta and
|
||||
ALLOCATION queries (and/or we could simply require
|
||||
elements to be aware of this API from the start).
|
||||
|
||||
There appears to be no issue with downstream possibly
|
||||
not being linked yet at the time when an overlay would
|
||||
want to do such a query.
|
||||
|
||||
|
||||
Other considerations:
|
||||
|
||||
- renderers (overlays or sinks) may be able to handle only ARGB or only AYUV
|
||||
(for most graphics/hw-API it's likely ARGB of some sort, while our
|
||||
blending utility functions will likely want the same colour space as
|
||||
the underlying raw video format, which is usually YUV of some sort).
|
||||
We need to convert where required, and should cache the conversion.
|
||||
|
||||
- renderers may or may not be able to scale the overlay. We need to
|
||||
do the scaling internally if not (simple case: just horizontal scaling
|
||||
to adjust for PAR differences; complex case: both horizontal and vertical
|
||||
scaling, e.g. if subs come from a different source than the video or the
|
||||
video has been rescaled or cropped between overlay element and sink).
|
||||
|
||||
- renderers may be able to generate (possibly scaled) pixels on demand
|
||||
from the original data (e.g. a string or RLE-encoded data). We will
|
||||
ignore this for now, since this functionality can still be added later
|
||||
via API additions. The most interesting case would be to pass a pango
|
||||
markup string, since e.g. clutter can handle that natively.
|
||||
|
||||
- renderers may be able to write data directly on top of the video pixels
|
||||
(instead of creating an intermediary buffer with the overlay which is
|
||||
then blended on top of the actual video frame), e.g. dvdspu, dvbsuboverlay
|
||||
|
||||
However, in the interest of simplicity, we should probably ignore the
|
||||
fact that some elements can blend their overlays directly on top of the
|
||||
video (decoding/uncompressing them on the fly), even more so as it's
|
||||
not obvious that it's actually faster to decode the same overlay
|
||||
70-90 times (say) (ie. ca. 3 seconds of video frames) and then blend
|
||||
it 70-90 times instead of decoding it once into a temporary buffer
|
||||
and then blending it directly from there, possibly SIMD-accelerated.
|
||||
Also, this is only relevant if the video is raw video and not some
|
||||
hardware-acceleration backend object.
|
||||
|
||||
And ultimately it is the overlay element that decides whether to do
|
||||
the overlay right there and then or have the sink do it (if supported).
|
||||
It could decide to keep doing the overlay itself for raw video and
|
||||
only use our new API for non-raw video.
|
||||
|
||||
- renderers may want to make sure they only upload the overlay pixels once
|
||||
per rectangle if that rectangle recurs in subsequent frames (as part of
|
||||
the same composition or a different composition), as is likely. This caching
|
||||
of e.g. surfaces needs to be done renderer-side and can be accomplished
|
||||
based on the sequence numbers. The composition contains the lowest
|
||||
sequence number still in use upstream (an overlay element may want to
|
||||
cache created compositions+rectangles as well after all to re-use them
|
||||
for multiple frames); based on that, the renderer can expire cached
|
||||
objects. The caching needs to be done renderer-side because attaching
|
||||
renderer-specific objects to the rectangles won't work well given the
|
||||
refcounted nature of rectangles and compositions, making it unpredictable
|
||||
when a rectangle or composition will be freed or from which thread
|
||||
context it will be freed. The renderer-specific objects are likely bound
|
||||
to other types of renderer-specific contexts, and need to be managed
|
||||
in connection with those.
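
A rough sketch of such a renderer-side cache, keyed purely on the proposed
sequence numbers (the surface type and helper names are made up for
illustration; only the seq_num/min_seq_num_used fields come from the
proposal above):

    /* maps rectangle seq_num => FooBackendSurface (renderer-specific) */
    GHashTable *surface_cache;

    static gboolean
    cache_entry_expired (gpointer key, gpointer value, gpointer user_data)
    {
      /* entries below min_seq_num_used are no longer used upstream */
      return GPOINTER_TO_UINT (key) < GPOINTER_TO_UINT (user_data);
    }

    static void
    renderer_expire_cached_surfaces (GstVideoOverlayComposition * comp)
    {
      g_hash_table_foreach_remove (surface_cache, cache_entry_expired,
          GUINT_TO_POINTER (comp->min_seq_num_used));
    }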
|
||||
|
||||
- composition/rectangles should internally provide a certain degree of
|
||||
thread-safety. Multiple elements (sinks, overlay element) might access
|
||||
or use the same objects from multiple threads at the same time, and it
|
||||
is expected that elements will keep a ref to compositions and rectangles
|
||||
they push downstream for a while, e.g. until the current subtitle
|
||||
composition expires.
|
||||
|
||||
=== 5. Future considerations ===
|
||||
|
||||
- alternatives: there may be multiple versions/variants of the same subtitle
|
||||
stream. On DVDs, there may be a 4:3 version and a 16:9 version of the same
|
||||
subtitles. We could attach both variants and let the renderer pick the best
|
||||
one for the situation (currently we just use the 16:9 version). With totem,
|
||||
it's ultimately totem that adds the 'black bars' at the top/bottom, so totem
|
||||
also knows if it's got a 4:3 display and can/wants to fit 4:3 subs (which
|
||||
may render on top of the bars) or not, for example.
|
||||
|
||||
=== 6. Misc. FIXMEs ===
|
||||
|
||||
TEST: should these look (roughly) alike (note text distortion) - needs fixing in textoverlay
|
||||
|
||||
gst-launch-0.10 \
|
||||
videotestsrc ! video/x-raw,width=640,height=480,pixel-aspect-ratio=1/1 ! textoverlay text=Hello font-desc=72 ! xvimagesink \
|
||||
videotestsrc ! video/x-raw,width=320,height=480,pixel-aspect-ratio=2/1 ! textoverlay text=Hello font-desc=72 ! xvimagesink \
|
||||
videotestsrc ! video/x-raw,width=640,height=240,pixel-aspect-ratio=1/2 ! textoverlay text=Hello font-desc=72 ! xvimagesink
|
||||
|
||||
~~~ THE END ~~~
|
||||
|
|
@ -1,107 +0,0 @@
|
|||
Interlaced Video
|
||||
================
|
||||
|
||||
Video buffers have a number of states identifiable through a combination of caps
|
||||
and buffer flags.
|
||||
|
||||
Possible states:
|
||||
- Progressive
|
||||
- Interlaced
|
||||
- Plain
|
||||
- One field
|
||||
- Two fields
|
||||
- Three fields - this should be a progressive buffer with a repeated 'first'
|
||||
field that can be used for telecine pulldown
|
||||
- Telecine
|
||||
- One field
|
||||
- Two fields
|
||||
- Progressive
|
||||
- Interlaced (a.k.a. 'mixed'; the fields are from different frames)
|
||||
- Three fields - this should be a progressive buffer with a repeated 'first'
|
||||
field that can be used for telecine pulldown
|
||||
|
||||
Note: It can be seen that the difference between the plain interlaced and
|
||||
telecine states is that in the telecine state, buffers containing two fields may
|
||||
be progressive.
|
||||
|
||||
Tools for identification:
|
||||
- GstVideoInfo
|
||||
- GstVideoInterlaceMode - enum - GST_VIDEO_INTERLACE_MODE_...
|
||||
- PROGRESSIVE
|
||||
- INTERLEAVED
|
||||
- MIXED
|
||||
- Buffers flags - GST_VIDEO_BUFFER_FLAG_...
|
||||
- TFF
|
||||
- RFF
|
||||
- ONEFIELD
|
||||
- INTERLACED
|
||||
|
||||
|
||||
Identification of Buffer States
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Note that flags are not necessarily interpreted in the same way for all
|
||||
different states, nor are they necessarily required or meaningful in all cases.
|
||||
|
||||
|
||||
Progressive
|
||||
...........
|
||||
|
||||
If the interlace mode in the video info corresponding to a buffer is
|
||||
"progressive", then the buffer is progressive.
|
||||
|
||||
|
||||
Plain Interlaced
|
||||
................
|
||||
|
||||
If the video info interlace mode is "interleaved", then the buffer is plain
|
||||
interlaced.
|
||||
|
||||
GST_VIDEO_BUFFER_FLAG_TFF indicates whether the top or bottom field is to be
|
||||
displayed first. The timestamp on the buffer corresponds to the first field.
|
||||
|
||||
GST_VIDEO_BUFFER_FLAG_RFF indicates that the first field (indicated by the TFF flag)
|
||||
should be repeated. This is generally only used for telecine purposes but as the
|
||||
telecine state was added long after the interlaced state was added and defined,
|
||||
this flag remains valid for plain interlaced buffers.
|
||||
|
||||
GST_VIDEO_BUFFER_FLAG_ONEFIELD means that only the field indicated through the TFF
|
||||
flag is to be used. The other field should be ignored.
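
For illustration, a minimal sketch of how an element might inspect these
flags on a plain interlaced buffer (assuming the GstVideoInfo parsed from
the negotiated caps is in 'info'):

    if (GST_VIDEO_INFO_INTERLACE_MODE (&info) ==
        GST_VIDEO_INTERLACE_MODE_INTERLEAVED) {
      gboolean tff, rff, onefield;

      tff = GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_TFF);
      rff = GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_RFF);
      onefield = GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_ONEFIELD);

      /* tff: the top field is to be displayed first
       * rff: repeat the first field
       * onefield: only use the field indicated by tff */
      ...
    }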
|
||||
|
||||
|
||||
Telecine
|
||||
........
|
||||
|
||||
If video info interlace mode is "mixed" then the buffers are in some form of
|
||||
telecine state.
|
||||
|
||||
The TFF and ONEFIELD flags have the same semantics as for the plain interlaced
|
||||
state.
|
||||
|
||||
GST_VIDEO_BUFFER_FLAG_RFF in the telecine state indicates that the buffer contains
|
||||
only repeated fields that are present in other buffers and are as such
|
||||
unneeded. For example, in a sequence of three telecined frames, we might have:
|
||||
|
||||
AtAb AtBb BtBb
|
||||
|
||||
In this situation, we only need the first and third buffers as the second
|
||||
buffer contains fields present in the first and third.
|
||||
|
||||
Note that the following state can have its second buffer identified using the
|
||||
ONEFIELD flag (and TFF not set):
|
||||
|
||||
AtAb AtBb BtCb
|
||||
|
||||
The telecine state requires one additional flag to be able to identify
|
||||
progressive buffers.
|
||||
|
||||
The presence of the GST_VIDEO_BUFFER_FLAG_INTERLACED means that the buffer is an
|
||||
'interlaced' or 'mixed' buffer that contains two fields that, when combined
|
||||
with fields from adjacent buffers, allow reconstruction of progressive frames.
|
||||
The absence of the flag implies the buffer containing two fields is a
|
||||
progressive frame.
|
||||
|
||||
For example in the following sequence, the third buffer would be mixed (yes, it
|
||||
is a strange pattern, but it can happen):
|
||||
|
||||
AtAb AtBb BtCb CtDb DtDb
|
|
@ -1,76 +0,0 @@
|
|||
Media Types
|
||||
-----------
|
||||
|
||||
audio/x-raw
|
||||
|
||||
format, G_TYPE_STRING, mandatory
|
||||
The format of the audio samples, see the Formats section for a list
|
||||
of valid sample formats.
|
||||
|
||||
rate, G_TYPE_INT, mandatory
|
||||
The samplerate of the audio
|
||||
|
||||
channels, G_TYPE_INT, mandatory
|
||||
The number of channels
|
||||
|
||||
channel-mask, GST_TYPE_BITMASK, mandatory for more than 2 channels
|
||||
Bitmask of channel positions present. May be omitted for mono and
|
||||
stereo. May be set to 0 to denote that the channels are unpositioned.
|
||||
|
||||
layout, G_TYPE_STRING, mandatory
|
||||
The layout of channels within a buffer. Possible values are
|
||||
"interleaved" (for LRLRLRLR) and "non-interleaved" (LLLLRRRR)
|
||||
|
||||
Use GstAudioInfo and related helper API to create and parse raw audio caps.
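
For example (a minimal sketch; error handling omitted):

    GstAudioInfo info;
    GstCaps *caps;

    /* describe stereo 44.1 kHz S16 audio and turn it into caps ... */
    gst_audio_info_init (&info);
    gst_audio_info_set_format (&info, GST_AUDIO_FORMAT_S16, 44100, 2, NULL);
    caps = gst_audio_info_to_caps (&info);

    /* ... and parse caps back into a GstAudioInfo */
    if (!gst_audio_info_from_caps (&info, caps))
      g_warning ("could not parse audio caps");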
|
||||
|
||||
|
||||
Metadata
|
||||
--------
|
||||
|
||||
"GstAudioDownmixMeta"
|
||||
A matrix for downmixing multichannel audio to a lower number of channels.
|
||||
|
||||
|
||||
Formats
|
||||
-------
|
||||
|
||||
The following values can be used for the format string property.
|
||||
|
||||
"S8" 8-bit signed PCM audio
|
||||
"U8" 8-bit unsigned PCM audio
|
||||
|
||||
"S16LE" 16-bit signed PCM audio
|
||||
"S16BE" 16-bit signed PCM audio
|
||||
"U16LE" 16-bit unsigned PCM audio
|
||||
"U16BE" 16-bit unsigned PCM audio
|
||||
|
||||
"S24_32LE" 24-bit signed PCM audio packed into 32-bit
|
||||
"S24_32BE" 24-bit signed PCM audio packed into 32-bit
|
||||
"U24_32LE" 24-bit unsigned PCM audio packed into 32-bit
|
||||
"U24_32BE" 24-bit unsigned PCM audio packed into 32-bit
|
||||
|
||||
"S32LE" 32-bit signed PCM audio
|
||||
"S32BE" 32-bit signed PCM audio
|
||||
"U32LE" 32-bit unsigned PCM audio
|
||||
"U32BE" 32-bit unsigned PCM audio
|
||||
|
||||
"S24LE" 24-bit signed PCM audio
|
||||
"S24BE" 24-bit signed PCM audio
|
||||
"U24LE" 24-bit unsigned PCM audio
|
||||
"U24BE" 24-bit unsigned PCM audio
|
||||
|
||||
"S20LE" 20-bit signed PCM audio
|
||||
"S20BE" 20-bit signed PCM audio
|
||||
"U20LE" 20-bit unsigned PCM audio
|
||||
"U20BE" 20-bit unsigned PCM audio
|
||||
|
||||
"S18LE" 18-bit signed PCM audio
|
||||
"S18BE" 18-bit signed PCM audio
|
||||
"U18LE" 18-bit unsigned PCM audio
|
||||
"U18BE" 18-bit unsigned PCM audio
|
||||
|
||||
"F32LE" 32-bit floating-point audio
|
||||
"F32BE" 32-bit floating-point audio
|
||||
"F64LE" 64-bit floating-point audio
|
||||
"F64BE" 64-bit floating-point audio
|
||||
|
|
@ -1,28 +0,0 @@
|
|||
Media Types
|
||||
-----------
|
||||
|
||||
text/x-raw
|
||||
|
||||
format, G_TYPE_STRING, mandatory
|
||||
The format of the text, see the Formats section for a list of valid format
|
||||
strings.
|
||||
|
||||
Metadata
|
||||
--------
|
||||
|
||||
There are no common metas for this raw format yet.
|
||||
|
||||
Formats
|
||||
-------
|
||||
|
||||
"utf8" plain timed utf8 text (formerly text/plain)
|
||||
|
||||
Parsed timed text in utf8 format.
|
||||
|
||||
"pango-markup" plain timed utf8 text with pango markup (formerly text/x-pango-markup)
|
||||
|
||||
Same as "utf8", but text embedded in an XML-style markup language for
|
||||
size, colour, emphasis, etc.
|
||||
|
||||
See http://developer.gnome.org/pango/stable/PangoMarkupFormat.html
|
||||
|
File diff suppressed because it is too large
|
@ -1,69 +0,0 @@
|
|||
playbin
|
||||
--------
|
||||
|
||||
The purpose of this element is to decode and render the media contained in a
|
||||
given generic uri. The element extends GstPipeline and is typically used in
|
||||
playback situations.
|
||||
|
||||
Required features:
|
||||
|
||||
- accept and play any valid uri. This includes
|
||||
- rendering video/audio
|
||||
- overlaying subtitles on the video
|
||||
- optionally read external subtitle files
|
||||
- allow for hardware (non raw) sinks
|
||||
- selection of audio/video/subtitle streams based on language.
|
||||
- perform network buffering/incremental download
|
||||
- gapless playback
|
||||
- support for visualisations with configurable sizes
|
||||
- ability to reject files that are too big, or of a format that would require
|
||||
too much CPU/memory usage.
|
||||
- be very efficient with adding elements such as converters to reduce the
|
||||
amount of negotiation that has to happen.
|
||||
- handle chained oggs. This includes having support for dynamic pad add and
|
||||
remove from a demuxer.
|
||||
|
||||
Components
|
||||
----------
|
||||
|
||||
* decodebin2
|
||||
|
||||
- performs the autoplugging of demuxers/decoders
|
||||
- emits signals for steering the autoplugging
|
||||
- to decide if a non-raw media format is acceptable as output
|
||||
- to sort the possible decoders for a non-raw format
|
||||
- see also decodebin2 design doc
|
||||
|
||||
* uridecodebin
|
||||
|
||||
- combination of a source to handle the given uri, an optional queueing element
|
||||
and one or more decodebin2 elements to decode the non-raw streams.
|
||||
|
||||
* playsink
|
||||
|
||||
- handles display of audio/video/text.
|
||||
- has request audio/video/text input pads. There is only one sinkpad per type.
|
||||
The requested pads define the configuration of the internal pipeline.
|
||||
- allows for setting audio/video sinks or does automatic sink selection.
|
||||
- allows for configuration of visualisation element.
|
||||
- allows for enable/disable of visualisation, audio and video.
|
||||
|
||||
* playbin
|
||||
|
||||
- combination of one or more uridecodebin elements to read the uri and subtitle
|
||||
uri.
|
||||
- support for queuing new media to support gapless playback.
|
||||
- handles stream selection.
|
||||
- uses playsink to display.
|
||||
- selection of sinks and configuration of uridecodebin with raw output formats.
|
||||
|
||||
|
||||
Gapless playback
|
||||
----------------
|
||||
|
||||
playbin has an "about-to-finish" signal. The application should configure a new
|
||||
uri (and optional suburi) in the callback. When the current media finishes, this
|
||||
new media will be played next.
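
A minimal sketch of how an application could use this (get_next_uri() is a
hypothetical application-side helper):

    static void
    on_about_to_finish (GstElement * playbin, gpointer user_data)
    {
      const gchar *next_uri = get_next_uri (user_data);

      if (next_uri != NULL)
        g_object_set (playbin, "uri", next_uri, NULL);
    }

    ...
    g_signal_connect (playbin, "about-to-finish",
        G_CALLBACK (on_about_to_finish), app_data);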
|
||||
|
||||
|
||||
|
|
@ -1,278 +0,0 @@
|
|||
Design for Stereoscopic & Multiview Video Handling
|
||||
==================================================
|
||||
|
||||
There are two cases to handle:
|
||||
|
||||
* Encoded video output from a demuxer to parser / decoder or from encoders into a muxer.
|
||||
* Raw video buffers
|
||||
|
||||
The design below is somewhat based on the proposals from
|
||||
[bug 611157](https://bugzilla.gnome.org/show_bug.cgi?id=611157)
|
||||
|
||||
Multiview is used as a generic term to refer to handling both
|
||||
stereo content (left and right eye only) as well as extensions for videos
|
||||
containing multiple independent viewpoints.
|
||||
|
||||
Encoded Signalling
|
||||
------------------
|
||||
This is regarding the signalling in caps and buffers from demuxers to
|
||||
parsers (sometimes) or out from encoders.
|
||||
|
||||
For backward compatibility with existing codecs many transports of
|
||||
stereoscopic 3D content use normal 2D video with 2 views packed spatially
|
||||
in some way, and put extra new descriptions in the container/mux.
|
||||
|
||||
Info in the demuxer seems to apply to stereo encodings only. For all
|
||||
MVC methods I know, the multiview encoding is in the video bitstream itself
|
||||
and therefore already available to decoders. Only stereo systems have been retro-fitted
|
||||
into the demuxer.
|
||||
|
||||
Also, sometimes extension descriptions are in the codec (e.g. H.264 SEI FPA packets)
|
||||
and it would be useful to be able to put the info onto caps and buffers from the
|
||||
parser without decoding.
|
||||
|
||||
To handle both cases, we need to be able to output the required details on
|
||||
encoded video for decoders to apply onto the raw video buffers they decode.
|
||||
|
||||
*If there ever is a need to transport multiview info for encoded data the
|
||||
same system below for raw video or some variation should work*
|
||||
|
||||
### Encoded Video: Properties that need to be encoded into caps
|
||||
1. multiview-mode (called "Channel Layout" in bug 611157)
|
||||
* Whether a stream is mono, for a single eye, stereo, mixed-mono-stereo
|
||||
(switches between mono and stereo - mp4 can do this)
|
||||
* Uses a buffer flag to mark individual buffers as mono or "not mono"
|
||||
(single|stereo|multiview) for mixed scenarios. The alternative (not
|
||||
proposed) is for the demuxer to switch caps for each mono to not-mono
|
||||
change, and not use a 'mixed' caps variant at all.
|
||||
* _single_ refers to a stream of buffers that only contain 1 view.
|
||||
It is different from mono in that the stream is a marked left or right
|
||||
eye stream for later combining in a mixer or when displaying.
|
||||
* _multiple_ marks a stream with multiple independent views encoded.
|
||||
It is included in this list for completeness. As noted above, there's
|
||||
currently no scenario that requires marking encoded buffers as MVC.
|
||||
2. Frame-packing arrangements / view sequence orderings
|
||||
* Possible frame packings: side-by-side, side-by-side-quincunx,
|
||||
column-interleaved, row-interleaved, top-bottom, checker-board
|
||||
* bug 611157 - sreerenj added side-by-side-full and top-bottom-full but
|
||||
I think that's covered by suitably adjusting pixel-aspect-ratio. If
|
||||
not, they can be added later.
|
||||
* _top-bottom_, _side-by-side_, _column-interleaved_, _row-interleaved_ are as the names suggest.
|
||||
* _checker-board_, samples are left/right pixels in a chess grid +-+-+-/-+-+-+
|
||||
* _side-by-side-quincunx_. Side By Side packing, but quincunx sampling -
|
||||
1 pixel offset of each eye needs to be accounted when upscaling or displaying
|
||||
* there may be other packings (future expansion)
|
||||
* Possible view sequence orderings: frame-by-frame, frame-primary-secondary-tracks, sequential-row-interleaved
|
||||
* _frame-by-frame_, each buffer is left, then right view etc
|
||||
* _frame-primary-secondary-tracks_ - the file has 2 video tracks (primary and secondary), one is left eye, one is right.
|
||||
Demuxer info indicates which one is which.
|
||||
Handling this means marking each stream as all-left and all-right views, decoding separately, and combining automatically (inserting a mixer/combiner in playbin)
|
||||
-> *Leave this for future expansion*
|
||||
* _sequential-row-interleaved_ Mentioned by sreerenj in bug patches, I can't find a mention of such a thing. Maybe it's in MPEG-2
|
||||
-> *Leave this for future expansion / deletion*
|
||||
3. view encoding order
|
||||
* Describes how to decide which piece of each frame corresponds to left or right eye
|
||||
* Possible orderings left, right, left-then-right, right-then-left
|
||||
- Need to figure out how we find the correct frame in the demuxer to start decoding when seeking in frame-sequential streams
|
||||
- Need a buffer flag for marking the first buffer of a group.
|
||||
4. "Frame layout flags"
|
||||
* flags for view specific interpretation
|
||||
* horizontal-flip-left, horizontal-flip-right, vertical-flip-left, vertical-flip-right
|
||||
Indicates that one or more views has been encoded in a flipped orientation, usually due to camera with mirror or displays with mirrors.
|
||||
* This should be an actual flags field. Registered GLib flags types aren't generally well supported in our caps - the type might not be loaded/registered yet when parsing a caps string, so they can't be used in caps templates in the registry.
|
||||
* It might be better just to use a hex value / integer
|
||||
|
||||
Buffer representation for raw video
|
||||
-----------------------------------
|
||||
* Transported as normal video buffers with extra metadata
|
||||
* The caps define the overall buffer width/height, with helper functions to
|
||||
extract the individual views for packed formats
|
||||
* pixel-aspect-ratio adjusted if needed to double the overall width/height
|
||||
* video sinks that don't know about multiview extensions yet will show the packed view as-is
|
||||
For frame-sequence outputs, things might look weird, but just adding multiview-mode to the sink caps
|
||||
can disallow those transports.
|
||||
* _row-interleaved_ packing is actually just side-by-side memory layout with half frame width, twice
|
||||
the height, so can be handled by adjusting the overall caps and strides
|
||||
* Other exotic layouts need new pixel formats defined (checker-board, column-interleaved, side-by-side-quincunx)
|
||||
* _Frame-by-frame_ - one view per buffer, but with alternating metas marking which buffer is which left/right/other view and using a new buffer flag as described above
|
||||
to mark the start of a group of corresponding frames.
|
||||
* New video caps addition as for encoded buffers
|
||||
|
||||
### Proposed Caps fields
|
||||
Combining the requirements above and collapsing the combinations into mnemonics:
|
||||
|
||||
* multiview-mode =
|
||||
mono | left | right | sbs | sbs-quin | col | row | topbot | checkers |
|
||||
frame-by-frame | mixed-sbs | mixed-sbs-quin | mixed-col | mixed-row |
|
||||
mixed-topbot | mixed-checkers | mixed-frame-by-frame | multiview-frames | mixed-multiview-frames
|
||||
* multiview-flags =
|
||||
+ 0x0000 none
|
||||
+ 0x0001 right-view-first
|
||||
+ 0x0002 left-h-flipped
|
||||
+ 0x0004 left-v-flipped
|
||||
+ 0x0008 right-h-flipped
|
||||
+ 0x0010 right-v-flipped
|
||||
|
||||
### Proposed new buffer flags
|
||||
Add two new GST_VIDEO_BUFFER flags in video-frame.h and make it clear that those
|
||||
flags can apply to encoded video buffers too. wtay says that's currently the
|
||||
case anyway, but the documentation should say it.
|
||||
|
||||
**GST_VIDEO_BUFFER_FLAG_MULTIPLE_VIEW** - Marks a buffer as representing non-mono content, although it may be a single (left or right) eye view.
|
||||
**GST_VIDEO_BUFFER_FLAG_FIRST_IN_BUNDLE** - for frame-sequential methods of transport, mark the "first" of a left/right/other group of frames
|
||||
|
||||
### A new GstMultiviewMeta
|
||||
This provides a place to describe all provided views in a buffer / stream,
|
||||
and through Meta negotiation to inform decoders about which views to decode if
|
||||
not all are wanted.
|
||||
|
||||
* Logical labels/names and mapping to GstVideoMeta numbers
|
||||
* Standard view labels LEFT/RIGHT, and non-standard ones (strings)
|
||||
|
||||
GST_VIDEO_MULTIVIEW_VIEW_LEFT = 1
|
||||
GST_VIDEO_MULTIVIEW_VIEW_RIGHT = 2
|
||||
|
||||
struct GstVideoMultiviewViewInfo {
|
||||
guint view_label;
|
||||
guint meta_id; // id of the GstVideoMeta for this view
|
||||
|
||||
padding;
|
||||
}
|
||||
|
||||
struct GstVideoMultiviewMeta {
|
||||
guint n_views;
|
||||
GstVideoMultiviewViewInfo *view_info;
|
||||
}
|
||||
|
||||
The meta is optional, and probably only useful later for MVC
|
||||
|
||||
|
||||
Outputting stereo content
|
||||
-------------------------
|
||||
The initial implementation for output will be stereo content in glimagesink
|
||||
|
||||
### Output Considerations with OpenGL
|
||||
* If we have support for stereo GL buffer formats, we can output separate left/right eye images and let the hardware take care of display.
|
||||
* Otherwise, glimagesink needs to render one window with left/right in a suitable frame packing
|
||||
and that will only show correctly in fullscreen on a device set for the right 3D packing -> requires app intervention to set the video mode.
|
||||
* That video mode could be set manually on the TV, or via HDMI 1.4 by setting the right video mode for the screen to inform the TV. As a third option, we
|
||||
could support rendering to two separate overlay areas on the screen - one for the left eye, one for the right - which can be done using the 'splitter' element and 2 output sinks, or, better, by adding a 2nd window overlay for split stereo output
|
||||
* Intel hardware doesn't do stereo GL buffers - only nvidia and AMD, so initial implementation won't include that
|
||||
|
||||
## Other elements for handling multiview content
|
||||
* videooverlay interface extensions
|
||||
* __Q__: Should this be a new interface?
|
||||
* Element message to communicate the presence of stereoscopic information to the app
|
||||
* App needs to be able to override the input interpretation - ie, set multiview-mode and multiview-flags
|
||||
* Most videos I've seen are side-by-side or top-bottom with no frame-packing metadata
|
||||
* New API for the app to set rendering options for stereo/multiview content
|
||||
* This might be best implemented as a **multiview GstContext**, so that
|
||||
the pipeline can share app preferences for content interpretation and downmixing
|
||||
to mono for output, or in the sink, and have those preferences passed as far upstream/downstream as possible.
|
||||
* Converter element
|
||||
* convert different view layouts
|
||||
* Render to anaglyphs of different types (magenta/green, red/blue, etc) and output as mono
|
||||
* Mixer element
|
||||
* take 2 video streams and output as stereo
|
||||
* later take n video streams
|
||||
* share code with the converter, it just takes input from n pads instead of one.
|
||||
* Splitter element
|
||||
* Output one pad per view
|
||||
|
||||
### Implementing MVC handling in decoders / parsers (and encoders)
|
||||
Things to do to implement MVC handling
|
||||
|
||||
1. Parsing SEI in h264parse and setting caps (patches available in
|
||||
bugzilla for parsing, see below)
|
||||
2. Integrate gstreamer-vaapi MVC support with this proposal
|
||||
3. Help with [libav MVC implementation](https://wiki.libav.org/Blueprint/MVC)
|
||||
4. generating SEI in H.264 encoder
|
||||
5. Support for MPEG2 MVC extensions
|
||||
|
||||
## Relevant bugs
|
||||
[bug 685215](https://bugzilla.gnome.org/show_bug.cgi?id=685215) - codecparser h264: Add initial MVC parser
|
||||
[bug 696135](https://bugzilla.gnome.org/show_bug.cgi?id=696135) - h264parse: Add mvc stream parsing support
|
||||
[bug 732267](https://bugzilla.gnome.org/show_bug.cgi?id=732267) - h264parse: extract base stream from MVC or SVC encoded streams
|
||||
|
||||
## Other Information
|
||||
[Matroska 3D support notes](http://www.matroska.org/technical/specs/notes.html#3D)
|
||||
|
||||
## Open Questions
|
||||
|
||||
### Background
|
||||
|
||||
### Representation for GstGL
|
||||
When uploading raw video frames to GL textures, the goal is to implement:
|
||||
|
||||
2. Split packed frames into separate GL textures when uploading, and
|
||||
attach multiple GstGLMemory's to the GstBuffer. The multiview-mode and
|
||||
multiview-flags fields in the caps should change to reflect the conversion
|
||||
from one incoming GstMemory to multiple GstGLMemory, and change the
|
||||
width/height in the output info as needed.
|
||||
|
||||
This is (currently) targeted as 2 render passes - upload as normal
|
||||
to a single stereo-packed RGBA texture, and then unpack into 2
|
||||
smaller textures, output with GST_VIDEO_MULTIVIEW_MODE_SEPARATED, as
|
||||
2 GstGLMemory attached to one buffer. We can optimise the upload later
|
||||
to go directly to 2 textures for common input formats.
|
||||
|
||||
Separate output textures have a few advantages:
|
||||
|
||||
* Filter elements can more easily apply filters in several passes to each
|
||||
texture without fundamental changes to our filters to avoid mixing pixels
|
||||
from separate views.
|
||||
* Centralises the sampling of input video frame packings in the upload code,
|
||||
which makes adding new packings in the future easier.
|
||||
* Sampling multiple textures to generate various output frame-packings
|
||||
for display is conceptually simpler than converting from any input packing
|
||||
to any output packing.
|
||||
* In implementations that support quad buffers, having separate textures
|
||||
makes it trivial to do GL_LEFT/GL_RIGHT output
|
||||
|
||||
For either option, we'll need new glsink output API to pass more
|
||||
information to applications about multiple views for the draw signal/callback.
|
||||
|
||||
I don't know if it's desirable to support *both* methods of representing
|
||||
views. If so, that should be signalled in the caps too. That could be a
|
||||
new multiview-mode for passing views in separate GstMemory objects
|
||||
attached to a GstBuffer, which would not be GL specific.
|
||||
|
||||
### Overriding frame packing interpretation
|
||||
Most sample videos available are frame packed, with no metadata
|
||||
to say so. How should we override that interpretation?
|
||||
|
||||
* Simple answer: Use capssetter + new properties on playbin to
|
||||
override the multiview fields
|
||||
*Basically implemented in playbin, using a pad probe. Needs more work for completeness*
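
For example, until playbin gains such properties, the interpretation could be forced by hand with capssetter, e.g. (a sketch only; the exact multiview caps values depend on the GStreamer version, and the file name is made up):

    gst-launch-1.0 filesrc location=video-sbs.mp4 ! decodebin ! \
        capssetter caps="video/x-raw,multiview-mode=side-by-side" ! \
        queue ! autovideosink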
|
||||
|
||||
### Adding extra GstVideoMeta to buffers
|
||||
There should be one GstVideoMeta for the entire video frame in packed
|
||||
layouts, and one GstVideoMeta per GstGLMemory when views are attached
|
||||
to a GstBuffer separately. This should be done by the buffer pool,
|
||||
which knows the layout from the caps.
|
||||
|
||||
### videooverlay interface extensions
|
||||
GstVideoOverlay needs:
|
||||
|
||||
* A way to announce the presence of multiview content when it is
|
||||
detected/signalled in a stream.
|
||||
* A way to tell applications which output methods are supported/available
|
||||
* A way to tell the sink which output method it should use
|
||||
* Possibly a way to tell the sink to override the input frame
|
||||
interpretation / caps - depends on the answer to the question
|
||||
above about how to model overriding input interpretation.
|
||||
|
||||
### What's implemented
|
||||
* Caps handling
|
||||
* gst-plugins-base libgstvideo pieces
|
||||
* playbin caps overriding
|
||||
* conversion elements - glstereomix, gl3dconvert (needs a rename),
|
||||
glstereosplit.
|
||||
|
||||
### Possible future enhancements
|
||||
* Make GLupload split to separate textures at upload time?
|
||||
* Needs new API to extract multiple textures from the upload. Currently only outputs 1 result RGBA texture.
|
||||
* Make GLdownload able to take 2 input textures, pack them and colorconvert / download as needed.
|
||||
- currently done by packing then downloading, which is not acceptable overhead for RGBA download
|
||||
* Think about how we integrate GLstereo - do we need to do anything special,
|
||||
or can the app just render to stereo/quad buffers if they're available?
|