docs: design: move most design docs to gst-docs module

This commit is contained in:
    parent 49653b058a
    commit 46138b1b1d

13 changed files with 1 addition and 3652 deletions

docs/design/Makefile.am

@@ -2,16 +2,5 @@ SUBDIRS =
 EXTRA_DIST = \
-	design-audiosinks.txt \
-	design-decodebin.txt \
-	design-encoding.txt \
-	design-orc-integration.txt \
 	draft-hw-acceleration.txt \
-	draft-keyframe-force.txt \
-	draft-subtitle-overlays.txt \
-	draft-va.txt \
-	part-interlaced-video.txt \
-	part-mediatype-audio-raw.txt \
-	part-mediatype-text-raw.txt \
-	part-mediatype-video-raw.txt \
-	part-playbin.txt
+	draft-va.txt

docs/design/design-audiosinks.txt (deleted)
@@ -1,138 +0,0 @@

Audiosink design
----------------

Requirements:

  - must operate chain based.
    Most simple playback pipelines will push audio from the decoders
    into the audio sink.

  - must operate getrange based.
    Most professional audio applications will operate in a mode where
    the audio sink pulls samples from the pipeline. This is typically
    done in a callback from the audiosink requesting N samples. The
    callback is either scheduled from a thread or from an interrupt
    from the audio hardware device.

  - Exact sample accurate clocks.
    The audiosink must be able to provide a clock that is sample
    accurate even if samples are dropped or when discontinuities are
    found in the stream.

  - Exact timing of playback.
    The audiosink must be able to play samples at their exact times.

  - use DMA access when possible.
    When the hardware can do DMA we should use it. This should also
    work over bufferpools to avoid data copying to/from kernel space.


Design:

The design is based on a set of base classes and the concept of a
ringbuffer of samples.

    +-----------+        - provide preroll, rendering, timing
    + basesink  +        - caps nego
    +-----+-----+
          |
    +-----V----------+   - manages ringbuffer
    + audiobasesink  +   - manages scheduling (push/pull)
    +-----+----------+   - manages clock/query/seek
          |              - manages scheduling of samples in the ringbuffer
          |              - manages caps parsing
          |
    +-----V------+       - default ringbuffer implementation with a GThread
    + audiosink  +       - subclasses provide open/read/close methods
    +------------+

The ringbuffer is a contiguous piece of memory divided into segtotal
segments. Each segment has segsize bytes.

      play position
        v
    +---+---+---+-------------------------------------+----------+
    + 0 | 1 | 2 | ....                                 | segtotal |
    +---+---+---+-------------------------------------+----------+
    <--->
      segsize bytes = N samples * bytes_per_sample.

The ringbuffer has a play position, which is expressed in
segments. The play position is where the device is currently reading
samples from the buffer.

The ringbuffer can be put to the PLAYING or STOPPED state.

In the STOPPED state no samples are played to the device and the play
pointer does not advance.

In the PLAYING state samples are written to the device and the ringbuffer
should call a configurable callback after each segment is written to the
device. In this state the play pointer is advanced after each segment is
written.

A write operation to the ringbuffer will put new samples in the ringbuffer.
If there is not enough space in the ringbuffer, the write operation will
block. The playback of the buffer never stops, even if the buffer is
empty. When the buffer is empty, silence is played by the device.

The ringbuffer is implemented with lockfree atomic operations, especially
on the reading side, so that low-latency operations are possible.

Whenever new samples are to be put into the ringbuffer, the position of the
read pointer is taken. The required write position is taken and the diff
is made between the required and actual position. If the difference is < 0,
the sample is too late. If the difference is bigger than segtotal, the
writing part has to wait for the play pointer to advance.
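
As a rough illustration of the placement logic described above, the following
sketch computes where an incoming block of samples should go relative to the
current read segment. The names (RingInfo, decide_write, etc.) are hypothetical
helpers for this document, not part of the actual ringbuffer implementation.

  #include <glib.h>

  /* Hypothetical sketch of the write-position check described above.
   * 'readseg' is the segment the device is currently playing,
   * 'sample_offset' is where the new samples should start. */
  typedef struct {
    int segtotal;            /* number of segments in the ringbuffer */
    int samples_per_seg;     /* segsize / bytes_per_sample */
  } RingInfo;

  typedef enum {
    SAMPLE_TOO_LATE,         /* diff < 0: drop or clip the samples */
    SAMPLE_WRITABLE,         /* 0 <= diff <= segtotal: write now */
    SAMPLE_MUST_WAIT         /* diff > segtotal: wait for play pointer */
  } WriteDecision;

  static WriteDecision
  decide_write (const RingInfo * info, gint64 readseg, gint64 sample_offset)
  {
    gint64 writeseg = sample_offset / info->samples_per_seg;
    gint64 diff = writeseg - readseg;

    if (diff < 0)
      return SAMPLE_TOO_LATE;
    if (diff > info->segtotal)
      return SAMPLE_MUST_WAIT;
    return SAMPLE_WRITABLE;
  }
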
Scheduling:

- chain based mode:

  In chain based mode, bytes are written into the ringbuffer. This operation
  will eventually block when the ringbuffer is filled.

  When no samples arrive in time, the ringbuffer will play silence. Each
  buffer that arrives will be placed into the ringbuffer at the correct
  times. This means that dropping samples or inserting silence is done
  automatically and very accurately, independently of the play pointer.

  In this mode, the ringbuffer is usually kept as full as possible. When
  using a small buffer (small segsize and segtotal), the latency for audio
  to start from the sink to when it is played can be kept low, but at least
  one context switch has to be made between read and write.

- getrange based mode:

  In getrange based mode, the audiobasesink will use the callback function
  of the ringbuffer to get segsize samples from the peer element. These
  samples will then be placed in the ringbuffer at the next play position.
  It is assumed that the getrange function returns fast enough to fill the
  ringbuffer before the play pointer reaches the write pointer.

  In this mode, the ringbuffer is usually kept as empty as possible. There
  is no context switch needed between the elements that create the samples
  and the actual writing of the samples to the device.


DMA mode:

- Elements that can do DMA based access to the audio device have to subclass
  from the GstAudioBaseSink class and wrap the DMA ringbuffer in a subclass
  of GstRingBuffer.

  The ringbuffer subclass should trigger a callback after writing or playing
  each segment to the device. This callback can be triggered from a thread or
  from a signal from the audio device.


Clocks:

The GstAudioBaseSink class will use the ringbuffer to act as a clock provider.
It can do this by using the play pointer and the delay to calculate the
clock time.
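
A minimal sketch of that calculation, assuming hypothetical inputs for the
number of segments already played and the device delay in samples (these are
stand-ins for illustration, not the actual GstAudioBaseSink internals):

  #include <gst/gst.h>

  static GstClockTime
  ring_buffer_clock_time (guint64 segments_played, guint samples_per_seg,
      guint64 device_delay_samples, guint rate)
  {
    guint64 samples = segments_played * samples_per_seg;

    /* The device has buffered 'device_delay_samples' that are not yet
     * audible, so subtract them before converting to time. */
    if (samples < device_delay_samples)
      return 0;
    samples -= device_delay_samples;

    return gst_util_uint64_scale_int (samples, GST_SECOND, rate);
  }
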
docs/design/design-decodebin.txt (deleted)
@@ -1,274 +0,0 @@

Decodebin design

GstDecodeBin
------------

Description:

  Autoplug and decode to raw media

  Input : single pad with ANY caps     Output : Dynamic pads

* Contents

  _ a GstTypeFindElement connected to the single sink pad

  _ optionally a demuxer/parser

  _ optionally one or more DecodeGroup

* Autoplugging

  The goal is to reach 'target' caps (by default raw media).

  This is done by using the GstCaps of a source pad and finding the available
  demuxer/decoder GstElements that can be linked to that pad.

  The process starts with the source pad of typefind and stops when no more
  non-target caps are left. It is commonly done while pre-rolling, but can also
  happen whenever a new pad appears on any element.

  Once target caps have been found, that pad is ghosted and the
  'pad-added' signal is emitted.

  If no compatible elements can be found for a GstCaps, the pad is ghosted and
  the 'unknown-type' signal is emitted.


* Assisted auto-plugging

  When starting the auto-plugging process for a given GstCaps, the following
  signals are emitted in order to allow the application/user to assist or
  fine-tune the process.

  - 'autoplug-continue':

      gboolean user_function (GstElement * decodebin, GstPad * pad,
                              GstCaps * caps);

    This signal is fired at the very beginning with the source pad GstCaps. If
    the callback returns TRUE, the process continues normally. If the callback
    returns FALSE, then the GstCaps are considered as target caps and the
    autoplugging process stops.

  - 'autoplug-factories':

      GValueArray * user_function (GstElement * decodebin, GstPad * pad,
                                   GstCaps * caps);

    Get a list of element factories for @pad with @caps. This function is used
    to instruct decodebin2 which elements it should try to autoplug. The
    default behaviour when this function is not overridden is to get all
    elements that can handle @caps from the registry, sorted by rank.

  - 'autoplug-select':

      gint user_function (GstElement * decodebin, GstPad * pad, GstCaps * caps,
                          GValueArray * factories);

    This signal is fired once autoplugging has obtained a list of compatible
    GstElementFactory. The signal is emitted with the GstCaps of the source pad
    and a pointer to the GValueArray of compatible factories.

    The callback should return the index of the element factory in @factories
    that should be tried next.

    If the callback returns -1, the autoplugging process will stop as if no
    compatible factories were found.

    The default implementation of this function will try to autoplug the first
    factory of the list.
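
As an illustration of the signals described above, an application could bias
the process like this. This is a sketch following the signatures given in this
document (the concrete callbacks also receive a user-data argument, and error
handling is omitted):

  #include <gst/gst.h>

  /* Sketch: stop autoplugging at compressed audio so the application can
   * handle decoding itself, per the 'autoplug-continue' semantics above. */
  static gboolean
  on_autoplug_continue (GstElement * decodebin, GstPad * pad, GstCaps * caps,
      gpointer user_data)
  {
    GstStructure *s = gst_caps_get_structure (caps, 0);

    /* Returning FALSE makes these caps the target caps. */
    if (gst_structure_has_name (s, "audio/mpeg"))
      return FALSE;

    return TRUE;              /* keep autoplugging */
  }

  /* Sketch: always pick the first proposed factory, mirroring the default
   * 'autoplug-select' behaviour described above. */
  static gint
  on_autoplug_select (GstElement * decodebin, GstPad * pad, GstCaps * caps,
      GValueArray * factories, gpointer user_data)
  {
    if (factories->n_values == 0)
      return -1;              /* behave as if nothing was found */
    return 0;                 /* index of the factory to try next */
  }

  ...
  g_signal_connect (decodebin, "autoplug-continue",
      G_CALLBACK (on_autoplug_continue), NULL);
  g_signal_connect (decodebin, "autoplug-select",
      G_CALLBACK (on_autoplug_select), NULL);
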
* Target Caps

  The target caps are a read/write GObject property of decodebin.

  By default the target caps are:

  _ Raw audio: audio/x-raw

  _ and raw video: video/x-raw

  _ and Text: text/plain, text/x-pango-markup


* Media chain/group handling

  When autoplugging, all streams coming out of a demuxer will be grouped in a
  DecodeGroup.

  All new source pads created on that demuxer after it has emitted the
  'no-more-pads' signal will be put in another DecodeGroup.

  Only one decodegroup can be active at any given time. If a new decodegroup is
  created while another one exists, that decodegroup will be set as blocking
  until the existing one has drained.


DecodeGroup
-----------

Description:

  Streams belonging to the same group/chain of a media file.

* Contents

  The DecodeGroup contains:

  _ a GstMultiQueue to which all streams of the media group are connected.

  _ the eventual decoders which are autoplugged in order to produce the
    requested target pads.

* Proper group draining

  The DecodeGroup takes care that all the streams in the group are completely
  drained (EOS has come through all source ghost pads).

* Pre-roll and block

  The DecodeGroup has a global blocking feature. If enabled, all the ghosted
  source pads for that group will be blocked.

  A method is available to unblock all blocked pads for that group.


GstMultiQueue
-------------

Description:

  Multiple input-output data queue.

  The GstMultiQueue achieves the same functionality as GstQueue, with a few
  differences:

* Multiple streams handling.

  The element handles queueing data on more than one stream at once. To
  achieve such a feature it has request sink pads (sink_%u) and 'sometimes'
  src pads (src_%u).

  When requesting a given sinkpad, the associated srcpad for that stream will
  be created. Ex: requesting sink_1 will generate src_1.
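
For illustration, requesting such a pad pair could look like this (a sketch;
the src pad name is simply derived from the sink pad name, as described above):

  #include <string.h>
  #include <gst/gst.h>

  GstElement *mq = gst_element_factory_make ("multiqueue", NULL);
  GstPad *sinkpad, *srcpad;
  gchar *srcname;

  /* Requesting sink_%u creates both the sink pad and its matching
   * 'sometimes' src pad, e.g. sink_0 <-> src_0. */
  sinkpad = gst_element_get_request_pad (mq, "sink_%u");

  /* Derive the matching src pad name from the sink pad name. */
  srcname = g_strdup_printf ("src_%s",
      GST_PAD_NAME (sinkpad) + strlen ("sink_"));
  srcpad = gst_element_get_static_pad (mq, srcname);
  g_free (srcname);

  /* ... link sinkpad/srcpad into the surrounding pipeline ... */
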
* Non-starvation on multiple streams.

  If more than one stream is used with the element, the streams' queues will
  be dynamically grown (up to a limit), in order to ensure that no stream is
  risking data starvation. This guarantees that at any given time there are at
  least N bytes queued and available for each individual stream.

  If an EOS event comes through a srcpad, the associated queue should be
  considered as 'not-empty' in the queue-size-growing algorithm.

* Non-linked srcpads graceful handling.

  A GstTask is started for all srcpads when going to GST_STATE_PAUSED.

  The tasks block on a GCond which will be signalled in two
  different cases:

  _ When the associated queue has received a buffer.

  _ When the associated queue was previously declared as 'not-linked' and the
    first buffer of the queue is scheduled to be pushed synchronously in
    relation to the order in which it arrived globally in the element (see
    'Synchronous data pushing' below).

  When woken up by the GCond, the GstTask will try to push the next
  GstBuffer/GstEvent on the queue. If pushing the GstBuffer/GstEvent returns
  GST_FLOW_NOT_LINKED, then the associated queue is marked as 'not-linked'. If
  pushing the GstBuffer/GstEvent succeeded, the queue will no longer be marked
  as 'not-linked'.

  If pushing on all srcpads returns a GstFlowReturn different from GST_FLOW_OK,
  then all the srcpads' tasks are stopped and subsequent pushes on sinkpads
  will return GST_FLOW_NOT_LINKED.

* Synchronous data pushing for non-linked pads.

  In order to better support dynamic switching between streams, the multiqueue
  (unlike the current GStreamer queue) continues to push buffers on non-linked
  pads rather than shutting down.

  In addition, to prevent a non-linked stream from very quickly consuming all
  available buffers and thus 'racing ahead' of the other streams, the element
  must ensure that buffers and inlined events for a non-linked stream are
  pushed in the same order as they were received, relative to the other
  streams controlled by the element. This means that a buffer cannot be pushed
  to a non-linked pad any sooner than buffers in any other stream which were
  received before it.


=====================================
 Parsers, decoders and auto-plugging
=====================================

This section has DRAFT status.

Some media formats come in different "flavours" or "stream formats". These
formats differ in the way the setup data and media data is signalled and/or
packaged. An example for this is H.264 video, where there is a bytestream
format (with codec setup data signalled inline and units prefixed by a sync
code and packet length information) and a "raw" format where codec setup
data is signalled out of band (via the caps) and the chunking is implicit
in the way the buffers were muxed into a container, to mention just two of
the possible variants.

Especially on embedded platforms it is common that decoders can only
handle one particular stream format, and not all of them.

Where there are multiple stream formats, parsers are usually expected
to be able to convert between the different formats. This will, if
implemented correctly, work as expected in a static pipeline such as

    ... ! parser ! decoder ! sink

where the parser can query the decoder's capabilities even before
processing the first piece of data, and configure itself to convert
accordingly, if conversion is needed at all.

In an auto-plugging context this is not so straightforward though,
because elements are plugged incrementally and not before the previous
element has processed some data and decided what it will output exactly
(unless the template caps are completely fixed, in which case it can continue
right away; this is not always the case here though, see below). A
parser will thus have to decide on *some* output format so auto-plugging
can continue. It doesn't know anything about the available decoders and
their capabilities though, so it's possible that it will choose a format
that is not supported by any of the available decoders, or by the preferred
decoder.

If the parser had sufficiently concise but fixed source pad template caps,
decodebin could continue to plug a decoder right away, allowing the
parser to configure itself in the same way as it would with a static
pipeline. This is not an option, unfortunately, because often the
parser needs to process some data to determine e.g. the format's profile or
other stream properties (resolution, sample rate, channel configuration, etc.),
and there may be different decoders for different profiles (e.g. a DSP codec
for baseline profile and a software fallback for main/high profile; or a DSP
codec only supporting certain resolutions, with a software fallback for
unusual resolutions). So if decodebin just plugged the highest-ranking
decoder, that decoder might not be able to handle the actual stream later
on, which would yield an error (this is a data flow error then, which would
be hard to intercept and avoid in decodebin). In other words, we can't solve
this issue by plugging a decoder right away with the parser.

So decodebin needs to communicate to the parser the set of available decoder
caps (which would contain the relevant capabilities/restrictions such as
supported profiles, resolutions, etc.), after the usual "autoplug-*" signal
filtering/sorting of course.

This is done by plugging a capsfilter element right after the parser, and
constructing a set of filter caps from the list of available decoders (one
appends at the end just the name(s) of the caps structures from the parser
pad template caps to function as an 'ANY other' caps equivalent). This lets
the parser negotiate to a supported stream format in the same way as with
the static pipeline mentioned above, but of course incurs some overhead
through the additional capsfilter element.
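
A rough sketch of how such filter caps could be assembled, assuming a list of
candidate decoder factories has already been collected and sorted (the helper
name and variable layout are illustrative, not decodebin's actual code):

  /* Build filter caps: all sink caps of the candidate decoders, plus the
   * bare structure names from the parser's src template as an
   * 'ANY other' fallback, as described above. */
  static GstCaps *
  build_parser_filter_caps (GList * decoder_factories, GstPad * parser_srcpad)
  {
    GstCaps *filter = gst_caps_new_empty ();
    GstCaps *tmpl;
    GList *l;
    guint i;

    for (l = decoder_factories; l != NULL; l = l->next) {
      GstElementFactory *factory = l->data;
      const GList *t;

      for (t = gst_element_factory_get_static_pad_templates (factory);
          t != NULL; t = t->next) {
        GstStaticPadTemplate *st = t->data;

        if (st->direction != GST_PAD_SINK)
          continue;
        filter = gst_caps_merge (filter,
            gst_static_pad_template_get_caps (st));
      }
    }

    /* Append the structure names of the parser's template caps without any
     * fields, so "anything else the parser can output" is still allowed. */
    tmpl = gst_pad_get_pad_template_caps (parser_srcpad);
    for (i = 0; i < gst_caps_get_size (tmpl); i++) {
      const gchar *name =
          gst_structure_get_name (gst_caps_get_structure (tmpl, i));
      gst_caps_append_structure (filter, gst_structure_new_empty (name));
    }
    gst_caps_unref (tmpl);

    return filter;
  }
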
docs/design/design-encoding.txt (deleted)
@@ -1,571 +0,0 @@

Encoding and Muxing
-------------------

Summary
-------

  A. Problems
  B. Goals
  1. EncodeBin
  2. Encoding Profile System
  3. Helper Library for Profiles
  I. Use-cases researched


A. Problems this proposal attempts to solve
-------------------------------------------

* Duplication of pipeline code for gstreamer-based applications
  wishing to encode and/or mux streams, leading to subtle differences
  and inconsistencies across those applications.

* No unified system for describing encoding targets for applications
  in a user-friendly way.

* No unified system for creating encoding targets for applications,
  resulting in duplication of code across all applications,
  differences and inconsistencies that come with that duplication,
  and applications hardcoding element names and settings, resulting in
  poor portability.


B. Goals
--------

1. Convenience encoding element

   Create a convenience GstBin for encoding and muxing several streams,
   hereafter called 'EncodeBin'.

   This element will only contain one single property, which is a
   profile.

2. Define an encoding profile system

3. Encoding profile helper library

   Create a helper library to:
   * create EncodeBin instances based on profiles, and
   * help applications to create/load/save/browse those profiles.


1. EncodeBin
------------

1.1 Proposed API
----------------

  EncodeBin is a GstBin subclass.

  It implements the GstTagSetter interface, by which it will proxy the
  calls to the muxer.

  Only two introspectable properties (i.e. usable without extra API):
  * A GstEncodingProfile*
  * The name of the profile to use

  When a profile is selected, encodebin will:
  * Add REQUEST sinkpads for all the GstStreamProfile
  * Create the muxer and expose the source pad

  Whenever a request pad is created, encodebin will:
  * Create the chain of elements for that pad
  * Ghost the sink pad
  * Return that ghost pad

  This allows reducing the code to the minimum for applications
  wishing to encode a source for a given profile:

  ...

  encbin = gst_element_factory_make ("encodebin", NULL);
  g_object_set (encbin, "profile", "N900/H264 HQ", NULL);
  gst_element_link (encbin, filesink);

  ...

  vsrcpad = gst_element_get_static_pad (source, "src1");
  vsinkpad = gst_element_get_request_pad (encbin, "video_%u");
  gst_pad_link (vsrcpad, vsinkpad);

  ...


1.2 Explanation of the various stages in EncodeBin
--------------------------------------------------

  This describes the various stages which can happen in order to end
  up with a multiplexed stream that can then be stored or streamed.

1.2.1 Incoming streams

  The streams fed to EncodeBin can be of various types:

  * Video
    * Uncompressed (but maybe subsampled)
    * Compressed
  * Audio
    * Uncompressed (audio/x-raw)
    * Compressed
  * Timed text
  * Private streams

1.2.2 Steps involved for raw video encoding

  (0) Incoming Stream

  (1) Transform raw video feed (optional)

      Here we modify the various fundamental properties of a raw video
      stream to be compatible with the intersection of:
      * The encoder GstCaps and
      * The specified "Stream Restriction" of the profile/target

      The fundamental properties that can be modified are:
      * width/height
        This is done with a video scaler.
        The DAR (Display Aspect Ratio) MUST be respected.
        If needed, black borders can be added to comply with the target DAR.
      * framerate
      * format/colorspace/depth
        All of this is done with a colorspace converter

      (A sketch of such a transformation chain is given after this list.)

  (2) Actual encoding (optional for raw streams)

      An encoder (with some optional settings) is used.

  (3) Muxing

      A muxer (with some optional settings) is used.

  (4) Outgoing encoded and muxed stream
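
As a rough sketch in the pipeline notation used elsewhere in these documents,
the transform stage (1) for raw video could be realised with standard converter
elements followed by a capsfilter carrying the intersected caps. The element
choice here is illustrative, not mandated by this design:

  ... ! videoscale ! videorate ! videoconvert
      ! capsfilter caps="<intersection of encoder caps and stream restriction>"
      ! encoder ! muxer ! ...
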
1.2.3 Steps involved for raw audio encoding

  This is roughly the same as for raw video, except for (1):

  (1) Transform raw audio feed (optional)

      We modify the various fundamental properties of a raw audio stream to
      be compatible with the intersection of:
      * The encoder GstCaps and
      * The specified "Stream Restriction" of the profile/target

      The fundamental properties that can be modified are:
      * Number of channels
      * Type of raw audio (integer or floating point)
      * Depth (number of bits required to encode one sample)

1.2.4 Steps involved for encoded audio/video streams

  Steps (1) and (2) are replaced by a parser if a parser is available
  for the given format.

1.2.5 Steps involved for other streams

  Other streams will just be forwarded as-is to the muxer, provided the
  muxer accepts the stream type.


2. Encoding Profile System
--------------------------

  This work is based on:
  * The existing GstPreset system for elements [0]
  * The gnome-media GConf audio profile system [1]
  * The investigation done into device profiles by Arista and
    Transmageddon [2 and 3]

2.2 Terminology
---------------

* Encoding Target Category
  A Target Category is a classification of devices/systems/use-cases
  for encoding.

  Such a classification is required in order for:
  * Applications with a very specific use-case to limit the number of
    profiles they can offer the user. A screencasting application has
    no use for the online-services targets, for example.
  * Offering the user some initial classification in the case of a
    more generic encoding application (like a video editor or a
    transcoder).

  Ex:
    Consumer devices
    Online service
    Intermediate Editing Format
    Screencast
    Capture
    Computer

* Encoding Profile Target
  A Profile Target describes a specific entity for which we wish to
  encode.
  A Profile Target must belong to at least one Target Category.
  It will define at least one Encoding Profile.

  Ex (with category):
    Nokia N900 (Consumer device)
    Sony PlayStation 3 (Consumer device)
    Youtube (Online service)
    DNxHD (Intermediate editing format)
    HuffYUV (Screencast)
    Theora (Computer)

* Encoding Profile
  A specific combination of muxer, encoders, presets and limitations.

  Ex:
    Nokia N900/H264 HQ
    Ipod/High Quality
    DVD/Pal
    Youtube/High Quality
    HTML5/Low Bandwidth
    DNxHD

2.3 Encoding Profile
--------------------

  An encoding profile requires the following information:

  * Name
    This string is not translatable and must be unique.
    A recommendation to guarantee uniqueness of the naming could be:
      <target>/<name>
  * Description
    This is a translatable string describing the profile.
  * Muxing format
    This is a string containing the GStreamer media-type of the
    container format.
  * Muxing preset
    This is an optional string describing the preset(s) to use on the
    muxer.
  * Multipass setting
    This is a boolean describing whether the profile requires several
    passes.
  * List of Stream Profiles

2.3.1 Stream Profiles

  A Stream Profile consists of:

  * Type
    The type of stream profile (audio, video, text, private-data)
  * Encoding Format
    This is a string containing the GStreamer media-type of the encoding
    format to be used. If encoding is not to be applied, the raw audio
    media type will be used.
  * Encoding preset
    This is an optional string describing the preset(s) to use on the
    encoder.
  * Restriction
    This is an optional GstCaps containing the restriction of the
    stream that can be fed to the encoder.
    This will generally contain restrictions on video
    width/height/framerate or audio depth.
  * presence
    This is an integer specifying how many streams can be used in the
    containing profile. 0 means that any number of streams can be
    used.
  * pass
    This is an integer which is only meaningful if the multipass flag
    has been set in the profile. If it has been set, it indicates which
    pass this Stream Profile corresponds to.

2.4 Example profile
-------------------

  The representation used here is XML only as an example. No decision is
  made as to which formatting to use for storing targets and profiles.

  <gst-encoding-target>
    <name>Nokia N900</name>
    <category>Consumer Device</category>
    <profiles>
      <profile>Nokia N900/H264 HQ</profile>
      <profile>Nokia N900/MP3</profile>
      <profile>Nokia N900/AAC</profile>
    </profiles>
  </gst-encoding-target>

  <gst-encoding-profile>
    <name>Nokia N900/H264 HQ</name>
    <description>
      High Quality H264/AAC for the Nokia N900
    </description>
    <format>video/quicktime,variant=iso</format>
    <streams>
      <stream-profile>
        <type>audio</type>
        <format>audio/mpeg,mpegversion=4</format>
        <preset>Quality High/Main</preset>
        <restriction>audio/x-raw,channels=[1,2]</restriction>
        <presence>1</presence>
      </stream-profile>
      <stream-profile>
        <type>video</type>
        <format>video/x-h264</format>
        <preset>Profile Baseline/Quality High</preset>
        <restriction>
          video/x-raw,width=[16, 800],\
          height=[16, 480],framerate=[1/1, 30000/1001]
        </restriction>
        <presence>1</presence>
      </stream-profile>
    </streams>
  </gst-encoding-profile>
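
For illustration, the video restriction from the example above could be
expressed and applied in C roughly like this. This is a sketch only; the
encoder element name is just an example:

  GstCaps *restriction, *encoder_caps, *allowed;
  GstElement *encoder = gst_element_factory_make ("x264enc", NULL);
  GstPad *sinkpad = gst_element_get_static_pad (encoder, "sink");

  restriction = gst_caps_from_string (
      "video/x-raw,width=[16,800],height=[16,480],"
      "framerate=[1/1,30000/1001]");

  /* Intersect the profile restriction with what the encoder accepts;
   * this is the target for the transform stage described in 1.2.2. */
  encoder_caps = gst_pad_query_caps (sinkpad, NULL);
  allowed = gst_caps_intersect (restriction, encoder_caps);

  gst_caps_unref (encoder_caps);
  gst_caps_unref (restriction);
  gst_object_unref (sinkpad);
  /* 'allowed' would then be set on a capsfilter in front of the encoder. */
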
2.5 API
-------

  A proposed C API is contained in the gstprofile.h file in this directory.

2.6 Modifications required in the existing GstPreset system
------------------------------------------------------------

2.6.1 Temporary presets

  Currently a preset needs to be saved on disk in order to be
  used.

  This makes it impossible to have temporary presets (that exist only
  during the lifetime of a process), which might be required in the
  new proposed profile system.

2.6.2 Categorisation of presets

  Currently presets are just aliases of a group of property/value pairs,
  without any meaning or explanation as to how they exclude each
  other.

  Take for example the H264 encoder. It can have presets for:
  * passes (1, 2 or 3 passes)
  * profiles (Baseline, Main, ...)
  * quality (Low, Medium, High)

  In order to programmatically know which presets exclude each other,
  we here propose the categorisation of these presets.

  This can be done in one of two ways:
  1. in the name (by making the name be [<category>:]<name>)
     This would give for example: "Quality:High", "Profile:Baseline"
  2. by adding a new _meta key
     This would give for example: _meta/category:quality

2.6.3 Aggregation of presets

  There can be more than one choice of presets to be made for an
  element (quality, profile, pass).

  This means that one can not currently describe the full
  configuration of an element with a single string but with many.

  The proposal here is to extend the GstPreset API to be able to set
  all presets using one string and a well-known separator ('/').

  This change only requires changes in the core preset handling code.

  This would allow doing the following:
    gst_preset_load_preset (h264enc,
        "pass:1/profile:baseline/quality:high");

2.7 Points to be determined
---------------------------

  This document hasn't determined yet how to solve the following
  problems:

2.7.1 Storage of profiles

  One proposal for storage would be to use a system-wide directory
  (like $prefix/share/gstreamer-0.10/profiles) and store XML files for
  every individual profile.

  Users could then add their own profiles in ~/.gstreamer-0.10/profiles

  This poses some limitations as to what to do if some applications
  want to have some profiles limited to their own usage.


3. Helper library for profiles
------------------------------

  These helper methods could also be added to existing libraries (like
  GstPreset, GstPbUtils, ..).

  The various APIs proposed are in the accompanying gstprofile.h file.

3.1 Getting user-readable names for formats

  This is already provided by GstPbUtils.

3.2 Hierarchy of profiles

  The goal is for applications to be able to present to the user a list
  of combo-boxes for choosing their output profile:

    [ Category ]        # optional, depends on the application
    [ Device/Site/.. ]  # optional, depends on the application
    [ Profile ]

  Convenience methods are offered to easily get lists of categories,
  devices, and profiles.

3.3 Creating Profiles

  The goal is for applications to be able to easily create profiles.

  The application needs to have a fast/efficient way to:
  * select a container format and see all compatible streams that can be used
    with it.
  * select a codec format and see which container formats can be used
    with it.

  The remaining parts concern the restrictions on encoder
  input.

3.4 Ensuring availability of plugins for Profiles

  When an application wishes to use a Profile, it should be able to
  query whether it has all the needed plugins to use it.

  This part will use GstPbUtils to query, and if needed install, the
  missing plugins through the installed distribution plugin installer.


I. Use-cases researched
-----------------------

  This is a list of various use-cases where encoding/muxing is being
  used.

* Transcoding

  The goal is to convert, with as minimal a loss of quality as possible,
  any input file for a target use.
  A specific variant of this is transmuxing (see below).

  Example applications: Arista, Transmageddon

* Rendering timelines

  The incoming streams are a collection of various segments that need
  to be rendered.
  Those segments can vary in nature (i.e. the video width/height can
  change).
  This requires the use of identity with the single-segment property
  activated to transform the incoming collection of segments into a
  single continuous segment.

  Example applications: PiTiVi, Jokosher

* Encoding of live sources

  The major risk to take into account is the encoder not encoding the
  incoming stream fast enough. This is outside of the scope of
  encodebin, and should be solved by using queues between the sources
  and encodebin, as well as by implementing QoS in encoders and sources
  (the encoders emitting QoS events, and the upstream elements
  adapting themselves accordingly).

  Example applications: camerabin, cheese

* Screencasting applications

  This is similar to encoding of live sources.
  The difference is that, due to the nature of the source (size and
  amount/frequency of updates), one might want to do the encoding in
  two parts:
  * The actual live capture is encoded with an 'almost-lossless' codec
    (such as huffyuv)
  * Once the capture is done, the file created in the first step is
    then rendered to the desired target format.

  Fixing sources to only emit region updates and having encoders
  capable of encoding those streams would remove the need for the first
  step, but is outside of the scope of encodebin.

  Example applications: Istanbul, gnome-shell, recordmydesktop

* Live transcoding

  This is the case of an incoming live stream which will be
  broadcasted/transmitted live.
  One issue to take into account is to reduce the encoding latency to
  a minimum. This should mostly be done by picking low-latency
  encoders.

  Example applications: Rygel, Coherence

* Transmuxing

  Given a certain file, the aim is to remux the contents WITHOUT
  decoding into either a different container format or the same
  container format.
  Remuxing into the same container format is useful when the file was
  not created properly (for example, the index is missing).
  Whenever available, parsers should be applied on the encoded streams
  to validate and/or fix the streams before muxing them.

  Metadata from the original file must be kept in the newly created
  file.

  Example applications: Arista, Transmageddon

* Loss-less cutting

  Given a certain file, the aim is to extract a certain part of the
  file without going through the process of decoding and re-encoding
  that file.
  This is similar to the transmuxing use-case.

  Example applications: PiTiVi, Transmageddon, Arista, ...

* Multi-pass encoding

  Some encoders allow doing a multi-pass encoding.
  The initial pass(es) are only used to collect encoding estimates and
  are not actually muxed and outputted.
  The final pass uses previously collected information, and the output
  is then muxed and outputted.

* Archiving and intermediary format

  The requirement is to have lossless encoding.

* CD ripping

  Example applications: Sound-juicer

* DVD ripping

  Example application: Thoggen


* Research links

  Some of these are still active documents, some others are not.

  [0] GstPreset API documentation
      http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstPreset.html

  [1] gnome-media GConf profiles
      http://www.gnome.org/~bmsmith/gconf-docs/C/gnome-media.html

  [2] Research on a Device Profile API
      http://gstreamer.freedesktop.org/wiki/DeviceProfile

  [3] Research on defining presets usage
      http://gstreamer.freedesktop.org/wiki/PresetDesign
docs/design/design-orc-integration.txt (deleted)
@@ -1,204 +0,0 @@

Orc Integration
===============

Sections
--------

 - About Orc
 - Fast memcpy()
 - Normal Usage
 - Build Process
 - Testing
 - Orc Limitations


About Orc
---------

Orc code can be in one of two forms: in .orc files that are converted
by orcc to C code that calls liborc functions, or C code that calls
liborc to create complex operations at runtime. The former is mostly
for functions with predetermined functionality. The latter is for
functionality that is determined at runtime, where writing .orc
functions for all combinations would be prohibitive. Orc also has
a fast memcpy and memset which are useful independently.


Fast memcpy()
-------------

*** This part is not integrated yet. ***

Orc has built-in functions orc_memcpy() and orc_memset() that work
like memcpy() and memset(). These are meant for large copies only.
A reasonable cutoff for using orc_memcpy() instead of memcpy() is
if the number of bytes is generally greater than 100. DO NOT use
orc_memcpy() if the typical size is less than 20 bytes, especially
if the size is known at compile time, as these cases are inlined by
the compiler.

(Example: sys/ximage/ximagesink.c)

Add $(ORC_CFLAGS) to libgstximagesink_la_CFLAGS and $(ORC_LIBS) to
libgstximagesink_la_LIBADD. Then, in the source file, add:

  #ifdef HAVE_ORC
  #include <orc/orc.h>
  #else
  #define orc_memcpy(a,b,c) memcpy(a,b,c)
  #endif

Then switch relevant uses of memcpy() to orc_memcpy().

The above example works whether or not Orc is enabled at compile
time.


Normal Usage
------------

The following lines are added near the top of Makefile.am for plugins
that use Orc code in .orc files (this is for the volume plugin):

  ORC_BASE=volume
  include $(top_srcdir)/common/orc.mk

Also add the generated source file to the plugin build:

  nodist_libgstvolume_la_SOURCES = $(ORC_SOURCES)

And of course, add $(ORC_CFLAGS) to libgstvolume_la_CFLAGS, and
$(ORC_LIBS) to libgstvolume_la_LIBADD.

The value assigned to ORC_BASE does not need to be related to
the name of the plugin.


Advanced Usage
--------------

The Holy Grail of Orc usage is to programmatically generate Orc code
at runtime, have liborc compile it into binary code at runtime, and
then execute this code. Currently, the best example of this is in
Schroedinger. An example of how this would be used is audioconvert:
given an input format, channel position manipulation, dithering and
quantizing configuration, and output format, an Orc code generator
would create an OrcProgram, add the appropriate instructions to do
each step based on the configuration, and then compile the program.
Successfully compiling the program would return a function pointer
that can be called to perform the operation.

This sort of advanced usage requires structural changes to current
plugins (e.g., audioconvert) and will probably be developed
incrementally. Moreover, if such code is intended to be used without
Orc as a strict build/runtime requirement, two codepaths would need to
be developed and tested. For this reason, until GStreamer requires
Orc, I think it's a good idea to restrict such advanced usage to the
cog plugin in -bad, which requires Orc.
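
A minimal sketch of that flow, building and running a trivial element-wise
16-bit add at runtime. This follows the generic liborc example pattern and is
illustrative only, not audioconvert code; the exact entry points should be
checked against the liborc version in use:

  #include <stdint.h>
  #include <orc/orc.h>

  static void
  add_s16 (int16_t * d, const int16_t * s1, const int16_t * s2, int n)
  {
    OrcProgram *p;
    OrcExecutor *ex;
    OrcCompileResult result;

    orc_init ();

    /* Describe the program: one 2-byte destination, two 2-byte sources,
     * and a single 'addw' (add 16-bit words) instruction. */
    p = orc_program_new ();
    orc_program_add_destination (p, 2, "d1");
    orc_program_add_source (p, 2, "s1");
    orc_program_add_source (p, 2, "s2");
    orc_program_append_str (p, "addw", "d1", "s1", "s2");

    result = orc_program_compile (p);
    /* A real user would check 'result' and fall back to C code here. */
    (void) result;

    /* Bind the arrays and run the compiled code over n elements. */
    ex = orc_executor_new (p);
    orc_executor_set_n (ex, n);
    orc_executor_set_array_str (ex, "s1", (void *) s1);
    orc_executor_set_array_str (ex, "s2", (void *) s2);
    orc_executor_set_array_str (ex, "d1", d);
    orc_executor_run (ex);

    orc_executor_free (ex);
    orc_program_free (p);
  }
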
Build Process
-------------

The goal of the build process is to make Orc non-essential for most
developers and users. This is not to say you shouldn't have Orc
installed -- without it, you will get slow backup C code -- just that
people compiling GStreamer are not forced to switch from Liboil to
Orc immediately.

With Orc installed, the build process will use the Orc Compiler (orcc)
to convert each .orc file into a temporary C source (tmp-orc.c) and a
temporary header file (${base}orc.h if constructed from ${base}.orc).
The C source file is compiled and linked to the plugin, and the header
file is included by other source files in the plugin.

If 'make orc-update' is run in the source directory, the files
tmp-orc.c and ${base}orc.h are copied to ${base}orc-dist.c and
${base}orc-dist.h respectively. The -dist.[ch] files are automatically
disted via orc.mk. The -dist.[ch] files should be checked in to
git whenever the .orc source is changed and checked in. Example
workflow:

  edit .orc file
  ... make, test, etc.
  make orc-update
  git add volume.orc volumeorc-dist.c volumeorc-dist.h
  git commit

At 'make dist' time, all of the .orc files are compiled, and then
copied to their -dist.[ch] counterparts, and then the -dist.[ch]
files are added to the dist directory.

Without Orc installed (or --disable-orc given to configure), the
-dist.[ch] files are copied to tmp-orc.c and ${base}orc.h. When
compiled with Orc disabled, DISABLE_ORC is defined in config.h, and
the C backup code is compiled. This backup code is pure C, and
does not include orc headers or require linking against liborc.

The common/orc.mk build method is limited by the inflexibility of
automake. The file tmp-orc.c must be a fixed filename; using ORC_NAME
to generate the filename does not work because it conflicts with
automake's dependency generation. Building multiple .orc files
is not possible due to this restriction.


Testing
-------

If you create another .orc file, please add it to
tests/orc/Makefile.am. This causes automatic test code to be
generated and run during 'make check'. Each function in the .orc
file is tested by comparing the results of executing the run-time
compiled code and the C backup function.


Orc Limitations
---------------

audioconvert

  Orc doesn't have a mechanism for generating random numbers, which
  prevents its use as-is for dithering. One way around this is to
  generate suitable dithering values in one pass, then use those
  values in a second Orc-based pass.

  Orc doesn't handle 64-bit float, for no good reason.

  Irrespective of Orc handling 64-bit float, it would be useful to
  have a direct 32-bit float to 16-bit integer conversion.

  audioconvert is a good candidate for programmatically generated
  Orc code.

  audioconvert enumerates functions in terms of big-endian vs.
  little-endian. Orc's functions are "native" and "swapped".
  Programmatically generating code removes the need to worry about
  this.

  Orc doesn't handle 24-bit samples. Fixing this is not a priority
  (for ds).

videoscale

  Orc doesn't handle horizontal resampling yet. The plan is to add
  special sampling opcodes, for nearest, bilinear, and cubic
  interpolation.

videotestsrc

  Lots of code in videotestsrc needs to be rewritten to be SIMD
  (and Orc) friendly, e.g., stuff that uses oil_splat_u8().

  A fast low-quality random number generator in Orc would be useful
  here.

volume

  Many of the comments on audioconvert apply here as well.

  There are a bunch of FIXMEs in here that are due to misapplied
  patches.
docs/design/draft-keyframe-force.txt (deleted)
@@ -1,91 +0,0 @@

Forcing keyframes
-----------------

Consider the following use case:

  We have a pipeline that performs video and audio capture from a live source,
  compresses and muxes the streams and writes the resulting data into a file.

  Inside the uncompressed video data we have a specific pattern inserted at
  specific moments that should trigger a switch to a new file, meaning we
  close the existing file we are writing to and start writing to a new file.

  We want the new file to start with a keyframe so that one can start decoding
  the file immediately.

Components:

 1) We need an element that is able to detect the pattern in the video stream.

 2) We need to inform the video encoder that it should start encoding a
    keyframe starting from exactly the frame with the pattern.

 3) We need to inform the muxer that it should flush out any pending data and
    start the beginning of a new file with the keyframe as the first video
    frame.

 4) We need to inform the sink element that it should start writing to the
    next file. This requires application interaction to instruct the sink of
    the new filename. The application should also be free to ignore the
    boundary and continue to write to the existing file. The application will
    typically use an event pad probe to detect the custom event.

Implementation:

The implementation would consist of generating a GST_EVENT_CUSTOM_DOWNSTREAM
event that marks the keyframe boundary. This event is inserted into the
pipeline by the application upon a certain trigger. In the above use case this
trigger would be given by the element that detects the pattern, in the form of
an element message.

The custom event would travel further downstream to instruct encoder, muxer
and sink about the possible switch.

The information passed in the event consists of:

  name: GstForceKeyUnit

  (G_TYPE_UINT64)"timestamp"    : the timestamp of the buffer that
                                  triggered the event.
  (G_TYPE_UINT64)"stream-time"  : the stream position that triggered the
                                  event.
  (G_TYPE_UINT64)"running-time" : the running time of the stream when the
                                  event was triggered.
  (G_TYPE_BOOLEAN)"all-headers" : send all headers, including those in
                                  the caps or those sent at the start of
                                  the stream.

  ....                          : optional other data fields.

Note that this event is purely informational: no element is required to
perform an action, but it should forward the event downstream, just like any
other event it does not handle.
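
For illustration, an application could construct and inject such an event like
this. This is a sketch using core GStreamer API; the timestamp/stream-time/
running-time values and the encoder_sinkpad variable are assumed to come from
the triggering element message and the application's own bookkeeping:

  GstStructure *s;
  GstEvent *event;

  s = gst_structure_new ("GstForceKeyUnit",
      "timestamp", G_TYPE_UINT64, timestamp,
      "stream-time", G_TYPE_UINT64, stream_time,
      "running-time", G_TYPE_UINT64, running_time,
      "all-headers", G_TYPE_BOOLEAN, TRUE,
      NULL);

  event = gst_event_new_custom (GST_EVENT_CUSTOM_DOWNSTREAM, s);

  /* Send it into the pipeline just upstream of the encoder, so that the
   * encoder sees it before the frame that should become the keyframe. */
  gst_pad_send_event (encoder_sinkpad, event);
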
Elements understanding the event should behave as follows:

 1) The video encoder receives the event before the next frame. Upon reception
    of the event it schedules to encode the next frame as a keyframe.
    Before pushing out the encoded keyframe it must push the GstForceKeyUnit
    event downstream.

 2) The muxer receives the GstForceKeyUnit event and flushes out its current
    state, preparing to produce data that can be used as a key unit. Before
    pushing out the new data it pushes the GstForceKeyUnit event downstream.

 3) The application receives the GstForceKeyUnit event in a pad probe on the
    sink and reconfigures the sink to make it perform new actions after
    receiving the next buffer.


Upstream
--------

When using RTP, packets can get lost or receivers can be added at any time,
and they may request a new key frame.

A downstream element sends an upstream "GstForceKeyUnit" event up the
pipeline.

When an element produces some kind of key unit in output, but has
no such concept in its input (like an encoder that takes raw frames),
it consumes the event (doesn't pass it upstream), and instead sends
a downstream GstForceKeyUnit event and a new keyframe.
@ -1,546 +0,0 @@
===============================================================
 Subtitle overlays, hardware-accelerated decoding and playbin
===============================================================

Status: EARLY DRAFT / BRAINSTORMING

=== 1. Background ===

Subtitles can be muxed in containers or come from an external source.

Subtitles come in many shapes and colours. Usually they are either
text-based (incl. 'pango markup'), or bitmap-based (e.g. DVD subtitles
and the most common form of DVB subs). Bitmap based subtitles are
usually compressed in some way, like some form of run-length encoding.

Subtitles are currently decoded and rendered in subtitle-format-specific
overlay elements. These elements have two sink pads (one for raw video
and one for the subtitle format in question) and one raw video source pad.

They will take care of synchronising the two input streams, and of
decoding and rendering the subtitles on top of the raw video stream.

Digression: one could theoretically have dedicated decoder/render elements
that output an AYUV or ARGB image, and then let a videomixer element do
the actual overlaying, but this is not very efficient, because it requires
us to allocate and blend whole pictures (1920x1080 AYUV = 8MB,
1280x720 AYUV = 3.6MB, 720x576 AYUV = 1.6MB) even if the overlay region
is only a small rectangle at the bottom. This wastes memory and CPU.
We could do something better by introducing a new format that only
encodes the region(s) of interest, but we don't have such a format yet, and
are not necessarily keen to rewrite this part of the logic in playbin
at this point - and we can't change existing elements' behaviour, so would
need to introduce new elements for this.

Playbin2 supports outputting compressed formats, i.e. it does not
force decoding to a raw format, but is happy to output to a non-raw
format as long as the sink supports that as well.

In case of certain hardware-accelerated decoding APIs, we will make use
of that functionality. However, the decoder will not output a raw video
format then, but some kind of hardware/API-specific format (in the caps)
and the buffers will reference hardware/API-specific objects that
the hardware/API-specific sink will know how to handle.
=== 2. The Problem ===

In the case of such hardware-accelerated decoding, the decoder will not
output raw pixels that can easily be manipulated. Instead, it will
output hardware/API-specific objects that can later be used to render
a frame using the same API.

Even if we could transform such a buffer into raw pixels, we most
likely would want to avoid that, in order to avoid the need to
map the data back into system memory (and then later back to the GPU).
It's much better to upload the much smaller encoded data to the GPU/DSP
and then leave it there until rendered.

Currently playbin only supports subtitles on top of raw decoded video.
It will try to find a suitable overlay element from the plugin registry
based on the input subtitle caps and the rank. (It is assumed that we
will be able to convert any raw video format into any format required
by the overlay using a converter such as videoconvert.)

It will not render subtitles if the video sent to the sink is not
raw YUV or RGB or if conversions have been disabled by setting the
native-video flag on playbin.

Subtitle rendering is considered an important feature. Enabling
hardware-accelerated decoding by default should not lead to a major
feature regression in this area.

This means that we need to support subtitle rendering on top of
non-raw video.
=== 3. Possible Solutions ===

The goal is to keep knowledge of the subtitle format within the
format-specific GStreamer plugins, and knowledge of any specific
video acceleration API to the GStreamer plugins implementing
that API. We do not want to make the pango/dvbsuboverlay/dvdspu/kate
plugins link to libva/libvdpau/etc. and we do not want to make
the vaapi/vdpau plugins link to all of libpango/libkate/libass etc.

Multiple possible solutions come to mind:

 (a) backend-specific overlay elements

     e.g. vaapitextoverlay, vdpautextoverlay, vaapidvdspu, vdpaudvdspu,
     vaapidvbsuboverlay, vdpaudvbsuboverlay, etc.

     This assumes the overlay can be done directly on the backend-specific
     object passed around.

     The main drawback with this solution is that it leads to a lot of
     code duplication and may also lead to uncertainty about distributing
     certain duplicated pieces of code. The code duplication is pretty
     much unavoidable, since making textoverlay, dvbsuboverlay, dvdspu,
     kate, assrender, etc. available in form of base classes to derive
     from is not really an option. Similarly, one would not really want
     the vaapi/vdpau plugin to depend on a bunch of other libraries
     such as libpango, libkate, libtiger, libass, etc.

     One could add some new kind of overlay plugin feature though in
     combination with a generic base class of some sort, but in order
     to accommodate all the different cases and formats one would end
     up with quite convoluted/tricky API.

     (Of course there could also be a GstFancyVideoBuffer that provides
     an abstraction for such video accelerated objects and that could
     provide an API to add overlays to it in a generic way, but in the
     end this is just a less generic variant of (c), and it is not clear
     that there are real benefits to a specialised solution vs. a more
     generic one).
 (b) convert backend-specific object to raw pixels and then overlay

     Even where possible technically, this is most likely very
     inefficient.
 (c) attach the overlay data to the backend-specific video frame buffers
     in a generic way and do the actual overlaying/blitting later in
     backend-specific code such as the video sink (or an accelerated
     encoder/transcoder)

     In this case, the actual overlay rendering (i.e. the actual text
     rendering or decoding DVD/DVB data into pixels) is done in the
     subtitle-format-specific GStreamer plugin. All knowledge about
     the subtitle format is contained in the overlay plugin then,
     and all knowledge about the video backend in the video backend
     specific plugin.

     The main question then is how to get the overlay pixels (and
     we will only deal with pixels here) from the overlay element
     to the video sink.

     This could be done in multiple ways: one could send custom
     events downstream with the overlay data, or one could attach
     the overlay data directly to the video buffers in some way.

     Sending inline events has the advantage that it is fairly
     transparent to any elements between the overlay element and
     the video sink: if an effects plugin creates a new video
     buffer for the output, nothing special needs to be done to
     maintain the subtitle overlay information, since the overlay
     data is not attached to the buffer. However, it slightly
     complicates things at the sink, since it would also need to
     look for the new event in question instead of just processing
     everything in its buffer render function.

     If one attaches the overlay data to the buffer directly, any
     element between overlay and video sink that creates a new
     video buffer would need to be aware of the overlay data
     attached to it and copy it over to the newly-created buffer.

     One would have to implement a special kind of new query
     (e.g. a FEATURE query) that is not passed on automatically by
     gst_pad_query_default() in order to make sure that all elements
     downstream will handle the attached overlay data. (This is only
     a problem if we want to also attach overlay data to raw video
     pixel buffers; for new non-raw types we can just make it
     mandatory and assume support and be done with it; for existing
     non-raw types nothing changes anyway if subtitles don't work.)
     (We need to maintain backwards compatibility for existing raw
     video pipelines like e.g.: ..decoder ! suboverlay ! encoder..)

     Even though slightly more work, attaching the overlay information
     to buffers seems more intuitive than sending it interleaved as
     events. And buffers stored or passed around (e.g. via the
     "last-buffer" property in the sink when doing screenshots via
     playbin) always contain all the information needed.
 (d) create a video/x-raw-*-delta format and use a backend-specific videomixer

     This possibility was hinted at already in the digression in
     section 1. It would satisfy the goal of keeping subtitle format
     knowledge in the subtitle plugins and video backend knowledge
     in the video backend plugin. It would also add a concept that
     might be generally useful (think ximagesrc capture with xdamage).
     However, it would require adding foorender variants of all the
     existing overlay elements, and changing playbin to that new
     design, which is somewhat intrusive. And given the general
     nature of such a new format/API, we would need to take a lot
     of care to be able to accommodate all possible use cases when
     designing the API, which makes it considerably more ambitious.
     Lastly, we would need to write videomixer variants for the
     various accelerated video backends as well.
Overall (c) appears to be the most promising solution. It is the least
intrusive and should be fairly straight-forward to implement with
reasonable effort, requiring only small changes to existing elements
and requiring no new elements.

Doing the final overlaying in the sink as opposed to a videomixer
or overlay in the middle of the pipeline has other advantages:

 - if video frames need to be dropped, e.g. for QoS reasons,
   we could also skip the actual subtitle overlaying and
   possibly the decoding/rendering as well, if the
   implementation and API allows for that to be delayed.

 - the sink often knows the actual size of the window/surface/screen
   the output video is rendered to. This *may* make it possible to
   render the overlay image in a higher resolution than the input
   video, solving a long-standing issue with pixelated subtitles on
   top of low-resolution videos that are then scaled up in the sink.
   This would of course require the rendering to be delayed instead
   of just attaching an AYUV/ARGB/RGBA blob of pixels to the video
   buffer in the overlay, but that could all be supported.

 - if the video backend / sink has support for high-quality text
   rendering (clutter?) we could just pass the text or pango markup
   to the sink and let it do the rest (this is unlikely to be
   supported in the general case - text and glyph rendering is
   hard; also, we don't really want to make up our own text markup
   system, and pango markup is probably too limited for complex
   karaoke stuff).
=== 4. API needed ===

(a) Representation of subtitle overlays to be rendered

We need to pass the overlay pixels from the overlay element to the
sink somehow. Whatever the exact mechanism, let's assume we pass
a refcounted GstVideoOverlayComposition struct or object.

A composition is made up of one or more overlays/rectangles.

In the simplest case an overlay rectangle is just a blob of
RGBA/ABGR [FIXME?] or AYUV pixels with positioning info and other
metadata, and there is only one rectangle to render.

We're keeping the naming generic ("OverlayFoo" rather than
"SubtitleFoo") here, since this might also be handy for
other use cases such as e.g. logo overlays or so. It is not
designed for full-fledged video stream mixing though.
  // Note: don't mind the exact implementation details, they'll be hidden

  // FIXME: might be confusing in 0.11 though since GstXOverlay was
  //        renamed to GstVideoOverlay in 0.11, but not much we can do,
  //        maybe we can rename GstVideoOverlay to something better

  struct GstVideoOverlayComposition
  {
    guint                       num_rectangles;
    GstVideoOverlayRectangle ** rectangles;

    /* lowest rectangle sequence number still used by the upstream
     * overlay element. This way a renderer maintaining some kind of
     * rectangles <-> surface cache can know when to free cached
     * surfaces/rectangles. */
    guint                       min_seq_num_used;

    /* sequence number for the composition (same series as rectangles) */
    guint                       seq_num;
  }

  struct GstVideoOverlayRectangle
  {
    /* Position on video frame and dimension of output rectangle in
     * output frame terms (already adjusted for the PAR of the output
     * frame). x/y can be negative (overlay will be clipped then) */
    gint  x, y;
    guint render_width, render_height;

    /* Dimensions of overlay pixels */
    guint width, height, stride;

    /* This is the PAR of the overlay pixels */
    guint par_n, par_d;

    /* Format of pixels, GST_VIDEO_FORMAT_ARGB on big-endian systems,
     * and BGRA on little-endian systems (i.e. pixels are treated as
     * 32-bit values and alpha is always in the most-significant byte,
     * and blue is in the least-significant byte).
     *
     * FIXME: does anyone actually use AYUV in practice? (we do
     *        in our utility function to blend on top of raw video)
     *        What about AYUV and endianness? Do we always have [A][Y][U][V]
     *        in memory? */
    /* FIXME: maybe use our own enum? */
    GstVideoFormat format;

    /* Refcounted blob of memory, no caps or timestamps */
    GstBuffer *pixels;

    // FIXME: how to express source like text or pango markup?
    //        (just add source type enum + source buffer with data)
    //
    // FOR 0.10: always send pixel blobs, but attach source data in
    //   addition (reason: if downstream changes, we can't renegotiate
    //   that properly, if we just do a query of supported formats from
    //   the start). Sink will just ignore pixels and use pango markup
    //   from source data if it supports that.
    //
    // FOR 0.11: overlay should query formats (pango markup, pixels)
    //   supported by downstream and then only send that. We can
    //   renegotiate via the reconfigure event.
    //

    /* sequence number: useful for backends/renderers/sinks that want
     * to maintain a cache of rectangles <-> surfaces. The value of
     * the min_seq_num_used in the composition tells the renderer which
     * rectangles have expired. */
    guint seq_num;

    /* FIXME: we also need a (private) way to cache converted/scaled
     * pixel blobs */
  }
(a1) Overlay consumer API:

How would this work in a video sink that supports scaling of textures:

  gst_foo_sink_render () {
    /* assume only one for now */
    if video_buffer has composition:
      composition = video_buffer.get_composition()

      for each rectangle in composition:
        if rectangle.source_data_type == PANGO_MARKUP
          actor = text_from_pango_markup (rectangle.get_source_data())
        else
          pixels = rectangle.get_pixels_unscaled (FORMAT_RGBA, ...)
          actor = texture_from_rgba (pixels, ...)

        .. position + scale on top of video surface ...
  }
(a2) Overlay producer API:

e.g. logo or subpicture overlay: got pixels, stuff into rectangle:

  if (logoverlay->cached_composition == NULL) {
    comp = composition_new ();

    rect = rectangle_new (format, pixels_buf,
                          width, height, stride, par_n, par_d,
                          x, y, render_width, render_height);

    /* composition adds its own ref for the rectangle */
    composition_add_rectangle (comp, rect);
    rectangle_unref (rect);

    /* buffer adds its own ref for the composition */
    video_buffer_attach_composition (comp);

    /* we take ownership of the composition and save it for later */
    logoverlay->cached_composition = comp;
  } else {
    video_buffer_attach_composition (logoverlay->cached_composition);
  }

FIXME: also add some API to modify render position/dimensions of
a rectangle (probably requires creation of new rectangle, unless
we handle writability like with other mini objects).
(b) Fallback overlay rendering/blitting on top of raw video

Eventually we want to use this overlay mechanism not only for
hardware-accelerated video, but also for plain old raw video,
either at the sink or in the overlay element directly.

Apart from the advantages listed earlier in section 3, this
allows us to consolidate in one location a lot of overlaying/blitting
code that is currently repeated in every single overlay element.
This makes it considerably easier to support a whole range of raw
video formats out of the box, add SIMD-optimised rendering using ORC,
or handle corner cases correctly.

(Note: a side-effect of overlaying raw video at the video sink is
that if e.g. a screenshotter gets the last buffer via the last-buffer
property of basesink, it would get an image without the subtitles
on top. This could probably be fixed by re-implementing the
property in GstVideoSink though. Playbin2 could handle this
internally as well).

  void
  gst_video_overlay_composition_blend (GstVideoOverlayComposition * comp,
                                       GstBuffer * video_buf)
  {
    guint n;

    g_return_if_fail (gst_buffer_is_writable (video_buf));
    g_return_if_fail (GST_BUFFER_CAPS (video_buf) != NULL);

    ... parse video_buffer caps into BlendVideoFormatInfo ...

    for each rectangle in the composition: {

      if (gst_video_format_is_yuv (video_buf_format)) {
        overlay_format = FORMAT_AYUV;
      } else if (gst_video_format_is_rgb (video_buf_format)) {
        overlay_format = FORMAT_ARGB;
      } else {
        /* FIXME: grayscale? */
        return;
      }

      /* this will scale and convert AYUV<->ARGB if needed */
      pixels = rectangle_get_pixels_scaled (rectangle, overlay_format);

      ... clip output rectangle ...

      __do_blend (video_buf_format, video_buf->data,
                  overlay_format, pixels->data,
                  x, y, width, height, stride);

      gst_buffer_unref (pixels);
    }
  }
(c) Flatten all rectangles in a composition

We cannot assume that the video backend API can handle any
number of rectangle overlays, it's possible that it only
supports one single overlay, in which case we need to squash
all rectangles into one.

However, we'll just declare this a corner case for now, and
implement it only if someone actually needs it. It's easy
to add later API-wise. Might be a bit tricky if we have
rectangles with different PARs/formats (e.g. subs and a logo),
though we could probably always just use the code from (b)
with a fully transparent video buffer to create a flattened
overlay buffer.
(d) core API: new FEATURE query

For 0.10 we need to add a FEATURE query, so the overlay element
can query whether the sink downstream and all elements between
the overlay element and the sink support the new overlay API.
Elements in between need to support it because the render
positions and dimensions need to be updated if the video is
cropped or rescaled, for example.

In order to ensure that all elements support the new API,
we need to drop the query in the pad default query handler
(so it only succeeds if all elements handle it explicitly).

Might want two variants of the feature query - one where
all elements in the chain need to support it explicitly
and one where it's enough if some element downstream
supports it.

In 0.11 this could probably be handled via GstMeta and
ALLOCATION queries (and/or we could simply require
elements to be aware of this API from the start).

There appears to be no issue with downstream possibly
not being linked yet at the time when an overlay would
want to do such a query.
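For illustration, the proposed FEATURE query could be approximated with a
custom query; the sketch below uses GStreamer 1.x API and an invented
structure name, so it is not the mechanism this draft standardises (the
solution eventually adopted upstream is based on GstVideoOverlayCompositionMeta
and the ALLOCATION query):

    /* Sketch only: ask downstream whether attached overlay compositions
     * will be handled. The structure name is made up for this example. */
    static gboolean
    downstream_supports_overlay_composition (GstPad * srcpad)
    {
      GstQuery *query;
      gboolean res;

      query = gst_query_new_custom (GST_QUERY_CUSTOM,
          gst_structure_new_empty ("overlay-composition-feature"));

      /* fails unless elements explicitly handle/forward it, which is the
       * behaviour the draft wants from the default query handler */
      res = gst_pad_peer_query (srcpad, query);
      gst_query_unref (query);

      return res;
    }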
Other considerations:

 - renderers (overlays or sinks) may be able to handle only ARGB or only AYUV
   (for most graphics/hw-API it's likely ARGB of some sort, while our
   blending utility functions will likely want the same colour space as
   the underlying raw video format, which is usually YUV of some sort).
   We need to convert where required, and should cache the conversion.

 - renderers may or may not be able to scale the overlay. We need to
   do the scaling internally if not (simple case: just horizontal scaling
   to adjust for PAR differences; complex case: both horizontal and vertical
   scaling, e.g. if subs come from a different source than the video or the
   video has been rescaled or cropped between overlay element and sink).

 - renderers may be able to generate (possibly scaled) pixels on demand
   from the original data (e.g. a string or RLE-encoded data). We will
   ignore this for now, since this functionality can still be added later
   via API additions. The most interesting case would be to pass a pango
   markup string, since e.g. clutter can handle that natively.

 - renderers may be able to write data directly on top of the video pixels
   (instead of creating an intermediary buffer with the overlay which is
   then blended on top of the actual video frame), e.g. dvdspu, dvbsuboverlay.

   However, in the interest of simplicity, we should probably ignore the
   fact that some elements can blend their overlays directly on top of the
   video (decoding/uncompressing them on the fly), even more so as it's
   not obvious that it's actually faster to decode the same overlay
   70-90 times (say) (i.e. ca. 3 seconds of video frames) and then blend
   it 70-90 times instead of decoding it once into a temporary buffer
   and then blending it directly from there, possibly SIMD-accelerated.
   Also, this is only relevant if the video is raw video and not some
   hardware-acceleration backend object.

   And ultimately it is the overlay element that decides whether to do
   the overlay right there and then or have the sink do it (if supported).
   It could decide to keep doing the overlay itself for raw video and
   only use our new API for non-raw video.

 - renderers may want to make sure they only upload the overlay pixels once
   per rectangle if that rectangle recurs in subsequent frames (as part of
   the same composition or a different composition), as is likely. This
   caching of e.g. surfaces needs to be done renderer-side and can be
   accomplished based on the sequence numbers. The composition contains the
   lowest sequence number still in use upstream (an overlay element may want
   to cache created compositions+rectangles as well, after all, to re-use
   them for multiple frames); based on that, the renderer can expire cached
   objects. The caching needs to be done renderer-side because attaching
   renderer-specific objects to the rectangles won't work well given the
   refcounted nature of rectangles and compositions, making it unpredictable
   when a rectangle or composition will be freed or from which thread
   context it will be freed. The renderer-specific objects are likely bound
   to other types of renderer-specific contexts, and need to be managed
   in connection with those.

 - compositions/rectangles should internally provide a certain degree of
   thread-safety. Multiple elements (sinks, overlay element) might access
   or use the same objects from multiple threads at the same time, and it
   is expected that elements will keep a ref to compositions and rectangles
   they push downstream for a while, e.g. until the current subtitle
   composition expires.
=== 5. Future considerations ===

 - alternatives: there may be multiple versions/variants of the same subtitle
   stream. On DVDs, there may be a 4:3 version and a 16:9 version of the same
   subtitles. We could attach both variants and let the renderer pick the best
   one for the situation (currently we just use the 16:9 version). With totem,
   it's ultimately totem that adds the 'black bars' at the top/bottom, so totem
   also knows if it's got a 4:3 display and can/wants to fit 4:3 subs (which
   may render on top of the bars) or not, for example.

=== 6. Misc. FIXMEs ===

TEST: should these look (roughly) alike (note text distortion) - needs fixing in textoverlay

  gst-launch-0.10 \
   videotestsrc ! video/x-raw,width=640,height=480,pixel-aspect-ratio=1/1 ! textoverlay text=Hello font-desc=72 ! xvimagesink \
   videotestsrc ! video/x-raw,width=320,height=480,pixel-aspect-ratio=2/1 ! textoverlay text=Hello font-desc=72 ! xvimagesink \
   videotestsrc ! video/x-raw,width=640,height=240,pixel-aspect-ratio=1/2 ! textoverlay text=Hello font-desc=72 ! xvimagesink

~~~ THE END ~~~
@ -1,107 +0,0 @@
Interlaced Video
================

Video buffers have a number of states identifiable through a combination of caps
and buffer flags.

Possible states:
- Progressive
- Interlaced
  - Plain
    - One field
    - Two fields
    - Three fields - this should be a progressive buffer with a repeated 'first'
      field that can be used for telecine pulldown
  - Telecine
    - One field
    - Two fields
      - Progressive
      - Interlaced (a.k.a. 'mixed'; the fields are from different frames)
    - Three fields - this should be a progressive buffer with a repeated 'first'
      field that can be used for telecine pulldown

Note: It can be seen that the difference between the plain interlaced and
telecine states is that in the telecine state, buffers containing two fields may
be progressive.

Tools for identification:
- GstVideoInfo
  - GstVideoInterlaceMode - enum - GST_VIDEO_INTERLACE_MODE_...
    - PROGRESSIVE
    - INTERLEAVED
    - MIXED
- Buffer flags - GST_VIDEO_BUFFER_FLAG_...
  - TFF
  - RFF
  - ONEFIELD
  - INTERLACED
Identification of Buffer States
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Note that flags are not necessarily interpreted in the same way for all
different states, nor are they necessarily required, nor do they make sense
in all cases.

Progressive
...........

If the interlace mode in the video info corresponding to a buffer is
"progressive", then the buffer is progressive.

Plain Interlaced
................

If the video info interlace mode is "interleaved", then the buffer is plain
interlaced.

GST_VIDEO_BUFFER_FLAG_TFF indicates whether the top or bottom field is to be
displayed first. The timestamp on the buffer corresponds to the first field.

GST_VIDEO_BUFFER_FLAG_RFF indicates that the first field (indicated by the TFF flag)
should be repeated. This is generally only used for telecine purposes but as the
telecine state was added long after the interlaced state was added and defined,
this flag remains valid for plain interlaced buffers.

GST_VIDEO_BUFFER_FLAG_ONEFIELD means that only the field indicated through the TFF
flag is to be used. The other field should be ignored.
Telecine
........

If the video info interlace mode is "mixed" then the buffers are in some form of
telecine state.

The TFF and ONEFIELD flags have the same semantics as for the plain interlaced
state.

GST_VIDEO_BUFFER_FLAG_RFF in the telecine state indicates that the buffer contains
only repeated fields that are present in other buffers and are as such
unneeded. For example, in a sequence of three telecined frames, we might have:

  AtAb AtBb BtBb

In this situation, we only need the first and third buffers as the second
buffer contains fields present in the first and third.

Note that the following state can have its second buffer identified using the
ONEFIELD flag (and TFF not set):

  AtAb AtBb BtCb

The telecine state requires one additional flag to be able to identify
progressive buffers.

The presence of the GST_VIDEO_BUFFER_FLAG_INTERLACED means that the buffer is an
'interlaced' or 'mixed' buffer that contains two fields that, when combined
with fields from adjacent buffers, allow reconstruction of progressive frames.
The absence of the flag implies the buffer containing two fields is a
progressive frame.

For example in the following sequence, the third buffer would be mixed (yes, it
is a strange pattern, but it can happen):

  AtAb AtBb BtCb CtDb DtDb
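As a purely illustrative sketch, an element could classify buffers according to
the states above roughly like this (GStreamer 1.x caps/flag API; the printing
is just a stand-in for whatever the element actually does with the result):

  /* Sketch: identify the interlacing state of a buffer from the caps-derived
   * GstVideoInfo plus the buffer flags described above. */
  static void
  print_buffer_state (const GstVideoInfo * info, GstBuffer * buf)
  {
    switch (GST_VIDEO_INFO_INTERLACE_MODE (info)) {
      case GST_VIDEO_INTERLACE_MODE_PROGRESSIVE:
        g_print ("progressive frame\n");
        break;
      case GST_VIDEO_INTERLACE_MODE_INTERLEAVED:
        g_print ("plain interlaced, %s field first%s%s\n",
            GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_TFF) ?
                "top" : "bottom",
            GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_RFF) ?
                ", first field repeated" : "",
            GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_ONEFIELD) ?
                ", only one field valid" : "");
        break;
      case GST_VIDEO_INTERLACE_MODE_MIXED:
        /* telecine: INTERLACED distinguishes mixed buffers from progressive */
        if (GST_BUFFER_FLAG_IS_SET (buf, GST_VIDEO_BUFFER_FLAG_INTERLACED))
          g_print ("telecine: interlaced ('mixed') buffer\n");
        else
          g_print ("telecine: progressive buffer\n");
        break;
      default:
        break;
    }
  }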
@ -1,76 +0,0 @@
Media Types
-----------

  audio/x-raw

   format, G_TYPE_STRING, mandatory
     The format of the audio samples, see the Formats section for a list
     of valid sample formats.

   rate, G_TYPE_INT, mandatory
     The samplerate of the audio

   channels, G_TYPE_INT, mandatory
     The number of channels

   channel-mask, GST_TYPE_BITMASK, mandatory for more than 2 channels
     Bitmask of channel positions present. May be omitted for mono and
     stereo. May be set to 0 to denote that the channels are unpositioned.

   layout, G_TYPE_STRING, mandatory
     The layout of channels within a buffer. Possible values are
     "interleaved" (for LRLRLRLR) and "non-interleaved" (LLLLRRRR)

Use GstAudioInfo and related helper API to create and parse raw audio caps.
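For illustration, a minimal sketch of building such caps with GstAudioInfo
(format, rate and channel count chosen arbitrarily for the example):

  #include <gst/audio/audio.h>

  /* Sketch: create caps for interleaved stereo S16LE audio at 48 kHz.
   * Passing NULL for the channel positions selects the default layout. */
  static GstCaps *
  make_stereo_s16_caps (void)
  {
    GstAudioInfo info;

    gst_audio_info_init (&info);
    gst_audio_info_set_format (&info, GST_AUDIO_FORMAT_S16LE, 48000, 2, NULL);

    return gst_audio_info_to_caps (&info);
  }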
Metadata
--------

  "GstAudioDownmixMeta"
     A matrix for downmixing multichannel audio to a lower number of channels.

Formats
-------

The following values can be used for the format string property.

 "S8"         8-bit signed PCM audio
 "U8"         8-bit unsigned PCM audio

 "S16LE"      16-bit signed PCM audio
 "S16BE"      16-bit signed PCM audio
 "U16LE"      16-bit unsigned PCM audio
 "U16BE"      16-bit unsigned PCM audio

 "S24_32LE"   24-bit signed PCM audio packed into 32-bit
 "S24_32BE"   24-bit signed PCM audio packed into 32-bit
 "U24_32LE"   24-bit unsigned PCM audio packed into 32-bit
 "U24_32BE"   24-bit unsigned PCM audio packed into 32-bit

 "S32LE"      32-bit signed PCM audio
 "S32BE"      32-bit signed PCM audio
 "U32LE"      32-bit unsigned PCM audio
 "U32BE"      32-bit unsigned PCM audio

 "S24LE"      24-bit signed PCM audio
 "S24BE"      24-bit signed PCM audio
 "U24LE"      24-bit unsigned PCM audio
 "U24BE"      24-bit unsigned PCM audio

 "S20LE"      20-bit signed PCM audio
 "S20BE"      20-bit signed PCM audio
 "U20LE"      20-bit unsigned PCM audio
 "U20BE"      20-bit unsigned PCM audio

 "S18LE"      18-bit signed PCM audio
 "S18BE"      18-bit signed PCM audio
 "U18LE"      18-bit unsigned PCM audio
 "U18BE"      18-bit unsigned PCM audio

 "F32LE"      32-bit floating-point audio
 "F32BE"      32-bit floating-point audio
 "F64LE"      64-bit floating-point audio
 "F64BE"      64-bit floating-point audio
@ -1,28 +0,0 @@
Media Types
-----------

  text/x-raw

   format, G_TYPE_STRING, mandatory
     The format of the text, see the Formats section for a list of valid format
     strings.

Metadata
--------

There are no common metas for this raw format yet.

Formats
-------

 "utf8"          plain timed utf8 text (formerly text/plain)

   Parsed timed text in utf8 format.

 "pango-markup"  plain timed utf8 text with pango markup (formerly text/x-pango-markup)

   Same as "utf8", but text embedded in an XML-style markup language for
   size, colour, emphasis, etc.

   See http://developer.gnome.org/pango/stable/PangoMarkupFormat.html

File diff suppressed because it is too large
@ -1,69 +0,0 @@
playbin
--------

The purpose of this element is to decode and render the media contained in a
given generic uri. The element extends GstPipeline and is typically used in
playback situations.

Required features:

 - accept and play any valid uri. This includes
   - rendering video/audio
   - overlaying subtitles on the video
   - optionally reading external subtitle files
 - allow for hardware (non raw) sinks
 - selection of audio/video/subtitle streams based on language.
 - perform network buffering/incremental download
 - gapless playback
 - support for visualisations with configurable sizes
 - ability to reject files that are too big, or of a format that would require
   too much CPU/memory usage.
 - be very efficient with adding elements such as converters to reduce the
   amount of negotiation that has to happen.
 - handle chained oggs. This includes having support for dynamic pad add and
   remove from a demuxer.

Components
----------

* decodebin2

  - performs the autoplugging of demuxers/decoders
  - emits signals for steering the autoplugging
    - to decide if a non-raw media format is acceptable as output
    - to sort the possible decoders for a non-raw format
  - see also the decodebin2 design doc

* uridecodebin

  - combination of a source to handle the given uri, an optional queueing element
    and one or more decodebin2 elements to decode the non-raw streams.

* playsink

  - handles display of audio/video/text.
  - has request audio/video/text input pads. There is only one sinkpad per type.
    The requested pads define the configuration of the internal pipeline.
  - allows for setting audio/video sinks or does automatic sink selection.
  - allows for configuration of the visualisation element.
  - allows for enable/disable of visualisation, audio and video.

* playbin

  - combination of one or more uridecodebin elements to read the uri and subtitle
    uri.
  - support for queuing new media to support gapless playback.
  - handles stream selection.
  - uses playsink to display.
  - selection of sinks and configuration of uridecodebin with raw output formats.
Gapless playback
----------------

playbin has an "about-to-finish" signal. The application should configure a new
uri (and optional suburi) in the callback. When the current media finishes, this
new media will be played next.
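For illustration, a minimal application-side sketch ("uri" and "about-to-finish"
are the real playbin property/signal names; the playlist helper is hypothetical):

  /* Sketch: queue the next track from the about-to-finish callback so
   * playback continues without a gap. next_uri_for() is a made-up
   * application helper that returns the next playlist entry or NULL. */
  static void
  on_about_to_finish (GstElement * playbin, gpointer user_data)
  {
    const gchar *next = next_uri_for (user_data);

    if (next != NULL)
      g_object_set (playbin, "uri", next, NULL);
  }

  /* somewhere during setup: */
  g_signal_connect (playbin, "about-to-finish",
      G_CALLBACK (on_about_to_finish), app);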
@ -1,278 +0,0 @@
Design for Stereoscopic & Multiview Video Handling
==================================================

There are two cases to handle:

* Encoded video output from a demuxer to parser / decoder or from encoders into a muxer.
* Raw video buffers

The design below is somewhat based on the proposals from
[bug 611157](https://bugzilla.gnome.org/show_bug.cgi?id=611157)

Multiview is used as a generic term to refer to handling both
stereo content (left and right eye only) and extensions for videos
containing multiple independent viewpoints.

Encoded Signalling
------------------
This is regarding the signalling in caps and buffers from demuxers to
parsers (sometimes) or out from encoders.

For backward compatibility with existing codecs many transports of
stereoscopic 3D content use normal 2D video with 2 views packed spatially
in some way, and put extra new descriptions in the container/mux.

Info in the demuxer seems to apply to stereo encodings only. For all
MVC methods I know, the multiview encoding is in the video bitstream itself
and therefore already available to decoders. Only stereo systems have been retro-fitted
into the demuxer.

Also, sometimes extension descriptions are in the codec (e.g. H.264 SEI FPA packets)
and it would be useful to be able to put the info onto caps and buffers from the
parser without decoding.

To handle both cases, we need to be able to output the required details on
encoded video for decoders to apply onto the raw video buffers they decode.

*If there ever is a need to transport multiview info for encoded data the
same system below for raw video or some variation should work*
### Encoded Video: Properties that need to be encoded into caps
1. multiview-mode (called "Channel Layout" in bug 611157)
   * Whether a stream is mono, for a single eye, stereo, mixed-mono-stereo
     (switches between mono and stereo - mp4 can do this)
   * Uses a buffer flag to mark individual buffers as mono or "not mono"
     (single|stereo|multiview) for mixed scenarios. The alternative (not
     proposed) is for the demuxer to switch caps for each mono to not-mono
     change, and not use a 'mixed' caps variant at all.
   * _single_ refers to a stream of buffers that only contain 1 view.
     It is different from mono in that the stream is a marked left or right
     eye stream for later combining in a mixer or when displaying.
   * _multiple_ marks a stream with multiple independent views encoded.
     It is included in this list for completeness. As noted above, there's
     currently no scenario that requires marking encoded buffers as MVC.
2. Frame-packing arrangements / view sequence orderings
   * Possible frame packings: side-by-side, side-by-side-quincunx,
     column-interleaved, row-interleaved, top-bottom, checker-board
     * bug 611157 - sreerenj added side-by-side-full and top-bottom-full but
       I think that's covered by suitably adjusting pixel-aspect-ratio. If
       not, they can be added later.
     * _top-bottom_, _side-by-side_, _column-interleaved_, _row-interleaved_ are as the names suggest.
     * _checker-board_, samples are left/right pixels in a chess grid +-+-+-/-+-+-+
     * _side-by-side-quincunx_. Side By Side packing, but quincunx sampling -
       the 1 pixel offset of each eye needs to be accounted for when upscaling or displaying
     * there may be other packings (future expansion)
   * Possible view sequence orderings: frame-by-frame, frame-primary-secondary-tracks, sequential-row-interleaved
     * _frame-by-frame_, each buffer is left, then right view etc
     * _frame-primary-secondary-tracks_ - the file has 2 video tracks (primary and secondary), one is left eye, one is right.
       Demuxer info indicates which one is which.
       Handling this means marking each stream as all-left and all-right views, decoding separately, and combining automatically (inserting a mixer/combiner in playbin)
       -> *Leave this for future expansion*
     * _sequential-row-interleaved_ Mentioned by sreerenj in bug patches, I can't find a mention of such a thing. Maybe it's in MPEG-2
       -> *Leave this for future expansion / deletion*
3. view encoding order
   * Describes how to decide which piece of each frame corresponds to left or right eye
   * Possible orderings left, right, left-then-right, right-then-left
     - Need to figure out how we find the correct frame in the demuxer to start decoding when seeking in frame-sequential streams
     - Need a buffer flag for marking the first buffer of a group.
4. "Frame layout flags"
   * flags for view specific interpretation
   * horizontal-flip-left, horizontal-flip-right, vertical-flip-left, vertical-flip-right
     Indicates that one or more views has been encoded in a flipped orientation, usually due to cameras with mirrors or displays with mirrors.
   * This should be an actual flags field. Registered GLib flags types aren't generally well supported in our caps - the type might not be loaded/registered yet when parsing a caps string, so they can't be used in caps templates in the registry.
   * It might be better just to use a hex value / integer
Buffer representation for raw video
-----------------------------------
* Transported as normal video buffers with extra metadata
* The caps define the overall buffer width/height, with helper functions to
  extract the individual views for packed formats
* pixel-aspect-ratio adjusted if needed to double the overall width/height
* video sinks that don't know about multiview extensions yet will show the packed view as-is.
  For frame-sequence outputs, things might look weird, but just adding multiview-mode to the sink caps
  can disallow those transports.
* _row-interleaved_ packing is actually just side-by-side memory layout with half frame width, twice
  the height, so can be handled by adjusting the overall caps and strides
* Other exotic layouts need new pixel formats defined (checker-board, column-interleaved, side-by-side-quincunx)
* _Frame-by-frame_ - one view per buffer, but with alternating metas marking which buffer is which left/right/other view and using a new buffer flag as described above
  to mark the start of a group of corresponding frames.
* New video caps addition as for encoded buffers
### Proposed Caps fields
Combining the requirements above and collapsing the combinations into mnemonics:

* multiview-mode =
    mono | left | right | sbs | sbs-quin | col | row | topbot | checkers |
    frame-by-frame | mixed-sbs | mixed-sbs-quin | mixed-col | mixed-row |
    mixed-topbot | mixed-checkers | mixed-frame-by-frame | multiview-frames | mixed-multiview-frames
* multiview-flags =
    + 0x0000 none
    + 0x0001 right-view-first
    + 0x0002 left-h-flipped
    + 0x0004 left-v-flipped
    + 0x0008 right-h-flipped
    + 0x0010 right-v-flipped
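As a concrete illustration of the fields above (values invented for the
example, with the flags written as a plain integer per the "hex value"
suggestion; the final implementation's field names/values may differ from
these proposal mnemonics), a demuxer or capssetter might produce caps such as:

    video/x-raw, format=(string)NV12, width=(int)1920, height=(int)1080,
        framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)2/1,
        multiview-mode=(string)sbs, multiview-flags=(int)0x0001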
### Proposed new buffer flags
Add two new GST_VIDEO_BUFFER flags in video-frame.h and make it clear that those
flags can apply to encoded video buffers too. wtay says that's currently the
case anyway, but the documentation should say it.

**GST_VIDEO_BUFFER_FLAG_MULTIPLE_VIEW** - Marks a buffer as representing non-mono content, although it may be a single (left or right) eye view.

**GST_VIDEO_BUFFER_FLAG_FIRST_IN_BUNDLE** - for frame-sequential methods of transport, mark the "first" of a left/right/other group of frames

### A new GstMultiviewMeta
This provides a place to describe all provided views in a buffer / stream,
and through Meta negotiation to inform decoders about which views to decode if
not all are wanted.

* Logical labels/names and mapping to GstVideoMeta numbers
* Standard view labels LEFT/RIGHT, and non-standard ones (strings)

    GST_VIDEO_MULTIVIEW_VIEW_LEFT = 1
    GST_VIDEO_MULTIVIEW_VIEW_RIGHT = 2

    struct GstVideoMultiviewViewInfo {
        guint view_label;
        guint meta_id;    // id of the GstVideoMeta for this view

        padding;
    }

    struct GstVideoMultiviewMeta {
        guint n_views;
        GstVideoMultiviewViewInfo *view_info;
    }

The meta is optional, and probably only useful later for MVC
Outputting stereo content
-------------------------
The initial implementation for output will be stereo content in glimagesink

### Output Considerations with OpenGL
* If we have support for stereo GL buffer formats, we can output separate left/right eye images and let the hardware take care of display.
* Otherwise, glimagesink needs to render one window with left/right in a suitable frame packing,
  and that will only show correctly in fullscreen on a device set for the right 3D packing -> requires app intervention to set the video mode.
* That could be done manually on the TV, or with HDMI 1.4 by setting the right video mode for the screen to inform the TV; or, as a third option, we
  support rendering to two separate overlay areas on the screen - one for left eye, one for right - which can be supported using the 'splitter' element and 2 output sinks or, better, by adding a 2nd window overlay for split stereo output
* Intel hardware doesn't do stereo GL buffers - only nvidia and AMD, so the initial implementation won't include that

## Other elements for handling multiview content
* videooverlay interface extensions
  * __Q__: Should this be a new interface?
  * Element message to communicate the presence of stereoscopic information to the app
  * App needs to be able to override the input interpretation - ie, set multiview-mode and multiview-flags
    * Most videos I've seen are side-by-side or top-bottom with no frame-packing metadata
  * New API for the app to set rendering options for stereo/multiview content
    * This might be best implemented as a **multiview GstContext**, so that
      the pipeline can share app preferences for content interpretation and downmixing
      to mono for output, or in the sink and have those down as far upstream/downstream as possible.
* Converter element
  * convert different view layouts
  * Render to anaglyphs of different types (magenta/green, red/blue, etc) and output as mono
* Mixer element
  * take 2 video streams and output as stereo
  * later take n video streams
  * share code with the converter, it just takes input from n pads instead of one.
* Splitter element
  * Output one pad per view
### Implementing MVC handling in decoders / parsers (and encoders)
Things to do to implement MVC handling

1. Parsing SEI in h264parse and setting caps (patches available in
   bugzilla for parsing, see below)
2. Integrate gstreamer-vaapi MVC support with this proposal
3. Help with [libav MVC implementation](https://wiki.libav.org/Blueprint/MVC)
4. generating SEI in H.264 encoder
5. Support for MPEG2 MVC extensions

## Relevant bugs
[bug 685215](https://bugzilla.gnome.org/show_bug.cgi?id=685215) - codecparser h264: Add initial MVC parser
[bug 696135](https://bugzilla.gnome.org/show_bug.cgi?id=696135) - h264parse: Add mvc stream parsing support
[bug 732267](https://bugzilla.gnome.org/show_bug.cgi?id=732267) - h264parse: extract base stream from MVC or SVC encoded streams

## Other Information
[Matroska 3D support notes](http://www.matroska.org/technical/specs/notes.html#3D)

## Open Questions

### Background
### Representation for GstGL
When uploading raw video frames to GL textures, the goal is to implement:

2. Split packed frames into separate GL textures when uploading, and
   attach multiple GstGLMemory's to the GstBuffer. The multiview-mode and
   multiview-flags fields in the caps should change to reflect the conversion
   from one incoming GstMemory to multiple GstGLMemory, and change the
   width/height in the output info as needed.

This is (currently) targeted as 2 render passes - upload as normal
to a single stereo-packed RGBA texture, and then unpack into 2
smaller textures, output with GST_VIDEO_MULTIVIEW_MODE_SEPARATED, as
2 GstGLMemory attached to one buffer. We can optimise the upload later
to go directly to 2 textures for common input formats.

Separate output textures have a few advantages:

* Filter elements can more easily apply filters in several passes to each
  texture without fundamental changes to our filters to avoid mixing pixels
  from separate views.
* Centralises the sampling of input video frame packings in the upload code,
  which makes adding new packings in the future easier.
* Sampling multiple textures to generate various output frame-packings
  for display is conceptually simpler than converting from any input packing
  to any output packing.
* In implementations that support quad buffers, having separate textures
  makes it trivial to do GL_LEFT/GL_RIGHT output

For either option, we'll need new glsink output API to pass more
information to applications about multiple views for the draw signal/callback.

I don't know if it's desirable to support *both* methods of representing
views. If so, that should be signalled in the caps too. That could be a
new multiview-mode for passing views in separate GstMemory objects
attached to a GstBuffer, which would not be GL specific.

### Overriding frame packing interpretation
Most sample videos available are frame packed, with no metadata
to say so. How should we override that interpretation?

* Simple answer: Use capssetter + new properties on playbin to
  override the multiview fields (see the sketch after this section)

*Basically implemented in playbin, using a pad probe. Needs more work for completeness*
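A hedged sketch of the capssetter approach for a file known to be side-by-side
(the capssetter element and its caps/join properties are real; the multiview
field names/values follow this proposal's mnemonics rather than any final
implementation, and the file path is made up):

    gst-launch-1.0 uridecodebin uri=file:///path/to/sample-sbs.mkv ! \
        capssetter join=true \
            caps="video/x-raw, multiview-mode=sbs, multiview-flags=(int)0x0001" ! \
        glimagesink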
### Adding extra GstVideoMeta to buffers
There should be one GstVideoMeta for the entire video frame in packed
layouts, and one GstVideoMeta per GstGLMemory when views are attached
to a GstBuffer separately. This should be done by the buffer pool,
which knows from the caps.

### videooverlay interface extensions
GstVideoOverlay needs:

* A way to announce the presence of multiview content when it is
  detected/signalled in a stream.
* A way to tell applications which output methods are supported/available
* A way to tell the sink which output method it should use
* Possibly a way to tell the sink to override the input frame
  interpretation / caps - depends on the answer to the question
  above about how to model overriding input interpretation.

### What's implemented
* Caps handling
* gst-plugins-base libgstvideo pieces
* playbin caps overriding
* conversion elements - glstereomix, gl3dconvert (needs a rename),
  glstereosplit.

### Possible future enhancements
* Make GLupload split to separate textures at upload time?
  * Needs new API to extract multiple textures from the upload. Currently only outputs 1 result RGBA texture.
* Make GLdownload able to take 2 input textures, pack them and colorconvert / download as needed.
  - currently done by packing and then downloading, which adds unwanted overhead for RGBA downloads
* Think about how we integrate GLstereo - do we need to do anything special,
  or can the app just render to stereo/quad buffers if they're available?