Encoding and Muxing
-------------------

Summary
-------
  A. Problems
  B. Goals
  1. EncodeBin
  2. Encoding Profile System
  3. Helper Library for Profiles
  I. Use-cases researched


A. Problems this proposal attempts to solve
-------------------------------------------

* Duplication of pipeline code in GStreamer-based applications
  wishing to encode and/or mux streams, leading to subtle differences
  and inconsistencies across those applications.

* No unified system for describing encoding targets for applications
  in a user-friendly way.

* No unified system for creating encoding targets for applications,
  resulting in duplication of code across all applications,
  differences and inconsistencies that come with that duplication,
  and applications hardcoding element names and settings, resulting in
  poor portability.



B. Goals
--------

1. Convenience encoding element

  Create a convenience GstBin for encoding and muxing several streams,
  hereafter called 'EncodeBin'.

  This element will only contain a single property, which is a
  profile.

2. Define an encoding profile system

3. Encoding profile helper library

  Create a helper library to:
  * create EncodeBin instances based on profiles, and
  * help applications to create/load/save/browse those profiles.




1. EncodeBin
------------

1.1 Proposed API
----------------

  EncodeBin is a GstBin subclass.

  It implements the GstTagSetter interface, by which it will proxy the
  calls to the muxer.

  Only two introspectable properties (i.e. usable without extra API):
  * A GstEncodingProfile*
  * The name of the profile to use

  When a profile is selected, encodebin will:
  * Add REQUEST sinkpads for all the GstStreamProfile
  * Create the muxer and expose the source pad

  Whenever a request pad is created, encodebin will:
  * Create the chain of elements for that pad
  * Ghost the sink pad
  * Return that ghost pad

  This reduces the code to a minimum for applications
  wishing to encode a source for a given profile:

  ...

  /* Create encodebin and select the profile by name */
  encbin = gst_element_factory_make ("encodebin", NULL);
  g_object_set (encbin, "profile", "Nokia N900/H264 HQ", NULL);
  gst_element_link (encbin, filesink);

  ...

  /* Request a video sink pad from encodebin and link the source to it */
  vsrcpad = gst_element_get_static_pad (source, "src1");
  vsinkpad = gst_element_get_request_pad (encbin, "video_%d");
  gst_pad_link (vsrcpad, vsinkpad);

  ...


1.2 Explanation of the Various stages in EncodeBin
--------------------------------------------------

  This describes the various stages which can happen in order to end
  up with a multiplexed stream that can then be stored or streamed.

1.2.1 Incoming streams

  The streams fed to EncodeBin can be of various types:

  * Video
    * Uncompressed (but maybe subsampled)
    * Compressed
  * Audio
    * Uncompressed (audio/x-raw-{int|float})
    * Compressed
  * Timed text
  * Private streams


1.2.2 Steps involved for raw video encoding

(0) Incoming Stream

(1) Transform raw video feed (optional)

  Here we modify the various fundamental properties of a raw video
  stream to be compatible with the intersection of:
  * The encoder GstCaps and
  * The specified "Stream Restriction" of the profile/target

  The fundamental properties that can be modified are:
  * width/height
    This is done with a video scaler.
    The DAR (Display Aspect Ratio) MUST be respected.
    If needed, black borders can be added to comply with the target DAR.
  * framerate
  * format/colorspace/depth
    All of this is done with a colorspace converter

(2) Actual encoding (optional for raw streams)

  An encoder (with some optional settings) is used.

(3) Muxing

  A muxer (with some optional settings) is used.

(4) Outgoing encoded and muxed stream


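The width/height handling in step (1) of the raw video case can be
illustrated with a small sketch that does not depend on GStreamer. It
assumes square pixels (so the DAR equals width/height), and the names
FitResult and fit_with_borders are hypothetical; in a real pipeline the
work would be done by a video scaler and a border-adding element.

```c
#include <stdint.h>

/* Hypothetical result type: size of the scaled picture plus the black
 * borders needed to fill the target frame. Not a GStreamer API. */
typedef struct {
  int width, height;        /* size of the scaled picture */
  int border_x, border_y;   /* total horizontal / vertical border, in pixels */
} FitResult;

/* Scale src_w x src_h into dst_w x dst_h while keeping the source DAR.
 * Assumes square pixels, so DAR == width / height. */
static FitResult
fit_with_borders (int src_w, int src_h, int dst_w, int dst_h)
{
  FitResult r;

  /* Compare src_w/src_h with dst_w/dst_h by cross-multiplication. */
  if ((int64_t) src_w * dst_h >= (int64_t) dst_w * src_h) {
    /* Source is at least as wide as the target: fill the width. */
    r.width = dst_w;
    r.height = (int) ((int64_t) src_h * dst_w / src_w);
  } else {
    /* Source is taller than the target: fill the height. */
    r.height = dst_h;
    r.width = (int) ((int64_t) src_w * dst_h / src_h);
  }
  r.border_x = dst_w - r.width;
  r.border_y = dst_h - r.height;
  return r;
}
```

For a 1920x1080 source and an 800x480 target, this yields an 800x450
picture with 30 rows of border in total, which a caller would typically
split between top and bottom.
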
1.2.3 Steps involved for raw audio encoding

  This is roughly the same as for raw video, except for (1)

(1) Transform raw audio feed (optional)

  We modify the various fundamental properties of a raw audio stream to
  be compatible with the intersection of:
  * The encoder GstCaps and
  * The specified "Stream Restriction" of the profile/target

  The fundamental properties that can be modified are:
  * Number of channels
  * Type of raw audio (integer or floating point)
  * Depth (number of bits required to encode one sample)


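Both the raw video and the raw audio transform steps restrict the stream
to the intersection of what the encoder accepts and what the profile
allows. For a single integer field this reduces to intersecting two
ranges. The sketch below uses a hypothetical IntRange type to show the
idea; real code would intersect GstCaps instead.

```c
#include <stdbool.h>

/* Hypothetical closed integer range [min, max], standing in for one
 * ranged field of a caps structure (for example "channels"). */
typedef struct {
  int min, max;
} IntRange;

/* Store the intersection of a and b in *out.
 * Returns false (leaving *out untouched) when the ranges do not overlap. */
static bool
int_range_intersect (IntRange a, IntRange b, IntRange *out)
{
  int lo = a.min > b.min ? a.min : b.min;
  int hi = a.max < b.max ? a.max : b.max;

  if (lo > hi)
    return false;
  out->min = lo;
  out->max = hi;
  return true;
}
```

With an encoder accepting 1 to 6 channels and a profile restricted to
channels=[1,2], the usable range is [1,2]. An empty intersection means
the profile cannot be satisfied by that encoder.
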
1.2.4 Steps involved for encoded audio/video streams

  Steps (1) and (2) are replaced by a parser if a parser is available
  for the given format.


1.2.5 Steps involved for other streams

  Other streams will just be forwarded as-is to the muxer, provided the
  muxer accepts the stream type.




2. Encoding Profile System
--------------------------

  This work is based on:
  * The existing GstPreset system for elements [0]
  * The gnome-media GConf audio profile system [1]
  * The investigation done into device profiles by Arista and
    Transmageddon [2 and 3]

2.2 Terminology
---------------

* Encoding Target Category
  A Target Category is a classification of devices/systems/use-cases
  for encoding.

  Such a classification is required in order for:
  * Applications with a very specific use-case to limit the number of
    profiles they can offer the user. A screencasting application has
    no use for the online service targets, for example.
  * Offering the user some initial classification in the case of a
    more generic encoding application (like a video editor or a
    transcoder).

  Ex:
    Consumer devices
    Online service
    Intermediate Editing Format
    Screencast
    Capture
    Computer

* Encoding Profile Target
  A Profile Target describes a specific entity for which we wish to
  encode.
  A Profile Target must belong to at least one Target Category.
  It will define at least one Encoding Profile.

  Ex (with category):
    Nokia N900 (Consumer device)
    Sony PlayStation 3 (Consumer device)
    YouTube (Online service)
    DNxHD (Intermediate editing format)
    HuffYUV (Screencast)
    Theora (Computer)

* Encoding Profile
  A specific combination of muxer, encoders, presets and limitations.

  Ex:
    Nokia N900/H264 HQ
    iPod/High Quality
    DVD/PAL
    YouTube/High Quality
    HTML5/Low Bandwidth
    DNxHD

2.3 Encoding Profile
--------------------

An encoding profile requires the following information:

* Name
  This string is not translatable and must be unique.
  A recommendation to guarantee uniqueness of the naming could be:
    <target>/<name>
* Description
  This is a translatable string describing the profile
* Muxing format
  This is a string containing the GStreamer media-type of the
  container format.
* Muxing preset
  This is an optional string describing the preset(s) to use on the
  muxer.
* Multipass setting
  This is a boolean describing whether the profile requires several
  passes.
* List of Stream Profiles

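As an illustration of the recommended <target>/<name> naming, the
following sketch splits a profile name at its first '/'. The function
name split_profile_name is hypothetical and not part of the proposed
API. Since the naming scheme is only a recommendation, names without a
separator (such as "DNxHD") are simply reported as not splittable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Split a profile name of the form "<target>/<name>" at the first '/'.
 * Both parts are copied into caller-provided buffers.
 * Returns false if there is no '/', if either part is empty, or if a
 * buffer is too small. */
static bool
split_profile_name (const char *full, char *target, size_t target_len,
    char *name, size_t name_len)
{
  const char *sep = strchr (full, '/');
  size_t tlen, nlen;

  if (sep == NULL)
    return false;
  tlen = (size_t) (sep - full);
  nlen = strlen (sep + 1);
  if (tlen == 0 || nlen == 0 || tlen >= target_len || nlen >= name_len)
    return false;

  memcpy (target, full, tlen);
  target[tlen] = '\0';
  memcpy (name, sep + 1, nlen + 1);     /* includes the terminating NUL */
  return true;
}
```
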
2.3.1 Stream Profiles

A Stream Profile consists of:

* Type
  The type of stream profile (audio, video, text, private-data)
* Encoding Format
  This is a string containing the GStreamer media-type of the encoding
  format to be used. If encoding is not to be applied, the raw media
  type (e.g. raw audio) will be used.
* Encoding preset
  This is an optional string describing the preset(s) to use on the
  encoder.
* Restriction
  This is an optional GstCaps containing the restriction of the
  stream that can be fed to the encoder.
  This will generally contain restrictions on video
  width/height/framerate or audio depth.
* presence
  This is an integer specifying how many streams can be used in the
  containing profile. 0 means that any number of streams can be
  used.
* pass
  This is an integer which is only meaningful if the multipass flag
  has been set in the profile. If it has been set, it indicates which
  pass this Stream Profile corresponds to.

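The fields listed for Encoding Profiles and Stream Profiles map
naturally onto two plain C structures. The sketch below is only an
illustration: the type and field names are hypothetical, the proposed
API lives in gstprofile.h and may differ, and "presence" is interpreted
here as an upper bound on the number of streams.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the fields described above; not the API
 * proposed in gstprofile.h. */
typedef enum {
  STREAM_AUDIO, STREAM_VIDEO, STREAM_TEXT, STREAM_PRIVATE
} StreamType;

typedef struct {
  StreamType type;
  const char *format;       /* media type, e.g. "video/x-h264" */
  const char *preset;       /* optional encoder preset(s), may be NULL */
  const char *restriction;  /* optional caps string, may be NULL */
  unsigned presence;        /* 0 means any number of streams */
  unsigned pass;            /* only meaningful for multipass profiles */
} StreamProfile;

typedef struct {
  const char *name;         /* unique, not translatable */
  const char *description;  /* translatable */
  const char *format;       /* container media type */
  const char *preset;       /* optional muxer preset(s), may be NULL */
  bool multipass;
  const StreamProfile *streams;
  size_t n_streams;
} EncodingProfile;

/* Assumption: presence is a maximum. Returns whether one more stream
 * may be requested for sp when in_use streams already exist. */
static bool
stream_profile_accepts_more (const StreamProfile *sp, unsigned in_use)
{
  return sp->presence == 0 || in_use < sp->presence;
}
```

An element such as EncodeBin could use a check like this when deciding
whether to hand out another request pad for a given Stream Profile.
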
2.4 Example profile
-------------------

The representation used here is XML only as an example. No decision is
made as to which formatting to use for storing targets and profiles.

<gst-encoding-target>
  <name>Nokia N900</name>
  <category>Consumer Device</category>
  <profiles>
    <profile>Nokia N900/H264 HQ</profile>
    <profile>Nokia N900/MP3</profile>
    <profile>Nokia N900/AAC</profile>
  </profiles>
</gst-encoding-target>

<gst-encoding-profile>
  <name>Nokia N900/H264 HQ</name>
  <description>
    High Quality H264/AAC for the Nokia N900
  </description>
  <format>video/quicktime,variant=iso</format>
  <streams>
    <stream-profile>
      <type>audio</type>
      <format>audio/mpeg,mpegversion=4</format>
      <preset>Quality High/Main</preset>
      <restriction>audio/x-raw-int,channels=[1,2]</restriction>
      <presence>1</presence>
    </stream-profile>
    <stream-profile>
      <type>video</type>
      <format>video/x-h264</format>
      <preset>Profile Baseline/Quality High</preset>
      <restriction>
        video/x-raw-yuv,width=[16, 800],\
          height=[16, 480],framerate=[1/1, 30000/1001]
      </restriction>
      <presence>1</presence>
    </stream-profile>
  </streams>

</gst-encoding-profile>

2.5 API
-------
A proposed C API is contained in the gstprofile.h file in this directory.


2.6 Modifications required in the existing GstPreset system
-----------------------------------------------------------

2.6.1. Temporary preset.

  Currently a preset needs to be saved on disk in order to be
  used.

  This makes it impossible to have temporary presets (that exist only
  during the lifetime of a process), which might be required in the
  newly proposed profile system.

2.6.2 Categorisation of presets.

  Currently presets are just aliases for a group of property/value
  pairs, without any meaning or explanation as to how they exclude each
  other.

  Take for example the H264 encoder. It can have presets for:
  * passes (1, 2 or 3 passes)
  * profiles (Baseline, Main, ...)
  * quality (Low, Medium, High)

  In order to programmatically know which presets exclude each other,
  we here propose the categorisation of these presets.

  This can be done in one of two ways:
  1. in the name (by making the name be [<category>:]<name>)
     This would give for example: "Quality:High", "Profile:Baseline"
  2. by adding a new _meta key
     This would give for example: _meta/category:quality

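With the first option, [<category>:]<name>, deciding whether two presets
exclude each other becomes a matter of comparing their category
prefixes. The sketch below shows one way this could work. The function
names are hypothetical, the comparison is case-sensitive by assumption,
and presets without a category are treated as never excluding anything.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of the category prefix in "[<category>:]<name>", or 0 if the
 * preset name carries no category. */
static size_t
preset_category_len (const char *preset)
{
  const char *colon = strchr (preset, ':');

  return colon ? (size_t) (colon - preset) : 0;
}

/* Two presets exclude each other when both carry the same non-empty
 * category, e.g. "Quality:High" and "Quality:Low". */
static bool
presets_exclude_each_other (const char *a, const char *b)
{
  size_t la = preset_category_len (a);
  size_t lb = preset_category_len (b);

  return la != 0 && la == lb && strncmp (a, b, la) == 0;
}
```
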
2.6.3 Aggregation of presets.

  There can be more than one choice of presets to be made for an
  element (quality, profile, pass).

  This means that one cannot currently describe the full
  configuration of an element with a single string, but only with
  several.

  The proposal here is to extend the GstPreset API to be able to set
  all presets using one string and a well-known separator ('/').

  This change only requires changes in the core preset handling code.

  This would allow doing the following:
    gst_preset_load_preset (h264enc,
        "pass:1/profile:baseline/quality:high");

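To make the aggregation idea concrete, the sketch below extracts the
individual presets from an aggregate string using '/' as separator. It
is not the proposed GstPreset change itself: aggregate_get_preset is a
hypothetical helper, and the core preset code would then load each
extracted preset in turn.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy the index-th '/'-separated preset of aggregate into buf.
 * Returns false if there is no such part, if the part is empty, or if
 * buf is too small. */
static bool
aggregate_get_preset (const char *aggregate, size_t index,
    char *buf, size_t buf_len)
{
  const char *p = aggregate;

  for (;;) {
    size_t len = strcspn (p, "/");      /* length of the current part */

    if (index == 0) {
      if (len == 0 || len >= buf_len)
        return false;
      memcpy (buf, p, len);
      buf[len] = '\0';
      return true;
    }
    if (p[len] == '\0')
      return false;                     /* ran out of parts */
    p += len + 1;                       /* skip the part and its separator */
    index--;
  }
}
```

A caller would iterate from index 0 until the function returns false.
Note that with this simple approach an empty part (as in "a//b") also
ends the iteration.
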
2.7 Points to be determined
---------------------------

  This document hasn't determined yet how to solve the following
  problems:

2.7.1 Storage of profiles

  One proposal for storage would be to use a system-wide directory
  (like $prefix/share/gstreamer-0.10/profiles) and store an XML file for
  each individual profile.

  Users could then add their own profiles in ~/.gstreamer-0.10/profiles

  This poses some limitations as to what to do if some applications
  want to have some profiles limited to their own usage.


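Since storage is still undecided, the following is only a sketch of how
the two directories mentioned in the storage proposal could be built
from an installation prefix and a home directory. The function names
and the PROFILE_API_VERSION macro are hypothetical, and nothing here
says which directory should take precedence.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical macro for the versioned directory name used above. */
#define PROFILE_API_VERSION "0.10"

/* Build "<prefix>/share/gstreamer-0.10/profiles" into buf.
 * Returns false if the result does not fit. */
static bool
system_profile_dir (const char *prefix, char *buf, size_t buf_len)
{
  int n = snprintf (buf, buf_len, "%s/share/gstreamer-%s/profiles",
      prefix, PROFILE_API_VERSION);

  return n >= 0 && (size_t) n < buf_len;
}

/* Build "<home>/.gstreamer-0.10/profiles" into buf.
 * Returns false if the result does not fit. */
static bool
user_profile_dir (const char *home, char *buf, size_t buf_len)
{
  int n = snprintf (buf, buf_len, "%s/.gstreamer-%s/profiles",
      home, PROFILE_API_VERSION);

  return n >= 0 && (size_t) n < buf_len;
}
```
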
3. Helper library for profiles
------------------------------

  These helper methods could also be added to existing libraries (like
  GstPreset, GstPbUtils, ..).

  The various APIs proposed are in the accompanying gstprofile.h file.

3.1 Getting user-readable names for formats

  This is already provided by GstPbUtils.

3.2 Hierarchy of profiles

  The goal is for applications to be able to present to the user a list
  of combo-boxes for choosing their output profile:

  [ Category ]       # optional, depends on the application
  [ Device/Site/.. ] # optional, depends on the application
  [ Profile ]

  Convenience methods are offered to easily get lists of categories,
  devices, and profiles.

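The three combo-boxes amount to progressively filtering a flat list of
profiles by category and then by device/site. The sketch below shows
that filtering over an in-memory table. ProfileEntry and list_profiles
are hypothetical names; the actual convenience methods are the ones
proposed in gstprofile.h.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flat record tying a profile to its target and category. */
typedef struct {
  const char *category;   /* e.g. "Consumer device" */
  const char *target;     /* e.g. "Nokia N900" */
  const char *profile;    /* e.g. "Nokia N900/H264 HQ" */
} ProfileEntry;

/* Collect the names of the profiles matching category and target.
 * A NULL category or target matches everything. At most max names are
 * written to out; the number written is returned. */
static size_t
list_profiles (const ProfileEntry *entries, size_t n_entries,
    const char *category, const char *target,
    const char **out, size_t max)
{
  size_t i, n = 0;

  for (i = 0; i < n_entries && n < max; i++) {
    if (category != NULL && strcmp (entries[i].category, category) != 0)
      continue;
    if (target != NULL && strcmp (entries[i].target, target) != 0)
      continue;
    out[n++] = entries[i].profile;
  }
  return n;
}
```

An application that skips the optional combo-boxes would pass NULL for
the corresponding argument and get the unfiltered list.
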
3.3 Creating Profiles

  The goal is for applications to be able to easily create profiles.

  Applications need a fast/efficient way to:
  * select a container format and see all compatible streams that can
    be used with it.
  * select a codec format and see which container formats can be used
    with it.

  The remaining parts concern the restrictions to encoder
  input.

3.4 Ensuring availability of plugins for Profiles

  When an application wishes to use a Profile, it should be able to
  query whether it has all the needed plugins to use it.

  This part will use GstPbUtils to query, and if needed install, the
  missing plugins through the installed distribution plugin installer.


I. Use-cases researched
-----------------------

  This is a list of various use-cases where encoding/muxing is being
  used.

* Transcoding

  The goal is to convert any input file for a target use with as
  little loss of quality as possible.
  A specific variant of this is transmuxing (see below).

  Example applications: Arista, Transmageddon

* Rendering timelines

  The incoming streams are a collection of various segments that need
  to be rendered.
  Those segments can vary in nature (i.e. the video width/height can
  change).
  This requires the use of identity with the single-segment property
  activated to transform the incoming collection of segments to a
  single continuous segment.

  Example applications: PiTiVi, Jokosher

* Encoding of live sources

  The major risk to take into account is the encoder not encoding the
  incoming stream fast enough. This is outside of the scope of
  encodebin, and should be solved by using queues between the sources
  and encodebin, as well as implementing QoS in encoders and sources
  (the encoders emitting QoS events, and the upstream elements
  adapting themselves accordingly).

  Example applications: camerabin, cheese

* Screencasting applications

  This is similar to encoding of live sources.
  The difference is that, due to the nature of the source (size and
  amount/frequency of updates), one might want to do the encoding in
  two parts:
  * The actual live capture is encoded with an 'almost-lossless' codec
    (such as huffyuv)
  * Once the capture is done, the file created in the first step is
    then rendered to the desired target format.

  Fixing sources to only emit region-updates and having encoders
  capable of encoding those streams would remove the need for the first
  step but is outside of the scope of encodebin.

  Example applications: Istanbul, gnome-shell, recordmydesktop

* Live transcoding

  This is the case of an incoming live stream which will be
  broadcast/transmitted live.
  One issue to take into account is to reduce the encoding latency to
  a minimum. This should mostly be done by picking low-latency
  encoders.

  Example applications: Rygel, Coherence

* Transmuxing

  Given a certain file, the aim is to remux the contents WITHOUT
  decoding into either a different container format or the same
  container format.
  Remuxing into the same container format is useful when the file was
  not created properly (for example, the index is missing).
  Whenever available, parsers should be applied on the encoded streams
  to validate and/or fix the streams before muxing them.

  Metadata from the original file must be kept in the newly created
  file.

  Example applications: Arista, Transmageddon

* Loss-less cutting

  Given a certain file, the aim is to extract a certain part of the
  file without going through the process of decoding and re-encoding
  that file.
  This is similar to the transmuxing use-case.

  Example applications: PiTiVi, Transmageddon, Arista, ...

* Multi-pass encoding

  Some encoders allow doing a multi-pass encoding.
  The initial pass(es) are only used to collect encoding estimates and
  are not actually muxed and output.
  The final pass uses the previously collected information, and its
  result is then muxed and output.

* Archiving and intermediary format

  The requirement is to have lossless encoding.

* CD ripping

  Example applications: Sound-juicer

* DVD ripping

  Example application: Thoggen



* Research links

  Some of these are still active documents, others are not.

[0] GstPreset API documentation
    http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstPreset.html

[1] gnome-media GConf profiles
    http://www.gnome.org/~bmsmith/gconf-docs/C/gnome-media.html

[2] Research on a Device Profile API
    http://gstreamer.freedesktop.org/wiki/DeviceProfile

[3] Research on defining presets usage
    http://gstreamer.freedesktop.org/wiki/PresetDesign