docs/design: Add encoding/profile proposal/design

Edward Hervey 2009-10-07 16:23:22 +02:00
parent 640cf95158
commit 07b1bbef43
4 changed files with 794 additions and 0 deletions


@ -0,0 +1,106 @@
GStreamer: Research into encoding and muxing
--------------------------------------------
Use Cases
---------
This is a list of various use-cases where encoding/muxing is being
used.
* Transcoding
The goal is to convert any input file for a target use with as little
loss of quality as possible.
A specific variant of this is transmuxing (see below).
Example applications: Arista, Transmageddon
* Rendering timelines
The incoming streams are a collection of various segments that need
to be rendered.
Those segments can vary in nature (i.e. the video width/height can
change).
This requires the use of identity with the single-segment property
activated to transform the incoming collection of segments to a
single continuous segment.
Example applications: PiTiVi, Jokosher
* Encoding of live sources
The major risk to take into account is the encoder not encoding the
incoming stream fast enough. This is outside of the scope of
encodebin, and should be solved by using queues between the sources
and encodebin, as well as implementing QoS in encoders and sources
(the encoders emitting QoS events, and the upstream elements
adapting themselves accordingly).
Example applications: camerabin, cheese
* Screencasting applications
This is similar to encoding of live sources.
The difference is that, due to the nature of the source (size and
amount/frequency of updates), one might want to do the encoding in
two parts:
* The actual live capture is encoded with an 'almost-lossless' codec
(such as huffyuv)
* Once the capture is done, the file created in the first step is
then rendered to the desired target format.
Fixing sources to only emit region-updates and having encoders
capable of encoding those streams would fix the need for the first
step but is outside of the scope of encodebin.
Example applications: Istanbul, gnome-shell, recordmydesktop
* Live transcoding
This is the case of an incoming live stream which will be
broadcast/transmitted live.
One issue to take into account is to reduce the encoding latency to
a minimum. This should mostly be done by picking low-latency
encoders.
Example applications: Rygel, Coherence
* Transmuxing
Given a certain file, the aim is to remux the contents WITHOUT
decoding into either a different container format or the same
container format.
Remuxing into the same container format is useful when the file was
not created properly (for example, the index is missing).
Whenever available, parsers should be applied on the encoded streams
to validate and/or fix the streams before muxing them.
Metadata from the original file must be kept in the newly created
file.
Example applications: Arista, Transmageddon
* Loss-less cutting
Given a certain file, the aim is to extract a certain part of the
file without going through the process of decoding and re-encoding
that file.
This is similar to the transmuxing use-case.
Example applications: PiTiVi, Transmageddon, Arista, ...
* Multi-pass encoding
Some encoders allow doing a multi-pass encoding.
The initial pass(es) are only used to collect encoding estimates and
are not actually muxed and output.
The final pass uses the previously collected information, and its
result is then muxed and output.
* CD ripping
Example applications: Sound-juicer
* DVD ripping

docs/design/encoding.txt Normal file

@ -0,0 +1,423 @@
Encoding and Muxing
-------------------
Summary
-------
0 Problems
0 Goals
1. EncodeBin
2. Encoding Profile System
3. Helper Library for Profiles
0. Problems this proposal attempts to solve
-------------------------------------------
* Duplication of pipeline code for GStreamer-based applications
wishing to encode and/or mux streams, leading to subtle differences
and inconsistencies across those applications.
* No unified system for describing encoding targets for applications
in a user-friendly way.
* No unified system for creating encoding targets for applications,
resulting in duplication of code across all applications, and the
differences and inconsistencies that come with that duplication.
0. Goals
--------
1. Convenience encoding element
Create a convenience GstBin for encoding and muxing several streams,
hereafter called 'EncodeBin'.
This element will only contain one single property, which is a
profile.
2. Define an encoding profile system
3. Encoding profile helper library
Create a helper library to:
* create EncodeBin instances based on profiles, and
* help applications to create/load/save those profiles.
1. EncodeBin
------------
1.1 Proposed API
----------------
EncodeBin is a GstBin subclass.
It implements the GstTagSetter interface, by which it will proxy the
calls to the muxer.
Only two introspectable properties (i.e. usable without extra API):
* A GstEncodingProfile*
* The name of the profile to use
When a profile is selected, encodebin will:
* Add REQUEST sinkpads for all the GstStreamProfile
* Create the muxer and expose the source pad
Whenever a request pad is created, encodebin will:
* Create the chain of elements for that pad
* Ghost the sink pad
* Return that ghost pad
This allows reducing the code to the minimum for applications
wishing to encode a source for a given profile:
...
encbin = gst_element_factory_make ("encodebin", NULL);
g_object_set (encbin, "profile", "N900/H264 HQ", NULL);
gst_element_link (encbin, filesink);
...
vsrcpad = gst_element_get_static_pad (source, "src1");
vsinkpad = gst_element_get_request_pad (encbin, "video_%d");
gst_pad_link(vsrcpad, vsinkpad);
...
1.2 Explanation of the Various stages in EncodeBin
--------------------------------------------------
This describes the various stages which can happen in order to end
up with a multiplexed stream that can then be stored or streamed.
1.2.1 Incoming streams
The streams fed to EncodeBin can be of various types:
* Video
* Uncompressed (but maybe subsampled)
* Compressed
* Audio
* Uncompressed (audio/x-raw-{int|float})
* Compressed
* Timed text
* Private streams
1.2.2 Steps involved for raw video encoding
(0) Incoming Stream
(1) Transform raw video feed (optional)
Here we modify the various fundamental properties of a raw video
stream to be compatible with the intersection of:
* The encoder GstCaps and
* The specified "Stream Restriction" of the profile/target
The fundamental properties that can be modified are:
* width/height
This is done with a video scaler.
The DAR (Display Aspect Ratio) MUST be respected.
If needed, black borders can be added to comply with the target DAR.
* framerate
* format/colorspace/depth
All of this is done with a colorspace converter
(2) Actual encoding (optional for raw streams)
An encoder (with some optional settings) is used.
(3) Muxing
A muxer (with some optional settings) is used.
(4) Outgoing encoded and muxed stream
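The width/height handling of step (1) can be sketched as follows. This
is only an illustration, assuming square pixels on both sides (so that
the DAR equals width/height); the names LetterboxResult and
letterbox_fit are hypothetical and not part of the proposed API. In
encodebin the actual work would be done by a video scaler and a
border-adding element.

```c
#include <assert.h>

/* Hypothetical result type: the scaled picture size plus the black
 * borders needed to fill the target frame. */
typedef struct {
  int scaled_width;
  int scaled_height;
  int border_x;   /* total horizontal padding (left + right) */
  int border_y;   /* total vertical padding (top + bottom) */
} LetterboxResult;

/* Fit a src_w x src_h picture into a dst_w x dst_h frame without
 * changing its display aspect ratio. Assumes square pixels. */
static LetterboxResult
letterbox_fit (int src_w, int src_h, int dst_w, int dst_h)
{
  LetterboxResult r;

  assert (src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);

  /* Compare src_w/src_h with dst_w/dst_h by cross-multiplication, which
   * avoids floating point. */
  if ((long) src_w * dst_h >= (long) dst_w * src_h) {
    /* Source is wider (or equal): fill the width, pad top and bottom. */
    r.scaled_width = dst_w;
    r.scaled_height = (int) ((long) src_h * dst_w / src_w);
  } else {
    /* Source is taller: fill the height, pad left and right. */
    r.scaled_height = dst_h;
    r.scaled_width = (int) ((long) src_w * dst_h / src_h);
  }
  r.border_x = dst_w - r.scaled_width;
  r.border_y = dst_h - r.scaled_height;
  return r;
}
```

For a 1920x1080 source and an 800x480 target, this gives an 800x450
picture with 30 lines of border split between top and bottom.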
1.2.3 Steps involved for raw audio encoding
This is roughly the same as for raw video, except for (1)
(1) Transform raw audio feed (optional)
We modify the various fundamental properties of a raw audio stream to
be compatible with the intersection of:
* The encoder GstCaps and
* The specified "Stream Restriction" of the profile/target
The fundamental properties that can be modified are:
* Number of channels
* Type of raw audio (integer or floating point)
* Depth (number of bits required to encode one sample)
1.2.4 Steps involved for encoded audio/video streams
Steps (1) and (2) are replaced by a parser if a parser is available
for the given format.
1.2.5 Steps involved for other streams
Other streams will just be forwarded as-is to the muxer, provided the
muxer accepts the stream type.
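The choice between the chains described in 1.2.2 to 1.2.5 can be
sketched as a small decision function. This is a hedged illustration
only: ChainKind and choose_chain are hypothetical names, raw streams
are recognised by the audio/x-raw-* and video/x-raw-* media types used
elsewhere in this document, and forwarding an encoded stream as-is when
no parser is available is an assumption, not something the proposal
states.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of the per-stream chain to build. */
typedef enum {
  CHAIN_TRANSFORM_AND_ENCODE,  /* 1.2.2 / 1.2.3: raw audio or video */
  CHAIN_PARSE,                 /* 1.2.4: encoded stream with a parser */
  CHAIN_PASSTHROUGH            /* 1.2.5, or no parser (assumption) */
} ChainKind;

static int
has_prefix (const char *s, const char *prefix)
{
  return strncmp (s, prefix, strlen (prefix)) == 0;
}

/* Decide which chain an incoming stream needs, based on its media type
 * and on whether a parser exists for that format. */
static ChainKind
choose_chain (const char *media_type, int parser_available)
{
  assert (media_type != NULL);

  if (has_prefix (media_type, "audio/x-raw-") ||
      has_prefix (media_type, "video/x-raw-"))
    return CHAIN_TRANSFORM_AND_ENCODE;

  if ((has_prefix (media_type, "audio/") ||
          has_prefix (media_type, "video/")) && parser_available)
    return CHAIN_PARSE;

  return CHAIN_PASSTHROUGH;
}
```

In all three cases the stream still has to be accepted by the muxer;
that check is left out of the sketch.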
2. Encoding Profile System
--------------------------
This work is based on:
* The existing GstPreset system for elements [0]
* The gnome-media GConf audio profile system [1]
* The investigation done into device profiles by Arista and
Transmageddon [2 and 3]
2.2 Terminology
---------------
* Encoding Target Category
A Target Category is a classification of devices/systems/use-cases
for encoding.
Such a classification is required in order for:
* Applications with a very-specific use-case to limit the number of
profiles they can offer the user. A screencasting application has no
use for the online service targets, for example.
* Offering the user some initial classification in the case of a
more generic encoding application (like a video editor or a
transcoder).
Ex:
Consumer devices
Online service
Intermediate editing format
Screencast
Capture
Computer
* Encoding Profile Target
A Profile Target describes a specific entity for which we wish to
encode.
A Profile Target must belong to at least one Target Category.
It will define at least one Encoding Profile.
Ex (with category):
Nokia N900 (Consumer device)
Sony PlayStation 3 (Consumer device)
Youtube (Online service)
DNxHD (Intermediate editing format)
HuffYUV (Screencast)
Theora (Computer)
* Encoding Profile
A specific combination of muxer, encoders, presets and limitations.
Ex:
Nokia N900/H264 HQ
Ipod/High Quality
DVD/Pal
Youtube/High Quality
HTML5/Low Bandwidth
DNxHD
2.3 Encoding Profile
--------------------
An encoding profile requires the following information:
* Name
This string is not translatable and must be unique.
A recommendation to guarantee uniqueness of the naming could be:
<target>/<name>
* Description
This is a translatable string describing the profile
* Muxing format
This is a string containing the GStreamer mime-type of the
container format.
* Muxing preset
This is an optional string describing the preset(s) to use on the
muxer.
* Multipass setting
This is a boolean describing whether the profile requires several
passes.
* List of Stream Profiles
2.3.1 Stream Profiles
A Stream Profile consists of:
* Type
The type of stream profile (audio, video, text, private-data)
* Encoding Format
This is a string containing the GStreamer mime-type of the encoding
format to be used. If encoding is not to be applied, the raw audio
mime type will be used.
* Encoding preset
This is an optional string describing the preset(s) to use on the
encoder.
* Restriction
This is an optional GstCaps containing the restriction of the
stream that can be fed to the encoder.
This will generally contain restrictions on video
width/height/framerate or audio depth.
* presence
This is an integer specifying how many streams can be used in the
containing profile. 0 means that any number of streams can be
used.
* pass
This is an integer which is only meaningful if the multipass flag
has been set in the profile. If it has been set it indicates which
pass this Stream Profile corresponds to.
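The meaning of the presence field can be sketched as a check made
before handing out another request pad. This is a minimal
illustration, assuming presence is an upper bound on the number of
streams; StreamProfileSketch and stream_profile_can_add are
hypothetical names and not part of the proposed API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a Stream Profile: only the field
 * needed for this check. */
typedef struct {
  unsigned int presence;   /* 0 means any number of streams */
} StreamProfileSketch;

/* Returns 1 if one more stream may be requested for this stream
 * profile, given how many are already in use; otherwise returns 0. */
static int
stream_profile_can_add (const StreamProfileSketch *sp, unsigned int in_use)
{
  assert (sp != NULL);

  if (sp->presence == 0)
    return 1;
  return in_use < sp->presence;
}
```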
2.4 Example profile
-------------------
The representation used here is XML only as an example. No decision is
made as to which formatting to use for storing targets and profiles.
<gst-encoding-target>
<name>Nokia N900</name>
<category>Consumer Device</category>
<profiles>
<profile>Nokia N900/H264 HQ</profile>
<profile>Nokia N900/MP3</profile>
<profile>Nokia N900/AAC</profile>
</profiles>
</gst-encoding-target>
<gst-encoding-profile>
<name>Nokia N900/H264 HQ</name>
<description>
High Quality H264/AAC for the Nokia N900
</description>
<format>video/quicktime,variant=iso</format>
<streams>
<stream-profile>
<type>audio</type>
<format>audio/mpeg,mpegversion=4</format>
<preset>Quality High/Main</preset>
<restriction>audio/x-raw-int,channels=[1,2]</restriction>
<presence>1</presence>
</stream-profile>
<stream-profile>
<type>video</type>
<format>video/x-h264</format>
<preset>Profile Baseline/Quality High</preset>
<restriction>
video/x-raw-yuv,width=[16, 800],\
height=[16, 480],framerate=[1/1, 30000/1001]
</restriction>
<presence>1</presence>
</stream-profile>
</streams>
</gst-encoding-profile>
2.5 API
-------
A proposed C API is contained in the gstprofile.h file in this directory.
2.6 Modifications required in the existing GstPreset system
-----------------------------------------------------------
2.6.1. Temporary preset.
Currently a preset needs to be saved on disk in order to be
used.
This makes it impossible to have temporary presets (that exist only
during the lifetime of a process), which might be required in the
new proposed profile system.
2.6.2 Categorisation of presets.
Currently presets are just aliases of a group of property/value
without any meanings or explanation as to how they exclude each
other.
Take for example the H264 encoder. It can have presets for:
* passes (1,2 or 3 passes)
* profiles (Baseline, Main, ...)
* quality (Low, medium, High)
In order to programmatically know which presets exclude each other,
we here propose the categorisation of these presets.
This can be done in one of two ways:
1. in the name (by making the name be [<category>:]<name>)
This would give for example: "Quality:High", "Profile:Baseline"
2. by adding a new _meta key
This would give for example: _meta/category:quality
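For the first option, splitting a name of the form
[<category>:]<name> could look like the sketch below. It is only an
illustration in plain C; preset_split_category is a hypothetical
helper, not an existing or proposed GstPreset function.

```c
#include <assert.h>
#include <string.h>

/* Split "[<category>:]<name>" into its two parts. The category is set
 * to the empty string when no ':' is present.
 * Returns 1 on success, 0 if one of the output buffers is too small. */
static int
preset_split_category (const char *full,
    char *category, size_t category_size,
    char *name, size_t name_size)
{
  const char *colon, *rest;
  size_t category_len;

  assert (full != NULL && category != NULL && name != NULL);

  colon = strchr (full, ':');
  category_len = colon ? (size_t) (colon - full) : 0;
  rest = colon ? colon + 1 : full;

  if (category_len + 1 > category_size || strlen (rest) + 1 > name_size)
    return 0;

  memcpy (category, full, category_len);
  category[category_len] = '\0';
  strcpy (name, rest);
  return 1;
}
```

Two presets whose category parts compare equal would then be treated
as mutually exclusive.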
2.6.3 Aggregation of presets.
There can be more than one choice of presets to be done for an
element (quality, profile, pass).
This means that one can not currently describe the full
configuration of an element with a single string but with many.
The proposal here is to extend the GstPreset API to be able to set
all presets using one string and a well-known separator ('/').
This change only requires changes in the core preset handling code.
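Splitting such an aggregated string on the separator can be sketched
as below. This is a hedged illustration of the parsing only:
preset_foreach_component and PresetComponentFunc are hypothetical
names, and skipping empty components is an assumption. In the core,
each component would presumably be loaded the same way a single preset
name is loaded today.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical callback, invoked once per preset component. The
 * component is NOT NUL-terminated; use len. */
typedef void (*PresetComponentFunc) (const char *component, size_t len,
    void *user_data);

/* Walk an aggregated preset string such as
 * "pass:1/profile:baseline/quality:high", calling func (if not NULL)
 * for every non-empty component. Returns the number of components. */
static unsigned int
preset_foreach_component (const char *aggregate, PresetComponentFunc func,
    void *user_data)
{
  unsigned int count = 0;
  const char *start = aggregate;

  assert (aggregate != NULL);

  while (*start != '\0') {
    const char *end = strchr (start, '/');
    size_t len = end ? (size_t) (end - start) : strlen (start);

    if (len > 0) {
      /* Empty components (as in "a//b") are skipped. */
      if (func)
        func (start, len, user_data);
      count++;
    }
    if (end == NULL)
      break;
    start = end + 1;
  }
  return count;
}
```

With the splitting done inside the preset code, applications only ever
deal with the single-string form.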
This would allow doing the following:
gst_preset_load_preset (h264enc,
"pass:1/profile:baseline/quality:high");
3. Helper library for profiles
------------------------------
These helper methods could also be added to existing libraries (like
GstPreset, GstPbUtils, ..).
The various API proposed are in the accompanying gstprofile.h file.
3.1 Getting user-readable names for formats
This is already provided by GstPbUtils.
3.2 Hierarchy of profiles
The goal is for applications to be able to present to the user a list
of combo-boxes for choosing their output profile:
[ Category ] # optional, depends on the application
[ Device/Site/.. ] # optional, depends on the application
[ Profile ]
Convenience methods are offered to easily get lists of categories,
devices, and profiles.
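The second combo-box could be filled by a lookup of the following
kind. This is a minimal sketch using plain C arrays instead of GList;
TargetSketch and targets_in_category are hypothetical names. A NULL
category returns every target, mirroring the behaviour documented for
gst_encoding_category_list_target in gstprofile.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified target: a name and its category. */
typedef struct {
  const char *name;
  const char *category;
} TargetSketch;

/* Store pointers to the targets belonging to category into out (at
 * most max_out of them). A NULL category matches every target.
 * Returns the number of pointers stored. */
static size_t
targets_in_category (const TargetSketch *targets, size_t n_targets,
    const char *category, const TargetSketch **out, size_t max_out)
{
  size_t i, n = 0;

  assert (targets != NULL || n_targets == 0);
  assert (out != NULL || max_out == 0);

  for (i = 0; i < n_targets && n < max_out; i++) {
    if (category == NULL || strcmp (targets[i].category, category) == 0)
      out[n++] = &targets[i];
  }
  return n;
}
```

The same pattern applies one level down, when listing the profiles of
the selected target.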
3.3 Creating Profiles
The goal is for applications to be able to easily create profiles.
Applications need a fast/efficient way to:
* select a container format and see all compatible streams they can
use with it.
* select a codec format and see which container formats they can use
with it.
The remaining parts concern the restrictions to encoder
input.
[0] http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstPreset.html
[1] http://www.gnome.org/~bmsmith/gconf-docs/C/gnome-media.html
[2] http://gstreamer.freedesktop.org/wiki/DeviceProfile
[3] http://gstreamer.freedesktop.org/wiki/PresetDesign


@ -0,0 +1,46 @@
/* GStreamer encoding bin
* Copyright (C) 2009 Edward Hervey <edward.hervey@collabora.co.uk>
* (C) 2009 Nokia Corporation
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Library General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Library General Public License for more details.
*
* You should have received a copy of the GNU Library General Public
* License along with this library; if not, write to the
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
* Boston, MA 02111-1307, USA.
*/
#ifndef __GST_ENCODEBIN_H__
#define __GST_ENCODEBIN_H__
#include <gst/gst.h>
#include <gst/gstprofile.h>
#define GST_TYPE_ENCODE_BIN (gst_encode_bin_get_type())
#define GST_ENCODE_BIN(obj) (G_TYPE_CHECK_INSTANCE_CAST((obj),GST_TYPE_ENCODE_BIN,GstEncodeBin))
#define GST_ENCODE_BIN_CLASS(klass) (G_TYPE_CHECK_CLASS_CAST((klass),GST_TYPE_ENCODE_BIN,GstEncodeBinClass))
#define GST_IS_ENCODE_BIN(obj) (G_TYPE_CHECK_INSTANCE_TYPE((obj),GST_TYPE_ENCODE_BIN))
#define GST_IS_ENCODE_BIN_CLASS(klass) (G_TYPE_CHECK_CLASS_TYPE((klass),GST_TYPE_ENCODE_BIN))
typedef struct _GstEncodeBin GstEncodeBin;
struct _GstEncodeBin {
GstBin parent;
GstEncodingProfile *profile;
};
GType gst_encode_bin_get_type(void);
GstElement *gst_encode_bin_new (GstEncodingProfile *profile, gchar *name);
gboolean gst_encode_bin_set_profile (GstEncodeBin *ebin, GstEncodingProfile *profile);
#endif /* __GST_ENCODEBIN_H__ */

docs/design/gstprofile.h Normal file

@ -0,0 +1,219 @@
/* GStreamer encoding profiles library
* Copyright (C) 2009 Edward Hervey <edward.hervey@collabora.co.uk>
* (C) 2009 Nokia Corporation
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Library General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Library General Public License for more details.
*
* You should have received a copy of the GNU Library General Public
* License along with this library; if not, write to the
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
* Boston, MA 02111-1307, USA.
*/
#ifndef __GST_PROFILE_H__
#define __GST_PROFILE_H__
#include <gst/gst.h>
typedef enum {
GST_ENCODING_PROFILE_VIDEO,
GST_ENCODING_PROFILE_AUDIO,
GST_ENCODING_PROFILE_TEXT,
GST_ENCODING_PROFILE_UNKNOWN
} GstEncodingProfileType;
typedef struct _GstEncodingTarget GstEncodingTarget;
typedef struct _GstEncodingProfile GstEncodingProfile;
typedef struct _GstStreamEncodingProfile GstStreamEncodingProfile;
typedef struct _GstVideoEncodingProfile GstVideoEncodingProfile;
/* FIXME/UNKNOWNS
*
* Should encoding categories be well-known strings/quarks ?
*
*/
/**
* GstEncodingTarget:
* @name: The name of the target profile.
* @category: The target category (device, service, use-case).
* @profiles: A list of #GstEncodingProfile this device supports.
*
*/
struct _GstEncodingTarget {
gchar *name;
gchar *category;
GList *profiles;
};
/**
* GstEncodingProfile:
* @name: The name of the profile
* @format: The GStreamer mime type corresponding to the muxing format.
* @preset: The name of the #GstPreset(s) to be used on the muxer. This is optional.
* @multipass: Whether this profile is a multi-pass profile or not.
* @encodingprofiles: A list of #GstStreamEncodingProfile for the various streams.
*
*/
struct _GstEncodingProfile {
gchar *name;
gchar *format;
gchar *preset;
gboolean multipass;
GList *encodingprofiles;
};
/**
* GstStreamEncodingProfile:
* @type: Type of profile
* @format: The GStreamer mime type corresponding to the encoding format.
* @preset: The name of the #GstPreset to be used on the encoder. This is optional.
* @restriction: The #GstCaps restricting the input. This is optional.
* @presence: The number of streams that can be created. 0 => any.
*/
struct _GstStreamEncodingProfile {
GstEncodingProfileType type;
gchar *format;
gchar *preset;
GstCaps *restriction;
guint presence;
};
/**
* GstVideoEncodingProfile:
* @profile: common #GstStreamEncodingProfile part.
* @pass: The pass number if this is part of a multi-pass profile. Starts at 1
* for multi-pass. Set to 0 if this is not part of a multi-pass profile.
*/
struct _GstVideoEncodingProfile {
GstStreamEncodingProfile profile;
guint pass;
};
/* Generic helper API */
/**
* gst_encoding_category_list_target:
* @category: a profile target category name. Can be NULL.
*
* Returns the list of all available #GstEncodingTarget for the given @category.
* If @category is #NULL, then all available #GstEncodingTarget are returned.
*/
GList *gst_encoding_category_list_target (gchar *category);
/**
* list available profile target categories
*/
GList *gst_profile_list_target_categories (void);
gboolean gst_profile_target_save (GstEncodingTarget *target);
/*
* Application convenience methods (possibly to be added in gst-pb-utils)
*/
/**
* gst_pb_utils_create_encoder:
* @caps: The #GstCaps corresponding to a codec format
* @preset: The name of a preset
* @name: The name to give to the returned instance, can be #NULL.
*
* Creates an encoder which can output the given @caps. If several encoders can
* output the given @caps, then the one with the highest rank will be picked.
* If a @preset is specified, it will be applied to the created encoder before
* returning it.
* If a @preset is specified, then the highest-ranked encoder that can accept
* the given preset will be returned.
*
* Returns: The encoder instance with the preset applied if it is available.
* #NULL if no encoder is available.
*/
GstElement *gst_pb_utils_create_encoder(GstCaps *caps, gchar *preset, gchar *name);
/**
* gst_pb_utils_create_encoder_format:
*
* Convenience version of gst_pb_utils_create_encoder() for which one does not
* need to create a #GstCaps.
*/
GstElement *gst_pb_utils_create_encoder_format(gchar *format, gchar *preset,
gchar *name);
/**
* gst_pb_utils_create_muxer:
* @caps: The #GstCaps corresponding to a container format
* @preset: The name of a preset
*
* Creates a muxer which can output the given @caps. If several muxers can
* output the given @caps, then the one with the highest rank will be picked.
* If a @preset is specified, it will be applied to the created muxer before
* returning it.
* If a @preset is specified, then the highest-ranked muxer that can accept
* the given preset will be returned.
*
* Returns: The muxer instance with the preset applied if it is available.
* #NULL if no muxer is available.
*/
GstElement *gst_pb_utils_create_muxer(GstCaps *caps, gchar *preset);
/**
* gst_pb_utils_create_muxer_format:
*
* Convenience version of gst_pb_utils_create_muxer() for which one does not
* need to create a #GstCaps.
*/
GstElement *gst_pb_utils_create_muxer_format(gchar *format, gchar *preset,
gchar *name);
/**
* gst_pb_utils_encoders_compatible_with_muxer:
* @muxer: a muxer instance
*
* Finds a list of available encoders whose output can be fed to the given
* @muxer.
*
* Returns: A list of compatible encoders, or #NULL if none can be found.
*/
GList *gst_pb_utils_encoders_compatible_with_muxer(GstElement *muxer);
GList *gst_pb_utils_muxers_compatible_with_encoder(GstElement *encoder);
/*
* GstPreset modifications
*/
/**
* gst_preset_create:
* @preset: The #GstPreset on which to create the preset
* @name: A name for the preset
* @properties: The properties
*
* Creates a new preset with the given properties. This preset will only
* exist during the lifetime of the process.
* If you wish to use it after the lifetime of the process, you must call
* @gst_preset_save_preset.
*
* Returns: #TRUE if the preset could be created, else #FALSE.
*/
gboolean gst_preset_create (GstPreset *preset, gchar *name,
GstStructure *properties);
/**
* gst_preset_reset:
* @preset: a #GstPreset
*
* Sets all the properties of the element back to their default values.
*/
/* FIXME : This could actually be put at the GstObject level, or maybe even
* at the GObject level. */
void gst_preset_reset (GstPreset *preset);
#endif /* __GST_PROFILE_H__ */