gstreamer/docs/random/mimetypes

Original commit message from CVS (Ronald S. Bultje, 2003-07-06):
New mimetypes document, going into effect today... For details, see this
document; it describes everything and tells you what to do and not do.
Plugins commit follows in a few seconds (and it's huge).

Mimetypes in GStreamer
======================
1) What is a mimetype
---------------------
A mimetype is a pair of two short strings, the content type and the
content subtype, which together describe the type of a file's content.
In multimedia, mimetypes are used to describe the type of a media
stream. In GStreamer, obviously, we use mimetypes in the same way.
They are part of a GstCaps, that describes a media stream. Besides a
mimetype, a GstCaps also contains stream properties (GstProps), which
are combinations of key/value pairs, and a name.
An example of a mimetype is 'video/mpeg'. A corresponding GstCaps could
be created using:
  GstCaps *caps = gst_caps_new("video_mpeg_type",
                               "video/mpeg",
                               gst_props_new("width",  GST_PROPS_INT(384),
                                             "height", GST_PROPS_INT(288),
                                             NULL));
or using a macro:
  GstCaps *caps = GST_CAPS_NEW("video_mpeg_type",
                               "video/mpeg",
                               "width",  GST_PROPS_INT(384),
                               "height", GST_PROPS_INT(288));
Obviously, mimetypes and their corresponding properties are of major
importance in GStreamer for uniquely identifying media streams.
Official MIME media types are assigned by the IANA. Current
assignments are at http://www.iana.org/assignments/media-types/.
2) The problems
---------------
Some streams may have mimetypes or GstCaps that do not fully describe
the stream. In most cases, this is not a problem, though. For a stream
that contains Ogg/Vorbis data, we don't need to know the samplerate of
the raw audio stream, for example, since we can't play it back anyway.
The samplerate _is_ important for _raw_ audio, so a decoder would need
to retrieve the samplerate from the Ogg/Vorbis stream headers (that are
part of the bytestream) in order to pass it on in the GstCaps that
belongs to the decoded audio ('audio/raw').
However, other plugins *might* want to know such properties, even for
compressed streams. One such example is an AVI muxer, which does want
to know the samplerate of an audio stream, even when it is compressed.
Another problem is that many media types can be defined in multiple ways.
For example, MJPEG video can be defined as video/jpeg, video/mjpeg,
image/jpeg, video/avi with a compression of (fourcc) MJPG, etc. None of
these is really official, since there isn't an official mimetype for
encoded MJPEG video.
The main focus of this document is to propose a standardized set of
mimetypes and properties that will be used by the GStreamer plugins.
3) Different types of streams
-----------------------------
There are several types of media streams. The most important distinction
will be container formats, audio codecs and video codecs. Container
formats are bytestreams that contain one or more substreams inside it,
and don't provide any direct media data itself. Examples are Quicktime,
AVI or MPEG System Stream. They mostly consist of a set of headers that
define the media stream(s) packed inside the container, plus the media
data itself.
Video codecs and audio codecs describe encoded audio or video data.
Examples are MPEG-1 video, DivX video, MPEG-1 layer 3 (MP3) audio or
Ogg/Vorbis audio. Actually, Ogg is a container format too (for Vorbis
audio), but these are usually used in conjunction with each other.
3a) Container formats
---------------------
1 - AVI (Microsoft RIFF/AVI)
mimetype: video/avi
2 - Quicktime (Apple)
mimetype: video/quicktime
3 - MPEG (MPEG LA)
mimetype: video/mpeg
properties: 'systemstream' = TRUE (BOOLEAN)
4 - ASF (Microsoft)
mimetype: video/x-asf
5 - WAV (PCM)
mimetype: audio/x-wav
6 - RealMedia (Real)
mimetype: video/x-pn-realvideo
properties: 'systemstream' = TRUE (BOOLEAN)
7 - DV (Digital Video)
mimetype: video/x-dv
properties: 'systemstream' = TRUE (BOOLEAN)
8 - Ogg (Xiph)
mimetype: application/ogg
9 - Matroska
mimetype: video/x-mkv
10 - Shockwave (Macromedia)
mimetype: application/x-shockwave-flash
11 - AU audio (Sun)
mimetype: audio/x-au
12 - Mod audio
mimetype: audio/x-mod
13 - FLX video (?)
mimetype: video/x-fli
14 - Monkeyaudio
mimetype: application/x-ape
15 - AIFF audio
mimetype: audio/x-aiff
16 - SID audio
mimetype: audio/x-sid
Please note that we try to keep these mimetypes as similar as possible
to what's used as standard mimetypes in Gnome (Gnome-VFS/Nautilus) and
KDE (Konqueror).
Current problems: there's a very thin line between audio codecs and
audio containers (take mp3 vs. sid, etc.) - this is just a per-case
thing right now and needs to be documented further.
3b) Video codecs
----------------
For convenience, the fourcc codes used in the AVI container format will be
listed along with the mimetype and optional properties.
Preface - (optional) properties for all video formats:
'width' = X (INT)
'height' = X (INT)
'pixel_width' and 'pixel_height' = X (2xINT, together aspect ratio)
'framerate' = X (FLOAT)
1 - Raw Video (YUV/YCbCr)
mimetype: video/x-raw-yuv
properties: 'format' = 'XXXX' (fourcc)
known fourccs: YUY2, I420, Y41P, YVYU, UYVY, etc.
properties 'width' and 'height' are required
Note: some raw video formats have implicit alignment rules. We should
discuss this more.
Note: some formats have multiple fourccs (e.g. IYUV/I420 or YUY2/YUYV).
For each of these, we only use one (e.g. I420 and YUY2).
Currently recognized formats:
YUY2: packed, Y-U-Y-V order, U/V hor 2x subsampled (YUV-4:2:2, 16 bpp)
YVYU: packed, Y-V-Y-U order, U/V hor 2x subsampled (YUV-4:2:2, 16 bpp)
UYVY: packed, U-Y-V-Y order, U/V hor 2x subsampled (YUV-4:2:2, 16 bpp)
Y41P: packed, UYVYUYVYYYYY order, U/V hor 4x subsampled (YUV-4:1:1, 12 bpp)
Y42B: planar, Y-U-V order, U/V hor 2x subsampled (YUV-4:2:2, 16 bpp)
YV12: planar, Y-V-U order, U/V hor+ver 2x subsampled (YUV-4:2:0, 12 bpp)
I420: planar, Y-U-V order, U/V hor+ver 2x subsampled (YUV-4:2:0, 12 bpp)
Y41B: planar, Y-U-V order, U/V hor 4x subsampled (YUV-4:1:1, 12bpp)
YUV9: planar, Y-U-V order, U/V hor+ver 4x subsampled (YUV-4:1:0, 9bpp)
YVU9: planar, Y-V-U order, U/V hor+ver 4x subsampled (YUV-4:1:0, 9bpp)
Y800: one-plane (Y-only, YUV-4:0:0, 8bpp)
See http://www.fourcc.org/ for more information.
Note: YUV-4:4:4 (both planar and packed, in multiple orders) are missing.
2 - Raw Video (RGB)
mimetype: video/x-raw-rgb
properties: 'endianness' = 1234/4321 (INT) <- endianness
'depth' = 15/16/24 (INT) <- bits per pixel (depth)
'bpp' = 16/24/32 (INT) <- bits per pixel (in memory)
'red_mask' = bitmask (0x..) (INT) <- red pixel mask
'green_mask' = bitmask (0x..) (INT) <- green pixel mask
'blue_mask' = bitmask (0x..) (INT) <- blue pixel mask
properties 'width' and 'height' are required
'bpp' is the number of bits of memory used for each pixel. 'depth'
is the color depth.
24 and 32 bit RGB should always be specified as big endian, since
any little endian format can be transformed into big endian by
rearranging the color masks. 15 and 16 bit formats should generally
have the same byte order as the cpu.
Color masks are interpreted by loading 'bpp' number of bits using
'endianness' rule, and masking and shifting by each color mask.
Loading a 24-bit value cannot be done directly, but one can perform
an equivalent operation.
Examples:
msb .. lsb
- memory: RRRRRRRR GGGGGGGG BBBBBBBB RRRRRRRR GGGGGGGG ...
'bpp' = 24
'depth' = 24
'endianness' = 4321 (G_BIG_ENDIAN)
'red_mask' = 0xff0000
'green_mask' = 0x00ff00
'blue_mask' = 0x0000ff
- memory: xRRRRRGG GGGBBBBB xRRRRRGG GGGBBBBB xRRRRRGG ...
'bpp' = 16
'depth' = 15
'endianness' = 4321 (G_BIG_ENDIAN)
'red_mask' = 0x7c00
'green_mask' = 0x03e0
'blue_mask' = 0x001f
- memory: GGGBBBBB xRRRRRGG GGGBBBBB xRRRRRGG GGGBBBBB ...
'bpp' = 16
'depth' = 15
'endianness' = 1234 (G_LITTLE_ENDIAN)
'red_mask' = 0x7c00
'green_mask' = 0x03e0
'blue_mask' = 0x001f
3 - MPEG-1, -2 and -4 video (ISO/LA MPEG)
mimetype: video/mpeg
properties: 'systemstream' = FALSE (BOOLEAN)
'mpegversion' = 1/2/4 (INT)
known fourccs: MPEG, MPGI
4 - DivX 3.x, 4.x and 5.x video (divx.com)
mimetype: video/x-divx
optional properties: 'divxversion' = 3/4/5 (INT)
known fourccs: DIV3, DIV4, DIV5, DIVX, DX50, divx
5 - Microsoft MPEG 4.1, 4.2 and 4.3
mimetype: video/x-msmpeg
optional properties: 'msmpegversion' = 41/42/43 (INT)
known fourccs: MPG4, MP42, MP43
6 - Motion-JPEG (official and extended)
mimetype: video/x-jpeg
known fourccs: MJPG (YUY2 MJPEG), JPEG (any), PIXL (Pinnacle/Miro), VIXL
7 - Sorensen (Quicktime - SVQ1/SVQ3)
mimetype: video/x-svq
properties: 'svqversion' = 1/3 (INT)
8 - H263 and related codecs
mimetype: video/x-h263
known fourccs: H263, i263, M263, x263, VDOW, VIVO
9 - RealVideo (Real)
mimetype: video/x-pn-realvideo
properties: 'systemstream' = FALSE (BOOLEAN)
known fourccs: RV10, RV20, RV30
10 - Digital Video (DV)
mimetype: video/x-dv
properties: 'systemstream' = FALSE (BOOLEAN)
known fourccs: DVSD, dvsd
11 - Windows Media Video 1 and 2 (WMV)
mimetype: video/x-wmv
properties: 'wmvversion' = 1/2 (INT)
12 - XviD (xvid.org)
mimetype: video/x-xvid
known fourccs: xvid, XVID
13 - 3ivx (3ivx.org)
mimetype: video/x-3ivx
known fourccs: 3IV0, 3IV1, 3IV2
14 - Ogg/Tarkin (Xiph)
mimetype: video/x-tarkin
15 - VP3
mimetype: video/x-vp3
16 - Ogg/Theora (Xiph, VP3-like)
mimetype: video/x-theora
17 - Huffyuv
mimetype: video/x-huffyuv
known fourccs: HFYU
18 - FF Video 1 (FFMPEG)
mimetype: video/x-ffv
properties: 'ffvversion' = 1 (INT)
19 - H264
mimetype: video/x-h264
20 - Indeo 3 (Intel)
mimetype: video/x-indeo
properties: 'indeoversion' = 3 (INT)
21 - Portable Network Graphics (PNG)
mimetype: video/x-png
TODO: subsampling information for YUV?
TODO: colorspace identifications for MJPEG? How?
TODO: how to distinguish MJPEG-A/B (Quicktime) and lossless JPEG?
TODO: divx4/divx5/xvid/3ivx/mpeg-4 - how to make them overlap? (all
ISO MPEG-4 compatible)
3c) Audio Codecs
----------------
For convenience, the two-byte hexcodes (as used for identification in
AVI files) are given where applicable.
Preface - (optional) properties for all audio formats:
'rate' = X (int) <- sampling rate
'channels' = X (int) <- number of audio channels
1 - Raw Audio (integer format)
mimetype: audio/x-raw-int
properties: 'width' = X (INT) <- memory bits per sample
'depth' = X (INT) <- used bits per sample
'signed' = X (BOOLEAN)
'endianness' = 1234/4321 (INT)
2 - Raw Audio (floating point format)
mimetype: audio/x-raw-float
properties: 'depth' = X (INT) <- 32=float, 64=double
'endianness' = 1234/4321 (INT) <- use G_BIG/LITTLE_ENDIAN!
'slope' = X (FLOAT, normally 1.0)
'intercept' = X (FLOAT, normally 0.0)
3 - Alaw Raw Audio
mimetype: audio/x-alaw
4 - Mulaw Raw Audio
mimetype: audio/x-mulaw
5 - MPEG-1 layer 1/2/3 audio
mimetype: audio/mpeg
properties: 'mpegversion' = 1 (INT)
'layer' = 1/2/3 (INT)
6 - Ogg/Vorbis
mimetype: audio/x-vorbis
7 - Windows Media Audio 1 and 2 (WMA)
mimetype: audio/x-wma
properties: 'wmaversion' = 1/2 (INT)
8 - AC3
mimetype: audio/x-ac3
9 - FLAC (Free Lossless Audio Codec)
mimetype: audio/x-flac
10 - MACE 3/6 (Quicktime audio)
mimetype: audio/x-mace
properties: 'maceversion' = 3/6 (INT)
11 - MPEG-4 AAC
mimetype: audio/mpeg
properties: 'mpegversion' = 4 (INT)
12 - (IMA) ADPCM (Quicktime/WAV/Microsoft/4XM)
mimetype: audio/x-adpcm
properties: 'layout' = "quicktime"/"wav"/"microsoft"/"4xm" (STRING)
Note: the difference between each of these is the number of
samples packed together per channel. For WAV, for
example, each sample is 4 bit, and 8 samples are packed
together per channel in the bytestream. For the others,
refer to technical documentation.
We probably want to distinguish these differently, but
I don't know how, yet.
13 - RealAudio (Real)
mimetype: audio/x-pn-realaudio
properties: 'bitrate' = 14400/28800 (INT)
14 - DV Audio
mimetype: audio/x-dv
15 - GSM Audio
mimetype: audio/x-gsm
16 - Speex audio
mimetype: audio/x-speex
TODO: adpcm/dv needs confirmation from someone with knowledge...
3d) Plugin Guidelines
---------------------
So, a short bit on what plugins should do. Above, I've stated that
audio properties like "channels" and "rate" or video properties like
"width" and "height" are all optional. This doesn't mean you can
simply omit them and everything will still work!
An example is the best way to explain all this. AVI needs the width,
height, rate and channels for the AVI header. So if these properties
are missing, avimux cannot work. On the other hand, MPEG doesn't have
such properties in its header and would thus need to parse the stream
in order to find them out; we don't want that either (a plugin does
one job). So normally, mpegdemux and avimux wouldn't allow transcoding.
To solve this problem, there are stream parser elements (such as
mpegaudioparse, ac3parse and mpeg1videoparse).
Conclusions to draw from here: a plugin gives info it can provide as
seen from its own task/job. If it can't, other elements might still
need it and a stream parser needs to be written if it doesn't already
exist.
On properties that can be described by one of these (properties such
as 'width', 'height', 'fps', etc.): they're forbidden and should be
handled using filtered caps.
4) Status of this document
---------------------------
Not all plugins strictly follow these guidelines yet, but these are the
official types. Plugins not following these specs either use extensions
that should be documented, or are buggy (and should be fixed).
Blame Ronald Bultje <rbultje@ronald.bitfreak.net> aka BBB for any mistakes
in this document.