diff --git a/docs/design/design-decodebin.txt b/docs/design/design-decodebin.txt
index 34919455df..52b343a0f5 100644
--- a/docs/design/design-decodebin.txt
+++ b/docs/design/design-decodebin.txt
@@ -13,9 +13,9 @@ Description:
 
  _ a GstTypeFindElement connected to the single sink pad
 
- _ optionnaly a demuxer/parser
+ _ optionally a demuxer/parser
 
- _ optionnaly one or more DecodeGroup
+ _ optionally one or more DecodeGroup
 
 * Autoplugging
 
@@ -203,3 +203,87 @@ differences:
 controlled by the element. This means that a buffer cannot be pushed
 to a non-linked pad any sooner than buffers in any other stream which
 were received before it.
+
+
+=====================================
+ Parsers, decoders and auto-plugging
+=====================================
+
+This section has DRAFT status.
+
+Some media formats come in different "flavours" or "stream formats". These
+formats differ in the way the setup data and media data are signalled and/or
+packaged. An example of this is H.264 video, where there is a bytestream
+format (with codec setup data signalled inline and units prefixed by a sync
+code and packet length information) and a "raw" format where codec setup
+data is signalled out of band (via the caps) and the chunking is implicit
+in the way the buffers were muxed into a container, to mention just two of
+the possible variants.
+
+Especially on embedded platforms it is common that decoders can only
+handle one particular stream format, and not all of them.
+
+Where there are multiple stream formats, parsers are usually expected
+to be able to convert between the different formats. This will, if
+implemented correctly, work as expected in a static pipeline such as
+
+ ... ! parser ! decoder ! sink
+
+where the parser can query the decoder's capabilities even before
+processing the first piece of data, and configure itself to convert
+accordingly, if conversion is needed at all.
+
+In an auto-plugging context this is not so straightforward though,
+because elements are plugged incrementally and not before the previous
+element has processed some data and decided what it will output exactly
+(unless the template caps are completely fixed, in which case it can
+continue right away; this is not always the case here though, see below). A
+parser will thus have to decide on *some* output format so auto-plugging
+can continue. It doesn't know anything about the available decoders and
+their capabilities though, so it's possible that it will choose a format
+that is not supported by any of the available decoders, or by the preferred
+decoder.
+
+If the parser had sufficiently concise but fixed source pad template caps,
+decodebin could continue to plug a decoder right away, allowing the
+parser to configure itself in the same way as it would with a static
+pipeline. This is not an option, unfortunately, because often the
+parser needs to process some data to determine e.g. the format's profile or
+other stream properties (resolution, sample rate, channel configuration, etc.),
+and there may be different decoders for different profiles (e.g. DSP codec
+for baseline profile, and software fallback for main/high profile; or a DSP
+codec only supporting certain resolutions, with a software fallback for
+unusual resolutions). So if decodebin just plugged the highest-ranking
+decoder, that decoder might not be able to handle the actual stream later
+on, which would result in an error (a data flow error, which would
+be hard to intercept and avoid in decodebin). In other words, we can't solve
+this issue by plugging a decoder right away with the parser.
+
+So decodebin needs to communicate to the parser the set of available decoder
+caps (which would contain the relevant capabilities/restrictions such as
+supported profiles, resolutions, etc.), after the usual "autoplug-*" signal
+filtering/sorting of course.
+
+This could be done in multiple ways, e.g.
+
+ - plug a capsfilter element right after the parser, and construct
+   a set of filter caps from the list of available decoders (one
+   could append at the end just the name(s) of the caps structures
+   from the parser pad template caps to function as an 'ANY other'
+   caps equivalent). This would let the parser negotiate to a
+   supported stream format in the same way as with the static
+   pipeline mentioned above, but would of course incur some overhead
+   through the additional capsfilter element.
+
+ - one could add a filter-caps equivalent property to the parsers
+   (and/or GstBaseParse class) (e.g. "prefered-caps" or so).
+
+ - one could add some kind of "fixate-caps" or "fixate-format"
+   signal to such parsers
+
+Alternatively, one could simply make all decoders incorporate parsers, so
+that all formats are always supported. This is problematic for other reasons
+though (e.g. we would not be able to detect the profile in all cases then
+before plugging a decoder, which would make it hard to just play the audio
+part of a stream and not the video if a suitable decoder was missing, for
+example).
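
Note on the capsfilter option in the section added above: the following is a
minimal sketch in Python, not GStreamer code, of how the filter caps could be
assembled and how a parser might then pick its output format. Everything in
it is an assumption made for illustration: caps are modelled as plain dicts
rather than GstCaps, and the names build_filter_caps, structure_accepts,
choose_output, parser_candidates, DECODER_CAPS and PARSER_TEMPLATE_CAPS are
hypothetical. The decoder list is assumed to be already filtered and sorted
by rank, as it would be after the "autoplug-*" signals.

```python
# Toy model only: caps structures are dicts, not GstCaps. All names are
# hypothetical and chosen for illustration.

def _allowed(value):
    """Normalise a field value to a list of allowed values."""
    return list(value) if isinstance(value, (list, tuple, set)) else [value]


def build_filter_caps(decoder_caps, parser_template_caps):
    """Decoder structures first (assumed rank-sorted), then the bare
    structure names from the parser template as an 'ANY other' fallback."""
    filter_caps = [dict(s) for s in decoder_caps]
    seen = set()
    for s in parser_template_caps:
        if s["name"] not in seen:
            seen.add(s["name"])
            filter_caps.append({"name": s["name"]})
    return filter_caps


def structure_accepts(filter_structure, candidate):
    """A filter structure accepts a candidate if every field present in
    both matches; a field missing from the candidate is unconstrained."""
    for key, value in filter_structure.items():
        if key in candidate and candidate[key] not in _allowed(value):
            return False
    return True


def choose_output(filter_caps, candidates):
    """Walk the filter caps in preference order and return the first
    parser output candidate that one of them accepts, else None."""
    for f in filter_caps:
        for c in candidates:
            if structure_accepts(f, c):
                return c
    return None


# Hypothetical decoders: a DSP codec limited to byte-stream/baseline,
# followed by a lower-ranked software fallback.
DECODER_CAPS = [
    {"name": "video/x-h264", "stream-format": "byte-stream",
     "profile": "baseline"},
    {"name": "video/x-h264", "stream-format": ["avc", "byte-stream"],
     "profile": ["baseline", "main", "high"]},
]
PARSER_TEMPLATE_CAPS = [{"name": "video/x-h264"}]


def parser_candidates(profile):
    """Formats the hypothetical parser can output once it has detected
    the profile, in the parser's own order of preference."""
    return [
        {"name": "video/x-h264", "stream-format": fmt, "profile": profile}
        for fmt in ("avc", "byte-stream")
    ]
```

With these assumed inputs, a baseline stream would be negotiated to
byte-stream (matching the DSP codec), a high-profile stream would keep the
parser's preferred "avc" output and match the software fallback, and a
profile that no decoder lists would still pass through the trailing
name-only structure, so auto-plugging could continue and report the missing
decoder later.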