Gapless and instant URI switching in playback elements
===
This document explains the various changes and improvements to the playback
elements in order to support gapless playback and instantaneous URI switching.

Last Update: November 23rd 2022
# Background
The new `playbin3` element and its components (`uridecodebin3`, `decodebin3` and
`urisourcebin`) are replacements for the legacy `playbin2` and `decodebin2`
elements.
The goals of these new elements are to both allow new use-cases and improve
performance (lower memory/cpu/io usage, lower latency). One of the key
principles is also to re-use elements as much as possible. For example, when
switching audio tracks the decoder can be re-used (if compatible).
The roles are also more clearly separated across several new elements
(from lowest-level to highest-level):
* `urisourcebin` handles choosing the right source elements for the given URI,
and handles buffering (via `queue2`) if needed (for network sources for example).
* `parsebin` takes an input stream and figures out which demuxer, parsers and/or
payloaders are needed to provide timed elementary streams.
* `decodebin3` internally uses `parsebin` to handle any input stream and will
handle the decoding, inter-stream muxing interleave, stream selection and
switching. It can also handle multiple inputs (such as an audio/video file and
a separate subtitle file).
* `uridecodebin3` wraps `urisourcebin`s and `decodebin3` for any use-cases where
one wishes to have decoded streams from given URIs.
* Finally `playbin3` combines `uridecodebin3` and `playsink` for providing a
high-level convenience pipeline for playing back content.

This design has received many improvements over time:
* `decodebin3` can detect input changes (caps changes) and reconfigure
the associated `parsebin` if incompatible. This allows use-cases where
upstream is an HLS/DASH stream where codecs are different across bitrates. The
playback remains seamless if the decoders are compatible.
* `decodebin3` can bypass the usage of `parsebin` altogether if the
incoming stream is pull-based, provides a `GstStreamCollection` and is
compatible with the decoders or output caps.
* `urisourcebin` can handle sources that handle buffering internally, avoiding
dual-buffering.
* A new core query `GST_QUERY_SELECTABLE` was added so that (source) elements
could notify `decodebin3` that they can handle stream selection and switching
themselves.
* Several improvements were made to `playbin3` to allow complete stream type
changes (such as going from playing audio+video to just audio or just video,
and back). This allows temporarily disabling whole chains of elements when not
needed.
# Limitation/Issue
Two related limitations remained, though:
* Changing URI required bringing `playbin3` (and all contained elements) down to
  `GST_STATE_READY`, setting the uri, and then bringing all elements back to
  `GST_STATE_PAUSED`.
  * This meant that all contained elements were either discarded
    (decoders, demuxers, parsers, sources, ...) or reset (sinks), despite
    potentially being 100% compatible (e.g. going from h264/aac to h264/aac).
* Gapless playback (i.e. automatically switching from one source to another, and
  removing any potential gap in the data arriving at the sinks) was implemented by
  pre-rolling a full `uridecodebin3` for the next item to play and switching the
  inputs to `playsink` when the original `uridecodebin3` was EOS.
  * This meant that none of the existing elements (demuxers, parsers, decoders,
    ...) contained in the original `uridecodebin3` were re-used.

Those two use-cases are the same thing: we want to change the URI
(i.e. `urisourcebin`) but re-use as many of the existing elements as possible
(i.e. `decodebin3` and `playsink`). The only difference between the two
use-cases is that changing URI should happen instantaneously in the first case,
whereas in the second case it happens when the initial source is done (EOS).
Fixing this will allow:
* Reducing memory and cpu usage (no duplicate elements)
* Lowering latency (no longer re-instantiate/reconfigure elements and re-use
compatible ones as fast as possible).

A related issue is figuring out the *optimal* time at which the next item should
be prepared so that it has enough data to play back immediately:
* This shouldn't be too early: some URIs expire after a given time, or the user
might change their mind in between.
* This shouldn't be too late, otherwise we risk not having enough data to
play back seamlessly.
# Changes
## parsebin in urisourcebin
Figuring out the *optimal* time at which a switch should happen (i.e. a given
amount of "time" before the end of the previous play entry) can only be done on
"timed" data (i.e. parsed elementary streams).

There is therefore a new option on `urisourcebin`: `parse-streams`, which, if
set to `TRUE` (not the default), adds a `parsebin` (if and where needed) so that
`urisourcebin` only outputs elementary streams. A `multiqueue` is also added to
handle any interleave (i.e. only queue up what is needed to offer coherent
streams downstream).

If buffering is activated on `urisourcebin`, the `multiqueue` present after the
`parsebin` will be configured in order to handle it (and post the appropriate
buffering messages).

This offers the following benefits:
* `about-to-finish` can be emitted by `urisourcebin` as soon as `EOS` enters
those `multiqueue`s, which is more precise than the previous approach (emitting
it before `queue2`, on non-timed data).
* buffering is much closer to the actual buffering amount (in time) which is
specified on the properties.
* *ALL* scheduling downstream of `urisourcebin` is push-based, removing a lot of
issues when trying to change scheduling modes (push vs pull) dynamically.

The `parse-streams` property is set to `TRUE` when used in `uridecodebin3`.
## Only use a single uridecodebin3 in playbin3
Only a single `uridecodebin3` is in use in `playbin3` and the source pads it
provides are directly linked to `playsink`.
There can only be at most one stream of each stream type (audio, video, text) on
the output side of `uridecodebin3`. The exception to this is if the user/application
configured a specific multi-sinkpad combiner element for a given stream type,
in which case all streams of that given stream type are linked to that.
All uri-related properties are forwarded directly to `uridecodebin3`, which will
handle switching the sources to the single `decodebin3` it contains.
## uridecodebin3 URI and source handling
The URIs for a given entry are handled in a `GstPlayItem` structure which
controls (via intermediary structures):
* The `urisourcebin` associated with the specified URI (and optional subtitle
URI)
* The pads provided by those sources, and which states they are in (eos,
blocked, ...) and the associated GstStream (if present)
* The buffering messages posted by those sources.
At any given point there is:
* An `input_play_item`, which is the play item currently feeding data into
`decodebin3`
* An `output_play_item`, which is the play item currently being output by
`decodebin3`
Most of the time those two will be the same. But when switching play items
(going from one URI to another, whether gapless or not) this switch will happen
asynchronously.
## Switching inputs to decodebin3
The high-level goal is to let `uridecodebin3` change `GstPlayItem` while keeping
the same `decodebin3`, either:
* When the previous `GstPlayItem` has finished and there is a pending next
`GstPlayItem`. This is the "gapless" scenario.
* Or immediately switch to the given `GstPlayItem` *without* having to change
state. This is the "instantaneous URI switch" scenario.
For this, the following points need to be solved:
1. both scenarios: Add a way for "next" `GstPlayItem` to be pre-rolled
2. gapless: Determining when the switch can happen
3. instant-uri: pre-roll next `GstPlayItem` and flush downstream (to make the
switch as quick as possible)
4. both scenarios: Do the actual switch
### pre-rolling play items
In order to be able to re-use the same decoders (within `decodebin3`) as much as
possible from the outside, we need to ensure that we feed the ideal
"replacement" stream to the same `decodebin3` sink pad.
For example, if we are switching from an audio+video HLS source to another
audio+video DASH source, we want to make sure we link the new `urisourcebin`
source pad providing video to the `decodebin3` pad that was previously consuming
the old video stream.
In order to do this, the `urisourcebin` we wish to switch to needs to be
pre-rolled (set to PAUSED, new pads are set to be blocked, and we wait for a
buffer/GAP to arrive on at least one of the pads).
At that point we will know the streams which are present in the new and old
`urisourcebin`s and can unlink/relink compatible pads. If new sink pads are
required they will be requested, and if old pads are no longer needed (for
example switching from two streams to a single one) they will be removed.
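
The matching step described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: compatibility is reduced to comparing a stream type string, whereas the real element works with pads, caps and `GstStream` objects. `SourcePad` and `plan_relink` are hypothetical names used for illustration and are not part of `uridecodebin3`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SourcePad:
    """Hypothetical stand-in for a pre-rolled urisourcebin source pad."""
    name: str
    stream_type: str  # "audio", "video" or "text"


def plan_relink(
    old_links: Dict[str, str], new_pads: List[SourcePad]
) -> Tuple[Dict[str, str], List[SourcePad], List[str]]:
    """Plan how new source pads map onto existing decodebin3 sink pads.

    old_links maps a sink pad name to the stream type it was consuming.
    Returns (reused, to_request, to_release):
      reused:     sink pad name -> name of the new source pad to link to it
      to_request: new source pads that need a freshly requested sink pad
      to_release: sink pads that no longer have a matching stream
    """
    free = dict(old_links)  # sink pads not yet claimed by a new stream
    reused: Dict[str, str] = {}
    to_request: List[SourcePad] = []
    for pad in new_pads:
        # Assumption: "compatible" simply means "same stream type".
        match = next((s for s, t in free.items() if t == pad.stream_type), None)
        if match is None:
            to_request.append(pad)
        else:
            reused[match] = pad.name
            del free[match]
    return reused, to_request, list(free)
```

With an old audio+video item and a new audio+video item, every sink pad is reused, so the decoders behind those pads can stay in place. Going from two streams to one leaves one sink pad in `to_release`.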
> Note: Doing this also has the benefit that "replacing" the inputs to
> `decodebin3` is done from a new streaming thread, and not the old
> `urisourcebin` streaming thread which could cause deadlocks.
>
> Note: This "waiting" is only done when "switching", i.e. on sources which
> aren't in the current input play item. If the pads are from the current play
> entry they are linked/unlinked as soon as they are added/removed.

The next play item is pre-rolled in either of these cases:
* When the current play item has posted `about-to-finish` and the
user/application has set a new play item.
* When a new play item has been set and the `instant-uri` property has been set
to TRUE.
When a play item is pre-rolled, it is marked as "active". There can only be one
"active" play item in addition to the input play item.
### gapless: determining when the switch can happen
For gapless use-cases, we want to know the earliest time we can switch from one
play item to another.
Since all streams coming from `urisourcebin parse-streams=True` are push-based,
this is when the last EOS has been pushed through all pads of the source.
### Instantaneous URI switching
In order to be able to switch URI as soon as possible while re-using as many
existing elements as possible, there is a new `instant-uri` boolean property on
`uridecodebin3`/`playbin3`. The default value is FALSE.
If it is set to TRUE, the following happens whenever the `uri` property is set:
* On all pads of the current input play item:
  * `FLUSH_START` is sent to the downstream peer pads
  * The pad is made blocking
  * The pad is marked as EOS (i.e. as if EOS had been seen)
* And then again on all pads:
  * `FLUSH_STOP` is sent to the downstream peer pads
* Finally the new play item for the new URI is activated (pre-rolled).
  * Once it is pre-rolled it will switch over
This ensures all downstream elements are kept and are ready to receive the new
data.
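
The ordering of the steps listed above matters: every pad is flushed, blocked and marked EOS before any `FLUSH_STOP` is sent. The following is a minimal sketch of that two-pass ordering only. `ModelPad` and `instant_uri_flush` are hypothetical names, and the events are recorded as strings rather than sent through real GStreamer pads.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ModelPad:
    """Hypothetical stand-in for a source pad of the current input play item."""
    name: str
    blocked: bool = False
    eos: bool = False


def instant_uri_flush(pads: List[ModelPad]) -> List[Tuple[str, str]]:
    """Return the (pad name, event) sequence sent to the downstream peers."""
    log: List[Tuple[str, str]] = []
    # First pass: start flushing, block and mark EOS on every pad.
    for pad in pads:
        log.append((pad.name, "FLUSH_START"))
        pad.blocked = True
        pad.eos = True
    # Second pass: only once all pads are flushing, stop the flush so that
    # downstream is ready to receive data from the new play item.
    for pad in pads:
        log.append((pad.name, "FLUSH_STOP"))
    return log
```

After this sequence the old play item counts as completely EOS, which is what allows the new play item to switch over as soon as it is pre-rolled.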
### Switching play items
Switching play items requires special attention since it needs to be done
"atomically". We need to ensure it is done by a single thread. This is done by
having a lock (`play_items_lock`) which is taken whenever we need to modify the
list of play items and which play item is the current input/output.
We need to ensure the streaming thread(s) that were previously used are
stopped. Since we are only dealing with push-based sources this is simple: we
wait for the moment EOS is pushed on the last pad of the play item.
Another important consideration is that we need to ensure the thread that does
the switch is not the previous streaming thread (it needs to be stopped).
In order to solve those issues, the actual replacement of the inputs will always
happen from the streaming thread of the *new* play item, i.e. the one we wish to
make the current input. This is done in a pad block probe on the new item source
pad. Whenever a buffer (or GAP event) is received, we check whether we can
switch:
* If the current input play item is completely EOS, the switch can happen
immediately. This will always be the case in the instant-uri scenario and if the
current input play item is pull-based.
* If the current input play item is not completely EOS, the probe waits on the
`GCond input_source_drained`. This is the case that will commonly happen in
gapless push-based scenarios, since we are waiting for the current input play
item to be finished.
Once the switch can happen, we unlink all pads from `decodebin3` and attempt to
match compatible new source pads from `urisourcebin` to `decodebin3`. If new
sink pads are required they are requested, and if some sink pads are no longer
needed or do not match they are released.
Once all pads are linked, the new play item is set as the current play item.
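
The waiting logic described in this section can be modelled with a lock and a condition variable. This is only a sketch, assuming Python's `threading.Condition` as a stand-in for `play_items_lock` together with the `input_source_drained` `GCond`. The `SwitchGate` class and its method names are hypothetical, and the actual relinking of pads is reduced to updating a string.

```python
import threading


class SwitchGate:
    """Toy model of the hand-over between the old and new play item."""

    def __init__(self, pad_count: int) -> None:
        self._cond = threading.Condition()  # lock and condition in one object
        self._pending_eos = pad_count       # pads of the old item not yet EOS
        self.current_input = "old"

    def pad_got_eos(self) -> None:
        # Called from the old item's streaming thread(s) when EOS is pushed.
        with self._cond:
            self._pending_eos -= 1
            if self._pending_eos == 0:
                self._cond.notify_all()

    def switch_from_new_item(self, timeout: float = 5.0) -> bool:
        # Called from the new item's streaming thread (the pad block probe).
        with self._cond:
            drained = self._cond.wait_for(
                lambda: self._pending_eos == 0, timeout
            )
            if drained:
                # The old item is completely EOS: relink and take over.
                self.current_input = "new"
            return drained
```

Because the switch is performed by the thread calling `switch_from_new_item`, it never runs on a streaming thread of the old play item, which matches the constraint stated above. If the old item is already completely EOS, the wait returns immediately.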
## uridecodebin3 handles `about-to-finish` signalling
With regard to gapless playback, the API does not change. Users are still
expected to listen to `about-to-finish` and set the next URI to play back.

One thing that needs to be taken care of is making sure we don't emit
`about-to-finish` for play items which aren't currently used. Otherwise
`about-to-finish` would cause a snowball effect: a pending play item emitting
it would cause a future entry to be created and pre-rolled, which would then
emit it again.
For that reason, if a play item emits that signal but isn't the input or output
play item, then it is just stored and not propagated upstream. When that play
entry becomes the new input entry it will be propagated.
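
The deferral rule described above can be sketched as follows. This is an illustrative model only: `AboutToFinishRouter` and its methods are hypothetical names, play items are represented by plain strings, and `emit` stands in for whatever forwards the signal to the application.

```python
from typing import Callable, Optional, Set


class AboutToFinishRouter:
    """Toy model: forward about-to-finish only for items currently in use."""

    def __init__(self, emit: Callable[[str], None]) -> None:
        self._emit = emit
        self.input_item: Optional[str] = None
        self.output_item: Optional[str] = None
        self._pending: Set[str] = set()  # items whose signal is held back

    def on_about_to_finish(self, item: str) -> None:
        if item in (self.input_item, self.output_item):
            self._emit(item)
        else:
            # Not in use yet: remember it instead of forwarding it.
            self._pending.add(item)

    def set_input_item(self, item: str) -> None:
        self.input_item = item
        if item in self._pending:
            # The stored signal is forwarded once the item becomes the input.
            self._pending.discard(item)
            self._emit(item)
```

Holding the signal back until the item becomes the input item means the application is asked for at most one "next" URI at a time, which avoids the snowball effect.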