# Adaptive Demuxers v2

The existing adaptive demuxer support in `gst-plugins-bad` has several pitfalls
that prevent it from being improved easily. The existing design uses a model where an
adaptive streaming element (`dashdemux`, `hlsdemux`) downloads multiplexed
fragments of media, but then relies on other components in the pipeline to
provide download buffering, demuxing, elementary stream handling and decoding.


The problems with the old design include:

1. An assumption that fragment streams (to download) are equal to output
   (elementary) streams.

   * This made it hard to expose `GstStream` and `GstStreamCollection`
     describing the available media streams and, by extension, difficult to
     provide efficient stream selection support.

2. By performing download buffering outside the adaptive streaming element,
   the download scheduling has no visibility into the presentation timeline.

   * This made it impossible to implement more efficient variant selection and
     download strategies.

3. Several issues with establishing accurate timing/duration of fragments,
   because the adaptive demuxer never deals with parsed data.

   * Especially with HLS, which does not provide detailed timing information
     about the underlying media streams to the same extent that DASH does.

4. An aging design that grew organically since the initial adaptive demuxers
   and lacks a better understanding of how they should work in 2020.

   * The code is complicated and interwoven in ways that are hard to follow
     and reason about.

5. Use of GStreamer pipeline sources for downloading.

   * An internal download pipeline that contains a `httpsrc -> queue2 -> src`
     chain makes download management, bandwidth estimation and stream parsing
     more difficult, and uses a new thread for each download.


# New design

## High-level overview of the new AdaptiveDemux base class

* Buffering is handled inside the adaptive streaming element, based on
  elementary streams (i.e. de-multiplexed from the downloaded fragments) and
  stored inside the `adaptivedemux`-based element.

* The download strategy has full visibility into bitrates, bandwidth, per-stream
  queueing level (in time and bytes), playback position, etc. This opens up the
  possibility of much more intelligent adaptive download strategies.

* Output pads are not handled directly by the subclasses. Instead subclasses
  specify which `tracks` of elementary streams they can provide and what
  "download streams" can provide contents for those tracks. The baseclass
  handles usage and activation of the `tracks` based on application
  `select-streams` requests, and activation of the `stream` needed to feed each
  selected `track`.

* Output is done from a single thread, with the packets of the various
  elementary streams being output in time order (i.e. behaving like a regular
  demuxer, with interleaving reduced to its minimum). There is minimal buffering
  downstream in the pipeline, only the amount required to perform decode and
  display.

* The adaptive streaming element only exposes `src` pads for the selected
  `GstStream`s. Typically, one video track, one audio track and perhaps one
  subtitle track are exposed at a time.

* Stream selection is handled by the element directly. When switching to a
  new media stream, the output to the relevant source pad is switched once
  there is enough content buffered on the newly requested stream,
  providing rapid and seamless stream switching.

* Only 3 threads are used regardless of the number of streams/tracks: one
  dedicated to downloading, one for output, and one for scheduling and feeding
  contents to the tracks.
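Time-ordered output from a single thread can be pictured as the output thread repeatedly picking, among the selected tracks, the one whose next queued buffer has the lowest running time. The following is a minimal sketch under that assumption; `OutputTrack` and `pick_next_output_track()` are hypothetical names, not part of the `GstAdaptiveDemux2` API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a track as seen by the output thread. */
typedef struct {
  const char *stream_id;
  bool selected;
  bool has_pending;   /* at least one parsed buffer is queued */
  int64_t next_ts_ns; /* running time of the next queued buffer */
} OutputTrack;

/* Return the index of the selected track whose next queued buffer has the
 * lowest running time, or -1 if no selected track has data queued.
 * Pushing from that track next keeps the output interleaved in time order. */
static int pick_next_output_track(const OutputTrack *tracks, size_t n_tracks)
{
  assert(tracks != NULL || n_tracks == 0);
  int best = -1;
  for (size_t i = 0; i < n_tracks; i++) {
    const OutputTrack *t = &tracks[i];
    if (!t->selected || !t->has_pending)
      continue;
    if (best < 0 || t->next_ts_ns < tracks[best].next_ts_ns)
      best = (int) i;
  }
  return best;
}
```

A real output loop would also need to wait when a selected track has run dry, instead of racing ahead on the remaining tracks; that case is left out here.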


The main components of the new adaptive demuxers are:

* `GstAdaptiveDemuxTrack` : end-user meaningful elementary streams. These can be
  selected by the user. They are provided by the subclasses based on the
  manifest.

  * They each correspond to a `GstStream` of a `GstStreamCollection`
  * They are unique by `GstStreamType` and any other unique identifier specified
    by the manifest (ex: language)
  * The caps *can* change through time

* `OutputSlot` : A track being exposed via a source pad. This is handled by the
  parent class.

* `GstAdaptiveDemuxStream` : implementation-specific download stream. This is
  linked to one or more `GstAdaptiveDemuxTrack`. The contents of that stream
  will be parsed (via `parsebin`) and fed to the target tracks.

  * What tracks are provided by a given `GstAdaptiveDemuxStream` is specified by
    the subclass, but they can also be discovered at runtime if the manifest did
    not provide enough information (for example with HLS).

* Download thread : Receives download requests from the scheduling thread that
  can be queried and interrupted. Performs all download servicing in a
  single dedicated thread that can estimate download bandwidth across all
  simultaneous requests.

* Scheduling thread : In charge of deciding what new downloads should be started
  based on overall position, track buffering levels, selected tracks, current
  time, etc. It is also in charge of handling completed downloads. Fragment
  downloads are sent to dedicated `parsebin` elements that feed the parsed
  elementary data to the `GstAdaptiveDemuxTrack`s.

* Output thread : In charge of deciding which track should be
  output/removed/switched (via `OutputSlot`) based on the requested selection
  and track levels.
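To make the relationship between download streams and tracks more concrete, here is a minimal sketch, assuming a download stream holds a fixed list of target tracks and that parsed output is routed purely by stream type. `Track`, `DownloadStream` and `route_parsed_buffer()` are hypothetical; in the real element the parsed data comes out of `parsebin` pads, and matching may use more than the type (e.g. two audio tracks in one stream).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the GstStreamType values used for routing. */
typedef enum { TYPE_AUDIO, TYPE_VIDEO, TYPE_TEXT } TrackType;

typedef struct {
  const char *stream_id;
  TrackType type;
  size_t n_buffers;   /* number of parsed buffers queued in this track */
} Track;

#define MAX_TARGETS 4

/* A download stream knows which tracks it feeds. */
typedef struct {
  const char *name;
  Track *targets[MAX_TARGETS];
  size_t n_targets;
} DownloadStream;

/* Route one parsed buffer to the first target track of the matching type.
 * Returns that track, or NULL if the stream has no target of this type,
 * which is the situation where tracks have to be discovered at runtime
 * (e.g. HLS without enough information in the playlist). */
static Track *route_parsed_buffer(DownloadStream *stream, TrackType type)
{
  assert(stream != NULL);
  for (size_t i = 0; i < stream->n_targets; i++) {
    Track *t = stream->targets[i];
    if (t->type == type) {
      t->n_buffers++;
      return t;
    }
  }
  return NULL;
}
```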


## Track(s) and Stream(s)

Adaptive Demuxers provide one or more *Tracks* of elementary streams. They are
each unique in terms of:

* Their type (audio, video, text, ..). Ex : `GST_STREAM_TYPE_AUDIO`
* Optional: Their codec. Ex : `video/x-h264`
* Optional: Their language. ex : `GST_TAG_LANGUAGE : "fr"`
* Optional: Their number of channels (ex: stereo vs 5.1). ex
  `audio/x-vorbis,channels=2`
* Any other feature which would make the stream "unique" either because of their
  nature (ex: video angle) or specified by the manifest as being "unique".

But tracks can vary over time by:

* bitrate
* profile or level
* resolution

They correspond to what an end-user might want to select (i.e. will be exposed
in a `GstStreamCollection`). They are each identified by a `stream_id` provided
by the subclass.

> **Note:** A manifest *can* specify that tracks that would normally be separate
> based on the above rules (for example different codecs or channels) are
> actually the same "end-user selectable stream" (i.e. track). In such case only
> one track is provided and the nature of the elementary stream can change
> through time.
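The uniqueness rules above amount to comparing a key made of the type plus the optional codec, language and channel count, while ignoring bitrate, profile/level and resolution. The sketch below assumes that key layout; `TrackKey` and `track_key_equal()` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum { TYPE_AUDIO, TYPE_VIDEO, TYPE_TEXT } TrackType;

/* Hypothetical set of properties that make a track "unique". Optional
 * fields use NULL (strings) or 0 (channels) to mean "not specified". */
typedef struct {
  TrackType type;
  const char *codec;    /* e.g. "video/x-h264", optional */
  const char *language; /* e.g. "fr", optional */
  int channels;         /* e.g. 2 or 6, optional */
} TrackKey;

/* Compare two optional strings; two unspecified values are equal. */
static bool opt_str_equal(const char *a, const char *b)
{
  if (a == NULL || b == NULL)
    return a == b;
  return strcmp(a, b) == 0;
}

/* Two variants belong to the same track if all unique properties match.
 * Bitrate, profile/level and resolution are deliberately not part of the
 * key: they may vary over time within one track. */
static bool track_key_equal(const TrackKey *a, const TrackKey *b)
{
  assert(a != NULL && b != NULL);
  return a->type == b->type &&
      opt_str_equal(a->codec, b->codec) &&
      opt_str_equal(a->language, b->language) &&
      a->channels == b->channels;
}
```

As the note points out, a manifest can override these rules and declare differing variants to be one track; a subclass would then skip this comparison for those variants.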

Adaptive Demuxer subclasses also need to provide one or more *Download Streams*
(`GstAdaptiveDemuxStream`) which are the implementation-/manifest-specific
"streams" that each feed one or more *Tracks*. Those streams can also vary over
time by bitrate/profile/resolution/... but always target the same tracks.

The downloaded data from each of those `GstAdaptiveDemuxStream` is fed to a
`parsebin` element which will put the output in the associated
`GstAdaptiveDemuxTrack`.

The tracks have some buffering capability, handled by the baseclass.


This separation allows the base-class to:

* decide which download stream(s) should be (de)activated based on the current
  track selection
* decide when to (re)start download requests based on buffering levels, positions and
  external actions.
* Handle buffering, output and stream selection.

The subclass is responsible for deciding:

* *Which* next download should be requested for a given stream based on the
  current playback position, the provided encoded bitrates, estimates of
  download bandwidth, buffering levels, etc.
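As an illustration of this subclass decision, a common simple heuristic (not necessarily the one used by any given subclass) picks the highest-bitrate variant that fits within a fraction of the estimated bandwidth. The `safety_factor` parameter and the function name `select_variant()` are assumptions for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of the highest-bitrate variant whose bitrate fits in
 * bandwidth_bps * safety_factor, or of the lowest-bitrate variant when none
 * fit. safety_factor (e.g. 0.8) is a hypothetical tuning knob that leaves
 * headroom for bandwidth fluctuations. */
static size_t select_variant(const uint64_t *bitrates_bps, size_t n_variants,
    uint64_t bandwidth_bps, double safety_factor)
{
  assert(bitrates_bps != NULL && n_variants > 0);
  double budget = (double) bandwidth_bps * safety_factor;
  size_t best = n_variants;   /* sentinel: nothing fits yet */
  size_t lowest = 0;
  for (size_t i = 0; i < n_variants; i++) {
    if (bitrates_bps[i] < bitrates_bps[lowest])
      lowest = i;
    if ((double) bitrates_bps[i] <= budget &&
        (best == n_variants || bitrates_bps[i] > bitrates_bps[best]))
      best = i;
  }
  return best == n_variants ? lowest : best;
}
```

This only looks at bandwidth; a fuller strategy would also factor in buffering levels, for example allowing a higher variant when the track is already well buffered.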


Subclasses can also decide, before passing the downloaded data over, to:

* pre-parse specific headers (ex: ID3 and webvtt headers in HLS, MP4 fragment
  position, etc.).

* pre-parse actual content when a position estimate is required (ex: HLS
  missing accurate positioning of fragments in the overall timeline)

* rewrite the content altogether (for example webvtt fragments which require
  timing to be re-computed)


## Timeline, position, playout

Adaptive Demuxers decide what to download based on a *Timeline* made of one or
more *Tracks*.

The output of that *Timeline* is synchronized (each *Track* pushes downstream at
more or less the same position in time). That position is the "Global Output
Position".

The *Timeline* should have sufficient data in each track to allow all tracks to
be decoded and played back downstream without introducing stalls. It is the goal
of the *Scheduling thread* of adaptive demuxers to determine which fragment of
data to download and at which moment, in order for:

* each track to have sufficient data for continuous playback downstream
* the overall buffering to not exceed specified limits (in time and/or bytes)
* the playback position to not stray away in case of live sources and
  low-latency scenarios.

Which *Track* is selected on that *Timeline* is either:

 * decided by the element (default choices)
 * decided by the user (via `GST_EVENT_SELECT_STREAMS`)

The goal of an Adaptive Demuxer is to establish *which* fragment to download and
*when* based on:

* The selected *Tracks*
* The current *Timeline* output position
* The current *Track* download position (i.e. how much is buffered)
* The available bandwidth (calculated based on download speed)
* The bitrate of each fragment stream provided
* The current time (for live sources)

In the future, an Adaptive Demuxer will be able to decide to discard a fragment
if it estimates it can switch to a higher/lower variant in time to still satisfy
the above requirements.


## Download helper and thread

Based on the above, each Adaptive Demuxer implementation specifies to a
*Download Loop* which fragment to download next and when.

Multiple downloads can be requested at the same time on that thread. It is the
responsibility of the *Scheduling thread* to decide what to do when a download is
completed.

Since all downloads are done in a dedicated thread without any blocking, it can
estimate current bandwidth and latency, which the element can use to switch
variants and improve buffering strategy.

> **Note**: Unlike the old design, the `libsoup` library is used directly for
> downloading, and not via external GStreamer elements. In the future, this
> could be made modular so that other HTTP libraries can be used instead.


## Stream Selection

When sending `GST_EVENT_STREAM_COLLECTION` downstream, the adaptive demuxer also
specifies on the event that it can handle stream-selection. Downstream elements
(e.g. `decodebin3`) won't attempt to do any selection but will
handle/decode/expose all the streams provided by the adaptive demuxer (including
streams that get added/removed at runtime).

When handling a `GST_EVENT_SELECT_STREAMS`, the adaptive demuxer will:

* mark the requested tracks as `selected` (and no-longer requested ones as not
  selected)
* instruct the streams associated with no-longer selected tracks to stop
* set the current output position on streams associated with newly selected
  tracks and instruct them to be started
* return
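The steps above can be sketched as a single pass over the tracks. For simplicity this assumes one download stream per track (a real stream feeding several tracks would only stop once none of them is selected); `SelTrack` and `handle_select_streams()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
  const char *stream_id;
  bool selected;
  bool stream_running;        /* state of the download stream feeding it */
  long long stream_start_pos; /* position the stream (re)starts from */
} SelTrack;

static bool id_requested(const char *id, const char *const *requested,
    size_t n_requested)
{
  for (size_t i = 0; i < n_requested; i++)
    if (strcmp(id, requested[i]) == 0)
      return true;
  return false;
}

/* Apply a select-streams request. Output changes are NOT done here; the
 * output thread picks them up later. */
static void handle_select_streams(SelTrack *tracks, size_t n_tracks,
    const char *const *requested, size_t n_requested, long long output_pos)
{
  assert(tracks != NULL || n_tracks == 0);
  for (size_t i = 0; i < n_tracks; i++) {
    SelTrack *t = &tracks[i];
    bool want = id_requested(t->stream_id, requested, n_requested);
    if (t->selected && !want) {
      t->selected = false;
      t->stream_running = false;        /* stop the associated stream */
    } else if (!t->selected && want) {
      t->selected = true;
      t->stream_start_pos = output_pos; /* start from the output position */
      t->stream_running = true;
    }
  }
}
```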

The actual changes in output (because of a stream selection change) are done in
the output thread:

* If a track is no longer selected and there are no candidate replacement tracks
  of the same type, the associated output/pad is removed and the track is
  drained.

* If a track is selected and doesn't have a candidate replacement slot of the
  same type, a new output/pad is added for that track

* If a track is selected and has a candidate replacement slot, it will only be
  switched if the track it is replacing is empty *OR* when it has enough
  buffering so the switch can happen without re-triggering buffering.
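The three output rules can be condensed into one decision function. This sketch assumes the output thread knows whether the track is already exposed, whether a same-type replacement exists, whether the track being replaced is empty, and how much the new track has buffered; `switch_threshold_ns` stands in for "enough buffering" and is an assumption, as are `OutputAction` and `decide_output_action()`.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
  OUTPUT_NOTHING,    /* keep the current state */
  OUTPUT_REMOVE_PAD, /* drain the track and remove its pad */
  OUTPUT_ADD_PAD,    /* expose a new pad for the track */
  OUTPUT_SWITCH,     /* switch an existing slot over to this track */
  OUTPUT_WAIT        /* replacement pending, not enough data buffered yet */
} OutputAction;

/* selected: current selection state of the track.
 * exposed: the track is currently output via a slot/pad.
 * has_replacement: for a deselected track, another same-type track will take
 *   over its slot; for a selected track, a same-type slot exists to reuse.
 * old_track_empty: the track being replaced has nothing left queued. */
static OutputAction decide_output_action(bool selected, bool exposed,
    bool has_replacement, bool old_track_empty,
    unsigned long long new_level_ns, unsigned long long switch_threshold_ns)
{
  if (!selected) {
    /* Keep outputting until a replacement switches in; otherwise remove. */
    return (exposed && !has_replacement) ? OUTPUT_REMOVE_PAD : OUTPUT_NOTHING;
  }
  if (exposed)
    return OUTPUT_NOTHING;  /* already being output */
  if (!has_replacement)
    return OUTPUT_ADD_PAD;  /* no slot of this type yet */
  /* Replace an existing slot only when it will not re-trigger buffering. */
  if (old_track_empty || new_level_ns >= switch_threshold_ns)
    return OUTPUT_SWITCH;
  return OUTPUT_WAIT;
}
```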

## Periods

The number and type of `GstAdaptiveDemuxTrack` and `GstAdaptiveDemuxStream`
cannot change once the initial manifests are parsed.

In order to change that (for example in the case of a new DASH period), a new
`GstAdaptiveDemuxPeriod` must be started.

All the tracks and streams that are created at any given time are associated with
the current `input period`. The streams of the input period are the ones that
are active (i.e. downloading), and by extension the tracks of that input period
are the ones that are being filled (if selected).

That period *could* also be the `output period`. The (selected) tracks of that
period are the ones that are used for output by the output thread.

Due to buffering, however, the input and output periods *could* differ; the
base class automatically handles the switch-over.

The only requirement for subclasses is to ask the parent class to start a new
period when needed and then create the new tracks and streams.
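The input/output period split can be modelled as a small queue with two indices: the subclass advances the input index when it starts a new period, and the output thread advances the output index once the current output period is fully downloaded and drained. This is a hypothetical sketch with a fixed capacity (`MAX_PERIODS`), not the actual `GstAdaptiveDemuxPeriod` implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PERIODS 8

typedef struct {
  int id;
  size_t queued_buffers; /* data still queued in its selected tracks */
  bool input_done;       /* all fragments of this period downloaded */
} Period;

typedef struct {
  Period periods[MAX_PERIODS];
  size_t n_periods;
  size_t input_idx;  /* period whose streams are downloading */
  size_t output_idx; /* period whose tracks are being output */
} PeriodQueue;

static void period_queue_init(PeriodQueue *q, int first_id)
{
  assert(q != NULL);
  q->periods[0] = (Period) { first_id, 0, false };
  q->n_periods = 1;
  q->input_idx = 0;
  q->output_idx = 0;
}

/* Subclass side: a new period starts (e.g. a new DASH period). */
static bool period_queue_start_new(PeriodQueue *q, int id)
{
  assert(q != NULL);
  if (q->n_periods == MAX_PERIODS)
    return false;
  q->periods[q->input_idx].input_done = true;
  q->periods[q->n_periods] = (Period) { id, 0, false };
  q->input_idx = q->n_periods;
  q->n_periods++;
  return true;
}

/* Output-thread side: move on once the output period is finished. */
static void period_queue_maybe_advance_output(PeriodQueue *q)
{
  const Period *out = &q->periods[q->output_idx];
  if (q->output_idx < q->input_idx && out->input_done &&
      out->queued_buffers == 0)
    q->output_idx++;
}
```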


## Responsibility split

The `GstAdaptiveDemux2` base class is in charge of:

* helper for all downloads.
* helper for parsing (using `parsebin` and custom parsing functions) stream data.
* provides *parsed* elementary content for each fragment (note: could be more
  than one output stream for a given fragment)
* helper for providing `Tracks` that can be filled by subclasses.
* dealing with stream selection and output, including notifying subclasses which
  of those *are* active or not
* handling buffering and deciding when to request new data from the associated streams

Subclasses are in charge of:

* specifying which `GstAdaptiveDemuxTrack` and `GstAdaptiveDemuxStream` they
  provide (based on the manifest) and their relationship.
* when requested by the base class, specifying which `GstAdaptiveDemuxFragment`
  should be downloaded next for a given (selected) stream.
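This split amounts to a narrow contract: the base class decides *when* a stream needs data and calls into the subclass to learn *which* fragment comes next. The sketch below models that contract as a C function-pointer table with a toy playlist subclass; `SubclassOps`, `next_fragment`, `Fragment` and `base_schedule_next()` are hypothetical and do not correspond to the actual `GstAdaptiveDemux2` virtual methods.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
  const char *uri;
  unsigned long long duration_ns;
} Fragment;

typedef struct DemuxStream DemuxStream;

/* Hypothetical subclass vtable: the base class only asks
 * "which fragment comes next for this stream?". */
typedef struct {
  bool (*next_fragment) (DemuxStream *stream, Fragment *out);
} SubclassOps;

struct DemuxStream {
  const SubclassOps *ops;
  void *priv; /* subclass-specific state */
};

/* Base-class side: called when the scheduler decides the stream needs data. */
static bool base_schedule_next(DemuxStream *stream, Fragment *out)
{
  assert(stream != NULL && stream->ops != NULL);
  assert(stream->ops->next_fragment != NULL);
  return stream->ops->next_fragment(stream, out);
}

/* Toy subclass: a static, non-live playlist of fixed-duration fragments. */
typedef struct {
  const char *const *uris;
  size_t n_uris;
  size_t next;
  unsigned long long duration_ns;
} PlaylistState;

static bool playlist_next_fragment(DemuxStream *stream, Fragment *out)
{
  PlaylistState *st = stream->priv;
  if (st->next >= st->n_uris)
    return false; /* end of playlist */
  out->uri = st->uris[st->next++];
  out->duration_ns = st->duration_ns;
  return true;
}

static const SubclassOps playlist_ops = { playlist_next_fragment };
```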