Scheduling
By now, you've seen several example applications. All of them set
up a pipeline and call gst_bin_iterate () to start media processing.
You might have started wondering what happens during pipeline
iteration. Deciding which element runs when, and moving data between
elements, is called scheduling. Scheduling is considered one of the
most complex parts of GStreamer. Here, we will do no more than give a
global overview of scheduling, most of which will be purely
informative. It might help in understanding the underlying parts of
GStreamer.
The scheduler is responsible for managing the plugins at runtime. Its
main responsibilities are:
Managing data throughput between pads and elements in a pipeline. This might sometimes imply temporary data storage between elements.
Calling functions in elements that do the actual data processing.
Monitoring state changes and enabling/disabling elements in the chain.
Selecting and distributing the global clock.
The scheduler is a pluggable component; this means that alternative
schedulers can be written and plugged into GStreamer. Usually, though,
there is no need to choose a scheduler yourself. The default scheduler
in GStreamer is called opt. Some of the concepts discussed here are
specific to opt.
Managing elements and data throughput
To understand some specifics of scheduling, it is important to know
how elements work internally. Broadly, there are four types of elements:
_chain ()-based elements, _loop ()-based elements, _get ()-based
elements and decoupled elements. Each of these has a set of features
and limitations that are important for how it is scheduled.
_chain ()-based elements are elements that have a _chain ()-function
defined for each of their sink pads. Each of these functions is called
whenever input data is available on its pad. Inside such a function,
the element can push data over its source pad(s) to peer elements.
However, _chain ()-based elements cannot pull additional data from
their sink pad(s). Most elements in GStreamer are _chain ()-based.
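The push model of a _chain ()-based element can be illustrated with a small, self-contained C toy. ToyPad, toy_pad_push(), toy_link() and the Doubler and Recorder elements are hypothetical names invented for this sketch. They are not GStreamer API, and a buffer is reduced to a plain int. The sketch shows that pushing over a source pad amounts to calling the peer's chain function, and that the element only reacts to incoming data.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: ToyPad, toy_pad_push() and the elements below are
 * hypothetical illustrations, not GStreamer API. A "buffer" is an int. */
typedef struct ToyPad ToyPad;
typedef void (*ToyChainFunc)(ToyPad *sinkpad, int buffer);

struct ToyPad {
    ToyChainFunc chain;  /* set on sink pads of _chain ()-based elements */
    ToyPad *peer;        /* the pad this pad is linked to */
    void *element;       /* element owning this pad */
};

/* Pushing over a source pad simply calls the peer's chain function. */
static void toy_pad_push(ToyPad *srcpad, int buffer) {
    assert(srcpad->peer != NULL && srcpad->peer->chain != NULL);
    srcpad->peer->chain(srcpad->peer, buffer);
}

/* A filter with one sink pad and one source pad. */
typedef struct { ToyPad sink, src; } Doubler;

static void doubler_chain(ToyPad *sinkpad, int buffer) {
    Doubler *self = sinkpad->element;
    /* Process the incoming buffer and push the result downstream.
     * The element cannot ask for more input here; it only reacts. */
    toy_pad_push(&self->src, buffer * 2);
}

/* A sink that records what it receives. */
typedef struct { ToyPad sink; int received[16]; int count; } Recorder;

static void recorder_chain(ToyPad *sinkpad, int buffer) {
    Recorder *self = sinkpad->element;
    assert(self->count < 16);
    self->received[self->count++] = buffer;
}

/* Link a source pad to a sink pad in both directions. */
static void toy_link(ToyPad *src, ToyPad *sink) {
    src->peer = sink;
    sink->peer = src;
}
```

To use it, link the doubler's source pad to the recorder's sink pad. Then let the caller play the upstream element by calling the doubler's chain function with a few values. Each value arrives at the recorder doubled, in the same call.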
_loop ()-based elements are elements that have
a _loop ()-function defined for the whole
element. Inside this function, the element can pull buffers from
its sink pad(s) and push data over its source pad(s) as it sees fit.
Such elements usually require specific control over their input.
Muxers and demuxers are usually _loop ()-based.
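The following C toy sketches why a muxer-like element wants a _loop ()-function: it decides for itself which sink pad to read next. Interleaver, ArraySource, Collector and the function-pointer "pads" are hypothetical names for this illustration only, not GStreamer API. A pull returns false to signal end of stream.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: hypothetical names, not GStreamer API. A buffer is an int;
 * a pull returns false at end of stream. */
typedef bool (*ToyPullFunc)(void *upstream, int *buffer);
typedef void (*ToyPushFunc)(void *downstream, int buffer);

typedef struct {
    ToyPullFunc pull_a, pull_b;   /* the two sink pads */
    void *up_a, *up_b;
    ToyPushFunc push;             /* the source pad */
    void *down;
} Interleaver;

/* One iteration of the loop function: the element itself decides which
 * sink pad to read from next. A _chain ()-based element could not do
 * this, because it only reacts to whatever buffer arrives. */
static bool interleaver_loop(Interleaver *self) {
    int a, b;
    if (!self->pull_a(self->up_a, &a) || !self->pull_b(self->up_b, &b))
        return false;             /* end of stream on either input */
    self->push(self->down, a);
    self->push(self->down, b);
    return true;
}

/* Helpers for the usage example: an array-backed source and a collector. */
typedef struct { const int *data; int len, pos; } ArraySource;

static bool array_pull(void *up, int *buffer) {
    ArraySource *s = up;
    if (s->pos >= s->len) return false;
    *buffer = s->data[s->pos++];
    return true;
}

typedef struct { int out[16]; int count; } Collector;

static void collector_push(void *down, int buffer) {
    Collector *c = down;
    assert(c->count < 16);
    c->out[c->count++] = buffer;
}
```

For example, suppose the two inputs hold 1, 3, 5 and 2, 4, 6. Calling interleaver_loop() until it returns false then yields 1 through 6 in order. The ordering comes entirely from the element pulling its inputs alternately.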
_get ()-based elements are elements with only source pads. For each
source pad, a _get ()-function is defined, which is called whenever the
peer element needs additional input data. Most source elements are, in
fact, _get ()-based. Such an element cannot actively push data.
Decoupled elements are elements whose source pads are _get ()-based
and whose sink pads are _chain ()-based. The _chain ()-function cannot
push data over the source pad(s), however. One such element is the
queue element, which is a thread boundary element: its sink pad and
its source pad are driven from different threads. Since only one side
of such an element is relevant to any one scheduler, we can safely
handle the element as if it were either _get ()- or _chain ()-based.
Therefore, we will omit this type of element from the rest of the
discussion.
Obviously, the types of the elements that are linked together have
implications for how those elements are scheduled. If a get-based
element is linked to a loop-based element and the loop-based element
requests data from its sink pad, the scheduler can simply call the
get-function and be done with it. However, if two loop-based elements
are linked to each other, it's a lot more complicated, because each of
them expects to drive its own loop. Similarly, a loop-based element
linked to a chain-based element is a lot easier to handle than two
linked loop-based elements: when the loop-based element pushes a
buffer, the scheduler can directly call the peer's chain-function.
The default GStreamer scheduler, opt, uses a concept of chains and
groups. A group is a series of elements that do not require any
context switches or intermediate data stores to be executed. In
practice, this implies zero or one loop-based elements, one get-based
element (at the beginning) and any number of chain-based elements. If
there is a loop-based element, the scheduler will simply call this
element's loop-function to iterate the group. If there is no
loop-based element, data will be pulled from the get-based element
and pushed over the chain-based elements.
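The case of a group without a loop-based element can be sketched in C as follows. Group, group_iterate(), group_push() and the toy elements are hypothetical names that only mirror the description above. They are not opt's actual data structures, which are considerably more involved. Each iteration pulls one buffer from the get-based element at the head of the group. That buffer then travels through the chain-based elements by direct function calls, with no context switch and no intermediate storage.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of an opt-style group with no loop-based element: one
 * _get ()-based source followed by _chain ()-based elements. All names
 * are hypothetical; real opt groups are considerably more involved. */
#define MAX_CHAIN 4

typedef struct Group Group;
typedef bool (*GetFunc)(Group *g, int *buffer);
typedef void (*ChainFunc)(Group *g, int index, int buffer);

struct Group {
    GetFunc get;                 /* the source at the start of the group */
    ChainFunc chain[MAX_CHAIN];  /* chain-based elements, in link order */
    int n_chain;
    int src_next, src_limit;     /* state of the toy source */
    int sink_sum, sink_count;    /* state of the toy sink */
};

/* Pushing from element `index` calls the chain function of the next
 * element directly: no context switch, no intermediate storage. */
static void group_push(Group *g, int index, int buffer) {
    if (index + 1 < g->n_chain)
        g->chain[index + 1](g, index + 1, buffer);
}

/* One iteration: pull one buffer from the get-based element and push it
 * through the chain-based elements. Returns false at end of stream. */
static bool group_iterate(Group *g) {
    int buffer;
    assert(g->n_chain > 0);
    if (!g->get(g, &buffer))
        return false;
    g->chain[0](g, 0, buffer);
    return true;
}

static bool counter_get(Group *g, int *buffer) {
    if (g->src_next >= g->src_limit) return false;
    *buffer = g->src_next++;
    return true;
}

static void add_one_chain(Group *g, int index, int buffer) {
    group_push(g, index, buffer + 1);
}

static void sink_chain(Group *g, int index, int buffer) {
    (void)index;  /* the sink is last and pushes nothing */
    g->sink_sum += buffer;
    g->sink_count++;
}
```

As an example, consider a group made of a source producing 0, 1, 2, followed by two add_one_chain elements and a sink. Iterating it until group_iterate() returns false delivers 2, 3 and 4 to the sink.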
A chain is a series of groups that depend on each other for data.
For example, two linked loop-based elements would end up in different
groups, but in the same chain. Whenever the first loop-based element
pushes data over its source pad, the data is temporarily stored
inside the scheduler until the loop-function returns. After that, the
loop-function of the second element is called to process this data.
If the second element pulls data from its sink pad while no data is
available, the scheduler emulates a get-function and, in this
function, iterates the first group until data is available.
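The behaviour of two linked loop-based elements in one chain can be modelled in C as below. The Chain struct, sched_push(), sched_pull(), producer_loop() and consumer_loop() are hypothetical names for this toy, not opt's real internals. Pushes from the first element only land in a scheduler-owned store. When the second element pulls from an empty store, the emulated get-function keeps iterating the first group until a buffer appears or that group runs dry.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of two linked loop-based elements placed in separate groups
 * of one opt-style chain. Names and structure are hypothetical. */
#define STORE_SIZE 8

typedef struct {
    int store[STORE_SIZE];             /* buffers held by the scheduler */
    int head, count;
    int producer_next, producer_limit; /* state of loop element 1 */
    int consumer_out[STORE_SIZE];      /* what loop element 2 emitted */
    int consumer_count;
} Chain;

/* Loop element 1 pushes: the scheduler only stores the buffer. */
static void sched_push(Chain *c, int buffer) {
    assert(c->count < STORE_SIZE);
    c->store[(c->head + c->count) % STORE_SIZE] = buffer;
    c->count++;
}

/* Loop function of element 1 (the first group): each call pushes up to
 * two buffers. Returns false once it has nothing left to produce. */
static bool producer_loop(Chain *c) {
    if (c->producer_next >= c->producer_limit) return false;
    for (int i = 0; i < 2 && c->producer_next < c->producer_limit; i++)
        sched_push(c, c->producer_next++);
    return true;
}

/* Emulated get-function: if nothing is stored, iterate the first group
 * until data shows up or the group runs dry. */
static bool sched_pull(Chain *c, int *buffer) {
    while (c->count == 0)
        if (!producer_loop(c)) return false;
    *buffer = c->store[c->head];
    c->head = (c->head + 1) % STORE_SIZE;
    c->count--;
    return true;
}

/* Loop function of element 2 (the second group): one buffer per call. */
static bool consumer_loop(Chain *c) {
    int buffer;
    if (!sched_pull(c, &buffer)) return false;
    assert(c->consumer_count < STORE_SIZE);
    c->consumer_out[c->consumer_count++] = buffer * 10;
    return true;
}
```

The store is the "intermediate data store" a single group avoids. Every buffer passing between the two groups is copied into it and out again, which is the cost the following paragraph refers to.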
The above is roughly how scheduling works in GStreamer. This has
some implications for ideal pipeline design. A pipeline would
ideally contain at most one loop-based element, so that all data
processing is immediate and no data is stored inside the scheduler
during group switches. You might expect these group switches and the
intermediate storage to add significant overhead. In practice,
however, this is not so bad. It's something to keep in the back of
your mind, nothing more.