Scheduling

By now, you've seen several example applications. All of them set up a pipeline and call gst_bin_iterate () to start media processing. You might have started wondering what happens during pipeline iteration. This whole process of media processing is called scheduling. Scheduling is considered one of the most complex parts of &GStreamer;. Here, we will do no more than give a global overview of scheduling, most of which will be purely informative. It might help in understanding the underlying parts of &GStreamer;.

The scheduler is responsible for managing the plugins at runtime. Its main responsibilities are:

- Managing data throughput between pads and elements in a pipeline. This might sometimes imply temporary data storage between elements.
- Calling functions in elements that do the actual data processing.
- Monitoring state changes and enabling/disabling elements in the chain.
- Selecting and distributing the global clock.

The scheduler is a pluggable component; this means that alternative schedulers can be written and plugged into &GStreamer;. There is usually no need to interact with the process of choosing the scheduler, though. The default scheduler in &GStreamer; is called opt. Some of the concepts discussed here are specific to opt.

Managing elements and data throughput

To understand some specifics of scheduling, it is important to know how elements work internally. Broadly, there are four types of elements: _chain ()-based elements, _loop ()-based elements, _get ()-based elements and decoupled elements. Each of these has a set of features and limitations that are important for how it is scheduled.

_chain ()-based elements are elements that have a _chain ()-function defined for each of their sinkpads. Those functions will receive data whenever input data is available. In those functions, the element can push data over its source pad(s) to peer elements. _chain ()-based elements cannot pull additional data from their sinkpad(s).
Most elements in &GStreamer; are _chain ()-based.

_loop ()-based elements are elements that have a _loop ()-function defined for the whole element. Inside this function, the element can pull buffers from its sink pad(s) and push data over its source pad(s) as it sees fit. Such elements usually require specific control over their input. Muxers and demuxers are usually _loop ()-based.

_get ()-based elements are elements with only source pads. For each source pad, a _get ()-function is defined, which is called whenever the peer element needs additional input data. Most source elements are, in fact, _get ()-based. Such an element cannot actively push data.

Decoupled elements are elements whose source pads are _get ()-based and whose sink pads are _chain ()-based. Their _chain ()-function cannot push data over the source pad(s), however. One such element is the queue element, which is a thread boundary element. Since any one scheduler only deals with one side of such an element, we can safely handle these elements as if they were either _get ()- or _chain ()-based. Therefore, we will omit this type of element from the rest of the discussion.

Obviously, the types of the elements that are linked together have implications for how the elements will be scheduled. If a get-based element is linked to a loop-based element and the loop-based element requests data from its sinkpad, the scheduler can simply call the get-function and be done with it. However, if two loop-based elements are linked to each other, it's a lot more complicated. Similarly, a loop-based element linked to a chain-based element is a lot easier to handle than two loop-based elements linked to each other.

The default &GStreamer; scheduler, opt, uses a concept of chains and groups. A group is a series of elements that do not require any context switches or intermediate data stores to be executed.
In practice, this implies zero or one loop-based elements, one get-based element (at the beginning) and any number of chain-based elements. If there is a loop-based element, the scheduler will simply call this element's loop-function to iterate. If there is no loop-based element, data will be pulled from the get-based element and pushed over the chain-based elements.

A chain is a series of groups that depend on each other for data. For example, two linked loop-based elements would end up in different groups, but in the same chain. Whenever the first loop-based element pushes data over its source pad, the data will be temporarily stored inside the scheduler until the loop-function returns. When it's done, the loop-function of the second element will be called to process this data. If that element pulls data from its sinkpad while no data is available, the scheduler will emulate a get-function and, in this function, iterate the first group until data is available.

The above is roughly how scheduling works in &GStreamer;. This has some implications for ideal pipeline design. A pipeline would ideally contain at most one loop-based element, so that all data processing is immediate and no data is stored inside the scheduler during group switches. You might expect these group switches and the temporary storage to add significant overhead. In practice, however, it is not so bad. It's something to keep in the back of your mind, nothing more.