<chapter id="chapter-scheduler">
  <title>Scheduling</title>
  <para>
    By now, you've seen several example applications. All of them set
    up a pipeline and call <function>gst_bin_iterate ()</function> to start
    media processing. You might have started wondering what happens during
    pipeline iteration. This whole process of media processing is called
    scheduling. Scheduling is considered one of the most complex parts of
    &GStreamer;. Here, we will only give a global overview of scheduling,
    most of which is purely informative. It might help in
    understanding the underlying parts of &GStreamer;.
  </para>
  <para>
    The scheduler is responsible for managing the plugins at runtime. Its
    main responsibilities are:
    <itemizedlist>
      <listitem>
        <para>
          Managing data throughput between pads and elements in a pipeline.
          This might sometimes imply temporary data storage between elements.
        </para>
      </listitem>
      <listitem>
        <para>
          Calling functions in elements that do the actual data processing.
        </para>
      </listitem>
      <listitem>
        <para>
          Monitoring state changes and enabling/disabling elements in the
          chain.
        </para>
      </listitem>
      <listitem>
        <para>
          Selecting and distributing the global clock.
          <!-- FIXME: is this still true? -->
        </para>
      </listitem>
    </itemizedlist>
  </para>
  <para>
    The scheduler is a pluggable component; this means that alternative
    schedulers can be written and plugged into &GStreamer;. Applications
    usually do not need to choose a scheduler themselves, though.
    The default scheduler in &GStreamer; is called <quote>opt</quote>. Some
    of the concepts discussed here are specific to opt.
  </para>

  <sect1 id="section-scheduler-manage">
    <title>Managing elements and data throughput</title>
    <para>
      To understand some specifics of scheduling, it is important to know
      how elements work internally. Broadly, there are four types of elements:
      <function>_chain ()</function>-based elements, <function>_loop
      ()</function>-based elements, <function>_get ()</function>-based
      elements and decoupled elements. Each of those has a set of features
      and limitations that are important for how they are scheduled.
    </para>
    <itemizedlist>
      <listitem>
        <para>
          <function>_chain ()</function>-based elements are elements that
          have a <function>_chain ()</function>-function defined for each of
          their sinkpads. Those functions will receive data whenever input
          data is available. In those functions, the element can
          <emphasis>push</emphasis> data over its source pad(s) to peer
          elements. <function>_chain ()</function>-based elements cannot
          <emphasis>pull</emphasis> additional data from their sinkpad(s).
          Most elements in &GStreamer; are <function>_chain
          ()</function>-based.
        </para>
      </listitem>
      <listitem>
        <para>
          <function>_loop ()</function>-based elements are elements that have
          a <function>_loop ()</function>-function defined for the whole
          element. Inside this function, the element can pull buffers from
          its sink pad(s) and push data over its source pad(s) as it sees fit.
          Such elements usually require specific control over their input.
          Muxers and demuxers are usually <function>_loop ()</function>-based.
        </para>
      </listitem>
      <listitem>
        <para>
          <function>_get ()</function>-based elements are elements with only
          source pads. For each source pad, a <function>_get
          ()</function>-function is defined, which is called whenever the peer
          element needs additional input data. Most source elements are, in
          fact, <function>_get ()</function>-based. Such an element cannot
          actively push data.
        </para>
      </listitem>
      <listitem>
        <para>
          Decoupled elements are elements whose source pads are
          <function>_get ()</function>-based and whose sink pads are
          <function>_chain ()</function>-based. The <function>_chain
          ()</function>-function cannot push data over its source pad(s),
          however. One such element is the <quote>queue</quote> element,
          which is a thread boundary element. Since only one side of such
          an element is interesting for one particular scheduler, we can
          safely handle those elements as if they were either
          <function>_get ()</function>- or <function>_chain
          ()</function>-based. Therefore, we will omit this type
          of element from the rest of the discussion.
        </para>
      </listitem>
    </itemizedlist>
    <para>
      Obviously, the types of elements that are linked together have
      implications for how the elements will be scheduled. If a get-based
      element is linked to a loop-based element and the loop-based element
      requests data from its sinkpad, we can just call the get-function and
      be done with it. However, if two loop-based elements are linked to
      each other, it's a lot more complicated. Similarly, a loop-based
      element linked to a chain-based element is a lot easier to schedule
      than two loop-based elements linked to each other.
    </para>
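    <para>
      The two easy cases above can be illustrated with a small,
      self-contained toy model. This is only a sketch: the
      <function>Toy*</function> types, <function>toy_pad_pull ()</function>,
      <function>toy_pad_push ()</function> and <function>toy_run ()</function>
      are hypothetical names invented for this illustration and are not part
      of the &GStreamer; API. Buffers are modelled as plain integers. A
      loop-based element pulls from a get-based source by calling its
      get-function directly, and pushes to a chain-based sink by calling
      its chain-function directly.
    </para>
    <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these names are part of the GStreamer API. */
typedef struct ToyPad ToyPad;
typedef struct ToyElement ToyElement;

typedef int  (*ToyGetFunc)   (ToyPad *srcpad);           /* produce a buffer on request  */
typedef void (*ToyChainFunc) (ToyPad *sinkpad, int buf); /* handle one incoming buffer   */
typedef void (*ToyLoopFunc)  (ToyElement *element);      /* pull and push as it sees fit */

struct ToyPad {
  ToyElement  *parent;  /* element that owns this pad */
  ToyPad      *peer;    /* pad at the other end of the link */
  ToyGetFunc   get;     /* only set on source pads of get-based elements */
  ToyChainFunc chain;   /* only set on sink pads of chain-based elements */
};

struct ToyElement {
  ToyPad      sink;
  ToyPad      src;
  ToyLoopFunc loop;     /* only set on loop-based elements */
  int         value;    /* element-private state */
};

/* Pulling from a get-based peer is just a call to its get-function. */
static int
toy_pad_pull (ToyPad *sinkpad)
{
  assert (sinkpad->peer != NULL && sinkpad->peer->get != NULL);
  return sinkpad->peer->get (sinkpad->peer);
}

/* Pushing to a chain-based peer is just a call to its chain-function. */
static void
toy_pad_push (ToyPad *srcpad, int buf)
{
  assert (srcpad->peer != NULL && srcpad->peer->chain != NULL);
  srcpad->peer->chain (srcpad->peer, buf);
}

/* get-based source: hands out 1, 2, 3, ... */
static int
counter_get (ToyPad *srcpad)
{
  return ++srcpad->parent->value;
}

/* loop-based element: decides by itself to consume two buffers per output */
static void
adder_loop (ToyElement *element)
{
  int a = toy_pad_pull (&element->sink);
  int b = toy_pad_pull (&element->sink);

  toy_pad_push (&element->src, a + b);
}

/* chain-based sink: remembers the last buffer it received */
static void
store_chain (ToyPad *sinkpad, int buf)
{
  sinkpad->parent->value = buf;
}

static void
toy_link (ToyPad *src, ToyPad *sink)
{
  src->peer = sink;
  sink->peer = src;
}

/* Builds counter ! adder ! store, calls the adder's loop-function
 * `iterations` times and returns the last value seen by the sink. */
int
toy_run (int iterations)
{
  ToyElement counter, adder, store;
  int i;

  memset (&counter, 0, sizeof (counter));
  memset (&adder, 0, sizeof (adder));
  memset (&store, 0, sizeof (store));

  counter.src.parent = &counter;
  counter.src.get = counter_get;
  adder.sink.parent = &adder;
  adder.src.parent = &adder;
  adder.loop = adder_loop;
  store.sink.parent = &store;
  store.sink.chain = store_chain;

  toy_link (&counter.src, &adder.sink);
  toy_link (&adder.src, &store.sink);

  for (i = 0; i < iterations; i++)
    adder.loop (&adder);
  return store.value;
}
```
]]></programlisting>
    <para>
      In this sketch, one call to the loop-function consumes the buffers 1
      and 2 and delivers their sum to the sink. No data ever has to be
      stored between elements, which is why these combinations are cheap
      to schedule.
    </para>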
    <para>
      The default &GStreamer; scheduler, <quote>opt</quote>, uses a concept
      of chains and groups. A group is a series of elements that do not
      require any context switches or intermediate data stores to
      be executed. In practice, this implies at most one loop-based element,
      one get-based element (at the beginning) and any number of
      chain-based elements. If there is a loop-based element, then the
      scheduler will simply call this element's loop-function to iterate.
      If there is no loop-based element, then data will be pulled from the
      get-based element and will be pushed over the chain-based elements.
    </para>
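    <para>
      The decision the scheduler makes when iterating a group can be
      sketched as follows. This is a simplified toy model, not the actual
      implementation of opt: <function>ToyGroup</function>,
      <function>toy_group_iterate ()</function> and the demo functions are
      hypothetical names, buffers are plain integers, and a chain-function
      here simply returns the buffer it would otherwise push to the next
      element.
    </para>
    <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CHAIN 8

/* Toy model only: not the data structures of the real opt scheduler. */
typedef int  (*ToyGetFunc)   (void);    /* get-based element: produce a buffer */
typedef int  (*ToyChainFunc) (int buf); /* chain-based element; for brevity it
                                         * returns the buffer it would push on */
typedef void (*ToyLoopFunc)  (void);    /* loop-based element */

typedef struct {
  ToyGetFunc   get;                   /* get-based element at the beginning */
  ToyChainFunc chain[TOY_MAX_CHAIN];  /* chain-based elements, in link order */
  size_t       n_chain;
  ToyLoopFunc  loop;                  /* NULL if the group has no loop-based element */
  int          last_output;           /* what came out of the last chain-function */
} ToyGroup;

/* One iteration of a group, following the two cases described above. */
void
toy_group_iterate (ToyGroup *group)
{
  size_t i;
  int buf;

  if (group->loop != NULL) {
    /* The loop-based element drives the group: just call its loop-function. */
    group->loop ();
    return;
  }

  /* No loop-based element: pull from the get-based element and hand the
   * data to each chain-based element in turn. */
  assert (group->get != NULL && group->n_chain <= TOY_MAX_CHAIN);
  buf = group->get ();
  for (i = 0; i < group->n_chain; i++)
    buf = group->chain[i] (buf);
  group->last_output = buf;
}

/* Example elements for the demo functions below. */
static int toy_counter;
static int toy_loop_calls;

static int  counter_get (void)       { return ++toy_counter; }
static int  double_chain (int buf)   { return buf * 2; }
static int  plus_one_chain (int buf) { return buf + 1; }
static void counting_loop (void)     { toy_loop_calls++; }

/* Group without a loop-based element: counter ! double ! plus_one. */
int
toy_demo_chain_group (int iterations)
{
  ToyGroup group = { counter_get, { double_chain, plus_one_chain }, 2, NULL, 0 };
  int i;

  toy_counter = 0;
  for (i = 0; i < iterations; i++)
    toy_group_iterate (&group);
  return group.last_output;
}

/* Group with a loop-based element: returns how often its loop-function ran. */
int
toy_demo_loop_group (int iterations)
{
  ToyGroup group = { NULL, { NULL }, 0, counting_loop, 0 };
  int i;

  toy_loop_calls = 0;
  for (i = 0; i < iterations; i++)
    toy_group_iterate (&group);
  return toy_loop_calls;
}
```
]]></programlisting>
    <para>
      In both cases, a single iteration runs to completion inside one call
      stack, which matches the definition of a group: no context switch and
      no intermediate storage is needed.
    </para>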
    <para>
      A chain is a series of groups that depend on each other for data.
      For example, two linked loop-based elements would end up in different
      groups, but in the same chain. Whenever the first loop-based element
      pushes data over its source pad, the data will be temporarily stored
      inside the scheduler until the loop-function returns. When it's done,
      the loop-function of the second element will be called to process this
      data. If it pulls data from its sinkpad while no data is available,
      the scheduler will <quote>emulate</quote> a get-function and, in this
      function, iterate the first group until data is available.
    </para>
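    <para>
      The interplay between two linked loop-based elements can be sketched
      with another toy model. Again, this is an illustration under
      simplifying assumptions rather than the real scheduler code:
      <function>ToySched</function>, <function>toy_sched_push ()</function>,
      <function>toy_sched_pull ()</function> and
      <function>toy_chain_iterate ()</function> are hypothetical names,
      buffers are plain integers, and the scheduler stores at most one
      buffer at a time.
    </para>
    <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names are part of the GStreamer API. */
typedef struct {
  int have_data;         /* is a buffer currently stored in the scheduler? */
  int data;              /* the stored buffer */
  int next_value;        /* state of the first loop-based element */
  int first_group_runs;  /* how often the first group was iterated */
  int last_sum;          /* last result computed by the second element */
} ToySched;

/* A push by the first element only stores the buffer in the scheduler. */
static void
toy_sched_push (ToySched *s, int buf)
{
  assert (!s->have_data);  /* this toy store holds a single buffer */
  s->data = buf;
  s->have_data = 1;
}

/* First loop-based element: pushes one buffer per iteration. */
static void
first_loop (ToySched *s)
{
  toy_sched_push (s, s->next_value++);
}

/* Emulated get-function: if nothing is stored, iterate the first group
 * until data is available, then hand the stored buffer over. */
static int
toy_sched_pull (ToySched *s)
{
  while (!s->have_data) {
    s->first_group_runs++;
    first_loop (s);
  }
  s->have_data = 0;
  return s->data;
}

/* Second loop-based element: wants two buffers per iteration. */
static void
second_loop (ToySched *s)
{
  int a = toy_sched_pull (s);
  int b = toy_sched_pull (s);

  s->last_sum = a + b;
}

/* One iteration of the chain: run the first group, then the second. */
void
toy_chain_iterate (ToySched *s)
{
  s->first_group_runs++;
  first_loop (s);   /* its buffer stays stored until first_loop returns */
  second_loop (s);  /* first pull finds the stored buffer; the second
                     * pull goes through the emulated get-function */
}

/* Runs the chain `iterations` times and returns the final state. */
ToySched
toy_demo_two_loops (int iterations)
{
  ToySched s = { 0, 0, 1, 0, 0 };
  int i;

  for (i = 0; i < iterations; i++)
    toy_chain_iterate (&s);
  return s;
}
```
]]></programlisting>
    <para>
      In this sketch, each iteration of the chain runs the first group
      twice: once directly, and once more from within the emulated
      get-function because the second element asks for a second buffer.
    </para>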
    <para>
      The above is roughly how scheduling works in &GStreamer;. This has
      some implications for ideal pipeline design. A pipeline would
      ideally contain at most one loop-based element, so that all data
      processing is immediate and no data is stored inside the scheduler
      during group switches. You might expect this to decrease overhead
      significantly. In practice, however, the overhead of group switches
      is not that bad. It's something to keep in the back of your mind,
      nothing more.
    </para>
  </sect1>
</chapter>