Add a `tuning` feature which adds counters that help with performance
evaluation. The only counter added so far accumulates the duration a
Scheduler has been parked, which is pretty accurate an indication of
CPU usage of the Scheduler.
- Reworked buffer push.
- Reworked stats.
- Make first elements logs stand out. This make it possible to
follow what's going on with pipelines containing 1000s of
elements.
- Actually handle EOS.
- Use more significant defaults.
- Allow building without `clap` feature.
Using callgrind with the standalone test showed opportunities for
improvements for sub tasks addition and drain.
All sub task additions were performed after making sure we were
operating on a Context Task. The Context and Task were checked
again when adding the sub task.
Draining sub tasks was perfomed in a loop on every call places,
checking whether there were remaining sub tasks first. This
commit implements the loop and checks directly in
`executor::Task::drain_subtasks`, saving one `Mutex` lock and
one `thread_local` access per iteration when there are sub
tasks to drain.
The `PadSink` functions wrapper were performing redundant checks
on the `Context` presence and were adding the delayed Future only
when there were already sub tasks.
Implement a test that initializes pipelines with minimalistic
theadshare src and sink. This can help with the evaluation of
changes to the threadshare runtime or with element
implementation details. It makes it easy to run flamegraph or
callgrind and to focus on the threadshare runtime overhead.
The threadshare executor was based on a modified version of tokio
which implemented the throttling strategy in the BasicScheduler.
Upstream tokio codebase has significantly diverged from what it
was when the throttling strategy was implemented making it hard
to follow. This means that we can hardly get updates from the
upstream project and when we cherry pick fixes, we can't reflect
the state of the project on our fork's version. As a consequence,
tools such as cargo-deny can't check for RUSTSEC fixes in our fork.
The smol ecosystem makes it quite easy to implement and maintain
a custom async executor. This MR imports the smol parts that
need modifications to comply with the threadshare model and implements
a throttling executor in place of the tokio fork.
Networking tokio specific types are replaced with Async wrappers
in the spirit of [smol-rs/async-io]. Note however that the Async
wrappers needed modifications in order to use the per thread
Reactor model. This means that higher level upstream networking
crates such as [async-net] can not be used with our Async
implementation.
Based on the example benchmark with ts-udpsrc, performances seem on par
with what we achieved using the tokio fork.
Fixes https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/issues/118
Related to https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/604