Contrary to the existing Task Sink, the Async and Sync Mutex Sinks
handle buffers in the `PadSinkHandler` directly. The Async Mutex
Sink uses an async Mutex for the `PadSinkHandlerInner` while the
Sync Mutex Sink uses... a sync Mutex.
All Sinks share the same settings and stats manager.
Use the `--sink` command line option to select the sink (default is
`sync-mutex` since it allows evaluating the framework with as little
overhead as possible.
Also apply various fixes:
- Only keep the segment start instead of the full `Segment`. This
helps with cache locality (`Segment` is a plain struct with many
fields) and avoids downcasting the generic `Segment` upon each
buffer handling.
- Box the `Stat`s. This should improve cache locality a bit.
- Fix EOS handling which took ages for no benefits in this
particular use case.
- Use a macro to raise log level in the main element.
- Move error handling during item processing in `handle_loop_error`.
This function was precisely designed for this and it should reduce
the `handle_item`'s Future size.
- Reworked buffer push.
- Reworked stats.
- Make first elements logs stand out. This make it possible to
follow what's going on with pipelines containing 1000s of
elements.
- Actually handle EOS.
- Use more significant defaults.
- Allow building without `clap` feature.
Using callgrind with the standalone test showed opportunities for
improvements for sub tasks addition and drain.
All sub task additions were performed after making sure we were
operating on a Context Task. The Context and Task were checked
again when adding the sub task.
Draining sub tasks was perfomed in a loop on every call places,
checking whether there were remaining sub tasks first. This
commit implements the loop and checks directly in
`executor::Task::drain_subtasks`, saving one `Mutex` lock and
one `thread_local` access per iteration when there are sub
tasks to drain.
The `PadSink` functions wrapper were performing redundant checks
on the `Context` presence and were adding the delayed Future only
when there were already sub tasks.
Implement a test that initializes pipelines with minimalistic
theadshare src and sink. This can help with the evaluation of
changes to the threadshare runtime or with element
implementation details. It makes it easy to run flamegraph or
callgrind and to focus on the threadshare runtime overhead.