Subtasks are used when current async processing needs to execute
a `Future` via a sync function (eg. a call to a C function).
In this case `Context::block_on` would block the whole `Context`,
leading to a deadlock.
The main use case for this is the `Pad{Src,Sink}` functions:
when we `PadSrc::push` and the peer pad is a `PadSink`, we want
`PadSrc::push` to complete after the async function on the
`PadSink` completes. In this case the `PadSink` async function
is added as a subtask of current scheduler task and
`PadSrc::push` only returns when the subtask is executed.
In `runtime::Task` (`Task` here is the execution Task with a
state machine, not a scheduler task), we used to spawn state
transition actions and iteration loop (leading to a new
scheduler Task). At the time, it seemed convenient for the user
to automatically drain sub tasks after a state transition action
or an iteration. User wouldn't have to worry about this, similarly
to the `Pad{Src,Sink}` case.
In current implementation, the `Task` state machine now operates
directly on the target `Context`. State transtions actions and
the iteration loop are no longer spawned. It seems now useless to
abstract the subtasks draining from the user. Either they
transitively use a mechanism such as `Pad{Src,Sink}` which already
handles this automatically, or they add substasks on purpose, in
which case they know better when subtasks must be drained.
... so that it can be reused on current thread for subsequent
Scheduler instantiations (e.g. block_on) without the need to
reallocate internal data structures.
This commit improves threadshare timers predictability
by better making use of current time slice.
Added a dedicate timer BTreeMap for after timers (those
that are guaranteed to fire no sooner than the expected
instant) so as to avoid previous workaround which added
half the max throttling duration. These timers can now
be checked against the reactor processing instant.
Oneshot timers only need to be polled as `Future`s when
intervals are `Stream`s. This also reduces the size for
oneshot timers and make user call `next` on intervals.
Intervals can also implement `FusedStream`, which can help
when used in features such as `select!`.
Also drop the `time` module, which was kepts for
compatibility when the `executor` was migrated from tokio
based to smol-like.
Add a `tuning` feature which adds counters that help with performance
evaluation. The only counter added so far accumulates the duration a
Scheduler has been parked, which is pretty accurate an indication of
CPU usage of the Scheduler.
- Reworked buffer push.
- Reworked stats.
- Make first elements logs stand out. This make it possible to
follow what's going on with pipelines containing 1000s of
elements.
- Actually handle EOS.
- Use more significant defaults.
- Allow building without `clap` feature.
jitterbuffer tests crash on Windows CI sometimes. Activating logs
showed time values which are probably not expected in a regular
environment, but which can happen there. Adding extra robustness
to `next_wakeup` computation seems to fix the problem judging by
the few runs I triggered.
Using callgrind with the standalone test showed opportunities for
improvements for sub tasks addition and drain.
All sub task additions were performed after making sure we were
operating on a Context Task. The Context and Task were checked
again when adding the sub task.
Draining sub tasks was perfomed in a loop on every call places,
checking whether there were remaining sub tasks first. This
commit implements the loop and checks directly in
`executor::Task::drain_subtasks`, saving one `Mutex` lock and
one `thread_local` access per iteration when there are sub
tasks to drain.
The `PadSink` functions wrapper were performing redundant checks
on the `Context` presence and were adding the delayed Future only
when there were already sub tasks.
Implement a test that initializes pipelines with minimalistic
theadshare src and sink. This can help with the evaluation of
changes to the threadshare runtime or with element
implementation details. It makes it easy to run flamegraph or
callgrind and to focus on the threadshare runtime overhead.
By moving sync on buffer ts to `try_next`, the resulting delay
can be cancelled when a state transition occurs.
To prevent item loss, this requires first peeking the incoming
item from the channel without popping it. After the delay has
elasped, we can pop the item as the last await point in
`try_next`: either it will be cancelled before popping or the
popped item will be passed on to `handle_item`.
Also add `flush` which was missing from `stop` and `flush_start`
transition actions.
Previous Task iteration model suffered from the following
shortcomings:
- When an iteration was engaged it could be cancelled at
await points by Stop or Flush state transitions,
which could lead to inconsistent states.
- When an iteration was engaged it could not be cancelled
by a Pause state transition so as to prevent data loss.
This meant we couldn't block on the Pause request because
the mechanism couldn't guarantee Paused would be reached
in a timely manner.
This commit split the Task iteration into:
- `try_next`: this function returns a future that awaits
for a new iteration to begin. The regular use case is
to return an item to process. The item can be left to
`()` if `try_next` acts as a tick generator. It can
also return an error. This function can be cancelled at
await points when a state transition request occurs.
- `handle_item`: this function is called with the item
returned by `try_next` and is guaranteed to run to
completion even if a transition request is received.
Note that this model plays well with the common Future
cancellation pitfalls in Rust.
warning: this expression borrows a value the compiler would automatically borrow
--> generic/threadshare/src/runtime/executor/async_wrapper.rs:402:19
|
402 | match (&mut *self).get_mut().read(buf) {
| ^^^^^^^^^^^^ help: change this to: `(*self)`
|
= note: `#[warn(clippy::needless_borrow)]` on by default
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow
State transitions request functions hid the synchronization
details to the caller:
- If the transition was requested from a Context, a subtask was
added or the transition ack was not awaited if the new transition
was requested from a running transition or an iteration function.
- If the transition was not requested from a Context, current
thread was blocked until the ack was received.
This strategy facilitated code in elements, but it suffered from
the following shortcomings:
- The `prepare` transition request didn't comply with the above
strategy and would always return an `Async` form. This was
designed to accomodate the `Prepare` function for elements
such as `ts-tcpclientsrc` which takes times due to the
TCP socket connection delays. The idea was that the actual
transition result would be available after calling `start`.
This was a disadvantage for elements which would prefer to
error immediately in the event of a preparation failure.
- Hidding the transition request synchronization to the caller
meant that they had no options but relying on the internal
mechanism. E.g.: it was not possible to `start` from another
async runtime without blocking. Also it was not possible
to request a transition and choose not to await for the
ack.
This commit introduces a more flexible API for state
transitions requests:
- The transition request function now return a `TransitionStatus`,
which is a Future.
- When an error occurs immediately (e.g. the transition
request is not autorized due to current state of the Task),
the `TransitionStatus` is resolved immediately and can be
`check`ed for errors. This is useful for functions such as
`pepare` in the case of `ts-tcpclientsrc` (see above).
This is also useful for `pause`, because in current design,
the transition is always async. Note however, that `pause` is
forseen to adhere to the same behaviour as the other transition
requests in the near future [1].
- If the caller chooses to await for the ack and they don't know
if they are running on a ts Context (e.g. in `Pad{Src,Sink}`
handlers), they can call `await_maybe_on_context`. This is mostly
the same behaviour as the one that used to be performed internaly.
- If the caller knows for sure they can't possibly block an async
executor, they can call `block_on` which is more explicite, but
will nonetheless make sure no ts Context is being blocked. This
last check was introduced as it was considered low overhead
while it should ease preventing missues in cases where the above
functions should be used.
[1]: https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/793#note_1464400
Task state machines used to execute in an executor from the Futures
crate. State transitions actions and iteration functions were then
spawned on the target threadshare Context.
This commit directly spawns the task state machine on the threadshare
Context. This simplifies code a bit and paves the way for the changes
described in [1].
Also introduces struct `StateMachineHandle`, which gather together
fields to communicate and synchronize with the StateMachine. Renamed
`StateMachine::run` as `spawn` and return `StateMachineHandle`.
[1]: https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/793#note_1464400
The scheduler is awaken when aborting a task loop, but not after
a triggering event is pushed. This can cause throttling to induce
long state transitions for pipelines with many streams.
Observed for Unprepare with:
GST_DEBUG=ts-benchmark:4 ../../target/debug/examples/benchmark 2000 ts-udpsrc 2 20 5000
During MR !793, the socket configuration mechanism was changed to
use commands passed to the Task via a channel. This worked properly
for user changes via settings and signals, however the default
clients setting was not used.
A simple solution could have been to send a command at initialization
to add the default clients, but it was considered a better solution
to just wait for the Task preparation to configure the sockets based
on the value of settings.clients at that time, thus avoiding
unnecessary successive removals and additions of clients which could
have happened before preparation.
Of course, users can still add or remove clients as before, before
and after Task preparation.
See also https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/793
The way the runtime::Task is implemented, UdpSinkTask is available
as a mutable ref in all the TaskImpl functions, which offers the
opportunity to avoid using Mutexes.
Main higlights:
- Removed the back and forth calls between UdpSinkPadHandler
and UdpSinkTask.
- Udp sockets are now part of UdpSinkTask, which is also in
charge of preparing them instead of leaving this to UdpSink.
This removed the need for Context::enter since
TaskImpl::prepare already operates under the target Context.
- In order for the clients list to be visible from the UdpSink,
this list was maintained by UdpSinkPadHandler which was also
in charge of (un)configuring the Udp sockets. The sockets are
now part of UdpSinkTask, which is also in charge of the
(un)configuration. Add/remove/replace requests are passed as
commands to the UdpSinkTask via a channel.
- The clients list visible to the UdpSink is now part of the
Settings (it is also a read/write property). Since the actual
socket (un)configuration is asynchronously handled by the Task,
the clients list is updated by the add/remove/replace signals
and set_property("clients", ..). Should a problem occur during
the async (un)configuration, and only in this case, the
UdpSinkTask would update the clients lists in Settings
accordingly so that it stays consistent with the internal state.
- The function clear_clients was renamed as replace_with_clients.
- clients is now based on a BTreeSet instead of a Vec. All the
managing functions perform some sort of lookup prior to updating
the collection. It also ease implementation.
- Removed the UdpSinkPadHandler RwLock. Using flume channels, we
are able to clone the Receiver so it can be stored in UdpSink
and reused when preparing the UdpSinkTask.
The I/O handle was dropped prior to removing it from the reactor,
which caused `Poller::delete` to fail due to an invalid file
descriptor. This used to happen silently unless the same fd was
added again, e.g. by changing states in the pipeline as follow:
Null -> Playing -> Null -> Playing.
In which case `Poller::add` failed due to an already existing file.
This commit makes sure the fd is removed from the reactor prior to
dropping the handle. In order to achieve this, a new task is spawned
on the `Context` on which the I/O was originally registered, allowing
it to access the proper `Reactor`. The I/O can then safely be dropped.
Because the I/O handle is moved to the spawned future, this solution
requires adding the `Send + 'static` bounds to the I/O handle used
within the `Async` wrapper. This appears not too restrictive for
existing implementations though. Other attempts were considered,
but they would cause deadlocks.
This new approach also solves a potential race condition where a
fd could be re-registered in a `Reactor` before it was removed.
The internal (C) jitterbuffer needs to know about the configured
latency when calculating a PTS, as it otherwise may consider that
the packet is too late, trigger a resync and cause the element to
discard the packet altogether.
I could not identify when this was broken, but the net effect was
that in the current state, ts-jitterbuffer was discarding up to
half of all the incoming packets.
Previous version used the Context::block_on_or_add_sub_task which
spawns a full-fledged executor with timer and io Reactor for no
reason when we just need to wait for a Receiver or JoinHandle.
When the iteration loop is throttling, the call to `abort` on the
`loop_abort_handle` returns immediately, but the actual `Future`
for the iteration loop is aborted only when the scheduler throttling
completes. State transitions which requires the loop to be aborted &
which are serialized at the pipeline level can incur long delays.
This commit makes sure the Task Context's scheduler is awaken as soon
as the task loop is aborted.
Previous version relied on a plain loop / match / break because
I experimented different strategies. The while variant is better
for the final solution.
The function `enter` is executed in a blocking way from the caller's
point of view. This means that we can guaranty that the provided
function and its output will outlive the underlying Scheduler Task
execution. This requires an unsafe call to
`async_task::spawn_unchecked`. See:
https://docs.rs/async-task/latest/async_task/fn.spawn_unchecked.html
The threadshare executor was based on a modified version of tokio
which implemented the throttling strategy in the BasicScheduler.
Upstream tokio codebase has significantly diverged from what it
was when the throttling strategy was implemented making it hard
to follow. This means that we can hardly get updates from the
upstream project and when we cherry pick fixes, we can't reflect
the state of the project on our fork's version. As a consequence,
tools such as cargo-deny can't check for RUSTSEC fixes in our fork.
The smol ecosystem makes it quite easy to implement and maintain
a custom async executor. This MR imports the smol parts that
need modifications to comply with the threadshare model and implements
a throttling executor in place of the tokio fork.
Networking tokio specific types are replaced with Async wrappers
in the spirit of [smol-rs/async-io]. Note however that the Async
wrappers needed modifications in order to use the per thread
Reactor model. This means that higher level upstream networking
crates such as [async-net] can not be used with our Async
implementation.
Based on the example benchmark with ts-udpsrc, performances seem on par
with what we achieved using the tokio fork.
Fixes https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/issues/118
Related to https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/604
The time driver for the threadshare runtime assigns the timer
entries to the nearest throttling time frame so that the timer
fires as close as possible to the expected instant. This means
that the timer might fire before or after the expected instant
(at most `wait / 2` away).
In some cases, we don't want the timer to fire early. The new
function `delay_for_at_least` ensures that the timer is assigned
to the time frame after the expected instant.