Until now, do_expected_timeout() was shortly dropping the JBUF_LOCK in order
to push RTX event event without causing deadlock. As a side effect, some
CPU hung would happen as the timerqueue would get filled while looping over
the due timers. To mitigate this, we were processing the lost timer first and
placing into a queue the remainign to be processed later.
In the gap caused by an unlock, we could endup receiving one of the seqnum
present in the pending timers. In that case, the timer would not be found and
a new one was created. When we then update the expected timer, the seqnum
would already exist and the updated timer would be lost.
In this patch we remove the unlock from do_expected_timeout() and place all
pending RTX event into a queue (instead of pending timer). Then, as soon as
we have selected a timer to wait (or if there is no timer to wait for) we send
all the upstream RTX events. As we no longer unlock, we no longer need to pop
more then one timer from the queue, and we do so with the lock held, which
blocks any new colliding timers from being created.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/-/merge_requests/616>
When a GST_EVENT_FLUSH_START reaches the jitterbuffer, there is a chance that
our task is currently blocking waiting for a timer.
There was two problems:
* That wait wasn't checking for flushing situations
* The flushing handling wasn't waking up that conditional (to check whether it
should abort)
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/-/merge_requests/608>
As we override the GLib item with our own structure, we cannot use any
function from GList or GQueue that would try to free the RTPJitterBufferItem.
In this patch, we move away from g_queue_new() which forces using
g_queue_free(). This this function could use g_slice_free() if there is any items
left in the queue. Passing the wrong size to GSLice may cause data corruption
and crash.
A better approach would be to use a proper intrusive linked list
implementation but that's left as an exercise for the next person
running into crashes caused by this.
Be ware that this regression was introduced 6 years ago in the following
commit [0], the call to flush() looked useless, as there was a g_queue_free()
afterward.
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
[0] 479c7642fd
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/-/merge_requests/573>
The problem was this:
Due to the highly irregular arrival of RTX-packet the max-misorder variable
could be pushed very low. (-10).
If you then at some point get a big in the sequence-numbers (62 in the
test) you end up sending RTX-requests for some of those packets, and then
if the sender answers those requests, you are going to get a bunch of
RTX-packets arriving. (-13 and then 5 more packets in the test)
Now, if max-misorder is pushed very low at this point, these RTX-packets
will trigger the handle_big_gap_buffer() logic, and because they arriving
so neatly in order, (as they would, since they have been requested like
that), the gst_rtp_jitter_buffer_reset() will be called, and two things
will happen:
1. priv->next_seqnum will be set to the first RTX packet
2. the 5 RTX-packet will be pushed into the chain() function
However, at this point, these RTX-packets are no longer valid, the
jitterbuffer has already pushed lost-events for these, so they will now
be dropped on the floor, and never make it to the waiting loop-function.
And, since we now have a priv->next_seqnum that will never arrive
in the loop-function, the jitterbuffer is now stalled forever, and will
not push out another buffer.
The proposed fixes:
1. Don't use RTX in calculation of the packet-rate.
2. Don't use RTX in large-gap logic, as they are likely to be dropped.
RTP session starts a new thread for RTCP and names it
"rtpsession-rtcp-thread" which happens to be longer than the maximum 16B
allowed by pthread_setname_np and causes the naming to fail.
See docs for more details.
This commit simply shortens the thread's name so it can actually be set.
Changing the types from boolean to guint due to the ++ operand used on
them, and only call JBUF_SIGNAL_QUEUE after settling down,
or else you end up signaling the waiting code in chain() for every buffer
pushed out.
This is a concept that only applies when a buffer arrives in the chain
function, and it has already been scheduled as part of a "multi"-lost
timer.
However, "multi"-lost timers are now a thing of the past, making this
whole concept superflous, and this buffer is now simply counted as "late",
having already been pushed out (albeit as a lost-event).
There is a problem with the code today, where a single timer will
be scheduled for a series of lost packets, and then if the first packet
in that series arrives, it will cause a rescheduling of that timer, going
from a "multi"-timer to a single-timer, causing a lot of the packets
in that timer to be unaccounted for, and creating a situation in where
the jitterbuffer will never again push out another packet.
This patch solves the problem by instead of scheduling those lost packets
as another timer, it instead asks to have that lost-event pushed straight
out.
This very much goes with the intent of the code here: These packets are
so desperately late that no cure exists, and we might as well get the
lost-event out of the way and get on with it.
This change has some interesting knock-on effect being presented in
later commits. It completely removes the concept of "already-lost", so
that is why that test has been disabled in this commit, to be
removed later.
When calling gst_rtp_jitter_buffer_reset you pass in a seqnum.
This is considered the starting-point for a new stream.
However, the old behavior would unref this buffer, basically lying to
the thread that is pushing out buffers saying that it can expect
this buffer, when it would never arrive. The resulting effect being no
more buffer pushed out of the jitterbuffer, and it would buffer
incoming data indefinitely.
By instead inserting the buffer in the gap_packets queue, the _reset()
function will take responsibility for using that as the first buffer
of the new stream.
Fixes#703
We do not have a way to know the format modifiers to use with string
functions provided by the system. G_GUINT64_FORMAT and other string
modifiers only work for glib string formatting functions. We cannot
use them for string functions provided by the stdlib. See:
https://developer.gnome.org/glib/stable/glib-Basic-Types.html#glib-Basic-Types.description
```
../gst/rtpmanager/gstrtpjitterbuffer.c: In function 'gst_jitter_buffer_sink_parse_caps':
../gst/rtpmanager/gstrtpjitterbuffer.c:1523:32: error: unknown conversion type character 'l' in format [-Werror=format=]
|| sscanf (mediaclk, "direct=%" G_GUINT64_FORMAT, &clock_offset) != 1)
^~~~~~~~~~
In file included from /home/nirbheek/cerbero/build/dist/windows_x86/include/glib-2.0/glib/gtypes.h:32,
from /home/nirbheek/cerbero/build/dist/windows_x86/include/glib-2.0/glib/galloca.h:32,
from /home/nirbheek/cerbero/build/dist/windows_x86/include/glib-2.0/glib.h:30,
from /home/nirbheek/cerbero/build/dist/windows_x86/include/gstreamer-1.0/gst/gst.h:27,
from /home/nirbheek/cerbero/build/dist/windows_x86/include/gstreamer-1.0/gst/rtp/gstrtpbuffer.h:27,
from ../gst/rtpmanager/gstrtpjitterbuffer.c:108:
/home/nirbheek/cerbero/build/dist/windows_x86/lib/glib-2.0/include/glibconfig.h:69:28: note: format string is defined here
#define G_GUINT64_FORMAT "llu"
^
../gst/rtpmanager/gstrtpjitterbuffer.c:1523:32: error: too many arguments for format [-Werror=format-extra-args]
|| sscanf (mediaclk, "direct=%" G_GUINT64_FORMAT, &clock_offset) != 1)
^~~~~~~~~~
```
See also: https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/379
* Organize GstRtpFunnelPad and GstRtpFunnel separately
* Use G_GNUC_UNUSED instead of (void) casts
* Don't call an event "caps"
* Use semicolons after GST_END_TEST (helps gst-indent)
The proper way of capping on max-streams is to do it in rtpssrcdemux.
This patch uses the newly introduced property on rtpssrcdemux. Previous
behavior would not prevent rtpssrcdemux spawning new pads for every new
ssrc and potentialy causing performance trouble during teardown.
When used for processing bundled media streams within rtpbin the rtpssrcdemux element may
receive bad RTP and RTCP packets, these should not be treated as a fatal error.
The property is useful against atacks when the sender changes SSRC for
every RTP packet. The property with the same name introduced in rtpbin
was not enough, because we still can end up with thousands of pads
allocated in rtpssrcdemux.
When connected to an upstream rtpfunnel element, payload-type,
ssrc and clock-rate will not be present in the received caps.
rtprtxsend can already deal with only the clock rate being
present there, a new property is exposed to allow users to
provide a payload-type -> clock-rate map, this enables the
use of the max-size-time property for bundled streams.
The key is to make sure the jitterbuffer is set to NULL *before* the
ptdemux.
The race that existed would basically happen when ptdemux had reached
READY, and the jitterbuffer would then push a buffer, triggering a new
pad with a new payloadtype being added and ghosted to the rtpbin itself.
However, the srcpad of the ptdemux would now be inactive, and all the
sticky-event pushed on it would be swallowed, not allowing any to reach
the ghost-pad. Then the buffer in-flight would come to the ghostpad,
and we would assert that a buffer arrived before the necessary
events.
By simply re-ordering the state-changes, we ensure that there will be
no buffer racing into the ptdemux while its state is being changed,
and the problem disappears completely.
Notice also that there is not point in disconnecting the signals on the
ptdemux before this point, since we need the push-thread to settle
down before we can do this in a non-racy way.
By passing `NULL` to `g_signal_new` instead of a marshaller, GLib will
actually internally optimize the signal (if the marshaller is available
in GLib itself) by also setting the valist marshaller. This makes the
signal emission a bit more performant than the regular marshalling,
which still needs to box into `GValue` and call libffi in case of a
generic marshaller.
Note that for custom marshallers, one would use
`g_signal_set_va_marshaller()` with the valist marshaller instead.
The do_expected_timeout() function may release the JBUF_LOCK, so we need
to check if nothing wanted the timer thread to exit after this call.
The side effect was that we may endup going back into waiting for a timer
which will cause arbitrary delay on tear down (or deadlock when test
clock is used).
Fixes#653
JBUF_WAIT_QUEUE drops the JBUF_LOCK, which means the stop condition
for the chain function may have changed (change_state to NULL). Check
this immediately after the wait so that we don't delay shutting down.
One of the jitterbuffers functions is to try and make sense of weird
network behavior.
It is quite unhelpful for the jitterbuffer to start dropping packets
itself when what you are trying to achieve is better network resilience.
In the case of a skew, this could often mean the sender has restarted
in some fashion, and then dropping the very first buffer of this "new"
stream could often mean missing valuable information, like in the case
of video and I-frames.
This patch simply reverts back to the old behavior, prior to 8d955fc32b
and includes the simplest test I could write to demonstrate the behavior,
where a single packet arrives "perfectly", then a 50ms gap happens,
and then two more packets arrive in perfect order after that.
# Conflicts:
# tests/check/elements/rtpjitterbuffer.c
This commit adds custom element messages for when gstrtpjitterbuffer
drops an incoming rtp packets due to for example arriving too late.
Applications can listen to these messages on the bus which enables
actions to be taken when packets are dropped due to for example high
network jitter.
Two properties has been added, one to enable posting drop messages and
one to set a minimum time between each message to enable throttling the
posting of messages as high drop rates.
When the queue is full (and adding more packets would risk a seqnum
roll-over), the best approach is to just start pushing out packets
from the other side. Just pushing out the packets results in the
timers being left hanging with old seqnums, so it's safer to just
execute them immediately in this case. It does limit the timer space
to the time it takes to receiver about 32k packets, but without
extended sequence number, this is the best RTP can do.
This also results in the test no longer needed to have timeouts or
timers as pushing packets in drives everything.
Fixes#619