Mainly generalize all the latest tests that have found various stalls
in the jitterbuffer, so that they only consist of a series of packets
with various seqnum/rtptime/rtx combinations, arriving at a specific time.
This means future tests can be more easily written to prove certain
behavior does not cause stalls.
Also fix the warning on windows:
warning C4244: 'initializing': conversion from 'double' to 'gint', possible loss of data
This is a concept that only applies when a buffer arrives in the chain
function, and it has already been scheduled as part of a "multi"-lost
timer.
However, "multi"-lost timers are now a thing of the past, making this
whole concept superflous, and this buffer is now simply counted as "late",
having already been pushed out (albeit as a lost-event).
There is a problem with the code today, where a single timer will
be scheduled for a series of lost packets, and then if the first packet
in that series arrives, it will cause a rescheduling of that timer, going
from a "multi"-timer to a single-timer, causing a lot of the packets
in that timer to be unaccounted for, and creating a situation in where
the jitterbuffer will never again push out another packet.
This patch solves the problem by instead of scheduling those lost packets
as another timer, it instead asks to have that lost-event pushed straight
out.
This very much goes with the intent of the code here: These packets are
so desperately late that no cure exists, and we might as well get the
lost-event out of the way and get on with it.
This change has some interesting knock-on effect being presented in
later commits. It completely removes the concept of "already-lost", so
that is why that test has been disabled in this commit, to be
removed later.
When calling gst_rtp_jitter_buffer_reset you pass in a seqnum.
This is considered the starting-point for a new stream.
However, the old behavior would unref this buffer, basically lying to
the thread that is pushing out buffers saying that it can expect
this buffer, when it would never arrive. The resulting effect being no
more buffer pushed out of the jitterbuffer, and it would buffer
incoming data indefinitely.
By instead inserting the buffer in the gap_packets queue, the _reset()
function will take responsibility for using that as the first buffer
of the new stream.
Fixes#703
One of the jitterbuffers functions is to try and make sense of weird
network behavior.
It is quite unhelpful for the jitterbuffer to start dropping packets
itself when what you are trying to achieve is better network resilience.
In the case of a skew, this could often mean the sender has restarted
in some fashion, and then dropping the very first buffer of this "new"
stream could often mean missing valuable information, like in the case
of video and I-frames.
This patch simply reverts back to the old behavior, prior to 8d955fc32b
and includes the simplest test I could write to demonstrate the behavior,
where a single packet arrives "perfectly", then a 50ms gap happens,
and then two more packets arrive in perfect order after that.
# Conflicts:
# tests/check/elements/rtpjitterbuffer.c
This commit adds custom element messages for when gstrtpjitterbuffer
drops an incoming rtp packets due to for example arriving too late.
Applications can listen to these messages on the bus which enables
actions to be taken when packets are dropped due to for example high
network jitter.
Two properties has been added, one to enable posting drop messages and
one to set a minimum time between each message to enable throttling the
posting of messages as high drop rates.
When the queue is full (and adding more packets would risk a seqnum
roll-over), the best approach is to just start pushing out packets
from the other side. Just pushing out the packets results in the
timers being left hanging with old seqnums, so it's safer to just
execute them immediately in this case. It does limit the timer space
to the time it takes to receiver about 32k packets, but without
extended sequence number, this is the best RTP can do.
This also results in the test no longer needed to have timeouts or
timers as pushing packets in drives everything.
Fixes#619
If the jitterbuffer head change, there is no need to systematically
wakeup the timer thread. The timer thread will be waken up on if
an earlier timeout has been pushed. This prevent some more spurious
wakeup when the system is loaded. As a side effect, cranking the clock
may set the clock at an earlier position.
Implement a single timer queue for all timers. The goal is to always use
ordered queues for storing timers. This way, extracting timers for
execution becomes O(1). This also allow separating the clock wait
scheduling from the timer itself and ensure that we only wake up the
timer thread when strictly needed.
The knew data structure is still O(n) on insertions and reschedule,
but we now use proximity optimization so that normal cases should be
really fast. The GList structure is also embeded intot he RtpTimer
structure to reduce the number of allocations.
In the case of reordered packets, calculating skew would cause
pts values to be off. Only calculate skew when packets come
in as expected. Also, late RTX packets should not trigger
clock skew adjustments.
Fixes#612
If, say, a rtx-packet arrives really late, this can have a dramatic
effect on the jitterbuffer clock-skew logic, having it being reset
and losing track of the current dts-to-pts calculations, directly affecting
the packets that arrive later.
This is demonstrated in the test, where a RTX packet is pushed in really
late, and without this patch the last packet will have its PTS affected
by this, where as a late RTX packet should be redundant information, and
not affect anything.
This patch corrects the delay set on EXPECTED timers that are added when
processing gaps. Previously the delay could be too small so that
'timout + delay' was much less than 'now', causing the following retries
to be scheduled too early. (They were sent earlier than
rtx-retry-timeout after the previous timeout.)
Turns out that the "big-gap"-logic of the jitterbuffer has been horribly
broken.
For people using lost-events, an RTP-stream with a gap in sequencenumbers,
would produce exactly that many lost-events immediately.
So if your sequence-numbers jumped 20000, you would get 20000 lost-events
in your pipeline...
The test that looks after this logic "test_push_big_gap", basically
incremented the DTS of the buffer equal to the gap that was introduced,
so that in fact this would be more of a "large pause" test, than an
actual gap/discontinuity in the sequencenumbers.
Once the test was modified to not increment DTS (buffer arrival time) with
a similar gap, all sorts of crazy started happening, including adding
thousands of timers, and the logic that should have kicked in, the
"handle_big_gap_buffer"-logic, was not called at all, why?
Because the number max_dropout is calculated using the packet-rate, and
the packet-rate logic would, in this particular test, report that
the new packet rate was over 400000 packets per second!!!
I believe the right fix is to don't try and update the packet-rate if
there is any jumps in the sequence-numbers, and only do these calculations
for nice, sequential streams.
When the EOS event is received, run all timers immediately and avoid
pushing the EOS downstream before this has been run. This ensures that
the lost packet statistics are accurate.
The teardown of the pads checks the refcount, but there are timers
inside the jitterbuffer that can push things, so if we're not lucky,
things could be pushed while the pads are being shut down. Putting the
jitterbuffer to NULL first avoids this.
* Changed PCMU->TEST for common macros
* Changed verify-functions (lost & rtx) into macros.
* Remove option to add marker-bit for test-buffers (not used anywhere)
* Add new push_test_buffer function that makes sure there are correlation
between dts and the time on the clock. (classic test-mistake)
* Established a generic starting-point for tests with the
construct_deterministic_initial_state function and use it where
applicable, which removes lots of "boilerplate" everywhere.
* Add basic lost-event test
* Remove as much "magic constants" as possible.
* Remove 3 tests that no longer are testing anything that others don't,
and was completely unmaintainable.
* Remove unnecessary use of the testclock
* Verify each test is testing what it actually says it does (and modify
where it doesn't)
In general, make the tests much smaller, better, more maintainable and
readable.
https://bugzilla.gnome.org/show_bug.cgi?id=774409
When doing rtx, the jitterbuffer will always add an rtx-timer for the next
sequence number.
In the case of the packet corresponding to that sequence number arriving,
that same timer will be reused, and simply moved on to wait for the
following sequence number etc.
Once an rtx-timer expires (after all retries), it will be rescheduled as
a lost-timer instead for the same sequence number.
Now, if this particular sequence-number now arrives (after the timer has
become a lost-timer), the reuse mechanism *should* now set a new
rtx-timer for the next sequence number, but the bug is that it does
not change the timer-type, and hence schedules a lost-timer for that
following sequence number, with the result that you will have a very
early lost-event for a packet that might still arrive, and you will
never be able to send any rtx for this packet.
Found by Erlend Graff - erlend@pexip.comhttps://bugzilla.gnome.org/show_bug.cgi?id=773891
The lost-event was using a different time-domain (dts) than the outgoing
buffers (pts). Given certain network-conditions these two would become
sufficiently different and the lost-event contained timestamp/duration
that was really wrong. As an example GstAudioDecoder could produce
a stream that jumps back and forth in time after receiving a lost-event.
The previous behavior calculated the pts (based on the rtptime) inside the
rtp_jitter_buffer_insert function, but now this functionality has been
refactored into a new function rtp_jitter_buffer_calculate_pts that is
called much earlier in the _chain function to make pts available to
various calculations that wrongly used dts previously
(like the lost-event).
There are however two calculations where using dts is the right thing to
do: calculating the receive-jitter and the rtx-round-trip-time, where the
arrival time of the buffer from the network is the right metric
(and is what dts in fact is today).
The patch also adds two tests regarding B-frames or the
“rtptime-going-backwards”-scenario, as there were some concerns that this
patch might break this behavior (which the tests shows it does not).
The new timeout is always going to be (timeout + delay), however, the
old behavior compared the current timeout to just (timeout), basically
being (delay) off.
This would happen if rtx-delay == rtx-retry-timeout, with the result that
a second rtx attempt for any buffers would be scheduled immediately instead
of after rtx-delay ms.
Simply calculate (new_timeout = timeout + delay) and then use that instead.
https://bugzilla.gnome.org/show_bug.cgi?id=773905
The basic idea is this:
1. For *larger* rtx-rtt, weigh a new measurement as before
2. For *smaller* rtx-rtt, be a bit more conservative and weigh a bit less
3. For very large measurements, consider them "outliers"
and count them a lot less
The idea being that reducing the rtx-rtt is much more harmful then
increasing it, since we don't want to be underestimating the rtt of the
network, and when using this number to estimate the latency you need for
you jitterbuffer, you would rather want it to be a bit larger then a bit
smaller, potentially losing rtx-packets. The "outlier-detector" is there
to prevent a single skewed measurement to affect the outcome too much.
On wireless networks, these are surprisingly common.
https://bugzilla.gnome.org/show_bug.cgi?id=769768
Assuming equidistant packet spacing when that's not true leads to more
loss than necessary in the case of reordering and jitter. Typically this
is true for video where one frame often consists of multiple packets
with the same rtp timestamp. In this case it's better to assume that the
missing packets have the same timestamp as the last received packet, so
that the scheduled lost timer does not time out too early causing the
packets to be considered lost even though they may arrive in time.
https://bugzilla.gnome.org/show_bug.cgi?id=769768
There is no need to schedule another EXPECTED timer if we're already
past the retry period. Under normal operation this won't happen, but if
there are more timers than the jitterbuffer is able to process in
real-time, scheduling more timers will just make the situation worse.
Instead, consider this packet as lost and move on. This scenario can
occur with high loss rate, low rtt and high configured latency.
https://bugzilla.gnome.org/show_bug.cgi?id=769768
This patch fixes an issue with the estimated gap duration when there is
a gap immediately after a lost timer has been processed. Previously
there was a discrepancy beteen the gap in seqnum and gap in dts which
would cause wrong calculated duration. The issue would only be seen with
retranmission enabled since when it's disabled lost timers are only
created when a packet is received and the actual gap length and last dts
is known.
https://bugzilla.gnome.org/show_bug.cgi?id=769768
Stats should also be collected for unsuccessful packets.
rtx-rtt is very important for determining the necessary configured
latency on the jitterbuffer. It's especially important to be able to
increase the latency when retransmitted packets arrive too late and are
considered lost. This patch includes these late packets in the
calculation of the various rtx stats, making them more correct and
useful.
Also in the case where the original packet arrives after a NACK is sent,
the received RTX packet should update the stats since it provides useful
information about RTT.
The RTT is only updated if and only if all requested retranmissions are
received. That way the RTT is guaranteed to make sense. If not we don't
know which request the packet is a response to and the RTT may be bogus.
A consequence of this patch is that RTT is not updated for a request
when one of the RTX packets for that seqnum is lost, but that since
measured RTT will be more accurate.
The implementation store the RTX information from the timed out timers
and use this when the retransmitted packet arrives. For performance
these timers are stored separately from the "normal" timers in order to
not impact performance (see attached performance test).
https://bugzilla.gnome.org/show_bug.cgi?id=769768
Need to set max-misorder-time and max-dropout-time to 0 so the
jitterbuffer does not base them on packet rate calculations.
If it does, out gap is big enough to be considered a new stream and
we wait for a few consecutive packets just to be sure
https://bugzilla.gnome.org/show_bug.cgi?id=751311
When a packet arrives that has already been considered lost as part of a
large gap the "lost timer" for this will be cancelled. If the remaining
packets of this large gap never arrives, there will be missing entries
in the queue and the loop function will keep waiting for these packets
to arrive and never push another packet, effectively stalling the
pipeline.
The proposed fix conciders parts of a large gap definitely lost (since
they are calculated from latency) and ignores the late arrivals.
In practice the issue is rare since large gaps are scheduled immediately,
and for the stall to happen the late arrival needs to be processed
before this times out.
https://bugzilla.gnome.org/show_bug.cgi?id=765933
Use fail_unless and friends instead of g_assert
Factor seq-num checking out to separate function
Check more return-values from push and crank and others
https://bugzilla.gnome.org/show_bug.cgi?id=762254
Replace static constants with macros to make gcc happy
CC elements/elements_rtpjitterbuffer-rtpjitterbuffer.o
elements/rtpjitterbuffer.c:387:1: error: initializer element is not constant
static const GstClockTime PCMU_BUF_DURATION = PCMU_BUF_MS * GST_MSECOND;
^
elements/rtpjitterbuffer.c:388:1: error: initializer element is not constant
static const guint PCMU_BUF_SIZE = 64000 * PCMU_BUF_MS / 1000;
^
elements/rtpjitterbuffer.c:390:5: error: initializer element is not constant
PCMU_BUF_CLOCK_RATE * PCMU_BUF_MS / 1000;