Various audio formats require an audio lead-in to be decoded properly.
Most parsers take care of this, but when a container like Matroska is
involved, the demuxer handles the seeking, and without its own lead-in
handling it would never even pass the lead-in data to the parser.
This commit provides an initial implementation of that for audio/mpeg,
audio/x-ac3 and audio/x-eac3: it calculates the worst-case lead-in time
needed from the known samplerate, the number of lead-in frames
potentially needed, and the maximum block size possible for the format
(as we don't parse that out exactly in matroskademux), and seeks that
much earlier in the case of accurate seeks. This is especially
important for NLE use cases with GES.
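A minimal sketch of that worst-case computation, with illustrative
variable names (the real logic lives in matroskademux):

  guint64 lead_in_samples = lead_in_frames * max_blocksize;
  GstClockTime lead_in_ts =
      gst_util_uint64_scale (lead_in_samples, GST_SECOND, samplerate);

  /* for accurate seeks, start pulling data that much earlier */
  seek_ts = (seek_ts > lead_in_ts) ? seek_ts - lead_in_ts : 0;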
When accurately seeking to a position that happens to fall on a video
keyframe, we will now go back one keyframe further than needed, but
with typical video files that's the best we can do anyway without
falling back to scanning the clusters, as typically only keyframes are
indexed in the Cueing Data.
If the media doesn't have a Cue, we bisect for the cluster to seek to
with the same modified time in the case of accurate seeking, ensuring
sufficient lead-in. This code path is typically hit only with
(suboptimal) audio-only Matroska files, e.g. those created with ffmpeg,
which doesn't write a Cue for audio-only mkv muxing.
The send path in rtpsession processes the buffer list along the way,
sharing info and stats between packets in the same list, because it
assumes that all packets in a buffer list are from the same frame.
However, in the receive path packets can arrive in all sorts of
arrangements:
- different sources,
- different frames (different timestamps),
- different types (multiplexed RTP and RTCP, invalid RTP packets);
so a more general approach should be used to correctly support buffer
lists in the receive path.
It turns out that it's simpler and more robust to process buffers
individually inside the rtpsession element even if they come in a buffer
list, and then reassemble a new buffer list when pushing the buffers
downstream.
This avoids complicating the existing code to make all functions
buffer-list-aware, with the risk of introducing regressions.
To support buffer lists in the receive path and reduce the "push
overhead" in the pipeline, a new private field named processed_list is
added to GstRtpSessionPrivate; it is set in the chain_list handler and
used in the process_rtp callback. This achieves the following:
- iterate over the incoming buffer list;
- process the packets one by one;
- add the valid ones to a new buffer list;
- push the new buffer list downstream.
The processed_list field is reset before pushing a buffer list, to be
on the safe side in case upstream pushes a single buffer at some later
point.
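A simplified, self-contained sketch of the idea (packet_is_valid and
the pad handling are illustrative placeholders, not the element's real
API):

  static GstFlowReturn
  process_incoming_list (GstPad * srcpad, GstBufferList * list)
  {
    GstBufferList *processed = gst_buffer_list_new ();
    guint i, len = gst_buffer_list_length (list);

    for (i = 0; i < len; i++) {
      GstBuffer *buf = gst_buffer_list_get (list, i);

      /* process the packets one by one, keep only the valid ones */
      if (packet_is_valid (buf))
        gst_buffer_list_add (processed, gst_buffer_ref (buf));
    }
    gst_buffer_list_unref (list);

    /* push the reassembled buffer list downstream */
    return gst_pad_push_list (srcpad, processed);
  }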
NOTE:
The proposed modifications do not change the behavior of the send path.
The process_rtp callback is called in rtpsource.c by the push_rtp
callback (via source_push_rtp) only when the source is not internal.
So even though push_rtp is also called in the send path, it won't end up
using process_rtp in this case because the source would be internal in
the send path.
The reasoning above may suggest a future refactoring: push_rtp might be
split to better differentiate the send and receive paths.
In push mode (streaming), if the last audio payload chunk is smaller
than the segment rate buffer size, it would be ignored, since the
plugin waits until it has at least a segment rate buffer's worth of
audio.
The fix is to introduce a flushing flag indicating that no more audio
will be available, so that the plugin can recognize this condition and
flush the data it has, even if it is less than the desired segment rate
buffer size.
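A minimal sketch of the flag, assuming an element that accumulates
audio internally (type and helper names are illustrative):

  static gboolean
  sink_event (GstPad * pad, GstObject * parent, GstEvent * event)
  {
    MyElement *self = MY_ELEMENT (parent);

    if (GST_EVENT_TYPE (event) == GST_EVENT_EOS) {
      /* no more audio will arrive: flush what we have, even if it
       * is shorter than the segment rate buffer size */
      self->flushing = TRUE;
      my_element_process_remaining (self);
    }
    return gst_pad_event_default (pad, parent, event);
  }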
This is useful to support the ONVIF case: when is-live is set to
FALSE and onvif-rate-control is FALSE, the client can control the
rate of delivery and arrange for the server to block and still
keep sending when unblocked, without requiring back-and-forth
PAUSE / PLAY requests. This enables, amongst other things, fast
frame stepping on the client side.
When is-live is FALSE, we don't use a manager at all. This case
was actually already pretty well handled by the current code. The
standard manager, rtpbin, is simply no longer needed in this case.
Applications can instantiate a downloadbuffer after rtspsrc if
needed.
Refactor the code for parsing and generating the Range, taking
advantage of existing API in GstRTSPTimeRange.
Only use the TCP protocol in that mode, as per the specification.
Generate an accurate segment when in that mode, and signal to the
depayloader that it should not generate its own segment, through
the "onvif-mode" field in the caps, see
<https://gitlab.freedesktop.org/gstreamer/gst-plugins-base/merge_requests/328>
for more information.
Translate trickmode seek flags to their ONVIF representation.
Expose an onvif-rate-control property.
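A possible usage sketch from application code (the location URL is
illustrative; the properties are the ones exposed by this work):

  GstElement *src = gst_element_factory_make ("rtspsrc", NULL);

  /* non-live, client-controlled delivery rate, ONVIF mode */
  g_object_set (src,
      "location", "rtsp://192.168.1.10/onvif/media",
      "is-live", FALSE,
      "onvif-mode", TRUE,
      "onvif-rate-control", FALSE,
      NULL);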
Forwarding a single segment event from the pad that first gets
chained is incorrect: when that first event was sent by an element
such as x264enc, with its offset start, we end up pushing
out-of-segment buffers for the other pad(s).
Instead, every time the active pad changes, forward the appropriate
segment event.
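A sketch of that idea, with illustrative names (each sink pad is
assumed to remember the last segment event it received):

  static void
  forward_segment_for_active_pad (GstPad * srcpad,
      const GstSegment * active_pad_segment)
  {
    /* push the newly active pad's segment instead of reusing the
     * one from the first chained pad */
    gst_pad_push_event (srcpad,
        gst_event_new_segment (active_pad_segment));
  }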
Fixes https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/1028
When it is not yet clear whether a packet from a source should be
pushed, the packet is put into a queue; this happens in two cases:
- the source is still in probation;
- there is a large jump in seqnum, and it is not clear what the cause
  is; future packets will help making a guess.
In either case stats about received packets are not updated at all; and
even if they were, when init_seq() is called it resets all receiver
stats, effectively losing any stats about previously received packets.
Fix this by taking the queued packets into account and updating the
stats when calling init_seq().
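A simplified sketch of the fix, modeled on the rtpsource.c internals
(queue layout and stats accounting approximated):

  /* init_seq() has just reset the receiver stats; re-account the
   * packets that were sitting in the probation queue */
  GList *walk;

  for (walk = src->packets->head; walk; walk = walk->next) {
    GstBuffer *buf = walk->data;

    src->stats.packets_received++;
    src->stats.octets_received += gst_buffer_get_size (buf);
  }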
Since commit c971d1a9a (rtpsource: refactor bitrate estimation,
2010-03-02) the bytes_received field in RTPSourceStats is set but then
never used again; expose it so that it can be used by user code to
verify how many bytes have been received.
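A sketch of how user code could read it, assuming the source's stats
structure exposes the field as "bytes-received":

  GstStructure *stats;
  guint64 bytes;

  g_object_get (source, "stats", &stats, NULL);
  if (gst_structure_get_uint64 (stats, "bytes-received", &bytes))
    g_print ("bytes received: %" G_GUINT64_FORMAT "\n", bytes);
  gst_structure_free (stats);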
According to RFC3550, lower-layer headers should be considered for
bandwidth calculations.
See https://tools.ietf.org/html/rfc3550#section-6.2 paragraph 4:
Bandwidth calculations for control and data traffic include
lower-layer transport and network protocols (e.g., UDP and IP) since
that is what the resource reservation system would need to know.
Fix the source's stats accounting to accommodate that.
Assume UDP over IPv4 for now; this is a simplification, but it's good
enough for the moment.
While at it, define a constant and use that instead of a magic number.
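A sketch of the idea (constant and variable names illustrative): a UDP
header is 8 bytes and an IPv4 header without options is 20 bytes:

  #define UDP_IP_HEADER_OVERHEAD (8 + 20)

  /* count lower-layer headers towards the received byte total */
  src->stats.bytes_received += rtp_packet_size + UDP_IP_HEADER_OVERHEAD;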
NOTE: this change basically reverts the logic of commit 529f443a6
(rtpsource: use payload size to estimate bitrate, 2010-03-02).
Adjust/port from rtph264pay and allow sending the configuration data
at every IDR.
The payloader was stripping the configuration data when the
config-interval was set to 0. The code was written in such a way
(!(config_interval > 0)) that it also stripped the config when the
property was set to -1 (send config_data as soon as possible).
This resulted in some MPEG4 streams where no GOP/VOP-I was detected
being sent out without configuration.
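A sketch of the corrected check (variable names illustrative):

  gboolean strip_config;

  /* before (buggy): stripped unless the interval was > 0, which
   * wrongly included -1, i.e. "send config data as soon as possible" */
  strip_config = !(config_interval > 0);

  /* after: only strip when explicitly disabled with 0 */
  strip_config = (config_interval == 0);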
In reverse playback, we don't want to rely on the position of the
current keyframe to decide a stream is EOS: the last GOP we push will
start with a keyframe, whose position is likely to be outside of the
segment.
Instead, let the normal seek_to_previous_keyframe mechanism do its job;
it works just fine.
If a key unit seek is performed with a time position that matches
the offset of a keyframe, but not its actual PTS, we need to
adjust the segment nevertheless.
For example, consider the following case:
* the stream starts with a keyframe at 0 nanoseconds, lasting 40 milliseconds
* the user does a key unit seek at 20 milliseconds
* we don't adjust the segment, as the time position is "over" a keyframe
* we push a segment that starts at 20 milliseconds
* we push a buffer with PTS == 0
* an element downstream (e.g. rtponviftimestamp) tries to calculate the
  stream time of the buffer, fails to do so and drops it
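A sketch of the adjustment (names illustrative): snap the segment back
to the keyframe's actual PTS so downstream can compute a valid stream
time:

  if (keyframe_pts != segment->start) {
    segment->start = keyframe_pts;
    segment->time = keyframe_pts;
  }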
When the seek event contains a (newly-added) trickmode interval and
TRICKMODE_KEY_UNITS was requested, only let through keyframes separated
by at least the required interval.
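A usage sketch from the application side, using the new core API:

  GstEvent *seek = gst_event_new_seek (1.0, GST_FORMAT_TIME,
      GST_SEEK_FLAG_FLUSH | GST_SEEK_FLAG_TRICKMODE |
      GST_SEEK_FLAG_TRICKMODE_KEY_UNITS,
      GST_SEEK_TYPE_SET, 0, GST_SEEK_TYPE_NONE, GST_CLOCK_TIME_NONE);

  /* only let through keyframes at least one second apart */
  gst_event_set_seek_trickmode_interval (seek, GST_SECOND);
  gst_element_send_event (pipeline, seek);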
The primary video stream is used to select fragment cut points at
keyframe boundaries. Auxiliary video streams may be broken up at any
packet, so fragments may not start with a keyframe for those streams.
The time_position field of the stream is offset by the media_start of
its QtDemuxSegment compared to the start of the GstSegment of the
demuxer; take that offset into account when making comparisons.
If the conflict is detected when sending a packet, then also send an
upstream event to tell the source to reconfigure itself.
Also, ignore the collision if we see more than one collision from the
same remote source, to avoid problems in the presence of loops.
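A sketch of the upstream notification, modeled on rtpsession's custom
GstRTPCollision event (structure and field names assumed):

  gst_pad_push_event (send_rtp_sinkpad,
      gst_event_new_custom (GST_EVENT_CUSTOM_UPSTREAM,
          gst_structure_new ("GstRTPCollision",
              "ssrc", G_TYPE_UINT, ssrc, NULL)));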
Add a new property "do-aggregate"* to the H.264 RTP payloader which
enables STAP-A aggregation as per [RFC 6184][1]. With aggregation
enabled, packets are bundled instead of being sent immediately, up
until the MTU size. Bundles also end at access unit boundaries or when
packets have to be fragmented.
*: The property name is kept generic, since it might apply more widely,
e.g. STAP-B or MTAP.
[1]: https://tools.ietf.org/html/rfc6184#section-5.7
Closes https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/issues/434
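A minimal usage sketch (property name as introduced here):

  GstElement *pay = gst_element_factory_make ("rtph264pay", NULL);

  /* bundle NAL units into STAP-A packets, up to the MTU size */
  g_object_set (pay, "do-aggregate", TRUE, NULL);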