ELEMENTS (v4lsrc, alsasrc, osssrc)
----------------------------------

- capturing elements should not do fps/sample rate correction themselves;
  they should timestamp buffers according to "a clock", period.

- if the element is the clock provider:

  - timestamp buffers based on the internals of the clock it's providing,
    without calling the exposed clock functions

  - do this by getting a measure of elapsed time based on the internal clock
    that is being wrapped. Ie., count the number of samples the *device*
    has processed/dropped/...
    If there are no underruns, the produced buffers are a contiguous data
    stream.

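    A minimal sketch of that calculation (assuming the usual GStreamer buffer
    macros; GST_BUFFER_TIMESTAMP is GST_BUFFER_PTS in newer versions;
    "samples_handled", "samples_in_buf" and "rate" are illustrative names for
    state the element keeps itself):

      /* samples_handled: samples the device has produced or dropped since
       * going to PLAYING; rate: the device's sample rate in Hz.
       * The timestamp is just elapsed device time, so consecutive buffers
       * stay data- and time-contiguous. */
      GstClockTime ts = gst_util_uint64_scale (samples_handled, GST_SECOND, rate);

      GST_BUFFER_TIMESTAMP (buf) = ts;
      GST_BUFFER_DURATION (buf) =
          gst_util_uint64_scale (samples_in_buf, GST_SECOND, rate);
      GST_BUFFER_OFFSET (buf) = samples_handled;
      GST_BUFFER_OFFSET_END (buf) = samples_handled + samples_in_buf;
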
  - possibilities:

    - the device has a method to query for the absolute time related to
      a buffer you're about to capture or just have captured:
      use that time as the timestamp on the capture buffer
      (it's important that this time is related to the capture buffer;
      ie. it's a time that "stands still" if you're not capturing)

    - since you're providing the clocking, but don't have the previous method,
      you should open the device with a given rate and continuously read
      samples from it, even in PAUSED. This allows you to update an internal
      clock.
      You use this internal clock as well to timestamp the buffers going out,
      so you again form a contiguous set of buffers.
      The only acceptable way to continuously read samples then is in a
      private thread.

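      A rough sketch of that private reader thread (hedged: GLib threading
      primitives, and a hypothetical read_samples_from_device() standing in
      for the real device read; shutdown and error handling are left out):

        static gpointer
        reader_thread (gpointer data)
        {
          MySrc *src = data;

          while (src->running) {
            guint n = read_samples_from_device (src, src->scratch,
                SCRATCH_SAMPLES);

            g_mutex_lock (&src->lock);
            src->samples_handled += n;       /* drives the internal clock */
            g_mutex_unlock (&src->lock);
          }
          return NULL;
        }

        /* the clock this element provides derives its time from the counter */
        static GstClockTime
        my_src_get_internal_time (MySrc *src)
        {
          return gst_util_uint64_scale (src->samples_handled, GST_SECOND,
              src->rate);
        }
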
  - as long as no underruns happen, the flow being output is a perfect stream:
    the flow is data-contiguous and time-contiguous.

  - underruns should be handled like this:

    - if the code can detect how many samples it dropped, it should just
      send the next buffer with the new correct offset. Ie, it produced
      a data gap, and since it provides the clock, it produces a perfect
      data gap (the timestamp will be correctly updated too).

    - if it cannot detect how many samples it dropped, there's a fallback
      algorithm. The element uses another GstClock (for example, the system
      clock) on which it corrects the skew and drift continuously as long as
      it doesn't drop. When it detects a drop, it can take the time delta on
      that other GstClock between the last capture and now, and use that
      delta to guesstimate the number of samples dropped.

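      A sketch of that guesstimate (hedged: "sysclock" is whatever helper
      GstClock the element keeps, e.g. the system clock, and "samples_read"
      is how many samples the element just got; the skew/drift correction
      itself is left out):

        GstClockTime now = gst_clock_get_time (src->sysclock);
        GstClockTime elapsed = now - src->last_capture_time;

        src->last_capture_time = now;

        /* elapsed covers both the dropped samples and the ones just read */
        guint64 expected = gst_util_uint64_scale (elapsed, src->rate, GST_SECOND);
        guint64 dropped = expected > samples_read ? expected - samples_read : 0;

        /* advancing the counter produces a "perfect" data gap */
        src->samples_handled += dropped;
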
- if the element is not the clock provider:

  - the element should always respect the clock it is given.

  - the element should timestamp outgoing buffers based on time given by
    the provided clock, by querying for the time on that clock, and
    comparing to the base time.

  - the element should NOT drop/add frames. Rather, it should just
    - timestamp the buffers with the current time according to the provided
      clock
    - set the duration according to the *theoretical/nominal* framerate

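    A minimal sketch of that timestamping (hedged: element/clock accessors
    named as in current GStreamer core; fps_n/fps_d are the nominal framerate
    taken from the caps):

      GstClock *clock = gst_element_get_clock (GST_ELEMENT (src));
      GstClockTime base = gst_element_get_base_time (GST_ELEMENT (src));
      GstClockTime now = gst_clock_get_time (clock);

      /* running time according to the provided clock */
      GST_BUFFER_TIMESTAMP (buf) = now - base;
      /* duration follows the theoretical/nominal framerate, not the clock */
      GST_BUFFER_DURATION (buf) =
          gst_util_uint64_scale (GST_SECOND, src->fps_d, src->fps_n);

      gst_object_unref (clock);
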
  - when underruns happen (the device has lost capture data because our
    element is not handling it quickly enough), this should be detectable
    by the element through the device. On underrun, the offset of your
    next buffer will not match the offset_end of your previous one
    (ie, the data flow is no longer contiguous).
    If the exact number of samples dropped is detectable, it is the
    difference between the new offset and the old offset_end.
    If it's not detectable, it should be guessed based on the elapsed time
    between now and the last capture.

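    In code form (a sketch; prev_offset_end is the offset_end the element
    remembered from its previous buffer, and "now" is the current time on
    the clock it uses for the fallback guess):

      /* the device reported an underrun for this buffer */
      guint64 dropped;

      if (GST_BUFFER_OFFSET_IS_VALID (buf))
        /* exact count: the gap between this buffer and the previous one */
        dropped = GST_BUFFER_OFFSET (buf) - src->prev_offset_end;
      else
        /* no exact count: guess from the elapsed time since the last capture */
        dropped = gst_util_uint64_scale (now - src->last_capture_time,
            src->rate, GST_SECOND);

      src->prev_offset_end = GST_BUFFER_OFFSET_END (buf);
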
- a second element can be responsible for making the stream time-contiguous
  (ie, T1 + D1 = T2 for all buffers). This way the buffers are made
  acceptable for gapless presentation (which is useful for audio).

  - The element treats the incoming stream as data-contiguous but not
    necessarily time-contiguous.

  - If the timestamps are contiguous as well, then everything is fine and
    nothing needs to be done. This is the case where a file is being read
    from disk, or capturing was done by an element that provided the clock.

  - If they are not contiguous, then this element must make them so.
    Since it should respect the nominal framerate, it has to stretch or
    shorten the incoming data to match the timestamps set on the data.
    For audio and video, this means it could interpolate or add/drop samples.
    For audio, resampling/interpolation is preferred.
    For video, a simple mechanism that chooses the frame with a timestamp as
    close as possible to the theoretical timestamp could be used.

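    A sketch of that selection for video (names are illustrative; frame_ts is
    the captured frame's timestamp, fps_n/fps_d the nominal framerate):

      /* the theoretical sampling point nearest to this frame */
      GstClockTime period = gst_util_uint64_scale (GST_SECOND, fps_d, fps_n);
      guint64 idx = (frame_ts + period / 2) / period;
      GstClockTime ideal = idx * period;

      /* keep the frame whose timestamp is closest to "ideal"; if idx skips
       * a value, repeat the previous frame; if two frames land on the same
       * idx, keep only the closer one */
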
  - When it receives a new buffer that is not data-contiguous with the
    previous one, the capture element dropped samples/frames.
    The adjuster can correct this by sending out as much "no-signal" data
    (for audio, e.g. silence or background noise; for video, black frames)
    as it wants, since a data discontinuity is unrepairable anyway.
    So it can use these to catch up more aggressively.
    It should just make sure that the next buffer it gets goes back to
    respecting the nominal framerate.

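    For audio, filling such a gap could look like this (a sketch using the
    GStreamer 1.x buffer helpers; assumes 16-bit samples and that "gap_ns",
    "rate" and "channels" are known from the discontinuity and the caps):

      guint64 missing = gst_util_uint64_scale (gap_ns, rate, GST_SECOND);
      gsize size = missing * channels * 2;               /* 16-bit samples */
      GstBuffer *fill = gst_buffer_new_and_alloc (size);

      gst_buffer_memset (fill, 0, 0, size);              /* silence */
      GST_BUFFER_PTS (fill) = next_expected_ts;          /* keeps time contiguous */
      GST_BUFFER_DURATION (fill) = gap_ns;
      gst_pad_push (srcpad, fill);
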
- To achieve the best possible long-time capture, the following can be done:

  - audiosrc captures audio and provides the clock. It does contiguous
    timestamping by default.

  - videosrc captures video timestamped with the audiosrc's clock. This data
    feed doesn't match the nominal framerate. If there is an encoding format
    that supports storing the actual timestamps instead of pretending the
    data flow respects the nominal framerate, this can be corrected after
    recording.

  - at the end of recording, the absolute length in time of both streams,
    measured against a common clock, is the same or can be made the same by
    chopping off data.

  - the nominal rate of both audio and video is also known.

  - given the length and the nominal rate, we have an evenly spaced list
    of theoretical sampling points.

  - video frames can now be matched to these theoretical sampling points by
    interpolating or reusing/dropping frames. The corrector can choose the
    best possible algorithm for this to decrease the visible effects
    (interpolating results in blur, adding/dropping frames results in
    jerkiness).

  - with the video resampled at the theoretical framerate, and the audio
    already correct, the recording can now be muxed correctly into a format
    that implicitly assumes a data rate matching the nominal framerate.

  - One possibility is to use the GDP to store the recording, because that
    retains all of the timestamping information.

  - The process is symmetrical; if you want to use the clock provided by
    the video capturer, you can stretch/shrink the audio at the end of
    recording to match.


TERMINOLOGY
-----------

- nominal rate
  the framerate/samplerate exposed in the caps; ie. the theoretical framerate
  of the data flow. This is the fps reported by the device or set for the
  encoder, or the sampling rate of the audio device.

- contiguous data flow
  offset_end of the old buffer matches the offset of the new buffer.
  for audio, this is a more important requirement, since you configure
  output devices for a contiguous data flow.

- contiguous time flow
  T1 + D1 = T2
  for video, this is a more important requirement, because the sampling
  period is bigger, so it is more important to match the presentation time.

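  In code, the data and time contiguity checks between a previous and a
  current buffer look like this (a sketch):

    /* data-contiguous: no samples/frames missing between the buffers */
    gboolean data_contig =
        GST_BUFFER_OFFSET (cur) == GST_BUFFER_OFFSET_END (prev);

    /* time-contiguous: T1 + D1 = T2 */
    gboolean time_contig =
        GST_BUFFER_TIMESTAMP (prev) + GST_BUFFER_DURATION (prev) ==
        GST_BUFFER_TIMESTAMP (cur);
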
- "perfect stream"
|
|
|
|
data and time are contiguous and match the nominal rate
|
|
|
|
videotestsrc, sinesrc, filesrc ! decoder produce this
|
|
|
|
|
|
|
|

NETWORK
-------

- elements can be synchronized by writing an NTP clock subclass that listens
  to an NTP server and tries to match its own clock against the NTP server
  by doing gradual rate adjustment, compared with its own system clock.

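  A rough sketch of the rate-matching part (hedged: this leans on
  gst_clock_add_observation() as found in later GStreamer core for the
  regression; "ntp_time" stands for a time obtained from the server, and the
  subclass still has to implement the NTP exchange itself):

    /* inside the NTP clock subclass, whenever a server response arrives: */
    gdouble r_squared;

    gst_clock_add_observation (GST_CLOCK (ntp_clock),
        gst_clock_get_time (ntp_clock->sysclock),   /* our local reading */
        ntp_time,                                   /* the server's reading */
        &r_squared);

    /* the resulting calibration slews the exposed clock time gradually
     * instead of letting it jump */
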
- sending audio and video over the network using tcpserversink is possible
  when the streams are made to be perfect streams and synchronized.
  Since the streams are perfect and synchronized, the timestamps transmitted
  along with the buffers can be trusted. The client just has to make
  sure that it respects the timestamps.

- One good way of doing that is to make an element that provides a clock
  based on the timestamps of the data stream, interpolating with another
  GstClock in between those time points. This allows you to create a perfect
  network stream player: one that neither lags behind (buffers piling up)
  nor plays too fast (an empty network queue).

- On the client side, a GStreamer-ish way to do that is to cut the playback
  pipeline in half, and have a decoupled element that converts
  timestamps/durations (by resampling/interpolating/...) so that the sinks
  consume data at the same rate the tcp sources provide it.

    tcpclientsrc ! theoradec ! clocker name=clocker { clocker. ! xvimagesink }


SYNCHRONISATION
---------------

- low rate source with high rate source:
  the high rate source can drop samples so that it starts with the same phase
  as the low rate source. This could be done in a synchronizer element.
  example:
  - audio, 8000 Hz, and video, 5 fps
  - pipeline goes to playing
  - video src does capture and receives its first frame 50 ms after playing
    -> phase is -90 or 270 degrees
  - to compensate, the equivalent of 150 ms of audio could be dropped so
    that the first video frame's timestamp coincides with the timestamp of
    the first audio buffer
  - this should be done in the raw audio domain since it's typically not
    possible to chop off samples in the encoded domain

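  In numbers, "the equivalent of 150 ms of audio" at the rates above is
  (a sketch; ms_to_drop would come from the measured phase difference):

    guint64 samples_to_drop = gst_util_uint64_scale (ms_to_drop, rate, 1000);
    /* 150 ms at 8000 Hz  ->  150 * 8000 / 1000 = 1200 samples */
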
- two low rate sources:
  not possible to do this correctly; maybe something in the middle can be
  found?


IMPROVING QUALITY
-----------------

- video src can capture at a higher framerate than will be encoded
  - this gives the corrector more frames to choose from or interpolate with
    to match the target framerate, reducing jerkiness.
    e.g. capturing at 15 fps for a 5 fps target framerate.


LIVE CHANGES IN PIPELINE
------------------------

- case 1: video recording for some time, user wants to add audio recording on
  the fly
  - user sets complete pipeline to paused
  - user adds element for audio recording
  - new element gets same base time as video element
  - on PLAYING, new element will be in sync and the first buffer produced
    will have a non-zero timestamp that is the same as the first new video
    buffer

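  A sketch of the "same base time" step (hedged: base-time accessors named as
  in current GStreamer core; audiosrc/videosrc stand for the elements
  involved):

    /* give the newly added audio element the base time the running video
     * element already has, so their running times line up */
    GstClockTime base = gst_element_get_base_time (GST_ELEMENT (videosrc));
    gst_element_set_base_time (GST_ELEMENT (audiosrc), base);
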
- case 2: video recording for some time, user wants to add in an audio file
  from disk.
  - two possible expectations:
    A) user expects the audio file to "start playing now" and be muxed
       together with the current video frames
    B) user expects the audio file to "start playing from the point where the
       video currently is" (ie, video is at 10 seconds, so mux with audio
       starting from 10 secs)
  - case A):
    - complete pipeline gets paused
    - filesrc ! dec added
    - both get base_time same as video element
    - pipeline to playing
    - all elements receive new "now" as base_time so timestamps are reset
    - muxer will receive synchronized data from both

  - case B):
    - nothing gets paused
    - filesrc ! dec added
    - both get base_time that is the current clock time
    - pipeline to playing
    - core sets
      1) - new audio part starts sending out data with timestamp 0 from the
           start of the file
         - muxer receives a whole set of frames from the audio side that are
           late (since the timestamps start at 0), so it keeps dropping until
           it has caught up with the current set
         OR
      2) - audio part does a clock query


THINGS TO DIG UP
----------------

- is there a better way to get at "when was this frame captured" than doing
  a clock query after capturing?
  Imagine a video device with a hardware buffer of four frames. If you
  haven't asked for a frame from it in a while, three frames could be
  queued up. So three consecutive frame gets return immediately, with
  pretty much the same clock query result for each of them.
  So we should find a way to get "a comparable clock time" corresponding
  to the captured frame.

- the v4l2 api returns a gettimeofday() timestamp with each buffer.
  Given that, you can timestamp the buffer by taking the current time on the
  provided clock and subtracting the delta between the current system clock
  time and the buffer's gettimeofday() timestamp.

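  In code that correction could look like this (a sketch; GST_TIMEVAL_TO_TIME
  is a GStreamer helper macro, v4l2_buf.timestamp is the struct timeval filled
  in by VIDIOC_DQBUF, and system_clock must be on the same timebase as
  gettimeofday()):

    GstClockTime capture_sys = GST_TIMEVAL_TO_TIME (v4l2_buf.timestamp);
    GstClockTime now_sys     = gst_clock_get_time (system_clock);
    GstClockTime now_pipe    = gst_clock_get_time (provided_clock);
    GstClockTime base        = gst_element_get_base_time (element);

    /* how long ago the frame was really captured, measured on the system
     * clock, then projected back from the provided clock's "now" */
    GstClockTime age = now_sys - capture_sys;
    GST_BUFFER_TIMESTAMP (buf) = (now_pipe - base) - age;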