From 452269537cd330e463a087cd5dbde51e568ed6e1 Mon Sep 17 00:00:00 2001
From: Thomas Vander Stichele
Date: Mon, 14 Jun 2004 15:21:19 +0000
Subject: [PATCH] notes on capturing

Original commit message from CVS:
notes on capturing
---
 docs/random/thomasvs/capturing | 124 +++++++++++++++++++++++++++++++++
 1 file changed, 124 insertions(+)
 create mode 100644 docs/random/thomasvs/capturing

diff --git a/docs/random/thomasvs/capturing b/docs/random/thomasvs/capturing
new file mode 100644
index 0000000000..19a6a2b2a8
--- /dev/null
+++ b/docs/random/thomasvs/capturing
@@ -0,0 +1,124 @@
+ELEMENTS (v4lsrc, alsasrc, osssrc)
+--------
+- capturing elements should not do fps/sample rate correction themselves
+  they should timestamp buffers according to "a clock", period.
+
+- if the element is the clock provider:
+  - timestamp buffers based on the internals of the clock it's providing,
+    without calling the exposed clock functions
+  - do this by getting a measure of elapsed time based on the internal clock
+    that is being wrapped. Ie., count the number of samples the *device*
+    has processed/dropped/...
+    If there are no underruns, the produced buffers are a contiguous data
+    stream.
+  - possibilities:
+    - the device has a method to query for the absolute time related to
+      a buffer you're about to capture or just have captured:
+      Use that time as the timestamp on the capture buffer
+      (it's important that this time is related to the capture buffer;
+      ie. it's a time that "stands still" if you're not capturing)
+    - since you're providing the clocking, but don't have the previous method,
+      you should open the device with a given rate and continuously read
+      samples from it, even in PAUSED. This allows you to update an internal
+      clock.
+      You use this internal clock as well to timestamp the buffers going out,
+      so you again form a contiguous set of buffers.
+      The only acceptable way to continuously read samples then is in a private
+      thread.
+  - as long as no underruns happen, the flow being output is a perfect stream:
+    the flow is data-contiguous and time-contiguous.
+
+- if the element is not the clock provider
+  - the element should always respect the clock it is given.
+  - the element should timestamp outgoing buffers based on time given by
+    the provided clock, by querying for the time on that clock, and
+    comparing to the base time.
+  - the element should NOT drop/add frames. Rather, it should just
+    - timestamp the buffers with the current time according to the provided
+      clock
+    - set the duration according to the *theoretical/nominal* framerate
+  - when underruns happen (the device has lost capture data because our
+    element is not handling them quickly enough), this should be detectable
+    by the element through the device. On underrun, the offset of your
+    next buffer will not match the end_offset of your previous one
+    (ie, the data flow is no longer contiguous).
+    If the exact number of samples dropped is detectable, this is the
+    difference between new offset and old offset_end.
+    If it's not detectable, it should be guessed based on the elapsed time
+    between now and the last capture.
+
+- a second element can be responsible for making the stream time-contiguous.
+  (ie, T1 + D1 = T2 for all buffers). This way they are made
+  acceptable for gapless presentation (which is useful for audio).
+  - The element treats the incoming stream as data-contiguous but not
+    necessarily time-contiguous.
+  - If the timestamps are contiguous as well, then everything is fine and
+    nothing needs to be done. This is the case where a file is being read
+    from disk, or capturing was done by an element that provided the clock.
+  - If they are not contiguous, then this element must make them so.
+    Since it should respect the nominal framerate, it has to stretch or
+    shorten the incoming data to match the timestamps set on the data.
+    For audio and video, this means it could interpolate or add/drop samples.
+    For audio, resampling/interpolation is preferred.
+    For video, a simple mechanism that chooses the frame with a timestamp as
+    close as possible to the theoretical timestamp could be used.
+  - When it receives a new buffer that is not data-contiguous with the
+    previous one, the capture element dropped samples/frames.
+    The adjuster can correct this by sending out as much "no-signal" data
+    (for audio, e.g. silence or background noise; for video, sending out
+    black frames) as it wants, since a data discontinuity is unrepairable.
+    So it can use these to catch up more aggressively.
+    It should just make sure that the next buffer it gets again goes
+    back to respecting the nominal framerate.
+
+- To achieve the best possible long-time capture, the following can be done:
+  - audiosrc captures audio and provides the clock. It does contiguous
+    timestamping by default.
+  - videosrc captures video timestamped with the audiosrc's clock. This data
+    feed doesn't match the nominal framerate. If there is an encoding format
+    that supports storing the actual timestamps instead of pretending the
+    data flow respects the nominal framerate, this can be corrected after
+    recording.
+  - at the end of recording, the absolute length in time of both streams,
+    measured against a common clock, is the same or can be made the same by
+    chopping off data.
+  - the nominal rate of both audio and video is also known.
+  - given the length and the nominal rate, we have an evenly spaced list
+    of theoretical sampling points.
+  - video frames can now be matched to these theoretical sampling points by
+    interpolating or reusing/dropping frames. It can choose the best
+    possible algorithm for this to decrease the visible effects
+    (interpolating results in blur, add/drop frames results in jerkiness).
+  - with the video resampled at the theoretical framerate, and the audio
+    already correct, the recording can now be muxed correctly into a format
+    that implicitly assumes a data rate matching the nominal framerate.
+  - One possibility is to use the GDP to store the recording, because that
+    retains all of the timestamping information.
+  - The process is symmetrical; if you want to use the clock provided by
+    the video capturer, you can stretch/shrink the audio at the end of
+    recording to match.
+
+TERMINOLOGY
+-----------
+- nominal rate
+  the framerate/samplerate
+  exposed in the caps; ie. the theoretical framerate of the
+  data flow. This is the fps reported by the device or set for the encoder,
+  or the sampling rate of the audio device.
+- contiguous data flow
+  offset_end of old buffer matches offset of new buffer
+  for audio, this is a more important requirement, since you configure
+  output devices for a contiguous data flow.
+- contiguous time flow
+  T1 + D1 = T2
+  for video, this is a more important requirement, because the sampling
+  period is bigger, so it is more important to match the presentation time
+- "perfect stream"
+  data and time are contiguous and match the nominal rate
+  videotestsrc, sinesrc, filesrc ! decoder produce this
+
+NETWORK
+-------
+- elements can be synchronized by writing an NTP clock subclass that listens
+  to an NTP server, and tries to match its own clock against the NTP server
+  by doing gradual rate adjustment, compared with its own system clock.
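
The "gradual rate adjustment" mentioned in the NETWORK note could look
roughly like the sketch below. It is not GStreamer code and not part of the
patch: the class name SlavedClock, the gain and the clamp value are all
assumptions made for illustration, and the remote reference is simulated
instead of coming from a real NTP server. The idea shown is only that the
clock never jumps; it re-anchors at its current reading and nudges its rate
toward the reference.

```python
class SlavedClock:
    """Hypothetical clock that follows a remote reference by adjusting its rate."""

    def __init__(self, gain=0.1, max_rate_offset=0.001):
        self.gain = gain                        # assumed: rate correction per second of error
        self.max_rate_offset = max_rate_offset  # assumed clamp; keeps the rate close to 1.0
        self.rate = 1.0
        self._local_base = None                 # local (system) time at the last anchor
        self._time_base = None                  # slaved time at the last anchor

    def get_time(self, local_now):
        """Return the slaved time for a given local system time."""
        if self._local_base is None:
            raise RuntimeError("no reference observation yet")
        return self._time_base + (local_now - self._local_base) * self.rate

    def observe(self, local_now, remote_now):
        """Feed one (local time, remote time) pair; return the measured error."""
        if self._local_base is None:
            # First observation: simply start from the remote time.
            self._local_base, self._time_base = local_now, remote_now
            return 0.0
        current = self.get_time(local_now)
        error = remote_now - current
        # Re-anchor at the current reading so the reported time stays continuous.
        self._local_base, self._time_base = local_now, current
        # Proportional, clamped rate correction instead of a time jump.
        offset = max(-self.max_rate_offset,
                     min(self.max_rate_offset, self.gain * error))
        self.rate = 1.0 + offset
        return error


def simulate(seconds=200, remote_drift=0.0005):
    """Simulate a remote clock running slightly fast relative to the local one."""
    clock = SlavedClock()
    readings = []
    error = 0.0
    for tick in range(seconds + 1):
        local_now = float(tick)
        remote_now = local_now * (1.0 + remote_drift)
        error = clock.observe(local_now, remote_now)
        readings.append(clock.get_time(local_now))
    return readings, error


if __name__ == "__main__":
    readings, error = simulate()
    print("final error against the reference: %.6f s" % error)
```

A purely proportional correction like this one leaves a small constant
offset when the remote clock drifts steadily; a real implementation would
probably also need filtering of network jitter, which is left out here.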