From 452269537cd330e463a087cd5dbde51e568ed6e1 Mon Sep 17 00:00:00 2001
From: Thomas Vander Stichele <thomas@apestaart.org>
Date: Mon, 14 Jun 2004 15:21:19 +0000
Subject: [PATCH] notes on capturing

Original commit message from CVS:
notes on capturing
---
 docs/random/thomasvs/capturing | 124 +++++++++++++++++++++++++++++++++
 1 file changed, 124 insertions(+)
 create mode 100644 docs/random/thomasvs/capturing

diff --git a/docs/random/thomasvs/capturing b/docs/random/thomasvs/capturing
new file mode 100644
index 0000000000..19a6a2b2a8
--- /dev/null
+++ b/docs/random/thomasvs/capturing
@@ -0,0 +1,124 @@
+ELEMENTS (v4lsrc, alsasrc, osssrc)
+--------
+- capturing elements should not do fps/sample rate correction themselves
+  they should timestamp buffers according to "a clock", period.
+
+- if the element is the clock provider:
+  - timestamp buffers based on the internals of the clock it's providing,
+    without calling the exposed clock functions
+  - do this by getting a measure of elapsed time based on the internal clock
+    that is being wrapped.  Ie., count the number of samples the *device*
+    has processed/dropped/...
+    If there are no underruns, the produced buffers are a contiguous data
+    stream.  
+  - possibilities:
+    - the device has a method to query for the absolute time related to
+      a buffer you're about to capture or just have captured:
+      Use that time as the timestamp on the capture buffer
+      (it's important that this time is related to the capture buffer;
+       ie. it's a time that "stands still" if you're not capturing)
+    - since you're providing the clocking, but don't have the previous method,
+      you should open the device with a given rate and continuously read
+      samples from it, even in PAUSED.  This allows you to update an internal
+      clock.
+      You use this internal clock as well to timestamp the buffers going out,
+      so you again form a contiguous set of buffers.
+      The only acceptable way to continuously read samples then is in a private
+      thread.
+  - as long as no underruns happen, the flow being output is a perfect stream:
+    the flow is data-contiguous and time-contiguous.
+
+- if the element is not the clock provider
+  - the element should always respect the clock it is given.
+  - the element should timestamp outgoing buffers based on time given by
+    the provided clock, by querying for the time on that clock, and
+    comparing to the base time.
+  - the element should NOT drop/add frames.  Rather, it should just
+    - timestamp the buffers with the current time according to the provided
+      clock
+    - set the duration according to the *theoretical/nominal* framerate
+    - when underruns happen (the device has lost capture data because our
+      element is not handling them quickly enough), this should be detectable
+      by the element through the device.  On underrun, the offset of your
+      next buffer will not match the end_offset of your previous one
+      (ie, the data flow is no longer contiguous).
+      If the exact number of samples dropped is detectable, this is the
+      difference between new offset and old offset_end.
+      If it's not detectable, it should be guessed based on the elapsed time
+      between now and the last capture.
+
+- a second element can be responsible for making the stream time-contiguous.
+  (ie, T1 + D1 = T2 for all buffers).  This way they are made
+  acceptible for gapless presentation (which is useful for audio).
+  - The element treats the incoming stream as data-contiguous but not
+    necessarily time-contiguous.
+  - If the timestamps are contiguous as well, then everything is fine and
+    nothing needs to be done.  This is the case where a file is being read
+    from disk, or capturing was done by an element that provided the clock.
+  - If they are not contiguous, then this element must make them so.
+    Since it should respect the nominal framerate, it has to stretch or
+    shorten the incoming data to match the timestamps set on the data.
+    For audio and video, this means it could interpolate or add/drop samples.
+    For audio, resampling/interpolation is preferred.
+    For video, a simple mechanism that chooses the frame with a timestamp as
+    close as possible to the theoretical timestamp could be used.
+  - When it receives a new buffer that is not data-contiguous with the
+    previous one, the capture element dropped samples/frames.
+    The adjuster can correct this by sending out as much "no-signal" data
+    (for audio, e.g. silence or background noise; for video, sending out
+    black frames) as it wants, since a data discontinuity is unrepairable.
+    So it can use these to catch up more aggressively.
+    It should just make sure that the next buffer it gets again goes
+    back to respecting the nominal framerate.
+
+- To achieve the best possible long-time capture, the following can be done:
+  - audiosrc captures audio and provides the clock.  It does contiguous
+    timestamping by default.
+  - videosrc captures video timestamped with the audiosrc's clock.  This data
+    feed doesn't match the nominal framerate.  If there is an encoding format
+    that supports storing the actual timestamps instead of pretending the
+    data flow respects the nominal framerate, this can be corrected after
+    recording.
+  - at the end of recording, the absolute length in time of both streams,
+    measured against a common clock, is the same or can be made the same by
+    chopping off data.
+  - the nominal rate of both audio and video is also known.
+  - given the length and the nominal rate, we have an evenly spaced list
+    of theoretical sampling points.
+  - video frames can now be matched to these theoretical sampling points by
+    interpolating or reusing/dropping frames.  It can choose the best
+    possible algorithm for this to decrease the visible effects
+    (interpolating results in blur, add/drop frames results in jerkiness).
+  - with the video resampled at the theoretical framerate, and the audio
+    already correct, the recording can now be muxed correctly into a format
+    that implicitly assumes a data rate matching the nominal framerate.
+  - One possibility is to use the GDP to store the recording, because that
+    retains all of the timestamping information.
+  - The process is symmetrical; if you want to use the clock provided by
+    the video capturer, you can stretch/shrink the audio at the end of
+    recording to match.
+
+TERMINOLOGY
+-----------
+- nominal rate
+  the framerate/samplerate
+  exposed in the caps; ie. the theoretical framerate of the
+  data flow.  This is the fps reported by the device or set for the encoder,
+  or the sampling rate of the audio device.
+- contiguous data flow
+  offset_end of old buffer matches offset of new buffer
+  for audio, this is a more important requirement, since you configure
+  output devices for a contiguous data flow.
+- contiguous time flow
+  T1 + D1 = T2
+  for video, this is a more important requirement, because the sampling
+  period is bigger, so it is more important to match the presentation time
+- "perfect stream"
+  data and time are contiguous and match the nominal rate
+  videotestsrc, sinesrc, filesrc ! decoder produce this
+
+NETWORK
+-------
+- elements can be synchronized by writing a NTP clock subclass that listens
+  to an ntp server, and tries to match its own clock against the NTP server
+  by doing gradual rate adjustment, compared with the own system clock.