These notes cover time coordinates in GES, time effects, time translations. It also goes into why keyframes will not work with non-linear time effects. Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-editing-services/-/merge_requests/177>
Time Notes
Some notes on time coordinates and time effects in GES.
Time Coordinate Definitions
A timeline will have a single time coordinate, which runs from 0 onwards in GstClockTime. Each track in the timeline will share the same time.
For a given track, at any given timeline time (time), we have a stack of GESTrackElements whose interval [start, start + duration] contains time. The elements are linked in order of priority. Each element will have four time coordinates for each unique stack it is part of:
- external sink coordinates: the coordinates used at the boundary between the upstream element and itself. This is the external source coordinates of the upstream element minus (downstream-start - upstream-start). If it has no upstream element, these coordinates do not exist.
- external source coordinates: the coordinates used at the boundary between the downstream element and itself. This is the external sink coordinates of the downstream element minus (upstream-start - downstream-start). If it has no downstream element, these coordinates can be translated to the timeline coordinates by adding the start of the element.
- internal sink coordinates: the coordinates used for the sink of the first internal GstElement. This is the external sink coordinates plus in-point.
- internal source coordinates: the coordinates used at the source of the last internal GstElement. This will differ from the internal sink coordinates if one of the GstElements applies a rate-changing effect. This is the external source coordinates plus in-point. Note that an element that changes the consumption rate should always have its in-point set to 0. This is because nleghostpad is not able to 'undo' this shift by in-point at the opposite pad.
The following diagram shows where these coordinates are used, and how they are transformed. Below we have a GESSource, followed by a GESOperation that does not perform any rate-changing effect, followed by a GESEffect that does apply a rate-changing effect (and so its in-point is 0).
time                                       coordinate            coordinate
coords      object                         transformation        transformation
used                                       upwards               downwards
______________________________________________________________________________________
            |  source (1)               |
int. src    |---------------------------|
            |                           |  + in-point-1          - in-point-1
ex. src     '==========================='
                                           - start-1 + start-2   + start-1 - start-2
ex. sink    .===========================.
            |                           |  - in-point-2          + in-point-2
int. sink   |---------------------------|
            |  operation (2)            |  identity ()           identity ()
int. src    |---------------------------|
            |                           |  + in-point-2          - in-point-2
ex. src     '==========================='
                                           - start-2 + start-3   + start-2 - start-3
ex. sink    .===========================.
            |                           |  - 0                   + 0
int. sink   |---------------------------|
            |  time effect (3)          |  f ()                  f^-1 ()
int. src    |---------------------------|
            |                           |  + 0                   - 0
ex. src     '==========================='
                                           - start-3             + start-3
timeline    +++++++++++++++++++++++++++++
                      timeline
The given function f summarises how a seek will transform as it goes from the source to the sink of the internal GstElement, and f^-1 summarises how a segment stream time will transform.
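The "upwards" column of the diagram can be followed step by step. The following is only a sketch of that arithmetic in Python, not GES or NLE code: the function name, the dictionaries keyed by the element numbers 1, 2 and 3, and the example values are all made up for illustration.

```python
def timeline_to_source_internal(t, start, in_point, f):
    """Walk upstream from the timeline to the internal source coordinates of source (1).

    start and in_point are dicts keyed by 1, 2, 3 as in the diagram (hypothetical
    layout); f is the seek function of the time effect (3).
    """
    t = t - start[3]                # timeline -> ex. src of the time effect
    t = t + in_point[3]             # ex. src -> int. src (in-point-3 should be 0)
    t = f(t)                        # int. src -> int. sink of the time effect
    t = t - in_point[3]             # int. sink -> ex. sink
    t = t - start[2] + start[3]     # ex. sink (3) -> ex. src of the operation
    t = t + in_point[2]             # ex. src -> int. src
    # the operation itself applies the identity
    t = t - in_point[2]             # int. sink -> ex. sink
    t = t - start[1] + start[2]     # ex. sink (2) -> ex. src of the source
    return t + in_point[1]          # ex. src -> int. src


# All three elements share the same start; the time effect doubles the time.
starts = {1: 10, 2: 10, 3: 10}
in_points = {1: 5, 2: 3, 3: 0}
result = timeline_to_source_internal(13, starts, in_points, lambda t: 2 * t)
print(result)  # 11, i.e. 2 * (13 - 10) + 5
```

Note how the in-point of the operation cancels out, which is why a non-time effect acts as an identity regardless of its in-point.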
In particular, f will be a function
    f: [0, MAX] -> [0, G_MAXUINT64 - 1]
where MAX is some guint64. For what follows, we will only fully support time effects whose function f:
- is monotonically increasing. This would exclude time effects that play some later content, and then jump back to earlier content.
- is 'continuously reversible'. We define T_d for the time t as the set of all the times t' such that |f (t') - t| <= d. This property requires that, for any t between f's minimum and maximum values, we can choose a small d such that T_d is not empty, is small and has no gaps. The word "small" refers to an unnoticeable difference (the times are in nanoseconds). This means that f can be approximately reversed at all points between its minimum and maximum, which means that f^-1 can act as a close inverse of f. For a monotonically increasing function, this means that f is steadily increasing.
  For example, if f simply doubles the time, then for time t = 501, we can choose d = 1, and T_d would be {250, 251}.
  This would exclude a time effect which has a large jump, because there would be a time t inside this jump whose T_d would be empty for all small d.
  This would also exclude a time effect that creates a freeze-frame effect by always seeking to the same spot, because at the time t of this freeze-frame, T_d would be large for all d.
- obeys f (0) = 0. This would exclude a time effect that introduces an initial shift in the requested source time.
- has a MAX that is large enough. For example, 24 hours would be fine for a timeline. This would exclude a rate effect with a very large speedup.
- does not depend on any property outside of the effect element, or on the data it receives. This would exclude a time effect that, say, goes faster if there is more red in the image.
In what follows, a time effect that breaks one of these can still be used, but not all the features will work.
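The 'continuously reversible' property can be checked by brute force on small examples. The following is a sketch only: t_d, the three example functions and the sampled domain are hypothetical helpers for illustration, and nothing like them exists in GES.

```python
def t_d(f, t, d, domain):
    """Return T_d for time t: the sorted times t' in domain with |f(t') - t| <= d."""
    return sorted(tp for tp in domain if abs(f(tp) - t) <= d)


domain = range(1000)  # assumed sample of input times, standing in for [0, MAX]


def double(tp):
    # steadily increasing, so supported
    return 2 * tp


def jump(tp):
    # large jump: never outputs the values 100..1099
    return tp if tp < 100 else tp + 1000


def freeze(tp):
    # freeze-frame: keeps seeking to 100 once it gets there
    return min(tp, 100)


doubled = t_d(double, 501, 1, domain)   # [250, 251]: not empty, small, no gaps
in_jump = t_d(jump, 600, 1, domain)     # []: empty, so f cannot be reversed here
frozen = t_d(freeze, 100, 1, domain)    # 99..999: far too large to pick one time
print(doubled, in_jump, len(frozen))
```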
Translations Handled by nleobject Pads
An nleobject source pad will translate outgoing times by applying
    time-out = time-in + start - in-point
This will translate from the internal source coordinates to the timeline coordinates if it is the most downstream element. Similarly, an nleobject sink pad will translate incoming times by applying
    time-out = time-in - start + in-point
If we have two nleobjects, object-up and object-down, that have their pads linked, then a time time-up from object-up's internal GstElement would be translated at the link to
    time-down = time-up + start-up - in-point-up - start-down + in-point-down
So the pads will overall translate from the internal source coordinates of the upstream element to the internal sink coordinates of the downstream element.
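The two pad translations compose into the link formula. As a sketch (plain Python arithmetic with hypothetical function names, not the nleobject implementation):

```python
def src_pad(time_in, start, in_point):
    """Translation applied by an nleobject source pad to outgoing times."""
    return time_in + start - in_point


def sink_pad(time_in, start, in_point):
    """Translation applied by an nleobject sink pad to incoming times."""
    return time_in - start + in_point


def across_link(time_up, start_up, in_point_up, start_down, in_point_down):
    """Upstream internal source coordinates -> downstream internal sink coordinates."""
    return sink_pad(src_pad(time_up, start_up, in_point_up), start_down, in_point_down)


# A source with in-point 5 feeding an operation with in-point 0, both starting at 10.
time_down = across_link(7, start_up=10, in_point_up=5, start_down=10, in_point_down=0)
print(time_down)  # 2
```

Python integers can go negative, whereas a GstClockTime cannot, so a real implementation would also have to guard against underflow.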
Undefined Translations
Note that the coordinate transformation from the timeline time to an upstream time may be undefined depending on the configuration of elements in the timeline. For example, consider the earlier example stack, with the operation starting later than the time effect, such that
    d = (start-2 - start-3) > 0
And we choose the time
    time = start-2
         = start-3 + d
Then, when time is transformed to the external source coordinates of the operation, we have
    operation-source-time = f (time - start-3) - start-2 + start-3
                          = f (d) - d
If the time effect slows down the consumption rate, then f (d) < d, which would make the time undefined in the external source coordinates (we cannot have a negative GstClockTime). Basically, the effect is trying to access content that is before the operation.
We can similarly have an effect that tries to access content that is later than the operation, but this wouldn't lead to an underflow of the time. It can however lead to a request for data that is outside the internal content of the operation.
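To make the underflow concrete, here is a small sketch of the calculation above. The function name and the use of None for an undefined time are assumptions made for this illustration.

```python
def operation_source_time(time, start_2, start_3, f):
    """External source time of the operation for a timeline time, or None if undefined."""
    t = f(time - start_3) - start_2 + start_3
    # A GstClockTime is unsigned, so a negative result has no meaning.
    return t if t >= 0 else None


def half_speed(t):
    return t // 2


def double_speed(t):
    return 2 * t


start_3, d = 100, 40
start_2 = start_3 + d  # the operation starts later than the time effect

slow = operation_source_time(start_2, start_2, start_3, half_speed)    # f(d) - d = 20 - 40
fast = operation_source_time(start_2, start_2, start_3, double_speed)  # f(d) - d = 80 - 40
print(slow, fast)  # None 40
```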
Mismatched Coordinates
The coordinates of an element are only defined relative to the stack that they are in. However, if we have no time effects, these coordinates will line up. Consider the following source and operation configuration.
            |  source (1)               |
            |---------------------------|
            |                           |
            '==========================='
.===========================.
|                           |
|---------------------------|
|  operation (2)            |
|---------------------------|
|                           |
'==========================='
+++++++++++++++++++++++++++++++++++++++++
                timeline
This gives us three stacks
                 |  (1)          |    |  (1)      |
                 |---------------|    |-----------|
                 |               |    |           |
                 '==============='    '==========='
                 0      (s1-s2+d1)    (s1-s2+d1) d2
0     (s2-s1)    (s2-s1)        d1
.===========.    .===============.
|           |    |               |
|-----------|    |---------------|
|  (2)      |    |  (2)          |
|-----------|    |---------------|
|           |    |               |
'==========='    '==============='
0     (s2-s1)    (s2-s1)        d1
+++++++++++++    +++++++++++++++++    +++++++++++++
s1         s2    s2        (s1+d1)    (s1+d1) (s2+d2)
where we have written in the times in the external coordinates of the elements. As the times in the diagram show, s1 and d1 are the start and duration of the operation (2), and s2 and d2 are those of the source (1). We can see that the edge times of all coordinates match up with their neighbours. Therefore, for both elements, their coordinates across each stack can be combined into a single coordinate system.
Consider that instead of the operation we have a time effect, then we would have
                 |  (1)          |    |  (1)      |
                 |---------------|    |-----------|
                 |               |    |           |
                 '==============='    '==========='
                 f(s2-s1)    f(d1)    (s1-s2+d1) d2
                 -(s2-s1) -(s2-s1)
f(0) f(s2-s1)    f(s2-s1)    f(d1)
.===========.    .===============.
|           |    |               |
|-----------|    |---------------|
|  (2)      |    |  (2)          |
|-----------|    |---------------|
|           |    |               |
'==========='    '==============='
0     (s2-s1)    (s2-s1)        d1
+++++++++++++    +++++++++++++++++    +++++++++++++
s1         s2    s2        (s1+d1)    (s1+d1) (s2+d2)
We can see that the coordinates of the source now start at f(s2-s1) - (s2-s1), rather than 0. We can also see that the external source coordinates of the source jump by (d1 - f(d1)) when the time effect ends. Therefore, most time effects will prevent the coordinates from different stacks from being combined. This can lead to counter-intuitive behaviour.
A further example would be a rate effect with rate=3 that covers two sources that are side by side. The rate effect will not treat this as playing the sources concatenated, at triple speed. Instead, it would play the first source at triple speed, and once it reaches the starting timeline time of the second source, it will start playing the second source instead, but starting from the internal source coordinates
    3 * (source-start - rate-start) - (source-start - rate-start) + source-in-point
        = 2 * (source-start - rate-start) + source-in-point
Note that if this was a slowed down rate, this would have been an undefined (negative) time, as we mentioned earlier.
Therefore, in general, time effects should only be placed at a higher priority than elements that share the same start and duration as it. Note that it is fine to place an operation with a higher priority on top of a time effect with a different start or duration because this will not lead to a change in the coordinates.
This is why only a GESSourceClip can have time effects added to it.
There is a general exception to this: if a time effect obeys f (0) = 0, then it will not introduce mismatched coordinates downstream if it has a later start than all the elements it has a higher priority than, and its end timeline time matches all of theirs. Note that this is because the effect would only exist in a single stack, and starts by applying no change to the times it receives.
GESTimelineElement times
The start and duration of an element use the timeline time coordinates. in-point and max-duration use the internal source coordinates. These last two should be 0 and GST_CLOCK_TIME_NONE respectively for time effects.
How to Translate Between Time Coordinates of a Clip
Consider a GESTrackElement, element, in a GESClip, clip, in a timeline. It has n active elements with higher priority in the same clip and track, labelled by i=1,...,n, where element 1 has a higher priority than element 2, and so on. Each element has an associated function f_i that translates from its external source coordinates to its external sink coordinates. Note that for elements that apply no time effect, this will be an identity, regardless of their in-point. We can define the function F, such that
    F(t) = f_n (f_n-1 ( ... f_1 (t)...))
Note that if each f_i has the desired properties, then so will F, with the exception that the maximum value it can translate may have become too small. This can happen, for example, if several rate effects accumulate into a very large speedup.
Given such an F, we can translate from the timeline time t to the internal source coordinate time of element using
    F (t - start) + in-point
This is what is done in ges_clip_get_internal_time_from_timeline_time.
Note that this works because all the elements in clip share the same start. Note that this would not work if there existed an overlapping higher priority time effect outside of the clip because the highest priority clip element would not be receiving a timeline time at its source pads. This is not a problem if there are non-time effects at higher priority because they will pass through a timeline time unchanged.
If F has the desired properties, it will have a well defined inverse F^-1, based on the inverses of f_i, which we can use to reverse this translation:
    F^-1 (t - in-point) + start
This is what is done in ges_clip_get_timeline_time_from_internal_time.
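The two translations can be sketched as follows. This is illustrative Python, not the GES implementation: compose and the two translation functions are hypothetical names, and the effects are passed as plain lists of functions ordered from f_1 to f_n.

```python
from functools import reduce


def compose(functions):
    """Return t -> f_n(...f_1(t)...) for functions = [f_1, ..., f_n]."""
    return lambda t: reduce(lambda acc, f: f(acc), functions, t)


def internal_from_timeline(t, start, in_point, fs):
    """F (t - start) + in-point."""
    return compose(fs)(t - start) + in_point


def timeline_from_internal(t, start, in_point, inverses):
    """F^-1 (t - in-point) + start, with inverses = [f_1^-1, ..., f_n^-1].

    F^-1 undoes f_n first and f_1 last, hence the reversed order.
    """
    return compose(list(reversed(inverses)))(t - in_point) + start


# Two rate effects above the element: f_1 doubles and f_2 triples the time.
fs = [lambda t: 2 * t, lambda t: 3 * t]
inverses = [lambda t: t // 2, lambda t: t // 3]

internal = internal_from_timeline(110, start=100, in_point=7, fs=fs)
timeline = timeline_from_internal(internal, start=100, in_point=7, inverses=inverses)
print(internal, timeline)  # 67 110
```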
duration-limit
The duration-limit is meant to be the largest value we can set the clip's duration to.
It would be given by the minimum
    ges_clip_get_timeline_time_from_internal_time (clip, child, child-max-duration) - start
we calculate amongst all its children that have a max-duration. Note that the implementation of _calculate_duration_limit does not use this method directly, but it should give the same result.
Note that this would fail if max-duration is not reachable through a seek. E.g. if the corresponding function F of the time effects acted like
    F (t) = t + max-duration + 1
then F^-1 (t) will be undefined for t=max-duration because its domain will be [max-duration + 1, inf). Note that this function F does not obey F (0)=0, so is not supported in GES.
Note that duration-limit may not be exactly the largest end time possible. Here, max-time refers to the timeline time calculated for max-duration. If the corresponding function F is monotonically increasing, then there is no source time below max-duration that could give a larger value, but there may be some times beyond max-time that would correspond to the same source time. However, these extra times will only differ from max-time by a small amount if F is 'continuously reversible', and so max-time would be close enough. Otherwise, we would not have a simple way to know which is the actual largest duration.
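Since the start cancels out, the minimum above reduces to F^-1 (max-duration - in-point) for each child. The following sketch shows that reduction. It is not how _calculate_duration_limit is written; the tuple layout, and the use of None for both an unset max-duration and for the case where no child sets a limit, are assumptions of this example.

```python
def duration_limit(children):
    """Minimum of F^-1 (max-duration - in-point) over children with a max-duration.

    children is a list of (in_point, max_duration, f_inverse) tuples, where
    max_duration is None if the child has no max-duration (hypothetical layout).
    """
    limits = [
        f_inverse(max_duration - in_point)
        for in_point, max_duration, f_inverse in children
        if max_duration is not None
    ]
    return min(limits) if limits else None


def identity(t):
    return t


def half(t):
    # inverse of a rate=2 effect sitting above the child
    return t // 2


children = [
    (10, 100, identity),   # plain source: limit 90
    (10, 100, half),       # source under a rate=2 effect: limit 45
    (0, None, identity),   # no max-duration, so it imposes no limit
]
limit = duration_limit(children)
print(limit)  # 45
```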
Trimming a clip
Normally, trimming is meant to keep the internal content in the same position relative to the timeline. If we are applying a non-constant rate effect, it may not be possible to keep all the internal content appearing in the timeline at the same time whilst changing the start and duration. However, we can keep the start or end frames/samples in the same timeline position.
Trimming the start of a clip to a later time
When trimming the start edge of a clip from timeline time old-start to new-start, where old-start < new-start <= (old-start + duration), we set the in-point of the clip's children such that the internal content that appeared at new-start before the trim, still appears at new-start afterwards.
This would require
    new-in-point = old-in-point + F (new-start - old-start)
because this is the internal source time corresponding to new-start.
Note that, after we have finished trimming, assuming the corresponding F has not changed and F (0) = 0,
    ges_clip_get_internal_time_from_timeline_time (clip, child, new-start)
        = F (new-start - new-start) + new-in-point
        = new-in-point
So after trimming, new-start will correspond to the same source position as before. Note that this would not work if the time effects changed depending on the data they receive (such as a "go faster if we have more red" time effect) because the corresponding F would have changed after setting the in-point. However, we already stated earlier that these are not supported in GES.
Trimming the start of a clip to an earlier time
When trimming the start edge of a clip from timeline time old-start to new-start, where new-start < old-start, we set the in-point of the clip's children such that the internal content that appeared at old-start before the trim, still appears at old-start afterwards.
    new-in-point = old-in-point - F (old-start - new-start)
Note that this will fail if the second term is larger than old-in-point, which indicates that new-start would be before there is any internal content.
In terms of the function F earlier, since this is calculated using the new start and old in-point, the source-time would be
Note, after we have finished trimming, assuming the corresponding F has not changed,
    ges_clip_get_internal_time_from_timeline_time (clip, child, old-start)
        = F (old-start - new-start) + new-in-point
        = F (old-start - new-start) + old-in-point - F (old-start - new-start)
        = old-in-point
So after trimming, old-start will correspond to the same source time as before.
Note that ges_clip_get_internal_time_from_timeline_time will perform this same calculation if it receives a timeline time before the start of the clip. So timeline-tree is simply able to call ges_clip_get_internal_time_from_timeline_time (clip, child, new_start, error) in both cases.
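Both start-trim cases can be sketched together. The following is illustrative Python with a hypothetical trim_start helper, assuming F (0) = 0 and that F is unchanged by the trim. It is not the timeline-tree code.

```python
def trim_start(old_start, old_in_point, new_start, F):
    """Return the new in-point for a start trim, or None if the trim must fail."""
    if new_start >= old_start:
        # Trim to a later time: skip the content between the old and new start.
        return old_in_point + F(new_start - old_start)
    shift = F(old_start - new_start)
    if shift > old_in_point:
        # We would need content from before the start of the internal content.
        return None
    return old_in_point - shift


def F(t):
    # a clip whose time effects double the rate
    return 2 * t


def internal_time(t, start, in_point):
    return F(t - start) + in_point


later = trim_start(100, 50, 110, F)     # 50 + F (10) = 70
earlier = trim_start(100, 50, 90, F)    # 50 - F (10) = 30
too_early = trim_start(100, 50, 50, F)  # F (50) = 100 > 50, so None

# The content seen at timeline time 110 (resp. 100) is unchanged by the trim.
print(internal_time(110, 100, 50), internal_time(110, 110, later))   # 70 70
print(internal_time(100, 100, 50), internal_time(100, 90, earlier))  # 50 50
```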
Trimming the end of a clip
This is as simple as changing the duration of the clip since everything will stay at the same timeline position anyway (assuming F does not change, as required by GES). It just cannot go above the clip's duration-limit.
Splitting a clip
The in-point of the new clip is chosen to match the new out-point of the split clip. This won't work well if different core children of the clip will end up with very different out-points. But if these differences are within half a source frame, GES will not complain. The same can happen when trimming a clip, since all the core children must share the same in-point.
Buffer Timestamps
NOTE: As of 21 May, the recommended changes are not implemented in GES. This delves into why translations will be needed for non-linear time effects.
Currently, the nleobject pads will leave the buffer times unchanged. This means that an nlesource will send out buffers timestamped using its internal source coordinates.
This is in contrast to a pitch within an nleoperation, which would translate the buffer times from its internal sink coordinates to its internal source coordinates.
Since the internal source coordinates of the nlesource do not match the internal sink coordinates of the nleoperation (they will differ by in-point), this will result in buffer times that are not in any coordinates.
This will make it difficult to use control bindings, which are to be given in stream time, which is linked to the buffer timestamps.
We can explore this in more detail. According to the GStreamer design docs, the stream time is used to
- report the POSITION query in the pipeline
- the position used in seek events/queries
- the position used to synchronize controller values
Therefore, in our case, we can say that the stream time at a given position in an nle stack should match the corresponding seek time.
If we have no applied rate (an applied rate should not occur in normal uses of a timeline), the stream-time is given by
    stream-time = buffer.timestamp - seg.start + seg.time
Thus, the stream time is basically the internal source or sink coordinates. In GESTrackElement, control sources are meant to be given in the internal source coordinates.
We will now look at what these time values are currently set to in a stack of an nlesource and an nleoperation that share the same start and duration.
Currently, the nleobject pads will only change the seg.time of the segments they receive, by adding or subtracting (start - in-point).
We will assume that the GstElement that the nleoperation wraps is applying its time effect to seg.time, seg.start and buffer.timestamp, which is given by the function g. Note that this is what pitch currently does. It's not clear to me what videorate does to the buffer.timestamp, but it does transform seg.start the same way as seg.time.
The following is a table of what the seg.time, seg.start and buffer.timestamp values are when leaving a pad. The "internal src pad" refers to the source pad of the internal GstElement. s is the start of the objects, and i is the in-point of the nlesource. Following these is what the corresponding stream time would be using these values. The final row is what the corresponding seek position would be coming into the pad, if we were seeking to the same media time T.
            nlesrc      nlesrc      nleop       nleop       nleop
            internal    external    external    internal    external
            src pad     src pad     sink pad    src pad     src pad
seg.time    i           s           0           g (0)       g (0) + s
seg.start   i           i           i           g (i)       g (i)
buffer.     T           T           T           g (T)       g (T)
timestamp
------------------------------------------------------------------------
stream      T           T           T           g (T)       g (T)
time                    - i         - i         - g (i)     - g (i)
                        + s                     + g (0)     + g (0)
                                                            + s
------------------------------------------------------------------------
seek        T           T           T           g (T - i)   g (T - i)
time                    - i         - i                     + s
                        + s
We can see that after the nleoperation, the seek time and stream time will generally be out of sync.
Note that if g corresponds to a constant rate effect, then
    g (t) = r * t
for some rate r. Then, at the nleoperation external source pad,
    stream-time = r * T - r * i + r * 0 + s
                = g (T - i) + s
                = seek-time
so the two will match up under the current behaviour for this special case. However, if the rate varies, this will break down.
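The last column of the table can be evaluated numerically to see both cases. This is only a sketch of the arithmetic: the helper names and the quadratic example g are made up for illustration, and no GStreamer API is involved.

```python
def stream_time(buffer_timestamp, seg_start, seg_time):
    """Stream time when there is no applied rate."""
    return buffer_timestamp - seg_start + seg_time


def at_nleop_external_src(T, i, s, g):
    """(stream time, seek time) at the nleoperation external source pad.

    Uses the current behaviour: only seg.time is shifted by the nleobject pads.
    """
    stream = stream_time(buffer_timestamp=g(T), seg_start=g(i), seg_time=g(0) + s)
    seek = g(T - i) + s
    return stream, seek


def constant_rate(t):
    return 2 * t


def varying_rate(t):
    # hypothetical non-linear effect that keeps speeding up
    return t * t // 10


constant = at_nleop_external_src(T=30, i=10, s=100, g=constant_rate)
varying = at_nleop_external_src(T=30, i=10, s=100, g=varying_rate)
print(constant)  # (140, 140): stream time and seek time agree
print(varying)   # (180, 140): they disagree once the rate varies
```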
If, instead, we translate seg.start and buffer.timestamp in the same way as seg.time on the nleobject pads, by adding or subtracting (start - in-point), then we will always have
    seg.start = seg.time
which means that we would also have
    seek-time = stream-time = buffer.timestamp
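The recommended behaviour can be traced through the same nlesource and nleoperation stack. This sketch follows the values pad by pad under that proposal. The function names and example numbers are assumptions, and the nleoperation is taken to have an in-point of 0, as required for time effects.

```python
def shift(values, delta):
    """Apply the same pad translation to seg.time, seg.start and buffer.timestamp."""
    return tuple(v + delta for v in values)


def proposed_at_nleop_external_src(T, i, s, g):
    """(seg.time, seg.start, buffer.timestamp) at the nleoperation external source pad."""
    values = (i, i, T)                        # nlesource internal source pad
    values = shift(values, s - i)             # nlesource external source pad
    values = shift(values, -(s - 0))          # nleoperation external sink pad
    values = tuple(g(v) for v in values)      # internal GstElement applies g to all three
    return shift(values, s - 0)               # nleoperation external source pad


def varying_rate(t):
    # hypothetical non-linear effect
    return t * t // 10


T, i, s = 30, 10, 100
seg_time, seg_start, timestamp = proposed_at_nleop_external_src(T, i, s, varying_rate)
stream = timestamp - seg_start + seg_time
seek = varying_rate(T - i) + s
print(seg_time, seg_start, timestamp, stream, seek)  # 100 100 140 140 140
```

With this non-linear g, the current behaviour gives a stream time that differs from the seek time, whereas here all three values agree.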
Finally, it would be good if the convention for a time effect was to use the output stream time in gst_object_sync_values, rather than the input stream time. This would make them compatible with GES's rule that control sources are given in the internal source coordinates. Luckily, it seems that pitch already uses the output stream time. videorate doesn't currently use gst_object_sync_values.