When we receive a new alternative we want to avoid iterating out of
bounds, but the comparison between the current index and the length of
the alternative should not log an error when partial_index == length, as
Vec::drain(length..) is valid, and it is completely valid for AWS to
send us a new alternative with as many items as we have already
dequeued.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1751>
Quoting [`BehaviorVersion` documentation]:
> Over time, new best-practice behaviors are introduced. However, these
> behaviors might not be backwards compatible. For example, a change which
> introduces new default timeouts or a new retry-mode for all operations might
> be the ideal behavior but could break existing applications.
This commit uses `BehaviorVersion::v2023_11_09()`, which is the latest
major version at the moment. When a new major version is released, the method
will be deprecated, which will warn us of the new version and let us decide
when to upgrade, after any changes if required. This is safer that using
`latest()` which would silently use a different major version, possibly
breaking existing code.
[`BehaviorVersion` documentation]: https://docs.rs/aws-config/1.1.8/aws_config/struct.BehaviorVersion.html
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1520>
Prior to this commit, we were sending over words concatenated together
with no separators, for instance "Idon'twanttobeanemperor".
The translation service seems clever enough to translate the contents
anyway, but there is no reason to make its task harder than necessary,
and it didn't re-add separators when the target language was the same as
the source language, which resulted in less than ideal output.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1171>
* A queue dedicated to transcript items not intended for translation.
* A queue dedicated to transcript items intended for translation. The items are
enqueued after a separator is detected or translate-lookahead was reached.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1137>
This commit adds an optional experimental translation tokenization feature.
It can be activated using the `translation_src_%u` pads property
`tokenization-method`. For the moment, the feature is deactivated by default.
The Translate ws accepts '<span></span>' tags in the input and adds matching
tags in the output. When an 'id' is also provided as an attribute of the
'span', the matching output tag also uses this 'id'.
In the context of close captions, the 'id's are of little use. However, we can
take advantage of the spans in the output to identify translation chunks, which
more or less reflect the rythm of the input transcript.
This commit adds simples spans (no 'id') to the input Transcript Items and
parses the resulting spans in the translated output, assigning the timestamps
and durations sequentially from the input Transcript Items. Edge cases such as
absence of spans, nested spans were observed and are handled here. Similarly,
mismatches between the number of input and output items are taken care of by
some sort of reconcialiation.
Note that this is still experimental and requires further testings.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1109>
This commit adds an optional transcript translation feature implemented as
request src Pads.
When requesting a src Pad, the user can specify the translation language code
using Pad properties 'language-code'.
The following properties are defined on the Element:
- 'transcribe-latency': formerly 'latency', defines the expected latency for
the Transcribe webservice.
- 'translate-latency': defines the expected latency for the Translate
webservice.
- 'transcript-lookahead': maximum transcript duration to send to translation
when a transcript is hitting its deadline and no punctuation was found.
When the input and output languages are the same, only the 'transcribe-latency'
is used for the Pad. Otherwise, the resulting latency is the addition of
'transcribe-latency' and 'translate-latency'.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1109>
This helps gather together the details related to the `TranscriberLoop`.
One difference with previous implementation is that the ws `Client` is
build each time the loop is started instead of being reused. With the new
approach, we don't keep the connection open after EOS and we should be
more resistant in case of a connection failure.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1104>
Instead of sending transcription events to the src pad loop, this commit
enqueues the transcribed buffers immediately in the ws loop, then notifies
the src pad loop. The src pad loop is only in charge of dequeuing the buffers.
This should help with upcoming evolutions.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1104>