Previously, the segment location was derived from the multivariant
playlist location and the segment template was hard-coded. Remove
this restriction. Note that users are now required to specify the
segment and CMAF init track locations for each variant or rendition.
Do the same for the media playlist location.
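As a rough illustration of what per-variant locations mean in practice,
the sketch below models them in plain Rust. The struct, its field names,
the example paths and the handling of the `%05d` placeholder are all
assumptions made for this example; they are not the sink's actual
properties or API.

```rust
/// Hypothetical per-variant (or per-rendition) output locations.
/// Field names are illustrative and do not mirror the element's properties.
struct VariantLocations {
    /// Where the media playlist for this variant is written.
    playlist: String,
    /// Where the CMAF init track for this variant is written.
    init_segment: String,
    /// Segment path template with a single `%05d` index placeholder.
    segment_template: String,
}

/// Expand the `%05d` placeholder into a zero-padded segment index.
fn segment_path(locations: &VariantLocations, index: u32) -> String {
    locations
        .segment_template
        .replacen("%05d", &format!("{:05}", index), 1)
}

/// Example layout in which every variant lives in its own directory,
/// independent of where the multivariant playlist is stored.
fn example_variant(name: &str) -> VariantLocations {
    VariantLocations {
        playlist: format!("{}/media.m3u8", name),
        init_segment: format!("{}/init.mp4", name),
        segment_template: format!("{}/segment_%05d.m4s", name),
    }
}
```

With this layout nothing is inferred from the multivariant playlist
path: each variant carries all three locations explicitly.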
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/2062>
Previously, transcriberbin only supported updating translation languages
while playing by resetting the state of the transcriber to NULL
beforehand, as for instance the speechmatics transcriber needs to
reestablish a connection to request new languages.
Now that translationbin exists, we can request new languages without
restarting the transcriber (this commit also implements support for this
in translationbin).
There is some code duplication as the old method still needs to be
supported, and not all code was trivially factorizable, but after some
refactoring most of the code for updating languages is shared
nevertheless.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/2072>
The original awstranscribe element grew too complex when translations
were integrated, for reasons that in retrospect were wrong.
As awstranscribe outputs words one by one, I decided that translations
should be performed there, on larger sentences when available. An
alternative design, where a separate translation element is composed
downstream, is also possible as long as that element accumulates words
and enough latency is set on the transcriber.
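The accumulation idea can be sketched in plain Rust as follows. This is
not the code of the new elements: the type name, the rule for detecting
the end of a sentence and the way the latency budget is applied are
assumptions chosen to keep the example small.

```rust
use std::time::Duration;

/// Hypothetical accumulator: collects single words until a sentence ends
/// or the configured latency budget is used up.
struct SentenceAccumulator {
    latency: Duration,
    words: Vec<String>,
    /// Timestamp of the first word currently buffered.
    first_pts: Option<Duration>,
}

impl SentenceAccumulator {
    fn new(latency: Duration) -> Self {
        SentenceAccumulator {
            latency,
            words: Vec::new(),
            first_pts: None,
        }
    }

    /// Push one word with its timestamp. Returns the text to translate
    /// once a sentence is complete or the latency budget is exhausted.
    fn push(&mut self, word: &str, pts: Duration) -> Option<String> {
        let start = *self.first_pts.get_or_insert(pts);
        self.words.push(word.to_string());

        // Assumption: a word ending in '.', '!' or '?' closes a sentence.
        let sentence_end = word.ends_with(&['.', '!', '?'][..]);
        let deadline_hit = pts.saturating_sub(start) >= self.latency;

        if sentence_end || deadline_hit {
            self.first_pts = None;
            Some(self.words.drain(..).collect::<Vec<_>>().join(" "))
        } else {
            None
        }
    }
}
```

A larger latency lets more words gather before a flush is forced, which
is why the transcriber needs enough latency for this design to produce
whole sentences.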
An important difference is that the new elements do not expose unsynced
pads; this use case is now served by simple messages on the bus instead.
The elements should otherwise be at feature parity with the original
element.
A higher-level bin is also provided for convenience (and usage within
transcriberbin): translationbin.
A transcriber element can be provided to this bin, which exposes an
always audio sink pad and an always text source pad (for the
transcripts).
Additional source pads can be requested for translations. For now the
bin always uses `awstranslate` as the translator, but this can be made
configurable.
This element is usable as a transcriber in `transcriberbin`.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/2055>
This adds support for direct encoding of common formats into the ISO
base media file format.
There are unit tests for formats that are not completely supported, to
check that those functions work correctly, and to ease future extension.
End-to-end testing currently requires use of gpac to validate files.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1990>
Brings support for multiple streams of each kind to fallbacksrc.
Using more than one video and one audio stream now requires the stream
selection API.
fallbacksrc will expose its own collection of streams, which will be
mapped to streams from the main and fallback source automatically.
This mapping can be changed via the map-streams signal.
The number of streams exposed by fallbacksrc is determined by the
main source.
CustomSource has been updated to also support multi-stream scenarios,
both for stream-aware elements and for simple bins without such
functionality.
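To make the idea of mapping exposed streams to main and fallback
streams more concrete, here is a minimal sketch in plain Rust. The
types, the output stream names and the pairing policy (n-th stream of a
kind on the main source paired with the n-th stream of the same kind on
the fallback source) are assumptions for illustration; the actual
automatic mapping in fallbacksrc may differ, and the map-streams signal
is not modelled here.

```rust
/// Media kind of a stream (only the two kinds needed for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    Audio,
    Video,
}

/// Simplified stand-in for a stream in a stream collection.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Stream {
    id: String,
    kind: Kind,
}

/// One exposed stream and the source streams backing it.
#[derive(Debug, PartialEq, Eq)]
struct Mapping {
    output: String,
    main: String,
    fallback: Option<String>,
}

fn stream(id: &str, kind: Kind) -> Stream {
    Stream { id: id.to_string(), kind }
}

/// Assumed default policy: one output per main stream, paired with the
/// fallback stream of the same kind and the same per-kind position.
fn default_mapping(main: &[Stream], fallback: &[Stream]) -> Vec<Mapping> {
    let (mut audio_seen, mut video_seen) = (0usize, 0usize);
    let mut mappings = Vec::with_capacity(main.len());
    for m in main {
        let (prefix, counter) = match m.kind {
            Kind::Audio => ("audio", &mut audio_seen),
            Kind::Video => ("video", &mut video_seen),
        };
        let nth = *counter;
        *counter += 1;
        let fallback_id = fallback
            .iter()
            .filter(|f| f.kind == m.kind)
            .nth(nth)
            .map(|f| f.id.clone());
        mappings.push(Mapping {
            output: format!("{}_{}", prefix, nth),
            main: m.id.clone(),
            fallback: fallback_id,
        });
    }
    mappings
}
```

Because the loop iterates over the main streams only, the number of
mappings always equals the number of main streams, and a main stream
without a matching fallback stream simply gets no fallback.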
Co-authored-by: Sebastian Dröge <sebastian@centricular.com>
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1832>
By default the transcriber will attempt to join punctuation with the
preceding word. Expose a property to control that.
As speechmatics sometimes outputs the punctuation for a sentence in the
next transcript, it will sometimes arrive too late for joining. To
work around this behavior, a lower max-delay is used by default, which
may not always be desirable, especially if low latency is a concern.
Expose a property to disable the hack.
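Joining punctuation with the preceding word can be pictured with the
following plain Rust sketch. The function names and the definition of a
punctuation token (ASCII punctuation only) are assumptions for this
example and are not taken from the transcriber; the late-arrival case
described above is not modelled.

```rust
/// Assumption: a token counts as punctuation if it is non-empty and made
/// up of ASCII punctuation characters only.
fn is_punctuation(token: &str) -> bool {
    !token.is_empty() && token.chars().all(|c| c.is_ascii_punctuation())
}

/// Append each punctuation token to the word before it. A punctuation
/// token with no preceding word is kept as its own item.
fn join_punctuation(tokens: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for &token in tokens {
        if is_punctuation(token) {
            if let Some(prev) = out.last_mut() {
                prev.push_str(token);
                continue;
            }
        }
        out.push(token.to_string());
    }
    out
}
```

In this sketch, punctuation that arrives only after the preceding word
has already been pushed downstream has nothing left to attach to.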
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1909>
As the application expects to have the bin buffer the audio stream
internally and output it again unchanged, and transcribers might
expect a set number of channels, we need to expose a property to
let the user control how to downmix the audio stream teed through
the transcriber.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1969>
With standard voices, AWS Polly supports passing a max-duration
attribute.
When the element gets raw text passed in, it can wrap it as SSML and set
the max-duration attribute, to make sure that synthesized speech
doesn't overlap.
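A minimal sketch of such wrapping, in plain Rust, could look like this.
It relies on Polly's `amazon:max-duration` attribute of the SSML
`<prosody>` tag; the function names, the choice of milliseconds as the
unit and the escaping helper are assumptions for the example and do not
reflect how the element builds its request.

```rust
/// Escape the characters that are special in XML text and attributes.
fn escape_xml(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: wrap raw text as SSML and cap its spoken duration.
fn wrap_with_max_duration(text: &str, max_duration_ms: u64) -> String {
    format!(
        "<speak><prosody amazon:max-duration=\"{}ms\">{}</prosody></speak>",
        max_duration_ms,
        escape_xml(text)
    )
}
```

Setting the cap to the time available before the next text item starts
is one way to keep consecutive utterances from overlapping.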
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1930>
This commit adds a new "synthesis-languages" property. Users can set it
to define a map of languages (typically translations) that should then
be routed through a "synthesis" bin. The value for each language in the
map is the description of that bin.
The output of this bin is then exposed as a new pad on the top-level
bin.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1930>