gst-plugins-rs/docs
François Laignel 299e25ab3c net/aws/transcriber: translate: optional experimental translation tokenization
This commit adds an optional experimental translation tokenization feature.
It can be activated using the `translation_src_%u` pads property
`tokenization-method`. For the moment, the feature is deactivated by default.

The Translate ws accepts '<span></span>' tags in the input and adds matching
tags in the output. When an 'id' is also provided as an attribute of the
'span', the matching output tag also uses this 'id'.

In the context of close captions, the 'id's are of little use. However, we can
take advantage of the spans in the output to identify translation chunks, which
more or less reflect the rythm of the input transcript.

This commit adds simples spans (no 'id') to the input Transcript Items and
parses the resulting spans in the translated output, assigning the timestamps
and durations sequentially from the input Transcript Items. Edge cases such as
absence of spans, nested spans were observed and are handled here. Similarly,
mismatches between the number of input and output items are taken care of by
some sort of reconcialiation.

Note that this is still experimental and requires further testings.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1109>
2023-03-14 13:48:32 +00:00
..
plugins net/aws/transcriber: translate: optional experimental translation tokenization 2023-03-14 13:48:32 +00:00
meson.build Add a webrtcsrc element 2023-02-28 20:50:15 -03:00