Prior to this commit, we were sending over words concatenated together
with no separators, for instance "Idon'twanttobeanemperor".
The translation service seems clever enough to translate the contents
anyway, but there is no reason to make its task harder than necessary,
and it didn't re-add separators when the target language was the same as
the source language, which resulted in less than ideal output.
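As a minimal sketch of the difference (the helper names `join_items` and `concat_items` are hypothetical and are not the element's actual code), assuming the transcript items arrive as separate words:

```rust
/// Hypothetical helper: joins transcript words with a single space so the
/// translation service receives natural text instead of one long token.
fn join_items(words: &[&str]) -> String {
    words.join(" ")
}

/// The behaviour described for the code before this commit, for comparison.
fn concat_items(words: &[&str]) -> String {
    words.concat()
}
```

With the words from the example above, `concat_items` yields "Idon'twanttobeanemperor" while `join_items` yields "I don't want to be an emperor".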
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1171>
* A queue dedicated to transcript items not intended for translation.
* A queue dedicated to transcript items intended for translation. The items are
enqueued after a separator is detected or translate-lookahead is reached.
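The enqueue condition for the translation queue could be sketched roughly as follows. `Item`, `PendingTranslation` and the set of separators are assumptions made for illustration only; the element's real types and separator detection may differ.

```rust
use std::time::Duration;

/// Hypothetical transcript item: a word or punctuation mark with its duration.
struct Item {
    text: String,
    duration: Duration,
}

/// Hypothetical buffer sitting in front of the translation queue.
struct PendingTranslation {
    items: Vec<Item>,
    accumulated: Duration,
    lookahead: Duration,
}

impl PendingTranslation {
    fn new(lookahead: Duration) -> Self {
        Self {
            items: Vec::new(),
            accumulated: Duration::ZERO,
            lookahead,
        }
    }

    /// Buffers `item` and returns the batch to enqueue for translation once a
    /// separator is seen or the buffered duration reaches the lookahead.
    fn push(&mut self, item: Item) -> Option<Vec<Item>> {
        // Assumption: separators arrive as standalone punctuation items.
        let is_separator = matches!(item.text.as_str(), "." | "," | "?" | "!");
        self.accumulated += item.duration;
        self.items.push(item);

        if is_separator || self.accumulated >= self.lookahead {
            self.accumulated = Duration::ZERO;
            Some(std::mem::take(&mut self.items))
        } else {
            None
        }
    }
}
```

Each call to `push` either keeps buffering (`None`) or hands back the items that should go to the translation queue as one batch.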
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1137>
This commit adds an optional experimental translation tokenization feature.
It can be activated using the `translation_src_%u` pads property
`tokenization-method`. For the moment, the feature is deactivated by default.
The Translate webservice accepts '<span></span>' tags in the input and adds matching
tags in the output. When an 'id' is also provided as an attribute of the
'span', the matching output tag also uses this 'id'.
In the context of closed captions, the 'id's are of little use. However, we can
take advantage of the spans in the output to identify translation chunks, which
more or less reflect the rhythm of the input transcript.
This commit adds simple spans (no 'id') to the input Transcript Items and
parses the resulting spans in the translated output, assigning the timestamps
and durations sequentially from the input Transcript Items. Edge cases such as
absent or nested spans were observed and are handled here. Similarly,
mismatches between the number of input and output items are taken care of by
a reconciliation step.
Note that this is still experimental and requires further testing.
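The span round trip can be illustrated with a simplified sketch. `wrap_in_spans` and `parse_spans` are hypothetical names, the space between spans is an assumption, and this version deliberately ignores nested spans, 'id' attributes and the reconciliation of item counts:

```rust
/// Hypothetical helper: wraps each input item in a plain `<span>` (no 'id').
/// Assumption: spans are separated by a single space.
fn wrap_in_spans(items: &[&str]) -> String {
    items
        .iter()
        .map(|item| format!("<span>{}</span>", item))
        .collect::<Vec<_>>()
        .join(" ")
}

/// Hypothetical helper: extracts the text of each top-level `<span>` from the
/// translated output. Text outside of spans is dropped in this sketch.
fn parse_spans(output: &str) -> Vec<String> {
    const OPEN: &str = "<span>";
    const CLOSE: &str = "</span>";

    let mut chunks = Vec::new();
    let mut rest = output;
    while let Some(start) = rest.find(OPEN) {
        let after_open = &rest[start + OPEN.len()..];
        let end = match after_open.find(CLOSE) {
            Some(end) => end,
            None => break, // unterminated span: stop scanning
        };
        let chunk = after_open[..end].trim();
        if !chunk.is_empty() {
            chunks.push(chunk.to_string());
        }
        rest = &after_open[end + CLOSE.len()..];
    }

    // If no complete span was found, fall back to the whole trimmed output.
    if chunks.is_empty() && !output.trim().is_empty() {
        chunks.push(output.trim().to_string());
    }
    chunks
}
```

Each returned chunk would then be paired, in order, with the timestamps and durations of the input items.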
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1109>
This commit adds an optional transcript translation feature implemented as
request src Pads.
When requesting a src Pad, the user can specify the translation language code
using the Pad property 'language-code'.
The following properties are defined on the Element:
- 'transcribe-latency': formerly 'latency', defines the expected latency for
the Transcribe webservice.
- 'translate-latency': defines the expected latency for the Translate
webservice.
- 'transcript-lookahead': maximum transcript duration to send to translation
when a transcript is hitting its deadline and no punctuation was found.
When the input and output languages are the same, only the 'transcribe-latency'
is used for the Pad. Otherwise, the resulting latency is the addition of
'transcribe-latency' and 'translate-latency'.
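The latency rule above amounts to the following sketch. `pad_latency` and its parameters are hypothetical names used for illustration, not the element's API:

```rust
use std::time::Duration;

/// Hypothetical helper: latency reported for a src Pad, given the input
/// language of the element and the 'language-code' of the Pad.
fn pad_latency(
    input_lang: &str,
    output_lang: &str,
    transcribe_latency: Duration,
    translate_latency: Duration,
) -> Duration {
    if input_lang == output_lang {
        // No translation involved: only the Transcribe webservice matters.
        transcribe_latency
    } else {
        // Transcription followed by translation: the latencies add up.
        transcribe_latency + translate_latency
    }
}
```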
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1109>
This helps gather together the details related to the `TranscriberLoop`.
One difference with the previous implementation is that the ws `Client` is
built each time the loop is started instead of being reused. With the new
approach, we don't keep the connection open after EOS and we should be
more resilient in case of a connection failure.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1104>
Instead of sending transcription events to the src pad loop, this commit
enqueues the transcribed buffers immediately in the ws loop, then notifies
the src pad loop. The src pad loop is only in charge of dequeuing the buffers.
This should help with upcoming evolutions.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1104>
For uploaded objects, the default content-type is set to binary/octet-stream,
which is correct.
Metadata cannot be used to set content-type and content-disposition, because
setting metadata adds the prefix x-amz-meta- to the key.
E.g. setting the metadata "content-type=video/mp4" actually sets the value as
x-amz-meta-content-type. So these have to be separate properties.
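The prefixing can be pictured with this small sketch. `metadata_header_name` is a hypothetical helper that only mimics the effect described above; it is not a function of the AWS SDK:

```rust
/// Prefix applied to user-defined metadata keys, as described above.
const USER_METADATA_PREFIX: &str = "x-amz-meta-";

/// Hypothetical helper: the name under which a user metadata key ends up
/// being stored.
fn metadata_header_name(key: &str) -> String {
    format!("{}{}", USER_METADATA_PREFIX, key)
}
```

Since `metadata_header_name("content-type")` gives "x-amz-meta-content-type" rather than "content-type", the object's actual content type is left untouched.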
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1085>
Commit ad3f1cf fixed the name of hlssink child element to be the same
for hlssink2 and hlssink3. However, we rely on element name to return
boolean in case of hlssink3 or None in case of hlssink2 as the return
value of the delete-fragment closure.
Fix this by using the factory name instead of the element name.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1076>
Commit 24b7cfc8 applied changes related to nullability as declared
by gir. One consequence was that some function signatures ended up
requiring users to pass `Some(val)` when they could use `val`
before.
This commit applies changes on `gstreamer-rs` which, while honoring
the nullability, still allow users to pass `val` for the few affected
functions.
This commit also fixes the signature for `Element::request_new_pad`
which was updated upstream.
When call_timeout is triggered, the request fails
irrespective of the retry setting. call_timeout defines
the maximum time a request can take, retries included.
This can be solved either by setting call_timeout to
retry * call_attempt_timeout or by not setting call_timeout.
As per the thread below, the call_attempt_timeout and retry settings are enough.
https://github.com/awslabs/aws-sdk-rust/issues/558
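The arithmetic behind the first option can be sketched as follows. The function names are hypothetical, and whether `retry` counts the initial attempt is an assumption that is left open here:

```rust
use std::time::Duration;

/// Hypothetical helper: smallest call_timeout that lets every attempt run to
/// its own timeout, i.e. retry * call_attempt_timeout.
fn total_call_budget(call_attempt_timeout: Duration, retry: u32) -> Duration {
    call_attempt_timeout * retry
}

/// Hypothetical helper: whether all attempts can run before the overall
/// call_timeout fires. `None` means call_timeout is not set.
fn retries_can_complete(
    call_timeout: Option<Duration>,
    call_attempt_timeout: Duration,
    retry: u32,
) -> bool {
    match call_timeout {
        None => true,
        Some(timeout) => timeout >= total_call_budget(call_attempt_timeout, retry),
    }
}
```

For instance, with a 10 second call_attempt_timeout and 3 attempts, a 15 second call_timeout would cut the request short, while 30 seconds or no call_timeout would not.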
A regression was introduced during the migration to AWS SDK. One used
to be able to provide credentials in multiple ways with the earlier
Rusoto ChainProvider (config file / environment variables). Now one
has to explicitly set the properties.
Use the DefaultCredentialsChain from AWS SDK to restore the previous
functionality.
See
https://docs.rs/aws-config/0.46.0/aws_config/default_provider/credentials/struct.DefaultCredentialsChain.html.
Allow specifying an endpoint to be used for S3 requests. This makes
it possible to use integrations providing object storage based on S3
API like MinIO.
When the endpoint-uri property is specified, the endpoint resolver to
use will be overridden when making S3 requests.