Commit graph

8 commits

Author SHA1 Message Date
François Laignel a870d60621 aws: improve error message logs
The `Display` and `Debug` trait for the AWS error messages are not very useful.

- `Display` only prints the high level error, e.g.: "service error".
- `Debug` prints all the fields in the error stack, resulting in hard to read
  messages with redudant or unnecessary information. E.g.:

> ServiceError(ServiceError { source: BadRequestException(BadRequestException {
> message: Some("1 validation error detected: Value 'test' at 'languageCode'
> failed to satisfy constraint: Member must satisfy enum value set: [ar-AE,
> zh-HK, en-US, ar-SA, zh-CN, fi-FI, pl-PL, no-NO, nl-NL, pt-PT, es-ES, th-TH,
> de-DE, it-IT, fr-FR, ko-KR, hi-IN, en-AU, pt-BR, sv-SE, ja-JP, ca-ES, es-US,
> fr-CA, en-GB]"), meta: ErrorMetadata { code: Some("BadRequestException"),
> message: Some("1 validation error detected: Value 'test' at 'languageCode'
> failed to satisfy constraint: Member must satisfy enum value set: [ar-AE,
> zh-HK, en-US, ar-SA, zh-CN, fi-FI, pl-PL, no-NO, nl-NL, pt-PT, es-ES, th-TH,
> de-DE, it-IT, fr-FR, ko-KR, hi-IN, en-AU, pt-BR, sv-SE, ja-JP, ca-ES, es-US,
> fr-CA, en-GB]"), extras: Some({"aws_request_id": "1b8bbafd-5b71-4ba5-8676-28432381e6a9"}) } }),
> raw: Response { status: StatusCode(400), headers: Headers { headers:
> {"x-amzn-requestid": HeaderValue { _private: H0("1b8bbafd-5b71-4ba5-8676-28432381e6a9") },
> "x-amzn-errortype": HeaderValue { _private:
> H0("BadRequestException:http://internal.amazon.com/coral/com.amazonaws.transcribe.streaming/") },
> "date": HeaderValue { _private: H0("Tue, 26 Mar 2024 17:41:31 GMT") },
> "content-type": HeaderValue { _private: H0("application/x-amz-json-1.1") },
> "content-length": HeaderValue { _private: H0("315") }} }, body: SdkBody {
> inner: Once(Some(b"{\"Message\":\"1 validation error detected: Value 'test'
> at 'languageCode' failed to satisfy constraint: Member must satisfy enum value
> set: [ar-AE, zh-HK, en-US, ar-SA, zh-CN, fi-FI, pl-PL, no-NO, nl-NL, pt-PT,
> es-ES, th-TH, de-DE, it-IT, fr-FR, ko-KR, hi-IN, en-AU, pt-BR, sv-SE, ja-JP,
> ca-ES, es-US, fr-CA, en-GB]\"}")), retryable: true }, extensions: Extensions {
> extensions_02x: Extensions, extensions_1x: Extensions } } })

This commit adopts the most informative and concise solution I could come up
with to log AWS errors. With the above error case, this results in:

> service error: Error { code: "BadRequestException", message: "1 validation
> error detected: Value 'test' at 'languageCode' failed to satisfy constraint:
> Member must satisfy enum value set: [ar-AE, zh-HK, en-US, ar-SA, zh-CN, fi-FI,
> pl-PL, no-NO, nl-NL, pt-PT, es-ES, th-TH, de-DE, it-IT, fr-FR, ko-KR, hi-IN,
> en-AU, pt-BR, sv-SE, ja-JP, ca-ES, es-US, fr-CA, en-GB]",
> aws_request_id: "a40a32a8-7b0b-4228-a348-f8502087a9f0" }

Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1521>
2024-03-26 20:05:32 +01:00
Mathieu Duponchelle 5371eb52ad Port to AWS SDK 0.57/0.35
Co-authored-by: Sebastian Dröge <sebastian@centricular.com>
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1379>
2023-11-03 15:13:45 +00:00
Mathieu Duponchelle f366c20869 awstranscriber: fix what we send over for translations
Prior to this commit, we were sending over words concatenated together
with no separators, for instance "Idon'twanttobeanemperor".

The translation service seems clever enough to translate the contents
anyway, but there is no reason to make its task harder than necessary,
and it didn't re-add separators when the target language was the same as
the source language, which resulted in less than ideal output.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1171>
2023-04-10 20:47:12 +00:00
François Laignel 2b32d00589 net/aws/transcriber: use two queues for sending transcript items
* A queue dedicated to transcript items not intended for translation.
* A queue dedicated to transcript items intended for translation. The items are
  enqueued after a separator is detected or translate-lookahead was reached.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1137>
2023-03-16 20:29:31 +01:00
François Laignel 162db2f3b9 net/aws/transcriber: fix translate lookahead
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1137>
2023-03-16 12:39:15 +01:00
François Laignel d5d6a4daf9 net/aws/transcriber: rename prop transcript-lookahead & TranslationSrcPad
... as translate-lookahead and TranslateSrcPad.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1137>
2023-03-16 12:37:31 +01:00
François Laignel 299e25ab3c net/aws/transcriber: translate: optional experimental translation tokenization
This commit adds an optional experimental translation tokenization feature.
It can be activated using the `translation_src_%u` pads property
`tokenization-method`. For the moment, the feature is deactivated by default.

The Translate ws accepts '<span></span>' tags in the input and adds matching
tags in the output. When an 'id' is also provided as an attribute of the
'span', the matching output tag also uses this 'id'.

In the context of close captions, the 'id's are of little use. However, we can
take advantage of the spans in the output to identify translation chunks, which
more or less reflect the rythm of the input transcript.

This commit adds simples spans (no 'id') to the input Transcript Items and
parses the resulting spans in the translated output, assigning the timestamps
and durations sequentially from the input Transcript Items. Edge cases such as
absence of spans, nested spans were observed and are handled here. Similarly,
mismatches between the number of input and output items are taken care of by
some sort of reconcialiation.

Note that this is still experimental and requires further testings.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1109>
2023-03-14 13:48:32 +00:00
François Laignel 743e97738f net/aws/transcriber: add translation request src pads
This commit adds an optional transcript translation feature implemented as
request src Pads.

When requesting a src Pad, the user can specify the translation language code
using Pad properties 'language-code'.

The following properties are defined on the Element:

- 'transcribe-latency': formerly 'latency', defines the expected latency for
  the Transcribe webservice.
- 'translate-latency': defines the expected latency for the Translate
  webservice.
- 'transcript-lookahead': maximum transcript duration to send to translation
  when a transcript is hitting its deadline and no punctuation was found.

When the input and output languages are the same, only the 'transcribe-latency'
is used for the Pad. Otherwise, the resulting latency is the addition of
'transcribe-latency' and 'translate-latency'.

Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1109>
2023-03-14 13:48:32 +00:00