AWS offers the option of creating "vocabularies", lists of words
that are likely to be encountered. Those can be created through
the AWS console, and are given a name. That name can then be
specified when starting a transcription job.
The element expects an array of "commands", as GstStructures,
in the form:
operation, pattern=<pattern>, ...
The only operation implemented for now is replace-all, e.g.:
replace-all, pattern=foo, replacement=bar
Other operations can be implemented if useful in the future,
e.g. "match" could post a message to the bus when the pattern
is encountered.
The main use case for this is automatic speech recognition,
as implemented by e.g. awstranscribe, where users may want to
replace swear words with tamer language.
Commands are applied in order.
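As a rough illustration of why ordering matters, here is a minimal
sketch in plain Rust. The ReplaceAll struct and apply_commands()
function are hypothetical names, not the element's code, and the
pattern is treated as a literal substring rather than a regular
expression to keep the sketch dependency-free:

```rust
/// One command, mirroring `replace-all, pattern=..., replacement=...`.
/// Assumption: `pattern` is a literal substring here, not a regex.
struct ReplaceAll {
    pattern: String,
    replacement: String,
}

/// Apply every command to `text`, in the order given. The output of
/// one command is the input of the next.
fn apply_commands(text: &str, commands: &[ReplaceAll]) -> String {
    commands.iter().fold(text.to_string(), |acc, cmd| {
        acc.replace(cmd.pattern.as_str(), cmd.replacement.as_str())
    })
}
```

With the commands (foo -> bar) followed by (bar -> baz), the input
"foo bar" ends up as "baz baz", since the second command also sees
the output of the first.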
The interface is usable through the CLI with the usual escaping
strategies, though trying to pass in actual regular expressions
through it is a bit tricky, as this introduces yet another
level of escaping.
This slightly amends the semantics of the property: prior to this
commit it represented the interval since the last accumulated buffer
after which the current line(s) had to be output even if incomplete.
After this commit, it represents the interval between "now" and the
first accumulated buffer, making it possible to report a useful
latency.
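
The difference between the two semantics can be sketched as follows.
The function names are hypothetical, timestamps are modelled as
std::time::Duration values, and the strict "greater than" comparison
is an assumption rather than something taken from the element:

```rust
use std::time::Duration;

/// Old semantics: output once more than `accumulate_time` has elapsed
/// since the *last* accumulated buffer.
fn must_output_old(now: Duration, last_buffer: Duration, accumulate_time: Duration) -> bool {
    now.saturating_sub(last_buffer) > accumulate_time
}

/// New semantics: output once more than `accumulate_time` has elapsed
/// since the *first* accumulated buffer, which bounds the latency.
fn must_output_new(now: Duration, first_buffer: Duration, accumulate_time: Duration) -> bool {
    now.saturating_sub(first_buffer) > accumulate_time
}
```

With a steady stream of input, the old check can keep postponing the
output because each new buffer resets the reference point, whereas
the new check is anchored to the first buffer only.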
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/474>
Up to now, tttocea608 supported text/utf8, and no interface to
control the positioning of closed captions apart from new lines
in the input text.
CEA 608 supports a larger set of features than that, such as
positioning CC precisely in its 32 x 15 grid, styling text,
switching from one mode to another, resetting the base row
in roll-up mode, etc.
A custom, JSON-based format is now supported by the element
(caps application/x-json, format=cea608), allowing users to
control those features in a pretty advanced manner.
A side effect of this is that the approach previously used
by the element to ensure frame-accurate CC display is now
untenable: where we previously knew that an input buffer would
span at most 74 buffers and could calculate a somewhat reasonable
latency based on that, this is no longer possible. Instead
we pick the approach most CC encoders seem to pick, and
accept a certain latency at display time: for example the
flipping of the back buffer to the display buffer for a
10-character text buffer will occur 7 frames after its
PTS. This has obvious benefits in terms of code complexity
and should generally be acceptable.
+ Removes a now irrelevant test, updates other tests
+ Extracts the Mode enum to the root of the crate; it will
be used by another element in a follow-up commit
In its standard mode, textwrap simply splits up text in chained
buffers into multiple lines / buffers, not keeping any state.
When accumulate-time is specified, multiple input buffers will be
wrapped together, outputting one-line buffers of text once a
sufficient width (specified by the columns property) is reached,
or the interval between two input buffers is greater than
accumulate-time.
This is useful to format the output of an element such as
awstranscribe, which outputs its transcription with one buffer
per word.
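
The width-based part of this behaviour can be approximated with a
greedy wrap over one-word inputs. This is only a sketch: wrap_words()
is a hypothetical helper, it ignores the accumulate-time timeout, and
it is not how the element itself performs wrapping:

```rust
/// Greedily pack `words` into lines of at most `columns` characters.
/// A word longer than `columns` is placed on a line of its own.
fn wrap_words(words: &[&str], columns: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();

    for word in words {
        // Width of the line if `word` were appended after a space.
        let needed = current.chars().count() + 1 + word.chars().count();
        if !current.is_empty() && needed > columns {
            // The line is full: emit it and start a new one.
            lines.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }

    // Emit whatever is left over as the final line.
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}
```

For the words "the", "quick", "brown", "fox" and columns=10, this
yields the two lines "the quick" and "brown fox".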
This new crate consists of two elements, jsongstenc and jsongstparse.
Both these elements can deal with an ndjson-based format, consisting
for now of two item types: "Buffer" and "Header",
e.g.:
{"Header":{"format":"foobar"}}
{"Buffer":{"pts":0,"duration":43,"data":{"foo":"bar"}}}
jsongstparse will interpret this by first sending caps
application/x-json, format=foobar, then a buffer containing
{"foo":"bar"}, timestamped as required.
Elements further downstream can then interpret the data further.
jsongstenc will simply perform the reverse operation.
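
To make the line format concrete, here is a dependency-free sketch
that produces the two example lines above. The Line enum and encode()
function are hypothetical and are not the crate's types. The sketch
also assumes that "format" needs no JSON escaping and that "data" is
already a serialized JSON value:

```rust
/// The two item types of the ndjson-based format.
enum Line {
    Header { format: String },
    /// Assumption: `data` already holds serialized JSON.
    Buffer { pts: u64, duration: u64, data: String },
}

/// Serialize one item as a single line (without the trailing newline).
fn encode(line: &Line) -> String {
    match line {
        Line::Header { format } => {
            format!("{{\"Header\":{{\"format\":\"{}\"}}}}", format)
        }
        Line::Buffer { pts, duration, data } => format!(
            "{{\"Buffer\":{{\"pts\":{},\"duration\":{},\"data\":{}}}}}",
            pts, duration, data
        ),
    }
}
```

A real implementation would more likely rely on a JSON library than
on hand-written formatting; the point here is only the shape of each
line.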
Style preambles look like:
|P|0|0|1|C|0|ROW| |P|1|N|0|STYLE|U|
and column preambles look like:
|P|0|0|1|C|0|ROW| |P|1|N|1|CURSR|U|
Both preambles go through eia608_row_pramble(), the value they
pass as the x parameter is supposed to hold 4 bits, either
0|STYLE
or 1|CURSR
This value then gets bit-shifted by 1 and or'd into the second byte.
The value is also and'ed with 0x1E to ensure it can't leak into
the upper bits.
The previous code resulted in x being a 5-bit value, 0x10 (0b10000).
This resulted in outputting a style preamble, as (0x10 << 1) & 0x1E
is 0b00000. When the indent was 0 (the usual case), this went
undetected, but with any other value it resulted in no indent being
applied, but the text getting colored or italicized.
This patch fixes x to have the correct value of 0x8 | indent.
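
The arithmetic can be checked with a small standalone sketch. The
preamble_bits() helper is a hypothetical name that only reproduces
the shift-and-mask step described above, not the rest of the
preamble encoding:

```rust
/// Contribution of `x` to the second preamble byte:
/// shift left by one, then mask with 0x1E.
fn preamble_bits(x: u8) -> u8 {
    (x << 1) & 0x1E
}

/// Flag value used by the previous code: a fifth bit, masked away.
const OLD_FLAG: u8 = 0x10;
/// Corrected flag value: the top bit of the 4-bit field.
const FIXED_FLAG: u8 = 0x8;
```

With an indent of 2, preamble_bits(OLD_FLAG | 2) gives 0x04, so the
flag bit is lost and the value reads as a style. With the fix,
preamble_bits(FIXED_FLAG | 2) gives 0x14 and the flag bit survives.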
Setting DTS on raw video buffers doesn't make sense, and it is even
wrong for compressed video streams: PTS may go backwards when B-frames
are present, but DTS is expected to increase monotonically.
We now have to run 'cbuild' and 'ctest' on each plugin individually.
Replace the plugins_rep key with the source path so we can easily
discard the excluded plugins.