GStreamer plugin for speech to text using the Vosk Toolkit.

Rafael Caricio 4e30659807 Add min-confidence property		2022-04-01 13:24:19 +02:00

Vosk Speech Recognition GStreamer Plugin

This plugin transcribes speech using the Vosk Toolkit. It can be used to generate subtitles for movies, live streams, lectures, and interviews.

Vosk is an offline, open-source speech recognition toolkit. It enables speech recognition for 20+ languages and dialects: English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino, Ukrainian, Kazakh, Swedish, Japanese, Esperanto, Hindi. More to come.

https://github.com/alphacep/vosk-api

This GStreamer plugin was inspired by the work of @MathieuDuponchelle in the AwsTranscriber element.

Build

Compiling this project produces a shared library that your local GStreamer installation can load.

cargo build --release

The compiled shared library, ./target/release/libgstvosk.dylib on macOS (the file extension differs on other platforms, for example .so on Linux), must be discoverable by GStreamer. One option is to pass the argument --gst-plugin-path= with the directory that contains the library file every time you run the gst-launch-1.0 command line tool.
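If you drive GStreamer from a script, the plugin path argument can be added in one place. The following is a minimal sketch, not part of this project: the helper name `gst_command` is hypothetical, and the default directory assumes you run it from the project root after `cargo build --release`.

```python
import shlex


def gst_command(tool, args, plugin_dir="./target/release"):
    """Build an argv list that runs a GStreamer tool with an extra plugin directory.

    `gst_command` is a hypothetical helper for illustration only.
    """
    # --gst-plugin-path is the argument described above; it points at the
    # directory that contains the compiled library, not at the file itself.
    return [tool, f"--gst-plugin-path={plugin_dir}", *args]


# Ask GStreamer to describe the element, which confirms the plugin can be loaded.
inspect_cmd = gst_command("gst-inspect-1.0", ["vosk_transcriber"])
print(shlex.join(inspect_cmd))
```

The resulting list can be passed to `subprocess.run`, and the same helper can prefix a `gst-launch-1.0` pipeline description.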

Example Usage

This plugin connects to a Vosk server over the WebSocket protocol. The easiest way to run the Vosk server is with Docker. You can start the server locally using this command:

docker run --rm --name vosk-server -d -p 2700:2700 alphacep/kaldi-en:latest

Running the recognition server as a separate process has the additional benefit that you don't need to install any special software. It also keeps the voice recognition workload out of your GStreamer pipeline process.
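To give a rough idea of what travels over the WebSocket connection, here is a sketch of the text messages involved. It is based on the message shapes used by the sample clients that ship with the Vosk server, which is an assumption about the server and not taken from this plugin's source. The function names, the 16000 Hz sample rate, and the example replies are hypothetical. Between the two text messages shown, a client would send the audio itself as binary frames. With the Docker command above, the server would be reachable on port 2700.

```python
import json


def config_message(sample_rate):
    """First text frame: announce the sample rate of the audio that follows."""
    return json.dumps({"config": {"sample_rate": sample_rate}})


def eof_message():
    """Last text frame: ask the server to flush and send its final result."""
    return json.dumps({"eof": 1})


def transcript_from_reply(reply):
    """Return the final text in a server reply, or None for a partial or empty one."""
    data = json.loads(reply)
    text = data.get("text")
    return text if text else None


# Hypothetical replies, shaped like the ones the sample clients print.
partial = '{"partial": "hello wor"}'
final = '{"result": [{"conf": 1.0, "start": 0.0, "end": 0.4, "word": "hello"}], "text": "hello"}'

print(config_message(16000))
print(transcript_from_reply(partial))
print(transcript_from_reply(final))
```

No network connection is opened here; the sketch only builds and parses the JSON payloads, so it can be run without a server.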

The following pipeline prints the raw text buffers emitted by the Vosk transcriber:

gst-launch-1.0 \
  vosk_transcriber name=tc ! fakesink sync=true dump=true \
  uridecodebin uri=https://studio.blender.org/download-source/d1/d1f3b354a8f741c6afabf305489fa510/d1f3b354a8f741c6afabf305489fa510-1080p.mp4 ! audioconvert ! tc.