Yuta Hayashibe d6d94329e6 v0.5.1
2022-10-12 11:22:53 +09:00

Whispering

MIT License Python Versions


Streaming transcriber with whisper. Transcribing in real time requires sufficient machine power.

Setup

pip install -U git+https://github.com/shirayu/whispering.git@v0.5.1

# If you use a GPU, install the appropriate builds of torch and torchaudio
# See https://pytorch.org/get-started/locally/
# Example: torch for CUDA 11.6
pip install -U torch torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
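
After installing, it can help to confirm that the installed torch build actually sees the GPU. The following is a minimal sketch, not part of whispering: the helper name describe_torch_device is hypothetical, and it only reports what torch itself detects.

```python
def describe_torch_device() -> str:
    """Report whether the installed torch build can use CUDA."""
    try:
        import torch  # installed by the pip commands above
    except ImportError:
        return "torch is not installed"
    if torch.cuda.is_available():
        # Name of the first visible CUDA device
        return f"cuda: {torch.cuda.get_device_name(0)}"
    return "cpu only"


print(describe_torch_device())
```

If this prints "cpu only" on a machine with a GPU, the installed wheel is probably a CPU build or does not match the local CUDA setup.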

If you get "OSError: PortAudio library not found" on Linux, install PortAudio.

sudo apt -y install portaudio19-dev
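
To check in advance whether a PortAudio shared library is visible on the system, a generic lookup like the one below may be enough. This is a sketch under the assumption that the audio backend finds PortAudio through the normal system library search; the helper name portaudio_library is hypothetical, and a result of None suggests, but does not prove, that the error above would occur.

```python
import ctypes.util


def portaudio_library():
    """Return the name or path of the PortAudio shared library, or None if not found."""
    return ctypes.util.find_library("portaudio")


found = portaudio_library()
if found is None:
    print("PortAudio not found; on Debian/Ubuntu: sudo apt -y install portaudio19-dev")
else:
    print(f"PortAudio found: {found}")
```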

Microphone example

# Run in English
whispering --language en --model tiny
  • --help shows all options
  • --model sets the name of the model to use. Larger models are more accurate, but may not be able to transcribe in real time.
  • --language sets the language to transcribe. The list of languages is shown with whispering -h
  • --no-progress disables the progress message
  • -t sets the decoding temperatures. You can set several, e.g. -t 0.0 -t 0.1 -t 0.5, but too many temperatures increase decoding time
  • --debug outputs debug logs
  • --no-vad disables VAD (Voice Activity Detection). This forces whisper to also analyze periods without voice activity
  • --output sets the output file (default: standard output)
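
When whispering is launched from a script, the options above can be assembled programmatically. The sketch below only uses flags from the list; the helper build_whispering_command and the file name transcript.txt are hypothetical, and it assumes --output takes a file path as its argument.

```python
import shlex
from typing import List, Optional, Sequence


def build_whispering_command(
    language: str,
    model: str,
    temperatures: Sequence[float] = (),
    output: Optional[str] = None,
    no_progress: bool = False,
) -> List[str]:
    """Assemble a whispering command line from the options listed above."""
    cmd = ["whispering", "--language", language, "--model", model]
    for t in temperatures:
        # -t is repeated once per temperature
        cmd += ["-t", str(t)]
    if no_progress:
        cmd.append("--no-progress")
    if output is not None:
        cmd += ["--output", output]
    return cmd


cmd = build_whispering_command(
    "en", "tiny", temperatures=[0.0, 0.1, 0.5],
    output="transcript.txt", no_progress=True,
)
print(shlex.join(cmd))
# To actually start transcribing: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when it is later passed to subprocess.run.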

Parse interval

Without --allow-padding, whispering only performs VAD on each period, and a period predicted as "silence" is not passed to whisper. To change the VAD interval, change -n.

For a quicker response, set a small -n and add --allow-padding. However, this may reduce accuracy.

whispering --language en --model tiny -n 20 --allow-padding
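
The idea of skipping silent periods can be illustrated with a small sketch. It is not whispering's implementation: the VAD model and thresholds that whispering actually uses are not described here, so a simple energy threshold stands in for the VAD, and the names is_silence and gate_blocks are hypothetical.

```python
from typing import Callable, List, Sequence

Block = Sequence[float]


def is_silence(block: Block, threshold: float = 0.01) -> bool:
    """Stand-in VAD: a block is silence when its mean absolute amplitude is low."""
    if not block:
        return True
    return sum(abs(x) for x in block) / len(block) < threshold


def gate_blocks(blocks: Sequence[Block], transcribe: Callable[[Block], str]) -> List[str]:
    """Run `transcribe` only on blocks the stand-in VAD does not mark as silence."""
    results = []
    for block in blocks:
        if is_silence(block):
            continue  # predicted as silence: not passed to the transcriber
        results.append(transcribe(block))
    return results


blocks = [[0.0, 0.001, -0.002], [0.3, -0.4, 0.2], [0.0, 0.0, 0.0]]
texts = gate_blocks(blocks, transcribe=lambda b: f"speech({len(b)} samples)")
print(texts)  # ['speech(3 samples)']
```

Only the second block reaches the transcriber, which is why gating on VAD saves decoding time on quiet input.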

WebSocket example

There is no security mechanism. Securing the connection is your own responsibility.

Run with --host and --port.

Host

whispering --language en --model tiny --host 0.0.0.0 --port 8000

Client

whispering --host ADDRESS_OF_HOST --port 8000 --mode client

You can set -n, --allow-padding and other options.

License