whispering/README.md
2022-10-02 22:55:13 +09:00

2.8 KiB

Whispering

MIT License Python Versions

CI CodeQL Typos

Streaming transcriber with whisper. Enough machine power is needed to transcribe in real time.

Setup

pip install -U git+https://github.com/shirayu/whispering.git

# If you use GPU, install proper torch and torchaudio
# Example : torch for CUDA 11.6
pip install -U torch torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

Example of microphone

# Run in English
whispering --language en --model tiny
  • --help shows full options
  • --model set the model name to use. Larger models will be more accurate, but may not be able to transcribe in real time.
  • --language sets the language to transcribe. The list of languages are shown with whispering -h
  • --no-progress disables the progress message
  • -t sets temperatures to decode. You can set several like -t 0.0 -t 0.1 -t 0.5, but too many temperatures exhaust decoding time
  • --debug outputs logs for debug
  • --no-vad disables VAD (Voice Activity Detection). This forces whisper to analyze non-voice activity sound period

Parse interval

If you want quick response, set small -n and add --allow-padding. However, this may sacrifice the accuracy.

whispering --language en --model tiny -n 20 --allow-padding

Without --allow-padding, whispering just performs VAD for the period, and when it is predicted as "silence", it will not be passed to whisper.

Example of web socket

No security mechanism. Please make secure with your responsibility.

Run with --host and --port.

Host

whispering --language en --model tiny --host 0.0.0.0 --port 8000

Client

whispering --host ADDRESS_OF_HOST --port 8000 --mode client

You can set -n, --allow-padding and other options.

Tips

PortAudio Error

If you get OSError: PortAudio library not found: Install portaudio

# Ubuntu
sudo apt-get install portaudio19-dev

License