mirror of https://github.com/shirayu/whispering.git synced 2024-11-22 00:41:02 +00:00

Streaming transcriber with whisper

Yuta Hayashibe c435d50681 Make the default value of -n 20		2022-10-02 22:55:13 +09:00

Whispering

Streaming transcriber built on whisper. Real-time transcription requires sufficient machine power.

Setup

pip install -U git+https://github.com/shirayu/whispering.git

# If you use a GPU, install the torch and torchaudio builds that match your CUDA version
# Example: torch for CUDA 11.6
pip install -U torch torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
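
After installing, it can be useful to confirm that the installed torch build actually sees the GPU. The following is a minimal sketch, not part of whispering itself; the function name describe_torch_device is made up for illustration, and it only uses torch.cuda.is_available(), torch.cuda.get_device_name() and torch.__version__.

```python
import importlib.util


def describe_torch_device() -> str:
    """Report whether the installed torch build can use CUDA.

    Hypothetical helper for illustration; whispering does not ship it.
    """
    # Avoid an ImportError when torch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"

    import torch

    if torch.cuda.is_available():
        # Index 0 is the first visible CUDA device.
        name = torch.cuda.get_device_name(0)
        return f"CUDA available: {name} (torch {torch.__version__})"
    return f"CPU only (torch {torch.__version__})"


if __name__ == "__main__":
    print(describe_torch_device())
```

If this reports "CPU only" on a machine with a GPU, the installed wheel probably does not match the local CUDA version.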

Example of microphone

# Run in English
whispering --language en --model tiny

--help shows all options
--model sets the name of the model to use. Larger models are more accurate but may not be able to transcribe in real time.
--language sets the language to transcribe. The list of languages is shown with whispering -h
--no-progress disables the progress message
-t sets the decoding temperatures. You can set several, like -t 0.0 -t 0.1 -t 0.5, but too many temperatures increase decoding time
--debug outputs debug logs
--no-vad disables VAD (Voice Activity Detection). This forces whisper to also analyze periods of sound that contain no voice activity
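
To see why many -t values can cost time, here is a rough sketch of temperature fallback as done in the original whisper: decoding is retried at the next temperature whenever the result looks too repetitive or too uncertain. That whispering follows the same scheme is an assumption; Attempt, decode_with_fallback and the decode callable are hypothetical names, and the thresholds are the defaults of the original whisper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Attempt:
    """Result of one decoding pass (hypothetical structure for illustration)."""
    text: str
    avg_logprob: float        # average log probability of the decoded tokens
    compression_ratio: float  # high values indicate repetitive output


def decode_with_fallback(
    decode: Callable[[float], Attempt],
    temperatures: Sequence[float],
    logprob_threshold: float = -1.0,
    compression_ratio_threshold: float = 2.4,
) -> Tuple[Attempt, int]:
    """Try each temperature in order; return the result and the number of passes."""
    if not temperatures:
        raise ValueError("at least one temperature is required")

    attempts = 0
    for temperature in temperatures:
        attempts += 1
        result = decode(temperature)
        too_repetitive = result.compression_ratio > compression_ratio_threshold
        too_uncertain = result.avg_logprob < logprob_threshold
        if not (too_repetitive or too_uncertain):
            break  # good enough, no further passes needed
    return result, attempts
```

In the worst case every listed temperature triggers a full decoding pass, so the time per segment grows with the number of -t values.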

Parse interval

For a quicker response, set a small -n and add --allow-padding. However, this may reduce accuracy.

whispering --language en --model tiny -n 20 --allow-padding

Without --allow-padding, whispering only performs VAD on each period; a period predicted to be "silence" is not passed to whisper.
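
The gating described above can be pictured with the sketch below. It is not whispering's implementation: a simple energy threshold stands in for the real VAD model, and rms, is_speech, select_segments and the threshold value are all assumptions made for illustration. Passing use_vad=False mimics --no-vad.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root mean square energy of one audio segment."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(samples: Sequence[float], threshold: float = 0.01) -> bool:
    """Stand-in for a VAD model: energy at or above the threshold counts as voice."""
    return rms(samples) >= threshold


def select_segments(
    segments: Sequence[Sequence[float]],
    use_vad: bool = True,
    threshold: float = 0.01,
) -> List[Sequence[float]]:
    """Return the segments that would be forwarded to the transcriber."""
    if not use_vad:
        # Like --no-vad: everything is analyzed, including silence.
        return list(segments)
    return [seg for seg in segments if is_speech(seg, threshold)]


if __name__ == "__main__":
    silence = [0.0] * 160
    voice = [0.5, -0.5] * 80
    kept = select_segments([silence, voice, silence])
    print(f"{len(kept)} of 3 segments forwarded")
```

Skipping silent periods saves decoding time, which is the reason VAD is enabled by default.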

Example of web socket

⚠ There is no security mechanism. Securing the connection is your own responsibility.

Run with --host and --port.

Host

whispering --language en --model tiny --host 0.0.0.0 --port 8000

Client

whispering --host ADDRESS_OF_HOST --port 8000 --mode client

You can set -n, --allow-padding and other options.
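
If the client cannot connect, a quick check is whether the host's port is reachable at all. The sketch below uses only the Python standard library; port_is_open is a hypothetical helper, and 127.0.0.1 with port 8000 is a placeholder for your own host. It tests plain TCP connectivity only, not the WebSocket handshake.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder address: replace with the address of your host.
    print(port_is_open("127.0.0.1", 8000))
```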

Tips

PortAudio Error

If you get "OSError: PortAudio library not found", install PortAudio.

# Ubuntu
sudo apt-get install portaudio19-dev
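
To check whether the library is now discoverable, a small sketch like the following may help. It assumes the error comes from a Python audio binding that locates PortAudio through ctypes.util.find_library (as the sounddevice package does); find_portaudio is a hypothetical helper name.

```python
import ctypes.util
from typing import Optional


def find_portaudio() -> Optional[str]:
    """Return the name or path the system reports for PortAudio, or None."""
    return ctypes.util.find_library("portaudio")


if __name__ == "__main__":
    location = find_portaudio()
    if location is None:
        print("PortAudio not found: install it with your package manager")
    else:
        print(f"PortAudio found: {location}")
```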

License

MIT License
Some code is ported from the original whisper, which is also released under the MIT License.