Streaming transcriber with whisper
Yuta Hayashibe 184056ad46 Archived 5 months ago


MIT License Python Versions


whispering is a streaming transcriber built on Whisper. Transcribing in real time requires sufficient machine power.


This repository has been archived. There are some alternatives.


pip install -U git+

# If you use a GPU, install the appropriate builds of torch and torchaudio
# Check
# Example: torch for CUDA 11.6
pip install -U torch torchaudio --extra-index-url

If you get "OSError: PortAudio library not found" on Linux, install PortAudio.

sudo apt -y install portaudio19-dev

Example of microphone

# Run in English
# By default, it waits until at least 30 seconds of speech has been collected
whispering --language en --model tiny
  • --help shows all options
  • --model sets the name of the model to use. Larger models are more accurate, but may not be able to transcribe in real time.
  • --language sets the language to transcribe. The list of languages is shown with whispering -h
  • --no-progress disables the progress message
  • -t sets the decoding temperatures. You can set several, like -t 0.0 -t 0.1 -t 0.5, but too many temperatures increase decoding time
  • --debug outputs debug logs
  • --vad sets the VAD (Voice Activity Detection) threshold. The default is 0.5. A value of 0 disables VAD and forces whisper to analyze periods without voice activity. Try --vad 0 if VAD prevents transcription.
  • --output sets the output file (default: standard output)
  • --frame sets the minimum number of mel spectrogram frames passed to Whisper (default: 3000, i.e. 30 seconds)
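The options above can also be assembled programmatically, for example when launching whispering from another script. The following is a minimal sketch: build_whispering_command is a hypothetical helper (not part of whispering), and it uses only flags listed above.

```python
import shlex


def build_whispering_command(language, model, temperatures=(), vad=None,
                             output=None, no_progress=False):
    """Hypothetical helper: build an argument list from the documented flags."""
    cmd = ["whispering", "--language", language, "--model", model]
    for t in temperatures:
        # -t may be repeated; each extra temperature adds decoding time
        cmd += ["-t", str(t)]
    if vad is not None:
        # --vad 0 disables VAD
        cmd += ["--vad", str(vad)]
    if output is not None:
        cmd += ["--output", output]
    if no_progress:
        cmd.append("--no-progress")
    return cmd


cmd = build_whispering_command("en", "tiny", temperatures=(0.0, 0.1, 0.5), vad=0)
print(shlex.join(cmd))
# prints: whispering --language en --model tiny -t 0.0 -t 0.1 -t 0.5 --vad 0
```

The resulting list can be passed to subprocess.run as-is, which avoids shell quoting problems with output paths.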

Parse interval

By default, whispering performs VAD every 3.75 seconds. This interval is determined by the value of -n, whose default is 20. When an interval is classified as "silence", it is not passed to Whisper. To disable VAD, set the VAD threshold to 0 by adding --vad 0.
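To get a feel for how -n affects the interval, here is a rough sketch. It assumes the interval scales linearly with -n, which the text implies but does not state; under that assumption one unit is 3.75 / 20 = 0.1875 seconds. The function name is hypothetical.

```python
DEFAULT_N = 20               # default value of -n (from the text)
DEFAULT_INTERVAL_SEC = 3.75  # VAD interval at the default -n (from the text)

# Assumption: the interval is proportional to -n.
SECONDS_PER_UNIT = DEFAULT_INTERVAL_SEC / DEFAULT_N  # 0.1875


def vad_interval_seconds(n: int = DEFAULT_N) -> float:
    """Estimated VAD interval in seconds for a given -n (hypothetical helper)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return n * SECONDS_PER_UNIT


print(vad_interval_seconds())    # prints: 3.75
print(vad_interval_seconds(8))   # prints: 1.5
```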

By default, whispering does not run the analysis until the total length of the segments that VAD judges to contain speech exceeds 30 seconds. This is because the original Whisper assumes 30-second input segments. However, if silence segments appear 16 times (the default value of --max_nospeech_skip) after speech is detected, the analysis is performed anyway. You can shorten the required length with the --frame option (default: 3000), but this sacrifices accuracy because Whisper does not expect such input.
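The buffering rule described above can be modelled roughly as follows. This is a toy sketch, not whispering's actual implementation: the class and its fields are hypothetical, silent intervals are counted cumulatively after the first speech interval (one possible reading of the text), and the conversion of 100 frames per second is derived from "3000 frames, i.e. 30 seconds".

```python
from dataclasses import dataclass

FRAMES_PER_SECOND = 100  # from the text: 3000 frames correspond to 30 seconds


@dataclass
class SpeechBuffer:
    """Toy model of when analysis is triggered (hypothetical, for illustration)."""
    interval_sec: float = 3.75   # length of one VAD interval (default -n)
    frame: int = 3000            # --frame
    max_nospeech_skip: int = 16  # --max_nospeech_skip
    buffered_sec: float = 0.0    # speech collected so far
    nospeech_count: int = 0      # silent intervals seen since speech started

    def push(self, is_speech: bool) -> bool:
        """Feed one VAD interval; return True if analysis would run now."""
        if is_speech:
            self.buffered_sec += self.interval_sec
        elif self.buffered_sec > 0:
            # Silence only counts once some speech has been buffered
            self.nospeech_count += 1
        enough_speech = self.buffered_sec >= self.frame / FRAMES_PER_SECOND
        too_much_silence = self.nospeech_count >= self.max_nospeech_skip
        if enough_speech or too_much_silence:
            # Hand the buffer to Whisper and start over
            self.buffered_sec = 0.0
            self.nospeech_count = 0
            return True
        return False


buf = SpeechBuffer()
triggers = [buf.push(True) for _ in range(8)]
print(triggers.index(True) + 1)  # prints: 8
```

With the defaults, eight consecutive speech intervals (8 × 3.75 = 30 seconds) trigger the analysis, which matches the 30-second wait mentioned for the microphone example. Lowering frame makes the model trigger earlier, at the accuracy cost noted above.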

Example of web socket

There is no security mechanism. Securing the connection is your responsibility.

Run with --host and --port.


whispering --language en --model tiny --host --port 8000


whispering --host ADDRESS_OF_HOST --port 8000 --mode client

You can set -n and other options.

For Developers

  1. Install Python and Node.js

  2. Install poetry so that the poetry command is available

  3. Clone and install libraries

    # Clone
    git clone
    # With poetry
    poetry config true
    poetry install --all-extras
    poetry run pip install -U torch torchaudio --extra-index-url
    # With npm
    npm install
  4. Run the tests and check that no errors occur

    poetry run make -j4
  5. Make fancy updates

  6. Make style

    poetry run make style
  7. Run the tests again and check that no errors occur

    poetry run make -j4
  8. Check for typos with typos. Just run the typos command in the root directory.

  9. Send a pull request!