# Whispering

[![MIT License](https://img.shields.io/apm/l/atomic-design-ui.svg?)](LICENSE)
![Python Versions](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue)

[![CI](https://github.com/shirayu/whispering/actions/workflows/ci.yml/badge.svg)](https://github.com/shirayu/whispering/actions/workflows/ci.yml)
[![CodeQL](https://github.com/shirayu/whispering/actions/workflows/codeql-analysis.yml/badge.svg)](https://github.com/shirayu/whispering/actions/workflows/codeql-analysis.yml)
[![Typos](https://github.com/shirayu/whispering/actions/workflows/typos.yml/badge.svg)](https://github.com/shirayu/whispering/actions/workflows/typos.yml)

Streaming transcriber with [whisper](https://github.com/openai/whisper).
Real-time transcription requires sufficient machine power.

## Setup
```bash
pip install -U git+https://github.com/shirayu/whispering.git@v0.5.1

# If you use a GPU, install the torch and torchaudio builds that match your environment
# Check https://pytorch.org/get-started/locally/
# Example: torch for CUDA 11.6
pip install -U torch torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
```

If you get ``OSError: PortAudio library not found`` on Linux, install PortAudio.
```bash
sudo apt -y install portaudio19-dev
```
## Example of microphone

```bash
# Run in English
whispering --language en --model tiny
```

- ``--help`` shows all options
- ``--model`` sets the [model name](https://github.com/openai/whisper#available-models-and-languages) to use. Larger models are more accurate but may not be able to transcribe in real time.
- ``--language`` sets the language to transcribe. The list of languages is shown with ``whispering -h``
- ``--no-progress`` disables the progress message
- ``-t`` sets the decoding temperatures. You can set several, like ``-t 0.0 -t 0.1 -t 0.5``, but too many temperatures increase decoding time
- ``--debug`` outputs debug logs
- ``--vad`` sets the VAD (Voice Activity Detection) threshold. ``0`` disables VAD and forces whisper to analyze periods without voice activity
- ``--output`` sets the output file (default: standard output)
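
Several of these options can be combined in a single invocation. The sketch below wraps one such combination in a shell function; the function name ``transcribe_to_file`` and its output-path argument are hypothetical and not part of whispering, and only flags listed above are used.

```bash
# Hypothetical helper, not part of whispering.
# $1: path of the file that receives the transcript (illustrative argument).
transcribe_to_file() {
    # English, smallest model, no progress message, two decoding temperatures,
    # and the transcript written to the given file instead of standard output.
    whispering --language en --model tiny --no-progress -t 0.0 -t 0.1 --output "$1"
}

# Usage: transcribe_to_file transcript.txt
```

Defining the function does not start transcription; it only runs when called with a file name.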
### Parse interval

By default, whispering performs VAD on every 3.75-second interval.
The interval length is determined by the value of ``-n``, whose default is ``20``.
When an interval is predicted to be "silence", it is not passed to whisper.
To disable VAD, set the threshold to 0 with ``--vad 0``.

By default, Whisper does not perform analysis until the total length of the segments that VAD judges to contain speech exceeds 30 seconds.
This is because Whisper is trained to make predictions on 30-second intervals.
If you nevertheless want to force Whisper to analyze segments shorter than 30 seconds, use the ``--allow-padding`` option like this.

```bash
whispering --language en --model tiny -n 20 --allow-padding
```

This forces Whisper to analyze each 3.75-second speech segment.
Using ``--allow-padding`` gives a quicker response but may reduce accuracy.
The smaller the value of ``-n`` used with ``--allow-padding``, the lower the accuracy.
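
As a rough aid for choosing ``-n``, the defaults above (``-n 20`` gives 3.75 seconds) imply about 0.1875 seconds per unit. The sketch below assumes that the interval scales linearly with ``-n``, which this README does not state explicitly; the helper name ``interval_seconds`` is hypothetical and not part of whispering.

```bash
# Hypothetical helper, not part of whispering.
# Assumption: the interval grows linearly with -n, at 3.75 / 20 = 0.1875 seconds per unit.
interval_seconds() {
    awk -v n="$1" 'BEGIN { print n * 0.1875 }'
}

interval_seconds 20   # the default value of -n
interval_seconds 8    # a shorter interval for a quicker response with --allow-padding
```

With the default of ``20`` this prints ``3.75``, matching the interval stated above.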
## Example of web socket

⚠ **There is no security mechanism. Securing the connection is your responsibility.**

Run with ``--host`` and ``--port``.

### Host

```bash
whispering --language en --model tiny --host 0.0.0.0 --port 8000
```

### Client

```bash
whispering --host ADDRESS_OF_HOST --port 8000 --mode client
```

You can also set ``-n``, ``--allow-padding``, and other options.
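
For instance, a client invocation with these extra options might look like the sketch below. It assumes the options are passed on the client command, as the sentence above suggests; the function name ``connect_client`` and its arguments are hypothetical and not part of whispering.

```bash
# Hypothetical helper, not part of whispering.
# $1: address of the host; $2: port (defaults to 8000, as in the example above).
connect_client() {
    whispering --host "$1" --port "${2:-8000}" --mode client -n 20 --allow-padding
}

# Usage: connect_client ADDRESS_OF_HOST 8000
```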
## License

- [MIT License](LICENSE)
- Some code is ported from the original whisper, which is also under the [MIT License](LICENSE.whisper)