# Whispering

[![MIT License](https://img.shields.io/apm/l/atomic-design-ui.svg?)](LICENSE)
![Python Versions](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue)
[![CI](https://github.com/shirayu/whispering/actions/workflows/ci.yml/badge.svg)](https://github.com/shirayu/whispering/actions/workflows/ci.yml)
[![CodeQL](https://github.com/shirayu/whispering/actions/workflows/codeql-analysis.yml/badge.svg)](https://github.com/shirayu/whispering/actions/workflows/codeql-analysis.yml)
[![Typos](https://github.com/shirayu/whispering/actions/workflows/typos.yml/badge.svg)](https://github.com/shirayu/whispering/actions/workflows/typos.yml)

Streaming transcriber with [whisper](https://github.com/openai/whisper).
Transcribing in real time requires a sufficiently powerful machine.

## Setup

```bash
pip install -U git+https://github.com/shirayu/whispering.git@v0.6.3

# If you use a GPU, install the torch and torchaudio builds that match your CUDA version
# Check https://pytorch.org/get-started/locally/
# Example: torch for CUDA 11.6
pip install -U torch torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
```

If you get ``OSError: PortAudio library not found`` on Linux, install PortAudio. On Debian or Ubuntu:

```bash
sudo apt -y install portaudio19-dev
```

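To confirm the environment after installation, a quick Python check along these lines may help. It is a sketch, not part of whispering: the helper `check_environment` is hypothetical, and it assumes that the PortAudio error above is raised when the `sounddevice` package is imported for microphone input.

```python
import importlib
import importlib.util


def check_environment() -> dict:
    """Hypothetical helper: report whether torch (with CUDA) and PortAudio look usable."""
    status = {"torch": False, "cuda": False, "portaudio": False}

    # torch is treated as optional so the check itself never crashes
    if importlib.util.find_spec("torch") is not None:
        import torch

        status["torch"] = True
        # True only if a CUDA-enabled torch build can see a GPU
        status["cuda"] = bool(torch.cuda.is_available())

    # Assumption: microphone input goes through the sounddevice package, which
    # raises "OSError: PortAudio library not found" on import when PortAudio is missing
    try:
        importlib.import_module("sounddevice")
        status["portaudio"] = True
    except (ImportError, OSError):
        status["portaudio"] = False

    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```

If `cuda` is reported as missing on a machine with a GPU, the installed torch build probably does not match your CUDA version.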
## Example of microphone

```bash
# Run in English
whispering --language en --model tiny
```

- ``--help`` shows all options
- ``--model`` sets the [model name](https://github.com/openai/whisper#available-models-and-languages) to use. Larger models are more accurate but may be too slow to transcribe in real time.
- ``--language`` sets the language to transcribe. The list of supported languages is shown by ``whispering -h``
- ``--no-progress`` disables the progress message
- ``-t`` sets the temperatures used for decoding. You can set several, e.g. ``-t 0.0 -t 0.1 -t 0.5``, but too many temperatures make decoding slow
- ``--debug`` outputs debug logs
- ``--vad`` sets the VAD (Voice Activity Detection) threshold. The default is ``0.5``. ``0`` disables VAD, so whisper also analyzes periods without voice activity
- ``--output`` sets the output file (default: standard output)

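When launching whispering from a script, the options above can be assembled into an argument list. This is a minimal sketch: the helper `build_whispering_command` is hypothetical, and it only uses the flags listed above (``--language``, ``--model``, ``-t``, ``--vad``, ``--no-progress``, ``--output``).

```python
import shlex
from typing import List, Optional, Sequence


def build_whispering_command(
    language: str = "en",
    model: str = "tiny",
    temperatures: Sequence[float] = (),
    vad: Optional[float] = None,
    no_progress: bool = False,
    output: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper that builds a whispering argv from the documented flags."""
    cmd = ["whispering", "--language", language, "--model", model]
    # Each temperature needs its own -t flag, e.g. -t 0.0 -t 0.2
    for t in temperatures:
        cmd += ["-t", str(t)]
    if vad is not None:
        # 0 disables VAD; the documented default threshold is 0.5
        cmd += ["--vad", str(vad)]
    if no_progress:
        cmd.append("--no-progress")
    if output is not None:
        cmd += ["--output", output]
    return cmd


if __name__ == "__main__":
    cmd = build_whispering_command(temperatures=[0.0, 0.2], no_progress=True, output="out.txt")
    # To start transcribing, pass the list to subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Keeping the temperature list short matters here, since each extra ``-t`` value adds decoding time.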
### Parse interval

By default, whispering performs VAD every 3.75 seconds.
This interval is determined by the value of ``-n``, whose default is ``20``.
If VAD classifies an interval as silence, that interval is not passed to whisper.
To disable VAD, set the VAD threshold to 0 with ``--vad 0``.

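The relation between ``-n``, the interval, and the VAD threshold can be sketched as follows. Two assumptions are made that do not come from whispering's source: the interval grows linearly with ``-n`` (one unit is then 3.75 / 20 = 0.1875 seconds, so ``-n 40`` would give 7.5 seconds), and an interval counts as speech when its VAD speech probability is at least the threshold. Both function names are hypothetical.

```python
DEFAULT_N = 20
DEFAULT_INTERVAL_SEC = 3.75
# Assumption: the interval is proportional to -n (3.75 s / 20 = 0.1875 s per unit)
SECONDS_PER_UNIT = DEFAULT_INTERVAL_SEC / DEFAULT_N


def vad_interval_seconds(n: int = DEFAULT_N) -> float:
    """Hypothetical: length of audio checked by VAD at each step for a given -n."""
    return n * SECONDS_PER_UNIT


def passes_to_whisper(speech_prob: float, vad_threshold: float = 0.5) -> bool:
    """Hypothetical gate: decide whether an interval is sent to whisper.

    A threshold of 0 disables VAD, so every interval (including silence) is passed.
    """
    if vad_threshold <= 0:
        return True
    # Assumption: probabilities at or above the threshold count as speech
    return speech_prob >= vad_threshold
```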
By default, whisper does not analyze the audio until the total length of the segments that VAD judged to contain speech exceeds 30 seconds.
However, once speech has been detected, analysis is performed as soon as silent segments appear 16 times (the default value of ``--max_nospeech_skip``).

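The decision of when to run whisper can be sketched as a small state machine. This illustrates the behavior described above and is not whispering's implementation: the class `SpeechBuffer` is hypothetical, segment lengths are given in seconds, and silent segments are assumed to be counted (not necessarily consecutively) only after some speech has been buffered.

```python
from typing import List


class SpeechBuffer:
    """Hypothetical sketch of when buffered speech is handed to whisper."""

    def __init__(self, max_speech_sec: float = 30.0, max_nospeech_skip: int = 16) -> None:
        self.max_speech_sec = max_speech_sec
        self.max_nospeech_skip = max_nospeech_skip
        self.speech: List[float] = []  # lengths of buffered speech segments
        self.nospeech_count = 0

    def push(self, length_sec: float, is_speech: bool) -> bool:
        """Add one VAD segment; return True when analysis should run now."""
        if is_speech:
            self.speech.append(length_sec)
        elif self.speech:
            # Assumption: silence only counts once speech has been detected
            self.nospeech_count += 1

        enough_speech = sum(self.speech) > self.max_speech_sec
        enough_silence = self.nospeech_count >= self.max_nospeech_skip
        if enough_speech or enough_silence:
            self.reset()
            return True
        return False

    def reset(self) -> None:
        """Clear the buffer after the (simulated) analysis."""
        self.speech.clear()
        self.nospeech_count = 0
```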
## Example of web socket

⚠ **There is no security mechanism. Securing the connection is your responsibility.**

Run with ``--host`` and ``--port``.

### Host

```bash
whispering --language en --model tiny --host 0.0.0.0 --port 8000
```

### Client

```bash
whispering --host ADDRESS_OF_HOST --port 8000 --mode client
```

You can also set ``-n`` and other options.

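Because the client connects over the network, it can help to confirm that the host is listening before starting the client. The sketch below only checks that a TCP connection to the host and port succeeds; it does not speak whispering's WebSocket protocol, and the helper name `host_is_listening` is hypothetical.

```python
import socket


def host_is_listening(host: str, port: int = 8000, timeout: float = 3.0) -> bool:
    """Hypothetical check: return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts
        return False


if __name__ == "__main__":
    # Replace with the address passed to --host on the client side
    print(host_is_listening("127.0.0.1", 8000))
```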
## For Developers

1. Install [Python](https://www.python.org/) and [Node.js](https://nodejs.org/)
2. [Install poetry](https://python-poetry.org/docs/) to use the ``poetry`` command
3. Clone the repository and install the dependencies
   ```console
   # Clone
   git clone https://github.com/shirayu/whispering.git
   cd whispering
   # With poetry
   poetry config virtualenvs.in-project true
   poetry install --all-extras
   poetry run pip install -U torch torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
   # With npm
   npm install
   ```
4. Run the tests and check that no errors occur
   ```bash
   poetry run make -j4
   ```
5. Make fancy updates
6. Apply the code style
   ```bash
   poetry run make style
   ```
7. Run the tests again and check that no errors occur
   ```bash
   poetry run make -j4
   ```
8. Check for typos with [typos](https://github.com/crate-ci/typos) by running the ``typos`` command in the root directory
   ```bash
   typos
   ```
9. Send a pull request!

## License

- [MIT License](LICENSE)
- Some code is ported from the original whisper, which is also licensed under the [MIT License](LICENSE.whisper)