# Whispering

[![MIT License](https://img.shields.io/apm/l/atomic-design-ui.svg?)](LICENSE)
![Python Versions](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue)

[![CI](https://github.com/shirayu/whispering/actions/workflows/ci.yml/badge.svg)](https://github.com/shirayu/whispering/actions/workflows/ci.yml)
[![CodeQL](https://github.com/shirayu/whispering/actions/workflows/codeql-analysis.yml/badge.svg)](https://github.com/shirayu/whispering/actions/workflows/codeql-analysis.yml)
[![Typos](https://github.com/shirayu/whispering/actions/workflows/typos.yml/badge.svg)](https://github.com/shirayu/whispering/actions/workflows/typos.yml)

A streaming transcriber built on [whisper](https://github.com/openai/whisper).
Real-time transcription requires sufficient machine power.

## Setup

```bash
pip install -U git+https://github.com/shirayu/whispering.git@v0.6.6

# If you use a GPU, install the matching torch and torchaudio builds
# See https://pytorch.org/get-started/locally/
# Example: torch for CUDA 11.6
pip install -U torch torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
```

If you get ``OSError: PortAudio library not found`` on Linux, install PortAudio.

```bash
sudo apt -y install portaudio19-dev
```

## Example of microphone

```bash
# Run in English
# By default, it needs to wait at least 30 seconds
whispering --language en --model tiny
```

- ``--help`` shows full options
- ``--model`` sets the [model name](https://github.com/openai/whisper#available-models-and-languages) to use. Larger models will be more accurate, but may not be able to transcribe in real time.
- ``--language`` sets the language to transcribe. The list of languages is shown with ``whispering -h``
- ``--no-progress`` disables the progress message
- ``-t`` sets the decoding temperatures. You can set several, like ``-t 0.0 -t 0.1 -t 0.5``, but each additional temperature can increase decoding time
- ``--debug`` outputs debug logs
- ``--vad`` sets the VAD (Voice Activity Detection) threshold. The default is ``0.5``. ``0`` disables VAD and forces whisper to analyze periods without voice activity as well. Try ``--vad 0`` if VAD prevents transcription.
- ``--output`` sets the output file (default: standard output)
- ``--frame`` sets the minimum number of mel spectrogram frames input to Whisper (default: ``3000``, i.e. 30 seconds)
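
If you launch whispering from a script, the options above can be assembled programmatically. The following is a minimal sketch, assuming the ``whispering`` command is on your ``PATH``. ``build_command`` is a hypothetical helper for illustration and is not part of whispering. It uses only the flags listed above.

```python
import shlex
from typing import List, Optional, Sequence


def build_command(
    language: str,
    model: str,
    temperatures: Sequence[float] = (),
    vad: Optional[float] = None,
    output: Optional[str] = None,
    no_progress: bool = False,
) -> List[str]:
    """Assemble a whispering command line (hypothetical helper)."""
    cmd = ["whispering", "--language", language, "--model", model]
    for t in temperatures:
        # -t may be repeated, once per temperature
        cmd += ["-t", str(t)]
    if vad is not None:
        cmd += ["--vad", str(vad)]
    if output is not None:
        cmd += ["--output", output]
    if no_progress:
        cmd.append("--no-progress")
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command; pass the list to subprocess.run to execute it
    print(shlex.join(build_command("en", "tiny", temperatures=[0.0, 0.1], vad=0)))
```

Building the command as a list avoids shell quoting problems when the output path contains spaces.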

### Parse interval

By default, whispering performs VAD every 3.75 seconds.
This interval is determined by the value of ``-n``, whose default is ``20``.
When an interval is classified as "silence", it is not passed to whisper.
To disable VAD, set the VAD threshold to 0 by adding ``--vad 0``.
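
This README does not state how ``-n`` maps to seconds. One relationship that reproduces the documented default (``-n 20`` gives 3.75 seconds) is that each unit of ``-n`` corresponds to 3000 audio samples at Whisper's 16 kHz sample rate. The sketch below treats that as an assumption. ``SAMPLES_PER_UNIT`` and ``vad_interval_seconds`` are hypothetical names, not whispering's API.

```python
SAMPLE_RATE = 16_000  # Hz; Whisper's input sample rate
SAMPLES_PER_UNIT = 3_000  # assumption: audio samples per unit of -n


def vad_interval_seconds(n: int = 20) -> float:
    """Estimate the VAD interval in seconds for a given -n (assumed mapping)."""
    return n * SAMPLES_PER_UNIT / SAMPLE_RATE


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(f"-n {n}: {vad_interval_seconds(n)} seconds")
```

Under this assumption, a smaller ``-n`` means VAD runs more often on shorter intervals.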

By default, whispering does not perform analysis until the total length of the segments that VAD determines to contain speech exceeds 30 seconds.
This is because the original Whisper assumes 30-second input segments.
However, if silent segments appear 16 times (the default value of ``--max_nospeech_skip``) after speech is detected, the analysis is performed anyway.
You can shorten the segments with the ``--frame`` option (default: 3000), but this sacrifices accuracy because Whisper does not expect such input.
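
The buffering rule described above can be simulated with a short sketch. It is an illustration under stated assumptions, not whispering's actual implementation: ``analysis_points`` is a hypothetical function, silent intervals are counted cumulatively once speech has been buffered, and the threshold uses ``>=`` because ``--frame`` is described as a minimum. The 100 frames per second figure follows from 3000 frames corresponding to 30 seconds.

```python
from typing import Iterable, List

FRAMES_PER_SECOND = 100  # 3000 frames correspond to 30 seconds


def analysis_points(
    speech_flags: Iterable[bool],
    interval_seconds: float = 3.75,
    frame: int = 3000,
    max_nospeech_skip: int = 16,
) -> List[int]:
    """Return the indices of the intervals after which analysis would run."""
    min_speech_seconds = frame / FRAMES_PER_SECOND
    buffered = 0.0  # seconds of speech waiting for analysis
    silent = 0  # silent intervals seen since speech was buffered (assumption)
    points: List[int] = []
    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            buffered += interval_seconds
        elif buffered > 0:
            # Silence before any speech is ignored
            silent += 1
        enough_speech = buffered >= min_speech_seconds
        too_much_silence = buffered > 0 and silent >= max_nospeech_skip
        if enough_speech or too_much_silence:
            points.append(i)
            buffered, silent = 0.0, 0
    return points


if __name__ == "__main__":
    # Eight consecutive speech intervals of 3.75 s add up to 30 s
    print(analysis_points([True] * 8))
    # One speech interval followed by 16 silent ones triggers analysis early
    print(analysis_points([True] + [False] * 16))
```

Passing a smaller ``frame`` value, for example ``frame=1500``, makes the simulated analysis run after 15 seconds of speech instead of 30.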

## Example of web socket

⚠  **There is no security mechanism. Securing the connection is your responsibility.**

Run with ``--host`` and ``--port``.

### Host

```bash
whispering --language en --model tiny --host 0.0.0.0 --port 8000
```

### Client

```bash
whispering --host ADDRESS_OF_HOST --port 8000 --mode client
```

You can set ``-n`` and other options.

## For Developers

1. Install [Python](https://www.python.org/) and [Node.js](https://nodejs.org/)
2. [Install poetry](https://python-poetry.org/docs/) to use the ``poetry`` command
3. Clone the repository and install the libraries

    ```console
    # Clone
    git clone https://github.com/shirayu/whispering.git

    # With poetry
    poetry config virtualenvs.in-project true
    poetry install --all-extras
    poetry run pip install -U torch torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

    # With npm
    npm install
    ```

4. Run the tests and check that no errors occur

    ```bash
    poetry run make -j4
    ```

5. Make fancy updates
6. Make style

    ```bash
    poetry run make style
    ```

7. Run the tests again and check that no errors occur

    ```bash
    poetry run make -j4
    ```

8. Check for typos with [typos](https://github.com/crate-ci/typos). Just run the ``typos`` command in the root directory.

    ```bash
    typos
    ```

9. Send a pull request!

## License

- [MIT License](LICENSE)
- Some code is ported from the original whisper. Its license is also the [MIT License](LICENSE.whisper)