# Whispering

[![MIT License](https://img.shields.io/apm/l/atomic-design-ui.svg?)](LICENSE)
![Python Versions](https://img.shields.io/badge/Python-3.8%20--%203.10-blue)

[![CI](https://github.com/shirayu/whispering/actions/workflows/ci.yml/badge.svg)](https://github.com/shirayu/whispering/actions/workflows/ci.yml)
[![CodeQL](https://github.com/shirayu/whispering/actions/workflows/codeql-analysis.yml/badge.svg)](https://github.com/shirayu/whispering/actions/workflows/codeql-analysis.yml)
[![Typos](https://github.com/shirayu/whispering/actions/workflows/typos.yml/badge.svg)](https://github.com/shirayu/whispering/actions/workflows/typos.yml)

A streaming transcriber built on [whisper](https://github.com/openai/whisper).
Transcribing in real time requires a sufficiently powerful machine.

## Setup

```bash
pip install -U git+https://github.com/shirayu/whispering.git@v0.5.0

# If you use a GPU, install the torch and torchaudio builds that match your CUDA version
# See https://pytorch.org/get-started/locally/
# Example: torch for CUDA 11.6
pip install -U torch torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
```
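
To confirm the setup, a quick check along the following lines may help. This is only a sketch: it assumes ``python3`` is the interpreter that ``pip`` installed into, and it relies on ``torch.cuda.is_available()`` to report whether torch can see a GPU.

```bash
# Check that the whispering command is on PATH
if command -v whispering >/dev/null 2>&1; then
    echo "whispering: found"
else
    echo "whispering: not found on PATH" >&2
fi

# Ask torch whether it can see a CUDA device ("True" or "False").
# Falls back to "unknown" when python3 or torch is unavailable.
CUDA_OK=$(python3 -c "import torch; print(torch.cuda.is_available())" 2>/dev/null || echo unknown)
echo "CUDA available: $CUDA_OK"
```

If this prints ``False`` on a machine with a GPU, the installed torch build probably does not match the CUDA version.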

## Microphone example

```bash
# Run in English
whispering --language en --model tiny
```

- ``--help`` shows all options
- ``--model`` sets the [model name](https://github.com/openai/whisper#available-models-and-languages) to use. Larger models are more accurate, but may be too slow to transcribe in real time.
- ``--language`` sets the language to transcribe. The list of languages is shown by ``whispering -h``
- ``--no-progress`` disables the progress message
- ``-t`` sets the temperatures used for decoding. You can set several, like ``-t 0.0 -t 0.1 -t 0.5``, but too many temperatures increase decoding time
- ``--debug`` outputs debug logs
- ``--no-vad`` disables VAD (Voice Activity Detection). This forces whisper to analyze periods without voice activity as well
- ``--output`` sets the output file (default: standard output)
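
These options can be combined. The following sketch uses only flags from the list above. The file name ``transcript.txt`` is a placeholder, and the ``command -v`` guard is an addition so that the snippet fails gently when whispering is not installed.

```bash
# Placeholder output file; change it to suit
OUT=transcript.txt

if command -v whispering >/dev/null 2>&1; then
    # English, tiny model, two decoding temperatures, no progress message,
    # transcript written to $OUT instead of standard output
    whispering --language en --model tiny -t 0.0 -t 0.1 --no-progress --output "$OUT"
else
    echo "whispering is not installed; see Setup" >&2
fi
```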

### Parse interval

Without ``--allow-padding``, whispering just performs VAD on each period,
and a period predicted as "silence" is not passed to whisper.
To change the VAD interval, change ``-n``.

For a quicker response, set a small ``-n`` and add ``--allow-padding``.
However, this may reduce accuracy.

```bash
whispering --language en --model tiny -n 20 --allow-padding
```

## WebSocket example

⚠  **There is no security mechanism. Securing the connection is your own responsibility.**

Run with ``--host`` and ``--port``.

### Host

```bash
whispering --language en --model tiny --host 0.0.0.0 --port 8000
```

### Client

```bash
whispering --host ADDRESS_OF_HOST --port 8000 --mode client
```

You can also set ``-n``, ``--allow-padding``, and other options on the client.

## Tips

### PortAudio Error

If you get ``OSError: PortAudio library not found``, install ``portaudio``:

```bash
# Ubuntu
sudo apt-get install portaudio19-dev
```
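
To check whether the library is visible after installing, something like the following may be useful. It is a sketch that assumes a Linux system with ``ldconfig`` on ``PATH``. If ``ldconfig`` is not available, it reports ``missing`` even when PortAudio is installed.

```bash
# Look for a PortAudio shared library in the dynamic linker cache
if ldconfig -p 2>/dev/null | grep -qi portaudio; then
    PORTAUDIO_STATUS=found
else
    PORTAUDIO_STATUS=missing
fi
echo "PortAudio: $PORTAUDIO_STATUS"
```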

## License

- [MIT License](LICENSE)
- Some code is ported from the original whisper, which is also under the [MIT License](LICENSE.whisper)