# Whisper Speech Recognition Models
## High-Performance Model (e.g., `ggml-large-v3-turbo.bin`)
- Models like `large-v3` or the optimized `large-v3-turbo` offer high accuracy for speech recognition.
- However, larger models require significant computational resources (CPU, RAM, VRAM).
- This resource demand can lead to higher latency (slower processing times).
- The `large-v3-turbo` variant is a distilled version of `large-v3`, designed to be faster with a minor trade-off in accuracy.
## Improving Latency with Alternative Models
- If lower latency (faster processing) is a priority, especially on less powerful hardware, consider using smaller or quantized Whisper models.
- These models trade some accuracy for reduced size and faster inference speed.
- Common sizes include `tiny`, `base`, `small`, and `medium`. Quantized versions (e.g., `q5_0`, `q8_0`) further reduce resource usage.
## Where to Find Models
You can download various pre-converted Whisper models in the ggml format from Hugging Face repositories:
- https://huggingface.co/sandrohanea/whisper.net/tree/main
- https://huggingface.co/ggerganov/whisper.cpp/tree/main
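As a sketch, a model from one of the repositories above can be fetched and saved directly under the filename the engine expects. The URL pattern follows the Hugging Face `resolve/main` convention used by the `ggerganov/whisper.cpp` repository; the `MODEL` choice here is just an example.

```shell
# Sketch: fetch a smaller ggml Whisper model and save it under the
# filename the engine expects. MODEL is an example; adjust as needed.
MODEL="base"   # e.g. tiny, base, small, medium, or a quantized variant
URL="https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-${MODEL}.bin"
TARGET="ggml-large-v3-turbo.bin"
echo "Download: $URL"
echo "Save as:  $TARGET"
# Uncomment to perform the actual download:
# curl -L -o "$TARGET" "$URL"
```

The `-L` flag matters if you do run the `curl` line, because Hugging Face serves model files via redirects.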
> [!WARNING]
> If you download an alternative model from the Hugging Face links provided (i.e., any model other than the default one provided below), such as `ggml-base.bin`, `ggml-small.bin`, or a quantized version:
>
> - You must rename the downloaded file exactly to `ggml-large-v3-turbo.bin`.
> - This renaming step is crucial because the engine is configured to load only a file with this exact name. Failing to rename the alternative model file will likely result in the application being unable to find and load it.
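The renaming step above can be illustrated like this; `ggml-base.bin` stands in for whatever alternative model was downloaded, and the `touch` line merely simulates that download for the sake of a runnable sketch.

```shell
touch ggml-base.bin                       # placeholder for a downloaded model
mv ggml-base.bin ggml-large-v3-turbo.bin  # the exact filename the engine loads
ls ggml-large-v3-turbo.bin
```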