AI voice chat infrastructure that uses WebSockets. It achieves voice-to-voice latency as low as 300 ms (comparable to GPT-4o) without a unified voice codec, and everything runs on a single high-end consumer GPU.
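To make the WebSocket loop concrete, here is a minimal sketch of the kind of server this describes. The transcribe, generate_reply, and synthesize functions are placeholder stubs standing in for the STT, LLM, and TTS stages, not this project's actual API, and the port is arbitrary.

```python
# Minimal sketch (not the project's code): a WebSocket server that receives audio
# chunks from a client and streams synthesized speech back. Requires the
# `websockets` package (>= 10).
import asyncio
import websockets

def transcribe(chunk: bytes) -> str:
    # Placeholder stub: the real pipeline would run Whisper on the audio here.
    return "hello"

def generate_reply(text: str) -> str:
    # Placeholder stub: the real pipeline would prompt a local LLM here.
    return f"You said: {text}"

def synthesize(text: str) -> bytes:
    # Placeholder stub: the real pipeline would return synthesized speech audio here.
    return text.encode("utf-8")

async def handle_client(websocket):
    async for chunk in websocket:       # raw audio bytes from the client
        text = transcribe(chunk)        # speech-to-text
        reply = generate_reply(text)    # LLM response
        audio = synthesize(reply)       # text-to-speech
        await websocket.send(audio)     # stream the reply audio back

async def main():
    async with websockets.serve(handle_client, "0.0.0.0", 8765):
        await asyncio.Future()          # run forever

if __name__ == "__main__":
    asyncio.run(main())
```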
On a 7900-class AMD RDNA3 card, voice-to-voice latency is in the 1-second range with the following stack (a rough sketch of the glue code follows the list):
Whisper large-v2 (Q5)
Llama 3 8B (Q4_K_M)
tts_models/en/vctk/vits (Coqui TTS default VITS models)
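As a rough illustration of how these three stages chain together, here is a sketch using openai-whisper, llama-cpp-python, and Coqui TTS as stand-ins. The GGUF file name and the VCTK speaker ID are assumptions, and the Q5 Whisper quantization in the list above would normally come from a whisper.cpp build rather than this binding.

```python
# Illustrative glue for the STT -> LLM -> TTS stages described above.
# Model paths and the speaker ID are assumptions, not values from the post.
import whisper                      # openai-whisper (stand-in for a Q5 whisper.cpp build)
from llama_cpp import Llama         # llama-cpp-python bindings for llama.cpp
from TTS.api import TTS             # Coqui TTS

stt = whisper.load_model("large-v2")
llm = Llama(model_path="llama-3-8b-instruct.Q4_K_M.gguf", n_gpu_layers=-1)
tts = TTS("tts_models/en/vctk/vits")

def voice_turn(wav_in: str, wav_out: str) -> None:
    # 1. Speech-to-text
    text = stt.transcribe(wav_in)["text"]
    # 2. LLM reply
    reply = llm.create_chat_completion(
        messages=[{"role": "user", "content": text}]
    )["choices"][0]["message"]["content"]
    # 3. Text-to-speech (VCTK VITS is multi-speaker, so a speaker ID is required)
    tts.tts_to_file(text=reply, speaker="p225", file_path=wav_out)

voice_turn("question.wav", "answer.wav")
```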
On a 4090, using Faster Whisper with faster-distil-whisper-large-v2, we can cut the latency down to as low as 300 ms.
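A minimal faster-whisper call for that distilled model might look like the following; "distil-large-v2" is faster-whisper's short name for the faster-distil-whisper-large-v2 weights, and the input file name is just an example.

```python
# Transcribe a clip with the distilled large-v2 model on a CUDA GPU.
from faster_whisper import WhisperModel

model = WhisperModel("distil-large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe("question.wav", beam_size=1, language="en")
text = "".join(segment.text for segment in segments)
print(text)
```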
These installation instructions are for Ubuntu LTS and assume you've already set up ROCm or CUDA.
I recommend using conda or, my preference, mamba for environment management; it will make your life easier.