Transcription
AI-powered subtitles with Whisper.cpp
NixStream generates subtitles with whisper.cpp (whisper-cli). FFmpeg pulls audio from the source video, then Whisper writes VTT for the player.
How it works
- Admin triggers transcription (or it is queued automatically on upload, if configured)
- Job dispatched to queue worker
- FFmpeg extracts audio to WAV
- whisper-cli transcribes using a selected model
- VTT subtitle track attached to the video
The entire pipeline runs on your server. No audio is sent to external APIs.
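The two processing steps above can be sketched as a dry run that prints the commands rather than executing them. This is an illustration, not NixStream's actual worker code: the function names, file names, and the ggml-<model>.bin model file naming are assumptions. The flags themselves (-ar, -ac, -c:a for FFmpeg; -m, -f, -l, -ovtt, -of for whisper-cli) are standard options, and whisper.cpp expects 16 kHz 16-bit WAV input.

```sh
#!/bin/sh
# Dry-run sketch: print the FFmpeg and whisper-cli commands for one video.
# Function names and file names are hypothetical; the worker may differ.

WHISPER_CLI_PATH="${WHISPER_CLI_PATH:-/opt/whisper/whisper-cli}"
WHISPER_MODELS_PATH="${WHISPER_MODELS_PATH:-/opt/whisper/models}"

# Print the FFmpeg command that converts a video to 16 kHz mono 16-bit WAV.
extract_cmd() {
    # $1 = source video, $2 = target WAV
    printf 'ffmpeg -y -i %s -ar 16000 -ac 1 -c:a pcm_s16le %s\n' "$1" "$2"
}

# Print the whisper-cli command that writes <prefix>.vtt from a WAV file.
transcribe_cmd() {
    # $1 = WAV file, $2 = model name, $3 = language code, $4 = output prefix
    # Assumes model files are named ggml-<model>.bin.
    printf '%s -m %s/ggml-%s.bin -f %s -l %s -ovtt -of %s\n' \
        "$WHISPER_CLI_PATH" "$WHISPER_MODELS_PATH" "$2" "$1" "$3" "$4"
}

# Show both commands for a hypothetical input file.
extract_cmd input.mp4 audio.wav
transcribe_cmd audio.wav base en sub_en
```

The printed commands are for inspection only; paths containing spaces would need quoting before being run.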
Set the Whisper paths in .env (Docker defaults shown; manual installs often use bin/whisper/build/bin/whisper-cli and bin/whisper/models):
WHISPER_CLI_PATH=/opt/whisper/whisper-cli
WHISPER_MODELS_PATH=/opt/whisper/models
Models are built into the Docker image. For manual installs, build whisper.cpp and download models to WHISPER_MODELS_PATH.
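A quick sanity check for these two settings might look like the sketch below. The check_whisper function is hypothetical, and it assumes models follow whisper.cpp's ggml-*.bin file naming; adjust the pattern if your model files are named differently.

```sh
#!/bin/sh
# Check that the configured binary is executable and at least one model exists.
# check_whisper is a hypothetical helper, not part of NixStream.
check_whisper() {
    # $1 = path to whisper-cli, $2 = models directory
    if [ ! -x "$1" ]; then
        echo "whisper-cli not executable: $1"
        return 1
    fi
    # Succeed on the first model file found (assumes ggml-*.bin naming).
    for model in "$2"/ggml-*.bin; do
        if [ -f "$model" ]; then
            echo "ok: $model"
            return 0
        fi
    done
    echo "no ggml-*.bin model found in: $2"
    return 1
}

# Report the result for the current environment without aborting the shell.
check_whisper "${WHISPER_CLI_PATH:-/opt/whisper/whisper-cli}" \
              "${WHISPER_MODELS_PATH:-/opt/whisper/models}" || true
```

For Docker installs, run the check inside the queue container, since that is where the paths must resolve.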
Available models
| Model | Speed | Accuracy | RAM |
|---|---|---|---|
| tiny | Fastest | Lower | ~1 GB |
| base | Fast | Good | ~1 GB |
| small | Moderate | Better | ~2 GB |
| medium | Slower | High | ~5 GB |
| large | Slowest | Best | ~10 GB |
Choose based on your hardware. Docker installs typically ship base or small by default.
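As a rough illustration of choosing by hardware, the sketch below maps available RAM to the largest model whose approximate requirement in the table fits. The pick_model function and its thresholds are a simple heuristic taken directly from the RAM column, not a NixStream feature; leave extra headroom if encoding runs on the same machine.

```sh
#!/bin/sh
# Pick the largest model whose approximate RAM need (from the table) fits.
# pick_model is a hypothetical helper; thresholds mirror the RAM column.
pick_model() {
    # $1 = RAM available to the queue worker, in whole GB
    if   [ "$1" -ge 10 ]; then echo large
    elif [ "$1" -ge 5 ];  then echo medium
    elif [ "$1" -ge 2 ];  then echo small
    else
        # tiny has a similar footprint; prefer it when speed matters most.
        echo base
    fi
}

pick_model 4   # prints: small
```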
Auto-transcription on upload
Turn this on in Settings > Encoding under transcription settings. When enabled, a transcription job queues after encoding finishes.
Disable auto-transcription if you prefer manual review before generating subtitles.
Verify Whisper (Docker)
docker compose exec queue /opt/whisper/whisper-cli --help
docker compose exec queue ls -la /opt/whisper/models/
Manual install:
bin/whisper/build/bin/whisper-cli --help
ls -la bin/whisper/models/
Output paths
Transcripts are stored alongside encoded output:
outputs/transcripts/{video-id}.wav # Extracted audio
outputs/subtitles/{lang}/sub_{lang}.vtt # Generated subtitle track
Multiple languages require separate transcription runs with the target language set.
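To make the layout concrete, the sketch below expands the two path templates for a hypothetical video ID and a pair of languages. The helper names, the ID abc123, and the language list are illustrative only; where the outputs directory is rooted depends on your install.

```sh
#!/bin/sh
# Expand the documented path templates. Helper names are hypothetical.
wav_path() {
    # $1 = video ID
    printf 'outputs/transcripts/%s.wav\n' "$1"
}

vtt_path() {
    # $1 = language code, used for both the directory and the file name
    printf 'outputs/subtitles/%s/sub_%s.vtt\n' "$1" "$1"
}

echo "audio: $(wav_path abc123)"
# One transcription run per language, each writing its own VTT file.
for lang in en de; do
    echo "run for $lang writes $(vtt_path "$lang")"
done
```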
Admin actions
Generate creates the first transcript; Regenerate re-runs with a different model or language; Publish exposes captions in the player; Unpublish hides them without deleting files.
Published subtitles appear in the player settings menu under captions.
Player integration
The Shaka-based player loads VTT tracks from the video manifest. Viewers can toggle captions on or off. You can set the default caption language in player settings.
Performance tips
- Transcription is CPU-intensive. Run during off-peak hours for large libraries.
- Scale queue workers if transcription backs up behind encoding.
- Shorter clips transcribe faster; split long files if needed.
- WAV extraction uses temporary disk; ensure adequate free space.
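For the disk-space tip, a back-of-the-envelope estimate helps. Assuming the extracted audio is 16 kHz mono 16-bit PCM (the input format whisper.cpp expects; NixStream's exact extraction settings are an assumption here), each second of audio takes 16000 × 2 = 32000 bytes, so an hour is roughly 115 MB. The helper names below are hypothetical.

```sh
#!/bin/sh
# Estimate temporary WAV size, assuming 16 kHz mono 16-bit PCM audio.
wav_bytes() {
    # $1 = duration in seconds; 16000 samples/s * 2 bytes per sample
    echo $(( $1 * 16000 * 2 ))
}

# Return success if the filesystem holding a directory can fit the WAV.
has_space() {
    # $1 = directory, $2 = duration in seconds
    need_kb=$(( $(wav_bytes "$2") / 1024 + 1 ))
    # df -Pk prints POSIX-format output; column 4 is available KB.
    free_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$free_kb" -ge "$need_kb" ]
}

wav_bytes 3600   # prints: 115200000 (about 115 MB for one hour)
```

The estimate ignores the small WAV header and any other temporary files a run may create.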
Troubleshooting
If transcription fails, check queue logs:
docker compose logs queue
tail -f storage/logs/laravel.log
Common issues:
- whisper-cli: not found means WHISPER_CLI_PATH is wrong; correct it.
- Model load errors mean the model is missing; download it into WHISPER_MODELS_PATH.
- Out-of-memory (OOM) failures usually mean the model is too large for the host; pick base or small.
- An empty VTT often means the source has no audible speech.
Ensure the Whisper binary and model files exist inside the queue container (Docker) or on the host (manual).
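To tell an empty VTT apart from a missing one, you can check for cue timing lines, which in WebVTT always contain the "-->" separator. The vtt_has_cues function and the example path below are a hypothetical sketch, not a NixStream command.

```sh
#!/bin/sh
# Return success if a VTT file exists, is non-empty, and has at least one cue.
vtt_has_cues() {
    # $1 = path to a .vtt file
    # -e keeps grep from treating the leading dashes as an option.
    [ -s "$1" ] && grep -q -e '-->' "$1"
}

# Example path; substitute the language you generated.
if vtt_has_cues "outputs/subtitles/en/sub_en.vtt"; then
    echo "cues found"
else
    echo "no cues: file missing, empty, or header only"
fi
```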
Encode pipeline: Encoding & Transcoding. Captions in the UI: Player. More failures: Troubleshooting.