Transcription

NixStream generates subtitles with whisper.cpp (whisper-cli). FFmpeg pulls audio from the source video, then Whisper writes VTT for the player.

How it works

An admin triggers transcription, or it starts automatically on upload if configured.
The job is dispatched to a queue worker.
FFmpeg extracts the audio to WAV.
whisper-cli transcribes the audio using the selected model.
The VTT subtitle track is attached to the video.

The entire pipeline runs on your server. No audio is sent to external APIs.
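
The extract-then-transcribe steps can be sketched as a shell function. This is a minimal illustration, not NixStream's actual job code: the `transcribe` and `run` helpers, the `audio.wav` file name, the `ggml-<model>.bin` model file naming, and the 16 kHz mono extraction settings are assumptions. The whisper-cli flags used (`-m`, `-f`, `-l`, `-ovtt`, `-of`) are standard whisper.cpp options.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and file layout are hypothetical.

WHISPER_CLI_PATH="${WHISPER_CLI_PATH:-/opt/whisper/whisper-cli}"
WHISPER_MODELS_PATH="${WHISPER_MODELS_PATH:-/opt/whisper/models}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# transcribe <video> <workdir> [model] [lang]
transcribe() {
  video="$1"
  workdir="$2"
  model="${3:-base}"
  lang="${4:-en}"
  wav="$workdir/audio.wav"

  # Assumption: extract 16 kHz mono 16-bit PCM, the input whisper.cpp
  # has traditionally expected.
  run ffmpeg -y -i "$video" -vn -ar 16000 -ac 1 -c:a pcm_s16le "$wav" || return 1

  # -ovtt writes WebVTT; -of sets the output path without the extension.
  run "$WHISPER_CLI_PATH" -m "$WHISPER_MODELS_PATH/ggml-$model.bin" \
    -f "$wav" -l "$lang" -ovtt -of "$workdir/sub_$lang"
}
```

Calling `DRY_RUN=1 transcribe input.mp4 /tmp/work small en` prints the two commands without running them, which is a convenient way to check paths before starting a real job.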

Set these in .env. Docker defaults are shown; manual installs often use bin/whisper/build/bin/whisper-cli and bin/whisper/models:

WHISPER_CLI_PATH=/opt/whisper/whisper-cli
WHISPER_MODELS_PATH=/opt/whisper/models

Models are built into the Docker image. For manual installs, build whisper.cpp and download models to WHISPER_MODELS_PATH.

Available models

Model	Speed	Accuracy	RAM
`tiny`	Fastest	Lower	~1 GB
`base`	Fast	Good	~1 GB
`small`	Moderate	Better	~2 GB
`medium`	Slower	High	~5 GB
`large`	Slowest	Best	~10 GB

Choose based on your hardware. Docker installs typically ship base or small by default.
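
As a rough aid for choosing, the table can be turned into a small lookup. The `pick_model` helper below is hypothetical and uses the table's approximate RAM figures as hard thresholds, with no headroom for FFmpeg, the queue worker, or anything else on the host, so treat its answer as an upper bound.

```shell
# pick_model <available_ram_gb>
# Prints the largest model whose approximate RAM figure from the table fits.
# Expects a whole number of gigabytes.
pick_model() {
  ram_gb="$1"
  if   [ "$ram_gb" -ge 10 ]; then echo large
  elif [ "$ram_gb" -ge 5 ];  then echo medium
  elif [ "$ram_gb" -ge 2 ];  then echo small
  elif [ "$ram_gb" -ge 1 ];  then echo base
  else
    echo "less than 1 GB available; no model in the table fits" >&2
    return 1
  fi
}
```

On Linux, one way to feed it is `pick_model "$(awk '/MemAvailable/ {print int($2/1048576)}' /proc/meminfo)"`, which converts the kernel's available-memory figure from kB to whole GB.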

Auto-transcription on upload

Turn this on in Settings > Encoding under transcription settings. When enabled, a transcription job queues after encoding finishes.

Disable auto-transcription if you prefer manual review before generating subtitles.

Verify Whisper (Docker)

docker compose exec queue /opt/whisper/whisper-cli --help
docker compose exec queue ls -la /opt/whisper/models/

Manual install:

bin/whisper/build/bin/whisper-cli --help
ls -la bin/whisper/models/
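
The two checks can also be combined into one function that fails loudly when either piece is missing. This is a sketch: `check_whisper` is a hypothetical helper, and it assumes model files end in `.bin`, as whisper.cpp's downloaded ggml models usually do.

```shell
# check_whisper <cli_path> <models_dir>
# Returns 0 when the binary is executable and at least one .bin model exists.
check_whisper() {
  cli="$1"
  models="$2"
  status=0

  if [ ! -x "$cli" ]; then
    echo "missing or not executable: $cli" >&2
    status=1
  fi

  # Look for at least one regular file matching *.bin in the models directory.
  found=0
  for f in "$models"/*.bin; do
    if [ -f "$f" ]; then
      found=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no .bin model files in: $models" >&2
    status=1
  fi

  return "$status"
}
```

For a manual install this would be called as `check_whisper bin/whisper/build/bin/whisper-cli bin/whisper/models`; inside Docker, the same function would need to run in the queue container with the /opt/whisper paths.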

Output paths

Transcripts are stored alongside encoded output:

outputs/transcripts/{video-id}.wav         # Extracted audio
outputs/subtitles/{lang}/sub_{lang}.vtt    # Generated subtitle track

Multiple languages require separate transcription runs with the target language set.
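
To see which languages still need a run, the path pattern above can be checked per language. The helpers below are hypothetical and assume the `outputs/` directory is relative to the current working directory, which may not match where your install keeps a given video's files.

```shell
# subtitle_path <lang>: builds the VTT path from the layout shown above.
subtitle_path() {
  echo "outputs/subtitles/$1/sub_$1.vtt"
}

# plan_runs <lang>...: prints one status line per language.
plan_runs() {
  for lang in "$@"; do
    path=$(subtitle_path "$lang")
    # -s is true only when the file exists and is not empty.
    if [ -s "$path" ]; then
      echo "$lang: present ($path)"
    else
      echo "$lang: needs a transcription run ($path)"
    fi
  done
}
```

For example, `plan_runs en de fr` lists each language with either "present" or "needs a transcription run".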

Admin actions

Generate: creates the first transcript.
Regenerate: re-runs transcription with a different model or language.
Publish: exposes captions in the player.
Unpublish: hides captions without deleting files.

Published subtitles appear in the player settings menu under captions.

Player integration

The Shaka-based player loads VTT tracks from the video manifest. Viewers can toggle captions on or off. Customize default caption language in player settings.

Performance tips

Transcription is CPU-intensive. Run during off-peak hours for large libraries.
Scale queue workers if transcription backs up behind encoding.
Shorter clips transcribe faster; split long files if needed.
WAV extraction uses temporary disk; ensure adequate free space.
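
To estimate how much temporary disk the WAV step needs, uncompressed PCM size is simply duration × sample rate × channels × bytes per sample. The sketch below assumes 16 kHz mono 16-bit audio by default, the format whisper.cpp has traditionally expected. Whether NixStream extracts with exactly these settings is an assumption, and `wav_bytes` is a hypothetical helper.

```shell
# wav_bytes <seconds> [sample_rate] [channels] [bytes_per_sample]
# Prints the approximate size in bytes of an uncompressed PCM WAV file.
wav_bytes() {
  seconds="$1"
  rate="${2:-16000}"
  channels="${3:-1}"
  width="${4:-2}"
  # 44 bytes is the canonical PCM WAV header; some tools add a little metadata.
  echo $(( seconds * rate * channels * width + 44 ))
}
```

Under these assumptions a one-hour video yields `wav_bytes 3600` = 115200044 bytes, roughly 115 MB, so a worker processing several long videos at once needs a few hundred megabytes of free space for audio alone.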

Troubleshooting

If transcription fails, check the queue worker logs (Docker) and the application log:

docker compose logs queue
tail -f storage/logs/laravel.log

Common issues:

whisper-cli: not found: fix WHISPER_CLI_PATH.
Model load errors: download the model into WHISPER_MODELS_PATH.
Out of memory (OOM): usually means you should pick base or small.
Empty VTT: often means the source has no audible speech.
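
For the empty-VTT case, a quick check can distinguish a file with no cues from one that is missing or malformed. This is a sketch with hypothetical helper names. It relies only on the WebVTT format itself: the first line carries the `WEBVTT` signature and every cue has a timing line containing `-->`.

```shell
# vtt_cue_count <file>: prints the number of cue timing lines ("-->").
vtt_cue_count() {
  grep -c -- '-->' "$1"
}

# vtt_ok <file>: succeeds when the first line contains WEBVTT
# and the file has at least one cue.
vtt_ok() {
  head -n 1 "$1" | grep -q 'WEBVTT' || return 1
  [ "$(vtt_cue_count "$1")" -gt 0 ]
}
```

A file that passes the header check but has a cue count of 0 points to silent or speech-free audio rather than a broken Whisper setup.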

Ensure the Whisper binary and model files exist inside the queue container (Docker) or on the host (manual).

See also: Encoding & Transcoding (encode pipeline), Player (captions in the UI), Troubleshooting (more failures).
