Transcription

AI-powered subtitles with Whisper.cpp

NixStream generates subtitles with whisper.cpp (whisper-cli). FFmpeg pulls audio from the source video, then Whisper writes VTT for the player.

How it works

  1. Admin triggers transcription (or it starts automatically on upload, if configured)
  2. Job dispatched to queue worker
  3. FFmpeg extracts audio to WAV
  4. whisper-cli transcribes using a selected model
  5. VTT subtitle track attached to the video

The entire pipeline runs on your server. No audio is sent to external APIs.
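Steps 3 and 4 can be approximated by hand, which helps when debugging outside the queue. The sketch below only prints the two commands rather than running them. The function name, its argument order, the 16 kHz mono 16-bit WAV settings (the input format whisper.cpp reads), and the ggml-<model>.bin file naming are assumptions for illustration, not NixStream's actual job code.

```sh
#!/bin/sh
# Sketch of steps 3-4: print the FFmpeg and whisper-cli commands for one video.
# Assumptions (not taken from NixStream): the function name and arguments,
# 16 kHz mono WAV extraction, and model files named ggml-<model>.bin.

WHISPER_CLI_PATH="${WHISPER_CLI_PATH:-/opt/whisper/whisper-cli}"
WHISPER_MODELS_PATH="${WHISPER_MODELS_PATH:-/opt/whisper/models}"

# transcribe_cmds <source-video> <wav-path> <output-path-without-extension> <model> <lang>
transcribe_cmds() {
    src=$1 wav=$2 out=$3 model=$4 lang=$5
    # Step 3: drop the video stream and write 16-bit PCM, 16 kHz, mono.
    printf 'ffmpeg -y -i %s -vn -ar 16000 -ac 1 -c:a pcm_s16le %s\n' "$src" "$wav"
    # Step 4: transcribe; -ovtt writes WebVTT and -of sets the path without ".vtt".
    printf '%s -m %s/ggml-%s.bin -f %s -l %s -ovtt -of %s\n' \
        "$WHISPER_CLI_PATH" "$WHISPER_MODELS_PATH" "$model" "$wav" "$lang" "$out"
}
```

For example, transcribe_cmds input.mp4 /tmp/audio.wav /tmp/audio base en prints both commands for review; piping the output to sh runs them. Paths containing spaces would need quoting before doing that.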

Set the binary and model paths in .env. Docker defaults are shown; manual installs often use bin/whisper/build/bin/whisper-cli and bin/whisper/models instead:

WHISPER_CLI_PATH=/opt/whisper/whisper-cli
WHISPER_MODELS_PATH=/opt/whisper/models

Models are built into the Docker image. For manual installs, build whisper.cpp and download models to WHISPER_MODELS_PATH.

Available models

Model    Speed      Accuracy   RAM
tiny     Fastest    Lower      ~1 GB
base     Fast       Good       ~1 GB
small    Moderate   Better     ~2 GB
medium   Slower     High       ~5 GB
large    Slowest    Best       ~10 GB

Choose based on your hardware. Docker installs typically ship base or small by default.
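As a rough illustration of choosing by hardware, the helper below maps a whole number of gigabytes of available RAM to the largest model in the table that fits. The function name is hypothetical and the thresholds simply copy the approximate RAM column, so leave headroom for encoding and other services in practice.

```sh
# pick_model <available-ram-gb>: print the largest listed model that fits.
# Hypothetical helper; thresholds mirror the approximate RAM column above.
pick_model() {
    ram_gb=$1
    if   [ "$ram_gb" -ge 10 ]; then echo large
    elif [ "$ram_gb" -ge 5 ];  then echo medium
    elif [ "$ram_gb" -ge 2 ];  then echo small
    elif [ "$ram_gb" -ge 1 ];  then echo base
    else
        echo "not enough RAM for any listed model" >&2
        return 1
    fi
}
```

tiny and base are listed with the same RAM figure, so this sketch never selects tiny; pick tiny manually when speed matters more than accuracy.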

Auto-transcription on upload

Turn this on in Settings > Encoding under transcription settings. When enabled, a transcription job queues after encoding finishes.

Disable auto-transcription if you prefer manual review before generating subtitles.

Verify Whisper (Docker)

docker compose exec queue /opt/whisper/whisper-cli --help
docker compose exec queue ls -la /opt/whisper/models/

Manual install:

bin/whisper/build/bin/whisper-cli --help
ls -la bin/whisper/models/

Output paths

Transcripts are stored alongside encoded output:

outputs/transcripts/{video-id}.wav    # Extracted audio
outputs/subtitles/{lang}/sub_{lang}.vtt

Multiple languages require separate transcription runs with the target language set.
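To show how one run per language maps onto the layout above, this sketch prints a whisper-cli command for each language code. Both function names are hypothetical, and the bare whisper-cli name stands in for whatever WHISPER_CLI_PATH points to; NixStream's own job may build its commands differently.

```sh
# subtitle_path <lang>: print the VTT path for one language, per the layout above.
subtitle_path() {
    printf 'outputs/subtitles/%s/sub_%s.vtt\n' "$1" "$1"
}

# language_runs <wav> <model-file> <lang>...: print one whisper-cli command per language.
language_runs() {
    wav=$1 model=$2
    shift 2
    for lang in "$@"; do
        vtt=$(subtitle_path "$lang")
        # -of expects the output path without its extension; -ovtt appends .vtt.
        printf 'whisper-cli -m %s -f %s -l %s -ovtt -of %s\n' \
            "$model" "$wav" "$lang" "${vtt%.vtt}"
    done
}
```

Calling language_runs audio.wav models/ggml-base.bin en de prints two commands, one writing under outputs/subtitles/en and one under outputs/subtitles/de.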

Admin actions

  • Generate creates the first transcript.
  • Regenerate re-runs transcription with a different model or language.
  • Publish exposes captions in the player.
  • Unpublish hides captions without deleting files.

Published subtitles appear in the player settings menu under captions.

Player integration

The Shaka-based player loads VTT tracks from the video manifest. Viewers can toggle captions on or off. Set the default caption language in the player settings.

Performance tips

  • Transcription is CPU-intensive. Run during off-peak hours for large libraries.
  • Scale queue workers if transcription backs up behind encoding.
  • Shorter clips transcribe faster; split long files if needed.
  • WAV extraction uses temporary disk; ensure adequate free space.
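To gauge how much temporary disk the WAV step needs, the sketch below estimates the file size from the clip duration and compares it with the free space in a directory. It assumes 16 kHz, mono, 16-bit PCM audio, which works out to 32,000 bytes per second (about 115 MB per hour); if your extraction uses other settings, the numbers change. Both function names are hypothetical.

```sh
# wav_bytes <duration-seconds>: estimated size of a 16 kHz, mono, 16-bit PCM WAV.
# 16000 samples/s * 2 bytes/sample * 1 channel = 32000 bytes/s, plus a 44-byte header.
wav_bytes() {
    echo $(( $1 * 32000 + 44 ))
}

# has_space_for <duration-seconds> <directory>: succeed if the filesystem that
# holds the directory has at least the estimated size free.
has_space_for() {
    need=$(wav_bytes "$1")
    need_kb=$(( (need + 1023) / 1024 ))
    # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
    free_kb=$(df -Pk "$2" | awk 'NR == 2 { print $4 }')
    [ "$free_kb" -ge "$need_kb" ]
}
```

A call such as has_space_for 7200 /tmp || echo "low disk" checks whether a two-hour clip would fit before a job is queued.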

Troubleshooting

If transcription fails, check queue logs:

docker compose logs queue
tail -f storage/logs/laravel.log

Common issues:

  • "whisper-cli: not found": fix WHISPER_CLI_PATH.
  • Model load errors: download the model into WHISPER_MODELS_PATH.
  • Out-of-memory (OOM) errors: switch to a smaller model such as base or small.
  • Empty VTT: the source often has no audible speech.
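One quick way to spot the empty-VTT case is to look for cue timing lines, which WebVTT writes with a --> separator. The helper name below is hypothetical, and the check assumes an empty result still contains the WEBVTT header but no cues.

```sh
# vtt_has_cues <file>: succeed if the WebVTT file contains at least one cue timing line.
vtt_has_cues() {
    # -e is needed because the pattern itself starts with a dash.
    grep -q -e '-->' "$1"
}
```

For example, vtt_has_cues outputs/subtitles/en/sub_en.vtt || echo "no cues found" flags a file that needs a second look at the source audio.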

Ensure the Whisper binary and model files exist inside the queue container (Docker) or on the host (manual).
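That check can be scripted as a small preflight. The sketch below is a hypothetical helper, not part of NixStream; it assumes model files end in .bin, as whisper.cpp's ggml models conventionally do.

```sh
# check_whisper <cli-path> <models-dir>: report what is missing on stderr and
# succeed only if the binary is executable and at least one model file exists.
check_whisper() {
    cli=$1 models=$2 ok=0
    if [ ! -x "$cli" ]; then
        echo "missing or not executable: $cli" >&2
        ok=1
    fi
    # Assumption: model files end in .bin (for example ggml-base.bin).
    if ! ls "$models"/*.bin >/dev/null 2>&1; then
        echo "no .bin model files in: $models" >&2
        ok=1
    fi
    return "$ok"
}
```

On a manual install, run check_whisper "$WHISPER_CLI_PATH" "$WHISPER_MODELS_PATH" with the values from .env; for Docker, run the same checks inside the queue container.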

Related guides: Encoding & Transcoding (encode pipeline), Player (captions in the UI), Troubleshooting (more failure cases).
