Workflow

Understand the complete processing pipeline of VideoCaptioner, from video input to final output.

Processing Flow

  1. Video Input
  2. Speech Recognition
  3. Subtitle Segmentation
  4. Subtitle Optimization
  5. Translation
  6. Video Synthesis

Full-Process Mode (Simplest)

This is the recommended approach for most users. Every stage runs automatically:

  1. Create a task in the main interface (GUI) or use the process command (CLI)
  2. Drag and drop your video, or paste a YouTube/Bilibili URL
  3. Click "Start". The pipeline then runs in order: transcription → segmentation → optimization → translation → synthesis
  4. Output is saved to the work-dir/ folder

CLI:

  videocaptioner process video.mp4 --target-language ja
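
To caption several videos in one run, the command above can be wrapped in a short script. The Python sketch below is only an illustration: the helper names build_command and process_folder are hypothetical, it assumes the videocaptioner executable is on your PATH, and it uses only the subcommand and flag shown above. By default it does a dry run and returns the commands without executing them.

```python
import subprocess
from pathlib import Path


def build_command(video: Path, target_language: str) -> list[str]:
    """Build the argument list for one `videocaptioner process` call."""
    # Only the subcommand and the flag from the example above are used here.
    return ["videocaptioner", "process", str(video), "--target-language", target_language]


def process_folder(folder: Path, target_language: str, dry_run: bool = True) -> list[list[str]]:
    """Build one command per .mp4 file in `folder`; run them unless dry_run is set."""
    commands = [build_command(v, target_language) for v in sorted(folder.glob("*.mp4"))]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if the CLI exits with an error.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in process_folder(Path("."), "ja"):
        print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string means file names containing spaces need no extra quoting.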

Step-by-Step Mode

For more control, run each stage individually:

Step 1: Speech Transcription

  • Select your audio or video file
  • Configure the source language and the VAD (voice activity detection) method
  • Optionally enable audio separation for noisy environments
  • Run transcription

Step 2: Subtitle Optimization & Translation

  • Load the generated subtitle file
  • Enable smart segmentation (semantic or linguistic mode)
  • Enable subtitle error correction
  • Configure translation target language
  • Optionally provide manuscript hints for terminology accuracy
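
Before running this step, it can help to inspect the transcription output and spot obvious timing or recognition problems. The sketch below assumes the generated subtitle file is in SRT format, which this page does not state, and the names Cue and parse_srt are hypothetical. It is a minimal reader for the common SRT layout, not the parser VideoCaptioner itself uses.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues; blocks without a timing line are skipped."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                text = "\n".join(lines[i + 1:]).strip()
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues
```

For example, listing every cue longer than a few seconds is a quick way to find lines that segmentation is likely to split.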

Step 3: Video Synthesis

  • Choose a visual style preset (popular science, news, etc.)
  • Select the synthesis method: hard subtitles (burned into the video frames) or soft subtitles (a separate track)
  • Run synthesis
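
To make the difference between the two methods concrete, the sketch below builds the kind of ffmpeg commands commonly used for each. This is an assumption for illustration: this page does not say how VideoCaptioner invokes ffmpeg, and the function names are hypothetical. The ffmpeg options themselves (the subtitles filter, -c copy, and the mov_text subtitle codec for MP4) are standard.

```python
def hard_subtitle_cmd(video: str, subs: str, out: str) -> list[str]:
    """Burn subtitles into the picture. The video stream must be re-encoded."""
    # The subtitles filter draws the text onto each frame; audio is copied unchanged.
    # Paths with special characters (colons, quotes) need escaping for the filter.
    return ["ffmpeg", "-i", video, "-vf", f"subtitles={subs}", "-c:a", "copy", out]


def soft_subtitle_cmd(video: str, subs: str, out: str) -> list[str]:
    """Add subtitles as a separate track. Video and audio are copied, not re-encoded."""
    # -c copy keeps the existing streams; -c:s mov_text converts the subtitle
    # file to the text subtitle format that MP4 containers support.
    return ["ffmpeg", "-i", video, "-i", subs, "-c", "copy", "-c:s", "mov_text", out]


if __name__ == "__main__":
    print(" ".join(hard_subtitle_cmd("video.mp4", "video.srt", "video_hard.mp4")))
    print(" ".join(soft_subtitle_cmd("video.mp4", "video.srt", "video_soft.mp4")))
```

Hard subtitles display in every player but cannot be turned off. Soft subtitles can be toggled by the viewer but depend on player support.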

Tips for Best Results

Improving quality

  • Use FasterWhisper with the Large-v2 model for best accuracy
  • Enable VAD filtering to reduce hallucinations
  • Enable audio separation in noisy environments
  • Use smart segmentation for natural sentence breaks
  • Provide manuscript hints with terminology and proper nouns

Speeding up processing

  • Use online ASR (Bijian) to skip model downloads
  • Increase LLM concurrent threads
  • Use soft subtitle synthesis (faster than hard-coded, which typically has to re-encode the video)
  • Disable unnecessary optimization steps
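
As a rough illustration of why more concurrent LLM threads can shorten processing time, the sketch below sends several requests in parallel with a thread pool. Everything in it is hypothetical: fake_translate only simulates a network call with a short sleep, and the real request logic in VideoCaptioner is not shown on this page.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_translate(line: str) -> str:
    """Hypothetical stand-in for one LLM request; the sleep mimics network latency."""
    time.sleep(0.05)
    return line.upper()


def translate_all(lines: list[str], max_workers: int) -> list[str]:
    """Translate all lines using up to max_workers requests in flight at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, even if requests finish out of order.
        return list(pool.map(fake_translate, lines))


if __name__ == "__main__":
    lines = [f"subtitle line {i}" for i in range(16)]
    for workers in (1, 4, 16):
        started = time.perf_counter()
        translate_all(lines, workers)
        print(f"{workers:>2} workers: {time.perf_counter() - started:.2f}s")
```

Because each request spends most of its time waiting on the network, adding threads helps until the provider's rate limit is reached. Past that point, extra threads tend to produce errors or retries rather than more speed.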