Workflow

Understand the complete processing pipeline of VideoCaptioner, from video input to final output.

Processing Flow

1. Video Input → 2. Speech Recognition → 3. Subtitle Segmentation → 4. Subtitle Optimization → 5. Translation → 6. Video Synthesis
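
One way to picture the flow is as an ordered list of stages, each consuming the previous stage's output. The Python sketch below is only a mental model: the stage functions are placeholders, and the names PIPELINE, make_stage, and run_pipeline are hypothetical, not VideoCaptioner's internal API.

```python
from typing import Callable, FrozenSet, List, Tuple

Stage = Tuple[str, Callable[[str], str]]


def make_stage(name: str) -> Stage:
    # Placeholder stage: it only records that it ran on the incoming artifact.
    return name, lambda artifact: f"{artifact} > {name}"


# Stage order follows the processing flow above; video input is the starting artifact.
PIPELINE: List[Stage] = [
    make_stage("speech_recognition"),
    make_stage("segmentation"),
    make_stage("optimization"),
    make_stage("translation"),
    make_stage("synthesis"),
]


def run_pipeline(video: str, skip: FrozenSet[str] = frozenset()) -> str:
    """Run every stage in order, passing each result to the next stage."""
    artifact = video
    for name, run in PIPELINE:
        if name in skip:
            continue  # a disabled stage simply passes the artifact through
        artifact = run(artifact)
    return artifact


print(run_pipeline("video.mp4"))
print(run_pipeline("video.mp4", skip=frozenset({"optimization"})))
```

Skipping a stage in this model corresponds to switching off a step you do not need; which steps can actually be disabled depends on the application's settings.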

Full-Process Mode (Simplest)

The recommended approach for most users. Everything happens automatically:

Create a task in the main interface (GUI) or use the process command (CLI)
Drag and drop your video, or paste a YouTube/Bilibili URL
Click "Start" — the pipeline runs: transcription → segmentation → optimization → translation → synthesis
Output is saved to the work-dir/ folder

CLI

videocaptioner process video.mp4 --target-language ja
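
To caption several files in one go, the same command can be generated once per file. The Python sketch below is a dry run that only prints the commands. It uses nothing beyond the process subcommand and the --target-language flag shown above; the helper names build_process_command and plan_batch, and the file names, are illustrative assumptions.

```python
import shlex
from pathlib import Path
from typing import Iterable, List


def build_process_command(video: Path, target_language: str) -> List[str]:
    # Mirrors the CLI call above; no other flags are assumed.
    return ["videocaptioner", "process", str(video), "--target-language", target_language]


def plan_batch(videos: Iterable[str], target_language: str) -> List[List[str]]:
    """Build one full-process command per input video."""
    return [build_process_command(Path(v), target_language) for v in videos]


# Dry run: print the commands instead of executing them.
for cmd in plan_batch(["talk.mp4", "demo.mp4"], "ja"):
    print(shlex.join(cmd))
```

To actually run the batch, each command list could be passed to subprocess.run(cmd, check=True), assuming videocaptioner is on your PATH.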

Step-by-Step Mode

For more control, run each stage individually:

Step 1: Speech Transcription

Select your audio or video file
Configure the source language and VAD method
Optionally enable audio separation for noisy environments
Run transcription

Step 2: Subtitle Optimization & Translation

Load the generated subtitle file
Enable smart segmentation (semantic or linguistic mode)
Enable subtitle error correction
Configure translation target language
Optionally provide manuscript hints for terminology accuracy

Step 3: Video Synthesis

Choose a visual style preset (popular science, news, etc.)
Select method: hard-coded (burned in) or soft subtitles (separate track)
Run synthesis
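
The difference between the two methods is easiest to see in terms of plain ffmpeg. The Python sketch below builds the two command lines without running them. It assumes ffmpeg is used directly and that the output is an MP4 file; it is not a description of how VideoCaptioner invokes its synthesis step, and the function names are hypothetical.

```python
from typing import List


def hard_subtitle_command(video: str, subtitles: str, output: str) -> List[str]:
    # The subtitles filter draws text onto the frames, so the video stream
    # has to be re-encoded. Audio is copied unchanged.
    return ["ffmpeg", "-i", video, "-vf", f"subtitles={subtitles}", "-c:a", "copy", output]


def soft_subtitle_command(video: str, subtitles: str, output: str) -> List[str]:
    # Video and audio are copied as-is; the subtitle file is muxed in as a
    # separate mov_text track, the text subtitle codec used in MP4 containers.
    return [
        "ffmpeg", "-i", video, "-i", subtitles,
        "-c:v", "copy", "-c:a", "copy", "-c:s", "mov_text",
        output,
    ]


print(" ".join(hard_subtitle_command("video.mp4", "subs.srt", "hard.mp4")))
print(" ".join(soft_subtitle_command("video.mp4", "subs.srt", "soft.mp4")))
```

Hard-coded subtitles show up in every player but cannot be turned off. Soft subtitles can be toggled by the viewer, but they rely on the player supporting subtitle tracks.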

Tips for Best Results

Improving quality

Use FasterWhisper with the Large-v2 model for best accuracy
Enable VAD filtering to reduce hallucinations
Enable audio separation in noisy environments
Use smart segmentation for natural sentence breaks
Provide manuscript hints with terminology and proper nouns
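
For readers curious what the first two tips look like outside the GUI, here is a minimal Python sketch using the standalone faster-whisper library. It assumes faster-whisper is installed separately; the function names transcribe_to_srt and format_timestamp are illustrative, and the code does not reflect how VideoCaptioner wires these options internally.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def transcribe_to_srt(audio_path: str, language: str = "en") -> str:
    """Transcribe with Large-v2 and VAD filtering, returning SRT text."""
    # Imported lazily so format_timestamp works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel("large-v2")  # model weights are downloaded on first use
    # vad_filter=True removes non-speech audio before decoding.
    segments, _info = model.transcribe(audio_path, language=language, vad_filter=True)

    blocks = []
    for index, seg in enumerate(segments, start=1):
        start, end = format_timestamp(seg.start), format_timestamp(seg.end)
        blocks.append(f"{index}\n{start} --> {end}\n{seg.text.strip()}\n")
    return "\n".join(blocks)


print(format_timestamp(62.5))
```

Calling transcribe_to_srt("audio.wav") would return SRT-formatted text, where audio.wav is a placeholder path.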

Speeding up processing

Use online ASR (Bijian) to skip model downloads
Increase the number of concurrent LLM threads
Use soft subtitle synthesis (faster than hard-coded, since the video typically does not need to be re-encoded)
Disable unnecessary optimization steps
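
Why more LLM threads help can be shown with a small Python sketch: subtitle lines are split into batches, and several batches wait on the network at the same time. Everything here is a stand-in. fake_llm_translate simulates a slow API call, and the batch size and worker count are arbitrary example values, not VideoCaptioner defaults.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def fake_llm_translate(batch: List[str]) -> List[str]:
    # Hypothetical stand-in for a network request to an LLM.
    time.sleep(0.05)
    return [f"[ja] {line}" for line in batch]


def translate_concurrently(
    lines: List[str],
    batch_size: int,
    workers: int,
    translate: Callable[[List[str]], List[str]] = fake_llm_translate,
) -> List[str]:
    """Translate subtitle lines in batches, several batches at a time."""
    batches = [lines[i:i + batch_size] for i in range(0, len(lines), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so subtitle order is preserved.
        results = pool.map(translate, batches)
        return [line for batch in results for line in batch]


subtitles = [f"line {i}" for i in range(10)]
print(translate_concurrently(subtitles, batch_size=3, workers=4))
```

In practice the useful number of threads is usually capped by the rate limits of the LLM provider, so raising it beyond that point may produce errors rather than speed.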