Workflow
Understand the complete processing pipeline of VideoCaptioner, from video input to final output.
Processing Flow
1. Video Input → 2. Speech Recognition → 3. Subtitle Segmentation → 4. Subtitle Optimization → 5. Translation → 6. Video Synthesis
Full-Process Mode (Simplest)
The recommended approach for most users. Everything happens automatically:
- Create a task in the main interface (GUI) or use the process command (CLI)
- Drag and drop your video, or paste a YouTube/Bilibili URL
- Click "Start" to run the pipeline: transcription → segmentation → optimization → translation → synthesis
- Output is saved to the work-dir/ folder
CLI
videocaptioner process video.mp4 --target-language ja
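To caption several files with the same settings, the process command above can be wrapped in a small shell function. This is a minimal sketch: caption_all is a hypothetical helper name, and it uses only the process subcommand and the --target-language option shown in this guide.

```shell
# caption_all LANG FILE...
# Hypothetical helper: runs the full pipeline on each file in turn.
caption_all() {
  target_language="$1"
  shift
  for video in "$@"; do
    # Keep going if one file fails, but report it on stderr.
    videocaptioner process "$video" --target-language "$target_language" \
      || echo "failed: $video" >&2
  done
}

# Example: caption_all ja lecture1.mp4 lecture2.mp4
```

Running the files one after another keeps resource use predictable; a failure in one file does not stop the rest of the batch.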
Step-by-Step Mode
For more control, run each stage individually:
Step 1: Speech Transcription
- Select your audio or video file
- Configure the source language and VAD method
- Optionally enable audio separation for noisy environments
- Run transcription
Step 2: Subtitle Optimization & Translation
- Load the generated subtitle file
- Enable smart segmentation (semantic or linguistic mode)
- Enable subtitle error correction
- Configure translation target language
- Optionally provide manuscript hints for terminology accuracy
Step 3: Video Synthesis
- Choose a visual style preset (popular science, news, etc.)
- Select method: hard-coded (burned in) or soft subtitles (separate track)
- Run synthesis
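The difference between hard-coded and soft subtitles can be illustrated with plain ffmpeg. The sketch below is not VideoCaptioner's own invocation: burn_subtitles and mux_subtitles are hypothetical helper names, and it assumes an ffmpeg build with libass (needed by the subtitles filter) and an MP4 output (needed by the mov_text subtitle codec).

```shell
# burn_subtitles INPUT SUBS OUTPUT
# Hard-coded: draws the subtitles onto the frames, so the video is re-encoded.
# Subtitle paths with special characters need escaping for the filter syntax.
burn_subtitles() {
  ffmpeg -i "$1" -vf "subtitles=$2" -c:a copy "$3"
}

# mux_subtitles INPUT SUBS OUTPUT
# Soft: copies the audio and video streams and adds the subtitles
# as a separate track that the player can switch on or off.
mux_subtitles() {
  ffmpeg -i "$1" -i "$2" -c copy -c:s mov_text "$3"
}

# Example: mux_subtitles talk.mp4 talk.srt talk_subbed.mp4
```

Burning in re-encodes the video stream, while muxing copies the existing streams unchanged, which is generally why the soft method finishes faster.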
Tips for Best Results
Improving quality
- Use FasterWhisper with the Large-v2 model for best accuracy
- Enable VAD filtering to reduce hallucinations
- Enable audio separation in noisy environments
- Use smart segmentation for natural sentence breaks
- Provide manuscript hints with terminology and proper nouns
Speeding up processing
- Use online ASR (Bijian) to skip model downloads
- Increase LLM concurrent threads
- Use soft subtitle synthesis (faster than hard-coded)
- Disable unnecessary optimization steps