Getting Started
Get VideoCaptioner up and running on your machine in minutes. This guide covers installation, basic setup, and your first subtitle generation.
System Requirements
| Platform | Minimum | Recommended |
|---|---|---|
| Windows | Windows 10 (64-bit) | Windows 11 |
| macOS | macOS 10.15+ | macOS 12+ |
| Linux | Ubuntu 20.04+ / Debian 11+ | Ubuntu 22.04+ |
| Python | 3.10+ | 3.11+ |
| RAM | 4 GB | 8 GB+ (for local Whisper) |
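You can sanity-check the core requirements from a terminal before installing (version strings vary by platform; the check below only warns if FFmpeg is absent):

```shell
# Confirm your environment meets the minimum requirements
python3 --version                 # should report Python 3.10 or newer
command -v ffmpeg >/dev/null 2>&1 && ffmpeg -version | head -n 1 \
  || echo "ffmpeg not found -- install it before running VideoCaptioner"
```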
Installation
Option 1: pip install (Recommended)
The simplest way to get started. Install the CLI tool directly:
pip install videocaptioner
For the GUI desktop application:
pip install videocaptioner[gui]
Option 2: Windows Installer
Download the standalone installer (~60 MB) from the GitHub Releases page. All dependencies are bundled — just install and run.
Option 3: macOS / Linux Script
git clone https://github.com/WEIFENG2333/VideoCaptioner.git
cd VideoCaptioner
chmod +x run.sh
./run.sh
The script auto-detects your Python environment, creates a virtualenv, installs dependencies, and checks for FFmpeg and aria2.
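The checks the script performs look roughly like the following (a simplified sketch; the actual `run.sh` may differ in detail):

```shell
# Simplified sketch of the checks run.sh performs
# (assumption: the real script's logic may differ in detail)
missing=""
for tool in python3 ffmpeg aria2c; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
  echo "warning: missing tools:$missing"
fi

# Create (or reuse) a project-local virtualenv and activate it
[ -d venv ] || python3 -m venv venv
. venv/bin/activate

# Install Python dependencies inside the virtualenv
if [ -f requirements.txt ]; then
  pip install -r requirements.txt
fi
echo "environment ready"
```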
Option 4: Manual Setup
macOS (Homebrew):
brew install ffmpeg aria2 python@3.11
git clone https://github.com/WEIFENG2333/VideoCaptioner.git
cd VideoCaptioner
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python main.py
Ubuntu / Debian:
sudo apt update && sudo apt install ffmpeg aria2 python3-venv
git clone https://github.com/WEIFENG2333/VideoCaptioner.git
cd VideoCaptioner
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python main.py
CLI Commands
Once installed, the videocaptioner command is available globally:
| Command | Description |
|---|---|
| transcribe | Speech-to-subtitle. Supports faster-whisper, whisper-api, bijian (free), jianying (free) |
| subtitle | Optimize & translate subtitles via LLM, Bing (free), or Google (free) |
| synthesize | Burn subtitles into a video file |
| process | End-to-end: transcribe → optimize → translate → synthesize |
| download | Download videos from YouTube, Bilibili, etc. |
| config | Manage settings (show, set, get, path, init) |
Your First Subtitle
The fastest way to generate subtitles — no API key needed:
# Transcribe using free Bijian ASR
videocaptioner transcribe video.mp4 --asr bijian
# Translate to English using free Bing
videocaptioner subtitle output.srt --translator bing --target-language en
# Or do everything in one command
videocaptioner process video.mp4 --target-language en
Tip: Free vs API-powered
You can use VideoCaptioner completely free with Bijian ASR + Bing translation. For higher-quality results, configure an LLM API (typically under $0.002 per 14-minute video). See the LLM Configuration guide.
GUI Desktop Application
Launch the graphical interface by running videocaptioner without any arguments:
videocaptioner
The GUI provides a visual workflow with drag-and-drop, subtitle preview, and one-click processing.
Basic Configuration
View your current configuration:
videocaptioner config show
Set a value:
videocaptioner config set llm.api_key sk-your-key-here
videocaptioner config set llm.api_base https://api.openai.com/v1
videocaptioner config set llm.model gpt-4o-mini
Configuration priority
CLI arguments > Environment variables (VIDEOCAPTIONER_*) > Config file > Defaults
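For example, the same setting can be supplied at several levels and the highest-priority source wins (the key and variable names below follow the patterns shown above but are illustrative):

```shell
# Lowest priority shown here: the config file (set once, used by default)
videocaptioner config set target_language zh

# Higher priority: an environment variable overrides the config file
export VIDEOCAPTIONER_TARGET_LANGUAGE=en

# Highest priority: a CLI argument overrides both of the above
videocaptioner process video.mp4 --target-language fr
```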
What's Next
- Quick Example — Process a full TED talk step by step
- LLM Configuration — Set up AI-powered subtitle optimization
- ASR Configuration — Choose the best speech recognition engine
- Workflow — Understand the full processing pipeline