Getting Started
Get VideoCaptioner up and running on your machine in minutes. This guide covers installation, basic setup, and your first subtitle generation.
System Requirements
| Platform | Minimum | Recommended |
|---|---|---|
| Windows | Windows 10 (64-bit) | Windows 11 |
| macOS | macOS 10.15+ | macOS 12+ |
| Linux | Ubuntu 20.04+ / Debian 11+ | Ubuntu 22.04+ |
| Python | 3.10+ | 3.11+ |
| RAM | 4 GB | 8 GB+ (for local Whisper) |
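You can sanity-check the core requirements from a terminal before installing (version strings vary by platform; the check below only warns if FFmpeg is absent):

```shell
# Confirm your environment meets the minimum requirements
python3 --version                 # should report Python 3.10 or newer
command -v ffmpeg >/dev/null 2>&1 && ffmpeg -version | head -n 1 \
  || echo "ffmpeg not found -- install it before running VideoCaptioner"
```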
Installation
Option 1: pip install (Recommended)
The simplest way to get started. Install the CLI tool directly:
pip install videocaptioner
For the GUI desktop application:
pip install videocaptioner[gui]
Option 2: Windows Installer
Download the standalone installer (~60 MB) from the GitHub Releases page. All dependencies are bundled — just install and run.
Option 3: macOS / Linux Script
git clone https://github.com/WEIFENG2333/VideoCaptioner.git
cd VideoCaptioner
chmod +x run.sh
./run.sh
The script auto-detects your Python environment, creates a virtualenv, installs dependencies, and checks for FFmpeg and aria2.
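The checks the script performs look roughly like the following (a simplified sketch; the actual `run.sh` may differ in detail):

```shell
# Simplified sketch of the checks run.sh performs
# (assumption: the real script's logic may differ in detail)
missing=""
for tool in python3 ffmpeg aria2c; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
  echo "warning: missing tools:$missing"
fi

# Create (or reuse) a project-local virtualenv and activate it
[ -d venv ] || python3 -m venv venv
. venv/bin/activate

# Install Python dependencies inside the virtualenv
if [ -f requirements.txt ]; then
  pip install -r requirements.txt
fi
echo "environment ready"
```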
Option 4: Manual Setup
macOS (Homebrew):
brew install ffmpeg aria2 python@3.11
git clone https://github.com/WEIFENG2333/VideoCaptioner.git
cd VideoCaptioner
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python main.py
Ubuntu / Debian:
sudo apt update && sudo apt install ffmpeg aria2 python3-venv
git clone https://github.com/WEIFENG2333/VideoCaptioner.git
cd VideoCaptioner
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python main.py
CLI Commands
Once installed, the videocaptioner command is available globally:
| Command | Description |
|---|---|
| transcribe | Speech-to-subtitle. Supports faster-whisper, whisper-api, bijian (free), jianying (free) |
| subtitle | Optimize & translate subtitles via LLM, Bing (free), or Google (free) |
| synthesize | Burn subtitles into a video file |
| process | End-to-end: transcribe → optimize → translate → synthesize |
| download | Download videos from YouTube, Bilibili, etc. |
| config | Manage settings (show, set, get, path, init) |
Your First Subtitle
The fastest way to generate subtitles — no API key needed:
# Transcribe using free Bijian ASR
videocaptioner transcribe video.mp4 --asr bijian
# Translate to English using free Bing
videocaptioner subtitle output.srt --translator bing --target-language en
# Or do everything in one command
videocaptioner process video.mp4 --target-language en
Tip: Free vs API-powered
You can use VideoCaptioner completely free with Bijian ASR + Bing translation. For higher-quality results, configure an LLM API (typically under $0.002 per 14-minute video). See the LLM Configuration guide.
GUI Desktop Application
Launch the graphical interface by running videocaptioner without any arguments:
videocaptioner
The GUI provides a visual workflow with drag-and-drop, subtitle preview, and one-click processing.
Basic Configuration
View your current configuration:
videocaptioner config show
Set a value:
videocaptioner config set llm.api_key sk-your-key-here
videocaptioner config set llm.api_base https://api.openai.com/v1
videocaptioner config set llm.model gpt-4o-mini
Configuration priority
CLI arguments > Environment variables (VIDEOCAPTIONER_*) > Config file > Defaults
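For example, the same setting can be supplied at several levels and the highest-priority source wins (the key and variable names below follow the patterns shown above but are illustrative):

```shell
# Lowest priority shown here: the config file (set once, used by default)
videocaptioner config set target_language zh

# Higher priority: an environment variable overrides the config file
export VIDEOCAPTIONER_TARGET_LANGUAGE=en

# Highest priority: a CLI argument overrides both of the above
videocaptioner process video.mp4 --target-language fr
```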
What's Next
- Quick Example — Process a full TED talk step by step
- LLM Configuration — Set up AI-powered subtitle optimization
- ASR Configuration — Choose the best speech recognition engine
- Workflow — Understand the full processing pipeline