LLM Configuration
LLM (Large Language Model) is one of VideoCaptioner's core features, powering subtitle segmentation, optimization, and translation.
Why Configure LLM?
- Semantic segmentation — Natural, meaningful subtitle breaks instead of mechanical line splits
- Error correction — Automated fixing of ASR mistakes, terminology standardization
- Context-aware translation — Higher quality translation that understands meaning, not just words
Optional but recommended
VideoCaptioner works without LLM using free ASR + Bing translation. But adding an LLM dramatically improves quality for less than $0.002 per 14-minute video.
Supported Providers
| Provider | Strengths | Best For |
|---|---|---|
| OpenAI | Top quality, stable API | Premium quality needs |
| DeepSeek | Cost-effective, excellent Chinese | Chinese content |
| SiliconCloud | Domestic availability, many models | China-based users |
| Gemini | Google, generous free tier | Budget-conscious users |
| Ollama | Fully local, free, private | Privacy-sensitive use |
| LM Studio | Local with GUI | Local deployment |
| VideoCaptioner Relay | Multi-model, high concurrency, optimized | General use (recommended) |
Setup: Project Relay Station (Recommended)
The VideoCaptioner relay service provides access to OpenAI, Claude, and Gemini models with high concurrency and project-specific optimization. New accounts include $0.4 of free credit for testing.
```bash
videocaptioner config set llm.api_base https://api.videocaptioner.cn/v1
videocaptioner config set llm.api_key your-relay-key
videocaptioner config set llm.model gpt-4o-mini
```
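Before running a full job, it can help to sanity-check the endpoint with a hand-built request. A minimal Python sketch, assuming only the standard OpenAI-compatible `/chat/completions` route; `build_chat_request` is an illustrative helper, not part of VideoCaptioner:

```python
import json
import urllib.request

def build_chat_request(api_base: str, api_key: str, model: str, prompt: str):
    """Build an OpenAI-compatible /chat/completions request (not yet sent)."""
    url = api_base.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(url, data=body, headers=headers)

# Sending it (requires a valid key and network access):
# req = build_chat_request("https://api.videocaptioner.cn/v1",
#                          "your-relay-key", "gpt-4o-mini", "ping")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same check works for any provider in this guide, since they all expose the OpenAI request format.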
Model Quality Tiers
| Tier | Models | Cost Ratio |
|---|---|---|
| High quality | gemini-2.0-flash-exp, claude-sonnet-4.5 | 3x |
| Good quality | gpt-4o-2024-08-06, claude-haiku | 1.2x |
| Standard | gpt-4o-mini, gemini-2.0-flash | 0.3x |
Setup: OpenAI
```bash
videocaptioner config set llm.api_base https://api.openai.com/v1
videocaptioner config set llm.api_key sk-your-openai-key
videocaptioner config set llm.model gpt-4o-mini
```
Recommended threads: 10-20. Models: gpt-4o-mini (economical) or gpt-4o (premium quality).
Setup: DeepSeek
```bash
videocaptioner config set llm.api_base https://api.deepseek.com/v1
videocaptioner config set llm.api_key your-deepseek-key
videocaptioner config set llm.model deepseek-chat
```
Recommended threads: 5-10. Excellent for Chinese content processing.
Setup: SiliconCloud
```bash
videocaptioner config set llm.api_base https://api.siliconflow.cn/v1
videocaptioner config set llm.api_key your-siliconcloud-key
videocaptioner config set llm.model Qwen/Qwen2.5-72B-Instruct
```
Recommended threads: 3-5. Note: unverified accounts have daily request limits on some models.
Setup: Ollama (Local)
```bash
# Install Ollama, then pull a model
ollama pull qwen2.5:14b

# Configure VideoCaptioner
videocaptioner config set llm.api_base http://localhost:11434/v1
videocaptioner config set llm.api_key ollama
videocaptioner config set llm.model qwen2.5:14b
```
Recommended threads: 2-4 (CPU), 4-8 (GPU). Models with 14B+ parameters recommended for good quality.
Note on local models
Local models typically produce lower quality results than cloud APIs. Use 14B+ parameter models for acceptable subtitle quality.
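Before pointing VideoCaptioner at Ollama, you can confirm the model was actually pulled by querying Ollama's local `/api/tags` endpoint, which lists installed models. A sketch assuming Ollama's default port; the helper names are illustrative:

```python
import json
import urllib.request

def has_model(tags: dict, name: str) -> bool:
    """Check an /api/tags response body for a pulled model by name."""
    return any(m.get("name") == name for m in tags.get("models", []))

def model_available(name: str, host: str = "http://localhost:11434") -> bool:
    """Query the local Ollama server and check whether `name` is pulled."""
    with urllib.request.urlopen(host + "/api/tags") as resp:
        return has_model(json.load(resp), name)

# if not model_available("qwen2.5:14b"):
#     raise SystemExit("Run `ollama pull qwen2.5:14b` first")
```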
Advanced Settings
Temperature
| Range | Behavior | Recommended For |
|---|---|---|
| 0.1 - 0.3 | Stable, conservative output | Subtitle optimization |
| 0.5 - 0.7 | Natural, flexible output | Translation |
| 0.8 - 1.0 | Creative, unpredictable | Not recommended |
Default: 0.3 — works well for most scenarios.
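If you script against the API directly, the table above reduces to a small per-task lookup. A sketch; the task names and helper are illustrative, not VideoCaptioner settings:

```python
# Temperature presets per task, following the ranges in the table above.
TEMPERATURE = {
    "optimize": 0.3,   # subtitle optimization: stable, conservative
    "translate": 0.7,  # translation: natural, flexible
}

def temperature_for(task: str) -> float:
    """Return a task-appropriate temperature, falling back to the 0.3 default."""
    return TEMPERATURE.get(task, 0.3)
```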
Thread Count by Provider
| Provider | Recommended Threads |
|---|---|
| OpenAI | 10-20 |
| Relay station | 20-50 |
| DeepSeek | 5-10 |
| SiliconCloud | 3-5 |
| Ollama (local) | 2-8 |
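The thread counts above cap how many subtitle batches are in flight at once. In Python terms, this is a bounded thread pool; a sketch where `worker` stands in for the actual LLM call and the per-provider numbers follow the table (midpoints are my choice):

```python
from concurrent.futures import ThreadPoolExecutor

# Thread caps drawn from the table above (midpoints of each range).
THREADS = {"openai": 15, "relay": 30, "deepseek": 8,
           "siliconcloud": 4, "ollama": 4}

def process_batches(batches, worker, provider: str):
    """Run `worker` over subtitle batches with a provider-appropriate thread cap."""
    with ThreadPoolExecutor(max_workers=THREADS.get(provider, 5)) as pool:
        return list(pool.map(worker, batches))
```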
Cost Estimation
For a 14-minute video (~2,000 characters, ~50 subtitle segments):
- gpt-4o-mini: ~5,000 tokens → < $0.002
- gpt-4o: ~5,000 tokens → ~$0.01
- With reflection translation enabled: roughly 2-3x higher cost
The LLM processes only the subtitle text, not timeline data, so token consumption stays low.
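The arithmetic behind these figures is simple per-million-token pricing. A sketch; the 60/40 input/output split is an assumption, and the rates shown are gpt-4o-mini's published prices at the time of writing, which may change:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_rate: float, out_rate: float) -> float:
    """Cost in USD, given per-million-token input and output rates."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# ~5,000 tokens for a 14-minute video, split roughly 3,000 in / 2,000 out,
# at gpt-4o-mini's $0.15 / $0.60 per million tokens (assumed rates):
cost = estimate_cost(3000, 2000, in_rate=0.15, out_rate=0.60)
```

At these rates the total stays under the $0.002 figure quoted above.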
Troubleshooting
Connection test fails
- Verify API key format (OpenAI keys start with `sk-`)
- Confirm the Base URL includes the `/v1` suffix, with no trailing slash
- Check network connectivity and firewall settings
- Review logs in Settings > Logs
Frequent 429 errors
Your concurrency is too high. Reduce thread count and retry.
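If you drive the API from your own scripts, the usual companion to a lower thread count is retrying 429s with exponential backoff. A sketch; `RateLimitError` is a stand-in for whatever exception your HTTP client raises on a 429:

```python
import time

class RateLimitError(Exception):
    """Illustrative 429 error; real clients raise their own exception type."""

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0,
                 sleep=time.sleep):
    """Retry `call` on rate-limit errors, doubling the delay each attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```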
Poor output quality
- Upgrade to a better model
- Lower temperature (0.3 → 0.1)
- Add manuscript hints with terminology
- Enable reflection translation
Recommended Configurations
| Profile | Provider | Model | Threads | Temperature |
|---|---|---|---|---|
| Beginner | Relay station | gpt-4o-mini | 20 | 0.3 |
| Quality | Relay station | claude-sonnet-4.5 | 15 | 0.3 |
| Budget | SiliconCloud | Qwen2.5-72B-Instruct | 5 | 0.3 |
| Privacy | Ollama | qwen2.5:14b | 4 | 0.5 |