LLM Configuration
LLM (Large Language Model) is one of VideoCaptioner's core features, powering subtitle segmentation, optimization, and translation.
Why Configure LLM?
- Semantic segmentation — Natural, meaningful subtitle breaks instead of mechanical line splits
- Error correction — Automated fixing of ASR mistakes, terminology standardization
- Context-aware translation — Higher quality translation that understands meaning, not just words
Optional but recommended
VideoCaptioner works without LLM using free ASR + Bing translation. But adding an LLM dramatically improves quality for less than $0.002 per 14-minute video.
Supported Providers
| Provider | Strengths | Best For |
|---|---|---|
| OpenAI | Top quality, stable API | Premium quality needs |
| DeepSeek | Cost-effective, excellent Chinese | Chinese content |
| SiliconCloud | Domestic availability, many models | China-based users |
| Gemini | Google, generous free tier | Budget-conscious users |
| Ollama | Fully local, free, private | Privacy-sensitive use |
| LM Studio | Local with GUI | Local deployment |
| VideoCaptioner Relay | Multi-model, high concurrency, optimized | General use (recommended) |
Setup: Project Relay Station (Recommended)
The VideoCaptioner relay service provides access to OpenAI, Claude, and Gemini models with high concurrency and project-specific optimization. New accounts include $0.4 of free credit for testing.
```bash
videocaptioner config set llm.api_base https://api.videocaptioner.cn/v1
videocaptioner config set llm.api_key your-relay-key
videocaptioner config set llm.model gpt-4o-mini
```
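Before running a full job, it can help to sanity-check the endpoint with a hand-built request. A minimal Python sketch, assuming only the standard OpenAI-compatible `/chat/completions` route; `build_chat_request` is an illustrative helper, not part of VideoCaptioner:

```python
import json
import urllib.request

def build_chat_request(api_base: str, api_key: str, model: str, prompt: str):
    """Build an OpenAI-compatible /chat/completions request (not yet sent)."""
    url = api_base.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(url, data=body, headers=headers)

# Sending it (requires a valid key and network access):
# req = build_chat_request("https://api.videocaptioner.cn/v1",
#                          "your-relay-key", "gpt-4o-mini", "ping")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same check works for any provider in this guide, since they all expose the OpenAI request format.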
Model Quality Tiers
| Tier | Models | Cost Ratio |
|---|---|---|
| High quality | gemini-2.0-flash-exp, claude-sonnet-4.5 | 3x |
| Good quality | gpt-4o-2024-08-06, claude-haiku | 1.2x |
| Standard | gpt-4o-mini, gemini-2.0-flash | 0.3x |
Setup: OpenAI
```bash
videocaptioner config set llm.api_base https://api.openai.com/v1
videocaptioner config set llm.api_key sk-your-openai-key
videocaptioner config set llm.model gpt-4o-mini
```
Recommended threads: 10-20. Models: gpt-4o-mini (economical) or gpt-4o (premium quality).
Setup: DeepSeek
```bash
videocaptioner config set llm.api_base https://api.deepseek.com/v1
videocaptioner config set llm.api_key your-deepseek-key
videocaptioner config set llm.model deepseek-chat
```
Recommended threads: 5-10. Excellent for Chinese content processing.
Setup: SiliconCloud
```bash
videocaptioner config set llm.api_base https://api.siliconflow.cn/v1
videocaptioner config set llm.api_key your-siliconcloud-key
videocaptioner config set llm.model Qwen/Qwen2.5-72B-Instruct
```
Recommended threads: 3-5. Note: unverified accounts have daily request limits on some models.
Setup: Ollama (Local)
```bash
# Install Ollama, then pull a model
ollama pull qwen2.5:14b

# Configure VideoCaptioner
videocaptioner config set llm.api_base http://localhost:11434/v1
videocaptioner config set llm.api_key ollama
videocaptioner config set llm.model qwen2.5:14b
```
Recommended threads: 2-4 (CPU), 4-8 (GPU). Models with 14B+ parameters recommended for good quality.
Note on local models
Local models typically produce lower quality results than cloud APIs. Use 14B+ parameter models for acceptable subtitle quality.
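Before pointing VideoCaptioner at Ollama, you can confirm the model was actually pulled by querying Ollama's local `/api/tags` endpoint, which lists installed models. A sketch assuming Ollama's default port; the helper names are illustrative:

```python
import json
import urllib.request

def has_model(tags: dict, name: str) -> bool:
    """Check an /api/tags response body for a pulled model by name."""
    return any(m.get("name") == name for m in tags.get("models", []))

def model_available(name: str, host: str = "http://localhost:11434") -> bool:
    """Query the local Ollama server and check whether `name` is pulled."""
    with urllib.request.urlopen(host + "/api/tags") as resp:
        return has_model(json.load(resp), name)

# if not model_available("qwen2.5:14b"):
#     raise SystemExit("Run `ollama pull qwen2.5:14b` first")
```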
Advanced Settings
Temperature
| Range | Behavior | Recommended For |
|---|---|---|
| 0.1 - 0.3 | Stable, conservative output | Subtitle optimization |
| 0.5 - 0.7 | Natural, flexible output | Translation |
| 0.8 - 1.0 | Creative, unpredictable | Not recommended |
Default: 0.3 — works well for most scenarios.
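If you script against the API directly, the table above reduces to a small per-task lookup. A sketch; the task names and helper are illustrative, not VideoCaptioner settings:

```python
# Temperature presets per task, following the ranges in the table above.
TEMPERATURE = {
    "optimize": 0.3,   # subtitle optimization: stable, conservative
    "translate": 0.7,  # translation: natural, flexible
}

def temperature_for(task: str) -> float:
    """Return a task-appropriate temperature, falling back to the 0.3 default."""
    return TEMPERATURE.get(task, 0.3)
```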
Thread Count by Provider
| Provider | Recommended Threads |
|---|---|
| OpenAI | 10-20 |
| Relay station | 20-50 |
| DeepSeek | 5-10 |
| SiliconCloud | 3-5 |
| Ollama (local) | 2-8 |
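The thread counts above cap how many subtitle batches are in flight at once. In Python terms, this is a bounded thread pool; a sketch where `worker` stands in for the actual LLM call and the per-provider numbers follow the table (midpoints are my choice):

```python
from concurrent.futures import ThreadPoolExecutor

# Thread caps drawn from the table above (midpoints of each range).
THREADS = {"openai": 15, "relay": 30, "deepseek": 8,
           "siliconcloud": 4, "ollama": 4}

def process_batches(batches, worker, provider: str):
    """Run `worker` over subtitle batches with a provider-appropriate thread cap."""
    with ThreadPoolExecutor(max_workers=THREADS.get(provider, 5)) as pool:
        return list(pool.map(worker, batches))
```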
Cost Estimation
For a 14-minute video (~2,000 characters, ~50 subtitle segments):
- gpt-4o-mini: ~5,000 tokens → < $0.002
- gpt-4o: ~5,000 tokens → ~$0.01
- With reflection translation enabled: roughly 2-3x higher cost
The LLM processes only the subtitle text, not timeline data, so token consumption stays low.
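The arithmetic behind these figures is simple per-million-token pricing. A sketch; the 60/40 input/output split is an assumption, and the rates shown are gpt-4o-mini's published prices at the time of writing, which may change:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_rate: float, out_rate: float) -> float:
    """Cost in USD, given per-million-token input and output rates."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# ~5,000 tokens for a 14-minute video, split roughly 3,000 in / 2,000 out,
# at gpt-4o-mini's $0.15 / $0.60 per million tokens (assumed rates):
cost = estimate_cost(3000, 2000, in_rate=0.15, out_rate=0.60)
```

At these rates the total stays under the $0.002 figure quoted above.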
Troubleshooting
Connection test fails
- Verify API key format (OpenAI keys start with `sk-`)
- Confirm the Base URL includes the `/v1` suffix, with no trailing slash
- Check network connectivity and firewall settings
- Review logs in Settings > Logs
Frequent 429 errors
Your concurrency is too high. Reduce thread count and retry.
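If you drive the API from your own scripts, the usual companion to a lower thread count is retrying 429s with exponential backoff. A sketch; `RateLimitError` is a stand-in for whatever exception your HTTP client raises on a 429:

```python
import time

class RateLimitError(Exception):
    """Illustrative 429 error; real clients raise their own exception type."""

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0,
                 sleep=time.sleep):
    """Retry `call` on rate-limit errors, doubling the delay each attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```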
Poor output quality
- Upgrade to a better model
- Lower temperature (0.3 → 0.1)
- Add manuscript hints with terminology
- Enable reflection translation
Recommended Configurations
| Profile | Provider | Model | Threads | Temperature |
|---|---|---|---|---|
| Beginner | Relay station | gpt-4o-mini | 20 | 0.3 |
| Quality | Relay station | claude-sonnet-4.5 | 15 | 0.3 |
| Budget | SiliconCloud | Qwen2.5-72B-Instruct | 5 | 0.3 |
| Privacy | Ollama | qwen2.5:14b | 4 | 0.5 |