LLM Configuration

LLM (Large Language Model) is one of VideoCaptioner's core features, powering subtitle segmentation, optimization, and translation.

Why Configure LLM?

  • Semantic segmentation — Natural, meaningful subtitle breaks instead of mechanical line splits
  • Error correction — Automated fixing of ASR mistakes, terminology standardization
  • Context-aware translation — Higher quality translation that understands meaning, not just words

Optional but recommended

VideoCaptioner works without an LLM, using free ASR + Bing translation. But adding an LLM dramatically improves quality, at less than $0.002 per 14-minute video.

Supported Providers

| Provider | Strengths | Best For |
| --- | --- | --- |
| OpenAI | Top quality, stable API | Premium quality needs |
| DeepSeek | Cost-effective, excellent Chinese | Chinese content |
| SiliconCloud | Domestic availability, many models | China-based users |
| Gemini | Google, generous free tier | Budget-conscious users |
| Ollama | Fully local, free, private | Privacy-sensitive use |
| LM Studio | Local with GUI | Local deployment |
| VideoCaptioner Relay | Multi-model, high concurrency, optimized | General use (recommended) |

Setup: Project Relay Station (Recommended)

The VideoCaptioner relay service provides access to OpenAI, Claude, and Gemini models with high concurrency and project-specific optimization. It comes with $0.40 in free credit for testing.

Terminal
videocaptioner config set llm.api_base https://api.videocaptioner.cn/v1
videocaptioner config set llm.api_key your-relay-key
videocaptioner config set llm.model gpt-4o-mini
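Assuming the relay exposes an OpenAI-compatible API (it accepts the same `/v1` base URL format), you can sanity-check your key by listing available models before running a full job; replace `your-relay-key` with your actual key:

```shell
# List available models to verify the key and base URL
# (assumes an OpenAI-compatible /v1/models endpoint)
curl -s https://api.videocaptioner.cn/v1/models \
  -H "Authorization: Bearer your-relay-key"
```

A JSON list of model ids indicates the key and endpoint are working.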

Model Quality Tiers

| Tier | Models | Cost Ratio |
| --- | --- | --- |
| High quality | gemini-2.0-flash-exp, claude-sonnet-4.5 | 3x |
| Good quality | gpt-4o-2024-08-06, claude-haiku | 1.2x |
| Standard | gpt-4o-mini, gemini-2.0-flash | 0.3x |

Setup: OpenAI

Terminal
videocaptioner config set llm.api_base https://api.openai.com/v1
videocaptioner config set llm.api_key sk-your-openai-key
videocaptioner config set llm.model gpt-4o-mini

Recommended threads: 10-20. Models: gpt-4o-mini (economical) or gpt-4o (premium quality).
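To confirm the key works outside VideoCaptioner, a minimal request to OpenAI's chat completions endpoint is a quick check; substitute your real key for `sk-your-openai-key`:

```shell
# Minimal chat completion to verify the OpenAI key
curl -s https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer sk-your-openai-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say OK"}],
    "max_tokens": 5
  }'
```

A JSON response containing a `choices` array means the key and model are usable.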

Setup: DeepSeek

Terminal
videocaptioner config set llm.api_base https://api.deepseek.com/v1
videocaptioner config set llm.api_key your-deepseek-key
videocaptioner config set llm.model deepseek-chat

Recommended threads: 5-10. Excellent for Chinese content processing.

Setup: SiliconCloud

Terminal
videocaptioner config set llm.api_base https://api.siliconflow.cn/v1
videocaptioner config set llm.api_key your-siliconcloud-key
videocaptioner config set llm.model Qwen/Qwen2.5-72B-Instruct

Recommended threads: 3-5. Note: unverified accounts have daily request limits on some models.

Setup: Ollama (Local)

Terminal
# Install and pull a model
ollama pull qwen2.5:14b

# Configure VideoCaptioner
videocaptioner config set llm.api_base http://localhost:11434/v1
videocaptioner config set llm.api_key ollama
videocaptioner config set llm.model qwen2.5:14b

Recommended threads: 2-4 (CPU), 4-8 (GPU). Models with 14B+ parameters recommended for good quality.
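Before pointing VideoCaptioner at Ollama, it helps to confirm the local server is running and the model was pulled; Ollama serves an OpenAI-compatible API on port 11434 by default:

```shell
# Confirm the model is installed locally
ollama list                               # should include qwen2.5:14b

# Confirm the OpenAI-compatible endpoint responds
curl -s http://localhost:11434/v1/models
```

If the curl call fails, start the server with `ollama serve` and retry.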

Note on local models

Local models typically produce lower quality results than cloud APIs. Use 14B+ parameter models for acceptable subtitle quality.

Advanced Settings

Temperature

| Range | Behavior | Recommended For |
| --- | --- | --- |
| 0.1 - 0.3 | Stable, conservative output | Subtitle optimization |
| 0.5 - 0.7 | Natural, flexible output | Translation |
| 0.8 - 1.0 | Creative, unpredictable | Not recommended |

Default: 0.3 — works well for most scenarios.

Thread Count by Provider

| Provider | Recommended Threads |
| --- | --- |
| OpenAI | 10-20 |
| Relay station | 20-50 |
| DeepSeek | 5-10 |
| SiliconCloud | 3-5 |
| Ollama (local) | 2-8 |
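If temperature and thread count are exposed through the same config interface as the connection settings, they would be set the same way; note that `llm.temperature` and `llm.threads` are assumed key names following the `llm.*` pattern used above, so verify them against your VideoCaptioner version:

```shell
# Assumed key names following the llm.* pattern — verify before use
videocaptioner config set llm.temperature 0.3
videocaptioner config set llm.threads 10
```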

Cost Estimation

For a 14-minute video (~2,000 characters, ~50 subtitle segments):

  • gpt-4o-mini: ~5,000 tokens → < $0.002
  • gpt-4o: ~5,000 tokens → ~$0.01
  • With reflection translation: costs increase proportionally (2-3x)

The LLM processes only the subtitle text, not timeline data, so token consumption stays low.
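As a back-of-envelope check, assuming roughly $0.15 per 1M input tokens for gpt-4o-mini (actual rates vary and output tokens cost more), the arithmetic behind the estimate above looks like this:

```shell
# ~5,000 tokens at an assumed $0.15 per 1M tokens
awk 'BEGIN { printf "$%.6f\n", 5000 * 0.15 / 1e6 }'
# prints $0.000750, well under the $0.002 estimate
```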

Troubleshooting

Connection test fails

  • Verify API key format (OpenAI keys start with sk-)
  • Confirm Base URL includes /v1 suffix, no trailing slash
  • Check network connectivity and firewall settings
  • Review logs in Settings > Logs
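To isolate where the failure is, you can hit the endpoint directly and look only at the HTTP status code (assuming an OpenAI-compatible API; the OpenAI URL and key below are placeholders, substitute your own):

```shell
# Print only the HTTP status code from the /models endpoint
curl -s -o /dev/null -w "%{http_code}\n" \
  "https://api.openai.com/v1/models" \
  -H "Authorization: Bearer sk-your-openai-key"
# 200 = OK, 401 = bad key, 404 usually = wrong Base URL
```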

Frequent 429 errors

Your concurrency is too high. Reduce thread count and retry.

Poor output quality

  • Upgrade to a better model
  • Lower temperature (0.3 → 0.1)
  • Add manuscript hints with terminology
  • Enable reflective translation

Recommended Profiles

| Profile | Provider | Model | Threads | Temperature |
| --- | --- | --- | --- | --- |
| Beginner | Relay station | gpt-4o-mini | 20 | 0.3 |
| Quality | Relay station | claude-sonnet-4.5 | 15 | 0.3 |
| Budget | SiliconCloud | Qwen2.5-72B-Instruct | 5 | 0.3 |
| Privacy | Ollama | qwen2.5:14b | 4 | 0.5 |
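For example, the Beginner profile maps onto the config commands shown earlier; as above, `llm.threads` and `llm.temperature` are assumed key names, so check them against your version:

```shell
# Beginner profile: relay station + gpt-4o-mini
videocaptioner config set llm.api_base https://api.videocaptioner.cn/v1
videocaptioner config set llm.api_key your-relay-key
videocaptioner config set llm.model gpt-4o-mini
# Assumed key names — verify before use
videocaptioner config set llm.threads 20
videocaptioner config set llm.temperature 0.3
```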