ByteDance: UI-TARS 7B
BYTEDANCE Developer Architecture Profile
Intelligence (ELO)1050Chatbot Arena Verified
Max Context128,000Tokens
API Cost / 1M$0.30Blended Prompt + Completion
Model Capabilities
- Vision
UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement learning-based reasoning, enabling robust action planning and execution across virtual interfaces.
This model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSworld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints.
Granular Pricing Matrix
Input Tokens (Prompt)$0.10 / 1M
Output Tokens (Completion)$0.20 / 1M
Pricing data via OpenRouter. Sync: 3/16/2026
Evaluate Competitors
VS Engine MatchupByteDance: UI-TARS 7B vs Hunter AlphaVS Engine MatchupByteDance: UI-TARS 7B vs Healer AlphaVS Engine MatchupByteDance: UI-TARS 7B vs NVIDIA: Nemotron 3 Super (free)VS Engine MatchupByteDance: UI-TARS 7B vs Qwen: Qwen3.5-9BVS Engine MatchupByteDance: UI-TARS 7B vs LiquidAI: LFM2-24B-A2BVS Engine MatchupByteDance: UI-TARS 7B vs Free Models Router