Inception: Mercury 2
INCEPTION Developer Architecture Profile
- Intelligence (ELO): 1120 (Chatbot Arena verified)
- Max Context: 128,000 tokens
- API Cost / 1M: $1.00 (blended prompt + completion)
Model Capabilities
Mercury 2 is an extremely fast reasoning LLM and the first diffusion-based reasoning LLM (dLLM).
Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving >1,000 tokens/sec on standard GPUs. Mercury 2 is more than 5x faster than leading speed-optimized LLMs like Claude 4.5 Haiku and GPT 5 Mini, at a fraction of the cost.
Mercury 2 supports tunable reasoning levels, a 128K context window, native tool use, and schema-aligned JSON output. It is built for coding workflows where latency compounds, for real-time voice and search, and for agent loops, and it is OpenAI API compatible. Read more in the [blog post](https://www.inceptionlabs.ai/blog/introducing-mercury-2).
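Because Mercury 2 is OpenAI API compatible, calling it amounts to sending a standard chat-completions payload. A minimal sketch of constructing such a request follows; the model slug `"mercury-2"` is an assumption (check the provider's model list for the real identifier), and the `response_format` field shows the standard way to ask for the JSON output the profile mentions.

```python
# Sketch of an OpenAI-style chat.completions payload for Mercury 2.
# The model slug "mercury-2" is an assumption, not a confirmed value;
# verify the endpoint and model name in Inception's documentation.
import json

def build_chat_request(prompt: str, model: str = "mercury-2") -> dict:
    """Build an OpenAI-compatible chat.completions request body."""
    return {
        "model": model,  # assumed slug for illustration only
        "messages": [{"role": "user", "content": prompt}],
        # Request structured JSON output via the standard response_format field.
        "response_format": {"type": "json_object"},
        "max_tokens": 512,
    }

payload = build_chat_request('Return {"ok": true} as JSON.')
print(json.dumps(payload, indent=2))
```

Because the wire format matches OpenAI's, any client library that lets you override the base URL should work unchanged.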
Granular Pricing Matrix
- Input tokens (prompt): $0.25 / 1M
- Output tokens (completion): $0.75 / 1M
Pricing data via OpenRouter; last synced 3/16/2026.
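The listed rates make per-request cost a two-term sum. A small sketch, using only the prices above (the $1.00 "blended" figure depends on whatever prompt/completion mix the aggregator assumes, so computing the exact split is more useful):

```python
# Per-request cost from the listed per-million-token prices:
# $0.25 / 1M input tokens and $0.75 / 1M output tokens.
INPUT_PER_M = 0.25   # USD per 1M prompt tokens
OUTPUT_PER_M = 0.75  # USD per 1M completion tokens

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Exact cost of one request at the listed rates."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# e.g. a 4,000-token prompt with a 1,000-token completion:
print(round(request_cost_usd(4_000, 1_000), 6))  # → 0.00175
```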
Evaluate Competitors
- Inception: Mercury 2 vs Z.ai: GLM 5 Turbo
- Inception: Mercury 2 vs Qwen: Qwen3.5-27B
- Inception: Mercury 2 vs Qwen: Qwen3.5-122B-A10B
- Inception: Mercury 2 vs AionLabs: Aion-2.0
- Inception: Mercury 2 vs Qwen: Qwen3.5 Plus 2026-02-15
- Inception: Mercury 2 vs Qwen: Qwen3.5 397B A17B