| Rank | Model | ||||
|---|---|---|---|---|---|
| 1 | Arrow 1.1 Official API | 40.93 | 39.00 | 52.20 | 35.20 |
| 2 | Gemini 3.1 Pro Official API. reasoning_effort: medium | 33.10 | 54.80 | 42.20 | 21.20 |
| 3 | Gemini 3.5 Flash Official API. reasoning_effort: medium | 28.95 | 47.40 | 30.00 | 22.60 |
| 4 | GPT-5.5 Cloudflare Proxy API. reasoning_effort: medium | 22.99 | 50.60 | 25.40 | 13.00 |
| 5 | Gemini 3 Flash Official API. reasoning_effort: minimal | 22.49 | 41.60 | 29.80 | 12.40 |
| 6 | Qwen3.6-Max-Preview Official API. Thinking mode enabled. | 18.53 | 28.60 | 17.80 | 15.80 |
| 7 | DeepSeek v4 Pro Official API. Thinking mode enabled. reasoning_effort: high | 16.37 | 26.20 | 19.40 | 11.60 |
| 8 | GLM-5.1 Official API | 16.30 | 33.40 | 18.00 | 10.00 |
| 9 | MiMo-V2.5-Pro Official API | 15.63 | 27.80 | 17.80 | 10.60 |
| 10 | Qwen3.7-Max Official API. Thinking mode enabled. | 15.40 | 24.20 | 16.20 | 12.20 |
| 11 | Claude Sonnet 4.6 Cloudflare Proxy API. Effort: medium | 14.69 | 29.40 | 13.80 | 10.60 |
| 12 | Claude Opus 4.8 Cloudflare Proxy API. Thinking: adaptive. Effort: high | 13.73 | 26.40 | 12.60 | 10.40 |
| 13 | Doubao-Seed-2.0-pro Official API | 13.57 | 25.40 | 13.00 | 10.20 |
| 14 | MiMo-V2.5 Official API | 13.23 | 22.80 | 9.40 | 12.40 |
| 15 | Qwen3.6-Plus Official API. Thinking mode enabled. | 12.43 | 18.80 | 15.00 | 9.00 |
| 16 | DeepSeek v4 Flash Official API. Thinking mode disabled. | 11.96 | 16.60 | 15.00 | 8.80 |
| 17 | Qwen3.7-Plus Official API. Thinking mode enabled. | 11.31 | 19.20 | 12.80 | 8.00 |
| 18 | Claude Opus 4.7 Cloudflare Proxy API. Thinking: adaptive. Effort: medium | 10.59 | 19.20 | 10.40 | 8.00 |
| 19 | Composer 2 Generated by Cursor Subagents | 8.61 | 13.60 | 11.20 | 5.60 |
| 20 | Grok 4.3 Official API | 7.58 | 13.40 | 6.80 | 6.20 |
| 21 | Gemini 3.1 Flash-Lite Official API. reasoning_effort: minimal | 7.47 | 22.40 | 6.40 | 3.40 |
| 22 | Hy3 preview OpenRouter API | 6.13 | 15.00 | 8.20 | 2.20 |
| 23 | Composer 2.5 Generated by Cursor Subagents | 5.85 | 15.20 | 8.20 | 1.60 |
| 24 | Kimi K2.6 Official API. Thinking mode enabled. | 5.56 | 15.60 | 2.40 | 4.20 |
| 25 | Step 3.7 Flash OpenRouter API | 5.06 | 10.40 | 5.40 | 3.20 |
| 26 | Step 3.5 Flash OpenRouter API | 3.17 | 9.00 | 3.80 | 1.00 |