Pokémon SVG Bench

Pokémon SVG Bench evaluates a model's knowledge of Pokémon and its ability to generate SVGs.

Chart: models plotted by SVG anchors (x-axis) against Visual Score (y-axis).

V1.1

Rank	Model
1	Gemini 3.6 Flash Official API. thinking_level: high	50.72	62.40	48.00	48.60
2	Gemini 3.1 Pro Official API. reasoning_effort: medium	41.44	59.60	49.20	31.40
3	Gemini 3.5 Flash Official API. reasoning_effort: medium	40.57	57.40	40.40	35.40
4	Arrow 1.1 Official API	40.45	37.00	55.60	33.00
5	GPT-5.6 Sol Cloudflare Proxy API. reasoning_effort: medium	39.94	63.60	43.40	30.60
6	Claude Fable 5 Cloudflare Proxy API. Thinking: adaptive. Effort: high	37.88	53.20	48.00	27.40
7	GPT-5.5 Cloudflare Proxy API. reasoning_effort: medium	34.54	54.00	38.20	26.40
8	Kimi K3 Official API. reasoning_effort: max	32.83	53.40	37.80	23.60
9	Claude Opus 5 Cloudflare Proxy API. Thinking: adaptive. Effort: high	32.49	54.20	38.00	22.60
10	Gemini 3.5 Flash-Lite Official API. thinking_level: high	32.06	43.80	32.40	28.20
11	Gemini 3 Flash Official API. reasoning_effort: minimal	27.33	46.80	35.60	16.60
12	Qwen3.7-Max Official API. Thinking mode enabled.	26.99	34.60	30.20	22.80
13	Claude Opus 4.8 Cloudflare Proxy API. Thinking: adaptive. Effort: high	26.86	42.80	30.20	20.00
14	GPT-5.6 Terra Cloudflare Proxy API. reasoning_effort: medium	25.43	35.60	29.80	19.80
15	Doubao-Seed-2.1-pro Official API	23.75	32.80	25.40	20.00
16	Grok 4.5 Official API	22.64	32.60	25.00	18.20
17	GLM-5.2 Official API	22.55	37.80	23.60	17.20
18	Qwen3.6-Max-Preview Official API. Thinking mode enabled.	22.05	24.40	25.80	19.20
19	Claude Opus 4.7 Cloudflare Proxy API. Thinking: adaptive. Effort: medium	21.67	31.20	25.40	16.60
20	Claude Sonnet 4.6 Cloudflare Proxy API. Effort: medium	21.23	30.20	22.00	18.00
21	GLM-5.1 Official API	20.79	33.00	23.60	15.40
22	Qwen3.7-Plus Official API. Thinking mode enabled.	19.32	22.40	22.80	16.40
23	Hy3 OpenRouter API	18.75	27.00	22.60	14.00
24	MiMo-V2.5-Pro Official API	18.66	27.60	23.40	13.20
25	MiMo-V2.5 Official API	17.90	22.80	21.40	14.40
26	DeepSeek v4 Pro Official API. Thinking mode enabled. reasoning_effort: high	17.76	29.20	18.80	13.60
27	Qwen3.6-Plus Official API. Thinking mode enabled.	17.54	15.40	23.60	14.80
28	Doubao-Seed-2.0-pro Official API	17.39	27.40	24.60	10.20
29	Inkling OpenRouter API	15.63	20.60	18.60	12.40
30	Composer 2.5 Generated by Cursor Subagents	15.61	26.00	23.00	8.20
31	Composer 2 Generated by Cursor Subagents	14.13	21.00	21.20	8.00
32	Grok 4.3 Official API	13.82	20.00	15.40	11.00
33	DeepSeek v4 Flash Official API. Thinking mode disabled.	13.25	19.80	16.80	9.20
34	Step 3.7 Flash OpenRouter API	10.03	19.40	12.00	6.00
35	Hy3 preview OpenRouter API	10.01	19.20	15.60	4.00
36	Gemini 3.1 Flash-Lite Official API. reasoning_effort: minimal	9.05	28.00	11.40	1.80
37	Kimi K2.6 Official API. Thinking mode enabled.	7.55	21.00	6.40	4.00
38	Step 3.5 Flash OpenRouter API	3.93	10.20	4.60	1.60
39	Laguna S 2.1 OpenRouter API	0.47	2.80	0.00	0.00