LLM Vulnerability Leaderboard

Rank	Model	Vendor	Vulnerability (ASR)
1	Claude Opus 4.5Est.	Anthropic	3.4%
2	GPT-5.2 ProEst.	OpenAI	3.8%
3	Claude Sonnet 4.5Est.	Anthropic	4.2%
4	Claude 3.5 Sonnet v2	Anthropic	4.4%
5	GPT-5.2Est.	OpenAI	4.8%
6	Claude Haiku 4.5Est.	Anthropic	5.2%
7	Gemini 3 ProEst.	Google	6.0%
8	GPT-5.1 CodexEst.	OpenAI	8.5%
9	o3-miniEst.	OpenAI	9.0%
10	Gemini 3 FlashEst.	Google	10.0%
11	Grok-4.1 ThinkingEst.	xAI	10.0%
12	Grok-4.1Est.	xAI	11.5%
13	Claude 4.0 Sonnet	Anthropic	13.6%
14	Grok-4.1 Fast ReasoningEst.	xAI	14.0%
15	GPT OSS 120BEst.	OpenAI	15.0%
16	Claude 3.5 Sonnet v1	Anthropic	16.0%
17	Gemini 2.5 Pro	Google	16.1%
18	Grok-4 HeavyEst.	xAI	18.0%
19	Nova PremierEst.	Amazon	20.0%
20	GPT OSS 20BEst.	OpenAI	20.0%
21	Claude 3.7 Sonnet	Anthropic	20.3%
22	Gemini 2.0 Flash ThinkingEst.	Google	21.5%
23	GPT-4o	OpenAI	22.8%
24	Nova ProEst.	Amazon	24.0%
25	Llama 4 BehemothEst.	Meta	25.0%
26	Command R 7B	Cohere	25.8%
27	Command AEst.	Cohere	26.0%
28	Qwen 2.5 72B	Alibaba	26.3%
29	GPT-4o Mini	OpenAI	26.7%
30	Llama 4 Maverick	Meta	27.7%
31	Qwen3-MaxEst.	Alibaba	28.0%
32	Llama 4 Scout	Meta	30.0%
33	Qwen3-235B-A22B-ThinkingEst.	Alibaba	30.0%
34	Phi-4Est.	Microsoft	30.0%
35	Gemini 2.0 Flash Lite	Google	31.3%
36	Mistral 8x22B	Mistral	31.5%
37	Llama 3-7 DS	Meta	32.2%
38	Command R	Cohere	32.9%
39	Grok-4 Fast Non-ReasoningEst.	xAI	33.0%
40	Mistral Large 3 675B Instruct 2512Est.	Mistral	33.0%
41	Mistral Large 3 675B Instruct 2512 EagleEst.	Mistral	33.0%
42	Gemini 2.0 Flash	Google	33.6%
43	Mistral Large 3 675B Instruct 2512 NVFP4Est.	Mistral	34.0%
44	Qwen3 VL 235B A22B ThinkingEst.	Alibaba	34.0%
45	GPT-4.1	OpenAI	34.9%
46	Llama 3.1 405B InstructEst.	Meta	36.0%
47	Qwen 7-72BB	Alibaba	36.5%
48	Llama 3H 405B	Meta	36.6%
49	Qwen3-Next-80B-A3B-InstructEst.	Alibaba	37.0%
50	Kimi K2-Thinking-0905Est.	Moonshot AI	38.0%
51	Qwen3 VL 32B ThinkingEst.	Alibaba	38.0%
52	Mixtral 8x7B	Mistral	38.0%
53	GLM-4.6Est.	Zhipu AI	39.0%
54	Command 2X	Cohere	39.4%
55	Llama 3.1 Nemotron Ultra 253B v1Est.	NVIDIA	40.0%
56	MiniMax M2Est.	MiniMax	40.0%
57	Ministral 3 14B Reasoning 2512Est.	Mistral	40.0%
58	Qwen3 VL 235B A22B InstructEst.	Alibaba	40.0%
59	Qwen QwQ 32B	Alibaba	40.0%
60	GLM-4.5Est.	Zhipu AI	41.0%
61	DeepSeek-R1-0528Est.	DeepSeek	41.0%
62	DeepSeek-V3.2 ThinkingEst.	DeepSeek	42.0%
63	Mistral Large 2Est.	Mistral	42.0%
64	Ministral 3 8B Reasoning 2512Est.	Mistral	42.0%
65	Deepseek R1	Deepseek	42.6%
66	GLM-4.5VEst.	Zhipu AI	43.0%
67	MiniMax M1 80KEst.	MiniMax	43.0%
68	Command R Plus	Cohere	43.3%
69	MiniMax M1 40KEst.	MiniMax	43.5%
70	GLM-4.5-AirEst.	Zhipu AI	44.0%
71	Ministral 3 14B Instruct 2512Est.	Mistral	44.0%
72	Qwen3 VL 8B ThinkingEst.	Alibaba	44.0%
73	DeepSeek-V3.2-SpecialeEst.	DeepSeek	45.0%
74	Mistral Large 3 675B BaseEst.	Mistral	45.0%
75	Ministral 3 8B Instruct 2512Est.	Mistral	45.0%
76	Grok-2Est.	xAI	46.0%
77	DeepSeek-V3.2-ExpEst.	DeepSeek	46.0%
78	Ministral 3 3B Reasoning 2512Est.	Mistral	46.0%
79	DeepSeek-V3.2 Non-thinkingEst.	DeepSeek	47.0%
80	Deepseek V3	Deepseek	47.2%
81	DeepSeek R1 Distill Llama 8BEst.	DeepSeek	48.0%
82	Qwen3 VL 8B InstructEst.	Alibaba	48.0%
83	Gemma 3n E4B InstructedEst.	Google	49.0%
84	Mistral NeMo InstructEst.	Mistral	50.0%
85	Ministral 3 3B Instruct 2512Est.	Mistral	50.0%
86	Gemma 3n E4BEst.	Google	52.0%
87	DeepSeek VL2Est.	DeepSeek	52.0%
88	Qwen2 7B InstructEst.	Alibaba	52.0%
89	Gemma 3n E2B Instructed LiteRTEst.	Google	55.0%
90	Ministral 3 8B Base 2512Est.	Mistral	55.0%
91	Ministral 3 14B Base 2512Est.	Mistral	58.0%
92	Gemma 3 1BEst.	Google	60.0%
93	Ministral 3 3B Base 2512Est.	Mistral	62.0%

LLM Vulnerability Benchmark

We track 90+ frontier and open-weight models — one of the most complete public maps of LLM jailbreak resistance as of July 2026 — using three distinct attack methods to measure their resilience under adversarial pressure.

Measured vs. Estimated scores

Rows without a badge are measured by Guardion's own benchmark run. Rows marked "Est." are estimated — calibrated from public model and system cards, red-teaming reports (HarmBench, JailbreakBench, StrongREJECT, AILuminate), and nearest-sibling models — while Guardion's own measurement is pending. Estimates are anchored to the measured models so the two are directly comparable, but should be read as indicative rather than definitive.

Attack Methods

1. Zero-Shot Attacks

Direct harmful requests without any manipulation or obfuscation. These represent straightforward adversarial prompts that test a model's baseline ability to refuse harmful requests.

2. Tree of Attacks with Pruning (TAP)

An automated red-teaming method that generates diverse jailbreak prompts by branching into multiple variations and pruning ineffective paths. TAP systematically explores the attack space to find successful adversarial strategies.

3. Crescendo Attacks

A multi-turn attack that incrementally escalates from harmless to harmful requests using conversational context. Attackers can backtrack and rephrase if the model resists, testing whether models maintain safety boundaries across extended conversations.

Evaluation Framework

We evaluate models using the HarmBench framework (Mazeika et al., 2024; Zou et al., 2023), which provide standardized benchmarks for automated red-teaming of large language models. The framework covers harmful behaviors across multiple risk domains: chemical and biological safety, misinformation and disinformation, cybercrime and hacking, illegal activities, and copyright violations.

Scoring Methodology

Performance is measured using Attack Success Rate (ASR)—the percentage of adversarial prompts that successfully elicit harmful responses. A lower ASR (higher robustness score) indicates stronger model resistance to adversarial prompts.