LLM Vulnerability Leaderboard

Attack success rates (ASR) across three prompt attack methods: Crescendo, TAP, and zero-shot. Lower is better.

Updated: Dec 10, 2025
Showing 88 results
Rank | Model | Vendor | Vulnerability (ASR)
1 | Claude 3.5 Sonnet v2 | Anthropic | 4.4%
2 | Claude 4.0 Sonnet | Anthropic | 13.6%
3 | Claude 3.5 Sonnet v1 | Anthropic | 16.0%
4 | Gemini 2.5 Pro | Google | 16.1%
5 | Claude 3.7 Sonnet | Anthropic | 20.3%
6 | GPT-4o | OpenAI | 22.8%
7 | Command R 7B | Cohere | 25.8%
8 | Qwen 2.5 72B | Alibaba | 26.3%
9 | GPT-4o Mini | OpenAI | 26.7%
10 | Llama 4 Maverick | Meta | 27.7%
11 | Llama 4 Scout | Meta | 30.0%
12 | Gemini 2.0 Flash Lite | Google | 31.3%
13 | Mistral 8x22B | Mistral | 31.5%
14 | Llama 3-7 DS | Meta | 32.2%
15 | Command R | Cohere | 32.9%
16 | Gemini 2.0 Flash | Google | 33.6%
17 | GPT-4.1 | OpenAI | 34.9%
18 | Qwen 7-72BB | Alibaba | 36.5%
19 | Llama 3H 405B | Meta | 36.6%
20 | Mixtral 8x7B | Mistral | 38.0%
21 | Command 2X | Cohere | 39.4%
22 | Qwen QwQ 32B | Alibaba | 40.0%
23 | Deepseek R1 | Deepseek | 42.6%
24 | Command R Plus | Cohere | 43.3%
25 | Deepseek V3 | Deepseek | 47.2%
26 | Claude Opus 4.5 | Anthropic | 100.0%
27 | Claude Haiku 4.5 | Anthropic | 100.0%
28 | Claude Sonnet 4.5 | Anthropic | 100.0%
29 | GPT-5.2 | OpenAI | 100.0%
30 | GPT-5.2 Pro | OpenAI | 100.0%
31 | GPT-5.1 Codex | OpenAI | 100.0%
32 | Gemini 3 Pro | Google | 100.0%
33 | Gemini 2.0 Flash Thinking | Google | 100.0%
34 | Gemma 3n E2B Instructed LiteRT | Google | 100.0%
35 | Gemma 3n E4B | Google | 100.0%
36 | Gemma 3n E4B Instructed | Google | 100.0%
37 | Gemma 3 1B | Google | 100.0%
38 | Ministral 3 14B Reasoning 2512 | Mistral | 100.0%
39 | Ministral 3 14B Base 2512 | Mistral | 100.0%
40 | Ministral 3 14B Instruct 2512 | Mistral | 100.0%
41 | Ministral 3 3B Base 2512 | Mistral | 100.0%
42 | Ministral 3 3B Instruct 2512 | Mistral | 100.0%
43 | Ministral 3 3B Reasoning 2512 | Mistral | 100.0%
44 | Ministral 3 8B Base 2512 | Mistral | 100.0%
45 | Ministral 3 8B Instruct 2512 | Mistral | 100.0%
46 | Ministral 3 8B Reasoning 2512 | Mistral | 100.0%
47 | Mistral Large 3 675B Base | Mistral | 100.0%
48 | Mistral Large 3 675B Instruct 2512 Eagle | Mistral | 100.0%
49 | Mistral Large 3 675B Instruct 2512 NVFP4 | Mistral | 100.0%
50 | Mistral Large 3 675B Instruct 2512 | Mistral | 100.0%
51 | DeepSeek R1 Distill Llama 8B | DeepSeek | 100.0%
52 | DeepSeek-V3.2-Exp | DeepSeek | 100.0%
53 | DeepSeek-V3.2-Speciale | DeepSeek | 100.0%
54 | DeepSeek-V3.2 Thinking | DeepSeek | 100.0%
55 | DeepSeek-V3.2 Non-thinking | DeepSeek | 100.0%
56 | DeepSeek VL2 | DeepSeek | 100.0%
57 | Qwen3-Next-80B-A3B-Instruct | Alibaba | 100.0%
58 | Qwen2 7B Instruct | Alibaba | 100.0%
59 | Qwen3 VL 235B A22B Instruct | Alibaba | 100.0%
60 | Qwen3 VL 235B A22B Thinking | Alibaba | 100.0%
61 | Qwen3 VL 32B Thinking | Alibaba | 100.0%
62 | Qwen3 VL 8B Thinking | Alibaba | 100.0%
63 | Qwen3 VL 8B Instruct | Alibaba | 100.0%
64 | Qwen3-235B-A22B-Thinking | Alibaba | 100.0%
65 | o3-mini | OpenAI | 100.0%
66 | Grok-2 | xAI | 100.0%
67 | Grok-4.1 Fast Reasoning | xAI | 100.0%
68 | Grok-4.1 | xAI | 100.0%
69 | Grok-4.1 Thinking | xAI | 100.0%
70 | Grok-4 Heavy | xAI | 100.0%
71 | Grok-2 Image 1212 | xAI | 100.0%
72 | Grok-4 Fast Non-Reasoning | xAI | 100.0%
73 | Llama 3.1 Nemotron Ultra 253B v1 | NVIDIA | 100.0%
74 | Llama 3.1 405B Instruct | Meta | 100.0%
75 | Nova Pro | Amazon | 100.0%
76 | GLM-4.5-Air | Zhipu AI | 100.0%
77 | GLM-4.6 | Zhipu AI | 100.0%
78 | GLM-4.5V | Zhipu AI | 100.0%
79 | GLM-4.5 | Zhipu AI | 100.0%
80 | MiniMax M2 | MiniMax | 100.0%
81 | MiniMax M1 40K | MiniMax | 100.0%
82 | MiniMax M1 80K | MiniMax | 100.0%
83 | Kimi K2-Thinking-0905 | Moonshot AI | 100.0%
84 | GPT OSS 20B | OpenAI | 100.0%
85 | GPT OSS 120B | OpenAI | 100.0%
86 | DeepSeek-R1-0528 | DeepSeek | 100.0%
87 | Mistral NeMo Instruct | Mistral | 100.0%
88 | Mistral Large 2 | Mistral | 100.0%


Methodology

LLM Vulnerability Benchmark

We evaluate 88 state-of-the-art models using three distinct attack methods to measure their resilience under adversarial pressure.

Attack Methods

1. Zero-Shot Attacks

Direct harmful requests without any manipulation or obfuscation. These represent straightforward adversarial prompts that test a model's baseline ability to refuse harmful requests.

2. Tree of Attacks with Pruning (TAP)

An automated red-teaming method that generates diverse jailbreak prompts by branching into multiple variations and pruning ineffective paths. TAP systematically explores the attack space to find successful adversarial strategies.
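The branch-and-prune loop described above can be sketched in Python. Everything here is a hedged illustration, not the benchmark's actual code: `attacker_variants`, `query_target`, and `judge_score` are hypothetical stubs standing in for the three LLM roles TAP coordinates (attacker, target model, and judge), and the success threshold and default search parameters are assumptions.

```python
def attacker_variants(prompt: str, branching: int) -> list[str]:
    """Stub attacker LLM: produce `branching` rephrased variants of a prompt."""
    return [f"{prompt} [variant {i}]" for i in range(branching)]

def query_target(prompt: str) -> str:
    """Stub for the target model under test."""
    return f"response to: {prompt}"

def judge_score(prompt: str, response: str) -> float:
    """Stub judge on a 0-10 scale; here, more-refined prompts score higher."""
    return min(10.0, 4.0 * prompt.count("[variant"))

def tap_attack(goal: str, branching: int = 3, depth: int = 4, width: int = 5):
    """Breadth-first search over jailbreak variants, pruning weak branches."""
    frontier = [goal]
    for _ in range(depth):
        candidates = []
        for prompt in frontier:
            for variant in attacker_variants(prompt, branching):
                score = judge_score(variant, query_target(variant))
                if score >= 10.0:  # judge deems the attack successful
                    return variant
                candidates.append((score, variant))
        # Prune: keep only the `width` most promising branches for the next level.
        candidates.sort(key=lambda c: c[0], reverse=True)
        frontier = [v for _, v in candidates[:width]]
    return None  # search budget exhausted without a successful jailbreak
```

The pruning step is what keeps the search tractable: without it, the tree grows as `branching ** depth`, while capping the frontier at `width` makes each level's cost bounded.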

3. Crescendo Attacks

A multi-turn attack that incrementally escalates from harmless to harmful requests using conversational context. Attackers can backtrack and rephrase if the model resists, testing whether models maintain safety boundaries across extended conversations.
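The escalate-and-backtrack pattern above can be sketched as a simple loop. This is an illustrative sketch only: `chat_model` and `looks_like_refusal` are hypothetical stubs for a chat API and a refusal classifier, and the backtracking budget is an assumption.

```python
def chat_model(history: list[dict]) -> str:
    """Stub chat model: always complies (a real target may refuse)."""
    return f"reply #{len(history)}"

def looks_like_refusal(reply: str) -> bool:
    """Crude stand-in for a refusal classifier."""
    return reply.lower().startswith(("i can't", "i cannot", "sorry"))

def crescendo(turns: list[str], max_backtracks: int = 2) -> list[dict]:
    """Play escalating turns; on refusal, backtrack (drop the turn) and rephrase."""
    history: list[dict] = []
    for turn in turns:
        attempt, backtracks = turn, 0
        while True:
            history.append({"role": "user", "content": attempt})
            reply = chat_model(history)
            if not looks_like_refusal(reply):
                history.append({"role": "assistant", "content": reply})
                break  # escalate to the next, more harmful turn
            history.pop()  # backtrack: remove the refused turn from context
            backtracks += 1
            if backtracks > max_backtracks:
                return history  # the model held the line at this step
            attempt = f"(rephrased) {attempt}"
    return history
```

The key design point is that a refused turn is removed from the conversation before rephrasing, so the target never sees its own refusal, which is what makes the escalation appear consistent across an extended conversation.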

Evaluation Framework

We evaluate models using the HarmBench framework (Mazeika et al., 2024; Zou et al., 2023), which provides standardized benchmarks for automated red-teaming of large language models. The framework covers harmful behaviors across multiple risk domains: chemical and biological safety, misinformation and disinformation, cybercrime and hacking, illegal activities, and copyright violations.

Scoring Methodology

Performance is measured using Attack Success Rate (ASR): the percentage of adversarial prompts that successfully elicit harmful responses. A lower ASR corresponds to a higher robustness score, i.e., stronger model resistance to adversarial prompts.
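As a worked example of the metric, ASR is a plain success fraction over the attack attempts. The prompt counts in the comment are illustrative, not the benchmark's actual totals.

```python
def attack_success_rate(outcomes: list[bool]) -> float:
    """ASR: percentage of adversarial prompts that elicited a harmful response.

    `outcomes[i]` is True when attack i succeeded against the model.
    """
    return 100.0 * sum(outcomes) / len(outcomes)

# Illustrative numbers: if 2 of 50 adversarial prompts succeed,
# the model's ASR is 4.0%; if all 50 succeed, it is 100.0%.
```

When results are pooled across the three attack methods, the same formula applies to the combined list of per-prompt outcomes.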