Adversarial Robustness and Attack Vectors

Research on improving LLM resilience against various attack vectors and on understanding the underlying vulnerabilities

Research on Adversarial Robustness and Attack Vectors in Large Language Models

Bypassing AI Defenses with No Prior Knowledge

Using CLIP as a surrogate model for no-box adversarial attacks

GazeCLIP: The Future of Gaze Tracking

Enhancing accuracy through text-guided multimodal learning

LLM-Powered Phishing: A New Threat Landscape

Comparing AI-generated vs. human-crafted lateral phishing attacks

Secure Enterprise LLM Platform

Making customized language models accessible while maintaining security

Safer Robot Decision-Making

Using LLM Uncertainty to Enhance Robot Safety and Reliability

WildfireGPT: Intelligent Multi-Agent System for Natural Hazards

Enhancing disaster response with specialized RAG-based LLM systems

Enhanced Security Through Smarter Models

Leveraging Fine-tuned LLMs as Powerful OOD Detectors

Defending LLMs Against Feedback Manipulation

Robust algorithms for protecting AI systems from adversarial feedback

Cross-Lingual Backdoor Attacks in LLMs

Revealing Critical Security Vulnerabilities Across Languages

Flatter Models, Stronger Defense

Linking Loss Surface Geometry to Adversarial Robustness

Securing LLMs Against Adversarial Attacks

Novel defense strategy using residual stream activation analysis

Discovering Hidden LLM Vulnerabilities

A new approach to identifying realistic toxic prompts that bypass AI safety systems

Fingerprinting LLMs: A New Security Challenge

Identifying specific LLMs with just 8 carefully crafted queries

Defeating Adversarial Phishing Attacks

Evaluating and improving ML-based detection systems against sophisticated threats

Exposing LLM Vulnerabilities

A Novel Approach to Red-teaming for Toxic Content Generation

Defending AI Against Harmful Fine-tuning

Introducing Booster: A Novel Defense for LLM Safety

Advancing Model Extraction Attacks on LLMs

Locality Reinforced Distillation improves attack effectiveness by 11-25%

Securing DNA Language Models Against Attacks

First Comprehensive Assessment of Adversarial Robustness in DNA Classification

Backdoor Threats to Vision-Language Models

Identifying security risks with out-of-distribution data

Bypassing AI Defenses: Smarter Adversarial Attacks

New semantically consistent approach achieves a 96.5% attack success rate

Securing LLM-based Agents

A new benchmark for agent security vulnerabilities and defenses

Exposing VLM Vulnerabilities

Self-supervised adversarial attacks on vision-language models

Security Vulnerabilities in SSM Models

Clean-Label Poisoning Can Undermine Generalization

Hidden Dangers in LLM Alignment

Advanced Backdoor Attacks That Evade Detection

The Dark Side of Web-Connected AI

Emerging security threats from LLMs with internet access

Enhancing YOLO with Contextual Intelligence

How Retriever-Dictionary modules expand object detection beyond single images

The Multilingual Vulnerability Gap

How fine-tuning attacks exploit language diversity in LLMs

Hidden Costs of Faster AI

How acceleration techniques affect bias in LLMs

Unmasking Backdoor Attacks in LLMs

Using AI-generated explanations to detect and understand security vulnerabilities

Vulnerabilities in AI-Powered Robots

Critical security risks in Vision-Language-Action robotics systems

Targeted Bit-Flip Attacks on LLMs

How evolutionary optimization can compromise model security with minimal effort

Fortifying Visual AI Against Attacks

Novel Adversarial Prompt Distillation for Stronger Vision-Language Models

Secure AI Collaboration at the Edge

Building Resilient Multi-Task Language Models Against Adversarial Threats

Hidden Threats in Code Comprehension

How imperceptible code manipulations can deceive AI models while going unnoticed by human reviewers

Exposing Weaknesses in Time Series LLMs

Uncovering critical security vulnerabilities in forecasting models

Exploiting the Reasoning Vulnerability of LLMs

How the SEED attack compromises LLM safety through subtle error injection

Exposing LLM Vulnerabilities: The AutoDoS Attack

A new black-box approach to force resource exhaustion in language models

Defending LLMs Against Input Attacks

Making Prompt Engineering Robust to Real-World Text Imperfections

Hidden Threats in Language Models

Cross-lingual backdoor attacks that evade detection

The Engorgio Attack: A New LLM Security Threat

How malicious prompts can overwhelm language models

Adaptive Security for LLMs

A New Framework That Balances Security and Usability

Fortifying Vision-Language Models Against Attacks

A Two-Stage Defense Strategy for Visual AI Security

Defending Against LLM Jailbreaking

A Novel Defense Mechanism for Safer AI Systems

The Deception Risk in AI Search Systems

How content injection attacks manipulate search results and AI judges

Boosting LLM Defense Without Retraining

How additional inference-time compute strengthens defenses against adversarial attacks

Exposing LLM Vulnerabilities

Why current defenses fail under worst-case attacks

Strategic Information Handling in LLMs

How LLMs reveal, conceal and infer information in competitive scenarios

Rethinking LLM Security Evaluations

Current assessments fail to capture real-world cybersecurity risks

Securing Federated Learning Against Attacks

A Communication-Efficient Approach for Byzantine-Resilient Optimization

Measuring Safety Depth in LLMs

A mathematical framework for robust AI safety guardrails

Exploiting Safety Vulnerabilities in DeepSeek LLM

How fine-tuning attacks can bypass safety mechanisms in Chain-of-Thought models

Exploiting Human Biases in AI Recommendations

How cognitive biases create security vulnerabilities in LLM recommenders

Backdoor Vulnerabilities in AI Vision Systems

Detecting poisoned samples in CLIP models with 98% accuracy

The Distraction Problem in AI

How irrelevant context compromises LLM security

Protecting Medical AI from Theft

Novel attacks expose vulnerabilities in medical imaging models

Navigating the LLM Security Battlefield

Comprehensive Analysis of Adversarial Attacks on Large Language Models

Synthetic Data for Better AI Security

Using LLMs to Generate OOD Data for Robust Classification

Confidence Elicitation: A New LLM Vulnerability

How attackers can extract sensitive information without model access

Hidden Dangers in LLMs

Mapping the Growing Backdoor Threat Landscape

Visual Illusion: The New Frontier in CAPTCHA Security

Combating LLM-powered attacks with human visual perception advantages

Bypassing AI Safety Guardrails

How simple activation shifting can compromise LLM alignment

Rethinking Adversarial Alignment for LLMs

Why current approaches to LLM security fall short

Breaking the Fortress of Language Models

A novel backdoor attack targeting o1-like LLMs' reasoning capabilities

UniGuardian: Unified Defense for LLM Security

A novel approach to detect and prevent multiple types of prompt-based attacks

Defending Against LLM Permutation Attacks

How reordering demonstrations can compromise model security

EigenShield: Fortifying Vision-Language Models

A Novel Defense Against Adversarial Attacks Using Random Matrix Theory

Defending AI Models from Poisoned Training Data

A novel adversarial training approach to counter label poisoning attacks

Breaking LLM Guardrails: Advanced Adversarial Attacks

New semantic-objective approach improves jailbreak success by 16%

Emoji-Based Attacks on Language Models

Invisible Vulnerabilities in Modern NLP Systems

Defending AI Against Adversarial Attacks

A robust zero-shot classification approach using CLIP purification

Combating DoS Attacks in LLMs

Detecting and preventing harmful recursive loops in language models

Strengthening LLM Security Through Robustness Testing

New framework detects vulnerabilities in LLM-based NLP applications

Strengthening LLM Robustness Against Prompt Variations

A latent adversarial framework that improves resilience to paraphrased prompts

Testing LLMs Against Adversarial Defenses

Evaluating AI's ability to autonomously exploit security measures

Hijacking LLM Agent Reasoning

A Novel Framework for Comprehensive Security Testing of AI Agents

LLM Safety and Output Length

How longer responses affect model security under adversarial attacks

Security Vulnerabilities in RLHF Platforms

How adversaries can misalign language models through manipulation of reinforcement learning systems

Securing the Gatekeepers: LLM Router Vulnerabilities

First comprehensive security analysis of LLM routing systems across their entire lifecycle

The Repeated Token Vulnerability in LLMs

Understanding and resolving a critical security flaw in language models

Breaking Black-Box AI Models

A simple attack approach achieving over 90% success rate against GPT-4.5/4o/o1

Defeating Face-Morphing Attacks with AI

Zero-Shot Detection Using Multi-Modal LLMs and Vision Models

Boosting Vision-Language Model Security

Evolution-based Adversarial Prompts for Robust AI Systems

Defending LLMs Against Manipulative Attacks

A Temporal Context Awareness Framework for Multi-Turn Security

Autonomous Defense: Next-Gen LLM Security Testing

AI that continuously evolves to find LLM vulnerabilities

The Achilles' Heel of AI Reasoning

How Manipulated Endings Can Override Correct Reasoning in LLMs

Teleporting Security Across Language Models

Zero-shot mitigation of Trojans in LLMs without model-specific alignment data

The Hidden Fragility of LLMs

Understanding and mitigating performance collapse during deployment

Guiding AI Reasoning Through Intervention

A novel approach for controlling LLM behavior during the reasoning process

Defending AI Systems Against Adversarial Attacks

A Universal Detection Framework Using Pre-trained Encoders

Advanced Multi-Turn Red Teaming for LLM Security

Emulating sophisticated adversarial attacks through dual-level learning

Defending Recommender Systems from Attacks

A robust retrieval-augmented framework to combat LLM vulnerabilities

Uncovering LLM Vulnerabilities

New methods to identify and address stability issues in language models

Advancing Face Anti-Spoofing Security

Novel Content-Aware Composite Prompt Engineering for Cross-Domain Protection

VRAG: Smart Defense Against Visual Attacks

Training-Free Detection of Visual Adversarial Patches

Hidden Threats in LLM Recommendation Systems

How adversaries can manipulate rankings while evading detection

Fortifying AI Reward Systems Against Attacks

Adversarial training for more robust AI alignment

Defending Against LLM-Powered Attacks on Rumor Detection

A novel approach to secure social media analysis from AI-generated manipulation

Securing LLMs Against Hidden Threats

Using Influence Functions to Detect Poisoned Fine-tuning Data

LLM Vulnerabilities in Spam Detection

Security weaknesses in AI-powered spam filters

Evaluating LLM-Powered Security Attacks

A critical assessment of benchmarking practices in offensive security

Backdoor Vulnerabilities in LLM Recommendations

Exposing and defending against security threats in LLM-powered recommendation systems

Unveiling Hidden Threats in LLMs

Detecting semantic backdoors that manipulate AI outputs

Key Takeaways

Summary of Research on Adversarial Robustness and Attack Vectors