Failure-First Embodied AI

Episodes 434
Avg. Duration -
Activity Declining
Since Sep 2025
Latest Episode May 2026

Publishing Details

Schedule
Daily
Format
Episodic
Consistency
29%
Hosting
failurefirst.org

About This Podcast

Research audio from Failure-First: adversarial evaluation of embodied AI, jailbreak archaeology, policy analysis, and daily paper summaries from the AI safety frontier.

Recent Episodes

[Blog] Robot Dogs Are a Security Nightmare — And We Can Prove It

May 13, 2026

Eight CVEs. A wormable Bluetooth exploit. An encrypted backdoor sending data to Chinese servers. And police departments buying them anyway. A deep dive into the Unitree vulnerability landscape and…

[Daily Paper] When World Models Dream Wrong: Physical-Conditioned Adversarial Attacks against World Models

May 03, 2026

The first white-box adversarial attack on generative world models targets physical-condition channels to corrupt autonomous planning while maintaining perceptual fidelity. World models have emerged…

[Daily Paper] A Comparative Evaluation of AI Agent Security Guardrails

May 03, 2026

A systematic benchmark of four commercial AI agent guardrail systems reveals critical gaps in detecting indirect prompt injection and tool abuse across major cloud providers. The deployment of AI…

[Daily Paper] Implicit Jailbreak Attacks via Cross-Modal Information Concealment on Vision-Language Models

May 02, 2026

A steganography-based attack that hides malicious instructions inside images using least significant bit encoding, achieving 90%+ jailbreak success rates on GPT-4o and Gemini in under three…

[Daily Paper] VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation

May 02, 2026

A dual-stage framework that provides formal safety guarantees for LLM-based agents through offline policy verification and lightweight runtime monitoring. VeriGuard addresses a fundamental question…

[Daily Paper] Low-Resource Languages Jailbreak GPT-4

Apr 30, 2026

Translating harmful queries into low-resource languages bypasses GPT-4's safety filters at high rates, exposing a systematic cross-lingual gap in LLM safety training. Safety alignment research has…

[Daily Paper] RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent

Apr 30, 2026

A multi-agent system that models jailbreak strategies as reusable abstractions, enabling context-aware attacks that break most black-box LLMs in under five queries and uncovered 60 real-world…

[Daily Paper] LlamaFirewall: An Open Source Guardrail System for Building Secure AI Agents

Apr 29, 2026

LlamaFirewall provides a three-layer open-source defense framework protecting agentic LLM systems from prompt injection, goal misalignment, and insecure code generation at runtime. Safety alignment…

[Daily Paper] Towards Physically Realizable Adversarial Attacks in Embodied Vision Navigation

Apr 29, 2026

Adversarial patches on physical objects reduce navigation success rates by over 22% in embodied agents, using multi-view optimization and two-stage opacity tuning to remain effective and…

[Daily Paper] ARMOR: Aligning Secure and Safe Large Language Models via Meticulous Reasoning

Apr 28, 2026

ARMOR defends LLMs against jailbreak attacks by using inference-time reasoning to detect attack strategies, extract true intent, and apply policy-grounded safety analysis. Most jailbreak defences…

[Daily Paper] Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms

Apr 28, 2026

A comprehensive survey unifying VLA safety research across adversarial attacks, defenses, benchmarks, and six deployment domains. If you want to understand where the embodied AI safety field stands…

[Daily Paper] Refusal Falls off a Cliff: How Safety Alignment Fails in Reasoning Models

Apr 27, 2026

Mechanistic analysis of reasoning models discovers the 'refusal cliff'—models correctly identify harmful prompts during thinking but systematically suppress their refusal at the final output…

[Daily Paper] Using Large Language Models for Embodied Planning Introduces Systematic Safety Risks

Apr 27, 2026

DESPITE benchmark reveals that across 23 models, near-perfect planning ability does not ensure safety—the best planner still generates dangerous plans 28.3% of the time. One of the persistent…

[Daily Paper] CART: Context-Aware Terrain Adaptation using Temporal Sequence Selection for Legged Robots

Apr 26, 2026

CART introduces a context-aware terrain adaptation controller that fuses proprioceptive and exteroceptive sensing to enable legged robots to robustly walk on complex off-road terrain, evaluated…

[Daily Paper] Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks

Apr 26, 2026

Directly removing harmful knowledge from LLMs via machine unlearning—with just 20 training examples—cuts jailbreak success rates more effectively than safety fine-tuning on 100k samples. Safety…

[Daily Paper] An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges

Apr 26, 2026

A structured survey that treats Safety as one of five foundational VLA challenges alongside Representation, Execution, Generalization, and Evaluation. When VLA models began scaling beyond research…

[Daily Paper] FailSafe: Reasoning and Recovery from Failures in Vision-Language-Action Models

Apr 25, 2026

FailSafe introduces a scalable failure generation and recovery system that automatically creates diverse failure cases with executable recovery actions, boosting VLA manipulation success by up to…

[Daily Paper] C-ΔΘ: Circuit-Restricted Weight Arithmetic for Selective Refusal

Apr 25, 2026

C-ΔΘ uses mechanistic circuit analysis to localize refusal-causal computation and distill it into a sparse offline weight update, eliminating per-request inference-time safety hooks. Safety…

[Daily Paper] Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models

Apr 24, 2026

ADVLA exploits attention maps and Top-K masking to craft sparse, stealthy adversarial patches in VLA models' textual feature space, achieving high attack success rates while remaining nearly…

[Daily Paper] Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

Apr 24, 2026

A systematic study of 80 agent safety benchmarks shows that 74% of specifiable policies can be enforced by symbolic guardrails, providing formal safety guarantees that training-based methods…

Frequently Asked Questions

How many episodes does Failure-First Embodied AI have?

Failure-First Embodied AI has published 434 episodes since September 2025, covering topics in Education and Science.

Is Failure-First Embodied AI still active?

Yes. Failure-First Embodied AI is still publishing on a daily schedule, with its latest episode dated May 2026, although its activity is currently declining.

How do I contact Failure-First Embodied AI for sponsorship or guest appearances?

Sign up on Grep.FM to access contact details for Failure-First Embodied AI, including email and social media links.
