Recent Episodes
AI Agents: Adoption and Usage | Perplexity Comet
Dive into the key findings of the first large-scale field study on the adoption, usage intensity, and use cases of general-purpose AI agents, drawing on hundreds of millions of anonymized user…
S2 WEF & Accenture | Advancing Responsible AI Innovation: A Playbook
This episode of the AI Safety Paper Digest is about the World Economic Forum's new playbook on advancing responsible AI innovation. In cooperation with Accenture, the report provides a practical…
S2E2 Okay Waymo, Crash My Car! 🗣️ Testing Autonomous Vehicle Safety with Adversarial Driving Scenarios | LD-Scene
How can we make autonomous driving systems safer through generative AI? In this episode, we explore LD-Scene, a novel framework that combines Large Language Models (LLMs) with Latent Diffusion Models…
The Full LLM Glossary and Foundations
Ever wanted a clear, comprehensive explanation of all the key terms related to Large Language Models (LLMs)? This episode has you covered. In this deep dive of more than an hour, we'll guide you through the…
S1E9 Anthropic's Best-of-N: Cracking Frontier AI Across Modalities
In this special Christmas episode, we delve into "Best-of-N Jailbreaking," a powerful new black-box algorithm that demonstrates the vulnerabilities of cutting-edge AI systems. This approach works by…
S1E8 Auto-Rewards & Multi-Step RL for Diverse AI Attacks by OpenAI
In this episode, we explore the latest advancements in automated red teaming from OpenAI, presented in the paper "Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step…
S1E7 Battle of the Scanners: Top Red Teaming Frameworks for LLMs
In this episode, we explore the findings from "Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis." As large language models (LLMs) are integrated into more…
S1E6 Watermarking LLM Output: SynthID by DeepMind
In this episode, we delve into the groundbreaking watermarking technology presented in the paper "Scalable Watermarking for Identifying Large Language Model Outputs," published in Nature.…
S1E5 Open Source Red Teaming: PyRIT by Microsoft
In this episode, we dive into PyRIT, the open-source toolkit developed by Microsoft for red teaming and security risk identification in generative AI systems. PyRIT offers a model-agnostic framework…
S1E3 Jailbreaking GPT o1: STCA Attack
This episode, "Jailbreaking GPT o1," explores how the GPT o1 series, known for its advanced "slow-thinking" abilities, can be manipulated into generating disallowed content like hate speech through…
S1E4 The Attack Atlas by IBM Research
This episode explores the intricate world of red-teaming generative AI models as discussed in the paper "Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI."…
S1E2 The Single-Turn Crescendo Attack
In this episode, we examine the cutting-edge adversarial strategy presented in "Well, that escalated quickly: The Single-Turn Crescendo Attack (STCA)." Building on the multi-turn crescendo attack…
S1E1 Outsmarting ChatGPT: The Power of Crescendo Attacks
This episode dives into how the Crescendo Multi-Turn Jailbreak Attack leverages seemingly benign prompts to escalate dialogues with large language models (LLMs) such as ChatGPT, Gemini, and Anthropic…
Frequently Asked Questions
AI Safety - Paper Digest has published 13 episodes since October 2024, covering topics in Technology.
AI Safety - Paper Digest's publishing activity is currently declining, with new episodes released monthly. The average episode length is 10 minutes.