Deep Papers

Arize AI

English Business Mathematics Science Technology

Website RSS

Episodes 60

Avg. Duration 36m

Activity Highly Active

Since Jan 2023

Latest Episode Feb 2026

Outreach Signals

Features Guests

Publishing Details

Schedule

Monthly

Format

Episodic

Consistency

36%

Hosting

feeds.buzzsprout.com

About This Podcast

Deep Papers is a podcast series featuring deep dives on today’s most important AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning.

Podcasting 2.0 Features

person

Social Media

X LinkedIn

Recent Episodes

CUGA Agent: From Benchmarks to Business Impact of IBM's Generalist Agent

Feb 11, 2026 23m

We dive into the latest paper from a team of researchers at IBM: "From Benchmarks to Business Impact: Deploying IBM Generalist Agent in Enterprise Production." We're excited to host several of the…

TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture

Nov 24, 2025 23m

We dive into the latest paper from Google and a team of academic researchers: "TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture." Hear from one of the paper's authors, Yongchao Chen,…

Meta AI Researcher Explains ARE and Gaia2: Scaling Up Agent Environments and Evaluations

Nov 10, 2025 22m

In our latest paper reading, we had the pleasure of hosting Grégoire Mialon — Research Scientist at Meta Superintelligence Labs — to walk us through Meta AI’s groundbreaking paper titled “ARE:…

Georgia Tech's Santosh Vempala Explains Why Language Models Hallucinate, His Research With OpenAI

Oct 14, 2025 31m

Santosh Vempala, Frederick Storey II Chair of Computing and Distinguished Professor in the School of Computer Science at Georgia Tech, explains his paper co-authored by OpenAI's Adam Tauman Kalai,…

Atropos Health’s Arjun Mukerji, PhD, Explains RWESummary: A Framework and Test for Choosing LLMs to Summarize Real-World Evidence (RWE) Studies

Sep 22, 2025 26m

Large language models are increasingly used to turn complex study output into plain-English summaries. But how do we know which models are safest and most reliable for healthcare? In this most recent…

Stan Miasnikov, Distinguished Engineer, AI/ML Architecture, Consumer Experience at Verizon Walks Us Through His New Paper

Sep 06, 2025 48m

This episode dives into "Category-Theoretic Analysis of Inter-Agent Communication and Mutual Understanding Metric in Recursive Consciousness." The paper presents an extension of the Recursive…

Small Language Models are the Future of Agentic AI

Sep 05, 2025 31m

We had the privilege of hosting Peter Belcak – an AI Researcher working on the reliability and efficiency of agentic systems at NVIDIA – who walked us through his new paper making the rounds in AI…

Watermarking for LLMs and Image Models

Jul 30, 2025 42m

In this AI research paper reading, we dive into "A Watermark for Large Language Models" with the paper's author John Kirchenbauer. This paper is a timely exploration of techniques for embedding…

Self-Adapting Language Models: Paper Authors Discuss Implications

Jul 08, 2025 31m

The authors of the new paper *Self-Adapting Language Models (SEAL)* shared a behind-the-scenes look at their work, motivations, results, and future directions. The paper introduces a novel method for…

The Illusion of Thinking: What the Apple AI Paper Says About LLM Reasoning

Jun 20, 2025 30m

This week we discuss The Illusion of Thinking, a new paper from researchers at Apple that challenges today’s evaluation methods and introduces a new benchmark: synthetic puzzles with controllable…

Accurate KV Cache Quantization with Outlier Tokens Tracing

Jun 04, 2025 25m

We discuss Accurate KV Cache Quantization with Outlier Tokens Tracing, a deep dive into improving the efficiency of LLM inference. The authors enhance KV Cache quantization, a technique for reducing…

Scalable Chain of Thoughts via Elastic Reasoning

May 16, 2025 28m

In this week's episode, we talk about Elastic Reasoning, a novel framework designed to enhance the efficiency and scalability of large reasoning models by explicitly separating the reasoning process…

Sleep-time Compute: Beyond Inference Scaling at Test-time

May 02, 2025 30m

What if your LLM could think ahead, preparing answers before questions are even asked? In this week's paper read, we dive into a groundbreaking new paper from researchers at Letta, introducing…

LibreEval: The Largest Open Source Benchmark for RAG Hallucination Detection

Apr 18, 2025 27m Transcript

For this week's paper read, we dive into our own research. We wanted to create a replicable, evolving dataset that can keep pace with model training so that you always know you're testing with data…

AI Benchmark Deep Dive: Gemini 2.5 and Humanity's Last Exam

Apr 04, 2025 26m

This week we talk about modern AI benchmarks, taking a close look at Google's recent Gemini 2.5 release and its performance on key evaluations, notably Humanity's Last Exam (HLE). In the session we…

Model Context Protocol (MCP)

Mar 25, 2025 15m

We cover Anthropic’s groundbreaking Model Context Protocol (MCP). Though it was released in November 2024, we've been seeing a lot of hype around it lately, and thought it was well worth digging…

AI Roundup: DeepSeek’s Big Moves, Claude 3.7, and the Latest Breakthroughs

Mar 01, 2025 30m

This week, we're mixing things up a little bit. Instead of diving deep into a single research paper, we cover the biggest AI developments from the past few weeks. We break down key announcements,…

How DeepSeek is Pushing the Boundaries of AI Development

Feb 21, 2025 29m

This week, we dive into DeepSeek. SallyAnn DeLucia, Product Manager at Arize, and Nick Luzio, a Solutions Engineer, break down key insights on a model that has been dominating headlines for its…

Multiagent Finetuning: A Conversation with Researcher Yilun Du

Feb 04, 2025 30m

We talk to Google DeepMind Senior Research Scientist (and incoming Assistant Professor at Harvard), Yilun Du, about his latest paper, "Multiagent Finetuning: Self Improvement with Diverse Reasoning…

Training Large Language Models to Reason in Continuous Latent Space

Jan 14, 2025 24m

LLMs have typically been restricted to reason in the "language space," where chain-of-thought (CoT) is used to solve complex reasoning problems. But a new paper argues that language space may not…

Frequently Asked Questions

How many episodes does Deep Papers have?

Deep Papers has published 60 episodes since January 2023, covering topics in Business, Mathematics, Science, and Technology.

Is Deep Papers still active?

Deep Papers is currently highly active, publishing new episodes on a monthly schedule. The average episode length is 36 minutes.