Recent Episodes
Relational Graph Transformer for Multi-Table Learning
This episode explores the Relational Graph Transformer paper and asks whether transformer-based models can outperform standard graph neural networks for prediction tasks over real multi-table…
Lattice: Fixed-Slot Compression for Transformer Memory
This episode explores Lattice, a 2025 paper from Google Research and Google DeepMind that asks whether a Transformer’s growing key-value cache can be compressed into a fixed set of memory slots…
Atlas: Test-Time Memory for Long Contexts
This episode explores Atlas, a 2025 paper on test-time memorization that asks whether a model with fixed recurrent memory can learn to update that memory during inference and rival Transformers on…
Robots Need More Than VLAs and World Models
This episode explores the position paper Robots Need More Than VLAs & World Models and its claim that the main bottleneck in robotics may be grounding: turning raw physical behavior into…
KumoRFM for In-Context Relational Learning
This episode explores KumoRFM, a 2025 proposal for a foundation model that can perform in-context learning directly on relational databases, aiming to handle tasks like churn prediction, fraud…
Learning at Test Time with Expressive RNN States
This episode explores the paper Learning to (Learn at Test Time): RNNs with Expressive Hidden States and its attempt to give recurrent models transformer-like long-context behavior without the…
Do Transformers Need Three Projections?
This episode explores whether transformers really need separate query, key, and value projections, treating the problem as weight tying inside attention rather than as a brand-new model design. It…
Unified Neural Scaling Laws Across Regimes
This episode explores Unified Neural Scaling Laws, a framework for predicting model performance when parameter count, data volume, training steps, inference compute, and training-recipe choices all…
RAGEN-2: Reasoning Collapse in Agentic RL
This episode explores the RAGEN-2 paper’s claim that agentic reinforcement learning can produce reasoning traces that look active and diverse while losing real dependence on the input. It explains…
End-to-End Context Compression at Scale
This episode explores End-to-End Context Compression at Scale, a paper on whether learned context compression can beat the cost of long-context inference in quality, time to first token, and peak…
Unembedding Matrices as Feature Lenses for Embeddings
This episode explores why decoder-style language models can generate fluent text yet still underperform dedicated embedding models when asked for zero-shot sentence vectors, despite embeddings being…
Predictive Query Language for Relational Databases
This episode explores Predictive Query Language (PQL), a SQL-shaped domain-specific language for defining supervised learning tasks directly over relational databases by specifying the prediction…
KumoRFM-2 for Relational Learning at Scale
This episode explores KumoRFM-2, a relational foundation model designed to learn directly from connected database tables instead of flattening customers, orders, products, and tickets into a single…
Latent Reasoning with Normalizing Flows
This episode explores Latent Reasoning with Normalizing Flows, a paper that asks whether a standard left-to-right transformer can do its intermediate reasoning in continuous latent states instead of…
EMO: Emergent Modularity in Sparse Language Models
This episode explores EMO, a Mixture-of-Experts language model that tries to turn token-level sparsity into real modularity by letting coherent expert groups emerge from document structure during…
Mooncake for KV Cache-Centric LLM Serving
This episode explores Mooncake, a production LLM serving architecture that treats KV cache reuse and movement as the central challenge in long-context chat, not just raw GPU compute. It explains why…
When AI Builds Itself and Recursive Self-Improvement
This episode explores Marina Favaro et al.’s 2026 paper on whether AI is beginning to accelerate frontier AI development enough to hint at recursive self-improvement, laying out the ladder from…
Technical AGI Safety and Security Framework
This episode explores DeepMind’s paper on technical AGI safety and security, focusing on how labs might prevent severe, humanity-scale harm before highly capable systems are deployed. It breaks down…
Harvest: Borrowing Peer GPU Memory for LLMs
This episode explores Harvest, a system for LLM inference that uses idle HBM on neighboring NVLink-connected GPUs as a temporary cache when a serving GPU runs out of local memory. It explains why LLM…
VeriCache: Lossless LLM Inference from Lossy KV Caches
This episode explores VeriCache, a systems paper that asks whether a large language model can draft tokens using a compressed, lossy KV cache and then verify them against the full cache to recover…
Frequently Asked Questions
AI Post Transformers has published 690 episodes since August 2025, covering topics in Technology.
AI Post Transformers is currently highly active, with new episodes published hourly. The average episode length is 16 minutes.