AI: AX - introspection Podcast by mcgrof

Episodes 8

Avg. Duration 16m

Activity Dormant

Since Aug 2025

Latest Episode Aug 2025

Publishing Details

Schedule

Hourly

Format

Episodic

Hosting

anchor.fm

About This Podcast

The art of looking into a model and understanding what is going on through introspection is referred to as AX.

Recent Episodes

GoldenMagikCarp

Aug 09, 2025 16m

These two sources from LessWrong explore the phenomenon of "glitch tokens" within Large Language Models (LLMs) like GPT-2, GPT-3, and GPT-J. The authors, Jessica Rumbelow and mwatkins, detail how…

Route Sparse Autoencoder to Interpret Large Language Models

Aug 09, 2025 12m

This paper introduces Route Sparse Autoencoder (RouteSAE), a novel framework designed to improve the interpretability of large language models (LLMs) by effectively extracting features across…

HarmBench: Automated Red Teaming for LLM Safety

Aug 09, 2025 22m

This paper introduces HarmBench, a new framework for evaluating the safety and robustness of large language models (LLMs) against malicious use. It highlights the growing concern over LLMs' potential…

Jailbreaking LLMs

Aug 09, 2025 10m

A long list of papers and articles about jailbreaking LLMs is reviewed. These sources primarily explore methods for bypassing safety measures in Large Language Models (LLMs), often referred to as…

PA-LRP & absLRP

Aug 09, 2025 19m

We focus on two evolutions of AX that advance the explainability of deep neural networks, particularly Transformers, by improving Layer-Wise Relevance Propagation (LRP) methods. One…

AttnLRP: Explainable AI for Transformers

Aug 09, 2025 16m

This 2024 paper introduces AttnLRP, a novel method for explaining the internal reasoning of transformer models, including Large Language Models (LLMs) and Vision Transformers (ViTs). It…

Pixel-Wise Explanations for Non-Linear Classifier Decisions

Aug 09, 2025 19m

This open-access research article from PLOS One introduces Layer-wise Relevance Propagation (LRP), a novel method for interpreting decisions made by complex, non-linear image classifiers. The…

Multi-Layer Sparse Autoencoders for Transformer Interpretation

Aug 09, 2025 14m

This paper introduces the Multi-Layer Sparse Autoencoder (MLSAE), a novel approach for interpreting the internal representations of transformer language models. Unlike traditional Sparse Autoencoders…

Frequently Asked Questions

How many episodes does AI: AX - introspection have?

AI: AX - introspection has published 8 episodes since August 2025, covering topics in Technology.

Is AI: AX - introspection still active?

AI: AX - introspection is currently dormant. Its listed publishing schedule is hourly, and the average episode length is 16m.

How do I contact AI: AX - introspection for sponsorship or guest appearances?

Sign up on Grep.FM to access contact details for AI: AX - introspection, including email and social media links.
