Episodes 8
Avg. Duration 16m
Activity Dormant
Since Aug 2025
Latest Episode Aug 2025

Publishing Details

Schedule Hourly
Format Episodic
Hosting anchor.fm

About This Podcast

The art of looking into a model and understanding what is going on inside it through introspection is referred to as AX.

Recent Episodes

GoldenMagikCarp

Aug 09, 2025 16m

These two sources from LessWrong explore the phenomenon of "glitch tokens" within Large Language Models (LLMs) like GPT-2, GPT-3, and GPT-J. The authors, Jessica Rumbelow and mwatkins, detail how…

Route Sparse Autoencoder to Interpret Large Language Models

Aug 09, 2025 12m

This paper introduces Route Sparse Autoencoder (RouteSAE), a novel framework designed to improve the interpretability of large language models (LLMs) by effectively extracting features across…

HarmBench: Automated Red Teaming for LLM Safety

Aug 09, 2025 22m

This paper introduces HarmBench, a new framework for evaluating the safety and robustness of large language models (LLMs) against malicious use. It highlights the growing concern over LLMs' potential…

Jailbreaking LLMs

Aug 09, 2025 10m

This episode reviews a long list of papers and articles about jailbreaking LLMs. These sources primarily explore methods for bypassing safety measures in Large Language Models (LLMs), often referred to as…

PA-LRP & absLRP

Aug 09, 2025 19m

We focus on two evolutions of AX that advance the explainability of deep neural networks, particularly Transformers, by improving Layer-Wise Relevance Propagation (LRP) methods. One…

AttnLRP: Explainable AI for Transformers

Aug 09, 2025 16m

This 2024 paper introduces AttnLRP, a novel method for explaining the internal reasoning of transformer models, including Large Language Models (LLMs) and Vision Transformers (ViTs). It…

Pixel-Wise Explanations for Non-Linear Classifier Decisions

Aug 09, 2025 19m

This open-access research article from PLOS ONE introduces Layer-wise Relevance Propagation (LRP), a novel method for interpreting decisions made by complex, non-linear image classifiers. The…

Multi-Layer Sparse Autoencoders for Transformer Interpretation

Aug 09, 2025 14m

This paper introduces the Multi-Layer Sparse Autoencoder (MLSAE), a novel approach for interpreting the internal representations of transformer language models. Unlike traditional Sparse Autoencoders…

Frequently Asked Questions

How many episodes does AI: AX - introspection have?

AI: AX - introspection has published 8 episodes since August 2025, covering topics in Technology.

Is AI: AX - introspection still active?

AI: AX - introspection is currently dormant; its latest episode was published in August 2025. Episodes were released on an hourly schedule, and the average episode length is 16m.

How do I contact AI: AX - introspection for sponsorship or guest appearances?

Sign up on Grep.FM to access contact details for AI: AX - introspection, including email and social media links.

Similar Podcasts