Recent Episodes
GoldenMagikCarp
These two sources from LessWrong explore the phenomenon of "glitch tokens" within Large Language Models (LLMs) like GPT-2, GPT-3, and GPT-J. The authors, Jessica Rumbelow and mwatkins, detail how…
Route Sparse Autoencoder to Interpret Large Language Models
This paper introduces Route Sparse Autoencoder (RouteSAE), a novel framework designed to improve the interpretability of large language models (LLMs) by effectively extracting features across…
HarmBench: Automated Red Teaming for LLM Safety
This paper introduces HarmBench, a new framework for evaluating the safety and robustness of large language models (LLMs) against malicious use. It highlights the growing concern over LLMs' potential…
Jailbreaking LLMs
This episode reviews a long list of papers and articles about jailbreaking LLMs. These sources primarily explore methods for bypassing safety measures in Large Language Models (LLMs), often referred to as…
PA-LRP & absLRP
We focus on two evolutions in AX that advance the explainability of deep neural networks, particularly Transformers, by improving Layer-Wise Relevance Propagation (LRP) methods. One…
AttnLRP: Explainable AI for Transformers
This 2024 paper introduces AttnLRP, a novel method for explaining the internal reasoning of transformer models, including Large Language Models (LLMs) and Vision Transformers (ViTs). It…
Pixel-Wise Explanations for Non-Linear Classifier Decisions
This open-access research article from PLOS One introduces Layer-wise Relevance Propagation (LRP), a novel method for interpreting decisions made by complex, non-linear image classifiers. The…
Multi-Layer Sparse Autoencoders for Transformer Interpretation
This paper introduces the Multi-Layer Sparse Autoencoder (MLSAE), a novel approach for interpreting the internal representations of transformer language models. Unlike traditional Sparse Autoencoders…
Frequently Asked Questions
AI: AX - introspection has published 8 episodes since August 2025, covering topics in Technology.
AI: AX - introspection is currently dormant; its episodes were previously released at an hourly cadence. Average episode length is 16m.