Publishing Details
About This Podcast
Explore Statistics
Recent Episodes
S2E2 SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model | #ai #2025 #genai #google
Paper: https://arxiv.org/pdf/2501.17161 This research paper compares supervised fine-tuning (SFT) and reinforcement learning (RL) for post-training foundation models. Using novel and existing tasks…
S2E1 Deepseek Janus-Pro: Unified Multimodal Understanding and Generation | #ai #2025 #genai #deepseek
Paper: https://github.com/deepseek-ai/Janus/blob/main/janus_pro_tech_report.pdf Github: https://github.com/deepseek-ai/Janus/tree/main?tab=readme-ov-file The paper introduces Janus-Pro, an improved…
S1E28 Memory Layers at Scale | #ai #2024 #genai #meta
Paper: https://arxiv.org/pdf/2412.09764 This research paper explores the effectiveness of memory layers in significantly enhancing large language models (LLMs). By incorporating a trainable…
S1E27 Large Concept Models: Language Modeling in a Sentence Representation Space | #ai #2024 #genai
Paper: https://scontent-dfw5-1.xx.fbcdn.net/... This research paper introduces Large Concept Models (LCMs), a novel approach to language modeling that operates on sentence embeddings instead of…
S1E26 DeepSeek v3 | #ai #2024 #genai
Technical Report: https://arxiv.org/pdf/2412.19437 Github: https://github.com/deepseek-ai/DeepSe... This research paper introduces DeepSeek-V3, a 671-billion parameter Mixture-of-Experts (MoE) large…
S1E25 VISION TRANSFORMERS NEED REGISTERS | #ai #2024 #genai #meta
Paper: https://arxiv.org/pdf/2309.16588 This research paper examines artifacts in vision transformer feature maps, specifically high-norm tokens appearing in non-informative image areas. The authors…
S1E24 Byte Latent Transformer: Scaling Language Models with Patches | #ai #2024 #genai
Paper: https://arxiv.org/pdf/2412.09871v1.pdf The paper introduces the Byte Latent Transformer (BLT), a novel large language model architecture that processes raw byte data without tokenization. BLT…
S1E23 CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models | #ai #2024 #genai
This research paper introduces CosyVoice 2, an improved streaming speech synthesis model. Building upon its predecessor, CosyVoice 2 utilizes advancements in large language models (LLMs) and…
S1E22 OpenAI's o3 and o3-mini: A New Frontier in AI | #ai #2024 #genai
Blog: https://openai.com/12-days/ OpenAI announced two new large language models, o3 and o3-mini, showcasing significantly improved performance on various benchmarks, including coding, mathematics,…
S1E21 Alignment Faking in Large Language Models | #ai #2024 #genai
Paper: https://arxiv.org/pdf/2412.14093 This research paper explores "alignment faking" in large language models (LLMs). The authors designed experiments to provoke LLMs into concealing their true…
S1E20 Veo 2, Imagen 3, and Whisk: State-of-the-Art AI Image and Video Generation | #ai #2024 #genai
Blog: https://blog.google/technology/google... Google announced updates to its AI video and image generation models, Veo 2 and Imagen 3, boasting state-of-the-art capabilities in realism and style…
S1E19 Allegro: Open the Black Box of Commercial-Level Video Generation Model | #ai #2024 #genai
Paper: https://arxiv.org/pdf/2411.01747 This research report introduces Allegro, a novel, open-source text-to-video generation model that surpasses existing open-source and many commercial models in…
S1E18 DynaSaur : Large Language Agents Beyond Predefined Actions | #ai #2024 #genai
Paper: https://arxiv.org/pdf/2411.01747 The paper "DynaSaur: Large Language Agents Beyond Predefined Actions" introduces a novel large language model (LLM) agent framework that dynamically generates…
S1E17 STAR ATTENTION: EFFICIENT LLM INFERENCE OVER LONG SEQUENCES | #ai #2024 #genai
Paper: https://arxiv.org/pdf/2411.17116 The paper introduces Star Attention, a novel two-phase attention mechanism for efficient Large Language Model (LLM) inference on long sequences. It improves…
S1E16 FERRET-UI 2: MASTERING UNIVERSAL USER INTERFACE UNDERSTANDING ACROSS PLATFORMS | #ai #2024 #genai
Paper: https://arxiv.org/pdf/2410.18967 The paper introduces Ferret-UI 2, a multimodal large language model (MLLM) that significantly improves upon its predecessor, Ferret-UI, by enabling universal…
S1E15 Adapting While Learning: Grounding LLMs for Scientific Problems I-Tool Usage Adaptation | #ai #2024
Paper: https://arxiv.org/abs/2411.00412 This research introduces a novel two-stage training method to improve Large Language Models' (LLMs) ability to solve complex scientific problems. The method,…
S1E14 Mixtures of In-Context Learners | #ai #genai #llm #2024 #ml
Paper: https://arxiv.org/pdf/2411.02830 This research introduces Mixtures of In-Context Learners (MOICL), a novel approach to improve in-context learning (ICL) in large language models (LLMs). MOICL…
S1E13 LLM2CLIP: POWERFUL LM UNLOCKS RICHER VISUAL REPRESENTATION | #ai #genai #lvm #llm #mmm #cv #ms #2024
Paper: https://arxiv.org/pdf/2411.04997 Github: https://github.com/microsoft/LLM2CLIP The paper introduces LLM2CLIP, a method to improve the visual representation learning capabilities of CLIP by…
S1E12 OPENSCHOLAR: SYNTHESIZING SCIENTIFICLITERATURE WITH RETRIEVAL-AUGMENTED LMS | #ai #genai #llm #2024
Paper: https://arxiv.org/pdf/2411.14199 Github: https://github.com/AkariAsai/OpenScholar The research introduces OpenScholar, a retrieval-augmented large language model (LLM) designed for…
S1E11 Bilateral Reference for High-Resolution Dichotomous Image Segmentation | #ai #genai #llm #cv #2024
Paper: https://arxiv.org/pdf/2401.03407 Github: https://github.com/ZhengPeng7/BiRefNet This research introduces BiRefNet, a novel deep learning framework for high-resolution dichotomous image…
Frequently Asked Questions
AI Today has published 30 episodes since November 2024, covering topics in Technology.
AI Today is currently dormant with new episodes every few days. Average episode length is 16m.