distillation

an archive of posts in this category

Apr 13, 2026 What’s in the Image? A Deep-Dive into the Vision of Vision Language Models
Apr 08, 2026 Generative Modeling via Drifting
Apr 04, 2026 FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space
Apr 01, 2026 MaxRL: Maximum Likelihood via Reinforcement Learning
Mar 22, 2026 Circuit Mechanisms for Spatial Relation Generation in Diffusion Transformers
Mar 21, 2026 Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs
Mar 20, 2026 Position-aware Automatic Circuit Discovery
Mar 19, 2026 Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
Mar 18, 2026 Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights
Mar 17, 2026 Attention Residuals
Mar 16, 2026 Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
Mar 15, 2026 Causal Abstractions of Neural Networks
Feb 26, 2026 End-to-End Test-Time Training for Long Context
Feb 26, 2026 Interpreting Physics in Video World Models
Feb 21, 2026 Language Models use Lookbacks to Track Beliefs
Feb 20, 2026 Fast KV Compaction via Attention Matching
Feb 17, 2026 BRIDGE: Predicting Human Task Completion Time From Model Performance
Feb 16, 2026 HunyuanVideo: A Systematic Framework For Large Video Generative Models
Feb 14, 2026 Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing
Feb 14, 2026 Unraveling MMDiT Blocks: Training-free Analysis and Enhancement of Text-conditioned Diffusion
Feb 13, 2026 Stable Flow: Vital Layers for Training-Free Image Editing
Feb 13, 2026 Localizing Knowledge in Diffusion Transformers
Feb 12, 2026 Tutorial on Diffusion Models for Imaging and Vision
Feb 11, 2026 Learning to Discover at Test Time
Feb 10, 2026 ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features
Feb 09, 2026 Recursive Language Models
Feb 09, 2026 FLUX.2: Analyzing and Enhancing the Latent Space of FLUX – Representation Comparison
Feb 08, 2026 Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Feb 08, 2026 Scalable Diffusion Models with Transformers
Feb 07, 2026 Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
Jan 24, 2026 The Adolescence of Technology
Jan 20, 2026 Diffusion Meets Flow Matching: Two Sides of the Same Coin
Jan 12, 2026 mHC: Manifold-Constrained Hyper-Connections
Jan 10, 2026 A Rosetta Stone for AI Benchmarks
Jan 06, 2026 Measuring AI Ability to Complete Long Software Tasks
Jan 06, 2026 On scalable oversight with weak LLMs judging strong LLMs
Jan 05, 2026 Reliable and Efficient Amortized Model-based Evaluation
Dec 26, 2025 SliderSpace: Decomposing the Visual Capabilities of Diffusion Models
Dec 25, 2025 Generative Modeling by Estimating Gradients of the Data Distribution
Dec 25, 2025 What are Diffusion Models?
Dec 24, 2025 Building Diffusion Model's theory from ground up
Dec 07, 2025 Weight-sparse transformers have interpretable circuits
Dec 07, 2025 Between the Bars: Gradient-based Jailbreaks are Bugs that induce Features
Dec 07, 2025 Breakpoint: Scalable evaluation of system-level reasoning in LLM code agents
Dec 06, 2025 Fluid Language Model Benchmarking
Dec 05, 2025 Auditing language models for hidden objectives
Dec 04, 2025 DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Dec 03, 2025 DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
Nov 30, 2025 Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
Nov 24, 2025 ML Tea: Planning and Problem-Solving with General, Scalable Neuro-Symbolic Models
Nov 24, 2025 Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent
Nov 23, 2025 Continuous Language Model Interpolation yields Dynamic and Controllable Text Generation
Nov 23, 2025 Natural Emergent Misalignment from Reward Hacking in Production RL
Nov 22, 2025 A Mathematical Framework for Transformer Circuits
Nov 19, 2025 Value Augmented Sampling for Language Model Alignment and Personalization
Nov 18, 2025 Deriving Muon
Nov 18, 2025 Curiosity-driven Red-teaming for Large Language Models
Nov 18, 2025 Guided Speculative Inference for Efficient Test-Time Alignment of LLMs
Nov 18, 2025 Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search
Nov 17, 2025 Domain-Aware Scaling Laws Uncover Data Synergy
Nov 17, 2025 Data Debiasing with Datamodels (D3M): Improving Subgroup Robustness via Data Selection
Nov 17, 2025 Ambient Diffusion Omni: Training Good Models with Bad Data
Nov 16, 2025 Boomerang Distillation Enables Zero-Shot Model Size Interpolation
Nov 15, 2025 To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning
Nov 15, 2025 Training Language Models to Self-Correct via Reinforcement Learning
Nov 14, 2025 ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
Nov 13, 2025 Large Language Diffusion Models
Nov 12, 2025 Teaching AI to see the world more like we do
Nov 11, 2025 Self-Adapting Language Models
Nov 11, 2025 The Surprising Effectiveness of Test-Time Training for Few-Shot Learning
Nov 11, 2025 Reasoning or reciting? Exploring the capabilities and limitations of language models through counterfactual tasks
Nov 08, 2025 Nested Learning: The Illusion of Deep Learning Architecture
Nov 04, 2025 LoRA Without Regret