distillation | Chris Ge

Jul 26, 2026	Video Generation Models are General-Purpose Vision Learners
Jul 26, 2026	Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis
Jul 14, 2026	AIDE²: The First Evidence of Recursive Self-Improvement
May 18, 2026	Training Agents Inside of Scalable World Models
Apr 13, 2026	What’s in the Image? A Deep-Dive into the Vision of Vision Language Models
Apr 08, 2026	Generative Modeling via Drifting
Apr 04, 2026	FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space
Apr 01, 2026	MaxRL: Maximum Likelihood via Reinforcement Learning
Mar 22, 2026	Circuit Mechanisms for Spatial Relation Generation in Diffusion Transformers
Mar 21, 2026	Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs
Mar 20, 2026	Position-aware Automatic Circuit Discovery
Mar 19, 2026	Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
Mar 18, 2026	Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights
Mar 17, 2026	Attention Residuals
Mar 16, 2026	Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
Mar 15, 2026	Causal Abstractions of Neural Networks
Feb 26, 2026	End-to-End Test-Time Training for Long Context
Feb 26, 2026	Interpreting Physics in Video World Models
Feb 21, 2026	Language Models use Lookbacks to Track Beliefs
Feb 20, 2026	Fast KV Compaction via Attention Matching
Feb 17, 2026	BRIDGE: Predicting Human Task Completion Time From Model Performance
Feb 16, 2026	HunyuanVideo: A Systematic Framework For Large Video Generative Models
Feb 14, 2026	Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing
Feb 14, 2026	Unraveling MMDiT Blocks: Training-free Analysis and Enhancement of Text-conditioned Diffusion
Feb 13, 2026	Stable Flow: Vital Layers for Training-Free Image Editing
Feb 13, 2026	Localizing Knowledge in Diffusion Transformers
Feb 12, 2026	Tutorial on Diffusion Models for Imaging and Vision
Feb 11, 2026	Learning to Discover at Test Time
Feb 10, 2026	ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features
Feb 09, 2026	Recursive Language Models
Feb 09, 2026	FLUX.2: Analyzing and Enhancing the Latent Space of FLUX – Representation Comparison
Feb 08, 2026	Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Feb 08, 2026	Scalable Diffusion Models with Transformers
Feb 07, 2026	Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
Jan 24, 2026	The Adolescence of Technology
Jan 20, 2026	Diffusion Meets Flow Matching: Two Sides of the Same Coin
Jan 12, 2026	mHC: Manifold-Constrained Hyper-Connections
Jan 10, 2026	A Rosetta Stone for AI Benchmarks
Jan 06, 2026	Measuring AI Ability to Complete Long Software Tasks
Jan 06, 2026	On scalable oversight with weak LLMs judging strong LLMs
Jan 05, 2026	Reliable and Efficient Amortized Model-based Evaluation
Dec 26, 2025	SliderSpace: Decomposing the Visual Capabilities of Diffusion Models
Dec 25, 2025	Generative Modeling by Estimating Gradients of the Data Distribution
Dec 25, 2025	What are Diffusion Models?
Dec 24, 2025	Building Diffusion Model's theory from ground up
Dec 07, 2025	Weight-sparse transformers have interpretable circuits
Dec 07, 2025	Between the Bars: Gradient-based Jailbreaks are Bugs that induce Features
Dec 07, 2025	Breakpoint: Scalable evaluation of system-level reasoning in LLM code agents
Dec 06, 2025	Fluid Language Model Benchmarking
Dec 05, 2025	Auditing language models for hidden objectives
Dec 04, 2025	DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Dec 03, 2025	DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
Nov 30, 2025	Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
Nov 24, 2025	ML Tea: Planning and Problem-Solving with General, Scalable Neuro-Symbolic Models
Nov 24, 2025	Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent
Nov 23, 2025	Continuous Language Model Interpolation yields Dynamic and Controllable Text Generation
Nov 23, 2025	Natural Emergent Misalignment from Reward Hacking in Production RL
Nov 22, 2025	A Mathematical Framework for Transformer Circuits
Nov 19, 2025	Value Augmented Sampling for Language Model Alignment and Personalization
Nov 18, 2025	Deriving Muon
Nov 18, 2025	Curiosity-driven Red-teaming for Large Language Models
Nov 18, 2025	Guided Speculative Inference for Efficient Test-Time Alignment of LLMs
Nov 18, 2025	Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search
Nov 17, 2025	Domain-Aware Scaling Laws Uncover Data Synergy
Nov 17, 2025	Data Debiasing with Datamodels (D3M): Improving Subgroup Robustness via Data Selection
Nov 17, 2025	Ambient Diffusion Omni: Training Good Models with Bad Data
Nov 16, 2025	Boomerang Distillation Enables Zero-Shot Model Size Interpolation
Nov 15, 2025	To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning
Nov 15, 2025	Training Language Models to Self-Correct via Reinforcement Learning
Nov 14, 2025	ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
Nov 13, 2025	Large Language Diffusion Models
Nov 12, 2025	Teaching AI to see the world more like we do
Nov 11, 2025	Self-Adapting Language Models
Nov 11, 2025	The Surprising Effectiveness of Test-Time Training for Few-Shot Learning
Nov 11, 2025	Reasoning or reciting? Exploring the capabilities and limitations of language models through counterfactual tasks
Nov 08, 2025	Nested Learning: The Illusion of Deep Learning Architecture
Nov 04, 2025	LoRA Without Regret