-
Fast KV Compaction via Attention Matching
Zweiger KV compaction
-
BRIDGE: Predicting Human Task Completion Time From Model Performance
Predicting human task completion time from IRT difficulty
-
HunyuanVideo: A Systematic Framework For Large Video Generative Models
Hunyuan video
-
Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing
activation patching and attention maps in MM-DiTs
-
Unraveling MMDiT Blocks: Training-free Analysis and Enhancement of Text-conditioned Diffusion
scaling text embeddings for image editing