-
Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search
Stratego
-
Domain-Aware Scaling Laws Uncover Data Synergy
Data domain synergy in scaling laws
-
Data Debiasing with Datamodels (D3M): Improving Subgroup Robustness via Data Selection
Debiasing data using TRAK
-
Ambient Diffusion Omni: Training Good Models with Bad Data
Diffusion using low-quality data
-
Boomerang Distillation Enables Zero-Shot Model Size Interpolation
Boomerang Distillation