-
Domain-Aware Scaling Laws Uncover Data Synergy
Data domain synergy in scaling laws
-
Data Debiasing with Datamodels (D3M): Improving Subgroup Robustness via Data Selection
Debiasing data using TRAK
-
Ambient Diffusion Omni: Training Good Models with Bad Data
Diffusion using low-quality data
-
Boomerang Distillation Enables Zero-Shot Model Size Interpolation
Boomerang Distillation
-
To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning
Backtracking vs. best-of-n