Domain-Aware Scaling Laws Uncover Data Synergy
Data domain synergy in scaling laws
- data from different domains has synergy: combining them helps more than each does on its own (e.g. math and code work well together)
- adapt the Chinchilla scaling law to include two kinds of terms: first-order effects (how much each domain contributes directly to the benchmark) and second-order effects (synergy among domains)
- fit the scaling law and inspect the coefficients: the fitted synergy terms recover associations that make sense
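
The fit-and-inspect step can be illustrated with a toy sketch. These notes do not give the exact functional form, so everything here is an assumption: the model is simplified to a benchmark score that is linear in per-domain features x_i plus pairwise products x_i * x_j, rather than the adapted Chinchilla law itself. The domain names, the synthetic data, and the planted coefficients are hypothetical and chosen only to show how a synergy term would be read off after fitting.

```python
import itertools
import numpy as np


def design_matrix(x):
    """Build columns: intercept, one per domain (first order),
    and one per domain pair (second order, x_i * x_j)."""
    n, k = x.shape
    pairs = list(itertools.combinations(range(k), 2))
    cols = [np.ones(n)]
    cols += [x[:, i] for i in range(k)]
    cols += [x[:, i] * x[:, j] for i, j in pairs]
    return np.column_stack(cols), pairs


def fit_synergy(x, y):
    """Ordinary least squares fit; returns intercept, first-order
    coefficients, and a dict mapping domain pairs to synergy coefficients."""
    X, pairs = design_matrix(x)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    k = x.shape[1]
    first = coef[1:1 + k]
    second = dict(zip(pairs, coef[1 + k:]))
    return coef[0], first, second


# Hypothetical setup: three domains, 200 synthetic training mixtures.
domains = ["math", "code", "web"]
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(200, 3))  # assumed per-domain data features

# Planted ground truth (assumption): only math and code are synergistic.
true_first = np.array([0.5, 0.4, 0.2])
true_second = {(0, 1): 0.3, (0, 2): 0.0, (1, 2): 0.0}

y = 1.0 + x @ true_first
for (i, j), c in true_second.items():
    y = y + c * x[:, i] * x[:, j]
y = y + rng.normal(0.0, 0.01, size=len(y))  # small observation noise

intercept, first, second = fit_synergy(x, y)

# The pair with the largest fitted second-order coefficient.
strongest = max(second, key=second.get)
print("strongest synergy:", domains[strongest[0]], "+", domains[strongest[1]])
```

In this toy setting the largest second-order coefficient falls on the pair that was planted as synergistic, which mirrors the "look at the coefficients" step. A real fit would use the nonlinear scaling-law form and measured benchmark results instead of a linear model on synthetic data.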