Domain-Aware Scaling Laws Uncover Data Synergy

Data domain synergy in scaling laws

  • data from different domains has synergy (e.g. math and code work well together)
  • adapt the Chinchilla scaling law to include two terms: first-order effects (how much each domain directly contributes to the benchmark) and second-order effects (synergy among domains)
  • fit the scaling law, then inspect the fitted coefficients: they recover domain associations that make sense
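
As a rough illustration of the last two bullets, the sketch below fits first-order and pairwise coefficients on synthetic data. The notes do not give the actual functional form, so everything here is an assumption: the effective-token form D_eff = sum_i w_i D_i + sum_{i<j} s_ij sqrt(D_i D_j) substituted into the Chinchilla data term of L = E + A/N^alpha + B/D^beta, the constants E, A, ALPHA, B, BETA (treated as already fitted and held fixed), the domain names, and the helper names features, predict_loss, and fit_coefficients.

```python
import itertools
import numpy as np

# Hypothetical Chinchilla-style constants, assumed already fitted and held fixed.
E, A, ALPHA, B, BETA = 1.7, 400.0, 0.34, 2.0, 0.28

def features(tokens):
    """Columns: per-domain tokens D_i, then sqrt(D_i * D_j) for every pair i < j."""
    tokens = np.asarray(tokens, dtype=float)
    pairs = list(itertools.combinations(range(tokens.shape[1]), 2))
    second = np.stack(
        [np.sqrt(tokens[:, i] * tokens[:, j]) for i, j in pairs], axis=1
    )
    return np.hstack([tokens, second]), pairs

def predict_loss(n_params, tokens, coef):
    """Assumed form: L = E + A / N^alpha + B / D_eff^beta, with D_eff linear in the features."""
    x, _ = features(tokens)
    d_eff = x @ coef
    return E + A / n_params**ALPHA + B / d_eff**BETA

def fit_coefficients(n_params, tokens, loss):
    """Invert the data term to get D_eff per run, then solve ordinary least squares."""
    x, pairs = features(tokens)
    d_eff = (B / (loss - E - A / n_params**ALPHA)) ** (1.0 / BETA)
    coef, *_ = np.linalg.lstsq(x, d_eff, rcond=None)
    k = np.asarray(tokens).shape[1]
    return coef[:k], dict(zip(pairs, coef[k:]))

# Synthetic, noise-free runs over three hypothetical domains (tokens in billions).
domains = ["math", "code", "web"]
rng = np.random.default_rng(0)
tokens = rng.uniform(1.0, 50.0, size=(40, 3))
n_params = rng.choice([1e8, 3e8, 1e9], size=40)

true_w = np.array([1.0, 0.8, 0.5])    # planted first-order weights
true_s = np.array([0.6, 0.0, -0.1])   # planted synergies for pairs (0,1), (0,2), (1,2)
loss = predict_loss(n_params, tokens, np.concatenate([true_w, true_s]))

first_order, synergy = fit_coefficients(n_params, tokens, loss)
for name, w in zip(domains, first_order):
    print(f"first-order {name}: {w:.3f}")
for (i, j), s in synergy.items():
    print(f"synergy {domains[i]}/{domains[j]}: {s:.3f}")
```

On these noise-free synthetic runs the fit returns the planted coefficients, and the largest pairwise term is the math/code pair. On real runs one would more likely fit all parameters jointly with a nonlinear optimizer; holding the Chinchilla constants fixed is only a simplification that makes the problem linear.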