Data Debiasing with Datamodels (D3M): Improving Subgroup Robustness via Data Selection

Debiasing data using TRAK

  • problem: biased training data can cause worse performance on specific subgroups. Standard debiasing methods such as dataset balancing tend to throw away large portions of the dataset, which isn’t ideal.
  • uses a predictive data attribution method (TRAK) to estimate which training points most hurt the worst group’s performance on a small validation dataset, then removes only those training points
  • this paper seems mostly just like an application of TRAK
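
To make the selection step concrete, here is a minimal sketch, not the paper's exact scoring rule. It assumes an attribution matrix has already been computed by some method such as TRAK, where entry [i, j] estimates how much training point i helps (positive) or hurts (negative) the model on validation point j. The function name, argument names, and the simple "average over the worst group" score are all hypothetical choices for illustration.

```python
import numpy as np

def select_points_to_remove(attributions, val_groups, val_correct, k):
    """Pick the k training points estimated to hurt the worst group most.

    attributions: (n_train, n_val) array of precomputed attribution scores
                  (assumed convention: higher = more helpful).
    val_groups:   (n_val,) group label of each validation point.
    val_correct:  (n_val,) whether the current model gets each point right.
    k:            number of training points to remove.
    """
    attributions = np.asarray(attributions, dtype=float)
    val_groups = np.asarray(val_groups)
    val_correct = np.asarray(val_correct, dtype=bool)

    # Find the validation group with the lowest accuracy.
    groups = np.unique(val_groups)
    accuracies = np.array([val_correct[val_groups == g].mean() for g in groups])
    worst_group = groups[np.argmin(accuracies)]

    # Average each training point's estimated effect over that group.
    worst_effect = attributions[:, val_groups == worst_group].mean(axis=1)

    # Most negative effect first: these are the removal candidates.
    to_remove = np.argsort(worst_effect)[:k]
    return worst_group, to_remove
```

The model would then be retrained on the training set with the `to_remove` indices dropped. How many points to remove (`k`) is left as a free parameter in this sketch.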