-
The Surprising Effectiveness of Test-Time Training for Few-Shot Learning
TTT for ARC and BBH
-
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
Reasoning or Reciting
-
Nested Learning: The Illusion of Deep Learning Architectures
DeepMind Nested Learning short paper
-
LoRA Without Regret
LoRA blog post