Reading Notes

Notes and summaries from reading ML/AI papers (and some blog posts). All credit for the content of the papers and blog posts goes to the original authors.

Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent

SAIA

2 min read · November 24, 2025

2025 · distillation
Continuous Language Model Interpolation yields Dynamic and Controllable Text Generation

Continuous model interpolation with LoRA

1 min read · November 23, 2025

2025 · distillation
Natural Emergent Misalignment from Reward Hacking in Production RL

Anthropic emergent misalignment

5 min read · November 23, 2025

2025 · distillation
A Mathematical Framework for Transformer Circuits

Transformer Circuits Framework

6 min read · November 22, 2025

2025 · distillation
Value Augmented Sampling for Language Model Alignment and Personalization

Value Augmented Sampling

3 min read · November 19, 2025

2025 · distillation