Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights
For larger-scale models, simply adding noise to the weights after pre-training gives you a high chance of landing on a specialist set of weights that performs well on one particular downstream task (Figure 2 below).
- This suggests the choice of post-training algorithm may not be that important.
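A minimal sketch of this perturbation step. This is not the paper's code: the weights are modeled as a simple name-to-float dict, and `perturb_weights` and `sigma` are illustrative names.

```python
import random

def perturb_weights(weights, sigma=0.01, seed=None):
    """Return a copy of the weights with i.i.d. Gaussian noise
    (std = sigma) added to every parameter. With real models this
    would be applied per-tensor, not per-scalar."""
    rng = random.Random(seed)
    return {name: w + rng.gauss(0.0, sigma) for name, w in weights.items()}
```

Each call with a different seed yields a different point in the "thicket" around the pretrained weights; the claim is that many of these points are already task specialists.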

- Figure 2: this effect is specific to larger-scale models.

- Figure 3: further evidence that at smaller scale, perturbing the weights yields far less diverse and less capable models.

- Figure 4: these perturbed models are genuinely specialists, not generalists.
- They boost performance by sampling such randomly perturbed models, keeping the top fifty, and taking a majority vote over (ensembling) their outputs (RandOpt).
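The RandOpt procedure described above can be sketched as follows. This is a toy reconstruction, not the paper's implementation: weights are a name-to-float dict, and `score_fn` (validation score for a candidate) and `predict_fn` (candidate's answer on an input) are assumed callables.

```python
import random
from collections import Counter

def rand_opt(base_weights, score_fn, predict_fn, inputs,
             n_samples=500, top_k=50, sigma=0.01, seed=0):
    """Sample random perturbations of the base weights, keep the
    top_k by validation score, and majority-vote their predictions."""
    rng = random.Random(seed)
    # Sample random perturbations of the pretrained weights.
    candidates = [
        {n: w + rng.gauss(0.0, sigma) for n, w in base_weights.items()}
        for _ in range(n_samples)
    ]
    # Keep the top_k candidates by a held-out validation score.
    top = sorted(candidates, key=score_fn, reverse=True)[:top_k]
    # Majority vote across the selected specialists for each input.
    outputs = []
    for x in inputs:
        votes = Counter(predict_fn(w, x) for w in top)
        outputs.append(votes.most_common(1)[0][0])
    return outputs
```

The key point is that no gradient step is taken anywhere: selection plus voting over randomly perturbed weights is doing all the work.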

- Figure 9: the weights in the thickets can improve both reasoning quality and final-answer accuracy.