Neural Thickets reframes the outcome of pretraining as a local distribution over weights: in large, well-pretrained models, many parameter vectors near the checkpoint are already task-improving experts, so simple random sampling, top-k selection, and majority-vote ensembling can rival post-training methods such as PPO and GRPO.
The foundational algorithm behind Neural Thickets: a zero-training approach that discovers task-specific experts by perturbing pretrained weights and ensembling the best candidates.
Add Gaussian noise: θ̃ = θ + σε, with ε ~ 𝒩(0, I)
Score each candidate in parallel across GPUs
Keep top-k by task performance
Aggregate via majority vote
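The four steps above can be sketched end to end on a toy problem. This is a minimal illustration, not the project's actual implementation: the linear classifier, the synthetic task, and all hyperparameters (σ, candidate count, k) are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

dim, n_classes = 2, 2
# Toy "pretrained" weights, mildly misaligned with the task
# (columns are per-class logit directions).
theta = np.array([[-1.0, 1.0],
                  [ 0.3, -0.3]])

# Synthetic task: label is 1 when x0 - x1 > 0.
X = rng.normal(size=(200, dim))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

def predict(w, X):
    return (X @ w).argmax(axis=1)

def accuracy(w):
    return float((predict(w, X) == y).mean())

sigma, n_candidates, k = 0.5, 64, 8

# 1) Add Gaussian noise: theta_tilde = theta + sigma * eps
candidates = [theta + sigma * rng.normal(size=theta.shape)
              for _ in range(n_candidates)]

# 2) Score each candidate (embarrassingly parallel across GPUs in practice).
scores = np.array([accuracy(w) for w in candidates])

# 3) Keep the top-k candidates by task performance.
top = [candidates[i] for i in np.argsort(scores)[-k:]]

# 4) Aggregate via majority vote over the top-k experts' predictions.
votes = np.stack([predict(w, X) for w in top])  # shape (k, n_points)
ensemble = np.array([np.bincount(col, minlength=n_classes).argmax()
                     for col in votes.T])

print("best single candidate accuracy:", scores.max())
print("majority-vote ensemble accuracy:", (ensemble == y).mean())
```

In a real model the scoring loop is the expensive part, which is why the method shards candidates across devices; the perturbation and vote steps are trivial by comparison.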
Neural Thickets opens multiple interconnected research directions—from theoretical foundations to practical applications. Each thrust feeds back into the others.
Why are task experts so densely distributed near pretrained weights? We characterize the local geometry, connectivity, and mode structure of the weight-space thicket.
How does thicket density change with model size, data volume, and pretraining compute? We derive and validate scaling laws that predict when thickets become exploitable.
Deploying models to new tasks without any gradient updates. We explore applications in low-resource languages, medical imaging, and on-device personalization.
Random search is just the beginning. We are developing evolutionary, gradient-guided, and structured perturbation methods to navigate thickets orders of magnitude faster.
Thicket ensembles resemble posterior sampling. We formalize the link to Bayesian deep learning, uncertainty quantification, and model diversity theory.
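As a one-line sketch of the analogy (our notation, not the project's): averaging the top-k experts' predictions, p(y | x) ≈ (1/k) Σᵢ p(y | x, θᵢ) with θᵢ = θ + σεᵢ kept by task score, has the same form as Bayesian model averaging with θᵢ playing the role of approximate posterior samples around the pretrained weights.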
We welcome collaborators. Reach out to propose a new direction.
PhD Student, MIT EECS
Associate Professor, MIT EECS