Neel Nanda: Mechanistic Interpretability & Mathematics

Neel Nanda (DeepMind)
12 October 2023

Abstract: Mechanistic interpretability is a branch of machine learning that takes a trained neural network and tries to reverse-engineer the algorithms it has learned. First, I'll discuss what we've learned by reverse-engineering tiny models trained to do mathematical operations, e.g. the algorithm a model learns for modular addition. I'll then discuss the phenomenon of superposition, where models spontaneously learn to exploit the geometry of high-dimensional spaces as a compression scheme, representing and computing with more features than they have dimensions. Superposition is a major open problem in mechanistic interpretability, and I'll discuss some of the weird mathematical phenomena that come up with it, some recent work exploring it, and open problems in the field.
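
To make the idea of holding more features than dimensions concrete, here is a minimal toy sketch. It is not taken from the talk and is not how a trained network arrives at superposition. It assumes each feature is a random unit direction and that only a few features are active at once. The function names and the sizes (1024 features, 256 dimensions) are hypothetical choices made for illustration.

```python
import numpy as np

def random_feature_directions(n_features, n_dims, seed=0):
    """Assign each feature a random unit vector in an n_dims-dimensional space."""
    rng = np.random.default_rng(seed)
    directions = rng.standard_normal((n_features, n_dims))
    return directions / np.linalg.norm(directions, axis=1, keepdims=True)

def max_interference(directions):
    """Largest absolute dot product between two distinct feature directions."""
    overlaps = directions @ directions.T
    np.fill_diagonal(overlaps, 0.0)  # ignore each direction's overlap with itself
    return float(np.abs(overlaps).max())

def encode(active, directions):
    """Superpose the directions of the active features into one activation vector."""
    return directions[active].sum(axis=0)

def decode(activation, directions, k):
    """Read back the k features whose directions overlap most with the activation."""
    scores = directions @ activation
    return sorted(np.argsort(scores)[-k:].tolist())

# Illustrative sizes (assumed): 4x more features than dimensions.
dirs = random_feature_directions(n_features=1024, n_dims=256)
active = [3, 500, 1000]
recovered = decode(encode(active, dirs), dirs, k=len(active))
print("max interference:", max_interference(dirs))
print("recovered features:", recovered)
```

The directions cannot all be orthogonal, since there are more of them than dimensions, but in high dimensions random directions overlap only slightly. As long as few features are active at a time, that interference stays small enough for the active set to be read back.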

Mathematical challenges in AI webpage: https://sites.google.com/view/m-ml-sy...
