How Neural Networks Think at Scale
This document explores how neural networks represent and store information as they scale. It introduces the concept of a privileged basis, in which individual neurons, rather than arbitrary directions in activation space, are the natural units to inspect; this is necessary for neurons to be interpretable. However, real-world data usually contains more features than there are neurons available, leading to “polysemantic” neurons, that is, neurons that respond to multiple unrelated features.
The key idea presented is the Superposition Hypothesis. It suggests that a neural network packs in more features than it has neurons by letting several features share the same neurons. This sharing can cause interference between features, but it allows the network to represent more features than its neuron count would otherwise permit. This efficient packing and sharing of information is central to how neural networks manage complex tasks with limited resources.
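The trade-off between packing and interference can be illustrated with a small toy sketch. It is not taken from the document being summarized: the choice of three features in two neurons, the evenly spaced directions, the ReLU readout, and the names `DIRECTIONS`, `encode`, `decode`, and `interference` are all assumptions made for illustration.

```python
import math

# Hypothetical toy setup: 3 features stored in only 2 "neurons" (dimensions).
# Each feature gets a unit-length direction. The directions are spaced
# 120 degrees apart, so they cannot all be orthogonal to one another.
NUM_FEATURES = 3
DIRECTIONS = [
    (math.cos(2 * math.pi * k / NUM_FEATURES),
     math.sin(2 * math.pi * k / NUM_FEATURES))
    for k in range(NUM_FEATURES)
]


def encode(features):
    """Add up each feature's direction, weighted by its activation."""
    x = sum(f * d[0] for f, d in zip(features, DIRECTIONS))
    y = sum(f * d[1] for f, d in zip(features, DIRECTIONS))
    return (x, y)


def decode(hidden):
    """Read each feature back by projecting onto its direction, then ReLU."""
    return [max(0.0, hidden[0] * d[0] + hidden[1] * d[1]) for d in DIRECTIONS]


def interference(i, j):
    """Dot product of two feature directions (0 would mean no interference)."""
    return (DIRECTIONS[i][0] * DIRECTIONS[j][0]
            + DIRECTIONS[i][1] * DIRECTIONS[j][1])


sparse = decode(encode([1.0, 0.0, 0.0]))  # only one feature active
dense = decode(encode([1.0, 1.0, 0.0]))   # two features active at once

print("overlap between features 0 and 1:", round(interference(0, 1), 3))
print("one active feature  ->", [round(v, 3) for v in sparse])
print("two active features ->", [round(v, 3) for v in dense])
```

In this sketch a single active feature is read back cleanly, because the ReLU clips the negative overlap with the other two directions. When two features are active together, each is recovered at roughly half strength, which is the interference cost of storing three features in two neurons.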
Foundations of Large Language Models
This textbook offers an introduction to the core concepts behind large language models (LLMs) like GPT and BERT. The book is structured into five main chapters:
- Pre-training (methods and model architectures)
- Generative Models (scaling and training on long texts)
- Prompting (prompt design and advanced strategies such as chain-of-thought)
- Alignment (instruction fine-tuning, human feedback)
- Inference (decoding, acceleration, and scaling at inference time)
The book explains how pre-training on large, unlabeled datasets enables the creation of “foundation models” that can be adapted for a wide range of tasks. This marks a shift in NLP from task-specific models trained from scratch to versatile, pre-trained models. The book covers different neural network architectures, fine-tuning, prompt engineering, and alignment with human preferences. The content is accessible to readers with or without deep technical backgrounds and is part of an open-access educational resource on neural networks and LLMs. Excellent X/Twitter summary by Alex Prompter.
