Andrej Karpathy explains LLM construction and training

Published by marco on

This is a 210-minute video about how LLMs are built and trained. What works? What doesn’t? The whole thing is well worth your time if you’re at all interested in learning what the inherent limitations are, so you can better leverage these tools. For example, the “models need tokens to think” section was great.

Deep Dive into LLMs like ChatGPT by Andrej Karpathy (YouTube)

  • 00:00:00 introduction
  • 00:01:00 pretraining data (internet)
  • 00:07:47 tokenization
  • 00:14:27 neural network I/O
  • 00:20:11 neural network internals
  • 00:26:01 inference
  • 00:31:09 GPT-2: training and inference
  • 00:42:52 Llama 3.1 base model inference
  • 00:59:23 pretraining to post-training
  • 01:01:06 post-training data (conversations)
  • 01:20:32 hallucinations, tool use, knowledge/working memory
  • 01:41:46 knowledge of self
  • 01:46:56 models need tokens to think
  • 02:01:11 tokenization revisited: models struggle with spelling
  • 02:04:53 jagged intelligence
  • 02:07:28 supervised finetuning to reinforcement learning
  • 02:14:42 reinforcement learning
  • 02:27:47 DeepSeek-R1
  • 02:42:07 AlphaGo
  • 02:48:26 reinforcement learning from human feedback (RLHF)
  • 03:09:39 preview of things to come
  • 03:15:15 keeping track of LLMs
  • 03:18:34 where to find LLMs
  • 03:21:46 grand summary