This page shows the source for this entry, with WebCore formatting language tags and attributes highlighted.

Title

Andrej Karpathy explains LLM construction and training

Description

This is a 210-minute video about how LLMs are built and trained. What works? What doesn't? The whole thing is well worth your time if you're at all interested in learning what the inherent limitations of these tools are, so you can better leverage them. For example, the section "models need tokens to think" was great.

<media href="https://www.youtube.com/watch?v=7xTGNNLPyMI" src="https://www.youtube.com/v/7xTGNNLPyMI" width="560px" source="YouTube" author="Andrej Karpathy" caption="Deep Dive into LLMs like ChatGPT">

<ul>
00:00:00 introduction
00:01:00 pretraining data (internet)
00:07:47 tokenization
00:14:27 neural network I/O
00:20:11 neural network internals
00:26:01 inference
00:31:09 GPT-2: training and inference
00:42:52 Llama 3.1 base model inference
00:59:23 pretraining to post-training
01:01:06 post-training data (conversations)
01:20:32 hallucinations, tool use, knowledge/working memory
01:41:46 knowledge of self
01:46:56 models need tokens to think
02:01:11 tokenization revisited: models struggle with spelling
02:04:53 jagged intelligence
02:07:28 supervised finetuning to reinforcement learning
02:14:42 reinforcement learning
02:27:47 DeepSeek-R1
02:42:07 AlphaGo
02:48:26 reinforcement learning from human feedback (RLHF)
03:09:39 preview of things to come
03:15:15 keeping track of LLMs
03:18:34 where to find LLMs
03:21:46 grand summary
</ul>