Title
Andrej Karpathy explains LLM construction and training
Description
This is a 210-minute video about how LLMs are built and trained. What works? What doesn't? The whole thing is well worth your time if you're at all interested in learning what the inherent limitations are, so you can better leverage these tools. For example, the section "models need tokens to think" was great.
<media href="https://www.youtube.com/watch?v=7xTGNNLPyMI" src="https://www.youtube.com/v/7xTGNNLPyMI" width="560px" source="YouTube" author="Andrej Karpathy" caption="Deep Dive into LLMs like ChatGPT">
<ul>
00:00:00 introduction
00:01:00 pretraining data (internet)
00:07:47 tokenization
00:14:27 neural network I/O
00:20:11 neural network internals
00:26:01 inference
00:31:09 GPT-2: training and inference
00:42:52 Llama 3.1 base model inference
00:59:23 pretraining to post-training
01:01:06 post-training data (conversations)
01:20:32 hallucinations, tool use, knowledge/working memory
01:41:46 knowledge of self
01:46:56 models need tokens to think
02:01:11 tokenization revisited: models struggle with spelling
02:04:53 jagged intelligence
02:07:28 supervised finetuning to reinforcement learning
02:14:42 reinforcement learning
02:27:47 DeepSeek-R1
02:42:07 AlphaGo
02:48:26 reinforcement learning from human feedback (RLHF)
03:09:39 preview of things to come
03:15:15 keeping track of LLMs
03:18:34 where to find LLMs
03:21:46 grand summary
</ul>