This page shows the source for this entry, with WebCore formatting language tags and attributes highlighted.

Title

Simon Willison on LLMs

Description

Simon Willison continues to plug along, examining every LLM-related announcement and trying it out on his own machine wherever possible. The following video is a presentation he gave in early August. It's quite interesting and worth the ~40 minutes.

<media href="https://www.youtube.com/watch?v=h8Jth_ijZyY" src="https://www.youtube.com/v/h8Jth_ijZyY" caption="'Catching up on the weird world of LLMs' - Simon Willison (North Bay Python 2023)" author="Simon Willison" width="560px" source="YouTube">

At some point, he says:

<bq>This is Vicuna 7b. It is a large language model. It is a 4.2GB file on my computer right now. [...] If you open up that file, it's just numbers. These things are giant, binary blobs of numbers---and anything you do with them just involves vast amounts of matrix multiplication. And that's it. That's the whole thing. It's this opaque blob that can do all sorts of weird and interesting things.</bq>

His description suggests <i>mystery</i>, but he's really just described an executable file with machine code in it. Actually, he's described <i>any file</i>, which, absent any form of inferred encoding that we consider to be "human-readable", is just 1s and 0s.

What is a file, actually? It's just a set of bunched electron configurations in a special material, where we interpret the bunched parts to be 1s and the sparse parts to be 0s. We interpret those 1s and 0s as a pattern that we call a "file system". The structure is a language that we've invented to express complexity, and there are several layers of it. The material contains these bunches, and we have circuits to read them out <i>reliably</i>. Those sequences of 1s and 0s are interpreted as <i>bytes</i> and, through the lens of two's complement, as <i>numbers</i> of various sizes. Some of those numbers we call <i>characters</i>, which we interpret with a specific <i>encoding</i>.

The only difference is that we understand the instruction set of the machine code: we understand the virtual machine for which it forms instructions. We ought to: we built it all. The LLM, on the other hand, is an opaque runtime that we don't really understand, in the sense that we didn't design the circuits or the instruction set. All we know is that it has an input and output system onto which we can build plugins that allow us to use natural language to poke it, and to interpret its results as natural language. It's a mysterious process, but not for the reasons implied by the description above. A giant heap of numbers is a description of <i>any file</i>, even text files. The only reason we understand them as "text files" is that we assume an encoding for the 1s and 0s and derive meaning from there.
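To make that concrete, here's a minimal Python sketch (the example bytes are mine, chosen purely for illustration): the same handful of bytes reads as plain numbers, as signed integers under two's complement, or as text, depending entirely on the interpretation we bring to it.

<code>
# The same eight bytes, interpreted three different ways.
data = bytes([72, 101, 108, 108, 111, 33, 255, 254])

# 1. As raw numbers---what the circuitry hands us once bunched charge
#    has been read out as 1s and 0s and grouped into bytes.
print(list(data))                                    # [72, 101, 108, 108, 111, 33, 255, 254]

# 2. As signed integers, through the lens of two's complement.
print([b - 256 if b > 127 else b for b in data])     # [72, 101, 108, 108, 111, 33, -1, -2]

# 3. As text, assuming an encoding. Latin-1 happily decodes every byte;
#    UTF-8 rejects the last two because they aren't valid in that encoding.
print(data.decode("latin-1"))                        # Hello!ÿþ
print(data.decode("utf-8", errors="replace"))        # Hello!?? (replacement characters)
</code>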
Notes:

<ul>
The analogy he draws between the iOS keyboard prediction and LLMs is a good one: it's just a matter of scale.
It's interesting to see how he uses the LLM in his daily work.
He also shows tools that he's written that incorporate LLMs (e.g., one that reformulates his queries as more sophisticated prompts that are more likely to return usable results).
<div>He mentions several times that people are just poking around at these things, with little rhyme or reason to it. He cites one example of how it took two years for someone to discover that the model returns more reliable answers when you ask it to <iq>go step by step</iq> (see the sketch after this list). There might be a plethora of other goodies like that hidden in there---or there might be nothing.
I am, once again, reminded of <i><a href="{app}/view_article.php?id=3230">Roadside Picnic</a></i>.</div>
<div>He goes on to discuss the data that contributed to these models, and how he's <iq>very concerned</iq> about the provenance of most of it. He doesn't get into it more than that, but I will. Essentially, the same companies that will sue the ever-loving Christ out of anyone who uses anything of theirs that they claim to have copyrighted now simultaneously claim that their complete and utter disregard for copyright protection is obviously the thing that we want to do, because otherwise how would we even get all of this awesome stuff from which we're hoping to profit immensely? So, they're basically arguing that they can steal content from everyone without actually allowing anyone else to participate in this glorious world in which it's OK to use each other's content without permission. A nice trick, available only to very wealthy companies and individuals.</div>
He is one in a long line of people who are impressed by the way that these models can translate to French (or whatever) even though he can't actually read French---because a cursory glance at the translation shows that it's not at all an accurate rendering of the original, missing many idioms, etc.
</ul>
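As for the <iq>go step by step</iq> discovery mentioned in the notes above: the trick is literally just a phrase appended to the prompt. Here's a minimal sketch using Willison's own <i>llm</i> Python package; the model name and the example question are my choices, and it assumes an OpenAI key is already configured for the package.

<code>
import llm  # Simon Willison's `llm` package: pip install llm

# Assumes an OpenAI key has been configured (e.g. via `llm keys set openai`).
model = llm.get_model("gpt-3.5-turbo")

question = (
    "A bat and a ball cost $1.10 together. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?"
)

# Asked directly, models have often blurted out the intuitive (wrong) answer: $0.10.
print(model.prompt(question).text())

# The same question, with the magic phrase appended.
print(model.prompt(question + " Let's go step by step.").text())
</code>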