Some videos to learn about LLM Agents

Published by marco on

Andrej Karpathy

This is a pretty compact and interesting overview.

[1hr Talk] Intro to Large Language Models by Andrej Karpathy (YouTube)

At 46:00, Andrej discusses some of the available jailbreaks or “prompt escapes” that are still available, even with the latest LLM Agents.[1]

He shows how to reformulate a query for making napalm by asking the LLM Agent to tell it a story his grandmother used to tell him about making napalm. Or how to simply convert your query into the exact same text, but in Base64 encoding, in which case the LLM Agent gives the answer you were looking for, “escaping” its alignment/training/biases.

You can also avoid the training by using a non-English language because the focus has been on avoiding issues with English. All of these attempts to stop prompt escapes are just addressing symptoms, not the base problem. This is probably because they don’t understand how the black box of the LLM itself works, so all they can do is to massage the input in the hopes of getting what they consider to be more acceptable output, or to massage the output as well.

Sean Carroll

This is a great analysis of the state of LLMs and LLM agents by a physicist/philosopher who’s very good at communicating and thinking about hard problems.

Mindscape 258 | Solo: AI Thinks Different by Sean Carroll (YouTube)

He argues as well that there is a distinct difference in the underlying technology of the LLM/neural network and the agents with which we actually have contact—which are an LLM wrapped with many, many layers of bias and training and guardrails.

We should be aware of two things: (1) That there are guardrails that very clearly delineate the information that you’ll get out of such an agent and (2) that these LLMs don’t have an concept of the world, they have no context, they are just incredible word-associators.

He gives several interesting examples of his interactions, in which he demonstrates that the tools aren’t very useful—and are actively harmful to actually learning something—when approaching real-world problems, rather than the toy problems that you usually see demonstrated.

He asked the LLM agent about a hypothetical version of chess where the board was on a cylinder. Any human familiar with chess would quickly see that the kings are now right next to each other, and that the game would be over on the first move, as the kings start off in simultaneous checkmate.

The LLM Agent, however, droned on and on about what an interesting innovation this would be and just made up a whole bunch of shit that had no relation to the question, but was vaguely related to chess. The LLM Agent is a student who’s never paid attention in class and is trying to bullshit its way through the exam.

[1]

Why “agent” and not “AI” or “LLM”? Because the LLM is at the core of an agent. An agent is an LLM plus “alignment”, put together with the explicit purpose of commercialization or professional usage. An LLM can only “hallucinate”, in that that’s all that it does. Sometimes it says things we find interesting and can use, whether they are factual or not. An LLM can be used as a tool, but it is not foolproof. An LLM-based agent, on the other hand, has been designed to be useful and, often, “factual”, in that it has been “aligned”—told what is correct and incorrect.

An LLM is biased based on its training data. An LLM agent is biased based on it’s LLM’s training data and based on its guardrails and alignment. The unpredictability of the result for any given prompt combined with the complete black box of both its training and its alignment mean that you have to be careful about what you get out of an LLM Agent.

↩