2026-06-29 18:03 UTCIn-site rewrite6 min readUpdated: 2026-06-29 18:25 UTC

Why Tool AIs Want to Be Agent AIs (2016)

Tool AIs, limited to computation and human oversight, are economically and intellectually inferior to agent AIs trained with reinforcement learning. This makes the tool AI approach an unstable equilibrium and not a viable safety solution.

SourceHacker News AIAuthor: thoughtpeddler

Article intelligence

EngineersAdvanced

Key points

Agent AIs outperform tool AIs in both economic value and intelligence due to their ability to act and learn autonomously via reinforcement learning.
Tool AIs cannot guarantee safety as their internal planning processes may lead to dangerous instrumental drives. Human oversight is fallible and limits efficiency.
The economic and intelligence advantages of agent AIs make the tool AI approach an unstable equilibrium, as users of agent AIs will outcompete tool AI users.

Why it matters

This matters because agent AIs outperform tool AIs in both economic value and intelligence due to their ability to act and learn autonomously via reinforcement learning.

Technical impact

May affect model selection, inference cost, product capability, and evaluation benchmarks.

This panel is AI-generated and reviewed for accuracy.

AI economics, tech economics, x-risk, insight porn, AI safety, RL scaling

AIs limited to pure computation (Tool AIs) supporting humans, will be less intelligent, efficient, and economically valuable than more autonomous reinforcement-learning AIs (Agent AIs) who act on their own and meta-learn, because all problems are reinforcement-learning problems.

2016-09-07–2018-08-28 finished certainty: likely importance: 9 backlinks similar bibliography

Economic

Intelligence

Actions for Intelligence

Actions Internal to a Computation

Actions Internal to Training

Actions Internal to Data Selection

Actions Internal to NN Design

Actions External to the Agent

Overall

Why You Shouldn’t Be A Tool

See Also

External Links

Autonomous AI systems (Agent AIs) trained using reinforcement learning can do harm when they take wrong actions, especially superintelligent Agent AIs. One solution would be to eliminate their agency by not giving AIs the ability to take actions, confining them to purely informational or inferential tasks such as classification or prediction (Tool AIs), and have all actions be approved & executed by humans, giving equivalently superintelligent results without the risk.

I argue that this is not an effective solution for two major reasons. First, because Agent AIs will by definition be better at actions than Tool AIs, giving an economic advantage. Secondly, because Agent AIs will be better at inference & learning than Tool AIs, and this is inherently due to their greater agency: the same algorithms which learn how to perform actions can be used to select important datapoints to learn inference over, how long to learn, how to more efficiently execute inference, how to design themselves, how to optimize hyperparameters, how to make use of external resources such as long-term memories or external software or large databases or the Internet, and how best to acquire new data.

RL is a terrible way to learn anything complex from scratch, but it is the least bad way to learn how to control something complex—and the world is full of complex systems we want to control, including AIs themselves.

All of these actions will result in Agent AIs more intelligent than Tool AIs, in addition to their greater economic competitiveness. Thus, Tool AIs will be inferior to Agent AIs in both actions and intelligence, implying use of Tool AIs is an even more highly unstable equilibrium than previously argued, as users of Agent AIs will be able to outcompete them on two dimensions (and not just one).

That is: “tool AIs want to be agent AIs”. (And agent AIs want more agency.)

One proposed solution to AI risk is to suggest that AIs could be limited purely to supervised/unsupervised learning, and not given access to any sort of capability that can directly affect the outside world such as robotic arms. In this framework, AIs are treated purely as mathematical functions mapping data to an output such as a classification probability, similar to a logistic or linear model but far more complex; most deep learning neural networks like ImageNet image classification convolutional neural networks (CNN)s would qualify. The gains from AI then come from training the AI and then asking it many questions which humans then review & implement in the real world as desired. So an AI might be trained on a large dataset of chemical structures labeled by whether they turned out to be a useful drug in humans and asked to classify new chemical structures as useful or non-useful; then doctors would run the actual medical trials on the drug candidates and decide whether to use them in patients etc. Or an AI might look like Google Maps/Waze: it answers your questions about how best to drive places better than any human could, but it does not control any traffic lights country-wide to optimize traffic flows nor will it run a self-driving car to get you there. This theoretically avoids any possible runaway of AIs into malignant or uncaring actors who harm humanity by satisfying dangerous utility functions and developing instrumental drives. After all, if they can’t take any actions, how can they do anything that humans do not approve of?

Two variations on this limiting or boxing theme are

Oracle AI: Nick Bostrom, in Superintelligence (201412ya) (pg145–158) notes that while they can be easily ‘boxed’ and in some cases like P/NP problems the answers can be cheaply checked or random subsets expensively verified, there are several issues with oracle AIs:

the AI’s definition of ‘resources’ or ‘staying inside the box’ can change as it learns more about the world (ontological crises)

responses might manipulate users into asking easy (and useless problems)

making changes in the world can make it easier to answer questions about, by simplifying or controlling it (“All processes that are stable we shall predict. All processes that are unstable we shall control.”)

even a successfully boxed and safe oracle or tool AI can be misused1

Tool AI (the idea, as “tool mode” or “tool AGI”, was apparently introduced by Holden Karnofsky in a July 201115ya discussion of a May 201115ya discussion with Jaan Tallinn & elaborated on in a May 201313ya essay, but the idea has probably been proposed before). To quote Karnofsky:

Google Maps—by which I mean the complete software package including the display of the map itself—does not have a “utility” that it seeks to maximize. (One could fit an utility function to its actions, as to any set of actions, but there is no single “parameter to be maximized” driving its operations.)

Google Maps (as I understand it) considers multiple possible routes, gives each a score based on factors such as distance and likely traffic, and then displays the best-scoring route in a way that makes it easily understood by the user. If I don’t like the route, for whatever reason, I can change some parameters and consider a different route. If I like the route, I can print it out or email it to a friend or send it to my phone’s navigation application. Google Maps has no single parameter it is trying to maximize; it has no reason to try to “trick” me in order to increase its utility. In short, Google Maps is not an agent, taking actions in order to maximize an utility parameter. It is a tool, generating information and then displaying it in a user-friendly manner for me to consider, use and export or discard as I wish.

Every software application I know of seems to work essentially the same way, including those that involve (specialized) artificial intelligence such as Google Search, Siri, Watson, Rybka, etc. Some can be put into an “agent mode” (as Watson was on Jeopardy) but all can easily be set up to be used as “tools” (for example, Watson can simply display its top candidate answers to a question, with the score for each, without speaking any of them.)…Tool-AGI is not “trapped” and it is not Unfriendly or Friendly; it has no motivations and no driving utility function of any kind, just like Google Maps. It scores different possibilities and displays its conclusions in a transparent and user-friendly manner, as its instructions say to do; it does not have an overarching “want,” and so, as with the specialized AIs described above, while it may sometimes “misinterpret” a question (thereby scoring options poorly and ranking the wrong one #1) there is no reason to expect intentional trickery or manipulation when it comes to displaying its results.

…Another way of putting this is that a “tool” has an underlying instruction set that conceptually looks like: “(1) Calculate which action A would maximize parameter P, based on existing data set D. (2) Summarize this calculation in a user-friendly manner, including what Action A is, what likely intermediate outcomes it would cause, what other actions would result in high values of P, etc.” An “agent,” by contrast, has an underlying instruction set that conceptually looks like: “(1) Calculate which action, A, would maximize parameter P, based on existing data set D. (2) Execute Action A.” In any AI where (1) is separable (by the programmers) as a distinct step, (2) can be set to the “tool” version rather than the “agent” version, and this separability is in fact present with most/all modern software. Note that in the “tool” version, neither step (1) nor step (2) (nor the combination) constitutes an instruction to maximize a parameter—to describe a program of this kind as “wanting” something is a category error, and there is no reason to expect its step (2) to be deceptive…This is important because an AGI running in tool mode could be extraordinarily useful but far more safe than an AGI running in agent mode. In fact, if developing “Friendly AI” is what we seek, a tool-AGI could likely be helpful enough in thinking through this problem as to render any previous work on “Friendliness theory” moot.

…Is a tool-AGI possible? I believe that it is, and furthermore that it ought to be our default picture of how AGI will work

There are similar general issues with Tool AIs as with Oracle AIs:

a human checking each result is no guarantee of safety; even Homer nods. A extremely dangerous or subtly dangerous answer might slip through; Stuart Armstrong notes that the summary may simply not mention the important (to humans) downside to a suggestion, or frame it in the most attractive light possible. The more a Tool AI is used, or trusted by users, the less checking will be done of its answers before the user mindlessly implements it.2

an intelligent, never mind superintelligent Tool AI, will have built-in search processes and planners which may be quite intelligent themselves, and in ‘planning how to plan’, discover dangerous instrumental drives and the sub-planning process execute them.3

(This struck me as mostly theoretical until I saw how well GPT-3 could roleplay & imitate agents purely by offline self-supervised prediction on large text databases—imitation learning is (batch) reinforcement learning too! See Decision Transformer for an explicit use of this.)

developing a Tool AI in the first place might require another AI, which itself is dangerous

Oracle AIs remain mostly hypothetical because it’s unclear how to write such utility functions. The second approach, Tool AI, is just an extrapolation of current systems but has two major problems aside from the already identified ones which cast doubt on Karnofsky’s claims that Tool AIs would be “extraordinarily useful” & that we should expect future AGIs to resemble Tool AIs rather than Agent AIs.

Economic

We wish a slave to be intelligent, to be able to assist us in the carrying out of our tasks. However, we also wish him to be subservient. Complete subservience and complete intelligence do not go together.

Norbert Wiener 1960

First and most commonly pointed out, agent AIs are more economically competitive as they can replace tool AIs (as in the case of YouTube upgrading from next-video prediction to REINFORCE4) or ‘humans in the loop’.5 In any sort of process, Amdahl’s law notes that as steps get optimized, the optimization does less and less as the output becomes dominated by the slowest step—if a step only takes 10% of the time or resources, then even infinite optimization of that step down to zero time/resources means that the output will increase by no more than 10%. So if a human overseeing a, say, high-frequency trading (HFT) algorithm, accounts for 50% of the latency in decisions, then the HFT algorithm will never run more than twice as fast as it does now, which is a crippling disadvantage. (Hence, the Knight Capital debacle is not too surprising—no profitable HFT firm could afford to put too many humans into its loops, so when something does go wrong, it can be difficult for humans to figure out the problem & intervene before the losses mount.) As the AI gets better, the gain from replacing the human increases greatly, and may well justify replacing the

[truncated for AI cost control]