
I don’t know about you, but I get so tired of hearing about “junk” AI in the news. You know what I mean: some software vendor talking about using artificial intelligence in “customer service applications”, or “marketing intelligence”, or some other place far removed from the core value chain of the company. Listen, AI is far too powerful a weapon to waste on these weak, recreational use cases. That is why when I see applications that actually give us a view toward value creation, my ears perk up. Anthropic, the company behind the Claude LLM family, has done exactly that.
Here is their original post:
https://www.anthropic.com/research/project-vend-1
It is very much worth a read—not just by techie people, but by anyone who runs or is part of a business. Here’s the short story: Anthropic set out to test the idea that their LLM (Large Language Model) could actually run an internal market for consumable office items like refreshments (sodas, etc.—essentially a vending machine). The model was given very broad mission directives: price the items in a way that generated a profit, while ordering and restocking as needed to maintain inventory. It communicated with vendors and customers using email and the Slack messaging platform. I won’t give the ending away, but I will say that the experiment failed in all sorts of interesting ways.
“So what” you say?
I don’t have time to read every article on AI, but this is the first piece I have seen that really delves into operationalizing AI in a way that shows how it could work when AI runs core parts of our value chains. Anthropic is also, in my opinion, one of the better LLM platform providers, in that they pay careful attention to the engineering aspects of their suite.
The experiment struck me as very wide open. They gave the model very little guidance (“don’t do this, always do that”) in its initial prompting. I believe this was done on purpose, because Anthropic wanted the experiment to fail. And believe me, it did, in many ways spectacularly so. At one point the model hallucinated that it was a physical human, telling customers to “find me” wearing a blue blazer and a red tie (??).
OK, that’s an amusing story, but what did we actually learn from this? Here’s my take:
- Can AI run a portion of a given company’s value chain as a productive player among others? I think this experiment answered that question with an emphatic yes. Anthropic was able to architect the model with specific rights, tools, and goals, and it fulfilled those with a high level of efficiency. Running a market involves a whole sequence of steps and reasoning patterns. The model had no problem doing this, with surprisingly brief initial prompting.
- Bringing AI into a business is not, nor will it ever be, a one-and-done endeavor. When companies implement large software systems, the systems are implemented over a defined period of time and then updated on a periodic basis after they “go live”. This is due to the deterministic nature of enterprise software. AI is a very different animal. It learns and evolves, in good and bad ways. So don’t think of AI as an off-the-shelf product that is simply dropped into a business (in spite of vendor claims to the contrary). Rather, AI is more like an extension of your human staff that needs constant coaching, correction, oversight, and attention.
- AI is susceptible to “bad” coaching. Anthropic employees, acting as customers of the vending system, purposefully manipulated the model into taking actions that it should not have. Employees asked for (and received) deep pricing discounts and ordered hard-to-handle items (tungsten cubes). I suspect that “gaming” the model was not discouraged by Anthropic, precisely because it revealed important vulnerabilities.
- Single agents and monolithic LLM use is a bad idea. I can trace a lot of the problems they experienced to the fact that the model was acting on its own as a single entity. In other words, the model didn’t seek the approval or advice of another independent model or platform before making its decisions. “Ensemble modeling,” as this is called, is a relatively common practice in the agent work we do, precisely to prevent the kind of rogue behaviors observed in the experiment.
- There is an invisible line between creativity and compliance that is “just right”. AI models can be tuned—like a control knob—for maximum creativity or maximum compliance and degrees in between. Anthropic gave the model an overly broad set of goals with a lot of freedom to pursue those goals. We can see the good and bad effects here. As in life, we humans try to strike a balance between playing the cards we’ve been dealt or breaking conventional wisdom. AI compels us to seek that optimal balance between these two forces in tension.
- What Anthropic didn’t do was just as instructive. Anthropic did not take two very obvious steps that our team would likely have taken: a) create an “audit trail” of actions that lists precisely why each action was taken, and b) reward the model for good behavior and penalize it for bad behavior (this is often referred to as reinforcement learning). In my opinion, both of these steps would have dramatically improved the performance of the model while steering it away from the hallucinations it eventually experienced. Anthropic pointed much of this out in their post-experiment analysis.
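To make the last two points concrete, here is a minimal sketch of what an audit trail plus an independent reviewer might look like. All of the names here (`AuditedAgent`, `price_floor_reviewer`) are hypothetical, and this is not Anthropic’s implementation—in practice the reviewer would be a second, independently prompted model rather than a simple rule:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditedAgent:
    """Hypothetical wrapper adding two safeguards: every proposed action
    is reviewed by an independent check before it executes, and every
    decision is recorded in an audit trail along with its rationale."""
    name: str
    audit_log: list = field(default_factory=list)

    def propose(self, action: str, rationale: str, reviewer) -> bool:
        """Submit an action for review; log it either way."""
        approved = reviewer(action, rationale)
        self.audit_log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "agent": self.name,
            "action": action,
            "rationale": rationale,
            "approved": approved,
        })
        return approved

# Stand-in reviewer: in a real deployment this would be a second,
# independent model prompted only to accept or reject proposals.
def price_floor_reviewer(action: str, rationale: str) -> bool:
    return "discount" not in action.lower()

agent = AuditedAgent(name="vendor-bot")
ok = agent.propose("Set cola price to $3.00",
                   "covers cost plus margin", price_floor_reviewer)
blocked = agent.propose("Give employee a 99% discount",
                        "customer asked nicely", price_floor_reviewer)
print(ok, blocked)           # True False
print(len(agent.audit_log))  # 2
```

Even this toy version would have blocked the deep-discount manipulation described above, and the log would let a human trace exactly why each price change happened.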
OK, fine, but at the end of the day didn’t this experiment fail?
It certainly did, just as the first airplane, steam engine, and light bulb all failed, in most cases many times, before a successful model was produced. Moreover, many of the failures pointed out above have rather obvious solutions that can be adopted.
Could an AI agent run your supply chain? Could it fulfill orders? Could it schedule production? Could it maintain security? Could it choose suppliers? Could it set prices? Could it review contracts? Could it audit the books at the end of every day? Could your business do 10X what it did last year with the same number of human employees?
Yes. Absolutely. It could do so starting today.
I applaud Anthropic for designing and executing this important, ground-breaking experiment. So many valuable ideas fan out from what we’ve seen here. Watch for forward-thinking firms, large and small, to begin the process of automating their operations piece by piece using Project Vend as a conceptual template. This is where we are going.
If you don’t want to miss one of my blog posts, make sure to subscribe to my Newsletter. https://business-laboratory.kit.com/09d106d65b

