Why You Can't Just Ask AI for a Quote

Here is something that looks completely reasonable and is not. You have a spreadsheet of your prices, a few notes on what materials tend to cost, and a takeoff list for a project, say the interior build of a house. You paste all of it into a capable AI model and ask it to research the rest and give you a quote. A short while later you have a beautiful document: itemized, formatted, confident, the kind of thing you would be happy to send a client. And you have almost no way to know whether the number at the bottom of it is anywhere near right.

That is the part worth sitting with. It is not that the quote is obviously bad. It is that it is quietly, possibly badly wrong, and it does not look wrong at all. If you are going to use AI for anything where the answer has to be correct, not just plausible, this is the trap to understand first, because almost everyone walks into it.

Why the pretty quote is lying to you

Think about what the model would actually have to know to produce that quote correctly. The real labor for every component, not the average but the specific. The contingencies. The things that routinely go wrong on a build and the cost of absorbing them. How the hardware fits together, what goes with what, what cannot. Inventory pricing that already accounts for shipping. The order of operations, and what each step depends on. That is an enormous, specific, interconnected body of knowledge about your operation, and the model does not have it. It has a general sense of how things like this tend to go.

And some of what it is missing is not just specific, it is structurally invisible to a spreadsheet of averages. Take inventory pricing. The real cost of a material is rarely its list price. It is the list price plus shipping, and shipping depends on the quantity you order, the supplier you order it from, and when. Or take contingencies: the offcut you will waste, the panel that arrives damaged, the step that has to be redone because the one before it was done wrong. An experienced estimator carries all of that quietly in their head and pads the number to match. The model has no idea any of it exists, so it quotes the clean, frictionless version of the job, the one that never actually happens.

So when you ask it for a number, it does the only thing it can. It fills every gap with a plausible average and hands you the result with total confidence. The quote is not a calculation. It is a very articulate interpolation between things the model has seen before, dressed in the format of a calculation. And the dressing is the dangerous part, because a confident, well-formatted document reads as authoritative whether or not anything underneath it is true.

You might think you can fix this by being more careful with the prompt. Break the job into steps. Feed it more context each turn. Chain the prompts together so each one handles a piece. It helps a little, and it does not solve the problem, because every handoff between steps loses information and every step adds a little noise, and by the end you have compounded a dozen small distortions into one clean-looking total. The failure mode here is not the gibberish you can catch and laugh at. It is a professional document that is over- or underpriced by a margin you cannot see. Sometimes, by luck, it even lands close, which is the worst outcome of all, because it teaches you to trust the machine that produced it.

If you have worked with these models for a while, you already know the texture of this. The failure is not that the model breaks. It is that it is confident and plausible and sometimes simply wrong, and you cannot tell which from the surface.

A model is only as good as what you give it to work with

Here is the thing people keep getting wrong, and it gets more wrong, not less, as the models improve. A more sophisticated model does not fix this. No matter how good the model is, no matter what it was trained on, it is only ever as good as the harnesses, context, knowledge bases, tools, and data it can actually reach, plus how well it knows how and when to use them. A frontier model with a bare prompt is a brilliant guesser. The same model wired into real tools is a different thing entirely, and the difference is not the model. It is everything you built around it.

The leverage, in other words, was never in the prompt. It is in the harness. This is the same point I make about the systems themselves: the intelligence is only as good as everything underneath it, which is exactly why you build the intelligence last and the foundation first.

Put it bluntly: the model’s job is to be smart about the things that genuinely need judgment. It is not the place your facts should live. The moment you ask it to recall a labor rate or a material cost, you have handed a lookup to a thing that does not look anything up. It predicts. Prediction is wonderful for language, and it is exactly the wrong instrument for a number that has a single right answer sitting in a database you already own.

Two paths to the same quote. Hand a model your inputs and it fills the gaps with plausible guesses, so the document looks right and the price is not. Route the same request through tools that actually track labor, materials, and install, and every number traces back to something that computed it.

What a real quote actually requires

So what does the other path look like? Let me walk you through it with cabinetry, because that is what I build, but the shape of it transfers to anything that has to be priced for real.

Start with the rule that everything else follows from: you do not let the model estimate the things you can measure. You build the parts that measure them.

Labor. You do not ask the model how long the work takes. You build components that track and measure labor directly, so the number in the quote comes from something that recorded it, not from something that guessed at it. The model can reason about the result. It does not get to invent it.

Materials. You do not let the model imagine the bill of materials. You build the parts that track materials properly: which materials go into a given item and in what ratios, how much of each is actually used, how the items are organized and sorted, which ones belong to which room. The bill of materials is computed from the product, not narrated from a vibe. That distinction is the whole game.

Installation, build, and hardware. You account for what the install actually involves, what pieces go together and what pieces do not, the compatibility rules between components. These are constraints, and constraints are precisely the thing a free-form model is worst at respecting and a small piece of deterministic code is best at. A model will happily quote two parts that cannot physically coexist. A check will not.

The things that quietly eat the margin. The parts that are easiest to forget get encoded too, because those are the ones that turn a profitable job into a loss. The waste built into how material is cut and nested. The rework when a step depends on one that changed. The cost of an order that has to ship in two batches instead of one. You do not ask the model to remember these, because it will not. You build them into the system that produces the number, so they are accounted for every single time instead of whenever someone happens to think of them.

Once those pieces exist, the agent’s job changes completely. It is no longer the thing that knows the answer. It is the thing with the context to use each of these tools and databases to arrive at the answer. The reasoning is genuinely the model’s. The facts are looked up and computed. That division of labor is the entire point, and it is why the systems I build are so heavily agentic and, at the same time, wrapped in so much programmatic search, structured data, and calculation.

And even then you are not done, because a system that produces money figures for real customers cannot be allowed to be creative about it. You document what it produced. You build templates and blueprints so the same kind of quote comes out the same way every time, instead of being re-derived from scratch and drifting. And you wrap the whole thing in safety procedures and guardrails, the checks that catch a number before it reaches a customer rather than after. None of that is glamorous. All of it is the difference between a number you can defend and a number you are hoping is right.

Stop trying to make the model smarter

If you take one practical thing from this, take this: when you need AI to be correct rather than merely convincing, stop pouring your effort into better prompts. You cannot out-prompt a fact the model does not have. Spend that effort instead on giving it real tools, real data, and real calculations, and on teaching it how and when to reach for each one. The model is the reasoning glue. The harness is where the truth lives.

That is the whole reason the systems I build look the way they do. Very agentic, yes, but the agent is surrounded by programmatic searches, structured databases, deterministic math, documented procedures, and guardrails. The agent decides what to do. The harness makes sure what it does is grounded. None of this is exotic, and all of it sits downstream of two earlier disciplines: decomposing the product until it can explain itself, and doing the boring data work before the intelligence layer. Get those right and the agent has something solid to stand on. Skip them, and no amount of model quality will save the quote. If you want to see what the harnessed version actually produced, I wrote about what I built and the platform it became.

The test to carry

So here is how you tell the two apart, and it takes about ten seconds. Pick any number in the quote and ask where it came from. If the answer is “a tool that measured it or computed it,” you have a quote. If the answer is “the model decided it seemed about right,” you have a confident guess in a nice font, and you are one unlucky job away from finding out which.

The pretty document was never the proof. It is the easiest thing in the world for a model to generate and the most expensive thing for you to trust. The work is not getting a better-looking answer. The work is making sure that when the answer looks right, it is right, because something underneath it actually did the math.