Catalog Automation: From Weeks to One Day Across Ten Product Lines

If your business sells across more than a handful of product lines, you almost certainly keep a catalog of some kind: a big reference document, or a set of them, that lists every product, its specs, its dimensions, its options, and what it costs. And if you are honest about it, that catalog is probably out of date right now. Not because anyone is lazy. Because keeping it current by hand is a genuinely large job, and the moment you finish, a price moves or an item goes out of stock and it starts drifting again.

This is the unglamorous bottleneck I want to walk you through, because it is more expensive than it looks and because the fix is a good example of a pattern you can use far beyond catalogs. The manufacturer I build for maintains a product reference across about ten product lines. Keeping it accurate by hand was costing roughly three to four days of careful work per product line. Across the whole set, that is roughly thirty to forty working days of analyst time for one refresh, well over a month, and it had to happen again every time the underlying products changed. After the work I am about to describe, a full refresh takes about a day, and it is current by construction rather than by effort. Here is how, and more importantly, how you would do the same thing.

Why the hand-kept catalog quietly costs so much

Start with what maintaining a catalog by hand actually involves, because the cost is hidden in the details. You are not just typing. For each product line you are reconciling several sources at once: the current spec for each item, the current price, what is actually in stock, the dimension drawings, the option lists, the finishes. Each of those lives somewhere else, and a person has to go find the latest version of each, decide which is right, and transcribe it into the catalog. Do that across ten product lines and you have a multi-sheet workbook with thousands of cells, every one of which is a small opportunity to be wrong or out of date.

Then notice the second cost, the one that does not show up on a timesheet. A catalog that takes weeks to refresh does not get refreshed often. So it is stale most of the time, which means the people who rely on it (sales, customers, whoever) are working from numbers that quietly stopped being true. For a small manufacturer this is a real competitive problem. The large e-commerce sellers you are up against keep their catalogs fresh automatically, because their catalog is just a view of their live data. Yours is a document someone maintains. That gap, weeks of analyst time versus none, is exactly the kind of thing that keeps smaller operations a step behind, and it is worth closing. It is also a small instance of a much larger pattern, the one where the smaller manufacturers who make up most of the sector keep getting left behind on tooling, which I wrote about in “the backbone without the tools.”

The decision: stop maintaining the catalog, start generating it

Here is the reframe, and it is the whole piece in one sentence. A catalog is not a document you should be editing. It is a view of data you already have. Your products already exist as records. Your inventory already exists as live data. The catalog is just those facts, arranged and formatted. So the move is not to find a faster way to hand-edit the catalog. It is to stop hand-editing it at all, and generate it from the live source every time you need it.

If you have read the framework I work from, this is the same thesis applied to one artifact: once your product is decomposed into real data, the things above it, pricing, quoting, and yes, the catalog, become functions of that data rather than things you maintain separately. The catalog was the most visible place that promise had not yet been kept, so it was a good place to keep it.

The same catalog, kept two ways. Maintained by hand, it is stale the moment you stop editing. Generated from the live product data through one generator, with every value carrying its source and a changelog, it is current by construction.

How it actually works

The build has three parts, and the order matters, because the third part is what makes the first two trustworthy enough to rely on.

First, generate from the live source. Instead of a person curating each product line, a generator pulls the current product data and the newest inventory snapshot directly from where they already live, merges in the specs and naming that genuinely do need human curation, and emits the catalog, both the working reference and the print-ready version. The pricing and stock are never transcribed; they are read from the source at generation time. Run it, and you get a catalog that matches reality as of a minute ago instead of as of whenever someone last had a free afternoon.
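To make that first step concrete, here is a minimal sketch in Python. Treat it as an illustration under stated assumptions, not the production generator: the function name generate_catalog, the SKUs, and the shape of the products and inventory dictionaries are all invented for the example. The point it shows is that price and stock are read at generation time and never typed in.

```python
from datetime import datetime, timezone


def generate_catalog(products, inventory, generated_at=None):
    """Build catalog rows straight from live records; nothing is hand-typed."""
    generated_at = generated_at or datetime.now(timezone.utc)
    rows = []
    for sku in sorted(products):
        product = products[sku]
        # Assumption: a SKU missing from the inventory snapshot counts as zero on hand.
        on_hand = inventory.get(sku, 0)
        rows.append({
            "sku": sku,
            "line": product["line"],
            "price": product["price"],
            "on_hand": on_hand,
            "in_stock": on_hand > 0,
            "as_of": generated_at.isoformat(),
        })
    return rows


# Hypothetical stand-ins for the live product records and the newest inventory snapshot.
products = {
    "TBL-200": {"line": "tables", "price": 480.00},
    "CHR-100": {"line": "chairs", "price": 129.50},
}
inventory = {"CHR-100": 14}

catalog = generate_catalog(products, inventory)
for row in catalog:
    status = "in stock" if row["in_stock"] else "out of stock"
    print(row["sku"], row["price"], status)
```

If a price changes in the source, the next run picks it up with no edit to the catalog itself, which is the whole idea.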

Second, separate the two kinds of catalog content. This is the subtle part, and it is why “just automate it” is not enough on its own. Some of a catalog is derived (price, stock, dimensions, the parts that come straight from data) and some of it is curated (the carefully written description, a cleaned-up product name, the things a human decides). If you regenerate everything, you blow away the curation every time. So the generator keeps the curated layer separate and merges it in, which means a regeneration refreshes all the derived facts without touching the human judgment. You get freshness and craft, not one at the expense of the other.
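A small sketch of that separation, again in Python and again with made-up names: DERIVED_FIELDS, CURATED_FIELDS, merge_layers, and the sample product are assumptions for illustration only. In this simplified version the curated layer may only contribute curated fields, so a stray price in the human-edited file cannot overwrite the live one.

```python
# Assumption: which fields count as derived or curated is decided up front.
DERIVED_FIELDS = {"price", "on_hand", "width_mm"}
CURATED_FIELDS = {"display_name", "description"}


def merge_layers(derived, curated):
    """Refresh derived facts from data, then overlay the human-written fields."""
    merged = {}
    for sku, facts in derived.items():
        row = {k: v for k, v in facts.items() if k in DERIVED_FIELDS}
        notes = curated.get(sku, {})
        # Only curated fields are taken from the curated layer.
        row.update({k: v for k, v in notes.items() if k in CURATED_FIELDS})
        merged[sku] = row
    return merged


# Hypothetical curated layer, kept in its own store and edited by people.
curated = {
    "CHR-100": {
        "display_name": "Oak Side Chair",
        "description": "Solid oak frame with a contoured seat.",
        "price": 1.00,  # stray value; ignored because price is derived
    }
}

# Two generations against changing live data.
first = merge_layers({"CHR-100": {"price": 129.50, "on_hand": 14, "width_mm": 450}}, curated)
second = merge_layers({"CHR-100": {"price": 135.00, "on_hand": 9, "width_mm": 450}}, curated)

print(first["CHR-100"]["price"], second["CHR-100"]["price"])
print(second["CHR-100"]["display_name"])
```

The second run carries the new price and stock while the name and description come through untouched.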

Third, and most important, make it auditable. A generated catalog is only useful if you trust it, and trust comes from being able to answer “where did this number come from, and when was it last true.” So every field carries its own provenance: which source it came from, when it was first seen, when it was last verified, and when it last changed. There is a sources list, and an append-only changelog that records every field-level change over time. This is the part that turns automation from a black box into something you can actually stand behind. When a customer asks why a spec is what it is, you are not guessing. The catalog can tell you.
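Here is one way the per-field provenance and the append-only changelog could look, as a minimal Python sketch. The class TrackedField, the function observe, the source label "erp", and the timestamps are all hypothetical; the four things tracked (source, first seen, last verified, last changed) are the ones described above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TrackedField:
    value: object
    source: str
    first_seen: datetime
    last_verified: datetime
    last_changed: datetime


def observe(fields, changelog, key, value, source, now):
    """Record one reading of a field; append to the changelog only on change."""
    current = fields.get(key)
    if current is None:
        fields[key] = TrackedField(value, source, now, now, now)
        changelog.append({"at": now, "key": key, "old": None, "new": value, "source": source})
    elif current.value != value:
        changelog.append({"at": now, "key": key, "old": current.value, "new": value, "source": source})
        current.value = value
        current.source = source
        current.last_verified = now
        current.last_changed = now
    else:
        # Same value as before: it was re-verified, but nothing changed.
        current.last_verified = now


# Example timestamps and source name are invented for the illustration.
t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 2, 1, tzinfo=timezone.utc)
t3 = datetime(2024, 3, 1, tzinfo=timezone.utc)

fields, changelog = {}, []
key = ("CHR-100", "price")
observe(fields, changelog, key, 129.50, "erp", t1)
observe(fields, changelog, key, 129.50, "erp", t2)  # verified, unchanged
observe(fields, changelog, key, 135.00, "erp", t3)  # changed

print(fields[key])
for entry in changelog:
    print(entry["at"].date(), entry["key"], entry["old"], "->", entry["new"])
```

The changelog is only ever appended to, so the history of a value survives every regeneration.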

That third piece is the one most people skip, and it is the one I would tell you not to skip. The same discipline shows up everywhere I build: a result you cannot trace is a result you cannot defend, which is the entire argument of why you cannot just ask an AI for a quote. A generated catalog without provenance is just a faster way to publish numbers you cannot vouch for.

The part that was actually hard

If you take this on, the hard part is not the part that sounds hard. Pulling live data and formatting it into a catalog is routine work. The genuinely tricky piece is the boundary between the derived facts and the curated ones, and what to do when they disagree.

Here is the situation that forces the issue. The live source says a product’s dimension is one thing; the curated spec a person wrote last quarter says something slightly different. Which one wins? You cannot just always take the live value, because sometimes the person caught something the source data has wrong. And you cannot always keep the curated value, because then you are right back to a stale snapshot. The answer is not a clever rule that decides automatically. It is to make the disagreement visible. The changelog records that a field changed and what it changed from, so a person can glance at the diff and make the call, instead of the system either silently overwriting good judgment or silently preserving bad data.
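As a rough sketch of what making the disagreement visible can mean in code, the Python below collects every field where the live value and the curated value differ and hands the list to a person, without picking a winner. The function find_conflicts, the field names, and the sample values are hypothetical, and a real version might need a tolerance for numeric fields.

```python
def find_conflicts(live, curated):
    """List every field where the live source and the curated spec disagree."""
    conflicts = []
    for sku, curated_fields in sorted(curated.items()):
        live_fields = live.get(sku, {})
        for field, curated_value in sorted(curated_fields.items()):
            if field in live_fields and live_fields[field] != curated_value:
                conflicts.append({
                    "sku": sku,
                    "field": field,
                    "live": live_fields[field],
                    "curated": curated_value,
                    "resolution": None,  # deliberately left for a person to decide
                })
    return conflicts


# Hypothetical data: the live source and last quarter's curated spec differ on width.
live = {"CHR-100": {"width_mm": 452, "height_mm": 900}}
curated = {"CHR-100": {"width_mm": 450, "height_mm": 900}}

review_queue = find_conflicts(live, curated)
for item in review_queue:
    print(f'{item["sku"]} {item["field"]}: live={item["live"]} curated={item["curated"]}')
```

Fields that agree never reach the queue, so the person reviewing sees only the contested values.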

That is the real lesson hiding inside a catalog automation, and it is worth more than the time saving. Automating the easy ninety percent is worth doing, but the value lives in how you handle the contested ten percent. A system that hides its conflicts is one you stop trusting the first time it is confidently wrong. A system that surfaces them is one you can actually hand the work to and walk away.

The result

A refresh that took roughly three to four days of hand-work per product line now takes about a day for the whole catalog, across all ten lines. But the time saving is honestly the smaller half of the win. The larger half is that the catalog is now current by default. It is no longer a document that decays between heroic manual updates; it is a view that reflects the live product and inventory data whenever it is generated, with a full audit trail of how every value got there. The analyst days that used to go into transcription go into the work that actually needs a human instead.

And notice what this is an instance of. The catalog stopped being a thing the business maintains and became a thing the business produces, on demand, from data it already has. That is the same move as turning a hand-built quote into a calculated one, or a hand-kept report into a generated one. It is the document-generation layer of the platform doing what it is for: assembling what reaches the outside world from the system, instead of having someone rebuild it by hand each time.

What this means for you

If you maintain any document by hand that is really derived from data you already have (a catalog, a price list, a spec sheet, a capabilities deck), you are sitting on the same opportunity. The instinct is to make the editing faster: a better template, a cleaner spreadsheet, a part-time hire. Resist it. Faster editing still leaves you with a document that is stale the moment you stop typing.

Do this instead. First, find where the real data already lives, and accept that the document should be a view of it, not a copy. (If the data does not exist in clean form yet, that is your actual first job, and I wrote about why the boring data work comes before everything.) Second, split the derived parts from the curated parts so regenerating never destroys the human work. Third, give every value provenance, so the output is something you can trust and trace, not just something you can produce quickly. Get those three right and the document stops being a chore you fall behind on and becomes a button you press.

The weeks-to-a-day number is the headline, and it is real. But the thing worth taking away is quieter: most of the documents you maintain by hand are snapshots of data you already have. You do not need a faster way to keep the snapshot current. You need to stop keeping a snapshot at all.