
AI Agents in Complex Engineering Systems: Why Data Structure Decides Everything

Thomas Aubert | July 1, 2026 | 11 min

Large language models are moving from the factory floor to the engineering backbone. The first industrial AI wave was about sensor data: predictive maintenance, computer vision, energy optimization. The second wave targets something harder and more valuable: the engineering data itself. Requirements, multi-level bills of materials, traceability matrices, compliance files. This is where product knowledge actually lives, and it is where LLMs can either save engineering teams thousands of hours or quietly corrupt the technical baseline of a regulated product.

This article takes a technical look at both outcomes. It covers where AI genuinely helps in industrial engineering, why AI agents fail in complex systems, why hallucination is a data problem before it is a model problem, and how an Engineering Operating System like Koddex, through its MCP server, governance model, and lock/revision/audit mechanisms, makes agentic AI usable in environments where every design decision must survive an audit.

Where LLMs actually add value in engineering

Skip the generic "AI will transform industry" claims. Here is what LLMs are demonstrably good at when applied to engineering data:

Reading and cross-referencing large volumes of semi-structured text. A certification file for an aerospace subsystem or a medtech technical file contains thousands of requirements, test procedures, and justifications. Checking consistency across them is exactly the kind of exhaustive, low-creativity work humans do badly and slowly.

Translating between abstraction levels. Decomposing a system requirement into subsystem requirements, drafting verification criteria from a requirement statement, mapping a clause of a European standard to the design elements it constrains. These are pattern-matching tasks over structured relationships, and LLMs handle them well when the relationships are explicit.

Answering natural language queries over structured data. "Which requirements in this release have no linked test?" "Which components in this BOM are affected by the new revision of this standard?" Today, answering these questions means one of three people in the company opening four tools and building a spreadsheet. An LLM with proper data access answers in seconds, and anyone can ask.

Bulk construction of structured data. Building a multi-level BOM, instantiating a requirement set from a standard, creating traceability links at scale. This is data entry with structure, and agents do it orders of magnitude faster than humans.
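To make the last item concrete, here is a minimal sketch of what "data entry with structure" means for a multi-level BOM. The `BomItem` type, the `ITM-` identifier scheme, and the nested `spec` format are assumptions made for illustration; they do not describe any particular product's data model.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class BomItem:
    item_id: str
    name: str
    quantity: int
    children: list["BomItem"] = field(default_factory=list)


def build_bom(spec: dict, ids=None) -> BomItem:
    """Recursively turn a nested description into typed items with parent-child links."""
    ids = ids if ids is not None else count(1)
    item = BomItem(f"ITM-{next(ids):04d}", spec["name"], spec.get("qty", 1))
    for child_spec in spec.get("children", []):
        item.children.append(build_bom(child_spec, ids))
    return item


def flatten(item: BomItem, level: int = 0):
    """Yield (level, id, name, quantity) rows in depth-first order."""
    yield level, item.item_id, item.name, item.quantity
    for child in item.children:
        yield from flatten(child, level + 1)


# Hypothetical architecture description, e.g. extracted by an agent from plain text.
spec = {
    "name": "Pump assembly",
    "children": [
        {"name": "Motor", "children": [{"name": "Stator"}, {"name": "Rotor"}]},
        {"name": "Housing bolt", "qty": 8},
    ],
}

bom = build_bom(spec)
for level, item_id, name, qty in flatten(bom):
    print("  " * level + f"{item_id} {name} x{qty}")
```

The agent's contribution is producing the nested description. Everything after that is deterministic construction, which is why this kind of work scales so well.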

The common thread: none of this requires the model to invent anything. It requires the model to read, traverse, and write structured data reliably. Which brings us to the failure modes.

Why AI agents fail in complex systems

Hallucination is a context problem

The most discussed LLM failure mode is hallucination: the model produces a plausible, confident, wrong answer. The mechanism is worth understanding precisely, because it dictates the fix.

An LLM does not retrieve facts. It generates the most probable continuation given its context. When the context fully determines the answer, the model is accurate. When the context is incomplete, ambiguous, or contradictory, the model interpolates. It fills gaps with statistically plausible content. That is not a bug to be patched; it is the operating principle of the architecture.

Now consider the typical engineering data landscape in a hardware company: requirements in Word and Excel, BOMs in the ERP and in three divergent spreadsheets, test results in a project tool, compliance mappings in someone's local folder, and tribal knowledge in email threads. Feed that to an LLM and you have engineered the exact conditions under which hallucination is guaranteed. The model must infer which document version is authoritative, guess implicit relationships between artifacts, and reconcile contradictions it has no basis to resolve. It will do all three, confidently.

Concrete failure: you ask an agent whether safety requirement REQ-SYS-0142 is covered by a validation test. The agent finds a test procedure referencing REQ-SYS-0142 in a document from eight months ago. It answers yes. The requirement has been revised since; the test covers the obsolete version. Nobody catches it until the certification audit, or until the product is in the field.
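The failure above can be reproduced in a few lines. This is a sketch under stated assumptions: the `ValidationTest` record, the test identifier `TP-0031`, and the revision labels `A` and `B` are invented for illustration. The point is only that matching on the requirement identifier alone gives a different answer from matching on identifier plus authoritative revision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationTest:
    test_id: str
    requirement_id: str
    requirement_revision: str  # revision the test was written against


def is_covered_naive(req_id: str, tests: list[ValidationTest]) -> bool:
    # Matches on the identifier only, as a text search over old documents would.
    return any(t.requirement_id == req_id for t in tests)


def is_covered(req_id: str, current_revision: str, tests: list[ValidationTest]) -> bool:
    # Counts a test only if it targets the revision that is currently authoritative.
    return any(
        t.requirement_id == req_id and t.requirement_revision == current_revision
        for t in tests
    )


# Hypothetical data: the test was written against revision A, the requirement is now at B.
tests = [ValidationTest("TP-0031", "REQ-SYS-0142", "A")]
current_revision = {"REQ-SYS-0142": "B"}

print(is_covered_naive("REQ-SYS-0142", tests))                               # True
print(is_covered("REQ-SYS-0142", current_revision["REQ-SYS-0142"], tests))   # False
```

The second check is only possible if the revision is recorded somewhere the agent can read it. In a pile of documents, it usually is not.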

The conclusion engineers should draw: in industrial contexts, hallucination is primarily a data quality and data structure problem. A better model on fragmented data still hallucinates. A mid-tier model on a clean, typed, single source of truth mostly does not, because there are no gaps to fill.

Ungoverned write access

Reading is the easy half. The productivity gains of agentic AI come from writing: creating items, setting attributes, restructuring trees. And write access without governance is how you corrupt a baseline.

In a complex system, modifications propagate. Change a component in a BOM and you potentially invalidate mass budgets, safety analyses, supplier qualification files, and every requirement allocated to that component. Experienced engineers carry a mental model of these propagation paths. An agent operating through an API has no such model unless the system exposes the dependency graph explicitly and enforces rules at the data layer.
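As a rough illustration of what "exposing the dependency graph explicitly" buys you, the sketch below computes the artifacts affected by a change with a breadth-first traversal. The artifact identifiers and the `DEPENDENTS` mapping are hypothetical; a real system would read these edges from its own data model.

```python
from collections import deque

# Hypothetical edges, stored as artifact -> artifacts that depend on it.
DEPENDENTS = {
    "CMP-PUMP-01": ["BUDGET-MASS", "SAFETY-FMEA-07", "REQ-SUB-0210"],
    "REQ-SUB-0210": ["TEST-0455"],
    "SAFETY-FMEA-07": ["CERT-FILE-03"],
    "BUDGET-MASS": [],
    "TEST-0455": ["CERT-FILE-03"],
    "CERT-FILE-03": [],
}


def impacted_by(changed: str, dependents: dict[str, list[str]]) -> list[str]:
    """Return every artifact reachable from `changed`, in breadth-first order."""
    seen = {changed}
    queue = deque([changed])
    order = []
    while queue:
        node = queue.popleft()
        for nxt in dependents.get(node, []):
            if nxt not in seen:  # visit each artifact once, even with multiple paths
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(impacted_by("CMP-PUMP-01", DEPENDENTS))
# ['BUDGET-MASS', 'SAFETY-FMEA-07', 'REQ-SUB-0210', 'CERT-FILE-03', 'TEST-0455']
```

With the edges stored as data, the propagation path an experienced engineer carries in their head becomes something an agent, or a platform rule, can compute before a write is accepted.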

Prompt-level guardrails ("do not modify frozen items") are not guardrails. They are suggestions to a probabilistic system. Anyone who has run agents in production knows that instructions in context get ignored in long contexts, across tool-call chains, and in edge cases. Enforcement must live in the platform, not in the prompt.

Broken accountability

Regulated industries run on attributable decisions. Who changed this requirement, when, why, and who approved it. If agents write to engineering data without their actions being versioned and logged with the same rigor as human actions, the traceability chain breaks. An auditor who finds unattributed modifications in a design history file does not care that an AI made them. The finding is the same: loss of configuration control.

Uncontrolled data exposure

Connecting an LLM to engineering data also raises scope questions: export-controlled data in defense programs, confidential design data, supplier pricing. An agent that can read everything is an exfiltration channel. Access control must apply to agents exactly as it applies to users.
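One way to picture "exactly as it applies to users" is a single read filter that never branches on the kind of actor. The sketch below assumes a simple scope-based model; the `Actor` and `Item` types, the scope names, and the identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actor:
    name: str
    kind: str                  # "human" or "agent"; deliberately unused by the check
    scopes: frozenset[str]


@dataclass(frozen=True)
class Item:
    item_id: str
    required_scope: str


def readable_items(actor: Actor, items: list[Item]) -> list[str]:
    # Same rule for every actor: the item's scope must be among the actor's scopes.
    return [i.item_id for i in items if i.required_scope in actor.scopes]


items = [
    Item("REQ-SYS-0142", "engineering"),
    Item("PRICE-SUP-009", "purchasing"),
    Item("CMP-EXP-77", "export-controlled"),
]
agent = Actor("bom-assistant", "agent", frozenset({"engineering"}))
engineer = Actor("lead-engineer", "human", frozenset({"engineering", "export-controlled"}))

print(readable_items(agent, items))     # ['REQ-SYS-0142']
print(readable_items(engineer, items))  # ['REQ-SYS-0142', 'CMP-EXP-77']
```

The agent never sees supplier pricing or export-controlled items because the filter runs before any data reaches its context, not because a prompt asked it to look away.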

The architectural answer: structure first, then connect, then govern

There are two responses to these risks. Ban AI from engineering data and accept a widening productivity gap against competitors who did not. Or build the conditions under which agentic AI is safe. The second requires three layers, in order.

Layer 1: a typed, single source of truth

Koddex is an Engineering Operating System: it unifies requirements, components, traceability, and compliance data into one backbone with an explicit data model. Every element is a typed object with defined attributes and explicit relations. A requirement knows which tests verify it. A component knows which assemblies contain it. A clause of a standard knows which requirements implement it. Revisions are first-class: each item carries its revision tree, and one version is authoritative.

This is the anti-hallucination layer, and it is worth being precise about why. An LLM connected to this graph does not infer relationships; it traverses them. It does not guess which version is current; the system of record says so. It does not reconcile contradictory spreadsheets; there are none. The gaps that force interpolation are gone, so the LLM's output degenerates, in the best sense, into reading. The agent's answer to "is REQ-SYS-0142 covered" becomes a graph query result, not a probabilistic reconstruction from stale documents.
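A minimal sketch of that idea, assuming a simplified model in which each requirement holds its revisions and one of them is marked authoritative. The class names, fields, and sample data are illustrative assumptions, not the Koddex schema.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    label: str
    text: str
    verified_by: list[str] = field(default_factory=list)  # ids of linked tests


@dataclass
class Requirement:
    req_id: str
    revisions: dict[str, Revision]
    authoritative: str  # label of the one revision of record

    def current(self) -> Revision:
        return self.revisions[self.authoritative]


def uncovered(requirements: list[Requirement]) -> list[str]:
    # A requirement is uncovered when its authoritative revision has no linked test.
    return [r.req_id for r in requirements if not r.current().verified_by]


# Hypothetical content: REQ-SYS-0142 was revised to B, and only A has a linked test.
requirements = [
    Requirement(
        "REQ-SYS-0142",
        {
            "A": Revision("A", "Flow rate shall be at least 10 L/min.", ["TP-0031"]),
            "B": Revision("B", "Flow rate shall be at least 12 L/min."),
        },
        authoritative="B",
    ),
    Requirement(
        "REQ-SYS-0007",
        {"A": Revision("A", "Housing shall be rated IP67.", ["TP-0002"])},
        authoritative="A",
    ),
]

print(uncovered(requirements))  # ['REQ-SYS-0142']
```

Nothing here is inferred. The answer follows from two explicit facts in the data: which revision is authoritative, and which tests are linked to it.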

Put differently: structuring the data is what allows the AI to understand the system. Understanding, for an LLM, is not intelligence in the model. It is completeness and explicitness in the context. Koddex's data model is that context.

Layer 2: the MCP server

To connect models to this source of truth, Koddex exposes an MCP server. MCP (Model Context Protocol) is the emerging open standard for wiring AI assistants to external systems, and it matters here for a practical reason: it makes the engineering backbone accessible from whatever assistant the organization already uses, with typed tools rather than screen-scraping or brittle custom integrations.

Through the Koddex MCP server, an agent can search items across the organization tree, retrieve full item details with values and relations, traverse an item tree several levels deep, create items under a parent, set attribute values, add or move children in a BOM structure, and link regulatory requirements from European standards directly to the design elements they constrain.
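For readers who have not looked at MCP on the wire: tool invocations are JSON-RPC 2.0 messages with the method `tools/call`, carrying a tool name and an arguments object. The sketch below builds such a request. The tool name `search_items` and its argument keys are hypothetical placeholders; the actual tool names and schemas are whatever the server advertises in its tool listing.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids, unique within this session


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Hypothetical tool name and argument keys, for illustration only.
request = tool_call("search_items", {"query": "REQ-SYS-0142", "limit": 5})
print(request)
```

Because each tool declares a typed input schema, a malformed call can be rejected at the server boundary instead of being interpreted loosely. That is the practical difference from screen-scraping.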

What this looks like in practice:

An engineer describes a product architecture in natural language and the agent builds the multi-level BOM: creates the items, types them, sets attributes, establishes parent-child relations. Days of structured data entry become minutes, and the output lands in the governed model, not in another orphan spreadsheet.

A compliance engineer asks the agent to attach the applicable requirements of a standard to the relevant subsystems. The agent creates the requirement items and the traceability links in one pass.

A program manager asks coverage questions in plain language and gets answers computed from the live model. No exports, no tool expertise required.

That last point is the underrated one. The bottleneck in most engineering organizations is not that the data does not exist; it is that only a handful of people can extract it. An LLM over a single source of truth removes the query language barrier. Data access stops being a specialist skill and becomes available to everyone: design, quality, purchasing, program management, all querying the same authoritative model.

Layer 3: governance that binds agents and humans equally

Write access for agents is only acceptable if the platform enforces the rules, independently of prompts. Koddex applies three mechanisms uniformly to human and AI actors.

Locks. Validated items can be locked. A locked item rejects modification at the platform level, regardless of who or what requests it. A requirement frozen for a certification milestone stays frozen. The agent does not need to be told not to touch it; it cannot. This is the difference between a guardrail and a wish.

Revisions. Substantial changes do not overwrite; they create a new revision in the item's revision tree, with prior versions retained and comparable. When an agent drafts a reworked requirement, the draft enters the same revision workflow as a human edit: reviewable, diffable, revertible, approved by the people accountable for it. The agent accelerates production; the decision stays human.

Full activity history. Every action on every item (creation, value change, link added or removed) is logged with author and timestamp. Agent actions are recorded identically to human actions. The accountability chain that auditors require remains unbroken, and reviewing what an agent did during a bulk operation is a query, not an investigation.
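The three mechanisms can be sketched together in a few dozen lines. This is a toy model under stated assumptions, not Koddex's implementation: the `GovernedItem` class, the actor naming scheme, and the choice to log rejected attempts are all illustrative. What it shows is where the check lives: inside the write path, where no prompt can bypass it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


class LockedItemError(Exception):
    """Raised when any actor tries to modify a locked item."""


@dataclass
class GovernedItem:
    item_id: str
    revisions: list[str]                  # text of each revision, oldest first
    locked: bool = False
    history: list[dict] = field(default_factory=list)

    def _log(self, actor: str, action: str) -> None:
        self.history.append({
            "actor": actor,
            "action": action,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def revise(self, actor: str, new_text: str) -> int:
        # The lock is checked here, in the data layer, for humans and agents alike.
        if self.locked:
            self._log(actor, "revise rejected (locked)")
            raise LockedItemError(self.item_id)
        self.revisions.append(new_text)   # prior revisions are retained, not overwritten
        self._log(actor, f"created revision {len(self.revisions)}")
        return len(self.revisions)

    def lock(self, actor: str) -> None:
        self.locked = True
        self._log(actor, "locked")


req = GovernedItem("REQ-SYS-0142", ["The pump shall deliver 10 L/min."])
req.revise("agent:bom-assistant", "The pump shall deliver 12 L/min.")
req.lock("human:lead-engineer")

try:
    req.revise("agent:bom-assistant", "The pump shall deliver 15 L/min.")
except LockedItemError:
    print("write rejected")

print(len(req.revisions))  # 2
for entry in req.history:
    print(entry["actor"], "->", entry["action"])
```

Reviewing what the agent did is then a filter over `history` by actor, which is the sense in which an audit becomes a query.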

Ship to audit

These mechanisms converge on the operating principle Koddex calls ship to audit. In regulated hardware, speed without proof is worthless. A certification file must demonstrate, at any point in time, who did what, on which revision, under which justification. By combining the single source of truth with locks, revisions, and complete activity history, the platform lets teams use agents to build models and data at AI speed while continuously producing the audit evidence the authorities require. Velocity and configuration control stop being a trade-off.

The takeaway

Three points, for engineers evaluating agentic AI on their engineering data:

Hallucination is a property of the context you provide, not just of the model you choose. Fragmented, contradictory data guarantees confident wrong answers. A typed single source of truth removes the gaps that force the model to interpolate. Fix the data before blaming the model.

Never rely on prompts for governance. Enforcement belongs in the data layer: locks that reject writes structurally, revisions that make every change reversible and reviewable, activity logs that treat agents as accountable actors.

The payoff of getting both right is not incremental. It is BOMs built in minutes, standards mapped to designs in one pass, and engineering data queryable in plain language by the entire organization, all while the audit trail writes itself.

That is the role of an Engineering Operating System: give the AI what it lacks by construction, a complete model of the system and enforced rules for touching it. For teams in aerospace, defense, medtech, nuclear, and robotics, it is the difference between AI as a liability and AI as leverage.
