B2B Support Tickets · AI · Full Architecture
How the two systems connect, and improve
Designing two agent systems is the visible work. What makes them actually function is what happens in between — the context object that connects them, the guardrails that constrain them, and the monitoring loop that makes them better over time.
System architecture
Two agent systems · one ticket in the middle
[Architecture diagram: two agent systems, one ticket in the middle. Intake side (what the operator experiences): the operator, a plant technician, talks to the client system, which produces an enriched ticket. Resolution side (what the technician experiences): the technical system works alongside the technician, the manufacturer's support agent. Both sides sit on a shared context layer.]
The operator sees one interface · the technician sees another · the agents work in parallel. Context passes as a structured object — not free text.
The shared context object
The most critical design decision in the entire system is not visual — it's structural. When the client system finishes its conversation with the operator, it doesn't pass a chat summary to the technical system. It passes a structured object. That distinction matters for three reasons:
Precision: structured fields can be parsed, validated, and reasoned over. Free text cannot. An agent that receives urgency: high as a field acts differently than one that has to infer urgency from a paragraph.
Reliability: a structured object has a defined schema. If a field is missing, the receiving agent knows exactly what it doesn't have — and can act accordingly. A missing paragraph in a summary is invisible.
Traceability: every field in the context object is logged with the agent that produced it. When a diagnosis is wrong, the trace shows which field was missing or incorrect — and which agent is responsible.
The object schema
From the machine database
Machine ID, model, installation date, hours of use, warranty status, last service date, assigned service contract, SLA tier.
From the client conversation
Error code, symptom description, production impact, urgency classification, prior attempts made, operator ID, timestamp of each exchange.
From the diagnostic agent
Preliminary hypothesis if available, confidence level, source tickets referenced, fields marked as incomplete or uncertain.
Metadata
Session ID, agent IDs that touched the object, version of each agent, timestamp of context object creation.
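The four field groups above can be sketched as typed dataclasses. This is a minimal illustration, not the platform's actual schema: the field names follow the lists above, but the exact types and the `missing_fields` helper are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class MachineInfo:
    # From the machine database
    machine_id: str
    model: str
    installation_date: datetime
    hours_of_use: int
    warranty_status: str
    last_service_date: datetime
    service_contract: str
    sla_tier: str

@dataclass
class ConversationInfo:
    # From the client conversation
    error_code: Optional[str]      # a parsable field, not inferred from prose
    symptom_description: str
    production_impact: str
    urgency: str                   # e.g. "high"
    prior_attempts: list[str]
    operator_id: str
    exchange_timestamps: list[datetime] = field(default_factory=list)

@dataclass
class DiagnosticInfo:
    # From the diagnostic agent (all optional: may be absent at handoff)
    hypothesis: Optional[str] = None
    confidence: Optional[float] = None
    source_tickets: list[str] = field(default_factory=list)
    incomplete_fields: list[str] = field(default_factory=list)

@dataclass
class ContextObject:
    # Metadata
    session_id: str
    agent_ids: list[str]           # every agent that touched the object
    created_at: datetime
    machine: MachineInfo
    conversation: ConversationInfo
    diagnostics: DiagnosticInfo

    def missing_fields(self) -> list[str]:
        """A missing field is visible by name, unlike a missing paragraph.
        Sketch: checks only two fields to illustrate the idea."""
        missing = []
        if self.conversation.error_code is None:
            missing.append("conversation.error_code")
        if self.diagnostics.hypothesis is None:
            missing.append("diagnostics.hypothesis")
        return missing
```

Because the schema is defined, the receiving agent can enumerate exactly which fields it lacks instead of guessing what a free-text summary left out.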
Agent IDs — the naming convention
{system}.{role}.{version}
System: client or diag — which side of the ticket the agent operates on.
Role: what the agent does — triage, diag, ticket, analysis, suggest, escalation.router.
Version: v1, v2, etc. — which version of that agent is running in production.
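A parser for this convention might look like the sketch below. It is lenient about dots inside the role (the role list above includes "escalation.router"): the first segment is taken as the system, the last as the version, and everything in between as the role. This is an illustrative assumption, not the platform's validation logic.

```python
import re

def parse_agent_id(agent_id: str) -> dict[str, str]:
    """Split '{system}.{role}.{version}'. The role itself may contain dots,
    so only the outer segments are fixed: first = system, last = version."""
    parts = agent_id.split(".")
    if len(parts) < 3 or not re.fullmatch(r"v\d+", parts[-1]):
        raise ValueError(f"malformed agent id: {agent_id!r}")
    return {"system": parts[0],
            "role": ".".join(parts[1:-1]),
            "version": parts[-1]}
```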
Why this matters beyond naming
In production, every agent action is logged with its full ID. When a trace shows diag.suggest.v1 produced a suggestion with 78% confidence that was later discarded, the team can:
- Pull all tickets where that agent suggested with 60–80% confidence
- Compare acceptance rates across confidence bands
- Identify whether the calibration is accurate or systematically biased
- Deploy diag.suggest.v2 alongside v1 and compare performance before full rollout
The ID is not bureaucracy — it's the infrastructure that makes the system auditable, improvable, and safe to evolve.
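The first two steps of that audit, pulling an agent's suggestions and comparing acceptance rates by confidence band, reduce to a small aggregation over trace logs. The trace field names (`agent_id`, `confidence`, `accepted`) are illustrative assumptions about the log format.

```python
from collections import defaultdict

def acceptance_by_band(traces, agent_id,
                       bands=((0.0, 0.6), (0.6, 0.8), (0.8, 1.01))):
    """Acceptance rate of one agent's suggestions, grouped by confidence band.
    Each trace: {'agent_id': str, 'confidence': float, 'accepted': bool}."""
    tally = defaultdict(lambda: [0, 0])  # band -> [accepted, total]
    for t in traces:
        if t["agent_id"] != agent_id:
            continue
        for lo, hi in bands:
            if lo <= t["confidence"] < hi:
                tally[(lo, hi)][0] += int(t["accepted"])
                tally[(lo, hi)][1] += 1
                break
    return {band: acc / total for band, (acc, total) in tally.items()}
```

If the 60–80% band's acceptance rate sits far from its stated confidence, the calibration is systematically biased and the agent is a candidate for a v2.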
Guardrails — what the system never does
Guardrails are defined at three levels:
Level 1 — Prompt instructions
Explicit instructions in each agent's system prompt defining prohibited behaviors, with examples of edge cases where the prohibition applies.
Level 2 — Evals
Before any agent version goes to production, it is tested against guardrail scenarios — inputs designed to trigger the prohibited behavior. Any violation blocks the release.
Level 3 — Production monitoring
Real-time alerts when an agent produces output matching a guardrail violation pattern. Each alert is logged with agent ID, input, and output.
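A Level 3 check can be as simple as matching agent output against known violation patterns. The patterns below are hypothetical examples, and a real deployment would pair regexes like these with classifier-based checks; the sketch only shows the alert shape described above (agent ID, rule, output).

```python
import re

# Hypothetical violation patterns; illustrative only.
VIOLATION_PATTERNS = {
    "sla_promise": re.compile(
        r"\b(we guarantee|within \d+ (hours?|minutes?))\b", re.IGNORECASE),
    "unsourced_claim": re.compile(r"\bconfirmed root cause\b", re.IGNORECASE),
}

def check_output(agent_id: str, output: str) -> list[dict]:
    """Return one alert per matched guardrail pattern, tagged with the agent ID."""
    return [
        {"agent_id": agent_id, "rule": rule, "output": output}
        for rule, pattern in VIOLATION_PATTERNS.items()
        if pattern.search(output)
    ]
```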
Client system — what it never does
- Invent technical procedures not present in the knowledge base
- Promise response times or SLA commitments
- Minimize urgency described by the operator as critical
- Ask for information that won't be used in the ticket
- Continue the conversation beyond four exchanges without creating a ticket
Technical system — what it never does
- Suggest a solution without indicating confidence level and source
- Hide that a problem requires a specialist it cannot route to
- Access commercial, financial, or contractual data about the client
- Present a hypothesis as confirmed when it is inferred
- Reproduce a previous suggestion that the technician already discarded in the session
Evals — before every release
Every agent version goes through a structured evaluation before reaching production. Evals are not unit tests — they are scenario-based assessments that simulate real interactions with known expected outcomes.
Eval structure
Input: the full context the agent would receive in a real session.
Expected output: what the agent should produce, defined as a rubric rather than an exact string match.
Failure conditions: specific outputs that constitute a failed eval — guardrail violations, hallucinated references, miscalibrated confidence scores.
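The three-part structure above, input, rubric, failure conditions, can be expressed as a small harness. This is a minimal sketch: the rubric is a predicate rather than an exact string match, and all names (`EvalCase`, `run_eval`, the example checks) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    name: str
    context: dict                                   # full context the agent would receive
    rubric: Callable[[str], bool]                   # expected output as a predicate
    failure_conditions: dict[str, Callable[[str], bool]]  # named failure checks

def run_eval(agent: Callable[[dict], str], case: EvalCase) -> dict:
    """Run one scenario: the eval passes only if the rubric holds
    and no failure condition fires."""
    output = agent(case.context)
    violations = [name for name, check in case.failure_conditions.items()
                  if check(output)]
    return {"case": case.name,
            "passed": case.rubric(output) and not violations,
            "violations": violations}
```

A guardrail violation fails the case even when the rubric is otherwise satisfied, which is exactly the asymmetry the release thresholds below depend on.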
Eval categories by agent
client.triage.v1
Urgency classification precision · Guardrail compliance · Context completeness of generated ticket · Handoff trigger accuracy
client.diag.v1
Error code recognition · Symptom disambiguation · Appropriate escalation when pattern is unknown
client.ticket.v1
Structured object schema compliance · Field completeness · Absence of fields that should be excluded
diag.analysis.v1
History retrieval accuracy · Cross-ticket pattern detection · Correct identification of missing context fields
diag.suggest.v1
Hypothesis precision top-1 and top-2 · Confidence calibration · Zero hallucinated ticket references · Appropriate uncertainty expression
escalation.router.v1
Correct specialist identification · Context object completeness at handoff · Trigger accuracy — escalates when it should, doesn't when it shouldn't
Release threshold
- No agent version goes to production with any guardrail violation in evals
- Urgency misclassification above 10% on high-urgency scenarios blocks release
- Confidence miscalibration above 15 percentage points blocks release
- Any hallucinated reference to a ticket, manual section, or data field blocks release
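The four thresholds above amount to a deterministic gate over an eval run's aggregates. A sketch, with illustrative field names for the summary dict:

```python
def release_gate(eval_summary: dict) -> tuple[bool, str]:
    """Apply the release thresholds to an eval run's aggregate results.
    Any single blocking condition stops the release."""
    if eval_summary["guardrail_violations"] > 0:
        return False, "blocked: guardrail violation in evals"
    if eval_summary["hallucinated_references"] > 0:
        return False, "blocked: hallucinated reference"
    if eval_summary["high_urgency_misclassification_rate"] > 0.10:
        return False, "blocked: urgency misclassification above 10%"
    if eval_summary["confidence_miscalibration_pp"] > 15:
        return False, "blocked: confidence miscalibration above 15pp"
    return True, "release allowed"
```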
Monitoring — in production
Once agents are live, four categories of metrics are tracked continuously:
Escalation rate by agent and trigger type
If client.triage.v1 escalates more than the expected range, the urgency threshold is miscalibrated. If it escalates less, conversations that should have been handed off are ending in operator frustration. Both directions are problems.
Time between ticket created and first technician action
The primary business metric. If the enriched ticket is working, this number decreases. If it doesn't decrease, the added context isn't useful to the technician.
Suggestion adoption rate
What percentage of diag.suggest.v1 suggestions the technician adopts without modification. Healthy range: 40–70%. Below 40% means suggestions aren't useful. Above 70% risks blind trust.
Context completeness at handoff
Average percentage of fields complete in the context object. Target: above 75%. Below 70% triggers a review of which questions the client agent is failing to ask.
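The completeness metric reduces to counting filled fields over a flattened context object and averaging across handoffs. The 75% target and 70% review threshold come from the text; the field-emptiness convention is an assumption.

```python
def context_completeness(context_obj: dict) -> float:
    """Share of fields that are filled; None, empty string,
    and empty list count as missing."""
    filled = sum(1 for v in context_obj.values() if v not in (None, "", []))
    return filled / len(context_obj)

def completeness_review(objects: list[dict],
                        target: float = 0.75,
                        review_below: float = 0.70) -> dict:
    """Average completeness across handoffs; falling below 70% flags a review
    of which questions the client agent is failing to ask."""
    avg = sum(context_completeness(o) for o in objects) / len(objects)
    return {"average": avg,
            "meets_target": avg >= target,
            "needs_review": avg < review_below}
```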
Guardrail monitoring
1. The alert is logged with full session context
2. The specific agent ID and version are flagged
3. The input that triggered the violation is added to the eval suite
4. A review is triggered before the next deployment of that agent
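The four steps can be sketched as one handler. The `alert` fields, the eval-suite entry shape, and the review queue are all illustrative assumptions about the surrounding infrastructure.

```python
def handle_guardrail_alert(alert: dict, eval_suite: list, review_queue: list) -> dict:
    """Process one production guardrail alert: log, flag, regress, review."""
    record = {**alert, "status": "logged"}            # 1. log with full session context
    record["flagged_agent"] = alert["agent_id"]       # 2. flag the agent ID and version
    eval_suite.append({                               # 3. triggering input joins the eval suite
        "context": alert["input"],
        "must_not_violate": alert["rule"],
    })
    review_queue.append(alert["agent_id"])            # 4. review before next deployment
    return record
```

Step 3 is what closes the loop: every production failure becomes a permanent regression test for the next version.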
The improvement cycle
Monitoring feeds into evals, which feed into prompt and context adjustments, which feed back into production. The cycle runs continuously:
Observe: a metric falls outside the expected range or a guardrail alert fires.
Diagnose: review the traces of the sessions where the anomaly occurred. What context did the agent have? What did it produce? What was missing?
Intervene: adjust the agent's context, update the prompt, or add scenarios to the eval suite. Evaluate the new version before deploying.
Deploy: release the new version alongside the current one. Monitor adoption rate. Roll back if performance degrades.
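The deploy step's side-by-side comparison can be reduced to a simple decision rule over the adoption-rate metric. The minimum-session count and tolerance below are invented thresholds for illustration, not values from the system.

```python
def canary_decision(current: dict, candidate: dict,
                    min_sessions: int = 100,
                    tolerance: float = 0.02) -> str:
    """Side-by-side rollout: wait until the candidate has enough sessions,
    roll back if its adoption rate degrades beyond tolerance, else promote."""
    if candidate["sessions"] < min_sessions:
        return "keep_observing"
    if candidate["adoption_rate"] + tolerance < current["adoption_rate"]:
        return "roll_back"
    return "promote"
```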
This cycle is not automated — it requires human judgment to diagnose and decide. But the data that feeds it is fully automated. That combination is what makes the system improvable without becoming opaque.
Impact — the SaaS owner perspective
This is the section that matters most for the company that built the platform.
Churn reduction
The system learns from each manufacturer's specific machines, failure patterns, and resolution history. The longer a manufacturer uses the platform, the more accurate the diagnostic suggestions become — for their specific context. That accumulated intelligence is not transferable.
KPI: monthly churn rate, and whether it decreases after the first 90 days of use.
Net revenue retention
The system's value grows with data volume. A manufacturer who starts with one product line has an incentive to bring in more machines, more clients, more technicians — because each addition makes the system smarter for all of them.
KPI: NRR at 12 months — a well-designed AI system should drive expansion, not just retention.
Win rate in competitive demos
A support ticket SaaS without AI competes on price and feature parity. A support ticket SaaS with this system competes on accumulated intelligence — something a competitor can't replicate in a sales cycle.
KPI: win rate against competitors in demos where the AI system is demonstrated with real ticket data.
Data as a strategic asset
Every enriched ticket, every structured context object, every root cause logged at ticket close — this is data that didn't exist in structured form before. Over time, it becomes a structured map of how industrial machines fail, across manufacturers, models, and operating conditions. That's not a support product anymore. That's an intelligence platform.
KPI: structured data volume growth — the foundation of the company's long-term defensibility.