All Case Studies
JPMorgan Chase · Senior Product Manager, Product Lead · 2024 - Present

Building a GenAI Rules System with Human-in-the-Loop

3,000+
Configurations Analyzed
40%
Target Maintenance Reduction
Human-in-the-Loop
Architecture Pattern

Problem

With 180+ product definitions and 3,000+ rule configurations, manually detecting conflicts, duplications, and optimization opportunities had become impossible at scale. Rule authors across different business units were creating overlapping or contradictory logic without visibility into the full rule landscape.

Manual rule audits happened quarterly at best and could only cover a fraction of the configuration space. Conflicts that slipped through caused downstream processing errors, onboarding delays, and occasional compliance issues.

Research & Discovery

I evaluated the problem through the lens of "where can AI add value without creating new risk?" The answer was clear: pattern detection across large rule sets is exactly where LLMs excel. They can find similarities, flag anomalies, and suggest optimizations across a corpus too large for human review.

But the risk profile was equally clear. Auto-applying AI-suggested rule changes to a system that governs client onboarding and compliance would be irresponsible without human validation. The cost of a wrong change was too high.

I prototyped and validated the concept using Claude Code and n8n to identify high-impact automation opportunities, reducing the discovery-to-validation cycle from weeks to days before committing engineering resources.

Approach

I designed a human-in-the-loop architecture with confidence-based routing. The system analyzes rules and generates recommendations, each with a confidence score. High-confidence recommendations (clear duplicates, obvious conflicts) can be auto-applied after a brief review period. Low-confidence recommendations route to domain expert review with full context on why the change was suggested.
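The routing logic can be sketched in a few lines. This is a minimal illustration, not the production system: the `Recommendation` fields and the 0.9 threshold are assumptions chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical recommendation record; field names are illustrative,
# not the actual system's schema.
@dataclass
class Recommendation:
    rule_id: str
    change: str
    confidence: float  # 0.0-1.0 score from the analysis model

AUTO_APPLY_THRESHOLD = 0.9  # assumed cutoff; tuned per risk tolerance

def route(rec: Recommendation) -> str:
    """Send high-confidence recommendations to the auto-apply queue
    (with a review period); everything else goes to a domain expert."""
    if rec.confidence >= AUTO_APPLY_THRESHOLD:
        return "auto_apply_after_review_period"
    return "expert_review"

print(route(Recommendation("R-101", "merge duplicate", 0.95)))  # auto path
print(route(Recommendation("R-202", "consolidate", 0.60)))      # expert path
```

The single threshold keeps the trust contract legible: users can see exactly when the system will act on its own.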

This architecture embodies a core product philosophy: trust over engagement. Rather than maximizing the volume of automated changes, we optimize for accuracy and user trust in the system's recommendations.

Solution

The system uses GenAI to analyze decision logic patterns across the full product catalog of 3,000+ configurations. It identifies three categories: duplicated rules (same logic, different expressions), conflicting rules (contradictory outcomes for same inputs), and optimization opportunities (rules that could be simplified or consolidated).
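The three categories reduce to a simple comparison once rules are normalized to (condition, outcome) form. A sketch under that assumption; in practice the normalization step (mapping differently expressed logic onto the same canonical condition) is where the LLM does the heavy lifting:

```python
# Illustrative classifier for a pair of normalized rules.
# A rule is modeled as a (condition, outcome) tuple; the names and
# normalization are assumptions for this example.

def classify_pair(rule_a: tuple, rule_b: tuple) -> str:
    cond_a, out_a = rule_a
    cond_b, out_b = rule_b
    if cond_a == cond_b and out_a == out_b:
        return "duplicate"  # same logic, different original expressions
    if cond_a == cond_b and out_a != out_b:
        return "conflict"   # contradictory outcomes for the same inputs
    return "distinct"       # no overlap detected

print(classify_pair(("tier=A", "approve"), ("tier=A", "approve")))  # duplicate
print(classify_pair(("tier=A", "approve"), ("tier=A", "reject")))   # conflict
```

Optimization opportunities (the third category) fall out of the same normalized representation, e.g. multiple distinct rules whose conditions could be consolidated into one.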

Each recommendation includes the AI's reasoning, confidence level, affected products, and potential impact. Domain experts can approve, modify, or reject recommendations, and their decisions feed back into the system to improve future suggestions.
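The feedback mechanism can be as simple as tracking expert acceptance rates per recommendation category and using them to calibrate future confidence scores. A minimal sketch, with all names hypothetical:

```python
from collections import defaultdict

# Hypothetical feedback store: per-category tallies of expert decisions.
decisions = defaultdict(lambda: {"approved": 0, "total": 0})

def record_decision(category: str, approved: bool) -> None:
    """Log an expert's approve/reject decision for a category."""
    decisions[category]["total"] += 1
    if approved:
        decisions[category]["approved"] += 1

def acceptance_rate(category: str) -> float:
    """Historical acceptance rate; a calibration signal for future scores."""
    d = decisions[category]
    return d["approved"] / d["total"] if d["total"] else 0.0

record_decision("duplicate", True)
record_decision("duplicate", True)
record_decision("conflict", False)
print(acceptance_rate("duplicate"))  # 1.0
```

A category whose suggestions experts consistently reject would see its confidence scores discounted, tightening the auto-apply gate over time.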

Impact

The system targets a 40% reduction in manual rules maintenance. The proof-of-concept demonstrated the ability to surface rule conflicts that had gone undetected through manual review, validating the core hypothesis.

The human-in-the-loop architecture has become a reference pattern within the organization for how to responsibly deploy GenAI in high-stakes enterprise contexts, where the cost of automation errors exceeds the cost of human review.

Reflection

This project crystallized my thinking on enterprise AI product design. The temptation is always to maximize automation: "look how much we can do without humans." But in regulated environments, the right question is "where does human judgment still matter, and how do we make that judgment more efficient?"

The confidence threshold approach isn't just a technical architecture decision. It's a trust contract with users: "We'll only act autonomously when we're confident, and we'll always show our work." That transparency is what makes adoption possible in risk-averse organizations.

Prototyping with Claude Code and n8n before requesting engineering resources proved essential. It let us validate the approach quickly and build organizational confidence in the concept before making a larger investment.