From RAGs to Riches: Lessons from the “Bleeding Edge” of Building AI Agents

Healthcare is fundamentally different from other industries. The stakes are higher, compliance requirements are stricter, and every minute saved through automation means more time for what truly matters: patient care. At Arya, we've spent the last two years building AI agents that automate the mundane so healthcare staff can focus on the meaningful – but that journey hasn't been straightforward.
Our First Foray: The Payroll Agent
In late 2022, we built our first AI-powered product – a payroll agent to automate the complex task of calculating how much people get paid. One challenging aspect was determining prevailing wages – which vary by role, location, and work type – from unstructured, publicly available data sitting in PDF documents on a government site.
Back then, the AI landscape looked very different. Claude was just launching, ChatGPT was still running on GPT-3.5, and many of today's frameworks didn't exist. When implementing AI to analyze government prevailing wage documents, we immediately hit roadblocks:
- Hallucinations were rampant with longer documents
- Response times were painfully slow
- Token costs were prohibitive
We restructured our approach by breaking documents into smaller, context-aware segments, and our accuracy jumped from around 40% to nearly 98%. But this required extensive manual preparation of data.
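As an illustration of the idea (not our production pipeline), context-aware segmentation can be as simple as splitting a document on section headings and repeating each heading in every chunk cut from that section, so no segment loses its context. The ALL-CAPS heading pattern below is an assumption about how such wage PDFs are laid out; real documents may need a different cue:

```python
import re

def chunk_by_section(text: str, max_chars: int = 400) -> list[str]:
    # Split where a new ALL-CAPS heading line begins (an assumption about
    # the document layout; adjust the pattern for other formats).
    sections = re.split(r"\n(?=[A-Z][A-Z /&-]{3,}\n)", text.strip())
    chunks = []
    for section in sections:
        heading, _, body = section.partition("\n")
        chunk = heading.strip()
        for para in body.split("\n\n"):
            para = para.strip()
            if not para:
                continue
            if len(chunk) + len(para) > max_chars:
                chunks.append(chunk)
                chunk = heading.strip()  # repeat the heading so context travels with the chunk
            chunk += "\n" + para
        chunks.append(chunk)
    return chunks

doc = ("ELECTRICIAN RATES\nBase rate is $42.10 per hour in King County.\n\n"
       "Overtime applies after 8 hours.\n\n"
       "NURSE RATES\nBase rate is $55.00 per hour.")

for chunk in chunk_by_section(doc):
    print("---\n" + chunk)
```

Keeping each segment small and self-describing is what lets the model answer from one chunk without hallucinating across sections.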
As one of the first companies to gain access to Claude's API, we discovered an important lesson: LLMs aren't magical. They're powerful tools for structuring information and humanizing interactions, but require careful implementation, especially in healthcare where accuracy is non-negotiable.
The Shift to Healthcare Scheduling
By late 2023, we were pulled into the crazy world of caregiver scheduling – a critical pain point in healthcare operations. After numerous conversations with clients, we discovered that scheduling inefficiencies were sapping productivity and creating compliance risks for organizations.
Healthcare organizations were spending hours manually matching workers to shifts, often using spreadsheets or basic calendar tools. Front-office staff would call or text workers individually, with no systematic way to track who was qualified, who was approaching overtime limits, or who was most likely to accept a particular shift. Bad process, lost productivity, unhappy patients and employees.
Our clients faced three fundamental challenges:
- Finding qualified clinicians for shifts – Schedulers needed to consider licenses, skills, certifications, patient preferences, geographic proximity, and availability windows simultaneously.
- Communicating efficiently with mobile workers – Healthcare professionals are rarely at desks and have limited time to monitor scheduling portals or respond to requests.
- Maintaining compliance while avoiding burnout – Organizations needed to prevent overtime violations and credential mismatches while ensuring fair distribution of shifts.
The complexity of these challenges demanded more than simple automation – it required intelligent agents that could understand context, make appropriate recommendations, and communicate naturally with both schedulers and clinicians. This required us to design a new generation of agentic workflows specifically tailored to healthcare's unique constraints.
The Security Challenge
When building AI systems for healthcare, we hit a fundamental conflict: LLMs need data to provide intelligent responses, but healthcare data demands the highest level of protection.
At the time, most commercial LLM providers logged prompts for training. While they protected user identity, the prompt content itself was fair game – unacceptable for healthcare data containing patient information, schedules, and clinical details.
Our solution emerged when AWS launched Bedrock, offering a partnership with Claude that kept data within our AWS environment. Since our databases and APIs already lived in AWS, this maintained our security boundary and SOC 2 compliance.
Implementing this was far from simple. Bedrock was so new that documentation was virtually non-existent. At an AI conference, one AWS SageMaker specialist candidly admitted, "You know more about Bedrock than I do." We had to pioneer our own implementation patterns, developing workarounds for undocumented limitations.
The tradeoff was performance – routing through Bedrock added approximately 50% more latency compared to direct API calls. However, the security benefits were worth it. With Bedrock, all data stayed within our AWS Virtual Private Cloud, and prompts were never stored or used for model retraining, ensuring HIPAA compliance.
To further secure AI-driven scheduling, we implemented structured access control, ensuring that AI queries were limited by user roles. This prevented unauthorized data access, a critical safeguard for HIPAA-regulated environments. Additionally, all AI-generated responses were logged and auditable, aligning with SOC 2 and HIPAA guidelines.
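A stripped-down sketch of that role-scoping idea: before any record reaches the LLM, filter it down to the fields the requesting role is allowed to see. The role names and field lists here are hypothetical, not Arya's actual access model:

```python
# Hypothetical role-to-field allowlist; a real system would load this from
# a policy store and enforce it server-side, before prompt construction.
ROLE_FIELDS = {
    "scheduler": {"name", "credentials", "availability", "overtime_hours"},
    "clinician": {"name", "availability"},
}

def scope_for_llm(record: dict, role: str) -> dict:
    """Return only the fields the given role may expose to the model."""
    allowed = ROLE_FIELDS.get(role, set())
    return {k: v for k, v in record.items() if k in allowed}

worker = {"name": "J. Smith", "ssn": "***", "credentials": ["RN"],
          "availability": "Mon 9-5", "overtime_hours": 6}

print(scope_for_llm(worker, "clinician"))
# → {'name': 'J. Smith', 'availability': 'Mon 9-5'}
```

Filtering at prompt-construction time means an over-broad question can never leak a field the user's role couldn't have queried directly.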
From RAG to Structured Data
While securing our environment solved one problem, we still faced a mismatch between how LLMs process information and how healthcare data exists in our systems.
Initially, we followed the industry standard approach: Retrieval Augmented Generation (RAG), which works brilliantly for text-heavy information like policy documents or training materials.
The breakthrough insight came at that same AI conference. During a presentation about Google's Gemini launch, a researcher explained: "RAGs are amazing for text-based systems, but the moment you start using actual structured data, it fails miserably."
This hit home immediately. Our healthcare data wasn't narrative—it was highly structured:
- Worker profiles with credentials, skills, and availability
- Shift records with start/end times, locations, and requirements
- Compliance information with expiration dates and certification types
We had been forcing this structured data into narrative form with prompts like: "The clinician's name is John Smith. He is available on Mondays from 9am to 5pm..." This approach consumed excessive tokens, created opportunities for hallucinations, required extensive prompt engineering, and broke when model updates changed response patterns.
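To make the contrast concrete, here is the same (made-up) worker profile in both forms – the narrative version we used to feed the model, and its structured equivalent:

```python
import json

# Narrative form: what we used to stuff into prompts (example data only).
narrative = ("The clinician's name is John Smith. He is available on Mondays "
             "from 9am to 5pm. He holds an RN license and prefers shifts in "
             "King County.")

# Structured form: the same facts, unambiguous and machine-checkable.
structured = {"name": "John Smith", "availability": {"Mon": "09:00-17:00"},
              "license": "RN", "preferred_region": "King County"}

# The structured payload is shorter, and every field can be validated.
print(len(narrative), len(json.dumps(structured)))
```

Beyond the token savings, the structured form gives the model nothing ambiguous to paraphrase – which is exactly where hallucinations crept in.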
The Pydantic Revolution
The biggest breakthrough in our AI pipeline came when we adopted Pydantic AI. Instead of relying on prompts alone to structure responses, we enforced schema validation at the API level, ensuring every AI-generated response adhered to a strict format.
Pydantic is a powerful data validation and settings management library that ensures our data is structured, reliable, and type-safe. Originally designed to validate JSON and API inputs, it has become a staple for developers who want strong guarantees around their data integrity. Unlike traditional Python data classes, Pydantic models enforce data validation at runtime. This ensures that anytime data is passed through a Pydantic model, it is checked against predefined constraints, reducing errors and unexpected behavior in AI applications.
With Pydantic AI, we treat LLMs as structured data providers rather than unstructured text generators, making them more predictable and reliable for real-world applications. By defining strict schemas on LLM inputs and outputs, we significantly reduce hallucinations and enforce compliance with business logic.
We ensured that our scheduling agents produced predictable, valid results – a necessity in a high-stakes industry like healthcare. This shift toward structured LLM outputs transformed our AI from a probabilistic black box into a deterministic, auditable decision-support system.
To test this approach, I built a work pattern analysis tool using Pydantic AI in a week. Instead of writing lengthy prose instructions, we defined the exact structure we wanted. The results were transformative:
- Hallucination rates dropped from ~30% to less than 1%
- We used 40-60% fewer tokens by eliminating verbose instructions
- We could use less expensive models like Gemini Flash instead of larger, costlier ones
- When models updated, our structured requests continued to work without prompt tweaking
- We could immediately detect if a response didn't match our expected structure
This approach also allowed us to build a modular system of specialized agents, each with a clearly defined input/output contract, making the entire system more maintainable and reliable.
Building the Mavi Agent Ecosystem
With this foundation, we developed Mavi (whom our team decided to name after our co-founder's son, just as he had initially named Arya after his daughter). Mavi isn't a single agent but a framework powering multiple specialized agents:
- Work Pattern Analysis Agent - Examines historical shift data to identify when clinicians typically work
- Shift Details Humanization Agent - Converts structured shift information into natural language
- Employee Scoring Agent - Ranks available shifts based on clinician preferences and proximity
- Employee Negotiation Agent - Communicates with employees about open shifts, negotiates rates, and convinces them to pick up additional work
Mavi works best when we first classify the user's intent, then route the conversation to specialized sub-agents with clearly defined input/output structures.
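A toy sketch of that intent-first routing, assuming intents map one-to-one to sub-agents. The intent names and the keyword classifier are placeholders – in practice the classifier is itself an LLM call constrained to a fixed schema:

```python
from typing import Callable

def classify_intent(message: str) -> str:
    # Placeholder keyword classifier; a production system would use a
    # schema-constrained LLM call that can only return one of these labels.
    msg = message.lower()
    if "can't make" in msg or "cancel" in msg:
        return "drop_shift"
    if "pick up" in msg or "extra" in msg:
        return "find_shifts"
    return "general"

# Each sub-agent is just a callable with a defined input/output contract.
AGENTS: dict[str, Callable[[str], str]] = {
    "drop_shift": lambda m: "Routing to coverage agent",
    "find_shifts": lambda m: "Routing to shift-matching agent",
    "general": lambda m: "Routing to fallback agent",
}

def route(message: str) -> str:
    return AGENTS[classify_intent(message)](message)

print(route("I can't make my Thursday shifts"))  # → Routing to coverage agent
```

Because every intent resolves to exactly one agent with a known contract, a misrouted message fails visibly at the dispatch table rather than producing a plausible but wrong answer.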
The Future of AI in Healthcare Workforce Management
Healthcare is experiencing a once-in-a-generation opportunity to reimagine workforce management. The industry faces unprecedented challenges – staffing shortages, rising compliance demands, and clinician burnout – that can't be solved by simply digitizing paper processes. The next wave of transformation will be defined by three key developments:
- Interface transformation - We're moving beyond the tyranny of forms and clicks to intent-based interactions. Imagine a nurse practitioner texting "I can't make my Thursday shifts next month" and having the system automatically identify affected shifts, determine which require immediate coverage, and begin targeted outreach to qualified replacements – all while confirming the change with the practitioner. These natural interactions are particularly crucial for healthcare workers who spend their days moving between patients rather than sitting at computers.
- Proactive workforce management - Current systems are reactive, addressing problems after they emerge. The future belongs to predictive systems that identify potential coverage gaps weeks in advance by analyzing historical patterns, staff preferences, and seasonal trends. Our work pattern analysis is just the beginning. Soon, systems will detect early warning signs of clinician burnout based on schedule density, autonomously manage credential renewal processes before expirations create compliance issues, and dynamically adjust staffing models based on patient census predictions.
- End-to-end lifecycle management - The fragmentation of healthcare HR systems creates enormous friction. A clinician's journey from application through credentialing, scheduling, compensation, professional development, and eventual transition requires interactions with 5-10 different systems. We're building toward a comprehensive experience where AI agents orchestrate these processes seamlessly. When a clinician gains a new certification, it automatically updates their profile, eligibility for specialized shifts, and compensation rate. When census patterns change, the system proactively suggests schedule adjustments to affected staff.
The healthcare organizations gaining competitive advantage will be those that eliminate administrative friction, allowing their clinical workforce to focus on patient care rather than paperwork.
Key Takeaways from Our Journey
Building AI for healthcare requires several critical elements working in harmony:
- Security and compliance as non-negotiables - Traditional LLM implementations often log prompts, potentially exposing sensitive information. Our AWS Bedrock solution ensures data stays within our secure environment.
- Type safety and structured responses - Our hallucination rates plummeted from 30% to less than 1% with Pydantic. A scheduling error isn't just an inconvenience—it potentially impacts patient care.
- Careful application design - Resist building AI for AI's sake. Each agent starts with a clear problem statement and workflow understanding.
- Rigorous validation - What works today might break tomorrow as models update. We validate not just for accuracy but for regulatory compliance, fairness, and consistency.
- Putting people first - Clinicians didn't enter healthcare to fill out forms—they came to care for patients. Our technology adapts to their needs and learns from their habits.
These principles have guided our evolution from basic AI implementations to sophisticated agentic workflows that are transforming healthcare operations.
Our journey from RAG to sophisticated agentic workflows reflects a deeper truth about AI implementation: the most powerful applications aren't built on technological novelty alone, but on deep understanding of domain-specific problems. By focusing on the real-world needs of healthcare workers and the organizations that employ them, we've built solutions that deliver meaningful value today while laying the foundation for even more transformative capabilities tomorrow.