Multi-Level LLM + SLM Routing
The 8-factor decision engine. Every turn is scored across privacy, complexity, domain match, urgency, cost, reasoning depth, context, and clarity. PHI-bearing turns route to the on-prem SLM; complex reasoning routes to a frontier LLM. Same engine, healthcare vocabulary.
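The scoring described above can be sketched as a small weighted-factor router. The weights, the escalation threshold, and the hard PHI override below are illustrative assumptions for demonstration, not HCSC's production values.

```python
# Illustrative sketch of the 8-factor routing decision.
# Factor names come from the description above; the weights, threshold,
# and hard PHI override are assumptions, not production values.

FACTORS = ["privacy", "complexity", "domain_match", "urgency",
           "cost", "reasoning_depth", "context", "clarity"]

def route(scores: dict) -> str:
    """Return 'slm' or 'llm' for one scored turn (each factor in 0.0-1.0)."""
    # Hard rule: any PHI keeps the turn on-prem, regardless of other factors.
    if scores["privacy"] >= 0.5:
        return "slm"
    # Otherwise escalate when weighted need for reasoning depth and
    # complexity outweighs the SLM's domain fit and cost advantage.
    weights = {"complexity": 0.3, "reasoning_depth": 0.3, "domain_match": -0.2,
               "urgency": 0.05, "cost": -0.1, "context": 0.1,
               "privacy": 0.0, "clarity": -0.05}
    escalation = sum(weights[f] * scores[f] for f in FACTORS)
    return "llm" if escalation > 0.15 else "slm"

# A PHI-bearing eligibility lookup stays local; a complex clinical
# reasoning turn with no PHI escalates to the frontier LLM.
phi_turn = {f: 0.2 for f in FACTORS} | {"privacy": 0.9}
complex_turn = {f: 0.2 for f in FACTORS} | {"privacy": 0.0,
                                            "complexity": 0.9,
                                            "reasoning_depth": 0.9}
```

The hard privacy override mirrors the policy stated above: PHI never competes with the other seven factors, it simply pins the turn to the on-prem SLM.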
Bounteous Fine-Tuned HCSC SLM
A small language model (Mistral-7B) fine-tuned via LoRA on HCSC-shaped policy content — Summary Plan Descriptions, Evidence of Coverage, provider manuals, denial-code references (CARC / RARC), prior-auth criteria, and appeals procedures. Quantised to GGUF, it runs on-prem via llama.cpp, so PHI never leaves the HCSC environment.
Best at: eligibility lookups · benefit / copay / deductible questions · denial-code translation · routine claim status · prior-auth status · appeals-process explanations · plain-language re-framing of policy text. For multi-step clinical reasoning, peer-to-peer prep, or open-ended interpretation, the router escalates to a frontier LLM after PHI redaction.
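The PHI-redaction step that precedes escalation could look like the minimal sketch below. The patterns (SSN, date of birth, a hypothetical member-ID format) are illustrative assumptions; a production redactor would use a vetted PHI detection model rather than a handful of regexes.

```python
import re

# Illustrative redaction pass applied before a turn leaves the HCSC
# environment for the frontier LLM. Pattern set and member-ID format
# are assumptions for demonstration, not the production pipeline.
PHI_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b(?:DOB[:\s]*)?\d{2}/\d{2}/\d{4}\b"), "[DOB]"),
    (re.compile(r"\b[A-Z]{3}\d{9}\b"), "[MEMBER_ID]"),  # hypothetical format
]

def redact(text: str) -> str:
    """Replace recognised PHI spans with typed placeholder tokens."""
    for pattern, token in PHI_PATTERNS:
        text = pattern.sub(token, text)
    return text
```

Typed placeholders (rather than blanking) let the frontier LLM still reason about *what kind* of detail was removed, which keeps escalated answers coherent.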
In this prototype the SLM responses are pre-canned to keep the demo fully static. The same fine-tuning pipeline produces a live local inference service for the production phase.