A multi-tenant AI platform had a growing usability problem: domain experts needed to find specific records across a large operational worklist, but the only way to search was through a structured filter UI with dozens of fields and dropdown combinations. Experts think in domain language — "unconfirmed AI findings from last week" or "abnormal impressions for Site 12" — not in filter combinations. The gap between how they think and how the system expected them to search was slowing down the people who needed data most urgently.
ML LABS built a natural language search capability that let users query the worklist in plain language — parsing domain-specific intent into structured queries, searching across both raw records and AI-generated outputs, and returning results fast enough for real-time use on desktop and mobile.
The Problem
The worklist contained operational records annotated with AI-generated outputs: algorithmic classifications, interpretive analyses, and AI impressions from multiple model providers. Searching required combining filters across both raw record fields (site, date, status, assigned expert) and AI output fields (classification result, impression text, confirmation status). The structured filter UI exposed these as separate controls, and combining them required knowing which fields existed and how they interacted.
```mermaid
graph TD
    A1["Natural Language<br/>Query"] --> B1["Domain Entity<br/>Recognition"]
    B1 --> C1{"Query Targets"}
    C1 -->|"Record Fields"| D1["Primary Record<br/>Search"]
    C1 -->|"AI Outputs"| D2["AI-Generated<br/>Content Search"]
    C1 -->|"Both"| D3["Unified<br/>Intersection"]
    D1 --> E1["Sub-Second<br/>Results"]
    D2 --> E1
    D3 --> E1
    style A1 fill:#1a1a2e,stroke:#0f3460,color:#fff
    style B1 fill:#1a1a2e,stroke:#ffd700,color:#fff
    style C1 fill:#1a1a2e,stroke:#ffd700,color:#fff
    style D1 fill:#1a1a2e,stroke:#0f3460,color:#fff
    style D2 fill:#1a1a2e,stroke:#0f3460,color:#fff
    style D3 fill:#1a1a2e,stroke:#0f3460,color:#fff
    style E1 fill:#1a1a2e,stroke:#16c79a,color:#fff
```

Natural Language Query Parser
The core is a constrained intent parser that maps natural language queries to a known set of domain entities and structured filters. This is not general-purpose text-to-SQL — the parser recognizes entities specific to the operational domain and translates them into executable queries.
When a user types "abnormal AI impressions for Site 12 this month," the system understands the intent — an AI classification status, a site identifier, and a date range — and returns the right results. The parser handles ambiguity through explicit resolution rules developed against real user queries, not through probabilistic guessing.
The parser's reliability comes from constraint, not sophistication. It recognizes every entity that maps to a filterable field — and nothing else. Bounded vocabulary means predictable results.
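A constrained parser of this kind can be sketched as a bounded vocabulary of entity rules plus explicit date-phrase resolvers. The field names, phrases, and rules below are illustrative assumptions, not the production vocabulary; the point is that only recognized entities produce filters, and everything else is ignored.

```python
import re
from datetime import date, timedelta

# Hypothetical vocabulary: each phrase maps to one filterable field.
# A real deployment would load these rules from configuration.
ENTITY_RULES = [
    (re.compile(r"\babnormal\b", re.I), ("classification", "abnormal")),
    (re.compile(r"\bunconfirmed\b", re.I), ("confirmation_status", "unconfirmed")),
    (re.compile(r"\bsite\s+(\d+)\b", re.I), ("site_id", None)),  # value captured from the query
]

# Explicit date-phrase resolvers, not probabilistic guessing.
DATE_RULES = {
    "last week": lambda today: (today - timedelta(days=7), today),
    "this month": lambda today: (today.replace(day=1), today),
}

def parse_query(text: str, today: date) -> dict:
    """Map a natural language query to structured filters.

    Only entities in the bounded vocabulary emit filters; unrecognized
    tokens are ignored, which keeps results predictable.
    """
    filters = {}
    for pattern, (field, value) in ENTITY_RULES:
        match = pattern.search(text)
        if match:
            filters[field] = value if value is not None else match.group(1)
    for phrase, resolver in DATE_RULES.items():
        if phrase in text.lower():
            start, end = resolver(today)
            filters["date_range"] = (start.isoformat(), end.isoformat())
    return filters
```

Running the example query from above, `parse_query("abnormal AI impressions for Site 12 this month", ...)` yields a classification filter, a site identifier, and a date range, and nothing for words outside the vocabulary.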
Searching AI-Generated Outputs
The worklist contained two layers of searchable data: raw operational records and AI-generated outputs from the inference pipeline. ML LABS built unified search across both layers. A query for "abnormal AI impression" searches AI outputs. A query for "Site 12 records from last week" searches primary records. A query combining both returns the intersection — and the user never needs to know which data layer is being queried.
AI-generated outputs were treated as first-class searchable content. When a model provider produced a new output type, making it searchable was a configuration change rather than a code change.
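One way to get that property is to describe every searchable field, record-level or AI-generated, in a single configuration list that the search layer reads at startup. The structure and field names below are a hypothetical sketch of that idea, not the platform's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SearchableField:
    """One entry in the (assumed) search configuration."""
    name: str                       # field name exposed to the query parser
    layer: str                      # "record" or "ai_output"
    provider: Optional[str] = None  # model provider that produces the field, if any

# Adding a new provider's output type is an append here, not new code.
SEARCH_CONFIG = [
    SearchableField("site_id", layer="record"),
    SearchableField("classification", layer="ai_output", provider="provider_a"),
    SearchableField("impression_text", layer="ai_output", provider="provider_b"),
]

def fields_for_layer(layer: str) -> list:
    """Fields the search engine should index for a given data layer."""
    return [f.name for f in SEARCH_CONFIG if f.layer == layer]
```

Because the parser and the indexer both read `SEARCH_CONFIG`, a new output type becomes queryable everywhere as soon as it is registered.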
Query Performance
Operational search has a hard latency requirement. Domain experts search within their workflow — if results take more than a second, the search disrupts the workflow rather than supporting it.
ML LABS optimized query execution to deliver sub-second results at production data volumes. The optimization was driven by actual query patterns collected during testing, not hypothetical usage. Response payloads were trimmed to display-required fields only.
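A latency budget like this is typically enforced at a high percentile over real query samples rather than at the mean, since the mean hides the slow tail. A minimal nearest-rank percentile check, with an assumed 1-second budget at p95, looks like:

```python
import math

def percentile(samples_ms, pct):
    """Nearest-rank percentile: the latency not exceeded by pct% of queries."""
    ranked = sorted(samples_ms)
    rank = math.ceil(pct / 100 * len(ranked))
    return ranked[rank - 1]

def meets_budget(samples_ms, budget_ms=1000.0, pct=95.0):
    """True if pct% of sampled queries finish within the budget."""
    return percentile(list(samples_ms), pct) <= budget_ms
```

Run against latency samples collected from actual query patterns, this makes the sub-second requirement a pass/fail gate instead of an aspiration.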
Mobile Search
Domain experts increasingly worked from tablets and phones. Natural language search became the primary input method on smaller screens where the desktop filter UI could not fit.
ML LABS built mobile search using the same query engine as desktop. The adaptation was entirely in the presentation: touch-friendly result cards, appropriately sized result sets, and simplified disambiguation. Same parser, same results, same performance.
Boundary Conditions
Natural language search works when the domain vocabulary is bounded and query patterns are predictable. It struggles when underlying data quality is poor — inconsistent naming, missing fields, or duplicates produce unreliable results regardless of parser quality. The first investment in those environments is upstream data quality.
It also reaches limits with genuinely unstructured content. Searching freeform narrative notes is a full-text ranking problem, not a structured filtering problem.
First Steps
- Collect real queries from domain experts before building the parser. The distribution of actual queries determines parser scope and indexing strategy.
- Build the structured filter API first, then add the NLP layer on top. The filter API is the stable foundation; the parser translates into it.
- Measure latency at the 95th percentile under production data volumes. The queries that time out are the ones users remember.
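The second step above, layering the parser on a stable filter API, can be sketched as a thin translation boundary: everything the parser emits is checked against the API's known parameters, so an unknown field signals a parser bug rather than a missing API feature. The parameter names here are hypothetical.

```python
# Hypothetical parameter set of the existing structured filter API.
FILTER_API_PARAMS = {"site_id", "classification", "confirmation_status", "date_range"}

def to_filter_request(parsed_filters: dict) -> dict:
    """Translate parser output into a structured filter API request.

    The NLP layer only translates; it can never express a query the
    stable filter API cannot execute.
    """
    unknown = set(parsed_filters) - FILTER_API_PARAMS
    if unknown:
        raise ValueError(f"parser emitted non-API fields: {sorted(unknown)}")
    # Request only display-required fields to keep response payloads small.
    return {"filters": parsed_filters, "select": "display_fields"}
```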
Practical Solution Pattern
Build a constrained intent parser that maps natural language to a known set of domain entities and structured queries. Search both primary records and AI-generated outputs through a unified interface. Enforce sub-second latency optimized against actual query patterns, and support mobile as a first-class search surface.
This works because it avoids the two failure modes that kill operational search projects: over-engineering the NLP layer to handle queries users never ask, and under-engineering the data layer so that accurately parsed queries still time out. If you need to make internal knowledge or operational data searchable through natural language, AI Knowledge Search is the direct build path.