Why Can't Anyone Answer Questions About the Business? - Garrett Galow, WorkOS

By building an LLM-powered agent system (Studio) with proper sequencing, context layering, and validation, WorkOS enables employees to self-serve business qu...

By Sean Weldon

Abstract

WorkOS Studio represents an applied implementation of Large Language Model (LLM)-powered agentic systems designed to democratize enterprise data access. By orchestrating queries across heterogeneous data sources - Snowflake, Linear, and Notion - through a LangGraph-coordinated agent backed by Claude Opus, Studio enables non-technical employees to self-serve complex business questions and generate reusable analytical dashboards. The architecture employs three interconnected design principles: pre-flight validation to ensure query feasibility, hierarchical context layering to encode organizational data models, and deterministic widget-based output generation to decouple LLM inference from recurring query execution. Validation mechanisms address zero-result query failures prior to deployment. The implementation demonstrates that runtime context injection at tool-invocation time can serve as a practical alternative to retrieval-augmented generation for schema-rich environments, with meaningful implications for enterprise agentic system design.


1. Introduction

Enterprise data organizations face a structural bottleneck: non-technical stakeholders requiring analytical insights must route requests through centralized data teams, creating iterative clarification cycles that delay decisions and constrain organizational agility. Custom dashboards address recurring queries but lack flexibility when business questions evolve, while one-off SQL queries do not scale across organizations with varied technical competencies. The resulting workflow - question, clarification round, query, revision, repeat - represents a friction pattern that is pervasive and largely accepted as unavoidable.

WorkOS Studio addresses this challenge through an agentic architecture enabling employees to formulate natural-language questions and receive structured, executable analytical outputs. The system's design philosophy centers on three engineering principles: sequencing (pre-flight validation and structured tool invocation), layering (hierarchical context injection), and validation (query correctness verification before deployment). Each principle targets a specific failure mode observed in naive LLM-to-data-warehouse implementations.

This analysis examines Studio's architecture, the technical rationale behind its design decisions, and the implications for enterprise agentic system design at scale. Key concepts examined include agentic orchestration, runtime context injection, widget-based dashboard generation, and organizational access control for LLM-mediated data access.


2. Background and Related Work

Traditional business intelligence (BI) architectures separate query authorship from business question formulation. Data analysts translate organizational questions into SQL or BI tool configurations, creating a translation layer that introduces latency and communication overhead. Non-technical users lacking SQL competency cannot independently investigate data, generating dependency chains that constrain organizational responsiveness. Agentic LLM systems offer a resolution: natural-language interfaces that translate business questions into structured queries, execute them against data warehouses, and return interpretable results. However, naive implementations face challenges including incorrect schema interpretation, context window constraints, and query reliability failures.

LangGraph provides a graph-based orchestration framework for LLM agents, enabling structured tool-use patterns where agents select, invoke, and process outputs from external services. This differs from single-pass LLM inference by maintaining state across tool invocations and supporting iterative reasoning loops. Retrieval-Augmented Generation (RAG), an alternative context management strategy, dynamically retrieves relevant context chunks to augment LLM input at query time. Studio's architecture represents a deliberate departure from this paradigm, demonstrating that schema-rich environments with well-defined data models may not require dedicated retrieval infrastructure when context injection is applied at the tool-invocation boundary.


3. Core Analysis

3.1 Sequencing: Pre-flight Validation and Structured Tool Invocation

A central design decision in Studio's architecture is the sequencing of agent operations prior to any query execution. Rather than allowing the agent to invoke data source tools immediately upon receiving a user question, Studio requires completion of a pre-flight validation phase. The agent evaluates whether tools are connected correctly and whether sufficient context exists to answer the question. As Galow states, "We make it run a lot of pre-flight checks... are all the tools connected correctly? Do you have enough context to be able to answer the question?" If contextual gaps are detected, clarifying questions are surfaced to the user before any data source is queried.

This sequencing approach serves two functions. First, it reduces wasted tool invocations against cloud data warehouse resources, which carry latency and cost implications. Second, it shifts the clarification burden earlier in the workflow, replacing downstream query failures with upfront user dialogue. Tool selection is governed by a checklist mechanism rather than freeform agent reasoning, constraining the decision space and improving reliability across varied question types. Consequently, the agent's behavior becomes more predictable and auditable than it would be under unconstrained tool selection.
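The pre-flight phase described above can be illustrated with a minimal Python sketch. It is not Studio's implementation: the function name run_preflight, the PreflightResult type, and the idea of detecting missing context through a table of undefined terms are all assumptions made for illustration. The sketch only shows the ordering, in which every checklist item runs, and any clarifying question is collected, before a data source is touched.

```python
from dataclasses import dataclass, field


@dataclass
class PreflightResult:
    """Outcome of the checklist; `ready` is True only if nothing blocks the query."""
    ready: bool
    clarifying_questions: list[str] = field(default_factory=list)
    problems: list[str] = field(default_factory=list)


def run_preflight(
    question: str,
    connected_tools: dict[str, bool],
    required_tools: list[str],
    known_terms: set[str],
    ambiguous_terms: dict[str, str],
) -> PreflightResult:
    """Run all checks before any tool is invoked (hypothetical checklist)."""
    # Check 1: are all the tools this question needs connected?
    problems = [
        f"tool not connected: {name}"
        for name in required_tools
        if not connected_tools.get(name, False)
    ]

    # Check 2: is there enough context? Assumption: a term is a gap when it
    # appears in the question and has not been defined for this organization.
    words = set(question.lower().replace("?", "").split())
    questions = [
        prompt
        for term, prompt in ambiguous_terms.items()
        if term in words and term not in known_terms
    ]

    return PreflightResult(
        ready=not problems and not questions,
        clarifying_questions=questions,
        problems=problems,
    )
```

A caller would query the warehouse only when `ready` is true and otherwise return the collected questions to the user, which is the upfront dialogue the section describes.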

3.2 Layering: Hierarchical Context Injection

Studio employs a hierarchical context injection model in which context is assembled from multiple stacked sources: a base prompt, default organizational rules, and org-specific configuration blocks. Critically, schema context is not embedded in the initial system prompt but is injected at the moment a specific tool is invoked. As Galow explains, "at the time it decides to invoke a tool, that's when we inject context around how to use the tool."

This design decision has direct consequences for context window management. By deferring schema injection to invocation time, the agent's initial context remains compact, reducing token overhead for questions that ultimately require only a subset of available tools. Snowflake integration context encodes database schema, join patterns across approximately four levels of customer entity relationships, filtering rules for deleted entities, and status column semantics - information necessary to prevent structurally valid but semantically incorrect queries. Furthermore, the system explicitly instructs the model to distrust its parametric knowledge about WorkOS products, directing it to retrieve authoritative information from primary documentation sources. This distinction between parametric and injected knowledge is significant: the architecture treats the model's training data as potentially stale and actively routes around it for domain-specific facts.
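The layering model can be sketched as follows. This is a simplified illustration under stated assumptions, not Studio's code: the prompt strings, the TOOL_CONTEXT table, and both function names are hypothetical, and the schema notes are placeholders rather than WorkOS's actual rules. The point shown is that the initial context holds only the stacked base layers, while a tool's context block is appended at the moment that tool is selected.

```python
# Hypothetical base layers, stacked in order: base prompt, default rules, org config.
BASE_PROMPT = "You answer business questions using the connected tools."
DEFAULT_RULES = "Do not rely on memorized product facts; consult the documentation."

# Hypothetical per-tool context blocks (placeholder text, not a real schema).
TOOL_CONTEXT = {
    "snowflake": "Snowflake usage notes: join patterns, deleted-entity filters, status semantics.",
    "linear": "Linear usage notes: how issues and projects are organized.",
}


def build_initial_context(org_config: str) -> list[str]:
    """Assemble the compact starting context; no tool-specific schema is included."""
    return [BASE_PROMPT, DEFAULT_RULES, org_config]


def context_for_tool_call(context: list[str], tool_name: str) -> list[str]:
    """Return the context with the tool's block injected at invocation time."""
    block = TOOL_CONTEXT[tool_name]
    if block in context:
        # Already injected by an earlier call in this session; do not duplicate it.
        return context
    return context + [block]


initial = build_initial_context("Org rule: revenue is reported in USD.")
with_snowflake = context_for_tool_call(initial, "snowflake")
print(len(initial), len(with_snowflake))
```

A question that never reaches the Linear tool never pays for the Linear block, which is the token saving the paragraph above describes.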

3.3 Validation: Query Correctness Verification

Studio's validation layer addresses a failure mode specific to natural language-to-SQL systems: queries that are syntactically valid and execute without error yet return zero results due to incorrect filter conditions or misapplied join logic. As Galow notes, "valid SQL query, but that returns zero data - if it doesn't notice that, it's not very useful." The validation mechanism executes generated queries against live Snowflake data prior to hardcoding them into widgets, flagging zero-result returns before they are surfaced to end users.

Evals are run identically across staging and production environments, ensuring that validation behavior remains consistent during iterative development. This parity reduces the risk of environment-specific query failures reaching production systems. The validation approach is notably reactive rather than preventive - it operates on query outputs rather than attempting to statically verify query correctness - which accommodates the inherent unpredictability of LLM-generated SQL without requiring formal verification infrastructure.
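The zero-result check can be demonstrated with a small runnable sketch. Python's built-in sqlite3 module stands in for Snowflake here so that the example is self-contained; the validate_query function, the customers table, and its status column are hypothetical and are not taken from Studio.

```python
import sqlite3


def validate_query(conn: sqlite3.Connection, sql: str) -> tuple[bool, str]:
    """Execute the generated SQL and reject it if it fails or returns no rows."""
    try:
        # One row is enough to know the result set is non-empty.
        rows = conn.execute(sql).fetchmany(1)
    except sqlite3.Error as exc:
        return False, f"query failed: {exc}"
    if not rows:
        return False, "query ran but returned zero rows"
    return True, "ok"


# Hypothetical data standing in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, status TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'active')")

# Valid SQL with a wrong filter value: it executes, but matches nothing.
print(validate_query(conn, "SELECT id FROM customers WHERE status = 'Active'"))
print(validate_query(conn, "SELECT id FROM customers WHERE status = 'active'"))
```

One limitation of this sketch is that an aggregate such as COUNT(*) always returns a row, so a row-count test alone would not flag an aggregate computed over an empty match; a fuller check would also have to inspect the returned values.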

3.4 Widget-Based Output and Deterministic Execution

Studio generates widgets - self-contained JavaScript sandbox units combining UI elements, API calls, and embedded queries - as its primary output artifact. A critical design property of widgets is their determinism: once generated and validated, widgets execute queries directly against data sources without re-invoking the LLM. The model is only re-engaged when a widget is explicitly modified by the user.

This architecture decouples inference cost from query frequency, enabling high-volume dashboard usage without proportional LLM expenditure. State persistence in Convex preserves question history and widget definitions across sessions, supporting organizational reuse of validated analytical outputs. The widget model also enables a form of institutional knowledge capture: validated queries become shared organizational assets rather than ephemeral one-off executions.
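The decoupling of inference cost from query frequency can be made concrete with a short sketch. Studio's widgets are JavaScript sandbox units; the Python below is only an analogue, and the Widget and WidgetBuilder classes, the injected generate_sql callable, and the llm_calls counter are assumptions introduced for illustration. The model is called when a widget is created or modified, and never when it is rendered.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Widget:
    """A saved artifact: the query is fixed at generation time."""
    title: str
    sql: str

    def render(self, conn: sqlite3.Connection) -> list[tuple]:
        # Deterministic path: runs the stored query, with no model involved.
        return conn.execute(self.sql).fetchall()


class WidgetBuilder:
    """Wraps a text-to-SQL callable (a stand-in for the LLM) and counts its calls."""

    def __init__(self, generate_sql: Callable[[str], str]) -> None:
        self._generate_sql = generate_sql
        self.llm_calls = 0

    def create(self, title: str, question: str) -> Widget:
        self.llm_calls += 1
        return Widget(title=title, sql=self._generate_sql(question))

    def modify(self, widget: Widget, new_question: str) -> Widget:
        # Only an explicit modification re-engages the model.
        self.llm_calls += 1
        return Widget(title=widget.title, sql=self._generate_sql(new_question))
```

Under this structure, rendering a dashboard a hundred times leaves llm_calls unchanged, so inference cost scales with the number of edits rather than the number of views.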


4. Technical Insights

Several implementation considerations emerge from Studio's architecture with direct applicability to enterprise agentic system design.

Runtime context injection versus RAG: For environments with well-defined, stable schemas, injecting context at tool-invocation time eliminates the infrastructure overhead of vector databases and embedding pipelines. This approach is most viable when data models are bounded and authoritative schema documentation exists in a form suitable for context block encoding.

Model selection trade-offs: Studio's selection of Claude Opus over lower-cost alternatives reflects an explicit quality-cost evaluation. Galow states that Opus "outperforms better than other models so much that trading the cost off would trade off quality in a way that we wouldn't deem acceptable." This observation underscores that model selection in production agentic systems should be benchmarked against task-specific quality requirements rather than optimized for cost in isolation.

Access control architecture: Studio's current per-user integration model presents scalability constraints as organizational adoption grows. The transition to org-level connectors via WorkOS Pipes will centralize access control, enabling role-based permission assignment without requiring individual credential management at scale.

Zero-result validation gap: Post-generation execution validation catches semantic failures that static analysis cannot, but introduces latency and requires live data access during the generation phase. Systems operating in environments with restricted data access or high warehouse query costs may require alternative or complementary validation strategies.


5. Discussion

Studio's architecture illustrates a broader principle in enterprise agentic system design: reliability emerges from structural constraints on agent behavior rather than from model capability alone. Pre-flight checks, hierarchical context injection, and output validation each reduce the surface area of failure modes that LLM reasoning alone cannot eliminate. This finding aligns with emerging patterns in production agentic deployments, where deterministic scaffolding around probabilistic model outputs tends to prove more reliable than unconstrained agent architectures.

The decision to avoid RAG in favor of runtime context injection warrants further investigation. While effective for WorkOS's bounded data model, this approach may not generalize to environments with large, heterogeneous, or frequently evolving schemas where context blocks would exceed practical token limits. Future work could examine hybrid architectures that selectively apply RAG for schema-dense subdomains while using direct injection for stable, well-documented entity relationships.
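One way to picture such a hybrid is a per-block routing rule based on size. The sketch below is purely illustrative and does not describe anything Studio does: the function names are hypothetical, and the four-characters-per-token estimate is a rough assumption, not a property of any particular tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Crude size estimate; assumes roughly four characters per token."""
    return max(1, len(text) // 4)


def choose_strategy(context_block: str, budget_tokens: int) -> str:
    """Inject a block directly if it fits the budget; otherwise fall back to retrieval."""
    if estimate_tokens(context_block) <= budget_tokens:
        return "inject"
    return "retrieve"
```

A real system would measure size with the model's own tokenizer and would likely weigh schema stability as well as size when choosing between the two paths.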

The organizational access control transition from per-user to org-level connectors represents a maturation pattern likely to recur across enterprise agentic deployments. As agentic systems move from internal tooling to organization-wide infrastructure, centralized permissioning becomes a prerequisite for governance, auditability, and compliance - areas that current per-user credential models address incompletely.


6. Conclusion

WorkOS Studio demonstrates that LLM-powered self-service data access is achievable in production enterprise environments through disciplined architectural choices rather than model capability alone. The three-principle framework of sequencing, layering, and validation provides a replicable design pattern for organizations seeking to reduce dependency on centralized data teams without sacrificing query reliability or analytical correctness.

Practical takeaways for agentic system implementers include: defer context injection to tool-invocation time to preserve initial context compactness; validate query outputs against live data before surfacing results to end users; and design output artifacts to execute deterministically post-generation, decoupling LLM inference cost from query frequency. Organizations evaluating similar architectures should prioritize model selection based on task-specific quality benchmarks and plan access control for organizational scale from the initial design phase rather than retrofitting governance after adoption.


About the Author

Sean Weldon is an AI engineer and systems architect specializing in autonomous systems, agentic workflows, and applied machine learning. He builds production AI systems that automate complex business operations.
