Self Driving Products: Product Signals to Pull Requests — Joshua Snyder, PostHog
By Sean Weldon
Self-Driving Products: Automated Conversion of Observability Signals to Production Code
Abstract
This paper examines an automated pipeline architecture that transforms observability data into executable code changes without manual developer intervention. PostHog's system addresses the inefficiency of traditional reactive debugging workflows by implementing a five-stage pipeline: signal ingestion, semantic grouping, agent-based research, actionability assessment, and iterative code execution. The architecture processes trillions of monthly events across heterogeneous sources including error tracking, session replays, logs, and communication channels. Key technical contributions include a novel embedding strategy that matches LLM-generated semantic queries rather than raw signals to overcome structural similarity clustering failures, weighted report accumulation for noise reduction across disparate sources, and sandbox-based iterative pull request generation with continuous integration monitoring. Evaluation demonstrates that autonomous agents can reliably convert observability data into production-ready code changes when provided with sufficiently specific problem descriptions, with actionability strongly correlated to signal source specificity.
1. Introduction
Contemporary software engineering allocates substantial resources to reactive maintenance workflows. The conventional observability pipeline—signal detection, dashboard analysis, manual investigation, issue tracking, pull request creation, code review, and deployment—requires hours to days per incident. This reactive cycle diverts engineering capacity from feature development to routine remediation tasks, representing a significant opportunity cost in organizational productivity.
Observability data, while extensively collected through error tracking systems, session replay tools, log aggregation platforms, and communication channels, remains fundamentally underutilized in current paradigms. Existing systems emphasize data collection and visualization, positioning human developers as the primary analytical and remediation agents. These platforms generate alerts and dashboards but require manual interpretation and translation of insights into executable code changes.
This work examines an automated pipeline architecture designed to eliminate manual intervention by converting raw observability signals directly into production-ready pull requests. The system addresses a central research question: whether autonomous agents can reliably transform heterogeneous observability data into deployable code changes across diverse problem types and signal sources. The analysis proceeds through examination of pipeline architecture, signal normalization strategies, semantic grouping mechanisms, agent-based problem analysis, actionability assessment criteria, and iterative code generation with continuous integration feedback loops.
2. Background and Related Work
Traditional observability architectures focus on telemetry collection and visualization, treating developers as necessary intermediaries between data and action. Error tracking platforms capture exception traces, session replay tools record user interaction sequences, and log aggregation systems centralize application telemetry. These systems excel at data presentation but require human operators to synthesize information across sources and implement remediation strategies.
The Model Context Protocol (MCP) provides a standardized interface enabling language models to access external data sources and tools through defined server implementations. Integration of MCP servers allows agents to retrieve supplementary context beyond initial signal data, improving analytical accuracy through access to correlated information across observability platforms. This protocol proves particularly valuable when agents require additional data dimensions—such as retrieving complete log sequences alongside session replay excerpts—to formulate accurate problem diagnoses.
The Claude Agent SDK is a framework for autonomous task execution through iterative tool use and code generation. When deployed in isolated sandbox environments such as Modal, these agents can safely execute code modifications, run test suites, and iterate on failures without affecting production systems. Stateful iteration, achieved by snapshotting the sandbox and later rehydrating it, lets the agent keep refining proposed changes until continuous integration checks pass.
Semantic grouping in heterogeneous data streams presents distinct challenges when applying standard embedding techniques. Off-the-shelf embedding models prioritize structural similarity, causing signals to cluster by format rather than semantic content. This work addresses these limitations through query-based embedding strategies specifically designed for multi-source observability data.
3. Core Analysis
3.1 Pipeline Architecture and Signal Flow
The proposed system implements a five-stage pipeline architecture that transforms raw observability signals into executable code changes. The pipeline ingests trillions of events monthly from diverse sources: error tracking systems, session replay platforms, application logs, communication channels (Slack), and experimental results. Each stage progressively refines signal data, culminating in production-ready pull requests.
At pipeline entry, an LLM-based classifier filters malicious signals originating from public-facing sources. This security layer prevents attacker-crafted error messages or malicious input from propagating through the system. Following security validation, the normalization stage transforms heterogeneous signal types—stack traces, JSON logs, chart results, natural language messages—into a unified structure containing standardized fields: source identifier, product context, signal type, content payload, importance weight, and semantic embedding.
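The unified structure described above can be pictured as a single record type with one small adapter per source. The following is a minimal sketch, not PostHog's schema: the field names mirror the list in the text, while the raw payload shapes, the adapter functions, and the weight values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Signal:
    """Unified record; field names follow the list in the text."""
    source: str                              # e.g. "error_tracking", "slack"
    product: str                             # product context
    signal_type: str                         # e.g. "exception", "message"
    content: str                             # content payload as plain text
    weight: float                            # importance weight (values here are assumed)
    embedding: Optional[List[float]] = None  # filled in by a later stage


def normalize_exception(raw: dict, product: str) -> Signal:
    """Flatten a hypothetical error-tracking payload into a Signal."""
    lines = [f"{raw['type']}: {raw['message']}"] + list(raw.get("frames", []))
    return Signal("error_tracking", product, "exception", "\n".join(lines), weight=1.0)


def normalize_slack(raw: dict, product: str) -> Signal:
    """Flatten a hypothetical Slack payload into a Signal."""
    return Signal("slack", product, "message", raw["text"].strip(), weight=0.3)
```

Keeping the embedding optional reflects that normalization and embedding are separate steps; later stages only ever see Signal records, never source-specific payloads.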
The weight assignment mechanism proves critical for downstream processing, indicating signal importance and enabling threshold-based promotion to subsequent pipeline stages. This architecture replaces manual dashboard monitoring with automated background agents, shifting developer interaction from reactive investigation to proactive review of generated solutions.
3.2 Semantic Grouping Through Query Generation
Standard embedding approaches perform poorly when applied to structurally diverse observability signals. Empirical evaluation shows that off-the-shelf embedding models cluster signals by structural similarity rather than semantic relevance: error messages group with other errors, Slack messages cluster together, and log entries form separate groups, regardless of whether they describe the same underlying problem. This structural clustering makes direct embedding of raw signals ineffective for cross-source signal correlation.
The solution implements a two-stage embedding strategy: first, an LLM generates semantic queries from each signal, extracting the core problem description independent of format; second, the system embeds these generated queries rather than raw signal content. This approach normalizes structural differences before embedding, enabling semantically similar problems to cluster regardless of originating source. A null pointer exception from error tracking and a customer Slack message describing checkout failures can successfully group when their generated queries both reference payment processing issues.
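The two-stage strategy can be illustrated with toy stand-ins. In the sketch below, toy_query replaces the LLM call with a keyword rule and toy_embed replaces the embedding model with a normalized bag of words; both are assumptions chosen so the example runs offline, and neither reflects the models used in the real system. The point is only the ordering: generate a format-free query first, then embed the query.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in for an embedding model: unit-length bag of words."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(v * b.get(word, 0.0) for word, v in a.items())


def toy_query(raw: str) -> str:
    """Stand-in for the LLM step: reduce a raw signal to a problem statement."""
    lowered = raw.lower()
    if "checkout" in lowered or "payment" in lowered:
        return "payment processing fails during checkout"
    return "unclassified problem"


def embed_signal(raw: str, make_query: Callable[[str], str] = toy_query) -> Vector:
    # Stage 1: generate the semantic query. Stage 2: embed the query, not the raw text.
    return toy_embed(make_query(raw))


error = "NullPointerException at PaymentService.charge(PaymentService.java:88)"
slack = "hey, a customer says checkout keeps failing for them"
raw_similarity = cosine(toy_embed(error), toy_embed(slack))
query_similarity = cosine(embed_signal(error), embed_signal(slack))
```

With these stand-ins the two raw signals share no tokens, while their generated queries are identical, so query_similarity is far higher than raw_similarity. A real embedding model would give less extreme numbers, but the direction of the effect is what the strategy relies on.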
Weighted report accumulation aggregates related signals over time, with each signal contributing its assigned weight to a cumulative report score. When accumulated weight exceeds a defined threshold, the system promotes the report to the research agent stage. This mechanism provides noise reduction by requiring multiple correlated signals before triggering expensive agent analysis, while preventing premature investigation of isolated anomalies.
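The accumulation rule amounts to a running sum with a one-time promotion check. This is a minimal sketch under assumed names (Report, add_signal) and assumed weights and threshold; the source does not state the actual values or whether scores decay over time.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Report:
    """A group of related signals with their weights."""
    signals: List[Tuple[str, float]] = field(default_factory=list)
    promoted: bool = False

    @property
    def score(self) -> float:
        return sum(weight for _, weight in self.signals)


def add_signal(report: Report, signal_id: str, weight: float, threshold: float) -> bool:
    """Add a signal; return True only on the call that first exceeds the threshold."""
    report.signals.append((signal_id, weight))
    if not report.promoted and report.score > threshold:
        report.promoted = True  # hand the report to the research agent once
        return True
    return False
```

For example, with a threshold of 1.5, two Slack messages weighted 0.3 each leave the report in the pool, and a subsequent error weighted 1.0 promotes it. Later signals still accumulate but do not trigger a second promotion.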
3.3 Agent-Based Problem Analysis and Actionability Assessment
The research agent executes within a Modal sandbox environment, running the Claude Agent SDK with three tool categories: an MCP server for supplementary data retrieval, codebase context access, and external MCP integrations (Linear, Notion). The agent produces structured outputs including problem summaries, priority scores, and git blame attribution for reviewer assignment based on code ownership patterns.
MCP server integration significantly improves diagnostic accuracy by enabling agents to pull correlated data across observability platforms. When analyzing a session replay showing user frustration, the agent can retrieve associated error logs, related exception traces, and historical incident patterns. This cross-source data access proves essential for accurate problem characterization, as initial signals frequently lack sufficient context for definitive diagnosis.
The actionability assessment stage implements a three-outcome classification that determines subsequent processing paths. Signals classified as "not actionable" due to insufficient data return to the accumulation pool, awaiting additional correlated signals. Problems requiring human judgment, typically product decisions with multiple valid solution paths, route to a review inbox for developers to pick up in the morning. Only problems classified as "immediately actionable" proceed to automated code generation.
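The three outcomes map naturally onto an enumeration and a routing table. In the sketch below, the classify rule is a deliberately simple stand-in for the agent's judgment, and its inputs (has_enough_data, solution_paths) and the destination names are hypothetical labels rather than identifiers from the real system.

```python
from enum import Enum


class Actionability(Enum):
    NOT_ACTIONABLE = "not actionable"
    NEEDS_HUMAN_JUDGMENT = "needs human judgment"
    IMMEDIATELY_ACTIONABLE = "immediately actionable"


# Hypothetical names for the three processing paths described in the text.
DESTINATIONS = {
    Actionability.NOT_ACTIONABLE: "accumulation_pool",
    Actionability.NEEDS_HUMAN_JUDGMENT: "review_inbox",
    Actionability.IMMEDIATELY_ACTIONABLE: "code_generation",
}


def classify(has_enough_data: bool, solution_paths: int) -> Actionability:
    """Toy rule standing in for the agent's assessment."""
    if not has_enough_data:
        return Actionability.NOT_ACTIONABLE
    if solution_paths != 1:
        # Several valid fixes usually means a product decision is involved.
        return Actionability.NEEDS_HUMAN_JUDGMENT
    return Actionability.IMMEDIATELY_ACTIONABLE


def route(has_enough_data: bool, solution_paths: int) -> str:
    return DESTINATIONS[classify(has_enough_data, solution_paths)]
```

Making every report resolve to exactly one of three destinations keeps the automated path narrow: anything that is not clearly a single, well-specified fix falls back to accumulation or to a human.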
Empirical analysis reveals strong correlation between signal source and actionability. Error tracking systems produce specific, immediately actionable problems with clear remediation paths. Conversely, Slack messages and session replays generate generic problem descriptions admitting multiple solution approaches, requiring human product judgment. This specificity gradient directly determines whether agents can generate meaningful fixes versus producing noisy, low-value pull requests.
3.4 Iterative Code Generation and Continuous Integration
The code execution stage clones the target repository into an isolated sandbox, running the Claude Agent SDK to generate proposed fixes. Upon completion, the system pushes a pull request and monitors continuous integration status. CI failures or pull request comments trigger sandbox rehydration from saved snapshots, enabling iterative refinement without repeating initial analysis.
This snapshot-based iteration mechanism proves essential for handling complex fixes requiring multiple attempts. The agent examines CI failure logs, adjusts its approach, and regenerates code until tests pass. The system delivers green, ready-to-merge pull requests without manual developer intervention on intermediate failures, fundamentally changing the developer experience from "wake up to CI failures requiring investigation" to "wake up to green PRs requiring only approval."
The iterative refinement capability extends to handling code review feedback. Developer comments on generated pull requests trigger sandbox rehydration, allowing the agent to incorporate feedback and update its proposed changes. This creates a collaborative loop where human expertise guides agent execution without requiring manual code modification.
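The control flow of the snapshot-based loop can be sketched independently of any sandbox provider. The code below does not call the Modal or Claude Agent SDK APIs; run_agent and run_checks are hypothetical callables that a real implementation would back with sandbox rehydration, the agent run, and CI or review polling. The attempt cap is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    passed: bool
    feedback: str = ""  # CI failure log or reviewer comment


def iterate_until_green(
    run_agent: Callable[[Optional[str], str], str],
    run_checks: Callable[[str], CheckResult],
    max_attempts: int = 5,
) -> Optional[str]:
    """Return the snapshot id of the first passing attempt, or None if the cap is hit.

    run_agent(snapshot_id, feedback) resumes from a snapshot (None on the first
    attempt), revises the change, pushes it, and returns a new snapshot id.
    """
    snapshot: Optional[str] = None
    feedback = ""
    for _ in range(max_attempts):
        snapshot = run_agent(snapshot, feedback)
        result = run_checks(snapshot)
        if result.passed:
            return snapshot
        feedback = result.feedback  # carried into the next rehydrated run
    return None
```

Because each attempt starts from the previous snapshot rather than a fresh clone, the agent keeps its earlier analysis and only has to respond to the new feedback. The same loop serves CI failures and review comments, since both arrive as feedback text.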
4. Technical Insights
Implementation of production-scale signal-to-code pipelines reveals several critical technical considerations. Evaluation on representative production data proves essential; local testing with synthetic examples fails to capture the diversity and edge cases present in real customer data. The system requires continuous evaluation against actual signal distributions to maintain reliability across heterogeneous problem types.
The embedding strategy for semantic grouping requires careful consideration of data structure normalization. Direct embedding of raw signals fails due to structural similarity clustering. Successful implementations must either normalize data structure before embedding or, more effectively, generate semantic abstractions (queries) that eliminate format-specific features before vector representation.
Actionability filtering based on problem specificity prevents generation of noisy, low-value pull requests. Agents will attempt fixes even for vague problem descriptions, but these attempts rarely produce meaningful solutions. Systems must implement specificity thresholds, routing generic problems to human review rather than automated code generation.
Token costs, while significant, prove secondary during initial development phases. Running expensive agent workflows repeatedly reveals behavioral patterns that enable conversion to cheaper one-shot LLM calls or fine-tuned models. The iterative refinement of expensive agent steps yields dramatic cost reductions as consistent patterns emerge from diverse problem instances.
The integration of git blame for reviewer assignment demonstrates effective use of repository metadata to route pull requests to appropriate code owners. This attribution mechanism ensures that generated changes receive review from developers with relevant domain expertise, improving review quality and merge rates.
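One plausible way to turn blame data into reviewer suggestions is to count which authors last touched the affected lines. The sketch below parses the output of git blame --line-porcelain, which repeats an author-mail header for every line; the ranking rule (most lines touched first) and the function names are assumptions, not the method the source describes in detail.

```python
import subprocess
from collections import Counter
from typing import List


def count_authors(porcelain: str) -> Counter:
    """Count blamed lines per author email in --line-porcelain output."""
    counts: Counter = Counter()
    for line in porcelain.splitlines():
        # Header lines are unindented; source lines are prefixed with a tab.
        if line.startswith("author-mail "):
            counts[line[len("author-mail "):].strip("<>")] += 1
    return counts


def suggest_reviewers(porcelain: str, limit: int = 2) -> List[str]:
    """Rank authors by how many of the blamed lines they last changed."""
    return [email for email, _ in count_authors(porcelain).most_common(limit)]


def blame(path: str, start: int, end: int, repo: str = ".") -> str:
    """Run git blame on a line range and return the porcelain text."""
    completed = subprocess.run(
        ["git", "blame", "--line-porcelain", "-L", f"{start},{end}", "--", path],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return completed.stdout
```

A call such as suggest_reviewers(blame("src/checkout.py", 40, 60)) would return up to two emails for that range, where the path and line numbers are placeholders. Blame reflects who last edited a line, which is only a proxy for ownership, so a production system would likely combine it with other ownership data.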
5. Discussion
The demonstrated pipeline architecture represents a fundamental shift in observability system design, transforming passive data collection into active remediation. The approach extends beyond simple automation of existing workflows; it reconceptualizes the role of observability data as direct input to code generation rather than indirect input to human decision-making.
Several implications emerge for autonomous software development systems. First, the specificity gradient across signal sources suggests that different observability tools produce varying degrees of actionable information. Error tracking systems, with their structured exception data and stack traces, enable more reliable automated remediation than natural language communication channels. This finding has implications for observability tool design: systems intended to support automated remediation should prioritize structured, specific signal generation over general-purpose data collection.
Second, the query-based embedding approach addresses a broader challenge in multi-source data processing. When working with heterogeneous data sources, structural normalization through semantic abstraction proves more effective than direct embedding of raw data. This principle likely generalizes beyond observability signals to other domains requiring semantic grouping across diverse data formats.
The future trajectory points toward fully autonomous product development cycles. The system could automatically deploy experiments, measure impact through observability data, and iterate on implementations without manual experiment management. Low-risk changes could auto-approve and deploy behind feature flags, with automatic rollback on negative signals. Each outcome—rejected pull requests, deployment issues, production error resolution—feeds into subsequent PR generation, creating a continuous learning loop.
However, significant challenges remain. The system's reliance on problem specificity for actionability limits its applicability to well-defined technical issues. Product decisions, architectural choices, and ambiguous requirements still require human judgment. The boundary between automated and human-directed development requires further investigation to optimize resource allocation.
6. Conclusion
This analysis demonstrates that autonomous agents can reliably convert observability data into production-ready code changes when provided with sufficiently specific problem descriptions. The five-stage pipeline architecture—signal ingestion, semantic grouping through query generation, agent-based research with MCP integration, actionability assessment, and iterative code execution—successfully processes trillions of monthly events across heterogeneous sources.
Key contributions include the query-based embedding strategy that overcomes structural similarity clustering failures, weighted report accumulation for noise reduction across disparate sources, and sandbox-based iterative refinement enabling delivery of green pull requests without manual intervention on CI failures. The strong correlation between signal source specificity and actionability offers practical guidance for observability system design.
Practical applications extend beyond bug remediation to autonomous feature development, experiment management, and continuous product improvement. Organizations implementing similar systems should prioritize evaluation on representative production data, invest in embedding strategies that normalize structural differences, and implement specificity-based filtering to maintain high signal quality in generated pull requests. The demonstrated approach represents a foundational step toward self-improving software products that autonomously detect, diagnose, and remediate issues while learning from every outcome.
Sources
- Self Driving Products: Product Signals to Pull Requests — Joshua Snyder, PostHog - Original Creator (YouTube)
- Analysis and summary by Sean Weldon using AI-assisted research tools
About the Author
Sean Weldon is an AI engineer and systems architect specializing in autonomous systems, agentic workflows, and applied machine learning. He builds production AI systems that automate complex business operations.