Unified Monitoring & Alerting
Building an extensible monitoring platform that replaced fragmented legacy tools and became the foundation for Sumo Logic's alerting ecosystem.
Building for the future
Replace fragmented legacy tools with an extensible monitoring foundation for Sumo Logic's alerting ecosystem.
Sumo Logic needed to transform its monitoring capabilities to retain customers and compete effectively against DataDog, Splunk, and New Relic. The goal was not just to build a better monitoring tool; it was to create an extensible foundation that would support the company's alerting ecosystem for years to come.
We set out to replace fragmented legacy tools with a unified platform that could gracefully accommodate future capabilities: outlier alerting, smart alerts, SLO monitoring, and eventually the Alert Response Platform.
Monitor definition: the extensible framework
Ad-hoc tools, expert-level requirements
Legacy scheduled search required deep query expertise and complete upfront configuration, and it had no central management.
Sumo Logic had built ad-hoc reporting tools that customers were using as makeshift monitoring and alerting solutions. These tools required a deep understanding of Sumo's query language to use fully, and even experienced users often needed help from professional services to achieve their desired outcomes.
The legacy scheduled search feature exemplified these problems. It was only available from the log search page, required complete configuration before deployment, hid settings across separate pages, and was notoriously difficult to find once set up. Users couldn't quickly create a simple monitor — they had to configure everything upfront or nothing at all.
Legacy Scheduled Search
- Only accessible from log search page
- Required complete configuration before deployment
- Settings hidden across separate pages
- Difficult to locate and manage after creation
- Deep query language expertise required
What Customers Needed
- Create monitors from anywhere in the app
- Start simple, add complexity incrementally
- All settings visible and accessible
- Central location for monitor management
- Visual configuration without query expertise
Monitor status page: the management landscape
Leading the design vision
Primary UX lead for a newly formed team, owning end-to-end design and mentoring a junior designer.
A new team was formed specifically for this initiative. The core team included a Product Manager, a Senior Staff Engineer, three front-end engineers, and more than a dozen back-end engineers. I served as the primary UX lead and mentored a junior designer, who reported to me, on the output message design component.
What I owned
- Competitive research leadership — Led research efforts with the PM, Sr. Staff Engineer, and other product leaders, analyzing DataDog, Splunk, and New Relic's monitoring solutions
- End-to-end design — Responsible for the entire design from concept through implementation, including new components (icons, modal design, and interactive elements)
- Framework architecture — Created the end-to-end UX framework that would guide future product development
- Design governance — Conducted regular UX review meetings with team leads and the extended team, with periodic reviews across the broader UX organization
Central monitor list: the team's core deliverable
Research, iteration, refinement
Competitive analysis, customer interviews, and iterative design cycles with Sumo's UX research team.
We began by understanding current monitoring practices — both in the field and among our competitors. Through research reviews, customer interviews, and conversations with internal support teams, we learned what our customers were looking for and where we needed to focus.
Competitive analysis
Deep dive into DataDog, Splunk, and New Relic monitoring solutions. Reviewed these products with stakeholders to understand industry standards and identify differentiation opportunities.
Customer & support interviews
Conducted interviews with customers to understand pain points and desired outcomes. Spoke with internal support teams to identify common struggles and professional services escalations.
Design iteration
Multiple rounds of proposals focusing initially on log search and metrics tracking. Practiced a regular pattern of design, research, and refactoring with support from Sumo's UX research team.
Key research findings
- Customers wanted visual configuration — not just query-based setup
- Alert fatigue was a universal pain point across all platforms
- Integration with existing workflows (PagerDuty, Slack) was essential
- The system needed to scale from simple thresholds to complex ML-based detection
Monitor configuration: iterating on the trigger interface
Designed for extensibility
Application-wide monitor access, severable configuration areas, and reimagined notifications.
Three key design decisions shaped the monitoring system's success and established patterns that would guide future products.
Application-wide accessibility
Unlike previous feature implementations that siloed configuration to feature-specific pages, we made monitor configuration available across the entire Sumo Logic application.
Users could create and edit monitors from any search result, dashboard panel, or metrics exploration — and eventually from security dashboards and reports as well. This framework made it easy to extend the system to accommodate different monitor types as the platform evolved.
Severable configuration areas
We partitioned the modal into independent sections, enabling users to configure base monitor settings quickly and add complexity incrementally. A functioning, near-real-time monitor required only three things: a query, one trigger condition, and a name.
This was a stark contrast to legacy scheduled search, which spread its settings across separate pages and required every field to be configured before deployment.
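As a rough illustration of the "three required pieces" idea described above, the sketch below models a monitor whose only mandatory fields are a name, a query, and at least one trigger condition. The class names, field names, and the example query string are assumptions made for illustration; they are not Sumo Logic's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TriggerCondition:
    # Hypothetical shape: fire when the query result crosses a threshold.
    operator: str      # e.g. ">", ">=", "<", "<="
    threshold: float


@dataclass
class Monitor:
    # The three required pieces: a name, a query, and one trigger condition.
    name: str
    query: str
    triggers: List[TriggerCondition]
    # Everything else is optional and can be filled in later.
    description: Optional[str] = None
    notifications: List[dict] = field(default_factory=list)

    def is_deployable(self) -> bool:
        """True once the minimal configuration is present."""
        return (
            bool(self.name.strip())
            and bool(self.query.strip())
            and len(self.triggers) >= 1
        )


# A monitor with only the minimal configuration (illustrative values).
minimal = Monitor(
    name="API error rate",
    query="_sourceCategory=prod/api error",
    triggers=[TriggerCondition(operator=">", threshold=100)],
)
```

Because the optional sections default to empty values, a user can save the minimal monitor first and return to the remaining sections later.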
Simplified notifications
Notifications were completely reimagined. They became optional rather than required, easy to configure, and supported multiple notification channels simultaneously. Users could enable or disable individual notifications without affecting the rest of their configuration.
This flexibility meant teams could iterate on their alerting strategy without rebuilding monitors from scratch — a common frustration with the legacy system.
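The notification behavior described above (optional, multi-channel, individually toggled) can be sketched as a small data structure. This is a hypothetical model for illustration only: the class names, the `channel` labels, and the `target` values are assumptions, not the product's real schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Notification:
    channel: str          # illustrative labels such as "slack" or "pagerduty"
    target: str           # hypothetical destination, e.g. a channel or service name
    enabled: bool = True


@dataclass
class MonitorNotifications:
    # An empty list is valid: notifications are optional.
    items: List[Notification] = field(default_factory=list)

    def set_enabled(self, channel: str, enabled: bool) -> None:
        """Toggle one channel without touching the others."""
        for n in self.items:
            if n.channel == channel:
                n.enabled = enabled

    def active_channels(self) -> List[str]:
        """Channels that would currently receive alerts."""
        return [n.channel for n in self.items if n.enabled]


notifications = MonitorNotifications(
    items=[
        Notification(channel="slack", target="#ops-alerts"),
        Notification(channel="pagerduty", target="api-oncall"),
    ]
)
# Mute Slack while keeping its configuration for later.
notifications.set_enabled("slack", False)
```

Disabling a channel keeps its entry in place, which mirrors the point that teams could adjust alerting without rebuilding the monitor.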
Independent configuration sections: start simple, add complexity over time
A platform transformed
Steady enterprise adoption, legacy sunset by 2022, and an architecture that scaled to smart alerts and anomaly detection.
Upon release, unified monitoring was an immediate success. We measured steady growth and adoption from introduction, onboarding several large-scale enterprise customers who were able to effectively translate existing monitors from third-party solutions.
Customer feedback was generally positive, and the UI in particular was well received. We extended the platform's capabilities significantly beyond what the original scheduled searches offered, without changing the foundational patterns.
2020 — Platform launch
Unified monitoring released with log search and metrics tracking. Immediate adoption and positive reception from enterprise customers.
2021 — Alert Response Platform
Built the Alert Response Platform on top of unified monitoring, providing incident context, related alerts, and automated playbooks.
2022 — Feature parity & legacy sunset
Reached feature parity with legacy scheduled search. Notification capabilities — initially an area where we struggled — matched and then exceeded the original tools. Successfully sunset the scheduled search real-time reporting feature in favor of the unified monitoring solution.
2023 — Smart Alerts & Anomaly Detection
The extensible architecture paid off as we added advanced analytics capabilities. The framework accommodated these new monitor types without architectural changes.
Central monitor list: view, manage, and organize all monitors
Lessons learned
A concession on trigger configuration and hard-won lessons in navigating stakeholder dynamics.
If I could revisit this project, I would push harder for a simplified trigger configuration design. Instead, I conceded to a natural language interface approach that others on the team advocated for. The natural-language trigger configuration ended up being difficult to maintain and was the only part of the design that proved problematic over time.
This project also gave me a better understanding of stakeholder dynamics. I learned to identify which leaders could be relied upon to pursue better UX versus those for whom convenience and shipping velocity were paramount. That awareness has shaped how I navigate design decisions and build consensus in subsequent projects.
The most successful enterprise tools feel like consumer products — even in high-stakes environments, every interaction should feel intuitive and deliberate.
The configuration modal: accessible from anywhere in the app