As the Senior Director of Engineering at Clari, Balaji Narayanan has responsibilities that break down into two buckets.
First, he’s focused on ensuring that Clari users have the best experience possible. This means staying on top of system availability and reliability and addressing any performance degradations, an absolute must given how essential Clari’s revenue platform is. Second, he works to enable his engineering teams to focus on building and shipping products quickly. A tried-and-true incident response process makes this latter responsibility easier.
Unfortunately, Clari’s incident response workflow, which the company had outgrown as it increased in size and criticality, was highly manual and inconsistent. This created significant barriers to efficient incident response and made it hard to learn from incidents afterward.
“When you’re in an incident, your focus should be on recovery. After you have recovered, you should focus on learning. We were struggling to run the process correctly. So we couldn’t focus on the right things,” says Balaji.
So when incident.io came into the fold, Clari was able to meaningfully improve the way it ran incidents from start to finish. Not only that, but with the platform in place, the team could achieve one of its biggest goals: launching a global incident management program to ensure consistency across regions and teams.
Life before incident.io
Before adopting incident.io, Clari dealt with one specific issue that affected its incident response: inconsistency.
“We had a very manual and bespoke process. We did have certain guidelines on how to start incidents, how to run them, and how to do follow-ups, but we were basically following instructions from Confluence or Google Docs,” says Balaji.
Beyond incident declaration and channel naming, this inconsistency affected other parts of their response flow, too.
Ad hoc response processes
Software Architect Kurt Andersen has worked alongside Balaji Narayanan on the Infrastructure team for over a year. During his tenure, he has seen firsthand how challenging it was to respond to incidents at Clari.
In particular, he noticed that the incident response process was ultimately tied to whoever was leading it.
“We had an incident command rotation with a primary, secondary, and third-level fallback. But it was, I'd say, very tribal knowledge and isolated as to what people knew and how they did things,” says Kurt.
“It was not just bespoke, but it was very much at the whim of the individual incident commander. They weren't necessarily being pulled in effectively or quickly.”
Challenges with context sharing
One of the most critical aspects of incident response is context sharing. When someone joins an incident channel midway through, they should be able to get up to speed quickly without disrupting the response. Unfortunately, this was one of the issues Kurt and Balaji were dealing with.
“There would be lots of ad hoc Zoom calls and very little documentation in the Slack channel,” says Kurt.
Some responders shared updates regularly, while others didn’t. So if a responder joined the channel halfway through, someone else had to take responsibility for bringing them up to speed.
In the end, this meant that context sharing wasn’t a reliable process baked into their workflows; it was highly unpredictable.
It’s time to make the switch
Taken together, these challenges made it clear to Balaji that it was only a matter of when, not if, they’d be searching for a dedicated incident response platform. “When I joined in March of last year, I was happy that Balaji was already convinced of the importance of this,” says Kurt.
So began their evaluation process for a tool to address all the pain points they were dealing with.
“We had specific requirements. The UX needed to be relatively intuitive. The more intuitive it was, the better it scored for the product. And we wanted support getting communications out to all the relevant parties that needed to be kept in the loop on what was going on with incidents,” says Kurt.
Here are some of their other requirements, drawn from an evaluation document Balaji and Kurt shared with us:
- Intuitiveness of incident declaration and coordination: It is easy to "do the right thing" and let the tool run automations so we can focus on the problem
- Incident visibility: As a manager or executive, it's easy to self-serve information, search live incidents, and request updates when necessary
- Data visualization and insights: We can capture relevant data points to learn from our incidents and improve/adapt over time
Better visibility, communication, an intuitive platform, and a partnership: life with incident.io
Now that a dedicated incident response platform is in the fold, the Clari team has been able to run better incidents from start to finish. From helping roll out their global incident management program to building a partnership with our product team, here’s how adopting incident.io has improved incident response at Clari.
Running a global incident management program
One of the most significant benefits of incident.io is that it gave Clari the tools and confidence it needed to launch a global incident management program, something that would have been difficult with its previous ad hoc workflows.
Thanks to the intuitiveness of the incident.io platform, they were able to effectively scale incident management responsibilities across several time zones.
“We wanted to roll out the same process to every team. Ultimately, you want to ensure a unified view if you have different products under your portfolio like we do. So I believe having a standardized tool and practice and removing some of the manual cognitive load helped.”
A single, unified way to declare incidents
Before adopting incident.io, incident response workflows varied from team to team, leading to process gaps. But since switching to the platform, they’ve been able to lean on a single process to roll out to incident responders across the globe.
“The biggest advantage we have right now is that there’s only one way to declare an incident. There’s only one way to track an incident and one way to track an incident evaluation. That in itself is a big win for us,” says Balaji.
Another benefit of this simplicity? Newcomers can run incidents without needing much prior context.
“We recently had a product manager fire up an incident for the first time. He had never done one before but knew enough to kick it off. And then I could come in and guide him and say, ‘Okay, great. You’re off to a good start,’” says Kurt.
Not just customers, but partners
Many companies have experienced vendors who practically disappear after onboarding. At incident.io, we focus on building partnerships with our customers well beyond the rollout phase, something the team at Clari has witnessed firsthand.
“The support team has been excellent. Herbert and George—when you were rolling out the AI assistant—were accommodating. I'm still impressed at the speed with which fixes are turned around,” says Balaji.
When Clari raises support requests for issues it’s experiencing, the team has noticed how quickly incident.io jumps to resolve them: not in weeks, but in days, if not hours.
“I think the biggest thing I see is when I comment in our Slack channel, it gets added to your product board. And then someone comes back and says, ‘Hey, by the way, this is fixed.’ And it’s not weeks or months for turnaround and fixes. It's days, which is super, super impressive.”
incident.io is for more than just incidents
Finally, beyond rolling the platform out globally for core incident response, Clari has also used incident.io in less traditional ways that require coordination across teams, namely product releases and maintenance events.
“We started as a global incident management program for customer-facing incidents. But we've also incorporated security incidents into it. I think the critical thing is that we have a blueprint to replicate for different problems,” says Balaji.
“There's one way you do your process, and then you can do the same for any other. So that's what I see as the critical value that incident.io brings.”