Responding to incidents both large and small is an essential workflow for every team, but not every team has invested in a fully-fledged incident management strategy.
In our recent webinar, Effective Incident Management: How to Improve DevOps Efficiency, Mattermost VP of Engineering Chris Overton and Customer Engineer Paul Rothrock share their experiences creating best-practice incident response workflows and demonstrate how real-time messaging enables DevOps teams to accelerate incident response times.
During the webinar, Chris shared his thoughts on the key components of effective incident management. Read on to learn about these elements of incident management and how your team can ensure that you’re well-prepared to handle any incident that comes your way.
1. A well-defined incident management process
It’s not uncommon for incident management to be completely ad hoc. Teams wait for something to happen and then take a “dive and catch” approach to incidents. Other teams have some processes in place, but those plans may be missing key components. In both cases, these teams struggle with slower time to resolution and often waste resources in the process of responding to incidents.
A well-defined, effective incident management process helps create faster, smoother responses by ensuring that every person on the team understands their role and where to access information. Putting that process in place before you need it ensures that your team can react swiftly and effectively.
2. Tooling that supports and accelerates incident response
“Most of our systems are complex enough that it’s nearly impossible to manage incidents manually these days,” Chris says during the webinar. He also notes that even for teams who have an incident process in place, poor tooling can hold them back:
“One of the problems we tend to have with tooling is that it tends to be quite scattered. We get different pieces of information that exist in different systems, and it’s really hard to have your tooling work for you versus having tooling be a place to go and get information.”
The right tools centralize and surface information efficiently and help keep every stakeholder on the same page. In the second half of the webinar, Paul shares how teams can use Mattermost’s incident management functionality to codify and streamline incident response and why the right tools can make a major impact on your team’s efficacy.
3. Internal and external communications plans
Having a communications plan for incidents might seem like it’s just for customers. But Chris stresses that both internal and external communication practices are an essential part of an effective incident management strategy.
“Probably the biggest problem for teams that struggle with incident management is visibility,” says Chris. “When you’re in the heat of the moment and you have an ongoing incident, it’s sometimes very hard, especially for people who didn’t start at the beginning of the incident. The very first thing that person will ask is, ‘What is the current state of things?’” Being able to provide every member of the team with the visibility they need to jump in and play their part quickly is key.
4. Clear incident process documentation
A well-documented incident management strategy is an effective one. Having everything written down ahead of time is essential for training new team members on what they need to know about incidents, for managing dry runs, and for keeping the team on the same page during an actual incident.
For an example of great incident response documentation, check out PagerDuty’s incident documentation, which covers everything from expectations for being on-call to what to do after an incident.
Watch the webinar to learn more about incident management best practices
To learn more about incident management best practices and to see what incident management looks like within Mattermost, watch Effective Incident Management: How to Improve DevOps Efficiency.