User story: How automating reporting helps a top 3 bank improve MTTR by 90%
Imagine being one of the world’s largest banking institutions and experiencing a critical severity security incident that affects millions of customers and billions of dollars. Now imagine it takes 20 minutes from the first notification to log a response and track down all relevant information from a variety of global systems just to understand the context and nature of the incident. At long last, the security team has all the info they can find and sends a secure email to a group of key stakeholders. Then the tough job of resolving the issue begins!
The challenges of coordinating incident response at enterprise scale
The IT organization at this bank is massive and global, with multiple incident response groups covering various products and regions. At the top is a command center that manages all the distributed teams and systems. During a critical incident, this group pulls team members in from other areas to join the response on live calls. Some of the key challenges they face include:
- Understanding the situation. It takes a lot of time to manually track down affected systems or monitoring tools, open links, specify report parameters, and estimate the impact, among other things.
- Documenting what’s being done. The team also needs to spend time logging the initial steps taken, agreeing on the severity of the breach, determining who is responsible for the next steps, and estimating the time to fix.
- Communicating with internal and external stakeholders. In addition to advising leadership on the progress of the response, the team needs to establish what is communicated to customers and the public. Who needs to know what? How can they be contacted?
This organization was working with a legacy chat system that supported only ephemeral group messaging, with no persistent chat history. As a result, every new team member joining a response effort had to be briefed manually, and these repetitive updates slowed the process considerably. Making matters worse, anything shared in chat could easily be lost unless someone took extra time to document and archive the discussion.
Using Mattermost to accelerate time to resolution
This bank made the decision to adopt Mattermost for secure and flexible messaging. In an instant, they gained:
- A self-hosted, secure server for communication with colleagues across the IT organization, around the world.
- Private channels populated with select members from across the organization to discuss an ongoing incident.
- Persistent messages that enable colleagues who join incident response channels later to learn what’s been done and get up to speed by reviewing the channel history.
- Channel headers, pinned messages, and file sharing to store and share key information.
- Smooth handoffs across time zones with in-channel standups to communicate what’s been done, what comes next, and any blockers.
With Mattermost as their messaging system, the bank team cut the time from first notification to an informed response by 90%, from 20 minutes to 2 minutes. A multi-channel workspace with dedicated channels for each group of stakeholders speeds up the response to each individual incident, and the saved, accessible history speeds up every incident that follows.
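To make the idea of an automated, persistent status update concrete, here is a minimal sketch of how an incident summary could be assembled for posting to a channel. It assumes the team uses a Mattermost incoming webhook, which accepts a JSON payload with `text` and `channel` fields; the story does not say which integration the bank actually used. The `Incident` class, its field names, and the channel name are hypothetical and only illustrate the shape of the data.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    title: str
    severity: str
    owner: str
    affected_systems: List[str] = field(default_factory=list)
    next_steps: List[str] = field(default_factory=list)


def build_summary_payload(incident: Incident, channel: str) -> str:
    """Render an incident summary as a JSON payload for an incoming webhook."""
    systems = ", ".join(incident.affected_systems) or "unknown"
    lines = [
        f"#### Incident: {incident.title}",
        f"**Severity:** {incident.severity}",
        f"**Owner:** {incident.owner}",
        f"**Affected systems:** {systems}",
        "**Next steps:**",
    ]
    # One Markdown bullet per next step, so the message reads as a checklist.
    lines += [f"- {step}" for step in incident.next_steps]
    return json.dumps({"channel": channel, "text": "\n".join(lines)})


if __name__ == "__main__":
    incident = Incident(
        title="Payment gateway outage",
        severity="critical",
        owner="command-center",
        affected_systems=["payments-api"],
        next_steps=["Confirm customer impact", "Notify stakeholders"],
    )
    # Assumed channel name; sending the payload would be an HTTP POST
    # to the webhook URL configured on the Mattermost server.
    print(build_summary_payload(incident, "incident-response"))
```

Posting a summary like this when an incident opens means that anyone who joins the channel later finds the severity, owner, and next steps in the history instead of waiting for a manual briefing.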
Secure, compliant data access across the organization
Even more critically, incident response in self-managed Mattermost meets custom compliance requirements for data control and history archiving. Not only does the team save time and money on the response itself, but they can also quickly pull additional reports, metrics, logs, and anything else needed for audit or archival purposes later on.
With that success under their belt, the team has also expanded their use of Mattermost to cover a wider variety of incidents and monitoring across their IT organization. The decreased time to respond stems from sharing relevant information in secure, persistent channels and staying in sync across distributed teams and incidents. As a result, the team can repeat its successes instead of repeating the same work over and over again.
Want to learn more about how teams use Mattermost to improve incident response workflows, increase productivity, and more? Read more user stories here.