PagerDuty Build It | Ship It | Own It (Thu, 22 Aug 2024 17:31:43 +0000)

Modernize your Operations Center and Build Operational Resilience with the Latest Features from PagerDuty
by Cristina Dias
Tue, 20 Aug 2024 13:00:50 +0000
https://www.pagerduty.com/blog/ops-center-modernization-latest-features-2024/

The post Modernize your Operations Center and Build Operational Resilience with the Latest Features from PagerDuty appeared first on PagerDuty.

Global IT disruptions and outages are becoming the new normal, testing the operational resilience of businesses everywhere. How well prepared your team is to handle major incidents determines how fast the business can return to normal. Operations Centers are relied on to manage these disruptions and ensure quick recovery. They’re the point of entry for incoming data that holds important signals of impending failure that impact customers, the business, and the bottom line.

When we talk to customers about their modernization initiatives for their operations centers, we hear common challenges. Many companies are currently incurring high costs for low-value work while introducing business risks. However, leading companies are using automation to manage chaos, drive innovation, and build the operational resilience required for modern digital businesses. It’s key to ensure that your operations center is using best-in-class capabilities—including AI and automation—to get ahead of issues, let machines serve as the first line of defense, and provide immediate context to the right teams.

Here are four new enhancements to the PagerDuty Operations Cloud that can help Operations Centers do just that.

Operations Console

Many organizations struggle with the increase in data and the disparate observability tools pumping in too much noise. With manual processes and eyes-on-glass methods to handle this information, operations center engineers experience alert fatigue, making them prone to missing key signals and incorrectly prioritizing issues. This puts the company at risk for loss of revenue and poor customer experiences.

However, with the right amount of visibility, Operations Centers can reduce alerts and optimize monitoring signals by correlating data from observability tools, telemetry data, and customer signals into one unified view. This can reduce operating costs, eliminate redundancy, and potentially help streamline tooling. It’s a win-win for the business and the subject matter experts. For instance, if an outage occurs, having a unified view can help teams quickly identify and resolve issues, minimizing the impact on customer experience.

The PagerDuty Operations Console helps teams create a customized live dashboard to triage and take action on issues immediately. Users can leverage configurable tabular and filter components to zero in on relevant information such as priority, severity, and more. This feature ensures that team members are working from a single source of truth in one centralized location. This reduces noise and allows you to mobilize a more focused, effective response when your operations teams are notified.

Operations Console Dashboard

The Operations Console is generally available to PagerDuty AIOps customers. Take the product tour.

Dynamic Escalation Policy Assignment and Dynamic Routing

Operations Centers need to run as efficiently as possible. And yet, too often resources and capacity are wasted attempting to resolve issues manually at the L1-L2 level when really they need to be routed or escalated immediately. When customer experience is on the line, there’s no room for error and wasted time comes at a high cost.

Operations Centers need to immediately know whether an issue can be resolved via automation or by L1-L2, or whether it needs to be sent to the right team or person. And, if the incident does need to be rerouted or escalated, teams cannot rely on manual processes. Using automation to accomplish this based on historical data and highly customizable rules allows teams to achieve faster resolutions, improve customer experience, and boost team morale.

With Dynamic Escalation Policy Assignment, organizations can centrally and automatically manage how Escalation Policies work during a variety of circumstances, scaling incident management best practices across teams. This reduces cost and customer impact. With Dynamic Routing, organizations can leverage historical data and dynamically configure routing rules to appropriately send problems to the right team at the right time every time. Managing these routing rules is easier than ever and can be controlled centrally for a more standardized approach.
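To make the idea of rule-based routing concrete, here is a minimal sketch in Python. It is not PagerDuty's implementation or configuration syntax: the event fields, team names, and the `route_event` helper are all hypothetical, and it only shows the general pattern of checking ordered rules and falling back to a default team.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical event shape: a flat dict of alert fields.
Event = dict[str, str]


@dataclass
class RoutingRule:
    name: str
    matches: Callable[[Event], bool]  # predicate evaluated against the event
    team: str                         # hypothetical team that should receive it


# Hypothetical rules, checked in order; the first match wins.
RULES = [
    RoutingRule("database alerts", lambda e: e.get("component") == "db", "database-oncall"),
    RoutingRule(
        "critical payments",
        lambda e: e.get("service") == "payments" and e.get("severity") == "critical",
        "payments-oncall",
    ),
]


def route_event(event: Event, rules: list[RoutingRule] = RULES,
                default_team: str = "ops-center-l1") -> str:
    """Return the team of the first matching rule, or the default team."""
    for rule in rules:
        if rule.matches(event):
            return rule.team
    return default_team
```

Keeping the rules in one ordered list mirrors the point above about managing routing centrally: changing where a class of problems goes means editing one rule rather than many per-team settings.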

Edit Event Rule

Dynamic Escalation Policy Assignment and Dynamic Routing are now generally available for AIOps customers.

Global Intelligent Alert Grouping

Alert storms are a common challenge in modern Operations Centers, leading to noise fatigue and delayed responses that significantly impact network performance and customer experience. Global Intelligent Alert Grouping groups alerts across services using both built-in machine learning models and customizable logic. It consolidates related alerts into fewer, more manageable incidents and improves mean time to resolution (MTTR) by helping responders quickly identify and act on the most critical issues.

NOC teams can consolidate multiple alerts into a single incident, minimizing the creation of redundant alerts and simplifying incident management, so they can focus on addressing real issues rather than getting overwhelmed by a flood of notifications. This is especially crucial during major incidents—like outages—as it allows teams to mobilize a focused and effective response. Deploying automation throughout your incident management process can expedite diagnostics and fixes in the aftermath of large-scale incidents, ensuring services are restored quickly and efficiently.

In addition to reducing alert noise, Global Intelligent Alert Grouping enhances the understanding of the incident scope. By grouping alerts across services, teams gain a clearer view of the incident’s impact, ensuring that the right teams are engaged and coordinating effectively. This leads to a more organized and efficient cross-functional response, ultimately improving operational reliability and customer satisfaction.

Alert Grouping

Teams can now customize their Intelligent Alert Grouping by selecting their preferred alert fields (up to 5 fields) for textual similarity analysis. Global Intelligent Alert Grouping and Intelligent Grouping with Advanced Options are in Early Access for AIOps customers only. Sign up here.
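For readers curious what "textual similarity on selected fields" can look like in practice, here is a deliberately simple sketch. It is an assumption-laden stand-in, not PagerDuty's machine learning model: it uses plain word overlap (Jaccard similarity) and a greedy grouping pass, and the field names, threshold, and helper names are hypothetical. Only the five-field limit comes from the text above.

```python
MAX_FIELDS = 5  # the text above states that up to 5 fields can be selected


def tokens(alert: dict, fields: list[str]) -> set[str]:
    """Lowercased word set drawn from the selected alert fields."""
    words: set[str] = set()
    for field in fields:
        words.update(str(alert.get(field, "")).lower().split())
    return words


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words two alerts have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_alerts(alerts: list[dict], fields: list[str],
                 threshold: float = 0.5) -> list[list[dict]]:
    """Greedy grouping: each alert joins the first group whose first alert is similar enough."""
    if len(fields) > MAX_FIELDS:
        raise ValueError(f"select at most {MAX_FIELDS} fields")
    groups: list[list[dict]] = []
    for alert in alerts:
        for group in groups:
            if jaccard(tokens(alert, fields), tokens(group[0], fields)) >= threshold:
                group.append(alert)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([alert])
    return groups
```

In this toy version, two "High CPU" alerts from different web hosts would land in one group while a "Disk full" alert starts its own, which is the noise-reduction effect described above in miniature.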

PagerDuty Advance

Operations Centers often struggle to identify and address the root causes of issues because of overwhelming data noise, which makes it hard to determine what’s important and how issues originated. Valuable time is wasted searching for information that AI could easily surface, creating bottlenecks in incident detection and diagnosis and making proactive responses difficult.

PagerDuty Advance modernizes operations, transforming the traditional, human-intensive model of NOCs into a streamlined process that moves from Event to Resolution with minimal toil and increased speed. Our AI assistance allows teams to ask questions to accelerate action, gather context, and receive proactive guidance directly from Slack during incidents, enabling faster triage and remediation. This in-depth contextual support throughout the incident lifecycle lightens the mental load on responders, allowing them to focus on higher-value activities while outsourcing drafting and knowledge-gathering tasks to AI.

PagerDuty customers leveraging PagerDuty Advance have experienced many benefits:

  • Reduced or eliminated the toil of information gathering and analysis during critical operations work.
  • Reduced the time and coordination needed to craft tailored communication updates to all stakeholders.
  • Reduced the time needed to create post-incident reviews and provide recommendations for future improvements.
  • Achieved a 360° view of customer impact, breaking organizational silos.
  • Gained immediate and relevant insights through a conversational UI, and more.

PagerDuty Advance

Learn more about Generative AI (GenAI) at PagerDuty.

Building Resilient Operations Centers

With these latest features, the PagerDuty Operations Cloud is providing customers with an even more robust solution for modernizing their Operations Centers. We’ve been supporting operations centers and positively impacting businesses by saving millions annually through resilient systems and tool consolidation, boosting productivity by reducing noise and manual toil, and mitigating risk by preventing incidents and reducing downtime costs.

And don’t forget to use every unplanned incident as a chance to learn. Although challenging, major incidents offer valuable insights into your process that can help prevent future disruptions. Investing in your incident management process helps reduce risks when major issues arise. While cost pressures are common, prevention is more cost-effective than dealing with incidents, so it’s key to build resilience and redundancy into your infrastructure. Always consider the long-term costs and risks before consolidating technology for short-term savings.

To further boost your operations center’s resilience, join our upcoming webinar on September 10, 2024, at 8 AM PT / 11 AM ET / 4 PM BST. Hear from PagerDuty’s Frank Emery and Frances Wang as they explore how AIOps can enhance your incident management and outage response. Register now to gain valuable insights and strategies for future-proofing your operations center.

If you’re looking to harness AI and automation in your organization to get more efficient and respond faster to incidents, try us out today for free.

Set Responders Up for Success with New User Onboarding
by Cristina Dias
Wed, 01 Nov 2023 12:00:26 +0000
https://www.pagerduty.com/blog/set-responders-up-for-success-with-new-user-onboarding/

The post Set Responders Up for Success with New User Onboarding appeared first on PagerDuty.

Effective incident response plays a critical role in maintaining smooth operations at organizations of all sizes. When built up correctly, operational resilience–that ability to bounce back quickly after failure–can act as a shield that guards your customer experience, ensuring that even when incidents inevitably happen, you’re back online in no time. But in order to stand up the strongest foundation you can to resolve faster and deliver world-class reliability–and to get the most out of a system of action like PagerDuty–you first need to be set up for success. That’s why we’re thrilled to introduce the general availability of two game-changing features. Meet the new PagerDuty onboarding flow and the User Onboarding Report: two partners to help you and your team smoothly onboard with PagerDuty.

New Onboarding Flow: Shortening the time to effective response

So you’ve signed up with PagerDuty. Great! We’re excited to have you aboard our best-in-class platform for managing incidents. It may be new to your team, however, and new tools and technology take time to learn before they work seamlessly in your processes. We want the PagerDuty platform to do everything it can to equip your team with the knowledge and skills to respond effectively to incidents right away. That’s why we’ve designed an updated onboarding flow to provide essential support, ensuring that every member of your team is not just competent but highly proficient in handling incidents.

Reducing the learning curve

For organizations just getting started with incident response, adopting the incident response process and methodology can be a bit of a puzzle. Who does what? How does it all work? Typically, administrators shoulder the responsibility of introducing these processes, which can be a daunting task without proper onboarding guidance. Consequently, new users may struggle to grasp their roles, leading to delays in resolving incidents. The new onboarding flow is the key to solving this puzzle. We’ve simplified the onboarding process by allowing responders to customize their notification preferences upfront.

The onboarding flow also surfaces the PagerDuty Mobile App right up front so responders can download it and start acknowledging and resolving incidents anywhere, anytime, right from their mobile devices.

We’re also providing them with easily digestible video training on key incident response concepts. It acts like a crash course in becoming a more effective responder so they can start working towards reducing Mean Time to Acknowledge (MTTA) and Mean Time to Resolve (MTTR), faster.

Welcoming you to PagerDuty

1. Welcome Email: Our comprehensive onboarding experience begins with a warm welcome. Expect to receive a personalized email invitation to join PagerDuty.

Screenshot of the welcome email to new users, prompting them to join PagerDuty.

2. Phone Number Setup: We care about your convenience and security. As your next step, you’ll be prompted to provide your phone number and, for good measure, receive a test notification.

Screenshot of the phone number setup.

3. Customize Your Notifications: PagerDuty understands that urgency varies. You can set up your timezone and notification preferences for high-priority incidents. Choose between push notifications, SMS, or a direct call as your first contact method to ensure you’re always in the loop.

Screenshot of the notifications preferences setup.

4. Mobile App Integration: If you haven’t already added your phone number in step two, don’t worry. We’ll gently remind you to do so at this point. Plus, you can download the PagerDuty Mobile App to stay connected on the go.

Screenshot of the mobile app integration.

5. Learn the Basics: To get you up to speed, we’ll prompt you to explore the fundamentals of PagerDuty in under 5 minutes. This includes resources like “Incident Response 101” and a self-paced tour on “Schedule Basics.”

Screenshot of the Learn the Basics step.

Just for Admins: User Onboarding Report

In many organizations, effective utilization and license planning in PagerDuty can be a delicate balancing act. You don’t want to overcommit, but you also don’t want to be caught off guard. Enter the User Onboarding Report.

With the User Onboarding Report, Account Owners and Admins can say goodbye to license management hassles, get real-time insights, and boost efficiency!

Screenshot of the User Onboarding Report.

What does it do?

As an admin, the User Onboarding Report is your secret weapon. You get a crystal-clear view of PagerDuty usage. And, with this knowledge in hand, you can make informed decisions to optimize resource allocation and ensure that incident management runs smoothly and efficiently.

For example, an admin could filter for users who are using licenses but aren’t active as responders to find everyone who isn’t yet assigned to an Escalation Policy. That can then be your hit list to see if they still need that seat, or if they need support in getting onboarded properly and using the platform to its fullest potential.

As another example, an admin could see all users who have been invited to join the platform but have not yet accepted their invitation. You can then use that to facilitate conversations with managers to curate the best path forward for utilization.

Getting the most out of your investment in PagerDuty

With the new onboarding flow and the User Onboarding Report, PagerDuty is putting the power of seamless incident response and resource optimization in your hands.

So, are you ready to take your incident response game to the next level? Dive into these features, empower your team, and watch as your incident management becomes smoother, faster, and more efficient than ever before. PagerDuty has your back, and together, we’ll keep your organization safe and sound, no matter what challenges come your way.

Demo: Improving the onboarding experience for you

To see these great new features in action, Senior Product Manager for Growth, Alex Quintana, deep dives into them and teases a cool upcoming feature in this video.

Get in the game today

Learn more about the User Onboarding Report in this Knowledge Base article. Joining and setting up accounts in PagerDuty just became easier, so why not give us a go for 14 days free of charge?

6 Best Practices for Seamless Notifications with International SMS
by Cristina Dias
Tue, 05 Sep 2023 12:00:43 +0000
https://www.pagerduty.com/blog/6-best-practices-for-seamless-notifications-with-international-sms/

The post 6 Best Practices for Seamless Notifications with International SMS appeared first on PagerDuty.

There’s no denying it: in today’s interconnected world, Application-to-Person (A2P) SMS notifications have become an integral part of our daily lives. Whether it’s receiving crucial banking alerts, getting updates from our favorite retailers, or even surfacing a notification from PagerDuty when your service is down–SMS keeps us informed and connected. But have you ever wondered about the intricacies behind this seemingly straightforward technology? It’s more complex than you might think.

Here at PagerDuty, we are dedicated to making sure that notifications reach their intended recipients and we’ve learned a lot about international SMS best practices along the way. In this blog, we delve into international SMS best practices and the critical factors that can make or break your SMS notifications game so that you know how to optimally configure your notifications to never miss a page from PagerDuty.

Navigating the Challenges of International SMS

Who better to share learnings than the woman behind the curtain, responsible for our notifications experience? We interviewed Abby Allen, Senior Product Manager of the Notifications Experience Team, to help shed light on the challenges of international A2P SMS and offer some insights for improving SMS deliverability. 

The Illusion of Reliability

Contrary to popular belief, Abby warns that “SMS is far less reliable than people think!” A2P SMS often faces disruptions due to network outages or planned maintenance, affecting message delivery. This unreliability is hidden behind the seamless communication we experience in person-to-person SMS, which is very opaque by design and helps prevent recipients from noticing issues with the underlying carrier networks. Imagine your business-critical alert going undelivered due to an SMS outage–the consequences could be detrimental. Therefore, PagerDuty “monitors our international deliverability and encourages everyone to have a backup channel to reach their users. If your system also offers SMS, make sure there’s a backup communication option for your users, too.”

Global Audience, Unique Needs

Expanding your SMS strategy beyond local borders is an opportunity to tap into a vast international market. Nonetheless, each region has its own regulations, carrier restrictions and user preferences.

Abby highlights some particularly challenging regulations. For instance, nearly every country requires opt-in to confirm a recipient actually wants SMS from you. An easy way to do that is to ask the recipient to “Reply Yes” or click a link to confirm. However…

  • France, Vietnam, and many others require all A2P SMS to be sent from an alphanumeric sender ID. This kind of SMS sender ID is required in a growing number of countries but doesn’t allow replies at all. Your “Reply Yes” won’t work for millions of international users.
  • Norway and other countries will actively block any URL from standard link shorteners like bit.ly. 
  • China and Romania block all SMS with any URL. This makes a workflow with a link to click to confirm opt-in challenging.
  • Singapore: to send SMS to Singapore, you must register your content templates with the Singapore government and pay substantial fees. Any content that doesn’t comply with your registered templates is subject to blocking and filtering. Even if you have a great workflow that doesn’t require replies or links, you still need to get it approved by a government entity. 
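One way to keep restrictions like these from silently breaking an opt-in workflow is to encode them as data and pick the confirmation method per country. The sketch below does that for the countries named above only. It is an illustration under stated assumptions: the `SmsConstraints` fields, the method labels, and the permissive default for unlisted countries are all hypothetical, and real regulations change often and cover many more countries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmsConstraints:
    replies_allowed: bool = True         # False where an alphanumeric sender ID is required
    urls_allowed: bool = True            # False where any URL in the message is blocked
    shorteners_allowed: bool = True      # False where standard link shorteners are blocked
    template_registration: bool = False  # True where content templates must be registered


# Only the countries named in the list above, keyed by ISO country code.
CONSTRAINTS = {
    "FR": SmsConstraints(replies_allowed=False),
    "VN": SmsConstraints(replies_allowed=False),
    "NO": SmsConstraints(shorteners_allowed=False),
    "CN": SmsConstraints(urls_allowed=False),
    "RO": SmsConstraints(urls_allowed=False),
    "SG": SmsConstraints(template_registration=True),
}


def allowed_opt_in_methods(country: str) -> list[str]:
    """List the opt-in confirmation methods that are not ruled out for a country.

    Unlisted countries fall back to permissive defaults, which is an
    assumption made for this sketch and not legal guidance.
    """
    c = CONSTRAINTS.get(country, SmsConstraints())
    methods = []
    if c.replies_allowed:
        methods.append("reply_yes")
    if c.urls_allowed:
        methods.append("link" if c.shorteners_allowed else "link_without_shortener")
    return methods
```

With this table, France allows only a link-based confirmation, China and Romania allow only a reply, and Singapore still needs its `template_registration` flag handled before anything is sent.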

Neglecting the needs of your international audience can lead to missed opportunities and a less effective communication strategy.

Legal Compliance Matters

International SMS law is a complex web of regulations that can have financial repercussions for non-compliance. Sending unauthorized messages, violating content restrictions, or spamming recipients can result in penalties. Familiarize yourself with the regulations in your target regions to ensure your SMS campaigns are both effective and legally sound.

6 Best Practices for Optimizing International SMS and Notifications Settings

Understanding the challenges is only half the battle. Let’s explore actionable strategies to ensure your SMS notifications hit the mark–regardless of borders.

1. Default to Push, Keep SMS as Backup

Staying proactive is key. Regularly monitor the deliverability of your international SMS to identify potential issues. However, relying solely on SMS is risky. Consider SMS as a supplementary channel rather than your primary one. At the time of writing, push notifications are subject to far less international regulation and provide significantly more opportunities for engaging, interactive messaging. Abby recommends push notifications as one of the most reliable alternatives to ensure your message reaches its destination, along with email and phone calls.

If you’re a PagerDuty customer, why don’t you give the PagerDuty mobile app a go? This critical component of the PagerDuty Operations Cloud empowers our users with unmatched convenience, agility, and adaptability, ensuring a flawless orchestration of incident management and collaboration across borders. Better yet, push notifications within the app typically deliver 4 to 6 times faster than SMS.

2. Give Other Messaging Platforms a Go

Instead of sticking to traditional SMS, consider exploring other messaging platforms, like Slack, Microsoft Teams or WhatsApp. These platforms provide an international experience with fewer regulatory hurdles, making it easier to reach your global audience seamlessly.

PagerDuty is opening Early Access for Slack as an incident contact method to customers the week of September 20. If you’re interested in participating in the program, you can sign up here. Starting incident response in chat enables customers to immediately reap the benefits of improved MTTA and MTTR. The use of chat will directly tie collaboration to incident management, minimize context switching, and automate manual tasks thus enabling faster incident remediation.

3. Tailor Your SMS Content to Industry Regulations

Before hitting send, ensure your SMS content adheres to the regulations of the target region. Certain industries, such as banking, finance and adult content, face stringent content restrictions to protect citizens against spam. If you manage an application that supports any of these industries, involve your legal team to avoid violations that could lead to hefty fines and damage your brand’s reputation.

4. Plan for Change

SMS regulations are a moving target. What works today may not work tomorrow due to shifting requirements. Prepare for change by having backup communication channels in place and being adaptable to evolving regulations.

5. Optimize for User Experience

Put yourself in your users’ shoes. Craft SMS messages that are concise, relevant, and valuable. This user-centric approach enhances engagement and minimizes the chances of your messages being marked as spam.

6. Monitor Success at the Country Level

When conducting international SMS deliveries, it’s advisable not to depend solely on a single global SMS deliverability metric to validate the effectiveness of your campaigns. Instead, consider delving into insights at the country-specific level. This approach ensures that the achievements of larger and established markets with successful SMS campaigns don’t overshadow potential deliverability hurdles in your emerging ones.
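A small worked example shows why a single global number can mislead. The sketch below computes the overall delivery rate alongside per-country rates from a list of send results. The input shape and the `delivery_rates` helper are hypothetical, and any figures you feed it here are made up for illustration, not PagerDuty data.

```python
from collections import defaultdict


def delivery_rates(messages):
    """Compute overall and per-country delivery rates.

    `messages` is an iterable of (country, delivered) pairs, where
    `delivered` is True when the SMS was confirmed as delivered.
    """
    sent = defaultdict(int)
    delivered = defaultdict(int)
    for country, ok in messages:
        sent[country] += 1
        delivered[country] += int(ok)
    if not sent:
        return 0.0, {}
    per_country = {c: delivered[c] / sent[c] for c in sent}
    overall = sum(delivered.values()) / sum(sent.values())
    return overall, per_country
```

For instance, if 950 of 1,000 messages are delivered in a large market and only 20 of 50 in an emerging one, the global rate is about 92% while the emerging market sits at 40%, a gap the global metric alone would hide.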

As technology bridges global divides, international SMS remains a potent tool for businesses to connect with audiences worldwide. However, the path to SMS excellence is paved with challenges that demand proactive measures. By embracing best practices such as diverse messaging platforms, regulatory compliance and robust monitoring, you can unlock the true potential of international SMS and deliver unparalleled user experiences.

Ready to Dive Deeper?

Abby, along with PagerDuty colleagues Girish Shankarraman and Vivek Raj Saxena, joined Mandi Walls for a “How To Happy Hour” Twitch stream to share these learnings and more. Catch the recording to learn from their expertise and enhance your SMS strategies. Join in the discussion in the comments section.

If you’re interested in trying the PagerDuty mobile application, you can simply scan or click the QR Codes below and download it today.

 

What’s New: Enhanced PagerDuty Analytics for Faster Insights and Smarter Recommendations
by Cristina Dias
Thu, 31 Aug 2023 12:00:18 +0000
https://www.pagerduty.com/blog/whats-new-enhanced-pagerduty-analytics-for-faster-insights-and-smarter-recommendations/

The post What’s New: Enhanced PagerDuty Analytics for Faster Insights and Smarter Recommendations appeared first on PagerDuty.

Data has become the lifeblood of businesses, empowering organizations to make more informed decisions, drive innovation, and gain a competitive edge. McKinsey touts the benefits of adopting data-supported capabilities, referring to the various ways data is utilized to enable and enhance the functioning of an organization. These capabilities enable faster and more powerful insights, leading to “better decision making as well as automating basic day-to-day activities and regularly occurring decisions.”

But there’s another side to data—it’s everywhere. And when data is dispersed across various tools and silos, that poses enormous challenges for incident response teams looking to efficiently handle unplanned incidents. This leads to delays in resolving real-time incidents that can lead to lost revenue and eroded customer trust. Periodic data consolidation to a business intelligence tool further exacerbates the issue by providing stale information.

At PagerDuty, we seek to empower our customers to make data-driven decisions in their journey toward a resilient operational foundation that keeps costly downtime to a minimum. You can only improve what you measure. For teams looking to shift their approach to digital operations towards a more proactive and preventative state, access to the right actionable data is paramount. That’s why with the PagerDuty Operations Cloud℠, we are opening access to a comprehensive suite of analytics capabilities specifically crafted to address the intricacies of modern digital operations.

So far, we’ve witnessed that users who interacted with PagerDuty Analytics observed a 26% enhancement in their mean time to acknowledge (MTTA). This improvement was accompanied by a more balanced allocation of tasks and consistent response times, resulting in a total annual time-saving of 100 hours (as per PagerDuty internal calculation based on product metrics).

Our latest release of PagerDuty Analytics introduces dynamic new reporting with simplified data filtering, enabling responders to achieve faster incident resolution with more intelligent insights. We’re excited to announce the general availability of a comprehensive list of analytics capabilities, including:

  1. Insights Reports
  2. Analytics API
  3. Recommendations Report
  4. Operational Reviews

These new releases are now accessible to all paying PagerDuty customers. With PagerDuty Analytics, you can harness the true potential of data-driven decision-making in your efforts to drive operational excellence, optimize customer experiences, and build resilience.

Gain visibility and control with PagerDuty Analytics

We’ve invested significantly in providing granular visibility and control in our newest reports. The new Insights Reports offer expanded metrics, providing extensive coverage of the incident response process and actionable outcomes to make better decisions. With enhanced customization and functionality, you gain access to interactive visualizations, drill-down capabilities, and detailed data on responder efforts. Additionally, copy, paste, and download options for incident data are available for sharing across teams. Users can easily compare current performance to past incidents, enabling better decision-making. Priority and urgency filters make it possible to focus on critical incidents.

With the Analytics API, you can now access incident response data directly in your preferred analytics platform, which includes detailed Responder data.

Getting to know the Insights Reports and how to use them

We are democratizing critical data and helping teams go beyond just responding to incidents to being able to easily measure the impact of incidents on their Response Teams and business. The Insights Reports give you valuable insights that help you answer key operational questions you may have about your digital operations, which may include:

  • How many sleep/off-hour interruptions are my Responders receiving?
  • What are my most impacted Services?
  • Is our Team meeting our Incident Management SLA?
  • How quickly are Responders acknowledging incidents?
  • How much time are Responders spending on call resolving incidents?
  • Are we improving over time?

Incident Activity Report

Screenshot of the Incident Activity Report

The Incident Activity Report provides an overview of specific incidents and allows users to dive into instances they want to investigate further. It includes summary metrics such as the number of incidents and response effort over time. Users can apply various filters, edit columns, and visualize incident volume to understand and analyze their incident management process effectively.

Service Performance Report

Screenshot of the Service Performance Report

The Service Performance Report focuses on the health and performance of different services. It provides metrics related to MTTR (Mean Time to Resolve), MTTA (Mean Time to Acknowledge), interruptions, notifications, and the total number of responders for each service. The report enables users to evaluate service health, SLA adherence, and impact on the team’s workload.
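If you want a feel for how MTTA and MTTR are derived, the sketch below computes both from incident timestamps. It assumes the common definitions (average time from incident creation to acknowledgement, and from creation to resolution); the report's exact calculation may differ, and the field names and sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


def mean_minutes(incidents: list[dict], end_key: str) -> Optional[float]:
    """Average minutes from `created_at` to `end_key`, skipping incidents that lack it."""
    durations = [
        (inc[end_key] - inc["created_at"]).total_seconds() / 60
        for inc in incidents
        if inc.get(end_key) is not None
    ]
    return mean(durations) if durations else None


# Made-up sample data: two incidents created at the same time.
t0 = datetime(2024, 8, 20, 13, 0)
incidents = [
    {"created_at": t0,
     "acknowledged_at": t0 + timedelta(minutes=2),
     "resolved_at": t0 + timedelta(minutes=30)},
    {"created_at": t0,
     "acknowledged_at": t0 + timedelta(minutes=4),
     "resolved_at": t0 + timedelta(minutes=50)},
]

mtta = mean_minutes(incidents, "acknowledged_at")  # (2 + 4) / 2 = 3.0 minutes
mttr = mean_minutes(incidents, "resolved_at")      # (30 + 50) / 2 = 40.0 minutes
```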

Responder Report

Screenshot of the Responder Report

The Responder Report is designed to help make better decisions regarding team management and responder health. It provides information on how many sleep-hour or off-hour interruptions responders are receiving. Users can identify responders who are frequently being interrupted during off-hours and adjust their schedules accordingly. The report allows users to analyze incidents assigned to each responder and their resolution times. With these valuable insights, team leaders can foster a healthier and more efficient work environment for their responders, increase employee satisfaction, and avoid employee burnout.
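As a rough sketch of the kind of tally behind an off-hour interruptions metric, the code below counts notifications per responder that fall outside working hours. The definition of off-hours used here (weekends, or before 08:00 or from 18:00 onward in the responder's local time) is an assumption for illustration, as are the input shape and helper names; the report's own definitions may differ.

```python
from collections import Counter
from datetime import datetime


def is_off_hours(ts: datetime) -> bool:
    """Assumed definition: weekends, or before 08:00 / from 18:00 local time."""
    return ts.weekday() >= 5 or ts.hour < 8 or ts.hour >= 18


def off_hour_interruptions(notifications) -> Counter:
    """Count off-hour notifications per responder.

    `notifications` is an iterable of (responder, local_timestamp) pairs.
    """
    return Counter(responder for responder, ts in notifications if is_off_hours(ts))
```

Sorting the resulting counter with `most_common()` gives a quick view of who is being interrupted most often, which is the signal a team lead would use to rebalance a schedule.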

Team Report

Screenshot of the Team Report

The Team Report offers insights into the performance and workload of different teams. Users can compare teams based on incident volume and make informed decisions regarding team assignments and scheduling. The report assists in ensuring equal distribution of incidents across teams and provides a comprehensive understanding of team performance and the effect of incidents on their operations.

Escalation Policy Report

Screenshot of the Escalation Policy Report

The Escalation Policy Report provides metrics related to escalation policies, helping you verify that they are set up correctly and efficiently. It offers a deeper understanding of how incidents are handled, so users can make informed decisions about their escalation policies and schedule changes, analyze how effective those policies are, and optimize their incident response strategies.

Other additions to the analytics suite:

Analytics API

With the Analytics API, customers can access incident response data directly in their preferred analytics platform. The new API is available to all plans and now includes data from all the Insights Reports and detailed responder data.

If you already use an API to collect incident information from PagerDuty, you can continue to do so with the Analytics API.
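For readers who want a starting point, here is a minimal sketch that builds (but does not send) a request for aggregated incident metrics. The endpoint path, header format, and filter names reflect our understanding of the public REST API reference and should be treated as assumptions to verify there; the API key and date range are placeholders.

```python
import json
import urllib.request

# Assumption: endpoint path as described in the public REST API reference; verify before use.
ANALYTICS_URL = "https://api.pagerduty.com/analytics/metrics/incidents/all"


def build_analytics_request(api_key: str, start: str, end: str) -> urllib.request.Request:
    """Build a POST request for aggregated incident metrics without sending it."""
    payload = {
        # Assumed filter names: restrict to incidents created within [start, end).
        "filters": {"created_at_start": start, "created_at_end": end},
        # Assumed option: group the results by week.
        "aggregate_unit": "week",
    }
    return urllib.request.Request(
        ANALYTICS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token token={api_key}",
            "Accept": "application/vnd.pagerduty+json;version=2",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_analytics_request(
    "YOUR_API_KEY", "2024-01-01T00:00:00Z", "2024-02-01T00:00:00Z"
)
# To send it: urllib.request.urlopen(request), then json.load() on the response.
```

Building the request separately from sending it makes it easy to inspect the payload before pointing it at a real account.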

Recommendations Report

Screenshot of the Recommendations Report

The Recommendations Report is now available to Professional, Business, and Digital Operations plans. This report shows services that could benefit from AIOps’ Intelligent Alert Grouping for noise reduction. This feature can help teams reduce the number of alerts they receive, leaving more time for resolving incidents.

User Onboarding Report

Organizations looking to streamline their incident management processes need to know how the platform is actually being used. The User Onboarding Report gives leadership and administrators a comprehensive overview of PagerDuty’s utilization, so they can make well-informed decisions about resource allocation while maintaining high standards of service delivery.

Operational Reviews

The Operational Reviews are now also available to Business plans. The Operational Reviews feature offers metrics for different types of reviews. Each review type includes scorecards intended to help facilitate operational review meetings:

  • Team On-Call Handoff Reviews for weekly reviews
  • Service Performance Reviews for monthly reviews
  • Business Performance Reviews for quarterly reviews

Demo: Improving the PagerDuty Analytics experience for you

To see all the great capabilities of PagerDuty Analytics in action, watch this video, in which Anojan Gunasekaran, Senior Product Manager for Analytics, demos newly launched and upcoming features.

Get started today

PagerDuty Analytics empowers teams with comprehensive insights on incident response performance, enabling quick identification of top-priority incidents and understanding the number of responders involved. With key metrics like MTTA and MTTR, teams can make informed decisions, design guardrails, and achieve faster resolutions, ultimately improving customer satisfaction and reducing employee burnout.

Learn more about the Insights Reports in this Knowledge Base article. To leverage the power of Analytics and improve your incident response process, try the PagerDuty 14-day free trial.

The post What’s New: Enhanced PagerDuty Analytics for Faster Insights and Smarter Recommendations appeared first on PagerDuty.

10 Years of Failure Friday at PagerDuty: Fostering Resilience, Learning and Reliability by Cristina Dias https://www.pagerduty.com/blog/10-years-of-failure-friday-at-pagerduty-fostering-resilience-learning-and-reliability/ Tue, 25 Jul 2023 12:00:28 +0000 https://www.pagerduty.com/?p=83351 In today’s fast-paced and ever-evolving world of technology, failure is inevitable. Organizations should embrace failure as a learning opportunity for how to build and deliver...

The post 10 Years of Failure Friday at PagerDuty: Fostering Resilience, Learning and Reliability appeared first on PagerDuty.

In today’s fast-paced and ever-evolving world of technology, failure is inevitable. Organizations should embrace failure as a learning opportunity for how to build and deliver more resilient services. At PagerDuty, we’ve practiced Failure Friday for 10 years now. Failure Friday–a practice inspired by the chaos engineering space–involves intentionally injecting failures into our systems to improve reliability and foster a proactive engineering culture.

We’ve interviewed Stevenson Jean-Pierre (SJP), a Senior Engineering Manager, and Mandi Walls, a DevOps Advocate, to help us understand the Failure Friday practice at PagerDuty and how the practice is adopted across the industry.

The origins and evolution of Failure Friday

On June 28, we marked ten years of Failure Friday at PagerDuty. What started out as a weekly practice quickly evolved beyond that: as the teams involved continued to learn and grow, Failure Friday changed with them over the decade.

SJP manages database reliability and infrastructure teams. For Failure Friday process leaders like him, this initiative serves multiple purposes. It deepens engineers’ understanding of systems, allows teams to get creative while testing failure scenarios, and enables controlled production changes with key stakeholders involved. As SJP explains, “Instead of waiting for things to fail in natural ways in our environments, we produce failure scenarios in our own infrastructure to better understand how they work.”

While the core concept of Failure Friday remains consistent, SJP and his teams have evolved their approach over time. Initially, automated failure testing–randomly disrupting parts of the infrastructure or stopping services–was common. Now, the focus has shifted towards intentional failure testing, targeting areas like load and performance. This deliberate shift allows teams to gain actionable insights into specific failure modes and bottlenecks, and to optimize their systems more effectively.

Additionally, Failure Friday is no longer limited to a specific day of the week and occurs based on the team’s needs. SJP emphasizes that failure can happen any day, and embraces this concept beyond Fridays, calling it “Failure Any Day.”

So how does it work?

The Failure Friday process at PagerDuty involves planning and executing failure scenarios, with specific objectives and hypotheses. SJP and his team identify the system to be tested, define the failure scenario, and select the stakeholders involved. An Incident Commander (IC) leads the process, ensuring it mirrors the response during a real incident. The process is documented and analyzed, allowing stakeholders to provide their perspectives. A postmortem is conducted to identify areas for improvement, with its scope depending on the scenario being tested. If there are enough surprises or lessons learned to justify more detail and rigor, the team gives the postmortem a more formal treatment or invites a wider audience. Otherwise, it is conducted by the team that owned the Failure Friday.

For SJP, one key takeaway from Failure Friday is the realization of how complex digital infrastructure has become with all of its interconnected data and systems. Emergent behaviors can surprise even experienced engineers. By inducing failures, his teams gain a more holistic understanding of the system and uncover hidden dependencies. This knowledge equips SJP’s teams to better handle real-world incidents and prevent customer impact. It empowers them to design more robust software and ultimately enhance system reliability. 

SJP mentioned that Failure Friday also builds trust within the organization. The engineering teams have successfully built a culture of reliability, where a proactive approach to failure is embraced and valued.

Fostering a culture of innovation and learning in DevOps with Failure Friday

Failure Friday is a practice that has gained popularity in the DevOps community. It’s used as a means to foster innovation, enhance system reliability, and improve the customer experience. Mandi shed light on the significance of Failure Friday in the DevOps sphere and its impact on organizational culture.

For Mandi, the clear focus of Failure Friday is the customer experience. She emphasized the need for graceful error handling, clear communication to users, and minimizing disruptive errors that result in poor customer experiences. Simple actions can make a significant difference: “Do you have more graceful handling of certain errors? Do you pop up a nice message to the user? Or do you just report a 503 error? That’s not a very good customer experience.”
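To picture the difference Mandi describes, here is a minimal sketch of graceful degradation, assuming a page that shows optional recommendations fetched from a downstream service. Every name in it (DependencyUnavailable, render_recommendations, the backend callables) is hypothetical and exists only for this example.

```python
class DependencyUnavailable(Exception):
    """Raised by the hypothetical downstream call when the service cannot be reached."""


def render_recommendations(user_id, backend):
    """Return page data, degrading gracefully if the optional backend fails."""
    try:
        items = backend(user_id)
    except DependencyUnavailable:
        # Instead of surfacing a bare 503, keep the page usable and explain what happened.
        return {
            "items": [],
            "message": "Recommendations are temporarily unavailable. Everything else works as usual.",
        }
    return {"items": items, "message": ""}


def healthy_backend(user_id):
    return ["item-1", "item-2"]


def failing_backend(user_id):
    # Stands in for a failure injected during a Failure Friday exercise.
    raise DependencyUnavailable("recommendation service is down")


print(render_recommendations("u42", healthy_backend))
print(render_recommendations("u42", failing_backend))
```

Injecting the failure on purpose is what reveals whether users would see the friendly message or a raw error.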

According to Mandi, implementing Failure Friday or similar practices may come with benefits and challenges for organizations. These practices foster collaboration among various stakeholders, including engineers, product managers and business owners. Integrating Failure Friday into DevOps processes promotes better alignment and understanding of failure impacts. Additionally, Failure Friday contributes to developing a positive organizational culture by creating a low-stakes environment for open discussions and learning. This encourages psychological safety and a blame-free atmosphere, facilitating honest conversations and a proactive approach to system resilience.

However, challenges may arise in introducing intentional errors into production systems due to stability concerns and limited testing capabilities. Tools and services can help mitigate these concerns, making testing in production more secure and accessible. Fostering collaboration between stakeholders requires effective communication and coordination, often necessitating adjustments to existing workflows and structures. Moreover, embracing Failure Friday and cultivating a blame-free environment may require a shift in organizational culture, which can be challenging but essential for the success of such initiatives.

Ultimately, for Mandi, Failure Friday positively impacts team collaboration and communication within an organization. The practice encourages teams to engage in honest discussions, enhances trust, and fosters a proactive approach to system resilience and customer satisfaction. At the end of the day, investing in building resilience will pay off in better digital experiences for your customers.

How to start Failure Friday in your organization

For those interested in experimenting with Failure Friday, SJP suggests starting small and gradually scaling up. Mandi’s top suggestion is for organizations to prioritize building psychological safety and creating blame-free environments. As she says, “Failure Friday is not just a practice; it’s an opportunity to foster a culture of collaboration and resilience.”

SJP strongly believes that “other teams and organizations can benefit from adopting a similar approach.” Recognizing that failures are inevitable, he emphasizes the value of understanding systems comprehensively and adopting a failure-mode mindset. Whether running full Failure Friday exercises or starting with tabletop exercises, organizations can enhance their engineering practices and cultivate a culture of resilience.

If you’re eager to learn more and join the discussion…

Don’t miss out on a chance to learn from industry experts and discover how Failure Friday can revolutionize your organization and DevOps practices. Watch the recorded Twitch stream and be part of the dialogue in the comments. Get ready to embrace failure as a pathway to success in the world of technology!

What’s New in PagerDuty iOS and Android Mobile Applications by Cristina Dias https://www.pagerduty.com/blog/whats-new-in-pagerduty-ios-and-android-mobile-applications/ Fri, 21 Jul 2023 12:00:08 +0000 https://www.pagerduty.com/?p=83288 The PagerDuty Operations Cloud is your platform for action in critical moments. By harnessing the capabilities of AI and automation, it has the ability to...

The post What’s New in PagerDuty iOS and Android Mobile Applications appeared first on PagerDuty.

The PagerDuty Operations Cloud is your platform for action in critical moments. By harnessing the capabilities of AI and automation, it has the ability to detect and diagnose disruptive incidents, assemble the appropriate team members for prompt response, and optimize your digital operations by streamlining infrastructure and workflows.

Meeting users where they work is a key part of the PagerDuty experience. Whether it’s desktop, ChatOps, API, or mobile, we invest heavily in making it as easy as possible to access the information you need, when you need it. Even as we’ve evolved our product offerings over the years, the PagerDuty mobile application has been a critical component of the platform. Our goal is to provide our users with unparalleled convenience, responsiveness, and flexibility to ensure seamless incident management and collaboration across your organization. We’ve made significant investments in iOS and Android to help your teams resolve critical work from anywhere, anytime.

In this blog post, we’ll cover some of the key improvements we’ve made to the app in the past year to enhance your mobile experience. These include a new and modernized home page, incident workflows and custom fields for mobile, maintenance windows, and the latest OS requirements for Android and iOS.

Experience modernized navigation in your PagerDuty App

We’ve modernized the navigation experience for our mobile app to make user experiences faster and more efficient. Now you can navigate more easily between screens without having to return to the hidden hamburger menu. And, you can relaunch screens! This means you can move from one screen to another quickly and without any extra steps. Your user history will be preserved even while simultaneously navigating multiple screens. Make sure to read our Knowledge Base article about this update.

Screenshot of the modernized navigation on the PagerDuty mobile app

Trigger Incident Workflows – even on the go

Incident Workflows are now available on mobile! You can trigger a preconfigured Incident Workflow with just one click, directly from your incident detail screen. That means that if you always run through the same five actions every time you have a P1, such as creating an incident-specific Slack channel, starting up a conference bridge, or adding responders, all those steps can run automatically and immediately from a manual or conditional trigger. Automating these manual steps gives valuable time back to responders so they can jump right in and start resolving the incident. Learn more about triggering Incident Workflows from your mobile device in the Knowledge Base or from our blog post.

Screenshot of the incident workflows feature on the PagerDuty mobile app

Easily create, update, and delete Maintenance Windows

Maintenance Windows are now available in the PagerDuty mobile app, enabling responders to temporarily disable services and their integrations during maintenance. This reduces unnecessary interruptions when the team needs to focus. Read this blog on how to create and manage Maintenance Windows on mobile. Learn more about this feature by reading this Knowledge Base article.

Access Custom Fields on Incidents on your phone

We’ve introduced more flexibility by adding Custom Fields to incidents. Custom Fields allow teams to pull in important incident data from any system of record and provide responders with additional contextual information. This enables teams to triage and resolve incidents faster.

You can now configure, view and edit Custom Fields through both our web UI and APIs. And when you’re on the go, the PagerDuty mobile app allows you to view Custom Fields, so you can have more context around the incident. You can learn more about Custom Fields in our Knowledge Base. Also read about the top use cases for using this feature in our blog.

Important: Ensure your mobile experience is secure

Security is important. As such, we’ve been implementing updated minimum OS requirements to ensure that your mobile experience is up to date with security standards. As of April 10, 2023, the PagerDuty mobile app requires Android 10.0 or later, or iOS 15.0 or later. Hopefully you’ve upgraded by now, but if not, this is an important reminder to do so to guarantee seamless access to forthcoming mobile app updates.

Demo: Improving the PagerDuty mobile experience for you

To see all the features mentioned above in action, watch this video, in which Vivek Saxena, Senior Product Manager for Mobile, demos them.

There’s more to come on Mobile

At PagerDuty, we work hard to provide you with the best possible experience for your digital operations, no matter the device you use. This year we modernized the PagerDuty mobile app and added new features that let responders adapt their work style to their preferences without compromising efficiency.

But that’s not all! We are already thinking about fantastic new features that will make your incident response process even better.

Stay tuned for our upcoming releases, including:

  • Status Widget on the Home Screen: empowers users to quickly grasp the status of selected Business Services.
  • Home Screen Customization: allows users to select and prioritize the widgets that are important to them.
  • Analytics on Mobile: helps you understand how your organization is performing.
  • Slack Information: provides the ability to join Slack channels associated with the incidents.

If you aren’t a PagerDuty customer yet, try it free for 14 days and explore how PagerDuty can enhance your incident management process.

For the mobile app, you can simply scan or click the QR Codes below and download it today.

iOS QR Code
iOS QR code to download PagerDuty mobile app

Android QR Code
Android QR code to download PagerDuty mobile app
