Applying the 80/20 Rule to Monitoring

The Pareto principle, more commonly known as the 80/20 rule, is a powerful and counterintuitive idea: roughly 20% of inputs or efforts usually produce 80% of outputs or results. Put the other way, the remaining 80% of inputs or efforts produce only 20% of outputs or results, which suggests a large amount of wasted time, money and resources.

This principle has been leveraged by savvy individuals, investors and corporations for decades to identify their most and least profitable products, increase profits, decrease costs and generally achieve more with less. While the ratio may not always be exactly 80:20, it has been observed across many use cases in various industries that there is an underlying imbalance between causes and results, inputs and outputs, and effort and rewards.

So if this is true, how can the 80/20 rule be applied and leveraged by operational monitoring teams to increase efficiency and add more value to an organization?

Let’s start by thinking about the day-to-day operational work done by IT organizations to monitor their application stack. Most modern enterprise organizations have multiple monitoring tools in place, managed by a DevOps team, a NOC team or individual subject matter experts across multiple teams. These teams likely deal with hundreds or thousands of alerts and dozens of incident tickets each week. They are constantly tuning their monitoring tools to make the monitors in place more valuable and proactive and to reduce the number of non-actionable alerts (noise), all while receiving a steady stream of requests for new services and scenarios to be monitored.

This can become quite cumbersome. Organizations often attempt to solve the problem by hiring (or firing) people, buying more tools and generally adding more resources, which in the end just adds complexity to the problem and reduces the value of the monitoring in place.

If you can relate to the above scenario, I would suggest that you and your team take a step back, avoid any drastic decisions and try something along these lines.

  1. Schedule a 2-3 day “80/20 workshop”. Invite the key technical subject matter experts, but keep the list short to ensure a focused, non-political, value-adding session.
    • Make sure to still have some folks at the helm so the ones in the workshop can stay focused and don’t get pulled away for operational issues.
  2. Capture 2-3 months’ worth of historical monitoring and ITSM data. This could include alerts triggered, notifications sent, incident tickets, problem tickets, tuning or config changes, dashboard usage stats or any other data that you feel may be related.
  3. Analyze the data by comparing the relationships between the data sets. Categorize and graph the data in bar, pie or line charts, looking for imbalances between inputs and outputs.
    • Remember, the ratio may not always be exactly 80/20 or 20/80.
  4. Identify the imbalances and plan action items that leverage your new understanding of your environment. For example, tune the 20% of monitors causing 80% of the non-critical alerts.
  5. Rinse and repeat every few months or, better yet, build out a bi-weekly 80/20 “scrum” to ensure more rapid and iterative feedback cycles.

These steps have been generalized to some extent for the purpose of this article, and the process should be customized to make sense for your organization. But if you follow these steps, I can guarantee you will walk away with new insight into your environment and a better understanding of where your monitoring team’s efforts should be focused.
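As a rough illustration of the analysis step, here is a minimal Python sketch that finds the smallest group of monitors responsible for a given share of alerts. The function name `pareto_cut`, the monitor names and the alert counts are all made up for this example; in practice the input would come from whatever export your monitoring or ITSM tool provides.

```python
from collections import Counter


def pareto_cut(alert_sources, threshold=0.8):
    """Return the noisiest sources that together reach `threshold` of all alerts.

    alert_sources: an iterable with one entry (e.g. a monitor name) per alert.
    Returns (selected_sources, share_covered, total_number_of_sources).
    """
    counts = Counter(alert_sources)
    total = sum(counts.values())
    selected, running = [], 0
    # Walk the sources from noisiest to quietest until the threshold is met.
    for source, n in counts.most_common():
        if total == 0 or running / total >= threshold:
            break
        selected.append(source)
        running += n
    share = running / total if total else 0.0
    return selected, share, len(counts)


# Hypothetical sample data: one monitor name per triggered alert.
alerts = (
    ["disk_usage"] * 50
    + ["cpu_load"] * 30
    + ["ping"] * 8
    + ["ssl_expiry"] * 6
    + ["queue_depth"] * 4
    + ["dns"] * 2
)

noisy, share, n_monitors = pareto_cut(alerts)
print(f"{len(noisy)} of {n_monitors} monitors produce {share:.0%} of alerts: {noisy}")
```

In this invented data set two of six monitors account for 80% of the alerts, so the split is closer to 33/80 than 20/80. That is fine: the point is to spot the imbalance, not to hit an exact ratio.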

Below are some common operational insights produced by 80/20 analysis. Again, the ratio may not always be as extreme as 80/20 or 20/80. What matters is identifying the imbalance between inputs and outputs.

  • 20% of monitors causing 80% of triggered alerts.
  • 20% of servers causing 80% of triggered alerts.
  • 80% of tuning efforts being spent on 20% of servers being monitored.
  • 20% of monitoring tools being used 80% of the time.
  • 80% of high-severity incidents caused by 20% of applications.
  • 20% of dashboards built are being used by technical teams 80% of the time.
  • 80% of dashboards built are being used only 20% of the time.
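To check whether one of these imbalances exists in your own data, one simple measure is the share of total activity that comes from the top 20% of items. The sketch below is one way to compute it in Python; `top_share` and the dashboard names and view counts are hypothetical and only serve to show the calculation.

```python
import math


def top_share(counts, fraction=0.2):
    """Return the share of the total contributed by the top `fraction` of items.

    counts: a mapping of item name (server, monitor, dashboard...) to a count.
    """
    values = sorted(counts.values(), reverse=True)
    total = sum(values)
    if not values or total == 0:
        return 0.0
    # Always look at at least one item, even for very small data sets.
    k = max(1, math.ceil(len(values) * fraction))
    return sum(values[:k]) / total


# Hypothetical dashboard view counts over the analysis period.
dashboard_views = {
    "payments-overview": 420,
    "api-latency": 310,
    "db-health": 60,
    "queue-backlog": 50,
    "cdn-errors": 40,
    "batch-jobs": 40,
    "auth-service": 30,
    "search-cluster": 20,
    "mail-gateway": 20,
    "legacy-portal": 10,
}

dashboard_share = top_share(dashboard_views)
print(f"Top 20% of dashboards account for {dashboard_share:.0%} of views")
```

The same function works for alerts per server, incidents per application or tuning changes per monitor. A result far above the chosen fraction is the kind of imbalance worth bringing to the workshop.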

Additionally, the 80/20 rule can be applied to high level strategy and APM vendor market shares. I will explore this topic in a separate article but here are a few examples of this.

  • 80% of the money spent on monitoring tools produces only 20% of the value to the business or end user experience.
  • SaaS products that are 10% better at AIOps will take 100% more market share.

If you found this article valuable please share, give it a like or leave a comment / question. Your feedback is much appreciated. Thanks!  
