Layer 2 Network Connection Awareness Improves Root Cause Analysis

At the end of July, Zenoss delivered a customer feature straight from our Customer Advisory Board's requests.

The Layer 2 ZenPack helps customers identify application and network root causes.

  • Root cause analysis for application services can now automatically include supporting network switch faults.
  • A major cause of event floods has been eliminated: a failed switch now produces a single network root cause event instead of a cascade of events.

The ZenPack extends existing network device support with information about neighbor switches and southbound client MAC addresses using CISCO-CDP-MIB, LLDP-MIB, and BRIDGE-MIB. It works for IPv4 and IPv6 addresses, which will help protect Andrew Kirch from falling sky pieces.

Let’s look at those two use cases.

Application Root Cause Analysis

With the new Layer 2 support, we have immediate root cause support for the classic silo question, “Is it the server or the network?”

[Figure: Zenoss Layer 2 Application Root Cause Analysis]

Here’s a quick sketch of a common application. Four guest operating systems are running in virtualization hosts supported by what turns out to be two different top-of-rack switches.

Without the Layer 2 support, we wouldn’t know which physical switches were supporting the virtualization hosts. If Switch B failed, we’d see root cause analysis suggesting that virtualization host 7 had failed. Contacting the virtualization admin wouldn’t help, and would only delay resolution.

With Layer 2 support, root cause analysis will correctly show that Switch B’s failure has caused the application failure, affecting both the host and our guest OS. Because of the Zenoss automatic relationship model, we’ll always know which switches are supporting our application, even if we shuffle our VMs around to new hosts in new racks with new top-of-rack switches.
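The reasoning above can be sketched as a walk over a dependency graph. This is only an illustration, not the Zenoss implementation: the `DEPENDS_ON` table, the component names, the guest-to-host layout, and the `root_causes` helper are all hypothetical. A failed component counts as a root cause only when none of the things it depends on has also failed.

```python
# Hypothetical dependency model (not the Zenoss data model): each component
# lists the components it depends on. The layout is assumed for illustration.
DEPENDS_ON = {
    "app": ["guest-1", "guest-2", "guest-3", "guest-4"],
    "guest-1": ["host-6"],
    "guest-2": ["host-6"],
    "guest-3": ["host-7"],
    "guest-4": ["host-7"],
    "host-6": ["switch-a"],   # host-to-switch edges are what Layer 2 data adds
    "host-7": ["switch-b"],
    "switch-a": [],
    "switch-b": [],
}


def root_causes(component, down, depends_on=DEPENDS_ON):
    """Return failed components reachable from `component` that have
    no failed dependency of their own."""
    causes = set()
    seen = set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        deps = depends_on.get(node, [])
        # A failed node is a symptom if something beneath it also failed.
        if node in down and not any(dep in down for dep in deps):
            causes.add(node)
        for dep in deps:
            visit(dep)

    visit(component)
    return causes
```

With the switch edges present, `root_causes("app", {"app", "guest-3", "guest-4", "host-7", "switch-b"})` returns `{"switch-b"}`. Remove the host-to-switch edges, as if no Layer 2 data were available, and the same walk stops at `host-7`, which is the misleading answer described above.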

Event Flood Suppression

A classic cause of event floods is switch failures triggering a cascade of events from the attached devices.

[Figure: Zenoss Event Flood Suppression with Layer 2]

This simple diagram shows a collection of servers attached to a switch.

Should Switch A fail, Zenoss will detect that Switch A is down and that it can’t contact servers 1-6. The resulting seven messages in the event console (one for the switch, six for the servers) won’t help in resolution.

With Layer 2 support, Zenoss automatically recognizes the MAC addresses of the downstream servers. If Switch A fails, Zenoss then suppresses new events for those servers. This helps people using the event console focus on the real problem of a failed switch rather than false symptoms of servers that can no longer be reached.
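A minimal sketch of that suppression logic follows, assuming a simplified event shape. The `suppress_downstream` function, the dictionary layouts, and the MAC addresses are hypothetical and only illustrate the idea; they are not the ZenPack's actual code or data structures.

```python
def suppress_downstream(events, switch_clients, device_macs):
    """Drop events from devices whose MAC addresses sit behind a down switch.

    events         -- list of dicts with "device" and "severity" keys (assumed shape)
    switch_clients -- switch name -> set of client MACs learned on its ports
    device_macs    -- device name -> set of MACs owned by that device
    """
    down = {e["device"] for e in events if e["severity"] == "down"}
    down_switches = down.intersection(switch_clients)

    # Collect every MAC address that is only reachable through a failed switch.
    unreachable = set()
    for switch in down_switches:
        unreachable |= switch_clients[switch]

    kept = []
    for event in events:
        device = event["device"]
        behind_failed_switch = bool(device_macs.get(device, set()) & unreachable)
        if device in down_switches or not behind_failed_switch:
            kept.append(event)
    return kept


# Example data: six servers attached to switch-a (addresses are made up).
servers = {f"server-{i}": {f"00:00:5e:00:53:{i:02x}"} for i in range(1, 7)}
clients = {"switch-a": set().union(*servers.values())}
flood = [{"device": "switch-a", "severity": "down"}] + [
    {"device": name, "severity": "down"} for name in servers
]
remaining = suppress_downstream(flood, clients, servers)
```

Here `remaining` holds only the `switch-a` event, so the seven-event flood collapses to the one event that matters. A real deployment would also need to handle servers with redundant uplinks, which this sketch ignores.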

Immediate Value

Both customer issues are important, and we’ve managed to address them with almost no customer work. Customers need only install the ZenPack and add the modeler plugin to the /Network device class. Zenoss will automatically discover which network devices expose the required information and begin supporting both the application network dependency and event flood suppression use cases.


