Perhaps you set up your monitoring infrastructure when your business consisted of 10 employees and 20 devices, and it has since G R O W N. Perhaps Jeff, AKA King of all Solutions, is no longer with the company, and deciphering and maintaining his concoctions has become too labor-intensive. Or you can no longer keep up with your SLAs without diverting half your IT staff in the process.
If any of these scenarios make you wince, you may be a candidate for Zenoss Service Dynamics, AKA Zenoss Enterprise. I don’t say this lightly given that anyone who has managed any sort of open-source monitoring solution really knows his stuff. In a recent webinar, Zenoss Community Manager Andrew Kirch said that to use Zenoss Core, you need a “competent in-house Linux system administrator [or] you’re going to find yourself in the deep weeds.”
But no matter how talented you are, you can’t clone yourself, let alone get those clones to work for free. So it may be time to consider migrating to Zenoss Enterprise. Here are three common signs that you’ve outgrown your monitoring solution – and how migrating to Zenoss Enterprise will solve these problems.
Sign 1: You can no longer scale.
As you add more and more resources to your infrastructure, you start to notice:
Performance issues with underlying services that are exacerbated by high system loads.
Slow UI response and you can’t immediately determine the cause.
Gaps in performance data graphs.
It’s akin to being a lobster in a pot of water. At first the water is comfortable, and before you know it, you’re being served on a platter with a lemon wedge and a ramekin of melted butter (sound far-fetched? I’ve heard worse things coming from the C-level suite).
With Zenoss Enterprise, you’re no longer beholden to a single CPU or I/O-bound on collection. It offers several scaling features, including:
The ability to scale out and load balance additional web server processes
A mechanism to configure and run sets of Zenoss daemons on multiple distributed machines
Multiprocessing capabilities so that daemons can use more than one CPU core
I spoke with Dave Winter, director of hosted services and leader of the ZaaS team, to get a better sense of these bullet points. He said that Zenoss Commercial comes with two authored and supported Commercial ZenPacks called Distributed Collector and Webscale. The Distributed Collector ZenPack lets you communicate and scale out the Zenhub, which is Zenoss Enterprise’s “main processing brain,” in ways that allow you to overcome geographic constraints and load balancing issues, among other things.
If you need to monitor devices in Australia, it is not a good idea to do that from Kansas. Putting a collector locally into a geography allows collection to be much more reliable and timely.
For overcoming load, if you just have SO many devices (or data points rather) that a single collector isn't doing the job, you can just add more collectors.
According to Dave, the Webscale Commercial ZenPack “allows you to scale the web front end of Zenoss for larger simultaneous user access, or high volume API call access to Zenoss, [which] also plays in report generation.”
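To make the distributed collector idea concrete, here is a minimal sketch of the planning logic: keep collection local to each geography, and add collectors when one is overloaded. It is plain Python and does not use any Zenoss API. The function name `assign_collectors`, the device names, and the `max_per_collector` limit are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def assign_collectors(devices, max_per_collector):
    """Group devices by region, then split each region across as many
    collectors as needed so none exceeds max_per_collector devices.

    Hypothetical planning helper; not part of Zenoss.
    """
    by_region = defaultdict(list)
    for name, region in devices:
        by_region[region].append(name)

    plan = {}
    for region, names in sorted(by_region.items()):
        # Ceiling division: how many collectors this region needs.
        count = -(-len(names) // max_per_collector)
        for i in range(count):
            # Stride slicing spreads devices evenly across collectors.
            plan[f"{region}-collector-{i + 1}"] = names[i::count]
    return plan


# Made-up inventory: three devices in Australia, one in Kansas.
devices = [
    ("syd-db-01", "australia"),
    ("syd-web-01", "australia"),
    ("syd-web-02", "australia"),
    ("mci-app-01", "kansas"),
]
plan = assign_collectors(devices, max_per_collector=2)
for collector, names in plan.items():
    print(collector, names)
```

With a limit of two devices per collector, the Australian devices are split across two local collectors, while the single Kansas device gets one of its own. In a real deployment the limit would more likely be expressed in data points than in devices, as noted above.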
Sign 2: You experience too many outages – and have difficulties figuring out the root cause.
Outages leave everyone frustrated – your end users because the application they’re trying to access is unavailable and your team because you’re finding that performing root-cause analysis (RCA) has become treacherous at best.
After all, what do you do when a single failure leads to several hundred alerts coming from a bunch of different sources? Too often you’re faced with an event storm that needs to be sorted through to find the event that is creating this outage. The problem only becomes more complicated as you attempt to scale up for more devices and resources.
Zenoss Enterprise combats this set of problems through its Service Impact tool. Here are a few of the things Service Impact can do:
Automated RCA (Root Cause Analysis)
Automated service assurance and remediation
Dynamic service impact analysis
Impact automatically builds and manages service dependency mappings via real-time discovery and topology modeling. Senior software developer Evan Powell explained how Impact goes beyond what [competing products do]:
You can configure services and policies that will notify you only when real service-level problems are occurring, and you can detect at-risk services and alert yourself to the problems before end users are affected. Service events include [the aforementioned] root-cause analysis, so that you can quickly tell what actual component failure caused an outage and fix the important points.
The Resource Manager component of Zenoss Enterprise handles the detection (just as Zenoss Core does), but Impact provides you with tools like diagnostics and the ability to triage events so that you handle them in order of importance.
You may find the emergency room terminology overblown, but really, it’s not. At the very least, the ability to identify and prioritize a series of events can mean the difference between the life and death of your business – and in the case of some organizations, the difference between life and death, period.
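To see why a dependency map helps cut through an event storm, consider this toy sketch. It is not how Service Impact is implemented. It only illustrates the general idea, under the assumption that an alerting component whose own dependency is also alerting is a symptom rather than a cause. The function `root_causes` and every component name are hypothetical.

```python
def root_causes(depends_on, alerting):
    """Return the alerting components that have no alerting dependency.

    depends_on maps a component to the components it relies on.
    Toy illustration only; not the Service Impact algorithm.
    """
    alerting = set(alerting)
    return sorted(
        component
        for component in alerting
        # A component is a likely root cause if nothing it
        # depends on is also raising alerts.
        if not any(dep in alerting for dep in depends_on.get(component, []))
    )


# Made-up service map: storefront -> web servers -> database -> storage.
depends_on = {
    "storefront": ["web-01", "web-02"],
    "web-01": ["db-01"],
    "web-02": ["db-01"],
    "db-01": ["san-01"],
}

# One storage failure surfaces as five alerts.
alerts = ["storefront", "web-01", "web-02", "db-01", "san-01"]
print(root_causes(depends_on, alerts))  # ['san-01']
```

Five alerts collapse to a single component worth investigating first. The hard part in practice is building and maintaining the `depends_on` map, which is the piece the article says Impact builds automatically through discovery and topology modeling.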
Sign 3: Your cost-to-benefits ratios are out of whack.
As a smaller upstart organization, you undoubtedly found immense value in the Zenoss Core product and enjoyed the benefits of a very knowledgeable and active community. However, your IT needs have matured, demands from the business have grown, and the tolerance for delays has dropped. You are seeking ways to expedite service restoration, to minimize the need to allocate expensive IT professionals each time there is a tremor, and to get your cost-benefit ratios back in order. What was a huge value-add (and at a great price) is now putting stress on your efficiency and your resources.
You suddenly find yourself throwing more of your limited IT resources (human and financial) at cleaning up these problems to avoid the costs associated with downtime, such as the cost of labor when your employees sit idle as a result of the outage.
Moreover, you increasingly have to divert your IT employees (often your best ones at that) from their normal tasks to take on troubleshooting activities. Instead of working on projects that grow your business, they’re turned into glorified help-desk workers.
Using a free tool while misusing your talent is, as my mom would say, “penny-wise and pound-foolish.” Yes, Zenoss Enterprise costs money up-front (albeit less than you may think), but with all of its features, you also get troubleshooting support from Zenoss professionals, and your team can reorient themselves toward tangible business goals.
Some other signs...
Your business is not getting timely reports on the state of IT Ops
You need historical utilization and other data to strengthen your capacity planning and infrastructure optimization processes
You are looking to improve your RCA capabilities
Zenoss Enterprise also provides additional features, including analytics for business intelligence (BI) and the ability to maintain SLAs and adhere to the compliance regulations dictated by your industry.
In pointing out these advantages, I’m in no way insinuating that open source tools like Zenoss Core can’t do what you need them to do. After all, Zenoss Enterprise is built right on top of Zenoss Core. Instead I like Deepak’s analogy about Zenoss Core vs. Zenoss Enterprise, which he likens to the difference between a stock Aston Martin and the James Bond version:
Mr. Bond needs certain “capabilities” to perform his job – front-firing rockets, hood-mounted target-seeking guns, spike-producing tires, passenger ejector seats, etc. – that differ significantly from the standard model. The requirements for these capabilities depend, of course, on his mission location, end goal, and cast of supporting characters.
Likewise with Zenoss, the capabilities you need to manage your IT operations – slightly less glamorous things like root cause analysis, service mapping, multi-geo deployment – rest on such variables as your operational goals, IT staff skill sets, and existing datacenter resources.
Both models are great. You just need to honestly assess whether your infrastructure needs this added functionality – and for many of you, the answer may lie in upgrading to the best enterprise monitoring source around...