Zenoss: Why I Jumped from Stability to Start-up

Michael DeSimone
January 17, 2012

Before I jump right into how I found myself working at Zenoss, I would like to give you a little background on myself. I have been an IT professional for over 18 years. I have worked in large and small companies, public and private, and in government. I have managed people and systems. I have been in customer-facing roles and have been hidden in the proverbial "sys-admin closet". My primary focus throughout my career has been Unix systems and related technology. For almost my entire adult life I have carried the on-call pager. I am the guy who gets called at 2 AM when “the website is down”. I am the guy who gets called when “my email is slow”.

Personally, I am married to my high school sweetheart and have two daughters in high school, one a senior getting ready to graduate this June. After moving around the country for the last thirteen years or so, I promised my wife and family that we would stay in one place until the children had graduated high school. That one place was the Dallas area working at Southwest Airlines.

Southwest Airlines is an amazing place to work. The culture is incredible; it truly is a big family. The benefits are outstanding. Who still matches 401(k) contributions, and at over 9%? I could fly for free, anywhere in the country. I was part of a very solid technical team and had the best manager I have had in years. The work wasn’t always the most exciting, but I did have my share of engaging projects and was never bored. Most importantly, it was solid, stable employment during a double-dip recession. Southwest Airlines has never laid a person off. Ever. They have never furloughed a pilot or flight attendant. They have an unprecedented streak of profitability in the airline industry. It was the perfect place for me to be able to keep my promise to my family.

Now to Zenoss. A very good friend of mine went to work at Zenoss about a year and a half ago. Almost immediately he started hounding me for my resume. I told my friend, “That sounds great, but I am at Southwest Airlines. No way am I going to a startup right now.” He would tell me about all the cool things going on at Zenoss and how they were changing the way operations works. I blew it off. All startups are doing “amazing, cool things”, right? We would talk every month or two and it would be the same dance.

Finally, last Memorial Day, our families were together for the holiday. After pouring me a very cold beer from his new Kegerator, he popped his laptop open and said, “Let me show you what we are doing at Zenoss.” He gave me the nickel tour of the UI. Then he dropped into a demo of the new Impact module, which was in beta at the time. He showed me how you define a service in Impact and how components can be automatically modeled. He showed me how easy it is to define the relationships between the components that make up a service. I could define relationships without having to learn another programming language!? This was starting to get interesting.

Chassis Down, Zenoss Impact View — Image 1

Then he downed one of the servers supporting a Service and I saw, instantly, how the Service was affected. [Image 1]

Service Events, Zenoss Impact, Confidence Interval — Image 2

He showed me the Service Event Console with the root cause analysis. There were 13 or so events sorted by most likely to be the root cause of the Impact to the Service. What was the most likely event? The server going down. [Image 2]
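To make the idea behind that sorted list concrete, here is a minimal Python sketch of one way such a ranking could work. It is not Zenoss Impact's actual algorithm or API: the topology, the node names, and the `rank_root_causes` helper are all hypothetical, and the scoring rule (count how many other failing components sit on top of each failing component) is a simple stand-in for whatever Impact does internally.

```python
# Hypothetical topology: each key depends on the items in its list.
DEPENDS_ON = {
    "web-service": ["vm-app01", "vm-db01"],
    "vm-app01": ["server-a"],
    "vm-db01": ["server-a"],
    "server-a": ["chassis-1"],
}


def all_dependencies(node, depends_on):
    """Return every node that `node` depends on, directly or indirectly."""
    seen = set()
    stack = list(depends_on.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(depends_on.get(current, []))
    return seen


def rank_root_causes(service, event_nodes, depends_on):
    """Rank failing nodes that belong to `service`, most likely root cause first.

    Assumed scoring rule: a failing node scores one point for every other
    failing node that depends on it. Ties are broken alphabetically.
    """
    relevant = all_dependencies(service, depends_on) | {service}
    # Events on nodes outside the service are ignored entirely.
    failing = [n for n in event_nodes if n in relevant]
    scores = {
        candidate: sum(
            1 for other in failing
            if candidate in all_dependencies(other, depends_on)
        )
        for candidate in failing
    }
    return sorted(failing, key=lambda n: (-scores[n], n))


# Made-up events: the server fails and takes its VMs and the service with it.
events = ["vm-app01", "vm-db01", "server-a", "web-service", "unrelated-switch"]
ranked = rank_root_causes("web-service", events, DEPENDS_ON)
print(ranked)
```

In this toy topology the server ends up first because every other failing component in the service depends on it, and the event on the unrelated switch never appears in the service's list at all.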

All Events, Zenoss Event Console — Image 3

Then he showed me the normal event console. It looked like the typical operations event screen: a rainbow of 30 or more events. [Image 3] This IS something new!

The demo was in a virtualized environment, and a VM that was part of the service was running on the server that was taken offline. When that VM was migrated to another server, the Service View was updated in real time and the Service returned to good health. Up/down status on a VM is nice, but the new server it is running on being automatically mapped into the Service it is a part of? This is new and exciting!

After watching this, the first thing that went through my head was “If this was a service I supported, how would this have looked to me?” If you have worked in Operations or as a Systems Admin of any kind, you know exactly how it would have looked and what would have happened. You would have been paged when the server went down. If your monitoring was configured well, you might have been alerted that the VM went offline or was migrated. In a traditional environment you would likely have known what Service or Application was affected. In a virtualized environment you probably would not have known what was affected when this server went down. Would you have even known that the VMs being migrated or going offline were related to the server going down? What would your customers have seen? Would your L1 support have known there was an issue with that Service when customers started calling in to report that their application was “slow”? I really was looking at a game changer.

He went on to tell me some anecdotes about testing Impact. In one test, a major networking company that was kicking the tires caused an outage that generated something like 1,600 events. Impact was able to distill it down to fewer than 10 events, and the root cause analysis showed, with 80% confidence, what the actual event was that caused the whole outage.

IT organizations have been moving away from the “webserver01” based infrastructure for a long time now. However, the tools we use to monitor our infrastructure are, for the most part, unchanged from those days. Let’s face it: most organizations are notified by their customers when there are issues long before their infrastructure monitoring tools let them know. Even if they are notified of an issue ahead of the curve, there is little to no correlation between what the customers experience and what the tools tell us. Zenoss Impact gives you the opportunity to get ahead of your customers when the Services you support are impacted.

This is why I decided to make the leap to Zenoss from arguably one of the most stable companies in the world, and one of the best places to work. Turning in my Southwest Airlines badge was honestly the hardest, most emotional thing I have done in my professional career. I really loved the company and my team there, but the opportunity at Zenoss was too exciting to pass up. I really believe that the product the amazing team at Zenoss produces will change the lives of people like me, and I am very proud to have the opportunity to represent Zenoss to our future customers.

Image via Flickr
