Create a highly available Zenoss

This how-to explains how to create a Zenoss cluster using open source tools.

Note: This is still a work in progress, so please be patient.

 

 

Introduction

 

Our previous monitoring solution was Nagios.

To achieve 99% uptime for our monitoring, we kept a cold standby for Nagios.

We have been using Zenoss since 1 January 2007, and it was time to look for another way to achieve that uptime (without monitoring we are blind).

So one of the ideas is to create an HA cluster for Zenoss.

Let me go into this in depth.
 

Setup

 

I have created a two-node cluster.

One system is a virtual machine running on an ESX v2 server.
The other system is a Dell PowerEdge 1900 physical server.

To create a cluster you will need a shared storage medium.
You could use:

 

Here is a drawing of my setup:

[Figure: Zenoss HA setup]

 

Okay, this seems a bit difficult, but in fact it isn't.
Let me try to explain this a bit more.

Heartbeat is a Linux high-availability project for creating clusters.

I have an active server and a standby server.
All services, such as:

  • MySQL
  • Apache
  • my SMS daemon
  • and of course Zenoss

are configured to run on the active server.

If Heartbeat detects that the active server is down, or that one of these services has a problem, it moves the services to the standby server.

 

This also includes switching the virtual IP to the standby server.

 

I'm using Heartbeat v2, which allows me to create rules for running the services.

One of the rules is to prefer the hardware node.

So let's say your hardware server goes down and needs some maintenance.

Heartbeat will move the services to the other (virtual) node.

Once the hardware is back up and running, Heartbeat will notice this and move the services back to the hardware server.

 

This means there is a small interruption of services while they are moved to the other node.

So Heartbeat is good for high availability, but not for load balancing.
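The preference for the hardware node described above is expressed in Heartbeat v2 as a location constraint in the cluster configuration (the CIB). The following is only a sketch of what such a constraint could look like. The resource name zenoss_group, the ids, the score of 100 and the choice of zenoss0101 as the hardware node are all assumptions made for illustration, and the XML should be checked against the CIB format of the installed Heartbeat version before it is loaded.

```shell
#!/bin/sh
# Print a Heartbeat v2 style location constraint that gives a resource a
# positive score on one node, so the cluster prefers to run it there.
# All ids, the resource name and the default score are hypothetical.
prefer_node_xml() {
    rsc="$1"
    node="$2"
    score="${3:-100}"
    cat <<EOF
<rsc_location id="prefer_${node}" rsc="${rsc}">
  <rule id="prefer_${node}_rule" score="${score}">
    <expression id="prefer_${node}_expr" attribute="#uname" operation="eq" value="${node}"/>
  </rule>
</rsc_location>
EOF
}

# Example: prefer the (assumed) hardware node for a hypothetical resource group.
prefer_node_xml zenoss_group zenoss0101
```

A finite score expresses a preference rather than a hard requirement, which matches the behaviour described here: the services may run on the virtual node while the hardware node is down, and move back once it returns.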

 

 

You may have used Heartbeat before.
If not, then it is a good idea to start reading about it.
This is a nice video presentation about Heartbeat: Video Presentation by Alan Robertson

DRBD: what is this?

 

DRBD is a cheap way to create shared storage: it mirrors a block device between the two nodes over the network.

This is my /etc/drbd.conf:

#
# please have a look at the example configuration file in
# /usr/share/doc/drbd.conf
#
#

global {
   minor-count 1;
   usage-count no; # Participate in DRBD's online usage counter at http://usage.drbd.org
}

resource zenoss {
    protocol C;
    startup {
       wfc-timeout        30;
       degr-wfc-timeout   60;
    }
    disk {
       on-io-error detach;
       fencing resource-only;  
    }
    handlers {
       pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f"; # power off the node if it is primary, degraded and inconsistent
       pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
       outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater";  
    }
    net {
       after-sb-0pri discard-least-changes; # Self-healing after split brain
       after-sb-1pri call-pri-lost-after-sb;
       max-buffers 2048; # datablock buffers used before writing to disk.
       ko-count 4; # Peer is dead if this count is exceeded.
    }
    syncer {
       rate   12M;
       al-extents 257;
    }
    on zenoss0101 {
       device      /dev/drbd0;
       disk        /dev/sda3;
       address     192.168.1.90:7789;
       meta-disk   internal;
    }
    on zenoss0102 {
       device      /dev/drbd0;
       disk        /dev/sdb1;
       address     192.168.1.91:7789;
       meta-disk   internal;
    }
}

 

 

Heartbeat: how does it work?

 

 

 

This is my /etc/ha.d/ha.cf:

# Created by wdhaeseleer
# /etc/ha.d/ha.cf
# Config created on 19-03-2008

use_logd yes              # Log through the logging daemon
debug 1                   # Set debug level

udpport 694               # Send on this UDP port
ucast eth0 192.168.1.90   # Use unicast to send the heartbeat
ucast eth0 192.168.1.91   # Use unicast to send the heartbeat
keepalive 1000ms          # Send a heartbeat every second
warntime 7000ms           # Warn about a late heartbeat after 7 seconds
deadtime 30000ms          # Declare a node dead after 30 seconds
initdead 40000ms          # Declare a node dead on startup after 40 seconds

autojoin any              # Allow nodes to join automatically
crm on                    # Enable the Heartbeat v2 cluster resource manager

watchdog /dev/watchdog
respawn hacluster /usr/lib/heartbeat/dopd  
apiauth dopd gid=haclient uid=hacluster  

node zenoss0101           # This is node1
node zenoss0102           # This is node2

 

Here is an explanation of each item: http://www.linux-ha.org/ha.cf
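The timing directives in the file above only make sense in a certain order: a heartbeat is sent more often than the warning threshold, the warning comes before the node is declared dead, and the startup grace period is at least as long as the normal dead time. The following is a minimal sketch of a sanity check for that ordering. The function names hacf_ms and hacf_timers_ok are made up for this example, and only the "<n>ms" notation used in this file is handled.

```shell
#!/bin/sh
# Print the value in milliseconds of a timer directive (e.g. deadtime)
# from an ha.cf file. Only values written as "<n>ms" are supported.
hacf_ms() {
    awk -v key="$1" '$1 == key { sub(/ms$/, "", $2); print $2; exit }' "$2"
}

# Succeed if keepalive < warntime < deadtime <= initdead in the given file.
hacf_timers_ok() {
    k=$(hacf_ms keepalive "$1")
    w=$(hacf_ms warntime "$1")
    d=$(hacf_ms deadtime "$1")
    i=$(hacf_ms initdead "$1")
    [ "$k" -lt "$w" ] && [ "$w" -lt "$d" ] && [ "$d" -le "$i" ]
}
```

It could be run as hacf_timers_ok /etc/ha.d/ha.cf. With the values above, a node is declared dead after 30000 / 1000 = 30 missed heartbeats.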

The authkeys configuration file contains information for Heartbeat to use when authenticating cluster members. It must not be readable or writable by anyone other than root.

This file needs to be identical on each node of the cluster.
In our case on zenoss0101 and zenoss0102.
If not, the node cannot join the cluster.

 

Read here for more info.

 

This is my /etc/ha.d/authkeys:

auth 1
1 sha1 e75dd0d3d97ea86bc07480ae6d9406d0
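As a rough sketch of how such a file could be produced with a key of your own and the root-only permissions required above: the function name write_authkeys is made up for this example, and the target path is a parameter so the result can be inspected in a scratch location before it is put in place.

```shell
#!/bin/sh
# Write an authkeys file in the two-line format shown above, using a
# freshly generated 32-character hex key, readable and writable by the
# owner only. The default target is the path used in this how-to.
write_authkeys() {
    target="${1:-/etc/ha.d/authkeys}"
    # 16 random bytes from the kernel, printed as 32 hex characters
    key=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    # Create the file with a restrictive umask, in a subshell so the
    # caller's umask is left untouched
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$target" )
    chmod 600 "$target"
}
```

Run it as root on one node, then copy the resulting file unchanged to the other node, since both nodes must share the same key.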

Heartbeat needs to start at boot time.
Since it controls what is started where, it is critical that it starts.

Execute this to start Heartbeat on boot
(works on distributions that provide chkconfig, such as Red Hat and SUSE):

sudo chkconfig heartbeat on

If this does not work, you could do it manually:

sudo ln -s ../init.d/heartbeat /etc/rc0.d/K25heartbeat
sudo ln -s ../init.d/heartbeat /etc/rc1.d/K25heartbeat
sudo ln -s ../init.d/heartbeat /etc/rc2.d/S25heartbeat
sudo ln -s ../init.d/heartbeat /etc/rc3.d/S25heartbeat
sudo ln -s ../init.d/heartbeat /etc/rc4.d/S25heartbeat
sudo ln -s ../init.d/heartbeat /etc/rc5.d/S25heartbeat
sudo ln -s ../init.d/heartbeat /etc/rc6.d/K25heartbeat
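The seven commands above follow one pattern: a kill link (K25) in runlevels 0, 1 and 6, and a start link (S25) in runlevels 2 to 5. A minimal sketch of the same thing as a loop follows. The function name link_heartbeat and its optional root-directory argument are assumptions added so the function can be tried in a scratch directory first.

```shell
#!/bin/sh
# Create the Heartbeat runlevel links: K25 in runlevels 0, 1 and 6,
# S25 in runlevels 2-5. "$1" is an optional root directory prefix
# (empty by default, so the links land in /etc/rcN.d).
link_heartbeat() {
    root="${1:-}"
    for level in 0 1 2 3 4 5 6; do
        case "$level" in
            0|1|6) prefix=K25 ;;
            *)     prefix=S25 ;;
        esac
        ln -sf ../init.d/heartbeat "$root/etc/rc$level.d/${prefix}heartbeat"
    done
}
```

Called without an argument and as root, it creates the same links as the commands above. The -f flag replaces a link that already exists instead of failing.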

Monitor this cluster

 

We have Zenoss running, so this is the ideal situation.

This makes monitoring the cluster really easy.

 

What should we monitor:

  • DRBD replication link
  • Config files are the same on both nodes
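Both items in the list above can be checked with a few lines of shell. This is a sketch only: the function names are made up for this example, the DRBD check assumes the connection state appears as a "cs:" field in /proc/drbd, and how the checks are hooked into Zenoss is left open.

```shell
#!/bin/sh
# Print the DRBD connection state (the "cs:" field) of the first device
# listed in /proc/drbd, or in the file given as "$1".
# A healthy replication link is expected to report "Connected".
drbd_cstate() {
    sed -n 's/.*cs:\([A-Za-z]*\).*/\1/p' "${1:-/proc/drbd}" | head -n 1
}

# Print only the md5 checksum of a file.
checksum_of() {
    md5sum "$1" | cut -d ' ' -f 1
}

# Succeed if the local file "$1" has the checksum "$2", which is meant
# to be the checksum of the same file obtained from the other node.
config_in_sync() {
    [ "$(checksum_of "$1")" = "$2" ]
}
```

For example, on zenoss0101 the checksum of the peer's file could be fetched with ssh zenoss0102 md5sum /etc/ha.d/ha.cf | cut -d ' ' -f 1 and passed as the second argument to config_in_sync /etc/ha.d/ha.cf. The same applies to /etc/drbd.conf and /etc/ha.d/authkeys.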

 

 

 

 
