Edit history

Edit: -1 of 1
Time: 2008-09-18 07:29:45
<h2>Problem statement</h2>
<p>
SNMP is a good protocol for simple, lightweight monitoring of machine state.  However, there are situations where SNMP may not be the best solution:
</p>
<ul>
<li>Process monitoring.  If the entire process table of a machine is scanned and the information is sent across the network, this approach will not scale very well.</li>
<li>Regular expression matching against log files / syslog.  It is impractical to consider sending all log files and/or all syslog messages to a central server and have it sort things out.</li>
<li>SNMP traps are stateless, so there is no way to determine if an important alert was received or not.</li>
<li>If the SNMP server is unavailable (because the process or server is down, or the network link is lost), no monitoring or actions can occur.</li>
</ul>
<p>An agent installed on the client should address these issues.</p>

<h2>Principles of design</h2>
<ol>
<li>Filter as close to the source as possible</li>
<li>Rely on queued message delivery for sending and receiving messages, with a simple interface.</li>
<li>Carry on monitoring and statistics gathering even if the Zenoss server is unavailable (e.g. service down, network isolation).</li>
<li>"One touch" installation and management.  As much as possible, require the administrator to log into the machine only once.</li>
<li>The power is in the protocol.  No Python-specific information should be sent over the wire.  While binaries or Python eggs may be sent to update the agent, the actual protocol itself should not rely on (say) the way that Python encodes arrays.  It should be possible to interact with these agents no matter what language was used to write the agent's code.</li>
</ol>
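<p>
Principles 2, 3 and 5 can be illustrated together with a small store-and-forward queue. The following is only a sketch under stated assumptions, not a proposed implementation: the <code>Spool</code> class, its SQLite storage and the JSON encoding are all hypothetical choices made for illustration. Events are written to a local queue first, encoded in a language-neutral format, and removed only after the delivery callback succeeds.
</p>

```python
import json
import sqlite3


class Spool:
    """Hypothetical store-and-forward queue for agent events."""

    def __init__(self, path=":memory:"):
        # A file path would make the queue survive agent restarts.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS queue "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )
        self.db.commit()

    def put(self, event):
        # JSON keeps the payload language-neutral (no pickled Python objects).
        cur = self.db.execute(
            "INSERT INTO queue (body) VALUES (?)", (json.dumps(event),)
        )
        self.db.commit()
        return cur.lastrowid  # doubles as the message queue id

    def flush(self, send):
        """Deliver queued events in order; stop at the first failure."""
        delivered = 0
        rows = self.db.execute(
            "SELECT id, body FROM queue ORDER BY id"
        ).fetchall()
        for msg_id, body in rows:
            try:
                send(msg_id, body)
            except OSError:
                break  # server unreachable: keep the rest for the next attempt
            self.db.execute("DELETE FROM queue WHERE id = ?", (msg_id,))
            self.db.commit()
            delivered += 1
        return delivered

    def pending(self):
        return self.db.execute("SELECT COUNT(*) FROM queue").fetchone()[0]
```

<p>
The agent would call <code>put()</code> whenever a monitor fires and call <code>flush()</code> periodically with whatever transport is eventually chosen. While the server is unreachable, events simply accumulate locally.
</p>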

<h2>Functional specifications</h2>
<ul>
<li>Allow secure remote access (interactive and non-interactive?)</li>
<li>Run arbitrary Python programs cyclically</li>
<li>Arbitrary logfile monitoring (i.e. look for regex matches in lines of a log file). The list of files and regular expressions must be configurable remotely.</li>
<li>Process Monitoring - look in the process table for regex matches</li>
<li>Windows Service Monitoring <a href="http://dev.zenoss.com/trac/ticket/22">#22</a> - monitor Windows services and provide a list of configured services to the modeler.</li>
<li>Allow new modules to be pushed to the agent - maybe use python egg format?</li>
</ul>
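<p>
The logfile and process monitoring items both reduce to matching a set of named regular expressions against lines of text on the agent itself. A minimal sketch, assuming the rules arrive as a simple name-to-pattern mapping (the function names and rule names here are hypothetical):
</p>

```python
import re


def compile_rules(rules):
    """rules: mapping of rule name -> regular expression source."""
    return {name: re.compile(pattern) for name, pattern in rules.items()}


def match_lines(lines, compiled):
    """Yield (rule name, line) for every line that matches a rule."""
    for line in lines:
        for name, regex in compiled.items():
            if regex.search(line):
                yield name, line.rstrip("\n")
```

<p>
The same matcher could be fed lines of a log file or one line per entry of the process table, so only the matches need to leave the machine.
</p>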

<h2>Non-functional specifications</h2>
<ul>
<li>Configuration file should be in XML for easy import/export into other formats.</li>
<li>Must be easy to install - agent must be a single binary that can be copied to a managed server. All future configuration and installation can be handled remotely.</li>
</ul>
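<p>
As an illustration of the XML configuration requirement, here is a sketch that reads such a file with Python's standard library. The element and attribute names (<code>agent</code>, <code>logfile</code>, <code>match</code>, <code>process</code>) are invented for this example; no schema has been decided.
</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration layout, for illustration only.
SAMPLE = """
<agent>
  <logfile path="/var/log/messages">
    <match name="oom">Out of memory</match>
  </logfile>
  <process name="sshd">sshd</process>
</agent>
"""


def load_config(text):
    """Turn the XML text into plain dictionaries of names and patterns."""
    root = ET.fromstring(text)
    logfiles = {
        lf.get("path"): {m.get("name"): m.text for m in lf.findall("match")}
        for lf in root.findall("logfile")
    }
    processes = {p.get("name"): p.text for p in root.findall("process")}
    return {"logfiles": logfiles, "processes": processes}
```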

<h2>Agent management</h2>
<ul>
<li>Inventory and version information of all modules, including checksums</li>
<li>Download new functionality</li>
<li>Upgrade or revert agent or functionality</li>
<li>Commit to new functionality (i.e. remove old code)</li>
<li>Query and update agent's schedules and actions</li>
<li>Query agent information from a particular date or event number (i.e. message queue id)</li>
<li>Export a shared disk over NFS/Samba, supply credentials, and install a bootstrap agent that can be used to download more code.</li>
</ul>
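<p>
The inventory item could be as simple as a checksum per module file, which the server compares against its own list to decide what to push. A sketch, assuming modules are plain <code>.py</code> files under one directory and that SHA-256 is an acceptable checksum (both are assumptions, as are the function names):
</p>

```python
import hashlib
from pathlib import Path


def inventory(module_dir):
    """Return {relative file name: SHA-256 hex digest} for every module file."""
    base = Path(module_dir)
    result = {}
    for path in sorted(base.rglob("*.py")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        result[str(path.relative_to(base))] = digest
    return result


def changed(local, remote):
    """Names whose checksum differs from, or is missing in, the local list."""
    return sorted(
        name for name, digest in remote.items() if local.get(name) != digest
    )
```

<p>
Here <code>remote</code> stands for the server's view of what the agent should be running; anything returned by <code>changed()</code> is a candidate for download or upgrade.
</p>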

<h2>Implementation possibilities</h2>
<h3>Low-level Protocol</h3>
<p>
The first thought is to use a Twisted SSH daemon as the method for accessing the agent remotely. This would allow a manager to connect to the agent with a normal SSH client and run the same command set as the Zenoss management system.
</p>
<p>
A second thought would be to write an agent with <a href="http://twistedmatrix.com/projects/core/documentation/howto/pb-intro.html">Twisted Perspective Broker</a> (PB). The problem with this is the "binary" bit. We would have to explore the feasibility of creating a PB binary executable. Alternatively, there could be a Zenoss "agent installer" that creates a dedicated Python instance with the Twisted libraries installed. As for source code migration and live upgrades, exarkun (of the Twisted crew) wrote a paper on just this, and it (as well as the source code) is available in his sandbox here. One of the biggest benefits of PB is that it is a secure RPC, and thus preferable to standard RPC implementations for sensitive networks.
</p>
<p>
A third thought: there is a new "asynchronous messaging protocol" in Twisted called <a href="http://twistedmatrix.com/trac/browser/trunk/twisted/protocols/amp.py">AMP</a> (examples <a href="http://twistedmatrix.com/trac/browser/trunk/doc/core/examples/ampclient.py">here</a> and <a href="http://twistedmatrix.com/trac/browser/trunk/doc/core/examples/ampserver.py">here</a>). Here are some things that glyph had to say about it:
</p>
<blockquote>
    To make a long story short, it's less powerful than PB, but a whole lot simpler. It has asynchronous messaging and argument marshalling between endpoints, but not arbitrary objects. I have found that this hits the sweet spot of more applications than PB (although PB is still better for communicating parallel simulations) [ed: and more security than that provided by XML-RPC].
</blockquote>
<p>We could use AMP to build a killer agent that would be accessible from any language (and/or that language's libraries) while at the same time being easy to implement and maintain (PB is definitely pretty intense stuff and not something that a Twisted newcomer could help maintain).
</p>
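<p>
To give a feel for why AMP is easy to speak from other languages, here is a sketch of its framing written without Twisted. As I understand the protocol, each message ("box") is a sequence of key/value pairs, every key and value is prefixed with a 16-bit big-endian length, keys are limited to 255 bytes, and an empty key ends the box. The command name <code>ProcessCount</code> and the <code>pattern</code> argument in the usage note are hypothetical; <code>_ask</code> and <code>_command</code> are the keys AMP uses for requests.
</p>

```python
import struct


def encode_box(box):
    """Serialize a dict of bytes -> bytes as one AMP box."""
    out = []
    for key, value in box.items():
        if not 0 < len(key) <= 255:
            raise ValueError("AMP keys must be 1..255 bytes")
        if len(value) > 65535:
            raise ValueError("AMP values must be at most 65535 bytes")
        out.append(struct.pack("!H", len(key)) + key)
        out.append(struct.pack("!H", len(value)) + value)
    out.append(b"\x00\x00")  # an empty key terminates the box
    return b"".join(out)


def decode_box(data):
    """Parse one AMP box from the start of data; return (box, remaining bytes)."""
    box, pos = {}, 0
    while True:
        (klen,) = struct.unpack_from("!H", data, pos)
        pos += 2
        if klen == 0:
            return box, data[pos:]
        key = data[pos:pos + klen]
        pos += klen
        (vlen,) = struct.unpack_from("!H", data, pos)
        pos += 2
        box[key] = data[pos:pos + vlen]
        pos += vlen
```

<p>
A request such as <code>encode_box({b"_ask": b"1", b"_command": b"ProcessCount", b"pattern": b"sshd"})</code> is a few dozen bytes on the wire, and nothing in it depends on Python. A real agent would use Twisted's own <code>amp</code> module rather than this hand-rolled framing.
</p>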
<p>
A WMI -&gt; PyWBEM proxy or bridge would be interesting. Could this be WMI -&gt; XML-RPC?
</p>
<h3>Miscellaneous</h3>
<ul>
<li>Allocate a large (e.g. 100MB or so) chunk of disk space specifically for messages so that out-of-disk conditions won't affect message delivery.</li>
<li>Ideally, some sort of discovery protocol.  The Zenoss server would have the software inventory list so that Zenoss would be able to suggest different monitoring modules.  A process regex table (regexes downloaded from Zenoss server) would allow the client to dynamically determine the services, and suggest modules to download.</li>
<li>Would need to store secrets, e.g. usernames/passwords.  These secrets would need to be stored in a reversible format so that they could be extracted to log into applications etc.  Perhaps the secret key to the secret keys could be stored on the Zenoss server, so the agent needs to talk to the server on startup to retrieve its own unique key (i.e. losing the key to one server doesn't compromise everyone), and then the agent can unlock its local keys.</li>
</ul>

<pre>
From kupjones Tue Jul 31 18:52:16 +0000 2007
From: kupjones
Date: Tue, 31 Jul 2007 18:52:16 +0000
Subject: Just a few comments 
Message-ID: &lt;20070731185216+0000@www.zenoss.com&gt;
</pre>


<p>As long as we are talking about general-purpose host systems, I think the Problem Statement sells SNMP short given the vast improvements in Net-SNMP. For example:</p>

<ul>
<li>Processes: The agent can be configured to watch any number of local processes and "trained" to react in any number of localized ways, which matches your #1 principle.  Alerts can be generated for only those processes that require outside intervention.</li>
<li>Net-SNMP supports regular expressions and logfile monitoring, again all localized.  Embedded Perl (and other languages) is also supported.</li>
<li>SNMP v3 traps are stateful, much more informative, and require acknowledgement.</li>
<li>Net-SNMP with the appropriate coding can cache all monitored information and relay it when the management station returns.</li>
</ul>

<p>
In essence, Net-SNMP *becomes* the local, intelligent agent.  The most glaring shortcoming that I have run into is the inability to remotely manage the configuration of the agent.  It would seem that leveraging Net-SNMP's strengths and shoring up its weaknesses is the way to go.
<br><br>
But, those are just my thoughts.
</p>