Monitoring for VMware vSphere environments.
This ZenPack provides support for monitoring VMware vSphere. Monitoring is performed through a vCenter server using the vSphere API.
This ZenPack supersedes an earlier ZenPack named ZenPacks.zenoss.ZenVMware. If you have ZenPacks.zenoss.ZenVMware installed on your system, please read the Transitioning from ZenVMware section below.
When monitoring vCenter with this ZenPack, it is important to also verify the version of ESX/ESXi installed on the hosts it manages. The ESX hosts must be running a version that is compatible according to the VMware Product Compatibility Matrix and that has not reached End of General Support in VMware's Lifecycle Matrix.
In particular, ESX/ESXi 4.x is known to cause serious problems when monitored through a current version of vCenter.
In addition, if running Zenoss 5.2.x and monitoring ESXi 5.5, ESXi must be updated to build 4722766 or later. There is an SSL compatibility issue between TLS 1.2 (shipped with Java 1.8 and thus with Zenoss 5.2.x) and OpenSSL-1.0.1t (shipped with previous builds of ESXi 5.5). Until ESXi is updated, the vSphere ZenPack will be unable to authenticate with the VMware API or to collect any information. Earlier versions of Zenoss and later versions of ESXi are unaffected; only this particular combination causes issues.
We also recommend that monitored vSphere environments be deployed in accordance with VMware's recommendations and best practices, such as:
The features added by this ZenPack can be summarized as follows. They are each detailed further below.
The following components will be automatically discovered through the vCenter address, username and password you provide. The properties and relationships will be continually maintained by way of a persistent subscription to vSphere's updates API.
The following metrics will be collected every 5 minutes by default. Any other vSphere metrics can also be collected by adding them to the appropriate monitoring template.
In addition, any other metric exposed by vSphere may be added to a Zenoss monitoring template. This must be done cautiously, however. It is critical to only add metrics that are applicable to the component that the template is bound to, and which are supported by the VMware instances you are monitoring.
Because Zenoss batches multiple performance queries together, adding an unsupported metric may cause unrelated performance graphs to break. The 'Test' function must be used to verify that any newly added datasource will work correctly.
The following event classes and their subclasses will be continually collected and passed into the Zenoss event management system.
Various information encoded in these event classes will be used to automatically determine as best as possible the following Zenoss event fields.
Events collected through this mechanism will be timestamped based on the time they occurred within vCenter, not the time at which they were collected.
When combined with the Zenoss Service Dynamics product, this ZenPack adds built-in service impact and root cause analysis capabilities for services running on VMware vSphere. The service impact relationships shown in the diagram and described below are automatically added. These will be included in any services that contain one or more of the explicitly mentioned components.
Most of the impacts described above follow the default policy of a node being in the worst state of the nodes that impact it. For example, a datacenter failure will imply that all related hosts are also failed. In some cases this is not appropriate and custom policies are used.
The following operational reports are included with this ZenPack. They can be found in the vSphere report organizer.
Use the following steps to start monitoring vSphere using the Zenoss web interface.
Unlike other device types, it is possible to add the same vSphere Endpoint multiple times with the same IP but under different device IDs, which will result in duplicate data collection.
Alternatively you can use zenbatchload to add vSphere endpoints from the command line. To do this, you must create a file with contents similar to the following. Replace all values in angle brackets with your values minus the brackets. Multiple endpoints can be added under the same /Devices/vSphere or /Devices/vSphere/Subclass section.
/Devices/vSphere loader='VMware vSphere', loader_arg_keys=['title', 'hostname', 'username', 'password', 'ssl', 'collector']
vcenter1 hostname='<address>', username='<username>', password='<password>'
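When many endpoints need to be added, a file of the shape shown above can be generated rather than typed by hand. The following is a minimal sketch, not part of the ZenPack: the function name `render_batchload` and the endpoint dictionaries are hypothetical, and it only reproduces the two line formats shown above. It does not escape quotes, so values containing a single quote would need extra handling.

```python
def render_batchload(device_class, endpoints):
    """Render zenbatchload text for a list of vSphere endpoints.

    device_class: e.g. "/Devices/vSphere" or a subclass of it.
    endpoints: list of dicts with 'title', 'hostname', 'username', 'password'.
    """
    # Section header, copied from the documented file format.
    header = (
        "%s loader='VMware vSphere', loader_arg_keys="
        "['title', 'hostname', 'username', 'password', 'ssl', 'collector']"
        % device_class
    )
    lines = [header]
    # One line per endpoint under the same device class section.
    for ep in endpoints:
        lines.append(
            "%s hostname='%s', username='%s', password='%s'"
            % (ep["title"], ep["hostname"], ep["username"], ep["password"])
        )
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file and then passed to zenbatchload.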
You can then load the endpoint(s) with the following command.
If you are using VMware NSX to provide software-defined networking functionality in your vSphere environment, each of your ESX hosts must be properly prepared and configured in order to utilize NSX. This feature requires that you have both the NSX and vSphere ZenPacks installed on your Zenoss instance.
In monitoring ESX host preparation, Zenoss must communicate directly with the ESX host, rather than through the vCenter endpoint. This requires additional configuration, detailed in the following steps.
The zVSphereHostCollectionClusterWhitelist is a list property which accepts multiple entries. The patterns entered are used in an OR pattern, rather than an AND pattern. That is to say, if a cluster matches ANY pattern, it will be monitored. This means that if you use the wildcard option to match all clusters, there is no reason to enter anything else in the zProperty. Remember also, this whitelist is empty by default. Thus, until values are entered here no hosts will be monitored.
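The OR semantics described above can be illustrated with a short sketch. It assumes the whitelist entries are regular expressions searched anywhere in the cluster name; the text above does not state the exact pattern syntax, so treat that as an assumption. The function name `cluster_is_monitored` is hypothetical and not part of the ZenPack.

```python
import re

def cluster_is_monitored(cluster_name, whitelist):
    """Return True if the cluster name matches ANY whitelist pattern."""
    # any() over an empty whitelist is False, which mirrors the
    # "empty by default, so no hosts are monitored" behavior.
    return any(re.search(pattern, cluster_name) for pattern in whitelist)

# An empty whitelist monitors nothing.
print(cluster_is_monitored("prod-cluster-01", []))
# A single catch-all pattern makes further entries redundant.
print(cluster_is_monitored("prod-cluster-01", [".*"]))
# Matching either pattern is enough.
print(cluster_is_monitored("dev-cluster-07", ["^prod-", "^dev-"]))
```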
The host preparation monitoring tracks the following values:
This monitoring is based on recommendations for troubleshooting connectivity on ESX hosts with NSX: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2107951
The checks recommended in this article are performed by the host collection, and an event is generated if the checks fail, alerting your team and allowing them to take steps to resolve the issue.
When collecting data from ESX hosts, certain assumptions are made about your host configuration:
If any of these statements is not correct for your host, host data collection will not be possible, and this feature will not be usable for that host.
If you are installing this ZenPack on an existing Zenoss system or upgrading from an earlier Zenoss version you may have a ZenPack named ZenPacks.zenoss.ZenVMware already installed on your system. You can check this by navigating to Advanced -> ZenPacks.
This ZenPack functionally supersedes ZenPacks.zenoss.ZenVMware, but does not automatically migrate monitoring of your VMware vSphere resources when installed. The ZenPacks can coexist gracefully to allow you time to manually transition monitoring to the newer ZenPack with better capabilities.
Depending on how heavily loaded your vCenter and Zenoss server(s) are you may wish to avoid monitoring the same vSphere resources twice in parallel. If this is the case, you should use the following instructions to first remove the existing vSphere monitoring before adding the new monitoring.
If you're comfortable monitoring the vCenters twice in parallel for a time, you can simply follow the instructions under Adding vSphere Endpoint then delete the old vCenters from the /VMware device class once you're satisfied with the new monitoring.
Installing this ZenPack will add the following items to your Zenoss system.
If any issues are encountered with the functionality offered by this ZenPack, the following checklist should be followed to verify that all configurations are correct.
As of version 3.3.0, modeling for vSphere devices is performed continuously by the zenvsphere collector daemon. Because the modeling is handled by a daemon, it may not be performed immediately upon adding a new vSphere endpoint. This does not necessarily mean anything is wrong. Wait a few minutes and see if model data begins to populate.
If you do not see any data after waiting a few minutes, check the logs for errors encountered during modeling (see Logging).
One common problem when adding a new vSphere endpoint is incorrect login details. In the logs, you should see something like this:
ERROR zen.PythonClient: Authentication error: [com.vmware.vim25.InvalidLoginFaultMsg] Cannot complete login due to an incorrect user name or password. Check `zVSphereEndpointUser` and `zVSphereEndpointPassword`.
If this occurs, you should verify and re-enter your vSphere login credentials (see Login Credentials).
If you see any other errors, consult the included documentation, this wiki, Zenoss Support, and whatever other resources you have at your disposal.
Normally, it should not be necessary to restart zenvsphere, since any model updates will be received automatically from vSphere. If you deem it necessary to restart the collector daemon, you can do so (see Daemon Restart). Please note the caveats outlined in that section before proceeding.
The vSphere ZenPack uses its own service or daemon which runs continuously in the background to collect modeling data, events, and performance data about your vSphere environment. Running as a separate service allows for greater flexibility in handling some constraints of the vSphere architecture and allows for greater performance.
Under normal circumstances, you should never need to restart the daemon. It automatically pulls in data about new devices and components, and even picks up changes to zProperties and other configuration values. Additionally, a restart will cause a significant delay in data collection. Restarting the collector daemon closes the open session with the vSphere API, which means the collector has to start the modeling data collection process over from the very beginning, rather than pulling incremental updates. In a smaller environment, this may not be a serious issue, but in complex vSphere environments with a lot of components, this could be very time-consuming. Thus, if you restart the daemon in such an environment, expect to wait a while.
However, if the daemon is clearly malfunctioning, or you have otherwise determined it is necessary to force remodelling, you can restart the collector daemon using the steps described below.
If you are running Zenoss 5.0 or greater, the zenvsphere daemon runs in its own Docker container, managed by Control Center. To restart it, follow these steps:
If you are running an older version of Zenoss than 5.0, there is no Control Center and there are no Docker containers. Instead, follow these steps:
An often-overlooked Zenoss deployment issue is the synchronization of time between data sources and data collectors, especially when the collected data is timestamped at the source. It is not unusual for the hardware clocks in computers to drift by seconds per day, accumulating to significant error in just a week or two.
The Zenoss data collector for vSphere will adjust for time zone and clock skew differences every 2 hours (by comparing the server's current time with the collector's), so even without NTP, events and graphs will have correct timestamps in Zenoss. If the clocks are not synchronized, however, the timestamps shown in Zenoss and vSphere for the same event will not match, and this can cause considerable confusion.
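The idea of compensating for skew by comparing the two clocks can be sketched as follows. This is an illustration only: the function names are hypothetical, and the collector's actual implementation is not described here.

```python
from datetime import datetime, timedelta

def measure_offset(server_now, collector_now):
    """Offset to add to a vSphere timestamp to express it on the collector's clock."""
    return collector_now - server_now

def to_collector_time(server_timestamp, offset):
    """Shift a timestamp reported by vSphere by the measured offset."""
    return server_timestamp + offset

# Example: the vCenter clock runs 45 seconds behind the collector's clock.
offset = measure_offset(
    server_now=datetime(2017, 1, 1, 12, 0, 0),
    collector_now=datetime(2017, 1, 1, 12, 0, 45),
)
event_time = to_collector_time(datetime(2017, 1, 1, 11, 59, 0), offset)
```

With the offset measured periodically, slowly drifting clocks are corrected without relying on NTP, although the raw timestamps displayed in vSphere itself would still differ.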
On Zenoss 4, the log file can be found at $ZENHOME/log/<collectorname>/zenvsphere.log. On Zenoss 5, the log file is inside the zenvsphere container, and may be accessed as follows:
serviced service attach zenvsphere
By default only INFO and higher priority messages will be logged. To temporarily increase the logging level to include DEBUG messages you can run zenvsphere debug as the zenoss user without restarting the daemon. The next time it restarts, logging will resume at the preconfigured level. Alternatively you can run zenvsphere debug again to return logging to the preconfigured level.
On Zenoss 5, you must attach to the zenvsphere container before running 'zenvsphere debug'. Alternatively, you may skip that step by using the command 'serviced service debug zenvsphere'.
Note that if you have multiple collectors under Zenoss 5, you will need to specify the specific zenvsphere service ID rather than 'zenvsphere' in the serviced commands above.
The zenvsphere daemon spawns a java process named zenvmware-client as needed to communicate with vSphere endpoints. It is possible for this java process to encounter a problem that zenvsphere is unable to detect or recover from. If you're experiencing issues you can run pkill -9 -f zenvmware-client as the zenoss user to kill the java process. It will be automatically restarted.
pkill -9 -f zenvmware-client
On Zenoss 5, the java process runs within each zenvsphere container, so you may attach to the container (serviced service attach zenvsphere) and kill the process, or you may restart the entire service (which will restart both zenvsphere and the java subprocess) with the command serviced service restart zenvsphere. If you have multiple zenvsphere containers, you will need to specify the service ID.
serviced service restart zenvsphere
For support purposes, an overall health report is generated for each vSphere device, indicating elapsed time and any relevant errors encountered during the last polling cycle for that device.
This report may be accessed by visiting the device in the Zenoss UI, and changing the URL from
Detailed interpretation of this report is best done by a Zenoss support engineer. However, the header will show the "Last valid data collected" time and "Last perf collection task total elapsed time" fields.
If the last time is not recent, or the elapsed time is greater than 300 seconds, this may indicate that tuning is required.
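The two checks above can be expressed as a small helper. This is a sketch under stated assumptions: the 300-second limit comes from the text, while the 15-minute definition of "recent" and the function name `health_warnings` are hypothetical choices made for illustration. The values would be read manually from the health report header.

```python
from datetime import datetime, timedelta

def health_warnings(last_valid_data, elapsed_seconds, now,
                    max_age=timedelta(minutes=15), max_elapsed_seconds=300):
    """Return a list of reasons why tuning may be required (empty if none)."""
    warnings = []
    # "Last valid data collected" should be recent (threshold is an assumption).
    if now - last_valid_data > max_age:
        warnings.append("last valid data is not recent")
    # "Last perf collection task total elapsed time" should stay within 300 seconds.
    if elapsed_seconds > max_elapsed_seconds:
        warnings.append(
            "perf collection took longer than %d seconds" % max_elapsed_seconds
        )
    return warnings
```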
Some key metrics from the health report are now available in the device graphs. Simply navigate to your device in the web interface and click Graphs in the left navigation bar.
In vSphere 5.5U2d and 6.x, a limitation has been introduced by VMware on how many performance metrics may be queried at a time. Since Zenoss performs bulk queries of all monitored metrics every 5 minutes, it can run into this limit.
If it does, the query is automatically split in half and retried, so there should not be any functional problem, but it is less efficient, and can crowd the zenvsphere.log file with errors. (These errors will mention querySpec.size (5.5) or vpxd.stats.maxQueryMetrics (6.0))
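The split-and-retry behavior can be sketched as a recursive halving of the metric list. This illustrates the general technique only: `TooManyMetrics`, `query_with_split`, and the `run_query` callable are hypothetical names, not the collector's actual code or the vSphere API.

```python
class TooManyMetrics(Exception):
    """Raised by run_query when a batch exceeds the server-side limit."""

def query_with_split(metrics, run_query):
    """Query all metrics, halving the batch whenever the limit is exceeded."""
    if not metrics:
        return []
    try:
        return run_query(metrics)
    except TooManyMetrics:
        if len(metrics) == 1:
            # A single metric cannot be split any further.
            raise
        mid = len(metrics) // 2
        # Retry each half independently and keep the original ordering.
        return (query_with_split(metrics[:mid], run_query)
                + query_with_split(metrics[mid:], run_query))
```

Each failed attempt costs a round trip and produces an error in the log, which is why sizing the batch at or below the server limit up front is more efficient than relying on the retry.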
The limit normally defaults to 64 on current versions of vSphere, and Zenoss has a zProperty (zVSpherePerfQueryVcChunkSize) which should be set to the same, or lower, value to avoid these errors. This zProperty defaults to 64 as well, so it is not normally necessary to adjust it unless you see errors like those described above.
For more details on this limit and how to change it, see https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2107096
If you are having problems with Resource Pool graphs specifically, with the following symptoms:
This problem has been observed in at least some versions of vSphere 5.5 and 6.0.
The root cause is that, for these three specific datapoints (memOverhead, cpuEntitlement, and memEntitlement) on Resource Pools, vSphere does not properly calculate the number of metrics being queried.
For example, in vSphere, each Cluster has a top-level resource pool that contains all resources (sub-pools, and ultimately VMs) in the cluster. If we query for a single metric on that top-level pool, say, memEntitlement, instead of being counted as one metric, it is counted as one per VM in the entire cluster. That is, vSphere is expanding our one query into a bunch of internal queries, and counting them all against the maxQueryMetrics limit described above. In practice, this means that in any cluster that has more than 64 VMs, these three metrics will not be available on that top-level resource pool, because there is no way to query them without it being counted as more than 64 metrics.
The same problem applies to all resource pools that contain, directly or through sub-pools, more than 64 VMs.
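To estimate which resource pools are affected, the VMs can be counted recursively through sub-pools. The sketch below assumes a hypothetical nested-dictionary representation of the pool hierarchy (`name`, `vms`, `children`); it does not query vSphere, and the data would have to come from your own inventory.

```python
def count_vms(pool):
    """Count VMs contained in a pool, directly or through sub-pools."""
    direct = len(pool.get("vms", []))
    nested = sum(count_vms(child) for child in pool.get("children", []))
    return direct + nested

def affected_pools(pool, limit=64):
    """Return names of pools whose total VM count exceeds the limit."""
    found = []
    if count_vms(pool) > limit:
        found.append(pool["name"])
    for child in pool.get("children", []):
        found.extend(affected_pools(child, limit))
    return found

# Example hierarchy: a top-level pool with two sub-pools.
cluster_root = {
    "name": "Resources",
    "vms": [],
    "children": [
        {"name": "pool-a", "vms": ["vm%d" % i for i in range(40)]},
        {"name": "pool-b", "vms": ["vm%d" % i for i in range(70)]},
    ],
}
print(affected_pools(cluster_root))
```

In this example the top-level pool (110 VMs) and pool-b (70 VMs) exceed the default limit of 64, while pool-a (40 VMs) does not.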
The only workaround is to disable the vpxd.stats.maxQueryMetrics setting on the vSphere side, as described above, or to raise it to a very high number. Under current versions of the ZenPack, it would need to be a bit more than 3 times the number of VMs in the cluster. Future versions will subdivide these queries further so that it only needs to be the number of VMs.
Note that there will also be gaps in the "CPU Usage" graph because the Zenoss collector will stop trying to collect any metrics from a given resource pool after it encounters these errors, for 20 minutes by default (configurable via zVSpherePerfRecoveryMinutes). It will then try again, and fail again. Occasional data may get through due to the way Zenoss batches its queries, which can cause partial data to show up on affected resource pools.
If you need these graphs to work on these pools, but are unable to raise or disable vpxd.stats.maxQueryMetrics, you may disable the memOverhead, cpuEntitlement, and memEntitlement datapoints in the ResourcePool monitoring template. This should avoid the problem.
As of version 3.3, the performance collection mechanism of the vSphere ZenPack is more optimized and self-tuning, and generally will not require adjustment. This is primarily because the time window of data polled in each cycle is now dynamic, and is based on what data has already been collected. Therefore, rather than tuning the specific window as was done in previous versions of the ZenPack using the (no longer supported) zVSpherePerfWindowSize option, the collector will automatically collect as much data as is available since the last time it collected each metric.
This, in combination with other changes, means that the collector is better able to compensate for large vCenter installations or slow response times.
However, a number of configuration variables are still available for advanced tuning:
When adjusting the chunk sizes, there may be tradeoffs to values that are too large or too small. On systems with large numbers of datastores and resource pools or vApps, it may be beneficial to raise zVSpherePerfQueryVcChunkSize. However, if vSphere 6.x is being used, zVSpherePerfQueryVcChunkSize must be 64 or less. This restriction can be lifted if vpxd.stats.maxQueryMetrics, an advanced vCenter configuration property, is also adjusted, as described at http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2107096
NOTE: As of version 3.3 of the vSphere ZenPack, the zVSpherePerfWindowSize configuration property is no longer used, and the meaning of zVSpherePerfQueryChunkSize has been changed slightly. It used to describe a number of query specifications (groups of metrics on a specific entity), and now it describes the number of metrics instead. This change was made to align its meaning with that of vpxd.stats.maxQueryMetrics.
Normally, the set of objects and attributes to be modeled and how they should be processed is predefined in this ZenPack. However, a limited amount of tuning is possible via the properties zVSphereModelIgnore and zVSphereModelCache.
zVSphereModelIgnore is used to suppress collection of an attribute or attributes, perhaps because it changes too frequently or is not of interest in a particular environment. Its value is a list of regular expressions which must match against a string of the format "<vmware managed resource class>:<vmware property name>". For example, "VirtualMachine:summary.guest.toolsStatus" would be matched by "toolsStatus" or "VirtualMachine.*toolsStatus".
NOTE: These values are VMware classes and properties, not Zenoss classes and properties. They are generally similar, but there are exceptions. For a list of allowed values, consult $ZENHOME/ZenPacks/ZenPacks.zenoss.vSphere*/ZenPacks/zenoss/vSphere/gather_config.json.
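The matching rule for zVSphereModelIgnore can be tried out with Python's `re` module before changing the zProperty. This sketch assumes unanchored matching (`re.search`), which is consistent with the example above where "toolsStatus" alone matches; the function name `is_ignored` is hypothetical.

```python
import re

def is_ignored(resource_class, property_name, ignore_patterns):
    """Return True if any pattern matches "<class>:<property>"."""
    key = "%s:%s" % (resource_class, property_name)
    return any(re.search(pattern, key) for pattern in ignore_patterns)

# Both patterns from the text match the toolsStatus property.
print(is_ignored("VirtualMachine", "summary.guest.toolsStatus", ["toolsStatus"]))
print(is_ignored("VirtualMachine", "summary.guest.toolsStatus",
                 ["VirtualMachine.*toolsStatus"]))
```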
zVSphereModelCache is used to configure caching (change detection) for a specific set of properties. This is also defined as a set of regular expressions, only this time, the value is "<zenoss class name>:<zenoss property/method name>". It is especially rare that you would need to use this configuration option. It is specifically intended for situations where issues with the vSphere data model or API cause us to be repeatedly notified about a property which has not actually changed its value. When caching is turned on for the affected attribute, the collector will keep track of the prior value of the property, and will only notify zenhub if the value actually changes. This saves unnecessary load on zenhub. A list of properties that need this functionality is built into the collector, and this configuration option is only used to add additional properties to the list, should a new one be found. There is generally no benefit to using this caching feature where it is not needed, but no major downside either, other than increased memory usage on the collector. Should you find a property where this caching is needed, please notify Zenoss so that we may make this behavior the default in future versions of this ZenPack.
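The change-detection idea behind this option can be sketched in a few lines. This is an illustration of the technique, not the collector's code: the class `ChangeCache` and its method are hypothetical names.

```python
class ChangeCache:
    """Remember the last value seen per property and report real changes only."""

    def __init__(self):
        self._last = {}

    def should_notify(self, key, value):
        """Return True only if the value differs from the cached one."""
        if key in self._last and self._last[key] == value:
            # Repeated notification with an unchanged value: suppress it.
            return False
        # The cached copy is what costs extra memory on the collector.
        self._last[key] = value
        return True

cache = ChangeCache()
first = cache.should_notify(("vm-101", "toolsStatus"), "toolsOk")
repeat = cache.should_notify(("vm-101", "toolsStatus"), "toolsOk")
changed = cache.should_notify(("vm-101", "toolsStatus"), "toolsOld")
```

Here only the first and third updates would be forwarded; the repeated value in between is dropped.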
Advanced tuning of modeling, to accommodate the addition of very large vSphere instances to Zenoss that may otherwise time out during their initial modeling, may be performed using the zVSphereModelMpLevel, zVSphereModelMpObjs, and zVSphereModelMpIndexObjs variables. At this time we do not recommend that these values be changed without assistance from Zenoss Support.
By default, the management IP reported by VMware for each host will be pinged. This may be disabled completely by setting zPingMonitorIgnore to true.
In some situations, a host may have multiple management IPs. In this case, the default is for the first one to be pinged (according to the order in which the physical NICs are reported for that host in the VMware API). If this default is not acceptable, the zVSphereHostPingBlacklist property may be used to filter out undesired IP addresses.
Its value is a list of regular expressions which must match against a string of the format "<hostname>:<pnic name>:<ip address>".
For example "esx1.zenoss.com:vmk0:10.0.2.3" could be matched by "esx1.zenoss.com", ":vmk0", "10\.0\.2\.3", or other patterns.
This can be used to filter out specific NICs (":vmk0:"), subnets (":10\.0\.2"), or hosts ("esx1.zenoss.com").
zVSphereHostPingBlacklist may be combined with zVSphereHostPingWhitelist to create exceptions. For instance, one could set zVSphereHostPingBlacklist to ignore vmk1 on all hosts, and then set zVSphereHostPingWhitelist to make an exception for a specific hostname or subnet.
One useful way to combine these is to set the zVSphereHostPingBlacklist to ".*" (that is, disable all ping monitoring), and then specifically enable your management subnets in zVSphereHostPingWhitelist, one pattern for each subnet.
Note that since these are regular expressions, any "." in the IP address should be escaped with a backslash, as shown above. If this is not done, the "." will match any character.
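The interaction of the two properties can be checked offline with a small sketch. It assumes that a whitelist match overrides a blacklist match and that patterns are searched anywhere in the "<hostname>:<pnic name>:<ip address>" string; the function names and the second address are hypothetical, and this is not the ZenPack's implementation.

```python
import re

def matches_any(patterns, key):
    """Return True if any regular expression is found in key."""
    return any(re.search(pattern, key) for pattern in patterns)

def ping_candidates(keys, blacklist, whitelist):
    """Keep entries that are not blacklisted, or that are whitelisted."""
    kept = []
    for key in keys:
        if matches_any(blacklist, key) and not matches_any(whitelist, key):
            continue  # filtered out, no whitelist exception applies
        kept.append(key)
    return kept

keys = [
    "esx1.zenoss.com:vmk0:10.0.2.3",
    "esx1.zenoss.com:vmk1:192.168.5.7",
]
# Blacklist everything, then allow only the 10.0.2.x management subnet.
allowed = ping_candidates(keys, blacklist=[".*"], whitelist=[r":10\.0\.2\."])
```

The escaped dots matter: the unescaped pattern "10.0.2.3" would also match a string such as "10x0y2z3", because "." matches any character.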
This ZenPack provides additional support for Zenoss Analytics. Perform the following steps to install extra reporting resources into Zenoss Analytics after installing the ZenPack.
You can now navigate back to the vSphere ZenPack folder in the repository to see the resources added by the bundle.
The vSphere Domain can be used to create ad hoc views using the following steps.
Analytics stores history for devices and components. In some cases this may cause reports to show 2x/3x/... larger data. Turning off "deleted dimensions" may help.
To change this setting permanently in the data warehouse to not keep deleted dimensions around you can do the following on your analytics server:
If you want to test this out and see its impact in your environment before making the setting permanent, you can execute the following command on your analytics server:
mysql -u root reporting -e "call remove_deleted_dimensions;"
Note that you will also need to "blow out" all the jaspersoft ad hoc caches that cache results of queries to see the impact of changing the data in the database in your view.
To do this:
When upgrading from 3.5.x to a newer version, a message such as "ERROR Monitoring template /vSphere/LUN has been modified since the ZenPacks.zenoss.vSphere ZenPack was installed. These local changes will be lost as this ZenPack is upgraded or reinstalled. Existing template will be renamed to 'LUN-upgrade-1484782049'. Please review and reconcile local changes:" may be displayed.
If the only difference shown is the 'component' field on the diskReadRequests datasource and a change in ordering of some other properties on the template, this may be disregarded, and the LUN-upgrade-<number> template may be deleted if desired.
In addition, as described in the Resource Manager Administration Guide, all Zenoss services except zeneventserver should be stopped prior to installing or upgrading any ZenPack. This is particularly *critical* with this upgrade. Failure to stop the services may cause migration steps required for this upgrade to fail.
overall polling cycle time.
This ZenPack is developed and supported by Zenoss Inc. Commercial ZenPacks are available to Zenoss commercial customers only. Contact Zenoss to request more information regarding this or any other ZenPacks. Click here to view all available Zenoss Commercial ZenPacks.